Building Resilient Serverless Architecture: SQS-Lambda Asynchronous Processing Pattern

kozylife 2025. 7. 7. 19:50

When developing serverless applications, developers often face a critical dilemma: choosing between fast response times and system reliability. While some features require real-time responses, there are situations where safely processing requests without losing them is far more important than response speed.

Today, I'll share the SQS-Lambda asynchronous processing pattern that I chose to enhance system reliability in a real application I developed, and discuss when you should consider using this pattern from a practical perspective.

When Reliability Matters More Than Speed

In serverless application operations, you might encounter these scenarios:

Risk of Request Loss During Traffic Spikes

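Lambda functions have concurrency limits (by default, 1,000 concurrent executions per account per Region). When a sudden surge exceeds the limit, synchronous invocations are throttled and those requests fail unless the caller retries. Unexpected traffic spikes from marketing campaigns or media coverage can happen at any time.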

Data Loss Risk from Network Instability

When users submit requests from mobile environments or unstable networks, timeouts can cause valuable user data to disappear. Users think they've submitted their request, but it never actually reached the server.

Business-Critical Requests

Imagine a scenario where "a customer clicks the sign-up button but their account isn't created." Such incidents severely damage customer trust and directly impact business. Payment processing, order submissions, and user registrations are requests that absolutely cannot be lost.

My Real Experience: Why I Chose the SQS-Lambda Pattern

In the application I developed, preventing request loss was far more important than Lambda response speed. Particularly for core features like user creation, the requirement was "it doesn't need to be processed quickly, but it must be processed reliably."

This led me to adopt a design philosophy of "receive safely first, then process reliably later." I chose the SQS-Lambda asynchronous processing pattern to prevent request loss by immediately storing user requests in SQS queues, allowing Lambda to process them reliably.

SQS-Lambda Asynchronous Processing Architecture

The core architecture of this pattern is:

User → WAF → CloudFront → API Gateway → SQS Queue → Lambda
                 ↓                           ↓
           S3 (Static Files)          Dead Letter Queue

Request Processing Flow

  1. Immediate Receipt: API Gateway receives the user request and stores it in an SQS queue right away
  2. Safe Storage: SQS retains the message until it is processed or the retention period expires (configurable up to 14 days; the default is 4 days)
  3. Asynchronous Processing: Lambda functions retrieve messages from the queue and run the actual business logic
  4. Failure Handling: If processing fails, the message stays in SQS and becomes available again after the visibility timeout; once it has been received more than maxReceiveCount times, SQS moves it to the Dead Letter Queue

SQS-Lambda Reprocessing Mechanism

"Retry" in the SQS-Lambda pattern isn't a native Lambda service feature, but rather occurs through SQS message visibility characteristics:

  1. Lambda Execution Failure → Message remains in SQS in an "invisible" state rather than being deleted
  2. Visibility Timeout Expires → Message becomes "visible" again in the queue
  3. Another Lambda Instance Reprocesses → A new Lambda picks up and attempts to process the message
  4. maxReceiveCount Exceeded → Messages automatically move to DLQ when the configured maximum receive count is exceeded
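
The four steps above can be traced with a small in-memory model. This is only a toy sketch, not the real SQS service: the `SimulatedQueue` class, the one-second polling loop, and the `always_fails` handler are all assumptions made for illustration, using the 60-second visibility timeout and maxReceiveCount of 3 mentioned in this article.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    body: str
    receive_count: int = 0
    visible_at: float = 0.0  # time at which the message becomes receivable again


@dataclass
class SimulatedQueue:
    """Toy in-memory model of a queue with a redrive policy (not the real service)."""
    visibility_timeout: float = 60.0
    max_receive_count: int = 3
    messages: list = field(default_factory=list)
    dead_letters: list = field(default_factory=list)

    def send(self, body: str) -> None:
        self.messages.append(Message(body))

    def receive(self, now: float):
        """Return the first visible message and hide it for the visibility timeout."""
        for msg in list(self.messages):
            if msg.visible_at > now:
                continue  # still invisible after an earlier receive
            if msg.receive_count >= self.max_receive_count:
                # One more receive would exceed the limit: move to the DLQ instead.
                self.messages.remove(msg)
                self.dead_letters.append(msg)
                continue
            msg.receive_count += 1
            msg.visible_at = now + self.visibility_timeout
            return msg
        return None

    def delete(self, msg: Message) -> None:
        """Called only after successful processing."""
        self.messages.remove(msg)


def run(queue: SimulatedQueue, handler, until: float, step: float = 1.0) -> None:
    """Poll once per step; delete on success, leave the message in place on failure."""
    now = 0.0
    while now <= until:
        msg = queue.receive(now)
        if msg is not None:
            try:
                handler(msg.body)
                queue.delete(msg)
            except Exception:
                pass  # no delete: the message reappears after the visibility timeout
        now += step


def always_fails(body: str) -> None:
    raise RuntimeError("simulated processing error")


queue = SimulatedQueue()
queue.send("create-user:new-user-123")
run(queue, always_fails, until=200)
# remaining messages, dead letters, receive count of the dead letter
print(len(queue.messages), len(queue.dead_letters), queue.dead_letters[0].receive_count)
# prints: 0 1 3
```

In this model the failing message is received at 0, 60, and 120 seconds and lands in the dead letter list at 180 seconds, which matches the "delete only on success" behaviour described above.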

Core Components

WAF (Web Application Firewall)

  • Security layer functionality
  • Rate limiting (e.g., 2000 requests per 5 minutes)
  • Blocks DDoS, SQL injection, and other attacks

CloudFront + S3

  • Global caching for static files (HTML, CSS, JS)
  • Fast frontend loading

API Gateway

  • Entry point for all API requests
  • Immediate response for improved user experience

SQS (Simple Queue Service)

  • Safe message storage
  • Batch processing (up to 10 messages per Lambda invocation by default; standard queues accept larger batch sizes when a batching window is configured)
  • Visibility timeout: 60 seconds (reprocessing wait time on Lambda failure)
  • maxReceiveCount: 3 (maximum processing attempts before DLQ movement)
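
As a rough check on these two settings, the sketch below works out how long a permanently failing message takes to reach the DLQ, and whether the visibility timeout leaves enough room for the function to finish. AWS's guidance for SQS event sources is a queue visibility timeout of at least six times the function timeout. The function names and the 10-second function timeout are assumptions for illustration; the article does not state the actual function timeout.

```python
def seconds_until_dlq(visibility_timeout_s: int, max_receive_count: int) -> int:
    """Approximate delay between the first receive and the move to the DLQ,
    assuming every attempt fails and each failure waits out the full timeout."""
    return visibility_timeout_s * max_receive_count


def visibility_timeout_is_safe(visibility_timeout_s: int, function_timeout_s: int) -> bool:
    """Check the guideline: visibility timeout >= 6 x function timeout."""
    return visibility_timeout_s >= 6 * function_timeout_s


# Values from this article: 60-second visibility timeout, maxReceiveCount of 3.
print(seconds_until_dlq(60, 3))           # 180
# Assumed function timeout of 10 seconds (hypothetical).
print(visibility_timeout_is_safe(60, 10))  # True
print(visibility_timeout_is_safe(60, 30))  # False
```

With these settings a request that can never succeed is parked in the DLQ roughly three minutes after it is first picked up.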

Lambda Functions

  • Actual business logic processing
  • Auto-scaling
  • Messages become reprocessing candidates in SQS on execution failure

Dead Letter Queue

  • Stores messages when maxReceiveCount (maximum receive count) is exceeded
  • Repository for manual processing or debugging

Real Implementation Case: User Creation System

Let me walk you through the user creation system I implemented:

Step 1: Immediate Receipt Response

Immediate response from API Gateway:

{
  "statusCode": 202,
  "body": {
    "message": "Request received. Processing in progress.",
    "requestId": "uuid-1234"
  }
}

Users immediately receive this acknowledgement, which tells them the request was accepted for processing, not that processing has finished.

Step 2: Safe Message Storage in SQS

{
  "userId": "new-user-123",
  "email": "user@example.com",
  "userData": { ... },
  "timestamp": "2025-07-06T10:00:00Z"
}
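
Steps 1 and 2 can be sketched together as a single function that builds the queue message and the 202 acknowledgement from one request. This assumes the acceptance step runs in application code; in the deployed system it may equally be an API Gateway integration with SQS. The `accept_request` name, the `requestId` field inside the message, and the sample `userData` value are assumptions, not part of the original implementation.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Tuple


def accept_request(user_id: str, email: str, user_data: dict) -> Tuple[str, dict]:
    """Build the queue message body and the 202 acknowledgement for one request."""
    request_id = str(uuid.uuid4())
    message_body = json.dumps({
        "requestId": request_id,  # assumption: carried in the message for tracing
        "userId": user_id,
        "email": email,
        "userData": user_data,
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    })
    response = {
        "statusCode": 202,
        # Serialized here because many HTTP integrations expect a string body.
        "body": json.dumps({
            "message": "Request received. Processing in progress.",
            "requestId": request_id,
        }),
    }
    return message_body, response


message_body, response = accept_request(
    "new-user-123", "user@example.com", {"plan": "free"}
)
print(response["statusCode"])  # 202
```

The acknowledgement should only be returned once the message has actually been stored in the queue; otherwise the 202 promises something the system cannot deliver.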

Step 3: Actual User Creation in Lambda

Lambda functions process messages to:

  • Save user information to database
  • Send email verification
  • Perform necessary follow-up tasks
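
A minimal sketch of such a handler is shown below, assuming the event source mapping has partial batch responses (`ReportBatchItemFailures`) enabled so that only the failed messages return to the queue. `create_user` is a hypothetical placeholder for the real database write and verification email; the event shape (`Records`, `messageId`, `body`) is the standard SQS event format.

```python
import json


def create_user(user: dict) -> None:
    """Hypothetical stand-in for the real work (database write, verification email)."""
    if "email" not in user:
        raise ValueError("email is required")


def handler(event: dict, context=None) -> dict:
    """Process an SQS batch and report only the failed messages back to Lambda."""
    failures = []
    for record in event.get("Records", []):
        try:
            create_user(json.loads(record["body"]))
        except Exception as exc:
            print(f"failed to process {record['messageId']}: {exc}")
            failures.append({"itemIdentifier": record["messageId"]})
    # Messages listed here stay in the queue; the rest of the batch is deleted.
    return {"batchItemFailures": failures}
```

If partial batch responses are not enabled, this return value is ignored and the batch counts as successful, so the handler would need to raise an exception instead, which sends the whole batch back for reprocessing.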

Step 4: Safety Net for Failures

If errors occur during processing:

  • Lambda execution failure leaves messages in SQS
  • Messages become available for reprocessing after visibility timeout expires
  • Messages move to Dead Letter Queue when maxReceiveCount is exceeded
  • Administrators manually review and process DLQ messages
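
Because a message can be delivered more than once (after a failed attempt, or occasionally even after a successful one on a standard queue), the processing step is safest when repeating it has no extra effect. The sketch below illustrates that idea with an in-memory store; `InMemoryUserStore` and `process_message` are hypothetical names, and a real system would typically rely on a conditional write or a unique constraint in its database instead.

```python
class InMemoryUserStore:
    """Hypothetical stand-in for a database table keyed by userId."""

    def __init__(self) -> None:
        self._users = {}

    def create_if_absent(self, user_id: str, data: dict) -> bool:
        """Insert the user once; return False if this userId was already stored."""
        if user_id in self._users:
            return False
        self._users[user_id] = data
        return True

    def count(self) -> int:
        return len(self._users)


def process_message(store: InMemoryUserStore, message: dict) -> str:
    """Create the user on first delivery and skip any redelivery of the same message."""
    created = store.create_if_absent(message["userId"], message)
    return "created" if created else "skipped-duplicate"


store = InMemoryUserStore()
message = {"userId": "new-user-123", "email": "user@example.com"}
print(process_message(store, message))  # created
print(process_message(store, message))  # skipped-duplicate
```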

Benefits of This Reliability-Focused Architecture

1. Request Loss Prevention

The biggest advantage is that accepted requests are not lost. Once API Gateway has stored a request in SQS, the message stays in the queue through downstream failures until it is processed successfully, moved to the Dead Letter Queue, or its retention period expires.

2. Traffic Spike Resilience

SQS acts as a buffer, enabling stable responses to sudden traffic surges. Lambda processes messages gradually according to its processing capacity.
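
To get a feel for what "gradually" means, the backlog drain time can be estimated from the consumer's throughput. This is a back-of-the-envelope sketch: the function name and every number in the example (50,000 queued messages, 20 concurrent invocations, batches of 10, 2 seconds per batch) are hypothetical, and it ignores new messages arriving while the backlog drains.

```python
def drain_time_seconds(
    backlog: int, concurrency: int, batch_size: int, seconds_per_batch: float
) -> float:
    """Estimate how long a fixed backlog takes to process at steady throughput."""
    messages_per_second = concurrency * batch_size / seconds_per_batch
    return backlog / messages_per_second


# 20 invocations x 10 messages / 2 s = 100 messages per second.
print(drain_time_seconds(50_000, 20, 10, 2.0))  # 500.0
```

Under these assumed numbers a spike of 50,000 requests is absorbed in a little over eight minutes, without any request being rejected at the front door.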

3. Cost Efficiency

  • Lambda functions only execute when processing messages, saving costs
  • CloudFront caching reduces origin requests
  • SQS charges only for message operations
  • Zero idle infrastructure costs

4. Scalability

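  • API Gateway: Default account-level throttle of 10,000 requests per second per Region (can be raised on request)
  • SQS: Nearly unlimited throughput on standard queues
  • Lambda: Scales concurrent invocations up as the queue backlog grows, within the account's concurrency limit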
  • CloudFront: Global edge caching

5. Monitoring and Observability

Key monitoring metrics:

  • Queue Depth (ApproximateNumberOfMessagesVisible): Number of waiting messages
  • Processing Lag (ApproximateAgeOfOldestMessage): Wait time of the oldest message
  • Dead Letter Queue: Messages requiring manual intervention
  • Lambda Errors: Processing failures and throttling frequency
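
Two of these metrics translate naturally into CloudWatch alarms. The sketch below only builds the alarm parameters as plain dictionaries; the function names, alarm names, periods, and thresholds are assumptions chosen for illustration, while the namespace, metric names, and `QueueName` dimension are the ones SQS publishes.

```python
def oldest_message_alarm(queue_name: str, max_age_seconds: int) -> dict:
    """Alarm when the oldest waiting message is older than max_age_seconds."""
    return {
        "AlarmName": f"{queue_name}-oldest-message-age",
        "Namespace": "AWS/SQS",
        "MetricName": "ApproximateAgeOfOldestMessage",
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "Statistic": "Maximum",
        "Period": 60,
        "EvaluationPeriods": 5,
        "Threshold": max_age_seconds,
        "ComparisonOperator": "GreaterThanThreshold",
    }


def dlq_not_empty_alarm(dlq_name: str) -> dict:
    """Alarm as soon as any message is waiting in the Dead Letter Queue."""
    return {
        "AlarmName": f"{dlq_name}-not-empty",
        "Namespace": "AWS/SQS",
        "MetricName": "ApproximateNumberOfMessagesVisible",
        "Dimensions": [{"Name": "QueueName", "Value": dlq_name}],
        "Statistic": "Maximum",
        "Period": 60,
        "EvaluationPeriods": 1,
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",
    }


# Hypothetical queue names.
print(oldest_message_alarm("user-creation-queue", 300)["MetricName"])
print(dlq_not_empty_alarm("user-creation-dlq")["Threshold"])
```

With boto3 installed, each dictionary could be passed to `boto3.client("cloudwatch").put_metric_alarm(**params)`; an alarm action such as an SNS topic would still need to be added for notifications.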

When to Use This Pattern

Suitable Use Cases

E-commerce Platforms

  • Order processing, payment confirmation, and other transactions that cannot be lost
  • Variable traffic (sales events, marketing campaigns)

User Registration/Onboarding Systems

  • Sign-ups, profile creation, and other critical user data
  • Follow-up tasks like email verification and welcome messages

File Upload and Processing Workflows

  • Time-consuming tasks like image resizing and document conversion
  • Large file processing

External Service Integrations

  • Communication with unreliable external APIs
  • Stable processing even during third-party service outages

Applications Requiring Audit Trails

  • Highly regulated fields like finance and healthcare
  • Systems requiring processing records for all requests

Unsuitable Use Cases

Applications Requiring Real-time Responses

  • Search autocomplete, real-time chat
  • Real-time game interactions
  • Strict latency requirements under 100ms

Simple CRUD Operations

  • Simple read/write operations with consistent load
  • Tasks requiring immediate result confirmation

Cases Where Synchronous Processing is Essential

  • Real-time payment authorization (immediate approval/rejection needed)
  • Login/authentication (immediate success/failure response required)

Hands-on Practice Guide

If you want to experience this pattern firsthand, try out the GitHub repository I've prepared.

GitHub Repository: https://github.com/jaeneungsim/cdk-sqs-lambda-pattern

Prerequisites

  • Node.js 18+
  • AWS CLI configured
  • CDK CLI installed: npm install -g aws-cdk

Practice Steps

1. Clone Repository and Install Dependencies

git clone https://github.com/jaeneungsim/cdk-sqs-lambda-pattern.git
cd cdk-sqs-lambda-pattern
npm install

2. CDK Bootstrap (First time only)

cdk bootstrap

3. Deploy All Stacks

cdk deploy --all

4. Test the System

After deployment, access the CloudFront URL you received:

  • Click "Test Sample Lambda 1" button
  • Click "Test Sample Lambda 2" button
  • Confirm that messages are immediately stored in queue and processed asynchronously

5. Check Monitoring

Verify these metrics in AWS CloudWatch:

  • SQS queue depth
  • Lambda function execution logs
  • Dead Letter Queue status

6. Direct API Testing

curl -X POST https://your-cloudfront-url/api/sample-lambda-1 \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello World"}'

Resource Cleanup

After practice, delete resources to save costs:

cdk destroy --all

Extensibility

Based on this foundation pattern, you can add features like:

  • Database Integration: DynamoDB, RDS connections
  • Notification Systems: Email, SMS, push notifications
  • File Processing: S3 uploads, image resizing
  • External API Integration: Third-party service integrations
  • Environment Separation: Development, staging, production environment configurations

Conclusion

When choosing between fast responses and reliability, you must carefully consider your business characteristics and requirements. If user requests absolutely cannot be lost, the SQS-Lambda asynchronous processing pattern can be an excellent choice.

Of course, this pattern isn't perfect. It has limitations like initial setup complexity and lack of immediate feedback due to asynchronous processing. However, the advantages of enhanced system reliability and scalability, plus long-term operational cost savings, more than compensate for these drawbacks.

The true value of serverless architecture lies in applying the right pattern to the right situation. If reliable request processing is crucial for your next project, I encourage you to consider this SQS-Lambda pattern.