AWS Lambda and SQS: A Complicated Relationship
Lambda and SQS are two powerful Amazon Web Services tools that, when used together, can greatly improve your workflow. However, using them together can also be quite complicated. In this short summary, we'll explore the benefits of using Lambda with SQS, discuss some of the challenges you may face when combining them, and share some tips on how to overcome those challenges.
One of the main benefits of using Lambda with SQS is that it lets you process messages in batches. Instead of being invoked once per message, Lambda can be configured to receive several messages per invocation, resulting in a lower number of executions. Since Lambda is serverless and AWS account owners only pay for the time that Lambda is running, this configuration can greatly reduce costs in scenarios where data is processed in short bursts, or once a day/week/month. Another benefit of SQS batching is that it helps you stay under AWS service quotas, which limit the number of Lambda executions that can be running at any point in time; once that limit is reached, further invocations in your account are throttled.
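To make the batching idea concrete, here is a minimal sketch of a Python handler that receives a whole batch in one invocation. The `Records`, `body`, and `messageId` fields are part of the SQS event that Lambda delivers; the `process_order` helper and the `order_id` field are hypothetical stand-ins for your own logic, and the return value is only there for illustration.

```python
import json


def process_order(payload: dict) -> str:
    """Hypothetical business logic; stands in for the real per-message work."""
    return f"processed order {payload['order_id']}"


def handler(event: dict, context: object) -> dict:
    """Lambda entry point: one invocation receives every message in the batch."""
    results = []
    for record in event["Records"]:
        # Each record carries one SQS message; the body is a plain string.
        payload = json.loads(record["body"])
        results.append(process_order(payload))
    # Returned only so the sketch is easy to inspect locally.
    return {"processed": len(results), "results": results}
```

With a batch size of five, up to five messages can share a single invocation instead of triggering five separate ones.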
Sounds like a fairly straightforward solution, right?
Not exactly...
The first caveat of SQS-Lambda batching is that it becomes inconsistent when dealing with a small number of messages. For example, if four messages are sent to the queue, and your Lambda is set to pull messages in batches of five, your Lambda may still be invoked more than once. This is because Lambda processes five batches at a time, and the batch size is a maximum rather than a guarantee, which means the batch of four messages you were expecting may be split into smaller pieces and processed concurrently. So depending on whether or not that matters to you, you may want to consider not relying on batches for low-traffic queues.
The second and much more obvious potential obstacle is timeouts. A hard-working Lambda function that takes 15 seconds to process one message will take 15 minutes to process a batch containing sixty messages, at which point Lambda will time out, because 15 minutes is the longest a single invocation is allowed to run. There is also the issue of message size, which is capped at 256 KB per message (or per batch, if applicable).
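The arithmetic above is easy to check before you pick a batch size. The sketch below assumes messages are processed one after another inside a single invocation; the function names and the 60-second headroom are illustrative choices, not AWS settings.

```python
LAMBDA_MAX_TIMEOUT_SECONDS = 900  # 15 minutes, the longest a single invocation may run


def estimated_batch_seconds(batch_size: int, seconds_per_message: int) -> int:
    """Worst-case duration when messages are handled sequentially."""
    return batch_size * seconds_per_message


def max_safe_batch_size(
    seconds_per_message: int,
    timeout_seconds: int = LAMBDA_MAX_TIMEOUT_SECONDS,
    headroom_seconds: int = 60,  # assumed safety margin, tune to taste
) -> int:
    """Largest batch that finishes with some headroom before the timeout."""
    usable_seconds = timeout_seconds - headroom_seconds
    return max(1, usable_seconds // seconds_per_message)


print(estimated_batch_seconds(60, 15))  # 900, exactly the 15-minute limit
print(max_safe_batch_size(15))          # 56
```

If per-message processing time varies a lot, it is safer to size the batch from the slowest message you expect rather than the average.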
Let's take a step back before diving into the final yellow flag of this architecture that we'll discuss. SQS is designed so that messages are "consumed" by the receiving service (in this case Lambda). This means that once a message has been processed, the receiver should delete it, indicating that the message has been successfully consumed. When Lambda polls the queue for you, it handles that deletion itself after your function returns successfully. A processed message that is never deleted becomes visible again and gets processed again, which can lead to infinite loops and put a black hole in your expenses.
Circling back to our batching scenario, Lambda by default will process SQS batches with an all-or-nothing response. What that means is that if one of 1,000 messages in a batch throws an unhandled error, Lambda will put all of the messages back, including the 999 that were processed successfully. If your architecture requires large batches, you should be very conscious of how errors will affect which messages are deleted, and which are put back. Luckily, AWS has somewhat recently implemented a partial batch response feature (ReportBatchItemFailures) for this exact scenario. You can check out the docs for implementing this here.
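As a rough sketch of what partial error handling looks like in a Python handler: each message is processed inside its own try/except, and the handler returns the IDs of only the messages that failed. This assumes `ReportBatchItemFailures` has been enabled on the queue's event source mapping; without it, Lambda ignores the return value. The `process_message` helper and the `order_id` check are hypothetical.

```python
import json


def process_message(body: str) -> None:
    """Hypothetical per-message work; raises if the message cannot be handled."""
    payload = json.loads(body)
    if "order_id" not in payload:
        raise ValueError("missing order_id")


def handler(event: dict, context: object) -> dict:
    """Report only the failed messages so the successful ones are deleted."""
    failures = []
    for record in event["Records"]:
        try:
            process_message(record["body"])
        except Exception:
            # Only this message goes back to the queue for a retry.
            failures.append({"itemIdentifier": record["messageId"]})
    # An empty list tells Lambda the whole batch succeeded.
    return {"batchItemFailures": failures}
```

Note that an unhandled exception that escapes the handler still sends the entire batch back, so the per-message try/except is what makes the partial response work.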
In short, while SQS and Lambda make a great team for processing larger amounts of data in bursts, there are many considerations to take into account when trying to use them together. Batch size, error handling, and concurrency are a few of the many factors that play a role in how well your architecture will function. With the right planning, and possibly a lot of trial and error, you can have resilient, cost-effective data processing with SQS and Lambda.
More information on using Lambda with SQS can be found here. If you have any questions or want to chat about your specific use case, feel free to reach out or connect on LinkedIn!