Introduction
Sometimes running a huge number of operations against a database is tough, especially when you need to guarantee the order in which those operations are performed. One of our clients was struggling with very slow consumption of SQS FIFO queue messages, so we ran some tests to understand what performance improvements could be made.
This post walks through a simple implementation built around Lambda and RDS, created to gather metrics and information about this common pattern on AWS.
Assets and Resource Dependencies
We used an RDS Aurora MySQL cluster (one reader / one writer) with db.t3.small instances, running in a private VPC.
RDS Cluster Screenshot
We also configured Secrets Manager to store the database credentials and hostname.
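For reference, here is a minimal sketch of how the consumer might read those values with the AWS SDK for JavaScript v3; the secret name and its JSON shape are assumptions, not the project's actual values:

```typescript
import {
  SecretsManagerClient,
  GetSecretValueCommand,
} from "@aws-sdk/client-secrets-manager";

const client = new SecretsManagerClient({});

// Hypothetical secret name and shape; adjust both to match your own secret.
export async function getDbCredentials(): Promise<{
  host: string;
  username: string;
  password: string;
}> {
  const response = await client.send(
    new GetSecretValueCommand({ SecretId: "rds-aurora-credentials" })
  );
  // The secret value is stored as a JSON string containing host, username and password.
  return JSON.parse(response.SecretString ?? "{}");
}
```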
If you are just here for the code, check out the project template.
Project Resources
Database Table
We used the MySQL Sakila sample database as a starting point for our operations.
For this use case we simply insert randomly generated first and last name values into the actor table.
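For illustration, the statement we run against that table and the random values we feed into it can be sketched like this (the sampled names and the helper name are placeholders):

```typescript
// Hypothetical helper: builds the INSERT we run against the Sakila actor table
// with randomly picked first/last name values.
const FIRST_NAMES = ["PENELOPE", "NICK", "JENNIFER", "JOHNNY"];
const LAST_NAMES = ["GUINESS", "WAHLBERG", "DAVIS", "LOLLOBRIGIDA"];

const pick = (values: string[]): string =>
  values[Math.floor(Math.random() * values.length)];

export function buildActorInsert(): { sql: string; params: string[] } {
  return {
    sql: "INSERT INTO actor (first_name, last_name) VALUES (?, ?)",
    params: [pick(FIRST_NAMES), pick(LAST_NAMES)],
  };
}
```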
Consumer
- Since the consumer's only responsibility is to iterate over a batch of messages and run a simple INSERT query for each of them (a batch query would be faster), we use a Lambda function with 128 MB of memory (the minimum allowed), running inside the same VPC as the database. A sketch of such a handler follows this list.
- To be able to communicate with Secrets Manager from inside the VPC, we create a VPC Endpoint.
- Since we use Artillery for the load test and it only works with HTTP requests, we put an API Gateway in front of the queue: it exposes an HTTP endpoint that receives each message and sends it to an SQS queue in FIFO mode (First In, First Out).
Important Considerations
Dead Letter Queue Required
If the consumer throws an error, your queue stays blocked until the message(s) you tried to process are sent to a Dead Letter Queue. If you don't handle this, your entire pipeline can get stuck. That's why we also deploy a DLQ, wired to the main queue through its RedrivePolicy.
Another important point: if you process a batch of 10 messages and any one of them fails, the entire batch is marked as failed. Be careful when reprocessing from the Dead Letter Queue, because a failed batch can contain messages that were already processed successfully.
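To make the RedrivePolicy part concrete, here is a rough sketch of how the main FIFO queue could point at its DLQ using the AWS SDK for JavaScript v3; the queue name and maxReceiveCount are assumptions, and in the project this wiring lives in the deployment template rather than in code:

```typescript
import { SQSClient, CreateQueueCommand } from "@aws-sdk/client-sqs";

const sqs = new SQSClient({});

// Messages that fail processing maxReceiveCount times are moved to the DLQ
// instead of blocking the main FIFO queue. Note that the DLQ of a FIFO queue
// must itself be a FIFO queue.
export async function createMainQueue(dlqArn: string) {
  return sqs.send(
    new CreateQueueCommand({
      QueueName: "actor-ingest.fifo", // hypothetical name
      Attributes: {
        FifoQueue: "true",
        RedrivePolicy: JSON.stringify({
          deadLetterTargetArn: dlqArn,
          maxReceiveCount: "3",
        }),
      },
    })
  );
}
```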
Scaling SQS FIFO
Because we are using a FIFO queue, we cannot scale the consumer horizontally: concurrent executions would break the ordering of the messages. What we can do is process up to 10 messages per Lambda execution using batchSize.
And since the Lambda function runs inside a private VPC, we need a VPC Endpoint to reach Secrets Manager.
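For illustration, the SQS trigger with a batch size of 10 could be attached like this with the AWS SDK; the function name is a placeholder, and in practice this is normally declared in the serverless template:

```typescript
import {
  LambdaClient,
  CreateEventSourceMappingCommand,
} from "@aws-sdk/client-lambda";

const lambda = new LambdaClient({});

// The consumer receives up to 10 messages per invocation; FIFO ordering keeps
// the queue from fanning out into many concurrent executions.
export async function attachQueueTrigger(queueArn: string) {
  return lambda.send(
    new CreateEventSourceMappingCommand({
      EventSourceArn: queueArn,
      FunctionName: "sqs-fifo-consumer", // hypothetical name
      BatchSize: 10,
    })
  );
}
```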
Producer (Load Testing)
To generate a good volume of traffic against SQS we used the serverless-artillery solution. If you don't know it yet, we have a post on our blog with more information right here.
For our test case, we created a script.yml file that generates messages with random ids and POSTs them to an API Gateway endpoint, which makes them available in the SQS queue. You can find the Artillery config file here. We use randomNumber for the ID because our queue doesn't accept duplicate entries.
For this scenario, you can get the BASE_PATH value from the output of the consumer deploy.
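Roughly, each virtual user in the Artillery script does the equivalent of the snippet below: POST a randomly generated id to the API Gateway endpoint, which forwards it to the FIFO queue. The /message path and payload shape are assumptions; BASE_PATH comes from the consumer deploy output.

```typescript
// Hypothetical equivalent of one Artillery request. The random id matters
// because the FIFO queue rejects duplicate deduplication ids.
const BASE_PATH = process.env.BASE_PATH!; // from the consumer deploy output

export async function sendMessage(): Promise<number> {
  const id = Math.floor(Math.random() * 1_000_000_000);
  const response = await fetch(`${BASE_PATH}/message`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ id }),
  });
  return response.status;
}
```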
Running Load Test—Test Scenarios
#1 Without MySQL interaction
SQS FIFO—BatchSize 1
- Number of messages received in 30 seconds: 9,625
- Total Time to process: ~7 minutes
Lambda Metrics
SQS Metrics
SQS FIFO—BatchSize 10
- Number of messages received in 5 minutes: 90,814
- Total Time to process: ~7 minutes
Lambda Metrics
SQS Metrics
With MySQL Queries
In the previous load tests we didn't run any MySQL queries; we were just establishing a baseline for how many messages could be processed without any database operations.
Now we are going to add the MySQL queries in and see how the performance changes.
SQS FIFO—BatchSize 1 - (1 INSERT per message)
- Number of messages received in 30 seconds: 2,288
- Total Time to process: ~6 minutes
SQS FIFO—BatchSize 10 - (1 INSERT per message / 10 per lambda execution)
- Number of messages received in 30 seconds: 2,274
- Total Time to process: ~1min 15s
SQS FIFO—BatchSize 10 - (1 INSERT per message / 10 per lambda execution)
- Number of messages received in 5 minutes: 65,530
- Total Time to process: ~23 minutes
Lambda Metrics
Now that we are running queries against our RDS database, the duration has increased compared to the scenario without the database. Remember, this is a 128 MB Lambda running 10 separate INSERT queries on every invocation.
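As noted earlier, one way to shave that duration would be to collapse the 10 per-message INSERTs into a single multi-row statement. A sketch of that variation, reusing the hypothetical helpers from above:

```typescript
import type { SQSEvent } from "aws-lambda";
import { createConnection } from "mysql2/promise";
import { buildActorInsert } from "./actorInsert"; // helper from the earlier sketch

// Variation on the consumer: one multi-row INSERT per batch instead of one
// INSERT per message, trading per-message granularity for fewer round trips.
export const handler = async (event: SQSEvent): Promise<void> => {
  const connection = await createConnection({
    host: process.env.DB_HOST,
    user: process.env.DB_USER,
    password: process.env.DB_PASSWORD,
    database: "sakila",
  });

  try {
    const rows = event.Records.map(() => buildActorInsert().params); // [first_name, last_name] pairs
    // With query(), mysql2 expands the nested array into (?, ?), (?, ?), ...
    await connection.query(
      "INSERT INTO actor (first_name, last_name) VALUES ?",
      [rows]
    );
  } finally {
    await connection.end();
  }
};
```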
Database Metrics
DB Connections
Since we are dealing with a FIFO queue, there should never be more than one execution running in parallel, so you should not see a high number of connections on your RDS instance.
CPU Utilization
Since we are not running concurrent queries against the database, CPU usage on the RDS instance stays low. In this screenshot, we wrote more than 65k new records and only hit a peak of 17% CPU on the smallest instance available.
Write Latency
This metric is worth keeping an eye on. If you run a huge number of queries simultaneously, write latency increases, and as a consequence your Lambda function takes longer to finish processing.
Conclusion
At the end of the day, we concluded that you can handle a very good amount of data with an SQS FIFO queue using the batchSize: 10 strategy. It improves throughput almost 10x compared to running the same queue with batchSize: 1, and when it comes to high production traffic this is one way to relieve this kind of bottleneck.
While studying and running the test cases, we discovered that some developers use Kinesis Data Streams instead of SQS FIFO, which lets them handle a much larger volume of data in a smaller window of time. We are going to test that scenario as well and write another post about it.
Thanks for reading and let us know if you have any questions.