*NOTE: This article demonstrates how to get to 10k RPS; the assumption is that by increasing provisioned concurrency further we would reach the 30k RPS number in the title.
GOAL
Test Lambda’s Ability to Handle 10,000 RPS
APPLICATION
The application used for these tests is a single GET endpoint NodeJS application deployed with AWS API Gateway, Lambda, and DynamoDB.
LOAD TESTING TOOLS USED
Artillery.io
- Written in NodeJS
- Lightweight
- Easy Installation
- Uses JSON/YML/JS scripts
- NO GUI
- Load generation is limited by host system memory and CPU utilization
JMeter
- Runs on JVM
- XML Configuration
- GUI Available
- Loads of Plugins
- Better logs
- Load generation is limited by host system memory and CPU utilization
Serverless-Artillery
- Written in NodeJS
- Runs on AWS Lambda
- Lightweight
- Easy Installation
- Can generate higher throughput with simple configurations
- Load generation time is limited by Lambda's maximum timeout (15 min)
The load testing tool I used here is Serverless-Artillery, because with my MacBook Pro (8 GB, 2.4 GHz) I was not able to generate test loads above 2,000 RPS.
Load Testing With Regional Soft Limits
Region: us-east-1
Concurrent Execution limit: 1000 (shared across all the functions in the region)
The goal here is to find out how many requests Lambda can handle with the default soft limits AWS applies in the us-east-1 region.
Load Test configuration
```yaml
config:
  target: "https://e2oerxwy12.execute-api.us-east-1.amazonaws.com"
  plugins:
    cloudwatch:
      namespace: "sls-artillery"
  phases:
    - duration: 300
      arrivalRate: 500
      rampTo: 10000
scenarios:
  - flow:
      - get:
          url: "/dev/get?id=erewqed"
```
This config starts at 500 requests per second and ramps up to 10,000 RPS over a period of 5 minutes.
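As a sanity check on the numbers above: the total volume of a linear ramp phase is just the average of the start and end rates times the duration. A back-of-the-envelope sketch (my arithmetic, not output from Artillery):

```python
# Rough arithmetic for the ramp phase above: Artillery ramps the arrival
# rate linearly, so the average rate is the midpoint of start and end.
start_rate = 500    # arrivalRate
end_rate = 10000    # rampTo
duration_s = 300    # duration (5 minutes)

avg_rate = (start_rate + end_rate) / 2    # 5250 RPS on average
total_requests = avg_rate * duration_s    # ~1.57M requests overall

print(avg_rate, total_requests)
```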
Result
The above is the CloudWatch dashboard for the application. All counts shown in the graphs are aggregated over 1 minute.
For example, the first graph shows the API Gateway call count. At the highest peak, API Gateway received about 475K requests in a minute, which works out to 475000/60 ≈ 7,916 requests per second.
As we can see in the concurrent executions graph, it hits the regional concurrency limit and flattens out after reaching 1,000 concurrent executions.
At this point Lambda starts throttling requests, as the Throttles graph shows; almost the same number of 5XX errors can be seen in the API Gateway 5XX graph.
This test generated a peak throughput of 7,916 RPS, of which 6,416 requests were throttled.
With the default limits, Lambda can only serve 1,000 concurrent executions; requests beyond that will be throttled.
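It is worth separating concurrency from RPS here: by Little's law, the concurrency a workload needs is roughly its arrival rate times its average execution time, so the RPS a fixed concurrency limit can sustain depends entirely on how long each invocation runs. A minimal sketch (the durations below are illustrative assumptions, not measurements from this test):

```python
# Sustainable throughput under a fixed concurrency cap (Little's law):
# concurrency ≈ RPS × average execution time, therefore
# max RPS ≈ concurrency limit / average execution time.
def sustainable_rps(concurrency_limit: int, avg_duration_s: float) -> float:
    return concurrency_limit / avg_duration_s

# With the default 1000-execution limit:
print(sustainable_rps(1000, 1.0))   # ~1000 RPS if invocations take ~1 s
print(sustainable_rps(1000, 0.1))   # ~10000 RPS if they take ~100 ms
```

So a 1,000-execution cap only translates to 1,000 RPS when invocations take about a second; a faster function could push far more traffic through the same limit.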
Load Testing With Increased Regional Soft Limits
I increased the regional concurrency limit of us-east-1 to 20,000 via a Service Quotas limit increase, expecting that with the higher limit Lambda could process 10,000 RPS or more.
Scenario 1
Region: us-east-1
Concurrent Execution limit: 20000 (shared across all the functions in the region)
Load Test configuration
```yaml
config:
  target: "https://e2oerxwy12.execute-api.us-east-1.amazonaws.com"
  plugins:
    cloudwatch:
      namespace: "sls-artillery"
  phases:
    - duration: 900
      arrivalRate: 2500
      rampTo: 10000
scenarios:
  - flow:
      - get:
          url: "/dev/get?id=erewqed"
```
This config starts at 2,500 requests per second and ramps up to 10,000 RPS over a period of 15 minutes.
Result
As we can see in the dashboard above, requests started to get throttled once concurrent executions went above 3,000, even though the traffic was increasing gradually.
The number 3,000 is AWS Lambda's burst concurrency limit in the us-east-1 region.
After the initial burst, your functions' concurrency can scale by an additional 500 instances each minute. This continues until there are enough instances to serve all requests, or until a concurrency limit is reached. When requests come in faster than your function can scale, or when your function is at maximum concurrency, additional requests fail with a throttling error (429 status code).
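The scaling behaviour quoted above can be sketched as a simple ceiling function: 3,000 instances immediately, then 500 more per minute, capped at the regional limit (here the raised limit of 20,000). This is a model of the documented behaviour, not measured data:

```python
# Lambda's scaling ceiling in us-east-1: an immediate burst of 3000
# instances, then +500 per minute until the regional limit is reached.
BURST = 3000
SCALE_PER_MIN = 500
REGIONAL_LIMIT = 20000  # the raised soft limit used in this test

def max_concurrency(minutes_elapsed: int) -> int:
    return min(REGIONAL_LIMIT, BURST + SCALE_PER_MIN * minutes_elapsed)

# Reaching 10000 concurrent instances takes (10000 - 3000) / 500 = 14 minutes.
print([max_concurrency(m) for m in (0, 1, 14, 40)])  # [3000, 3500, 10000, 20000]
```

This is why the gradual ramp still throttled: demand crossed the 3,000 ceiling long before the 500-per-minute growth could catch up.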
Scenario 2
Load Test configuration
```yaml
config:
  target: "https://e2oerxwy12.execute-api.us-east-1.amazonaws.com"
  plugins:
    cloudwatch:
      namespace: "sls-artillery"
  phases:
    - duration: 120
      arrivalRate: 10000
      rampTo: 10000
scenarios:
  - flow:
      - get:
          url: "/dev/get?id=erewqed"
```
This test generates a sudden burst of traffic: 10,000 requests per second over a span of 2 minutes.
Result
Here we can see Artillery started by generating a quick load of around 4,700 RPS. Lambda spun up 3,000+ instances to serve it and started throttling the rest of the requests.
Artillery generated a peak of 10,000 requests per second, and out of that 600 requests got throttled.
So in both scenarios (a gradual increase and a quick increase in traffic), Lambda was not able to process all the requests it received because of the burst concurrency limit and the time it needs to scale (500 instances per minute); during the scaling period after the initial burst, it throttles some of the requests.
Load Testing With Provisioned Concurrency
For this test I enabled provisioned concurrency (10,000) on the Lambda function, the assumption being that 10,000 Lambda instances are available at all times to process any traffic up to 10,000 RPS.
Scenario 1
Load Test configuration
```yaml
config:
  target: "https://e2oerxwy12.execute-api.us-east-1.amazonaws.com"
  plugins:
    cloudwatch:
      namespace: "sls-artillery"
  phases:
    - duration: 600
      arrivalRate: 2500
      rampTo: 9250
scenarios:
  - flow:
      - get:
          url: "/dev/get?id=erewqed"
```
This config starts at 2,500 requests per second and ramps up to 9,250 RPS over a period of 10 minutes. I kept it at 9,250 because I wanted to see what the graph would look like without using 100% of the provisioned concurrency.
Result
Some info on the provisioned concurrency CloudWatch metrics:
- ProvisionedConcurrentExecutions – concurrent executions using provisioned concurrency
- ProvisionedConcurrencyUtilization – fraction of provisioned concurrency in use, i.e. ProvisionedConcurrentExecutions / total provisioned concurrency allocated
- ProvisionedConcurrencyInvocations – number of invocations using provisioned concurrency
- ProvisionedConcurrencySpilloverInvocations – number of invocations above provisioned concurrency
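To make the relationship between these metrics concrete, here is the arithmetic they report, with illustrative numbers (my own examples, not values read off the dashboard):

```python
# ProvisionedConcurrencyUtilization = in-use executions / allocated amount.
def utilization(provisioned_executions: int, allocated: int) -> float:
    return provisioned_executions / allocated

# Spillover invocations are the requests above the allocation; they fall
# back to on-demand scaling and may see cold starts.
def spillover(concurrent_requests: int, allocated: int) -> int:
    return max(0, concurrent_requests - allocated)

print(utilization(9250, 10000))  # 0.925, i.e. 92.5% of the allocation in use
print(spillover(10350, 10000))   # 350 invocations spill to on-demand
print(spillover(9250, 10000))    # 0, everything fits in the allocation
```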
On the graph we can see Artillery generated a load of 9,250 requests per second, and Lambda was able to execute all of those requests without throttling a single one ✌️✌️✌️
There are some 5XX errors thrown by API Gateway, which I believe happened because some Lambdas timed out or failed to read from DynamoDB. I didn't dig deeper, because the goal here was to check whether Lambda could process all of the given requests without throttling.
Scenario 2
Load Test configuration
```yaml
config:
  target: "https://e2oerxwy12.execute-api.us-east-1.amazonaws.com"
  plugins:
    cloudwatch:
      namespace: "sls-artillery"
  phases:
    - duration: 180
      arrivalRate: 5000
      rampTo: 10000
scenarios:
  - flow:
      - get:
          url: "/dev/get?id=erewqed"
```
This config starts at 5,000 requests per second and ramps up to 10,000 RPS over a period of 3 minutes.
Result
Here Artillery generated traffic of 10,000 RPS and held it steady for some time. As we can see, Lambda was able to process all of the requests without throttling. ✌️✌️✌️
We can also see around 350 requests in the ProvisionedConcurrencySpilloverInvocations graph. These invocations happen when ProvisionedConcurrencyUtilization goes above 100% (a count of 1 in the graph represents 100%); such requests are served by Lambda's on-demand scaling and may experience cold starts.
Provisioned concurrency can also scale with AWS Application Auto Scaling. I tried to use it and it did not work as expected. There are not many resources available online about autoscaling provisioned concurrency; I will dig into this soon and try to update this doc with the results.
Conclusion
All these tests give us answers to a couple of questions:
Is AWS Lambda as scalable as a traditional EC2/container-based architecture? YES
Can Lambda serve 30,000 RPS? YES
- But it can be difficult.
- With the default AWS regional limits, Lambda cannot serve more than 1,000 concurrent executions.
- With an increased concurrent execution limit, there is still one more limit: the burst concurrency limit. This restricts Lambda to 3,000 concurrent requests at a time; if it receives more than that, some requests will be throttled while Lambda scales at 500 instances per minute.
- By enabling provisioned concurrency and allocating the required amount to a function, we can scale the function without any throttling.