Introduction
Organizations are increasingly adopting Apache Kafka to capture and analyze real-time data streams from sources like IoT devices, website clickstreams, financial systems, and application logs. Many customers use Amazon MSK, a fully managed service for Apache Kafka, to build and run highly available, secure, and scalable applications. However, for unpredictable or unknown workloads, provisioning cluster capacity is a guessing game.
Amazon recently introduced Amazon MSK Serverless. With it, you can run Apache Kafka without managing cluster capacity: MSK Serverless provisions resources on demand to match your streaming workload.
What is Apache Kafka?
Apache Kafka is an open-source, high-performance, fault-tolerant, and scalable platform for building real-time streaming data pipelines and applications. It acts as a streaming data store that decouples the applications writing streaming data into it (producers) from the applications reading streaming data out of it (consumers).
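To make the producer/consumer decoupling concrete, here is a minimal in-memory sketch in Python. It is a toy model only and not Kafka's client API: the `ToyLog` class, its method names, and the consumer names are all hypothetical, chosen purely for illustration. It shows an append-only log where each consumer tracks its own read position.

```python
from collections import defaultdict


class ToyLog:
    """Toy, in-memory stand-in for one topic partition (NOT Kafka's API)."""

    def __init__(self):
        self._records = []                # append-only list of records
        self._offsets = defaultdict(int)  # next offset to read, per consumer

    def produce(self, record):
        """Append a record; the producer knows nothing about consumers."""
        self._records.append(record)
        return len(self._records) - 1     # offset of the record just appended

    def consume(self, consumer_id, max_records=10):
        """Return unread records for this consumer and advance its offset."""
        start = self._offsets[consumer_id]
        batch = self._records[start:start + max_records]
        self._offsets[consumer_id] = start + len(batch)
        return batch


log = ToyLog()
for click in ["home", "search", "checkout"]:
    log.produce(click)

# Two independent consumers read the same data at their own pace.
analytics_first = log.consume("analytics", max_records=2)
billing_all = log.consume("billing")
analytics_rest = log.consume("analytics")
```

Because the log keeps the records and each consumer keeps only a position, a slow consumer never blocks the producer or the other consumers. A real Kafka cluster adds partitioning, replication, and retention on top of this basic idea.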
What is Amazon MSK?
Amazon MSK (Amazon Managed Streaming for Apache Kafka) makes it easy to ingest and process streaming data in real time with fully managed Apache Kafka.
Amazon MSK manages Apache Kafka infrastructure and operations, making it easy for developers and DevOps managers to run Apache Kafka applications and Kafka Connect connectors on AWS, without the need to become experts in operating Apache Kafka.
With Amazon MSK, however, provisioning cluster capacity for unpredictable or unknown workloads is a guessing game, and you need to adjust capacity manually.
Amazon MSK Console → Clusters → Create Cluster → Cluster Type: Provisioned
Cluster type “Provisioned” means you’re required to manage broker instances, EBS storage volume sizes, etc. according to your workload. This requires a much deeper understanding of how your application performs. The downside of “Provisioned”, as with other provisioned AWS services, is that it is common to over-provision or to provision for the worst-case scenario, with no elegant scale-down as traffic slows. That means you pay a lot more than you would with an option that scales to match traffic in real time or near real time.
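If you prefer scripting to the console, the capacity decisions described above show up as explicit fields in the request. The sketch below builds a request body shaped for the boto3 `kafka` client's `create_cluster_v2` call. The helper names, the default instance type, broker count, volume size, Kafka version, and the subnet and security group IDs are all placeholder assumptions, not recommendations; check the current boto3 documentation before relying on the exact field names.

```python
def provisioned_cluster_request(name, subnet_ids, security_group_ids,
                                instance_type="kafka.m5.large",
                                brokers=3, ebs_volume_gib=100,
                                kafka_version="3.5.1"):
    """Build a request body for a Provisioned cluster.

    Every sizing value here (instance type, broker count, EBS volume size,
    Kafka version) is a placeholder you must choose for your own workload.
    """
    return {
        "ClusterName": name,
        "Provisioned": {
            "KafkaVersion": kafka_version,
            "NumberOfBrokerNodes": brokers,       # you pick the broker count
            "BrokerNodeGroupInfo": {
                "InstanceType": instance_type,    # you pick the broker size
                "ClientSubnets": subnet_ids,
                "SecurityGroups": security_group_ids,
                "StorageInfo": {
                    # you pick the storage per broker, in GiB
                    "EbsStorageInfo": {"VolumeSize": ebs_volume_gib},
                },
            },
        },
    }


def create_provisioned_cluster(request):
    """Send the request. Needs boto3 and AWS credentials; not called here."""
    import boto3  # imported lazily so the sketch loads without boto3
    return boto3.client("kafka").create_cluster_v2(**request)


# Hypothetical IDs, for illustration only.
provisioned_request = provisioned_cluster_request(
    "demo-provisioned",
    subnet_ids=["subnet-aaa", "subnet-bbb", "subnet-ccc"],
    security_group_ids=["sg-123"],
)
```

Notice that three separate sizing knobs (instance type, broker count, and volume size) have to be guessed up front, which is exactly where over-provisioning tends to creep in.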
What is Amazon MSK Serverless?
Amazon MSK Serverless is a cluster type for Amazon MSK that makes it easy for you to run Apache Kafka without having to manage and scale cluster capacity. MSK Serverless automatically provisions and scales compute and storage resources, so you can use Apache Kafka on demand and pay for the data you stream and retain.
Amazon MSK Console → Clusters → Create Cluster → Cluster Type: Serverless
Cluster type “Serverless” means you’re not required to manage broker instances, EBS storage volume sizes, etc. Amazon MSK Serverless auto-scales on demand behind the scenes. Amazing!
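For a scripted equivalent of the console path above, here is a minimal sketch of a Serverless request body shaped for the boto3 `kafka` client's `create_cluster_v2` call. The helper names and the subnet and security group IDs are hypothetical placeholders, and the field names should be verified against the current boto3 documentation. What matters is what is absent: there is no instance type, broker count, or volume size to choose.

```python
def serverless_cluster_request(name, subnet_ids, security_group_ids):
    """Build a request body for a Serverless cluster.

    Only networking and client authentication are specified;
    there are no broker or storage sizing fields to fill in.
    """
    return {
        "ClusterName": name,
        "Serverless": {
            "VpcConfigs": [
                {
                    "SubnetIds": subnet_ids,
                    "SecurityGroupIds": security_group_ids,
                }
            ],
            # IAM-based SASL authentication for clients
            "ClientAuthentication": {"Sasl": {"Iam": {"Enabled": True}}},
        },
    }


def create_serverless_cluster(request):
    """Send the request. Needs boto3 and AWS credentials; not called here."""
    import boto3  # imported lazily so the sketch loads without boto3
    return boto3.client("kafka").create_cluster_v2(**request)


# Hypothetical IDs, for illustration only.
serverless_request = serverless_cluster_request(
    "demo-serverless",
    subnet_ids=["subnet-aaa", "subnet-bbb"],
    security_group_ids=["sg-123"],
)
```

The request is reduced to "where should clients connect from, and how do they authenticate", and capacity is left to the service.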
Comparison Between Amazon MSK And Amazon MSK Serverless
Conclusion
Amazon MSK is a solid service for capturing and analyzing real-time data streams from sources like IoT devices, website clickstreams, financial systems, and application logs.
However, for unpredictable or unknown workloads, provisioning MSK cluster capacity is a guessing game. Amazon recently introduced Amazon MSK Serverless. With MSK Serverless, you can run Apache Kafka without managing cluster capacity, and the service provisions resources on demand to match your streaming workload.
OPINION: Amazon MSK Serverless is still relatively new, so we don’t have enough data to confirm that it’s worth the switch for all use cases. However, if you’re approaching anything Kafka-related on AWS, we recommend trialing Amazon MSK Serverless yourself to see whether your use case fits, despite some lack of feature parity with the non-serverless option.