Serverless Guru’s Senior Cloud Architect Dallas Slaughter went over some Standout AWS Services a couple of days ago. In this post, we’ll go over some useful vocabulary for one of them: DynamoDB.
DynamoDB helps you make, edit and store tables.
A table has rows and columns, called items and attributes.
🆔 Each item has one attribute that is its ID, called the primary key. In daily life, IDs can be simple, like a name tag, or more complex, like a driver’s license.
Similarly, a table uses simple or composite primary keys.
A simple primary key has one attribute. 🗝️ A composite primary key has two. 🗝️🔑
It’s kind of like how Excel lets you “sort by” [Column] and “then by” [another column]. “Column” here is the partition key, and “another column” is the sort key. Together, the partition key and the sort key make up the composite primary key. By itself, the partition key is a simple primary key.
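To make the two kinds of primary key concrete, here is a minimal sketch in Python. The dictionaries follow the shape of the `KeySchema` and `AttributeDefinitions` fields of a DynamoDB CreateTable request, but the table names (`Customers`, `Receipts`) and attribute names (`name`, `receipt`) are made up for illustration, and `key_kind` is a hypothetical helper. Nothing here calls AWS, and a real request would also need capacity settings.

```python
# Hypothetical table definitions, shaped like parts of a CreateTable request.
# These are plain dictionaries; no AWS call is made.

# Simple primary key: the partition key alone (KeyType "HASH").
simple_key_table = {
    "TableName": "Customers",  # assumed name
    "AttributeDefinitions": [
        {"AttributeName": "name", "AttributeType": "S"},  # S = string
    ],
    "KeySchema": [
        {"AttributeName": "name", "KeyType": "HASH"},
    ],
}

# Composite primary key: partition key plus sort key (KeyType "RANGE").
composite_key_table = {
    "TableName": "Receipts",  # assumed name
    "AttributeDefinitions": [
        {"AttributeName": "name", "AttributeType": "S"},
        {"AttributeName": "receipt", "AttributeType": "S"},
    ],
    "KeySchema": [
        {"AttributeName": "name", "KeyType": "HASH"},      # "sort by"
        {"AttributeName": "receipt", "KeyType": "RANGE"},  # "then by"
    ],
}


def key_kind(table):
    """Return "composite" if the key has two attributes, else "simple"."""
    return "composite" if len(table["KeySchema"]) == 2 else "simple"
```

In the Excel comparison, the `HASH` entry plays the role of “sort by” and the `RANGE` entry plays the role of “then by”.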
How to optimize DynamoDB
Secondary indexes
Choose how to sort from the start. Imagine you needed to reorganize your Excel spreadsheet manually each time you want to get some information. That would take a lot of mental energy — a lot of computing!
🖥️ A lot of computing power is what SQL databases are optimized for.
💽 DynamoDB, however, is optimized for storage. It can store several copies of your database, each copy sorted in a different way.
So it’s good to choose how to sort your DynamoDB table from the start, but you can choose multiple ways. A secondary index gives that secondary way of sorting your DynamoDB table.
❓ Why would you want multiple ways of sorting your information?
You may want to sort by name and receipt, but also by name and date. Here, the name is the partition key and the receipt is the sort key. In the secondary index, the date is the sort key.
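The receipt example can be imitated in plain Python. This is only an in-memory illustration of the idea of keeping the same items in two sort orders; it is not how DynamoDB is implemented, and the sample items and the `build_index` helper are invented for the sketch.

```python
from collections import defaultdict

# Made-up sample items: each has a name, a receipt number, and a date.
items = [
    {"name": "Ana", "receipt": "R-003", "date": "2019-01-15"},
    {"name": "Ana", "receipt": "R-001", "date": "2019-03-02"},
    {"name": "Ben", "receipt": "R-002", "date": "2019-02-10"},
]


def build_index(items, partition_key, sort_key):
    """Group items by partition key, then order each group by sort key."""
    index = defaultdict(list)
    for item in items:
        index[item[partition_key]].append(item)
    for group in index.values():
        group.sort(key=lambda item: item[sort_key])
    return dict(index)


# The "table": name is the partition key, receipt is the sort key.
table = build_index(items, "name", "receipt")
# The "secondary index": same partition key, but date is the sort key.
by_date = build_index(items, "name", "date")

ana_by_receipt = [item["receipt"] for item in table["Ana"]]
ana_by_date = [item["receipt"] for item in by_date["Ana"]]
```

Both structures hold the same items; only the order inside each name differs, which is the trade of extra storage for less sorting work at read time.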
Local and global secondary indexes
Your secondary index can be local or global.
🥕 Local carrots at the supermarket come from the same partition (the same region) of the country that you’re in. You and the carrot have a partition in common.
Similarly, a table and its local secondary index have a partition key in common. In the receipt example, the secondary index is local because it also uses names as the partition key.
🌎 A global secondary index, on the other hand, is sorted in a completely different way. Think global.
🇨🇳 When I first spent a couple of months in China, I was amazed at how differently life was organized. It still had all the basic building blocks, but the key values were different. The family seemed to be the organizing principle, instead of the individual. I started to realize just how self-centered I was used to being…
🤪 Ok, that’s a gross over-generalization about a vast and diverse country. But after thinking of this analogy, I never forgot that a global secondary index can have a different sort key and a different partition key than the underlying table. Here’s a fun AWS fact about that:
“For maximum query flexibility, you can create up to 20 global secondary indexes (default limit) and up to 5 local secondary indexes per table.”
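Here is a rough sketch of the local versus global distinction, again as plain Python dictionaries shaped like the index definitions in a CreateTable request. The index names, the `store` attribute, and the helper functions are assumptions for illustration; a real global secondary index on a provisioned table would also carry its own throughput settings.

```python
# Key schema of the underlying (hypothetical) receipts table.
table_key_schema = [
    {"AttributeName": "name", "KeyType": "HASH"},
    {"AttributeName": "receipt", "KeyType": "RANGE"},
]

# Local secondary index: same partition key as the table, different sort key.
local_index = {
    "IndexName": "NameByDate",  # assumed name
    "KeySchema": [
        {"AttributeName": "name", "KeyType": "HASH"},
        {"AttributeName": "date", "KeyType": "RANGE"},
    ],
    "Projection": {"ProjectionType": "ALL"},
}

# Global secondary index: partition key and sort key can both differ.
global_index = {
    "IndexName": "StoreByDate",  # assumed name
    "KeySchema": [
        {"AttributeName": "store", "KeyType": "HASH"},  # "store" is made up
        {"AttributeName": "date", "KeyType": "RANGE"},
    ],
    "Projection": {"ProjectionType": "ALL"},
}


def partition_key(key_schema):
    """Return the name of the HASH (partition) attribute in a key schema."""
    return next(k["AttributeName"] for k in key_schema if k["KeyType"] == "HASH")


def can_be_local(table_key_schema, index):
    """An index can only be local if it shares the table's partition key."""
    return partition_key(index["KeySchema"]) == partition_key(table_key_schema)
```

The check mirrors the carrot rule: `local_index` shares the partition `name` with the table, while `global_index` does not.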
Provisioning read and write capacity units or autoscaling
Speaking of silly examples, imagine you have 5 billion users. Each of them searches your database 10 times per day, and you want to make sure your database doesn’t crash.
🥧 Now imagine you subscribe to a weekly pie delivery. You live alone, so you eat one-per-week. But six months later, you get a roommate, and the two of you increase your subscription to a two-per-week plan.
🥧🥧 You just provisioned an eat-unit-capacity of two pies-per-week! You’re well on your way to eating billions of pies!
👩‍🍳 👨‍🍳 You also like baking pies. You’re busy, though, so you only bake one per month. Then your blueberry pies become a hit at the office.
🥧🥧🥧🥧🥧 So you make time in your schedule to bake five per week. You just provisioned a bake-unit-capacity of five pies-per-week!
You now have an eat-unit-capacity of 2 and a bake-unit-capacity of 5.
✖️⌛ Similarly, you can provision read and write capacity units. This takes some multiplication skills and monitoring time. However, you can also let Auto Scaling for DynamoDB manage that for you.
Autoscaling your DynamoDB table is as if your pumpkin pie subscription service let you eat 1 pie one week, 5 pies the next, and 2 pies the third week, without penalties.
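In table terms, the pie numbers could look something like the sketch below. The first dictionary follows the shape of the `ProvisionedThroughput` setting on a table, and the second follows the shape of an Application Auto Scaling scalable target for DynamoDB. The table name `Pies` and the minimum and maximum values are assumptions, nothing here calls AWS, and real autoscaling also needs a scaling policy that is not shown.

```python
# Fixed provisioning: eat-unit-capacity 2 (reads), bake-unit-capacity 5 (writes).
provisioned_throughput = {
    "ReadCapacityUnits": 2,
    "WriteCapacityUnits": 5,
}

# Autoscaling sketch: let read capacity float between 1 and 5 units.
# "Pies" is a hypothetical table name; the bounds are picked to match
# the 1, 5, and 2 pies-per-week example.
read_scaling_target = {
    "ServiceNamespace": "dynamodb",
    "ResourceId": "table/Pies",
    "ScalableDimension": "dynamodb:table:ReadCapacityUnits",
    "MinCapacity": 1,
    "MaxCapacity": 5,
}

# The three weeks from the pie example all fit inside the bounds.
weekly_reads = [1, 5, 2]
all_within_bounds = all(
    read_scaling_target["MinCapacity"] <= n <= read_scaling_target["MaxCapacity"]
    for n in weekly_reads
)
```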
Eventually and strongly consistent reads
🚚 The pie delivery service tries to keep up with the latest trends in pie making, so it offers two options. Option one: always get your pie delivered on time, even if the pie is so last week. Option two: wait a couple of extra days now and then for the pie company to catch up to the pie trend of the week and bake that trendy Oreo cream pie.
These options are like eventually consistent reads or strongly consistent reads. Eventually consistent is like the pie always getting delivered right when you expect it. It’s fast, but it might have stale data from a second ago.
Strongly consistent is like waiting for the pie company to catch up with the Oreo cream pie trend. It might take an extra second, but it’s always going to reflect the latest changes to the database. The pie will be strongly consistent with the trends.
💸📈 Staying up with the trend takes more effort and is more expensive. It’s usually only recommended for intensely regulated and data-driven organizations like the stock market.
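In the DynamoDB API, the choice between the two is a single flag on the read request. The sketch below builds request parameters shaped like a GetItem call, where `ConsistentRead` defaults to false (eventually consistent). The `build_get_item_request` helper, the table name, and the key values are hypothetical, and no request is actually sent.

```python
def build_get_item_request(table_name, key, strongly_consistent=False):
    """Build GetItem-style parameters; reads are eventually consistent by default."""
    return {
        "TableName": table_name,
        "Key": key,
        "ConsistentRead": strongly_consistent,
    }


# Hypothetical key for the receipts example, in the low-level attribute format.
key = {"name": {"S": "Ana"}, "receipt": {"S": "R-001"}}

# On-time pie: fast, might be slightly stale.
eventual_request = build_get_item_request("Receipts", key)
# Trendy pie: always reflects the latest write.
strong_request = build_get_item_request("Receipts", key, strongly_consistent=True)
```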
The math of reading and writing
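💰 AWS treats a write capacity unit (WCU) and a read capacity unit (RCU) kind of like coins. One WCU covers a write of up to 1 KB, while one RCU covers a strongly consistent read of up to 4 KB, or two eventually consistent reads of that size. So per kilobyte, a write takes four times as many units as a strongly consistent read, and eight times as many as an eventually consistent read. Writes are more expensive than reads, and strongly consistent reads are more expensive than eventually consistent reads.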
AWS pricing as of March 19, 2019:
DynamoDB charges one WCU for each write per second (up to 1 KB)… For reads, DynamoDB charges one RCU for each strongly consistent read per second…and one-half of an RCU for each eventually consistent read per second (up to 4 KB).
🌡️☁️ For example, if you need your weather monitoring device to write 2 KB of data to your database every second, you need to provision two WCUs.
📺 In another example, if you know that a TV transcription database will have at most 5 KB of eventually consistent data read from it every second, then you need to provision one RCU. That’s one half of an RCU for the first 4 KB and another half of an RCU for the remaining 1 KB. (So in a way…you’re wasting 75% of one-half RCU…)
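The two examples above can be checked with a little Python. This is a sketch of the arithmetic only, under the assumption that each read or write touches a single item of the given size; the function names are made up.

```python
import math


def write_capacity_units(item_size_kb, writes_per_second):
    """One WCU per write per second, for each 1 KB (rounded up) of the item."""
    return math.ceil(item_size_kb) * writes_per_second


def read_capacity_units(item_size_kb, reads_per_second, strongly_consistent=False):
    """One RCU per strongly consistent read per second for each 4 KB
    (rounded up) of the item; eventually consistent reads cost half."""
    blocks = math.ceil(item_size_kb / 4)
    units_per_read = blocks if strongly_consistent else blocks / 2
    # You can only provision whole units, so round the total up.
    return math.ceil(units_per_read * reads_per_second)


# Weather device: 2 KB written every second.
weather_wcu = write_capacity_units(2, 1)
# TV transcription: 5 KB read every second, eventually consistent.
tv_rcu = read_capacity_units(5, 1)
# The same read, strongly consistent, for comparison.
tv_rcu_strong = read_capacity_units(5, 1, strongly_consistent=True)
```

Here `weather_wcu` comes out to 2 and `tv_rcu` to 1, matching the examples, while the strongly consistent version of the TV read would need 2 RCUs.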
Diving Deeper
My favorite talk so far about DynamoDB has been the 2018 AWS re:Invent talk Advanced Design Patterns for DynamoDB. Here the principal NoSQL technologist Rick Houlihan describes the difference between NoSQL and SQL as a matter of storage optimized versus compute optimized.
This was just the tip of the DynamoDB iceberg. Stay tuned for more in-depth articles and check out the video Serverless Intro, part six: DynamoDB Config by Serverless Guru founder Ryan Jones.
Resources:
- DynamoDB Guide’s Key Concepts (AWS docs)
- Read consistency (AWS docs)
- Using DynamoDB in Production (Dyspatch blog, an email service)
- DynamoDB Pricing (AWS docs)
- A Critical View of DynamoDB by a DB Competitor (YugaByte DB)