Some notes on DynamoDB 2022 paper

simlevesque | karma 2715 | avg karma 2.38 · 2022-08-05 16:05:33

The link does not work for me.

c4pt0r | karma 170 | avg karma 1.98 · 2022-08-05 16:12:17

https://www.notion.so/Some-notes-on-DynamoDB-2022-72a2a1f225... I also put a copy on notion, in case the link doesn't work.

nighthawk454 | karma 1058 | avg karma 3.44 · 2022-08-05 17:27:50

working fine here

jerryjerryjerry | karma 1755 | avg karma 3.15 · 2022-08-05 17:30:37

Same here.

binwiederhier | karma 2647 | avg karma 7.44 · 2022-08-05 18:51:23

Oh my. The author put his website on a hostname that includes an underscore: http://_.0xffff.me/dynamodb2022.html

While underscores are valid in DNS names per the DNS spec, they are not valid in hostnames in URLs. Firefox honors the HTTP spec and fails the request, but Chrome seems more lenient and displays the page.

To the author: please put your site on a valid hostname.

Edit: Better explanation: https://stackoverflow.com/a/2183140
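
For anyone who wants to check a URL themselves, here is a minimal sketch of the letter-digit-hyphen rule for hostname labels (the convention from RFC 952 and RFC 1123). The function names are my own, and this only checks the syntax; it says nothing about how any particular browser actually behaves.

```python
import re
from urllib.parse import urlsplit

# One hostname label under the letter-digit-hyphen (LDH) rule:
# 1 to 63 characters, alphanumeric at both ends, hyphens allowed inside.
_LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_ldh_hostname(host: str) -> bool:
    """Return True if every dot-separated label follows the LDH rule."""
    host = host.rstrip(".")  # tolerate a trailing root dot
    if not host or len(host) > 253:
        return False
    return all(_LDH_LABEL.fullmatch(label) is not None for label in host.split("."))


def url_host_is_valid(url: str) -> bool:
    """Extract the host part of a URL and check it against the LDH rule."""
    host = urlsplit(url).hostname
    return host is not None and is_ldh_hostname(host)


if __name__ == "__main__":
    # The "_" label fails because an underscore is not a letter, digit, or hyphen.
    print(url_host_is_valid("http://_.0xffff.me/dynamodb2022.html"))
```

The same name can still resolve in DNS, which is why some clients load the page anyway.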

philipkglass | karma 12440 | avg karma 5.17 · 2022-08-05 19:07:11

> Firefox honors the HTTP spec and fails the request, but Chrome seems more lenient and displays the page.

I just successfully opened this link in Firefox 103.

LilBytes | karma 2007 | avg karma 3.25 · 2022-08-06 07:43:23

Doesn't work on the Firefox android app.

binwiederhier | karma 2647 | avg karma 7.44 · 2022-08-06 08:33:29

Yeah that's what I tested with. Firefox (desktop) works. Firefox mobile (Android) does not.

buzer | karma 671 | avg karma 2.81 · 2022-08-05 19:14:41

Works for me on Firefox Nightly. One thing to note is that it's an HTTP-only site, so if you have some kind of extension/setting to force HTTPS it's not going to work.

binwiederhier | karma 2647 | avg karma 7.44 · 2022-08-06 08:33:54

It doesn't work with Firefox mobile (Android).

flakiness | karma 1428 | avg karma 3.36 · 2022-08-05 17:35:49

Related: "The DynamoDB Paper" https://news.ycombinator.com/item?id=32094046

tonyxq | karma 1 | avg karma 0.17 · 2022-08-05 17:38:19

xthrowawayxx | karma 49 | avg karma 1.48 · 2022-08-05 18:51:51

Here's my notes on DynamoDB: How to spend $100k on what would cost $10k with an sql server for a 100x worse service

atwood22 | karma 927 | avg karma 3.22 · 2022-08-05 18:57:06

A master carpenter always blames his tools.

btown | karma 14946 | avg karma 5.75 · 2022-08-05 21:23:46

A master carpenter is knowledgeable enough to be critical of those who aggressively peddle tools to unsuspecting customers who are likely to be unfamiliar with just how dangerous and potentially project-destroying those tools can be when not the right tools for the job.

otterley | karma 9624 | avg karma 2.68 · 2022-08-06 08:47:40

That’s not the saying.

icedchai | karma 8478 | avg karma 1.53 · 2022-08-05 19:08:32

You need to use the right tool for the right job. I know people using DynamoDB for a tiny dataset that would easily fit in sqlite (or any other DB) running on a $20/month VPS. That wouldn't be serverless, of course, so it's a no-go.

cebert | karma 5211 | avg karma 4.82 · 2022-08-05 20:46:47

That same dataset could then be modeled and stored in DynamoDB for even less than that, right?

BreakfastB0b | karma 455 | avg karma 4.84 · 2022-08-05 21:09:11

Yeah, I have no idea what icedchai is talking about. The DynamoDB free tier is super generous: https://aws.amazon.com/dynamodb/pricing/on-demand/ It's going to cost you nothing until you have enough customers to afford to pay for it. Correctly modelling single table design, on the other hand ...

glenngillen | karma 2389 | avg karma 4.48 · 2022-08-05 21:52:59

I use it for lots of stuff like this. The pay-per-use/on demand pricing makes it incredibly cheap even if I get occasional bursts of activity. With much better availability than SQLite running on a single VPS.

icedchai | karma 8478 | avg karma 1.53 · 2022-08-05 22:19:38

1) Latency. 2) Ease of data manipulation.

Using Dynamo for a small data set is overkill. You can manipulate the data way faster on a local server, where it is basically in memory (disk cache), and not have to deal with any modelling issues.

I guess some people like the DynamoDB API? I find it incredibly awkward.

blackoil | karma 3617 | avg karma 2.53 · 2022-08-06 00:17:30

You can be even faster if you store the data in the client. Different use cases, different solutions, though.

slyall | karma 8061 | avg karma 7.51 · 2022-08-05 22:52:04

Not sure what you mean by "tiny dataset", but DynamoDB is great for something with 100 or a few thousand items. Especially if these are only occasionally accessed but need to be shared.

Half the time it'll be in the Free Quota or perhaps $1/month. Certainly cheaper than creating an instance.

icedchai | karma 8478 | avg karma 1.53 · 2022-08-06 06:25:53

I’m basically talking a couple gigabytes of data. Something non-trivial but also doesn’t need a massive distributed DB.

arinlen | karma 1670 | avg karma 2.48 · 2022-08-06 16:18:14

> I’m basically talking a couple gigabytes of data.

You'd be happy to learn that DynamoDB's free tier covers DBs up to 25 GB.

https://aws.amazon.com/dynamodb/

Closi | karma 8474 | avg karma 3.3 · 2022-08-06 01:04:39

> I know people using DynamoDB for a tiny dataset that would easily fit in sqlite (or any other DB) running on a $20/month VPS.

Depending on the use cases, there are plenty of reasons you might want to go down a NoSQL route other than price - schemaless makes it much easier and quicker to hack together new projects for instance (and more fun too!)

arinlen | karma 1670 | avg karma 2.48 · 2022-08-06 16:15:38

> I know people using DynamoDB for a tiny dataset that would easily fit in sqlite (or any other DB) running on a $20/month VPS.

I have to say your comment comes off as very ignorant. If you are an AWS customer, then you either pick one of the database offerings, such as DynamoDB or Amazon RDS, or run your own database on an EC2 instance. Except running your own DB on EC2 can cost around the same as running Amazon RDS, and DynamoDB has a very roomy free tier.

Therefore the piece of info you somehow left out is that DynamoDB is free for "a tiny dataset", and you do not have to manage anything at all with DynamoDB either.

icedchai | karma 8478 | avg karma 1.53 · 2022-08-07 15:16:27

I already know all that. I’ve been using AWS for over 10 years. I’ll just say I prefer the relational model when starting out and leave it there. I’ve had good luck with RDS.

I’ve seen people paint themselves into a corner by screwing up their DDB keys too many times and having to export and reload all their data. If you don’t think ahead about your access patterns this is very easy to do. Nobody thinks ahead with “agile.” You’re better off starting with SQL and migrating things to Dynamo where it makes sense.

sass_muffin | karma 31 | avg karma 3.1 · 2022-08-05 21:23:51

This is the video I recommend to others when working with dynamodb. The video is by Rick Houlihan about dynamodb modeling. In my experience most developers that complain about dynamodb don't fully understand it.

https://www.youtube.com/watch?v=HaEPXoXVf2k

newlisp | karma 45 | avg karma 0.62 · 2022-08-05 21:36:52

And many developers don't fully understand that he's a good salesman and can't see through the BS.

sass_muffin | karma 31 | avg karma 3.1 · 2022-08-05 21:48:44

All technologies have their pros and cons. They have use cases where they make sense and use cases where they don't. The job of an engineer is to decide which tool fits which use case. To dismiss a useful technology as "BS" without any backing data, especially one used by companies all over the world for over a decade, seems a bit disingenuous.

newlisp | karma 45 | avg karma 0.62 · 2022-08-05 22:01:19

> All technologies have their pros and cons. They have use cases where they make sense and use cases where they don't. The job of an engineer is to decide which tool fits which use case.

Exactly. But that's not how he paints it. I have seen him bashing RDBMSs as being a thing of the past and claiming that his promoted way of data modeling and "new" database technology is how companies should start today or be moving to.

SPBS | karma 1480 | avg karma 3.99 · 2022-08-06 01:37:01

DynamoDB can model relational data just fine, if you're okay with setting your query access patterns in stone and never changing them again.

osigurdson | karma 3456 | avg karma 1.68 · 2022-08-05 21:45:52

SQL server seems like an odd choice outside of the enterprise. Suggest running Postgres on Linux.

kumarvvr | karma 6431 | avg karma 3.35 · 2022-08-05 22:10:23

I think it is more of an issue with not being able to effectively model your data to suit the DDB paradigm.

DDB absolutely shines when you have to scale. I mean, have you ever tried setting up a cluster of SQL servers? It's a nightmare.

DDB is breezingly easy, as long as you know how to model your data effectively.

hw | karma 1440 | avg karma 2.96 · 2022-08-05 22:54:51

RDBMS is breezingly easy, as long as you know how to operate your clusters effectively.

fubbyy | karma 9 | avg karma 1.0 · 2022-08-06 04:08:56

I could be wrong, but at truly large scale an RDBMS can’t compete, right? SQL simply can’t horizontally scale in the same way?

fragmede | karma 18795 | avg karma 1.82 · 2022-08-06 05:18:16

Given perfect knowledge of access patterns, I bet you could. Especially since it's basically all reading and not writing. Horizontally scaled with many, many read-only replicas. But then there are lies, damn lies, and benchmarks. All the big companies running huge Oracle db installations are busy running their workloads, which probably don't look like Amazon Prime day traffic.

It's also impossible to have perfect knowledge of access patterns.

oceanplexian | karma 5045 | avg karma 3.29 · 2022-08-06 06:32:10

Sure it can, and I’ve operated MySQL (Percona) at large scale for a social media company. You shard requests by user or something else, doesn’t matter if you have 50 DBs or 50,000. However in most cases you have to write the sharding mechanism yourself, and understand your workload and what such a system can and cannot do.
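
To make "shard requests by user" concrete, here is a minimal sketch of hash-based routing. It is not the mechanism we actually ran; the shard count, the DSN naming scheme, and the function names are all hypothetical.

```python
import hashlib

# Hypothetical shard list: 50 databases with made-up DSNs.
SHARD_DSNS = [f"mysql://db-{i:03d}.internal.example/app" for i in range(50)]


def shard_for_user(user_id: str, num_shards: int) -> int:
    """Map a user ID to a shard index using a stable hash."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    # hashlib gives the same digest in every process, unlike the built-in
    # hash(), which is randomized per interpreter run for strings.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def dsn_for_user(user_id: str) -> str:
    """Pick the database that holds all rows for this user."""
    return SHARD_DSNS[shard_for_user(user_id, len(SHARD_DSNS))]
```

Plain modulo routing like this moves most users to a different shard when the shard count changes, so a real system usually adds a lookup table or consistent hashing on top. Queries that span users also have to fan out to every shard, which is part of understanding what such a system can and cannot do.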

dalyons | karma 1221 | avg karma 2.31 · 2022-08-06 14:14:58

even if you know how to do it, running large sharded rdbms clusters is incredibly far from "breezingly easy"

shepherdjerred | karma 4030 | avg karma 2.34 · 2022-08-05 23:16:50

> In 2021, during the 66-hour Amazon Prime Day shopping event, Amazon systems - including Alexa, the Amazon.com sites, and Amazon fulfillment centers, made trillions of API calls to DynamoDB, peaking at 89.2 million requests per second, while experiencing high availability with single-digit millisecond performance.

Yeah, good luck beating DDB on that one.

Demiurge | karma 1728 | avg karma 2.17 · 2022-08-05 23:40:18

Even better luck building an Amazon scale business after spending 100k on dynamodb.

__alexs | karma 2585 | avg karma 3.13 · 2022-08-06 06:09:23

At a previous job I moved a giant Postgres server to Dynamo and it was cheaper, faster and had better resiliency.

Years later some people moved it back to SQL and made it cost 2x as much...

Bad engineering is possible with all technologies.

boruto | karma 469 | avg karma 1.94 · 2022-08-06 06:40:18

I was part of a project where we moved user transaction lists from pg to dynamo.

While there are pros like ease of scale and all, the biggest was being able to tell product and higher-ups that the out-of-place feature with group-bys was simply not possible, thereby ending the whole discussion.

gime_tree_fiddy | karma 67 | avg karma 1.97 · 2022-08-05 20:48:00

> because it doesn't need to, Shared Nothing systems look similar, and the authors know exactly who the readers of this paper are ^_^)

I didn't understand this. Who is the author referring to, and what is he implying?

c4pt0r | karma 170 | avg karma 1.98 · 2022-08-05 21:15:06

I think a lot of shared-nothing systems look similar at a high level: connection handling, storage nodes (with small shards), and metadata (routing).

shepherdjerred | karma 4030 | avg karma 2.34 · 2022-08-05 23:17:39

> In 2021, during the 66-hour Amazon Prime Day shopping event, Amazon systems - including Alexa, the Amazon.com sites, and Amazon fulfillment centers, made trillions of API calls to DynamoDB, peaking at 89.2 million requests per second, while experiencing high availability with single-digit millisecond performance

The scale that DDB operates at is mind-boggling. Where would someone even start when designing a system that can handle nearly 100 million requests per second?

ZephyrBlu | karma 7568 | avg karma 3.32 · 2022-08-05 23:30:38

Definitely insane scale, but I wonder how much of that is horizontal scaling.

yazaddaruvala | karma 3171 | avg karma 2.36 · 2022-08-06 02:34:50

Almost all of it.

“During Prime day” implies this is all read dominated traffic. The RCUs are all going to be provisioned so there are appropriate replicas pre-created in the correct regions.

Disclaimer: Used to work at Amazon.

jjtheblunt | karma 4559 | avg karma 1.2 · 2022-08-06 01:55:21

a multicore machine? perhaps realized across physical cpus and os instances, or tons of memory with a high core count cpu?

Gh0stRAT | karma 604 | avg karma 4.38 · 2022-08-06 05:07:17

Yeah, it totally blew my mind when I first heard these performance numbers as well. RE "nearly 100 million requests per second": as of Prime Day 2022, it has actually handled >100 Mreq/sec!

> Over the course of Prime Day, these sources made trillions of calls to the DynamoDB API. DynamoDB maintained high availability while delivering single-digit millisecond responses and peaking at 105.2 million requests per second. [0]

[0] https://aws.amazon.com/blogs/aws/amazon-prime-day-2022-aws-f...

potamic | karma 1750 | avg karma 3.49 · 2022-08-06 05:17:34

You can easily hit 100k rps on a typical NoSQL cluster. Scaling it to 100 million is just a matter of running a few thousand instances. Of course, operating a system with thousands of nodes is an engineering feat, but from a design perspective it's not super complicated.
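
The back-of-the-envelope arithmetic, as a rough sketch. It assumes throughput scales linearly with cluster count, which is an idealization, and the 100k rps per cluster figure is just the ballpark above, not a measurement.

```python
def clusters_needed(target_rps: int, rps_per_cluster: int) -> int:
    """Clusters required to serve target_rps, assuming perfectly linear scaling."""
    if target_rps < 0 or rps_per_cluster <= 0:
        raise ValueError("target_rps must be >= 0 and rps_per_cluster must be > 0")
    # Ceiling division without floats: round up any partial cluster.
    return -(-target_rps // rps_per_cluster)


if __name__ == "__main__":
    print(clusters_needed(100_000_000, 100_000))  # the round number used above
    print(clusters_needed(105_200_000, 100_000))  # the 2022 Prime Day peak quoted in this thread
```

With a few nodes per cluster, that works out to the "few thousand instances" mentioned above.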

otterley | karma 9624 | avg karma 2.68 · 2022-08-06 08:45:38

“Just a matter of...”

Anyone who trivializes the complexity of actually operating such a system should be forced to build and operate it themselves, and be held to account if it fails.

arinlen | karma 1670 | avg karma 2.48 · 2022-08-06 16:08:55

> Scaling it to 100 million is just a matter of running a few thousand instances.

Please post the link to any GitLab/GitHub you own where you showcase running "a few thousand instances" of anything at all.

explaingarlic | karma 153 | avg karma 1.47 · 2022-08-06 09:10:04

> Where would someone even start when designing a system that can handle nearly 100 million requests per second?

In the case of DynamoDB, it's just a streak of use-case-appropriate sharding techniques, and a whole lot of scalability elsewhere :P

I would imagine a good chunk of the DynamoDB team had to work on the requirements side of engineering, or at the very least it took a lot of research into the matter of how DynamoDB would be used.

samsquire | karma 2621 | avg karma 2.37 · 2022-08-06 00:14:55

I wrote a toy database that can be queried similar to dynamodb

https://github.com/samsquire/hash-db

It's a trie in front of a HashMap.
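
Roughly what that means, as a standalone sketch rather than the code in the repo: the hash map holds the values, and the trie indexes the keys so that a "begins with" query (like a DynamoDB sort key condition) can find matching keys without scanning everything. The class and method names here are made up for illustration.

```python
_END = ""  # marker for "a key ends here"; cannot collide with a 1-character edge


class PrefixIndexedStore:
    """Toy key-value store: a dict for values plus a character trie over keys."""

    def __init__(self):
        self._items = {}  # key -> value (the hash map)
        self._root = {}   # nested dicts, one level per character (the trie)

    def put(self, key: str, value) -> None:
        self._items[key] = value
        node = self._root
        for ch in key:
            node = node.setdefault(ch, {})
        node[_END] = True

    def get(self, key: str):
        """Exact lookup goes straight to the hash map."""
        return self._items.get(key)

    def begins_with(self, prefix: str):
        """Return (key, value) pairs whose key starts with prefix, sorted by key."""
        node = self._root
        for ch in prefix:
            node = node.get(ch)
            if node is None:
                return []
        results = []
        stack = [(prefix, node)]
        while stack:
            key, current = stack.pop()
            for ch, child in current.items():
                if ch == _END:
                    results.append((key, self._items[key]))
                else:
                    stack.append((key + ch, child))
        return sorted(results, key=lambda pair: pair[0])
```

For example, after putting "user#1#order#1" and "user#1#order#2", calling begins_with("user#1#") returns both items, while get() on a full key never touches the trie.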

eatonphil | karma 21581 | avg karma 5.52 · 2022-08-06 08:13:29

Wow this is a really, really cool project!

samsquire | karma 2621 | avg karma 2.37 · 2022-08-06 16:38:42

Wow you are so kind.

Thank you! I tried to keep the code simple and small. I'm trying to do the most basic thing that will work.

I want to add document storage and unify the storage mechanism so the database can be multimodel like OrientDB and ArangoDB. So graphs should be stored the same way as documents and SQL.

Currently the graph data model is separate from the SQL data model so you cannot query a graph as SQL or vice versa.
