In theory it could be written to store each chunk as a separate value inside of IndexedDB, because when seeding you should only need one chunk at a time...
Hey @manigandham, thanks for the compliments on the database overall =)
I understand conceptually that this is all about splitting data, but if you look at most scalable databases that use sharding, it's really meant as a partitioning of the primary keyspace over servers, and you then map this sharding globally through client libraries, a transparent proxy, or a map that every node maintains, because O(map) = O(# servers). Examples: Cassandra, DynamoDB, scale-out memcached, Vitess, ZippyDB/RocksDB, etc.
We are instead tracking per-chunk state in catalogs to give us this level of flexibility, and allowing the movement/migration of individual chunks on a much finer-grained basis. This is both for placement/management across the cluster but also for data management on single nodes, e.g., for data retention policies, tiering, lazy indexing, etc.
I realize this isn’t a hard-and-fast rule, and exceptions always exist. But one reason we try to call this out is we’re often asked why we don’t just use a standard hash-based partitioning tool/system as a black box, which wouldn’t give us this level of fine-grained visibility & control that we find highly useful for time-series data management.
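A minimal sketch of the distinction being drawn here, with made-up node and catalog names (not the actual catalog schema): pure hash partitioning derives placement from the key alone, while a per-chunk catalog records placement explicitly, which is what makes per-chunk migration, retention, and tiering straightforward.

```python
import hashlib

NODES = ["node-0", "node-1", "node-2"]

def black_box_placement(key: str) -> str:
    """Pure hash partitioning: placement is a function of the key alone,
    so there is nothing to consult or update per chunk."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return NODES[h % len(NODES)]

# Catalog-based placement: each chunk has an explicit entry we can read and
# rewrite, so moving or tiering a single chunk is a metadata change plus a copy.
chunk_catalog = {
    "chunk_17": {"node": "node-2", "time_range": ("2015-06-01", "2015-06-08"), "tier": "ssd"},
}

def catalog_placement(chunk_id: str) -> str:
    return chunk_catalog[chunk_id]["node"]

def migrate_chunk(chunk_id: str, new_node: str) -> None:
    # Moving one chunk is a catalog update (plus the data copy),
    # not a global rehash of the keyspace.
    chunk_catalog[chunk_id]["node"] = new_node
```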
Would this just be like using RocksDB or something like that with extensions to auto-shard and balance between instances? That's what it looks like to me, but I could be misunderstanding it.
I didn't mean "wrong" in the full sense of the word. What I actually mean is that sharding based on user id will most probably not give or guarantee a predictable distribution of the writes generated by users (basically there can be no guarantee that chunk #1 of users will always generate approximately the same amount of data as chunk #2 of users).
1. The hash would be an extra column that could be calculated from existing data anyway: wasted storage
2. You effectively have to rewrite the entire database onto itself to redistribute (sketched below), and keeping the DB available during this process is _very_ complicated
3. You're putting an extreme load on the DB for a substantial amount of time. This takes away from your DB performance and makes node downtime even more severe
In a distributed DB, you have to remember that the probability your node _doesn't_ have the data you need increases with the size of the cluster, which creates a negative feedback loop for having to rewrite.
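A rough illustration of point 2, assuming plain modulo hashing on that extra hash column: adding even one shard reassigns almost every key, so "rebalancing" really does mean rewriting most of the database.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    # Simple modulo placement, the scheme being criticized above.
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % num_shards

keys = [f"user:{i}" for i in range(100_000)]
moved = sum(1 for k in keys if shard_for(k, 16) != shard_for(k, 17))
print(f"{moved / len(keys):.0%} of keys change shard going from 16 to 17 shards")
# Typically ~94% of keys move; consistent hashing or an explicit placement map
# is how systems avoid this wholesale rewrite.
```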
True, but it's O(n^2) complexity instead of O(n) complexity (1000 shards would lead to 1M reads instead of 4k)
If they're not persisted at the receiving side, how does it handle a crash before committing on the receiving node? Keeping previous versions indefinitely on sending nodes to permit requesting the old values doesn't work, so this introduces a time bound on how long a crashed node has to recover (granted, this could still mean weeks).
> You can set an index template to be used on new indices that match a pattern, which is a very common thing to do
It is, but how can you tell in your template you want to keep shard sizes under 50GB? You can't.
The best thing you can do (as they did) is update the template based on historical data, so that the new index will have shards that (hopefully) stay under 50GB.
Oh fuck me, I didn't even realize they used a single instance (node).
To expand a little bit, the whole point of using multiple shards per index in an ES cluster is so that the shards spread across multiple nodes (servers) and distribute the load (disk i/o) and handle redundancy. ES automatically scales and reshuffles its shards across multiple nodes in the cluster to handle fault-tolerance as well. If one or more nodes go down, the cluster still has all of the data through replica shards etc...
Either way, in this particular case the data is so small that having 5 shards per index with 50k indices results in 250k shards for 5 GB of data.
5 GB / 250k shards = 20 KB per shard.
You have shards of ~20 KB each... total cluster misconfiguration.
From the email thread, it sounds like the decision to shard on UID was made mostly to increase locality of data, so that you didn't have to query more than one node to get a single user's data.
There's no silver bullet here. Hashing on insertion order would basically guarantee that writes favor one node over another, while random hashes would force you to aggregate results from all available nodes for each query.
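A small sketch of that trade-off, with illustrative names: hashing on user id means a per-user query touches one node, while row-level/random placement forces a scatter-gather across all of them.

```python
import hashlib

NUM_NODES = 8

def node_for_user(user_id: int) -> int:
    # Deterministic placement by user id: all of a user's rows land together.
    return int(hashlib.md5(str(user_id).encode()).hexdigest(), 16) % NUM_NODES

def nodes_for_user_query(user_id: int, shard_by_user: bool) -> list[int]:
    if shard_by_user:
        return [node_for_user(user_id)]   # locality: one node to ask
    return list(range(NUM_NODES))         # scatter-gather: ask every node
```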
Random distribution seems like it would be better for write performance - it means that inserting multiple new values and rebuilding the index can happen in parallel rather than having to coordinate the auto increments. It also naturally has the right properties for sharding as and when you get to the point of needing that.
On the read side it shouldn't make any difference, as there's no reason the rows you want to access at any given point should have any correlation to when those rows were created.
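A small sketch of the write-side point, assuming UUIDv4 primary keys (my example, not the parent's): random ids need no central counter to coordinate, and they already distribute uniformly, so a later modulo split yields balanced shards.

```python
import uuid

def new_row_id() -> uuid.UUID:
    return uuid.uuid4()              # no coordination between concurrent writers

def shard_of(row_id: uuid.UUID, num_shards: int) -> int:
    return row_id.int % num_shards   # roughly uniform across shards
```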
Good point. I guess it would be cool if you could choose the sharding method per-table. I haven't put too much thought into this, but off the top of my head I can imagine a number of different methods suitable for different situations.
There's a 1:1 mapping between the user's hash modulo the number of shards, and the table they're writing to. So, if we had 1000 logical shards, we have 1000 schemas/tablespaces, with the same tables in each. And the database's own 'nextval()' feature makes sure we never have the same ID twice. Hope that clarified things.
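For concreteness, a hedged sketch of how I read that scheme; the schema, table, and sequence names below are invented, not the poster's actual layout.

```python
import hashlib

NUM_LOGICAL_SHARDS = 1000

def logical_shard(user_id: int) -> int:
    # Hash the user id and take it modulo the number of logical shards.
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % NUM_LOGICAL_SHARDS

def insert_sql(user_id: int) -> str:
    shard = logical_shard(user_id)
    # Same table layout in every schema; nextval() on that schema's own
    # sequence guarantees ids are never handed out twice within the shard.
    return (
        f"INSERT INTO shard_{shard:04d}.events (id, user_id, payload) "
        f"VALUES (nextval('shard_{shard:04d}.events_id_seq'), {user_id}, $1)"
    )
```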
I guess you are talking about the plan to have the data of one shard only on one disk (see https://github.com/elastic/elasticsearch/issues/9498)? This does not necessarily mean that you will end up having only one shard per datapath - only if you have just one shard per node. But you are right, the change might lead to unbalanced disk usage in some scenarios, where increasing the number of shards would solve the problem.
1) is recommended, since it allows for throttling at import time (see https://crate.io/docs/en/latest/best_practice/data_import.ht...) and also does not require renaming a table, which is currently not implemented but is on our backlog. However I think once ES 2.0 is out we will have table renames and also throttling in insert-by-query, so option 2) will be recommended then.
Our general recommendation for the fixed-number-of-shards limitation is to choose a higher number of shards upfront (the number of expected cores fits most use cases) or to use partitioned tables (https://crate.io/docs/en/latest/sql/partitioned_tables.html) where possible, since those allow changing the number of shards for future partitions.
Can you do hashtable sharding? Like have a hashtable point to another hashtable by using something appended to the key? Just a thought on a procrastination-filled Friday afternoon.
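One way to read that idea, as a throwaway sketch: a first-level hash on the part appended to the key picks a bucket, and each bucket is itself a hashtable (or shard) keyed by the full key.

```python
import hashlib

FIRST_LEVEL = 16

def level1(key: str) -> int:
    prefix = key.split(":", 1)[0]   # e.g. the part appended to / prefixed on the key
    return int(hashlib.md5(prefix.encode()).hexdigest(), 16) % FIRST_LEVEL

directory = [dict() for _ in range(FIRST_LEVEL)]   # each entry is its own hashtable

def put(key: str, value) -> None:
    directory[level1(key)][key] = value

def get(key: str):
    return directory[level1(key)].get(key)
```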
Honestly I never found a case where this happens; data always falls into one shard according to the key.
Then comes the concept of shard replicas, where a shard can live on several nodes to provide redundancy.
However I'm not sure how this is usually set up on Postgres.