
Can you please elaborate a little more on how you replaced it with Postgres? It is strange that a single Postgres box on a far less powerful instance type would perform the same as your Cassandra cluster. It seems that the first solution was either heavily over-engineered or built for different requirements.



Comparing PostgreSQL/Oracle to Cassandra is really an apples-to-oranges comparison. They offer very different ways to solve some similar problems (getting your data out of storage) and some very different ones (easily adding/removing servers from a cluster, distributing a cluster across data centers). Some things are easy to do with PostgreSQL or Oracle, some are not, some are very expensive, and some are next to impossible.

Cassandra is a completely different beast. It has none of PostgreSQL's features, and vice versa.

I definitely could have been more clear on that. Cassandra has so many great properties, but we made the decision to use Postgres for the large dataset in question shortly after 0.7 was released, and it took a while for Cassandra to become more stable.

Oh, it was a total mistake. Fortunately, it wasn't mine. But I did have to support it and migrate away from it.

Cassandra sessions are quite heavy, and we have a large farm that spins up, does stuff, and closes down. So that's the first problem. (Yes, we used Kafka to pipeline the data in, and that worked well, but...)

The workload _used_ to be very write-heavy. But as time went on, we needed to read more and more things concurrently.

Because it's "distributed" and basically a glorified token ring system, throughput drops dramatically as load increases.
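For anyone unfamiliar with the "token ring" reference, here is a minimal sketch of how a hash ring can map keys to nodes. It is only an illustration under simplifying assumptions: the `TokenRing` class, the node names, and the use of MD5 are all hypothetical (Cassandra's default partitioner uses Murmur3, and real clusters usually assign many virtual nodes per host). It does not model the throughput behaviour described above.

```python
import bisect
import hashlib


def token(key: str) -> int:
    # Stand-in hash for illustration only (assumption: MD5 instead of Murmur3).
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class TokenRing:
    """Hypothetical toy ring: one token per node, no virtual nodes."""

    def __init__(self, nodes):
        # Sort nodes by their token so the ring can be searched with bisect.
        self._ring = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, key: str, rf: int):
        # Start at the first node whose token is >= the key's token,
        # then walk clockwise (wrapping around) to pick rf distinct nodes.
        start = bisect.bisect_left(self._tokens, token(key))
        n = len(self._ring)
        return [self._ring[(start + k) % n][1] for k in range(min(rf, n))]

    def owner(self, key: str) -> str:
        # The primary owner is simply the first replica.
        return self.replicas(key, 1)[0]


ring = TokenRing(["node-a", "node-b", "node-c"])
primary = ring.owner("sensor-42")
copies = ring.replicas("sensor-42", 2)
```

Every read or write for a key has to be routed to whichever nodes the ring picks, which is why client sessions need to know the cluster topology.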

We are not inserting that much data, just lots and lots of records. We then do a geospatial query later on to pull that data back. PostGIS is far better at handling this than the DataStax graph layer + Solr (i.e., the full DataStax "stack").

But honestly, we could have coped with that if the backups had worked. That, and shipping code with a 4-year-old CVE that could have been easily remedied if they'd bothered to do an automated scan.

Every point release involved 1-5 days of hard work from me. Considering the support cost was more than my wage, that stung quite a lot.


Not sure they built it, but I remember reading they replaced Cassandra with HBase(?).

Yeah, I've been following that work for quite a while, it's really impressive! I'm a big fan of Cassandra in general (in fact, I wrote an example application to help teach beginners about it: https://github.com/ericflo/twissandra).

The decision to go with PostgreSQL over Cassandra (or another distributed system like Riak or HBase) was simply because it gave us the most flexibility to change our product quickly while we operate at low scale. And if I'm honest with myself, we're operating at very low scale right now.

In the future as we scale up, Cassandra's distributed counters will be one of the first places we look.


And they did it because of maintenance problems not (just) performance (from the link you provided). But they also said "Postgres was picked over Cassandra for the key-value store because Cassandra didn’t exist at the time. Plus Postgres is very fast and now natively supports KV." Which isn't patronizing like the article OP linked.

http://highscalability.com/blog/2013/8/26/reddit-lessons-lea...


Think my company burned a year on a solution using Cassandra. I think the problem was 1/3 Cassandra and 2/3 the people who would pick Cassandra and Java over PostgreSQL and C# or Go.

One of my favourite tech overkills I've seen in my career is a Cassandra database used to store a couple million records. Yup. Needless to say, it was later converted to PostgreSQL. The guy who did it is still advertising the fact on his online CV.

Didn't see that before. Still a poor implementation compared to what's available. And I mentioned Instaclustr, so if you're fine with them offering Cassandra then AWS is no different.

I had heard you were using MySQL as your database earlier. What made you switch to Cassandra? Just curious.

Thanks, I appreciate the info, it is an area of computing that I just have not had time to get into. I have used Cassandra before, but never taken the time to look at the other offerings.

I mean, for a start Cassandra wasn't available for years after that date, but either way, whatever Postgres was like back then Cassandra was much more of a piece of shit. Smells like implementing what the cool kids were doing to me.

Cassandra is used for some things, Postgres for the rest. Unless they went full Cassandra recently, which is possible. But for many years we ran both.

Thanks, wasn't aware of that one. After a cursory look, it sounds like a better Cassandra?

OK I'll say it... Your friend chose wrong. HORRIBLY wrong. I have a lot of experience with both Postgres and Cassandra and the fact that your friend chose Cassandra despite their data model being relational AND having high consistency requirements shows that they chose Cassandra for the wrong reason(s).

Before you come back saying something about scalability: I've run both databases at way above average scale. What you choose should come down to what you need, which is not what was done in your friend's case.


We use instance store for Postgres and Cassandra now on i3 and i2 respectively.

Can anyone explain to me why someone would want to use Cassandra when not handling internet-scale stuff?

The team I'm in uses it, but after asking many times why it was chosen, since it seems a poor fit for our use cases compared to a relational DB, the only justification I was given is that the cluster is easier for our ops guys to maintain.


Just curious... At what scale have you run Cassandra? We're running a cluster containing around 50 TB of data and we see massive issues around compaction and repair. While I agree Cassandra is impressive, it definitely comes with a lot of management overhead.