What is it with Spencer Kimball and naming things that gets people so upset? It's not like other company or product names are that good; we're just used to them.
Some high profile tech companies:
- Google: some propellerhead big-number joke (hey, I have a PhD and I don't know offhand how big a googolplex is...)
- Alphabet: Really? That out of ideas?
- Amazon: Some hot, snake- and insect-infested jungle? Why should I go there?
- Microsoft: At least it gives a hint what the company does, but really... (cue the penis jokes)
- Yahoo: WTF, some slang term I've never heard of before...
- Apple: Mmmm, are they organic and locally produced? Oh, they sell computers and phones? WTF?! (Yes, I've heard the backstory about Alan Turing and the poisoned apple which I guess puts me in a very small minority)
There are bad names, and then there are repulsive names.
All the examples you mention fall under "bad name", and they're not even objectively bad; I actually think they're great names, so it's subjective. And NONE of them are repulsive.
Then again, if you insist cockroaches are lovable creatures I have nothing more to say.
> All the examples you mention fall under "bad name", and they're not even objectively bad; I actually think they're great names, so it's subjective. And NONE of them are repulsive.
My argument was not that they are good or bad, but rather that we've come to associate positive things with the companies in question, and then we post-hoc come up with explanations for why the names are good, etc.
> Then again, if you insist cockroaches are lovable creatures I have nothing more to say.
I don't think they are lovable, no. But they are an evolutionary success story; they've been around for hundreds of millions of years, long before humans. And they'll be here after we humans have driven ourselves extinct in some nuclear holocaust, massive environmental disaster, or your favorite apocalyptic scenario.
And if you manage to squish one, there are hordes of 'em left; just like I'd like my DB to be, so actually I think it's a very good name! :)
There's no post-hoc for some of those names. Some of them were picked because they actually were good. Even something as bland as 'Microsoft' fit right in with the culture that spawned it. And the rest were picked because they were simple, neutral, and had the potential to be iconic brands.
Cockroach is not something someone picks because it is good. That's a name you pick to make a statement that your name doesn't 'technically' matter beyond the fact that it is memorable and associative.
While a little over the top... I do partly agree. In addition to GIMP, git bugs me a little bit; not enough to ever consider not using it, but the term is more or less offensive depending on the culture you're in.
I think it's one of the worst names I've ever heard, both because of the bug and because of the first syllable. That's not going to stop me from using it if I have to, but I'm certainly less interested in trying it out. And I would be embarrassed to put "Cockroach Expert" on my resume.
Disclaimer: I've come up with my fair share of bad names. SCM Breeze [1] makes me cringe now.
I'm sorry it comes across that way, but I'm not really "offended" by it. I just think it's a gross name and it gives me a bad feeling. It's not something I can really control.
Everything under "The Future" really excites me, especially the geo-partitioning features. That is something that I'm really looking forward to be using!
I really like the fact that the CockroachDB team recently did a detailed Jepsen test with Aphyr. The follow-up articles from both CockroachDB and Aphyr explaining the findings are very interesting to read. For those who might be interested -
Apparently Google used GPS/atomic clocks to keep time synced:
>> To alleviate the problems of large ε, Google's TrueTime (TT) employs GPS/atomic clocks to achieve tight-synchronization (ε=6ms), however the cost of adding the required support infrastructure can be prohibitive and ε=6ms is still a non-negligible time.
And CockroachDB created more of a hybrid version that works on commodity hardware.
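The hybrid idea can be sketched as a toy hybrid logical clock: pair a physical timestamp with a logical counter so events stay causally ordered even when wall clocks disagree. This is only a rough illustration of the technique (from the Kulkarni et al. HLC paper lineage), not CockroachDB's actual implementation, and the class/method names are made up:

```python
import time

class HybridLogicalClock:
    """Toy hybrid logical clock: a (wall time, logical counter) pair."""

    def __init__(self):
        self.wall = 0      # largest physical time observed so far (ns)
        self.logical = 0   # counter that breaks ties when wall time stalls

    def now(self):
        """Timestamp a local event; strictly increases on every call."""
        pt = time.time_ns()
        if pt > self.wall:
            self.wall, self.logical = pt, 0
        else:
            self.logical += 1
        return (self.wall, self.logical)

    def update(self, remote):
        """Merge a timestamp received from another node, preserving causality."""
        rw, rl = remote
        pt = time.time_ns()
        if pt > self.wall and pt > rw:
            # local physical clock is ahead of everything: reset the counter
            self.wall, self.logical = pt, 0
        elif rw > self.wall:
            # remote clock is ahead: adopt it and step past its counter
            self.wall, self.logical = rw, rl + 1
        elif self.wall > rw:
            self.logical += 1
        else:
            self.logical = max(self.logical, rl) + 1
        return (self.wall, self.logical)
```

The point is that no timestamp ever goes backwards, and a message from a node with a fast clock drags the receiver's clock forward instead of producing out-of-order history.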
Distributed systems programming sounds endlessly challenging as you are always balancing trade-offs.
Hey guys, I'm a fellow developer of distributed systems here.
First of all I think what you are doing is great.
My question is: what's the point of clocks at all? The current time is a very subjective matter, as I'm sure you know; the only real time is the point at which the cluster receives the request to commit. Anything else should be considered hearsay.
Specifically the time source of any client is totally meaningless since as you say further in the discussion that client machine times can be off by huge margins.
If you accept that, then one has to accept that individual machines within the cluster itself are prone to drift too, although I appreciate that one can attempt to correct for that.
Wouldn't you think, though, that what matters more is an order based on the bucketed time of arrival (with respect to the cluster)?
I don't see how given network delays anyone can be totally sure A is prior to B, atomic clocks or not.
What is important is first to commit.
[edit] Yes would love to talk privately about this topic @irfansharif
Hmm, I'm not sure I completely understand your question or your source of confusion here. Unless I'm grossly misunderstanding what you're stating, I think we might be conflating a couple of different subjects. I'm happy to discuss this further over e-mail (up on my profile now) to clear up any doubts on the matter (to the best of my limited knowledge).
When a single system is receiving messages, you pick an observed order of events that meets some definition of fairness, and you stick with it all the way through a transaction. By pretending A happens before B (even if you're not entirely sure) you can return a self-consistent result. And once you have that you can simplify a lot of engineering and make a lot of optimizations, so that the requests aren't just reliable but also timely.
Throw three more observers in, and how do you make sure that all of them observe the requests arriving in the same order? Not even the hardware can guarantee that packets arrive at 4 places in the same order, even if the hardware is arranged in a symmetrical fashion (which takes half the fun out of a clustered solution).
> Specifically the time source of any client is totally meaningless since as you say further in the discussion that client machine times can be off by huge margins.
Distributed systems like Cockroach shouldn't use the client's conception of current time for anything at all, except possibly to store it (_verbatim_, don't interpret it) and relay it back to the client or to other clients (and let the client interpret it however they want).
Why not simply have the cluster sync a time between themselves? The first node in the cluster gets the time, and as new nodes come online they set their own internal time via the cluster. That way, in a world with no NTP or atomic clocks, the system could continue to operate.
This doesn't account for clocks on different systems running at different rates, or for clocks that jump, especially on VMs and cloud instances, where this happens all the time.
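Back-of-the-envelope, you can see why a single initial sync isn't enough. Assuming a typical quartz oscillator error of around ±50 ppm (a ballpark figure, not a measured one), two nodes synced perfectly at startup can drift several seconds apart within a day:

```python
def drift_after(seconds, rate_ppm):
    """Offset accumulated by a clock running rate_ppm parts-per-million fast."""
    return seconds * rate_ppm / 1_000_000.0

# Two nodes, one running +50 ppm fast and the other -50 ppm slow:
# after one day (86,400 s) their mutual divergence is ~8.64 seconds,
# far beyond any sane max-offset threshold.
divergence = drift_after(86_400, 50) + drift_after(86_400, 50)
```

And that's before counting VM pauses and migrations, which add step changes on top of the steady drift.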
I don't really get why you would build a distributed database with dependency on wall time (unless you're Google and can stick atomic clock HW on every node). Why not use vector clocks? Am I missing something?
The section on lock-free distributed transactions in our design document[1] should answer your question, specifically the sub-section on hybrid logical clocks.
It may be a nitpick, but Google doesn't stick atomic clocks, or even just GPS clocks, into every node; just into every data center. The difference means it's actually perfectly feasible for very many other companies running DCs, or just machines in colos, to do the same. The big news was how they used that fact (that times are synchronized with an upper limit on how far the clocks in two nodes will diverge) as a very significant optimization in Spanner, one of their distributed databases.
Building a distributed database that can optionally benefit from the same optimization actually makes a great deal of sense. Your average hobbyist won't care, but spending a few extra kilobucks on hardware in a DC to get big throughput improvements out of your database system is a steal.
Distributed systems really are endlessly challenging, as you are always balancing trade-offs.
The CAP theorem still holds, so we pick which 2 out of 3 to be strengths and where to compromise as little as possible. It's a guaranteed 87.3% effective hair loss formula. I find Quiet Riot helps.
> When a node exceeds the clock offset threshold, it will automatically shut down to prevent anomalies.
If you're planning to run on VMware, be prepared to handle rather dramatic system clock shifts. I've seen shifts of up to 5 minutes during heavy backup windows. Not all customers might be willing to have their nodes go down due to system clock / NTP issues.
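The shutdown behavior quoted above amounts to something like the following sketch. Everything here is hypothetical (names, the peer-measurement scheme, the 500ms figure used as a stand-in threshold); the real check lives inside CockroachDB's clock monitoring, not in user code:

```python
MAX_OFFSET_MS = 500  # hypothetical threshold standing in for the configured max offset

def should_shut_down(peer_offsets_ms, max_offset_ms=MAX_OFFSET_MS):
    """A node compares its clock against each peer; if a majority of the
    measured offsets exceed the threshold, it exits rather than risk
    serving reads/writes that could violate consistency."""
    over = sum(1 for o in peer_offsets_ms if abs(o) > max_offset_ms)
    return over > len(peer_offsets_ms) // 2
```

So a single noisy peer measurement (one offset over threshold out of three) wouldn't take a node down, but a genuine local clock jump, which shows up against most peers at once, would.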
Yep, we've also had our share of troubles with noisy clocks in cloud environments, so that's something we're very aware of. Further down the road, we're considering a "clockless" mode, which of course isn't clockless, but depends less on the offset threshold: https://github.com/cockroachdb/cockroach/issues/14093
That said, even today, configuring a cluster with a fairly high maximum clock offset is feasible for many workloads.
The latter. NTP only checks and corrects clock offsets every so often. If the "hardware"[1] clock undergoes offset shifts at random times because of VM pauses, this won't get fixed until the next NTP sync.
This gets exacerbated in cloud settings where VMs get moved between physical machines or racks, since now it's not just the pause; it's that the clock is pointing to a new hardware time source.
[1] in quotes since it's viewed as a single piece of hardware to the software inside the VM.
Cassandra user here, in AWS. Clock drift is a big problem on VMs. NTP is not aggressive enough in these environments to keep clocks relatively in sync; we regularly had several hundred milliseconds of drift between nodes. As Cassandra is extremely clock-sensitive, this is a big problem. We ended up using chrony with very aggressive settings to keep things in the sub-ms range for the most part. But it's still possible to get "hiccups" where time will skip, especially if you reboot a VM.
Vanilla ntpd makes assumptions about the hardware clock (that drift is stable) that don't apply to virtualized clocks. Using the tsc clocksource may help as well.
* Search for fwenable-ntpd (https://www.v-front.de/2012/01/howto-use-esxi-5-as-ntp-serve...) and download the .vib (do a security audit on it - it's a zip file, I think - to ensure it is what you think it is). Install the .vib, which simply adds an ntp daemon option to the firewall ports. This works on v6.5.
* Run ntpd on Linux VMs, pointed at the hosts with the local clock fudge as a fallback
* For Windows VMs in a domain, set the AD DC with PDC emulator role to sync its clock to the host via the VM guest tools, leave the rest alone
* On your monitoring system make sure that it has an independent list of five sources and use plugins like ntp-peer for ntpds and ntp-time for Windows (Nagios/Icinga etc)
With the above recipe, ntpq -p <host> shows offsets less than 1 ms across the board for ntpds after stabilising.
I don't suppose anyone knows how to make a Windows NTP server permit queries? Googling does not seem to reveal anything insightful. I know how to do this for ntpd but am stuck with dealing with a Windows NTP server right now.
There are also a lot of RPC endpoints used for the admin UI that can be queried to get more fine-grained info. However, they're primarily for internal use and might change in the future.
What advantages do I have using Cockroach compared to Postgres, Cassandra, Rethink or MongoDB? (I know that all of them are completely different, that's part of the question)
So performance is complicated. Right now, we’re performance testing CockroachDB regularly, and everything is out in the open. Everything we do is tracked with a GitHub issue with the “perf:” prefix, if you want to follow along.
Blog posts (well, many) are in the works outlining our performance benchmarking. The situation on the ground is changing fast: our performance has improved rapidly over the past months, and each time we sit down to write a blog post, it quickly gets obsoleted. So, trust that we will have a blog post talking about performance very soon.
Anecdotally, our customers are not finding performance to be a bottleneck. I encourage you to set up a Cockroach cluster and try the various load generators (we've got the standard ones and a couple of homegrown ones in the repository).
From the linked website: "CockroachDB provides scale without sacrificing SQL functionality. It offers fully-distributed ACID transactions, zero-downtime schema changes, and support for secondary indexes and foreign keys". Significantly, a great deal of CockroachDB's design has been dedicated to surviving adverse network conditions (see the Jepsen references in other posts).
They are targeting MySQL/Postgres users; basically a post-CAP approach to RDBMS. But if you can work with eventual consistency, they are definitely not your first choice.
Yes, if very low-latency (i.e., P99 latency sub-5ms) reads and writes are critical to your application, CockroachDB should not be your first choice. That said, one of the primary motivations for CockroachDB is that most existing systems don't handle eventual consistency well. In our experience, most developers will eventually write code that assumes a consistent database, either accidentally or intentionally, because it works most of the time. Dealing with eventual consistency is hard.
Rather than "if you can work with eventual consistency, you should look elsewhere," the sentiment we're trying to cultivate is "if and only if your performance requirements can't work with strong consistency, then you should look elsewhere."
I support defaulting to consistency the way you posed it. The main reason is that safe-by-default construction has proven more effective for the average programmer over decades. The other approach has caused many disasters.
Meh, this is just PR; nothing is safe-by-default. It's not actually true that people eventually assume strong consistency, because eventual consistency forces a stricter way of thinking about state and time, kind of functional; you just can't escape it. It's strong consistency that lets you get sloppy, while making you forget how not-simple it is. It only exists inside the system, and if you have clients outside of the system, like web browsers, you don't have a two-phase commit protocol on a button click there, so you have to resort to that same functional way of thinking to at least try not to confuse anyone on retry. But that's clearly not happening in the wild; it's just too complex.
I don't think anyone goes back from eventual consistency. It's more appropriate for this asynchronous world, easier and more reliable.
Google disagreed on that last part. Their bright engineers kept screwing up with eventual consistency; it's why they built Spanner in the first place, followed by F1. So did customers of FoundationDB and Cockroach, despite free solutions being available for eventual consistency.
So, I'm not seeing it as so clear-cut in favor of eventual consistency.
Google never did, or bothered to do, much work on eventual consistency; they cannot possibly have much experience with it. CRDTs didn't come from them. And you know very well that customers do not care about any of this.
Their cloud storage was described as eventually consistent for apps needing a lot of performance when I looked into it. A quick Google of the offerings shows pages describing what tradeoffs are available to customers with each option. So, they not only know about it: they implemented it as a product feature. Their internal stores were strongly consistent with high performance, except AdWords on MySQL. That got moved to F1 for strongly consistent high performance. Spanner, which F1 uses, then got offered to cloud customers.
After re-reading the F1 paper, my mistake seems to be thinking they relied on eventually-consistent stuff internally. It appears that was just an option for 3rd party developers in their cloud products. Thanks for the peer review as I found some more stuff double checking. :)
Open source vs. not open source. Cockroach is still in its infancy vs. Spanner. I'm sure there are a variety of differences here, but they mostly aim to solve a similar problem with a slightly different approach.
I'm confused. What's the difference between 'Yes' and 'Optional' in the 'Commercial Version' row on the comparison chart? To me 'Yes' suggests there is only a commercial version, but clearly that's not true for CockroachDB.
[cockroachdb here] Yes! In addition to being highly scalable, CockroachDB also comes with built-in replication. That means that even with a smaller project that hasn't scaled yet, you still get the benefit of a more resilient database.
Also, CockroachDB is super easy to install and get started with!
I've come across many projects that are easy to get started with, but the main stuff to look for is in the details. Although MySQL might be easy to get into, for example, it takes time to learn the intricacies of query optimization and, importantly, what to do when SHTF, like when a table gets corrupted.
My question is, in your opinion, what does it take to become proficient in CockroachDB sufficiently enough to be comfortable using it in a high volume, high-uptime-required environment?
I can't speak for others, but at least for me the main attraction of CockroachDB is getting foolproof HA straight out of the box. That is something I think anyone can appreciate regardless of their dataset size.
Note that I haven't actually run CockroachDB yet, so I can't confirm if it really delivers on that promise, but I'm hopeful.
Cockroaches are highly resilient creatures. The name, I assume, is alluding to the goal of this database being a highly resilient system. What's the problem?
I think the name "Cockroach" was a really poor decision from a marketing standpoint. The team intended to convey durability, since cockroaches can live through anything. But when I think of a cockroach, I think, gross, disgusting, etc.
It's memorable. So if the product is really excellent and is needed by customers - then I think it could be a boon.
I mean Mongo has very bad associations for me in terms of childhood taunts and Blazing Saddles...but now the name really relates more to the product than to the original meaning.
The difference is that "mongo" does not have a universally-known meaning. Cockroaches and known throughout the world, and are disgusting throughout the world.
> I mean Mongo has very bad associations for me in terms of childhood taunts and Blazing Saddles...but now the name really relates more to the product than to the original meaning.
The difference is that the word "mongo" is an issue of the same word having different meanings in different dialects. Whereas, with "cockroach", it's the same intended meaning, but with different connotations.
Agreed. It's a very stupid name, memorable for all the wrong reasons: it detracts from the product, diverting attention to a name that elicits visceral disgust in a lot of people. The conversation then becomes about why the product is called that, instead of about the product's merits.
On second thought I'm not sure. Everyone will call it RoachDB for short anyway, but the full name has more impact. It shocks, which is a good thing. I was so focused on aesthetics that I didn't even consider strategy.
They can always spin off "RoachDB" as an enterprise option if they have any problems selling it due to the name.
Cockroaches are figuratively unkillable; resilient, surviving even in nuclear wastelands. I think the team was going for something like: you put your data in here and it will survive basically anything short of a multi-continent nuclear war.
This isn't the place to find out, but I'm curious as to the relative ratio of people who have this reaction.
I don't, at all - I'm vaguely positive towards the name, but in general don't care what things are called, so long as I can remember it. (Although I still maintain that "Paypal" is the stupidest name ever.)
I know people exist who will avoid things simply because they react negatively to the name. How prevalent is this? This isn't about overall product aesthetics/ergonomics/etc., just the name.
You're going to have a hard time convincing nontechnical management that they need to go with Cockroach instead of Oracle.
It's unfortunate the world works that way, but nevertheless, it works that way.
It could be the best database in the world. They did a real disservice to themselves by naming it after a bug people typically associate with filth, disease, and germs.
It's not at all unfortunate that the world works that way. What's really unfortunate is that the founder of this seemingly great database system has decided not to care about how human psychology works.
Here's a Wikipedia excerpt on cockroaches:
> They feed on human and pet food and can leave an offensive odor.[60] They can passively transport pathogenic microbes on their body surfaces, particularly in environments such as hospitals.[61][62] Cockroaches are linked with allergic reactions in humans.[63][64] One of the proteins that trigger allergic reactions is tropomyosin.[65] These allergens are also linked with asthma.[66] About 60% of asthma patients in Chicago are also sensitive to cockroach allergens. Studies similar to this have been done globally and all the results are similar. Cockroaches can live for a few days up to a month without food, so just because no cockroaches are visible in a home does not mean they are not there. Approximately 20-48% of homes with no visible sign of cockroaches have detectable cockroach allergens in dust.[67]
Human psychology is on their side; people just don't seem to understand it. Which is fine: most people are not marketing experts. FYI, it doesn't actually matter how much you dislike the name. When the time comes to choose between a silly, negative name that is unusual and very memorable because of that, and something boring you have seen just as often, you will trust the silly name more. And since the database choice for most people is a purely dogmatic one, the name gives Cockroach a slight competitive advantage (at the stage they are in).
If you make technology stack decisions based on your feelings rather than what the product actually does, then you shouldn't be employed as a decision-maker.
Feelings become reality. People care about what things are called. You just don't care because it doesn't bother you. But if it was a topic you were sensitive about or something you feel is inappropriate, you would feel otherwise. Everyone has their limits of what is going too far. It's almost as if we live in a society with people from different backgrounds. What this really hits on is subjective relativism, and that's dangerous for an entire society to operate on. Maybe Cockroach isn't that bad, maybe it grosses some people out. Fine, not that big of a deal here. What if it was called "BondageDB"?
My point was that the job of a technology decision-maker is to make decisions on the actual technical merits of various options, the costs and tradeoffs thereof.
If you are in that role, and you permit the name of a vendor to trump the actual merits of the vendor's product, you should never have been trusted with decision-making authority in the first place, and any competitors who don't harbor your particular emotional hangups will get the better of you, and you won't be long for your position anyway.
Cockroach Labs is not selling to the end-consumer. They're selling to people whose job it is to behave like Vulcans. In this particular market, it doesn't matter what the name is.
The point is that it doesn't work. Yes, the name makes the product stand out, but that benefit doesn't compensate for having your product associated with filth and disease.
There's a reason Toyota has never named a car 'The Cockroach' and a soft drink company has never released 'Cockroach Cola'.
But if there were two similar products, and one were named Roach, I would go with the other without much thought. The name is horrible. As long as they are the king of whatever they do, they can call themselves whatever they want, but handing competitors an automatic naming advantage didn't have to be the case.
Not MySQL, but we've tested and recommend the Ruby pg driver and the ActiveRecord ORM[1] (CockroachDB supports the PostgreSQL wire protocol). It should be 'plug and play' insofar as you simply point at any node in the running cluster when setting up ActiveRecord::Base.establish_connection.
As for our backup story, our doc page[2] on the subject should provide more details.
I have ported a MySQL-based ActiveRecord Rails app that was somewhat complicated to Postgres, and then on to CockroachDB. It works pretty well, so I'd give it a go. We're also committed to supporting ActiveRecord via the Postgres connector, so if you run into any bugs, we would do our best to fix them. I am personally invested in ActiveRecord support myself. At this point ORM support on CockroachDB is driven mostly by usage so please try it!
Your other questions are better answered on the blog post, but quickly:
* CockroachDB core comes with a `dump` command to backup your databases. CockroachDB Enterprise has blazingly fast _incremental_ cloud backup and restore, the kind that you might want for a very large deployment.
* Replication is managed under the hood by sharding the data into many ranges, each 64 MiB in size. Each range is replicated using Raft, and if a node goes down, the other replicas scattered across the cluster seamlessly take over and up-replicate a new replica to "heal" the cluster.
* The horizontal scaling is indeed plug and play - just add more nodes to the cluster and they'll automatically rebalance replicas across the cluster with no downtime and no additional configuration.
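The range-based sharding described above can be pictured as a sorted lookup table from key-space boundaries to replica sets. This is an illustrative toy only; the names and replica placements are made up, and the real system stores range boundaries in metadata ranges rather than a flat in-memory list:

```python
import bisect

class RangeMap:
    """Toy model of range-based sharding: the key space is split into
    contiguous ranges, each replicated on a set of nodes."""

    def __init__(self, split_keys, replicas_per_range):
        # split_keys: sorted start keys of every range after the first;
        # replicas_per_range: one list of node IDs per range.
        self.split_keys = sorted(split_keys)
        self.replicas = replicas_per_range

    def lookup(self, key):
        """Return the replica set responsible for `key`."""
        idx = bisect.bisect_right(self.split_keys, key)
        return self.replicas[idx]

# Three ranges covering [min, "g"), ["g", "p"), ["p", max),
# spread over four hypothetical nodes with replication factor 3:
rm = RangeMap(split_keys=["g", "p"],
              replicas_per_range=[[1, 2, 3], [2, 3, 4], [1, 3, 4]])
```

With this shape, "add a node" just means the rebalancer starts assigning some ranges a replica on the new node; no keys need to be rehashed, which is the property that makes scaling plug and play.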
Instead of just downvoting, how about refuting my claim?
I'm seriously curious what the disagreement is. These guys have already established that atomic clocks are unnecessary. I'm very interested in which use cases require them.
Serializability is all about ensuring a single consistent ordering of events. There are lots of algorithmic shortcuts you can take if all your nodes' clocks are precisely in sync.
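One concrete example of such a shortcut is Spanner-style "commit wait": given a known bound on clock uncertainty, a node assigns a commit timestamp and then simply waits out that bound before acknowledging, so that timestamp order is guaranteed to match real-time order. A minimal sketch (the function names are mine, and the real protocol does this inside the transaction coordinator):

```python
import time

def commit_with_wait(assign_timestamp, max_clock_uncertainty_s):
    """Assign a commit timestamp, then wait until every node's clock is
    guaranteed to have passed it before acknowledging the commit. The
    tighter the uncertainty bound (e.g. TrueTime's ~6ms), the cheaper
    this wait is -- which is exactly why the atomic clocks pay off."""
    ts = assign_timestamp()
    time.sleep(max_clock_uncertainty_s)  # wait out the uncertainty window
    return ts  # now safe to acknowledge
```

The cost of the shortcut is latency proportional to the uncertainty bound, which is why a 6ms bound buys you something a 500ms bound does not.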
I'm very familiar with the literature since I'm a distributed database developer.
If you investigate high-frequency trading you will understand that the quantum phenomena I'm talking about are not just me high on mushrooms but a real-world thing.
The only "time" relevant is the time when the cluster agrees an atomic, isolated transaction is time to commit from its own perspective.
Am I wrong in remembering that the HN guidelines used to say that you should not downvote someone's comment simply because you disagreed with it?
I went looking, and I don't see that in the current guidelines. I could be wrong about it being there before, but I was almost certain that it was at one point.
Seems like it used to say that you should only downvote comments that you think don't contribute anything of value to the conversation.
Just curious, because it seems to me that for quite a while now there have been a lot of comments that appear to get downvoted just because people don't agree with what the person said (and often there are no responses to counter, the person just gets downvoted).
I think you're thinking of somewhere else. The up/down votes are a way of agreeing or disagreeing without cluttering up the comments with a bunch of "me too"s or "nuh-uh"s.
I'm certain that I'm not thinking of somewhere else. I'm completely open to the possibility that I just remember it wrong, but I'm sure that it was HN that I was thinking of, and not another site.
Nope, I know I'm not thinking of Reddit (as I said in reply to another comment, I've spent almost no time on Reddit, and would not have seen their guidelines at all).
Nope. I've never spent much time on Reddit, and I'm certain I've never seen that page before. Perhaps I am thinking of comments other people made on HN in the past (who thought the policy was as I described).
There's never been such a policy on HN; you remembered it wrong, as have many before you. It's the same phenomenon that attributes pithy quotes to Einstein and makes Canadians think we have Miranda rights: people hang memories on the nearest pre-existing hook in the brain.
I don't have a full history of the guidelines, but the canonical link on this tends to be [0]. About nine years ago, PG thought downvote to disagree was perfectly reasonable. I don't think there's been any official change since then.
That said, I think many on HN do think that downvotes should be reserved for uncivil or unsubstantive comments as they don't contribute to the conversation. Some will still downvote for disagreement or for other reasons.
I think it's best not to let it bother you or worry about it, because there's not much you can do about it, other than to contribute as civilly, substantively, charitably, and in as good faith as you can.
Actually, WalMart cares, and so does T-Mobile. You probably care too, if you stop and think for a bit...
The concern here isn't just order of transactions, but also synchronization. For instance, WalMart might charge you twice for a transaction if it appears to have happened at different times when it arrives in different data centers.
Also, the comment "The higher frequency the transactions the more you get into quantum physics." isn't relevant here. This is more in the realm of relativity than quantum physics. Even so, we aren't currently at a point where we need to worry about transactions happening at relativistic speeds.
Ah, I think I see where you are confused: your arguments make more sense when dealing with a single, local database. The idea here is that you want to achieve atomicity, but you need to do it across multiple distributed databases, and you want a system whose components have exactly the same time in order to ensure consistency across each database.
Attempting to extend the landlord example... let's say that I'm your landlord and you have to pay me £1000 each month. You send the bank a message telling them to pay me the money. The bank may make several copies of that message and keep them around for its own reasons. Now, let's say there are employees at that bank whose job it is to go through all copies of all messages and make sure what they say is done. If they find a message from several months ago saying "transfer £1000 from you to me this month" and are somehow oblivious to which month it is, they may transfer an additional one thousand pounds even if it's already happened. It's not an exact analogy, but...
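In practice, the usual defense against that duplicate-copy problem is to deduplicate on a request identifier (an idempotency key): however many copies of the message the bank processes, the transfer is applied at most once. A minimal sketch of the idea, with hypothetical names:

```python
def make_ledger():
    processed = set()  # idempotency keys already applied

    def transfer(request_id, amount, apply_fn):
        """Apply a transfer at most once per request_id, no matter how
        many copies of the same message arrive."""
        if request_id in processed:
            return False  # duplicate copy; ignore it
        processed.add(request_id)
        apply_fn(amount)
        return True

    return transfer

balance = [0]
credit = lambda amt: balance.__setitem__(0, balance[0] + amt)

transfer = make_ledger()
transfer("2017-05-rent", 1000, credit)  # the real message
transfer("2017-05-rent", 1000, credit)  # a stale copy of it
# balance[0] is 1000, not 2000: the duplicate was ignored
```

Of course this only pushes the consistency problem onto the `processed` set itself, which in a distributed bank has to be replicated consistently; which is rather the point of the whole thread.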
First, this is awesome! Congrats to the team for reaching this milestone.
Secondly, I think the name is memorable and conveys exactly what it should. If I were ever on an engineering team that chose not to use CockroachDB due to being "grossed out" by the name, I wouldn't be on that engineering team for long. Perhaps someone can explain the knee-jerk reaction to it for me.
I had previously been a big supporter of their name, agreeing with some other posters that it promotes the durability of the system.
However, after a move last year, I was forced to live with cockroaches for approximately 6 months, after never encountering them prior to that.
Since then, I've completely switched camps. I can't see the name without being skeeved out. The reality of cockroaches is so absolutely repulsive that it changed my view a full 180°.
I moved out of that place in November, and haven't seen one since; I'm curious whether my aversion will fade over time.
Since there's a little side riff about the name going on I thought I'd throw in my 2 cents. Personally I love the name. I think it does a great job of conveying the spirit of the project and provides unlimited pun opportunities. Plus it's memorable, just like a real life roach encounter. Unfortunately I'm sure some people will discriminate against your DB on the basis of name alone. That's ludicrous, but that's our species for ya.
Choosing technologies based on first-hand review and first principles rather than things like Gartner magic quadrants, big company brand recognition, feature lists, and "serious" sounding names is a competitive advantage that startups often have over big businesses. The latter are forced by their procurement departments and other forces to use old, inferior, and more costly technology.
On the flip side though if I were in charge of CockroachDB I would look at doing something about the name. Maybe rename it something like "Resilient" as part of the "exit from beta" milestone. It's going to be a serious liability for them selling to the kinds of customers I described above, and unfortunately that's where most of the money is in these devops/infrastructure markets. The key to success is to make a superior product and then figure out how to sell it to pointy haired bosses. The latter often means making it look more boring than it actually is.
Fun factoid: scientists sometimes do this with grant proposals. I've had two scientists independently tell me that they often take cool, fascinating research proposals and "make them boring" to sell them to bureaucrats. "You have to hide all the interesting stuff and make it sound like you are doing boring incremental research. If you talk about anything 'revolutionary' you will never get funded."
I think the problem is worse: marketing/business people have convinced the world that this surface-level analysis is all we can expect of anyone. As other commenters have said: if the name of the DB solution influences your choice, then you're probably going to get what you deserve.
(Within reason. Someone on here actually said this argument is reasonable to have "because what would you do if they named it 'n-word'DB." Seriously.)
To me, cockroaches are such an unbelievably negative association that I don't think I could get over the name and work with this product, because I wouldn't want to be saying cockroach all the time.
To me, cockroaches aren't disgusting. And yes, I have used an outhouse in a 3rd world country where cockroaches were swarming up and out... But they just don't disgust me.
> I don't think I could get over the name and work with this product
That gives everyone competing with you a HUGE advantage. Making technical decisions based on the name of a product is the worst kind of decision making.
As the creator of a moderately popular open source project, I can attest that the name of the project is very important.
A common problem for open source projects is that the name is either not recognizable enough (e.g. too technical) or too generic (e.g. a simple English word, which makes it hard to search on Google).
In this case the name evokes negative emotions of fear and disgust which are not what you want to associate with a database.
Back in 2000, I used to enjoy an online streaming radio station called echo.com, and as a sort of reward for listening, you could earn Amazon gift certificates.
I tried googling for "Amazon echo gift certificates" but I couldn't quite find what I was looking for.
It's a bad name because this topic will come up every time it's discussed, forever. It's a distraction from other relevant issues like new features or how it performs.
Strangely, it seems to be helping them. Usually whenever there's an excellent product/article featured on HN, there's not much to say, so there are very few comments. CockroachDB seems like an excellent product, yet the firestorm about their name is fueling discussion, which amusingly might be leading to more upvotes from people who dislike that they're being discriminated against based on their name. It's counterintuitive internet behavior at its finest, similar to everyone complaining that Soylent was a terrible name.
I hope it is excellent and advances the state of the art, but it won't reach its full potential until it has a name people can use when talking to users, customers, and board members.
"Well first we collect all of the data in the Epidemic schema, run it through the Apocalypse pipeline to transform it into something that our Extinction servers can handle, and finally store it in CockroachDB."
Are there published benchmarks for multi-key operations and more complex SELECT statements? I apologize if I missed them.
I'm trying to determine whether there's a place for Cockroach within what I think are the constraints in the database space.
* Traditional SQL Databases
- Go to solution for every project until proven otherwise.
- Battle tested and unmatched features.
- Hugely optimized with incredible single node performance.
- Good replication and failover solutions.
* Cassandra
- Solved massive data insert and retention.
- Battle tested linear scalability to thousands of nodes.
- Good per node performance.
- Limited features.
It seems like many new databases tend to suffer from providing scale out but relatively poor per node performance so that a mid-size cluster still performs worse than a single node solution based on a traditional SQL database.
And if you genuinely need huge insert volumes, because of the per node performance you'd need an enormous cluster whereas Cassandra would deal with it quite comfortably.
[Cockroach Labs engineer here working on performance benchmarking]
We have load generators for YCSB (just raw key-value ops in a firehose) and TPC-H (very complicated read-only queries) running right now, and we're about to start running TPC-C queries (moderately complex queries in large volume) as well. You can follow along on our progress here: https://github.com/cockroachdb/loadgen
In the context of your dichotomy, we want to bridge that gap. We want the linear scalability of your second group along with the full feature-set of the first group.
We will be publishing our performance numbers, but we haven't so far because the product has been improving rapidly and our numbers quickly become obsolete. Rest assured, we will be publishing a series of blog posts very soon. Anecdotally, our beta customers are not finding that they need many more CockroachDB nodes than their existing database solutions, even with something as high-performance (but inconsistent) as Cassandra.
In a couple of years, I suspect they will rebrand to just "RoachDB". It conveys the same meaning while being less awkward to discuss with users/clients.
About nine months ago we made the decision to go with RethinkDB for our infrastructure in place of PostgreSQL (at least for live replicated data), but if this existed at the time we'd have seriously taken a look. We're pretty happy with RethinkDB but I plan on still taking a look at this so we have a backup option.
[cockroachdb here] We are big fans of RethinkDB, but also glad to hear that you'll explore CockroachDB. Let us know how it goes, and definitely file any issues / feature requests in our GitHub repo!
Just out of curiosity, do you mind elaborating a little bit on why not? It strikes me as something that would be very easy to implement in a database, is there a reason why so few databases have a mechanism to do this?
If it's about maintaining an open connection in order to notify the client, that part makes sense, but at the very least the changefeed itself should be toggleable and easy to query in any DB.
One of the challenges for us in implementing something like LISTEN/NOTIFY comes from our distributed nature: since a table is likely broken up across many nodes, you somehow need to aggregate changes from all of them back into a single change feed wherever the listener is, and in such a way that it doesn't create a single point of failure.
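To make that fan-in problem concrete, here's a toy Go sketch (my own illustration, not CockroachDB's actual design) that merges per-node change streams into a single feed. The single `out` channel is exactly the aggregation point described above: every change from every node must flow through it, which is why doing this without creating a bottleneck or single point of failure is hard.

```go
package main

import (
	"fmt"
	"sync"
)

// mergeFeeds fans in per-node change streams into one feed. This is a
// toy sketch of the aggregation problem, not CockroachDB's design: note
// that "out" is a single point every change must pass through.
func mergeFeeds(nodes ...<-chan string) <-chan string {
	out := make(chan string)
	var wg sync.WaitGroup
	for _, n := range nodes {
		wg.Add(1)
		go func(c <-chan string) {
			defer wg.Done()
			for change := range c {
				out <- change
			}
		}(n)
	}
	// Close the merged feed once every per-node stream is drained.
	go func() { wg.Wait(); close(out) }()
	return out
}

func main() {
	// Hypothetical per-node change streams.
	a, b := make(chan string), make(chan string)
	go func() { a <- "node1: UPDATE k1=v1"; close(a) }()
	go func() { b <- "node2: DELETE k2"; close(b) }()
	for change := range mergeFeeds(a, b) {
		fmt.Println(change)
	}
}
```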
It probably scales but how is the performance? If I need to load a couple billion rows and do a dozen joins in some analytics, is that one machine, a dozen, or 100?
Is it more for web apps, analytics, or what? When would I consider switching from e.g. Postgres to CockroachDB?
For just a couple billion rows and a dozen joins, a single node will suffice (with the caveat that you really want at least 3 nodes because CockroachDB is built for replication and fault-tolerance and you're not getting that with a single node cluster), but you'll get linear speedup as you add more machines.
Your performance on a single node should be on the same order of magnitude as doing this in Postgres right now. We are rapidly closing that gap, and intend to close it completely for TPC-H style queries, while retaining the linear performance speedup with more nodes.
The reason this gap isn't already closed is that we've been focused on transactional performance in distributed, fault-tolerant situations rather than analytics performance for 1.0. There is a lot of low-hanging optimization fruit in analytics scenarios that we haven't focused on yet and are just getting started on.
Thanks for your response. It sounds like CockroachDB might be an alternative to setting up an RDBMS for read replication once you need many connections.
On the feature FAQ, joins are described as "functional", which doesn't inspire a lot of confidence, but maybe it's just a perception thing. What exactly does "functional" mean?
A SQL DB without joins sounds a lot like just a NoSQL DB with a familiar query dialect.
If you are using Joins in an OLTP setting, everything should work absolutely as you might expect.
"Functional" is our caveat that if you run Joins across your data in an OLAP setting, it will work, but it may not be the most performant Join possible. For example, our query planner does not currently plan Merge-joins even if the appropriate secondary indices exist. So after a point (joining ~billions of rows of data) it no longer is as performant as it could be. Now we expect to roll out this particular fix within 6 months. However, optimizing 4 or 5-way nested Joins in OLAP-cube style settings isn't something we're going to be performant at for years. We need a lot more infrastructure built up before we start solving the kinds of problems revealed by, say, the Join Order Benchmark paper (http://www.vldb.org/pvldb/vol9/p204-leis.pdf).
No. Once your latency goes beyond single-digit seconds, performance will probably collapse. Too many subsystems would time out. In theory it could be made to work (with terrible performance, and extremely long commit waits due to having to wait until the remote planets get back to you), but I wouldn't architect a planet-spanning distributed database this way. We would probably have to go back to the drawing board and start from scratch.
You'd need to give up on consistency, because there is no such thing when the time of communication is long compared to interval of events. In the long run, ACID is dead.
Long answer: at their closest, Earth and Mars are about 54 million km apart; at their farthest, over 400 million km, with an average of around 225 million km. So the theoretical one-way latency varies between roughly 3 and 22 minutes.
CockroachDB uses synchronous replication via raft, and that latency would cause problems as would some other setting like our window sizes and their interaction with timeouts.
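For anyone checking the arithmetic, the one-way delay is just distance divided by the speed of light; a quick sketch (the distances are the commonly cited Earth-Mars extremes):

```go
package main

import "fmt"

func main() {
	const c = 299792.458 // speed of light in km/s
	distances := map[string]float64{
		"closest (~54M km)":   54e6,
		"farthest (~401M km)": 401e6,
	}
	for name, km := range distances {
		// one-way delay in minutes = distance / c / 60
		fmt.Printf("Earth-Mars %s: one-way delay ~%.1f minutes\n", name, km/c/60)
	}
}
```

And a round trip, which is what a synchronous commit would actually wait on, doubles those numbers.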
> CockroachDB uses synchronous replication via raft
Deep space aside, I wish the announcement just said that! I came back to HN for insight into the paragraph about "multi-active availability... an evolution in high availability from active-active replication". Marketing... sometimes... I tell you what.
More practically, I note this from Cockroach's document on "Deploy > Recommended Production Settings":
"When replicating across datacenters, it’s recommended to use datacenters on a single continent to ensure performance (inter-continent scenarios will improve in performance soon). Also, to ensure even replication across datacenters, it’s recommended to specify which datacenter each node is in using the --locality flag. If some of your datacenters are much farther apart than others, specifying multiple levels of locality (such as country and region) is recommended."
In short, IIUC, even _planetary_ deployment doesn't come for free (yet). Perhaps I'm just not well-enough versed yet in how people deal with globally-distributed databases, but I'd love to see the docs dig into this a bit more: practical limits of cluster deployment, recommended strategies and tools (if any) to replicate data between clusters, etc.
It looks like there is still no mechanism for change notification, which in our particular case is the only missing feature that prevents using it as a postgresql replacement.
Does anybody know if this feature is planned in the short or medium term ?
This feature is planned, but I cannot give you a concrete timeline. We want to do this right, and we need other parts in place to do this with high performance, in a transactionally consistent fashion, in the face of high contention, and for arbitrarily complicated "views".
I will say that this is the single feature that I personally am most invested in at the company, so it will happen.
Name doesn't bother me. It's memorable and I'd definitely consider using it, whether in a startup or enterprise. Better than "Postgres" -- how do you even pronounce that?
Pardon the nature of my question, but I'm really interested in what your experience has been so far building a database with Go. Has its runtime (the GC, for example) posed any issues for you so far? Looking at other RDBMSes, languages with manual memory management like C or C++ seem to be the go-to choice, so what were the reasons you chose Go?
I'm quite frankly amazed that Go's runtime is able to support a database with such demanding capabilities as CockroachDB!
More technically, here's a somewhat random set of thoughts on the subject:
The Go GC is performant and predictable, unlike the JVM GC. We do have some very memory-allocation-conscious code patterns to minimize the performance impact of working in a garbage-collected language runtime, but in the end it's not as bad as you might expect if your expectations are coming from the JVM world.
Library support is good. To quote our CEO, "Most of us on the team have done extensive work with C++ and Java in the past. At Google, C++ was the standard for building infrastructure and there are a lot of good reasons for that. It's fast and predictable. It would be a good choice for Cockroach, except that in the world outside of Google, in open source land, the supporting libraries for C++ are either terrible, incredibly heavyweight, or non-existent. We didn't want to rebuild everything which you take for granted at Google from scratch. It turns out that Go has many of the necessary libraries, and they're straightforward and very well written."
Basically, if Google's internal C++ libraries, tooling, style guides (and the tooling to enforce them) were available externally, we might have gone with C++.
Some of us are fans of Rust, but Rust sadly did not exist in a stable state when CockroachDB started. I'm not sure we would pick Rust were we to start today (tooling is still a concern there), but it would certainly be part of the discussion.
The native support for concurrency in Go is a huge plus. We use thousands of goroutines in CockroachDB, and that's been a huge blessing.
I can answer any more specific questions if you have them.
Thank you for the reply! You and the presentation video Ben posted covered pretty much all of my questions, and I'm going to keep an eye on the issue tracker regarding performance to see what interesting things you might run into and how you deal with them in Go!
This is more my personal opinion, and perhaps more revealing my ignorance on the existing equivalent tools in the Rust ecosystem, but here is a list of some of the Go tools we use when developing CockroachDB:
1. gofmt and goimports really help enforce a single uniform style. We don't really care what the style is, as long as it's consistent across our 30 engineers and 200k lines of code. We have hand-rolled more Cockroach-specific linters on top of this as well, but we could do that for Rust too.
2. go tool pprof is a great profiler. Being able to quickly dig into allocations, cpu usage, etc. is great, and we do so regularly. As a result, the overhead of the GC is minimized, since we can rapidly identify and mitigate the allocation overhead with the application of a few known patterns.
Now I don't know what the state of the art of Rust profiling is, but if we were to litigate Rust vs Go starting CockroachDB from scratch today, we'd probably pay close attention to what the answer is here. The Xooglers on this team have a tonne of C++ experience, were very happy with C++ profiling tools, and thought the Go profiler matched up to the best tools they had used previously. If there is a Rust equivalent, this isn't a problem.
3. Consistency of code (in both style, but also patterns used) across third party libraries is a concern. The existence of a single toolchain that enforces a single style in Go really helps keep the whole ecosystem healthy here. Even if tools exist for Rust, if they aren't universally used, that is not as powerful.
I honestly think that Rust would probably be a close contender if we litigated this question today. The TiDB folks use Rust for their KV side, but Go for their query engine, which is an interesting mix. If faced with this decision today, I personally would push for Rust; I'm not a fan of the Go type system's various limitations, which we are running into particularly as we write a more sophisticated query optimizer that has to do more classical programming languages reasoning. But I am one of the most junior engineers on the CockroachDB team, so I'm not sure I would prevail in this fight! :)
Thank you so much for the thorough answer! This is an area we're always working on, so it's helpful to know. Since you're not actively looking, I won't go into all the details, but if you ever are in the future, I'm happy to give you a rundown of the state of the art, whenever that is :)
Why do you think the Go GC is better than any of the JVM options? From what I've seen, while the Go GC is well tuned for low latency, by picking the right JVM GC parameters you can on balance get a better throughput/latency tradeoff. I'm just wondering if you have any reliable benchmarks or evidence to support what you're saying? I don't use either language for work, so I think you might have better information than I do.
I talk about this in the presentation I linked in another subthread (https://www.cockroachlabs.com/community/tech-talks/challenge...). The key to getting good performance out of any GC is to generate as little garbage as possible, and in our experience Go's better use of stack allocation and value types keeps many objects out of the garbage-collected heap. We've found that idiomatic Go programs tend to produce less garbage than similar Java programs, and in the presentation I discuss some tricks we use to get that even lower in critical paths. Admittedly, we're not JVM tuning wizards, so maybe there's more that could have been done on the JVM side.
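The value-types point can be demonstrated directly with `testing.AllocsPerRun`, which works outside test files too. This is a generic Go sketch (not code from CockroachDB): a slice of structs is one heap allocation, while a slice of pointers adds one tracked heap object per element.

```go
package main

import (
	"fmt"
	"testing"
)

type point struct{ x, y float64 }

// Package-level sinks force the results to escape to the heap,
// so the allocation counts below are meaningful.
var (
	sinkVals []point
	sinkPtrs []*point
)

// buildValues allocates a single backing array for 1000 points;
// the GC sees one object regardless of the element count.
func buildValues() {
	sinkVals = make([]point, 1000)
}

// buildPointers allocates the slice plus 1000 individual heap
// objects, each of which the GC must track and scan separately.
func buildPointers() {
	ps := make([]*point, 1000)
	for i := range ps {
		ps[i] = &point{float64(i), float64(i)}
	}
	sinkPtrs = ps
}

func main() {
	valAllocs := testing.AllocsPerRun(100, buildValues)
	ptrAllocs := testing.AllocsPerRun(100, buildPointers)
	fmt.Printf("value slice: %.0f allocs; pointer slice: %.0f allocs\n",
		valAllocs, ptrAllocs)
}
```

The same data laid out as values rather than pointers gives the collector orders of magnitude less work, which is the gist of the allocation-conscious patterns mentioned above.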
We write our own GPU algorithms and a Java Native Interface transpiler (e.g., we generate JNI bindings), as well as our own memory management.
We've found the JVM to be more than suitable. Granted - we wrote our own tooling and had reasons we can't move (those customers are a neat thing most people don't think about :D)
I understand why you guys went with Go, though. Congrats on pushing the limits of the runtime.
As I understand it, Java needs a complicated GC implementation because, by design, it produces a huge number of heap allocations -- lots of very short-lived little objects.
Much of Java's GC focus has been on correctly partitioning the heap so that long-lived objects can be less aggressively collected than short-lived ones. (An example of a challenging long-lived object is the entire set of classes used by a program, all of which need to be available to the runtime for reflection. For many bigger apps, the class hierarchy alone takes up many megabytes of RAM!)
Go can make use of the stack to a much larger degree (structs and arrays can be passed by value), and so it can get by with a much less advanced GC. As a result, Go team's main focus has been on reducing pause times more than anything else.
Overall we've been happy with the choice. The GC is sometimes a performance issue, but it's manageable (and Go gives you better tools to limit the cost of GC than many other garbage-collected languages)
We have started parallelizing our tests with the new subtest feature: leaktest in the top-level test, t.Parallel in the subtests. This means we only check for leaks in between batches of parallel subtests. This works OK for us for now since our slowest "test" is really a huge data-driven test suite, and that's the only place we're currently parallelizing, although it would be better if we could parallelize more of our tests.
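The shape of that pattern, with a hypothetical `checkLeaks` helper standing in for the real leaktest API, looks roughly like this: snapshot the goroutine count before the batch, run the parallel work, and only verify the count after the whole batch has finished.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"
)

// checkLeaks is a minimal stand-in for a leaktest-style helper (not the
// real leaktest API): it snapshots the goroutine count and returns a
// function that verifies the count returned to that baseline.
func checkLeaks() func() error {
	before := runtime.NumGoroutine()
	return func() error {
		// Give exiting goroutines a moment to be reaped.
		for i := 0; i < 50; i++ {
			if runtime.NumGoroutine() <= before {
				return nil
			}
			time.Sleep(10 * time.Millisecond)
		}
		return fmt.Errorf("leaked goroutines: %d -> %d",
			before, runtime.NumGoroutine())
	}
}

func main() {
	done := checkLeaks() // leak check wraps the whole batch
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) { // stands in for a parallel subtest
			defer wg.Done()
			_ = i * i
		}(i)
	}
	wg.Wait() // leaks are only checked once the parallel batch finishes
	if err := done(); err != nil {
		panic(err)
	}
	fmt.Println("no goroutine leaks")
}
```

The tradeoff described above falls out of this structure: a goroutine leaked by any one subtest is only detected at the batch boundary, not attributed to the specific subtest.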
How does Cockroach efficiently handle the shuffle step when data is on many nodes on the cluster and has to move to be joined? Does Cockroach need high capacity network links to function well?
I always see companies making the claim of linear speedup with more nodes, but surely that can't be the case if the nodes are geographically separated over anything less than gigabit links? Perhaps linear speedup with more nodes is only possible over high-speed connections? How high is that, exactly?
Congratulations to the team on the release! Introducing this kind of database is no easy task - thank you and great job, keep up the good work!
The short story is we do need high capacity network links to function well. By "high capacity" I mean at least double digit megabit links between your datacenters.
A query that inherently requires shuffling because the data is geographically distributed can't get past the bandwidth needs of performing the shuffle. At the very least, with the literal simplest query plan, you're going to need all the raw data to be transported to a single node/datacenter, and I doubt there's a query and network setup where that's more efficient than doing networked shuffles themselves.
I don't think you need gigabit networks, but you're certainly going to want at least 10 megabit links. We have not tried to benchmark scenarios where we are bandwidth constrained, so I can't tell you precisely what the minimums are. All the cloud scenarios we've tested (on GCE, Azure, AWS, DigitalOcean) are constrained on other dimensions (i.e. CPU cores, memory, disk IO).
That makes sense - I think part of the reason such databases are well suited to cloud operations is the guaranteed throughput of the cloud providers' own network backbones, which is almost impossible for any single "regular" organization to match, at least for the price. I think we are at a point where doing business without the cloud will become nearly (but not completely) impossible at huge scale with all these features.
Thank you very much for your detailed answer and good luck with the continued rollout!
15 years ago I was working on a similar distributed DB product. At the time, the idea was to send the query execution plan to each node to execute any filtering criteria and trim down the candidate row set. Then compute a Bloom filter on the join keys on the node with the largest candidate set (chosen using some heuristic statistics), and ship the Bloom filter to the other nodes with smaller data sets to greatly reduce the non-matching rows. The rows that survive the Bloom filter are highly likely to be joinable and are shipped back to the main joining node to perform the final join. A Bloom filter is the perfect compromise between size and speed.
I'd imagine CockroachDB is doing something similar for distributed join.
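Here's a minimal Go sketch of that idea, with a toy Bloom filter and hypothetical key sets (not how any particular database implements it): the large side builds a compact filter over its join keys, and the small side uses it to drop rows that can't possibly match before shipping anything across the network.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// bloom is a tiny Bloom filter; a real system would tune the bit count
// and number of hash functions to the expected key cardinality.
type bloom struct{ bits []uint64 }

func newBloom(m int) *bloom { return &bloom{bits: make([]uint64, (m+63)/64)} }

func (b *bloom) hashes(key string) [2]uint64 {
	h := fnv.New64a()
	h.Write([]byte(key))
	h1 := h.Sum64()
	h2 := h1>>33 | h1<<31 // cheap second hash derived from the first
	return [2]uint64{h1, h2}
}

func (b *bloom) add(key string) {
	n := uint64(len(b.bits) * 64)
	for _, h := range b.hashes(key) {
		b.bits[(h%n)/64] |= 1 << ((h % n) % 64)
	}
}

func (b *bloom) mayContain(key string) bool {
	n := uint64(len(b.bits) * 64)
	for _, h := range b.hashes(key) {
		if b.bits[(h%n)/64]&(1<<((h%n)%64)) == 0 {
			return false
		}
	}
	return true
}

func main() {
	// Node A holds the larger candidate set; build a filter over its keys.
	large := []string{"k1", "k2", "k3", "k4"}
	f := newBloom(1024)
	for _, k := range large {
		f.add(k)
	}
	// Ship the compact filter to node B, which prunes rows locally.
	small := []string{"k2", "k9", "k4", "k7"}
	var survivors []string
	for _, k := range small {
		if f.mayContain(k) { // false positives possible, false negatives not
			survivors = append(survivors, k)
		}
	}
	// Only the survivors are shipped back for the exact join.
	fmt.Println(survivors)
}
```

The filter is a few hundred bytes regardless of row count, which is why it's such a good fit for trimming network traffic before a distributed join.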
Haven't come across this idea before; interesting. Will definitely have to give it some more thought. Our 'distributed joins', so to speak, run through our distributed query execution model (DistSQL), setting up incremental 'stages' of computation with the results pipelined and plumbed through individual computations. Viewed through this model, our implementation more closely resembles the Grace Hash Join[1] algorithm. You might be interested in the PR[2] that landed this changeset; there's a cool visualization in one of the comments[3] showing the query execution plan.
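For contrast with the Bloom-filter approach, here's a toy single-process sketch of the Grace hash join idea (hypothetical data, not the DistSQL implementation): partition both inputs by a hash of the join key, then join each bucket pair independently. In a distributed setting, each bucket pair can be handled by a different node.

```go
package main

import "fmt"

// partition hashes each row's key into one of `buckets` partitions.
// Rows with equal keys always land in the same partition, so each
// bucket pair can be joined without looking at any other bucket.
func partition(rows []string, buckets int) [][]string {
	out := make([][]string, buckets)
	for _, r := range rows {
		h := 0
		for _, c := range r {
			h = h*31 + int(c)
		}
		out[h%buckets] = append(out[h%buckets], r)
	}
	return out
}

func main() {
	left := []string{"a", "b", "c", "d"}
	right := []string{"b", "d", "e"}
	const buckets = 3
	lp, rp := partition(left, buckets), partition(right, buckets)
	var matches []string
	for i := 0; i < buckets; i++ { // each bucket pair joins independently
		seen := map[string]bool{}
		for _, r := range lp[i] {
			seen[r] = true
		}
		for _, r := range rp[i] {
			if seen[r] {
				matches = append(matches, r)
			}
		}
	}
	fmt.Println(matches)
}
```

As the reply below notes, the cost of this scheme is that the full key set crosses the network during partitioning, whereas the Bloom-filter approach ships only a compact summary.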
The Grace Hash Join approach ships the entire join key set across the network. Even if each node just gets one partition of it, the aggregate network traffic is the entire set. For a small table, that's fine. A large table is going to really tax the network.
I've heard of pushdown techniques including function, predicate and aggregate pushdown in distributed relational engines before.
Another interesting idea I read about (I can't find it anywhere online) was called "join zippering". Basically you first request the cluster to solve a join by querying and streaming the key columns from a join predicate back into the cluster itself to identify which nodes have matches and then streaming the results from each node in parallel, and doing the join in the stream.
I agree! We have some semblance of pushdown filtering across aggregations, and some other interesting techniques are documented in the RFC[1] that first proposed the distributed execution model.
CockroachDB looks like a great alternative to PostgreSQL, congrats to the team for doing so much in such a short time. The wire protocol is compatible with Postgres, which allows re-using battle-tested Postgres clients. However it's a non-starter for my use case since it lacks array columns, which Postgres supports [0]. I also make use of fairly recent SQL features introduced in Postgres 9.4, but I'm not sure if there are major issues with compatibility.
I'm basically here to ask a similar question: whether this is aimed as a modern alternative to PostgreSQL, since they don't clearly state this in the OP news announcement.
To me, at least for now, it seems more like a SQL-enabled etcd or similar. They aren't currently claiming performance numbers that make it sound suitable for general-purpose relational database scenarios. A SQL-aware, etcd-like thing has a lot of appeal, though, and I assume the performance work is coming.
I'm an engineer on the SQL team at CockroachDB. We're very aware of our missing support for array column types - and in fact beginning to add support for arrays is one of my team's priorities for the next release cycle.
What kind of other recent SQL features introduced in Postgres 9.4 do you use? Postgres has a ton of features, as I'm sure you're aware, and while we strive for wire compatibility with Postgres it's not a goal of ours to implement support for every Postgres feature out there.
I double-checked my codebase and it looks like it's just JSONB, which CockroachDB also doesn't support [0]. Sorry to bother you about missing features, but there really are some things that prevent a smooth transition from Postgres.
Yep, JSONB is on our roadmap as well, although it won't come before array column type support. Thanks for the feedback - I'd personally love to see migrations from PostgreSQL to CockroachDB become seamless for more complex use cases as we continue development.
It occurred to me to migrate Odoo ERP to CockroachDB, scaling up the DB is one of our biggest challenges with some of our clients.
However Odoo leans heavily on Postgres, migration would be a lot of work I imagine. The first snag I've hit with CockroachDB is the lack of 'CREATE SEQUENCE'.
Plus, Odoo uses REPEATABLE READ + a hand-rolled system of locks for consistency, I'm not sure how that would play out with CockroachDB. In my experience some of the performance issues come more from long lived locks in the app than from sheer DB performance.
JSON/JSONB will come after array support. As far as I know, we don't have any concrete plans at this time to support listen/notify or spatial datatypes.
Very disappointed with HN turning into a 4chan/reddit-style trolling board about the name. Guys, we get it that you don't like the name. Can we please stop bikeshedding and move on? The people at CockroachDB have obviously seen all your messages but decided it's worth keeping the name. What more is there to talk about? Why not talk about the relative technical merits of this DB?
It's not bikeshedding when the bikeshed's color will actually have concrete effects on adoption. Most people -- i.e. in procurement, management, finance, and others you need to appeal to -- don't want anything to do with cockroaches. The idea disgusts them at a gut level, not something you can talk away.
HN users are giving vital advice, for free. Those who ignore it will have only themselves to blame.
As I say every time this comes up, would you be so dismissive about critics of naming a product PubesDB? Or GonorrheaDB? Or [n-word]DB? Then you agree that disgust-invoking connotations of the name matter, and we're just haggling over the details.
Ubuntu, Mongo, Swagger (edit: Hadoop also) ... they're weird, sure, but they don't evoke the visceral feeling of disgust that cockroaches do.
> procurement, management, finance, and others you need to appeal to
They don't need to appeal to any of these suits. Just the technical decision-makers, whose express job it is to choose solutions on their technical merits, not their spurious emotional reactions.
The sad fact, though, is that many organizations do not have technical people at the decision-making level.
So these suits you speak of won't be able to get past the product name long enough to hear any technical merits of why this technology should ever be considered, due to dysfunctional leadership not even having a Chief Technology Officer or Chief Information Officer role at the senior leadership level. A lot of organizations outsource because they don't want to hire/pay for this in house. It also shifts responsibility away, giving the CEO, COO, CFO, etc. the ability to point fingers at an outside entity.
That's a double whammy: internal staff can't sell/justify it to management, and outside IT providers/contractors can't sell it either.
So while they may be surviving with the current name they have, that does not mean they wouldn't be crushing the market share with a different name. If they are getting negative comments about the product name, then that's a warning that they should do market research to find out how many people would avoid the product because of the name.
But what the hell do I know, I'm making yet another HN comment post.
"Deragatory" isn't the problem; the issue is whether it invokes visceral feelings of disgust. Many terms can be used as an insult, but are still tolerable as a name because a) they have non-insulting usages, and b) the emotional response does not rise to the level of "visceral disgust".
The Spanish Wikipedia suggests many usages of the term "mongo", which probably wouldn't persist if the term was so repulsive: https://es.wikipedia.org/wiki/Mongo
I think SilasX is not suggesting that it isn't a insult, but rather that a word being an insult is not what really matters as far as naming is concerned. What matters, according to him, is whether the word automatically elicits a strong negative emotional reaction. That a lot of words that elicit such a reaction are used as insults is mostly incidental to the argument.
If calling your product RetardedDB or MentallyChallengedDB in a professional setting is OK by his standards because it's not spelled in English, then I'm OK with that. That's what mongo means in Spanish, btw.
Okay, there are separate issues going on here; let me try to clarify:
Is "mongo" the equivalent of English "retard", in terms of being a low-class insult that invokes a visceral reaction among the majority of the population?
I didn't believe that at first; if so, why didn't anyone ever put it in Wikipedia? English has "retard" (in the pejorative sense):
If it's merely an insult with numerous other meanings, I don't think it's comparable.
But let's assume it is equivalent to "retard". In that case, I would agree that it shouldn't be used as a name. But you have to pick your battles: all words will have that trait in some language. For my part, I would consider the Spanish-speaking market big enough not to expect them to buy [the equivalent of] RetardDB. So I agree there.
Edit: I agree with the sibling commenter networked's points.
And that's my point. You're only offended because it's in English (and that's fair). But no matter what name you use, it will offend someone. CucarachaDB will fly under the radar.
Maybe you have to be culturally immersed to know those things. Mongo, mongol and mongólico are the terms you should research.
First of all, Wikipedia isn't infallible. Second, genkaos and I grew up on different sides of the world. Culturally different, and yet, in our own respective cultures we learned, however wrong it is, the implicit meaning of mongo when used derogatorily. It doesn't have several meanings as you pointed out; it has one, which does not mean it is the same for genkaos. As you pointed out in the Wikipedia link, it refers to a certain type of people. When used as an insult towards a person, whether that person is white, Hispanic, black, whatever, it is implied that that person is that sort of person and a retard. Thus my comment about it being racist as well.
Now, even if I had associated MongoDB with that explanation, and now that I do remember its inherent meaning in a certain context, I take no offense at it, since the people behind MongoDB didn't have that intent. Obviously this is an assumption on my part.
Let us not get derailed from the main point, which is the 'visceral' feeling that CockroachDB elicits in so many people, as you mentioned in several comments. It is true; it happens to me as well. But not from the word itself, only when I'm around one. Those feelings of fear when around one are irrational. I don't remember the explanation for why it's irrational; I've never worked in the field of psychology.
My mother tongue is English, so I never made that association. But I've lived all my life in a Spanish speaking country. Mongo is usually used in the context to mean "retard".
Edit: Now that my memory kicked in, it's racist as well.
> It's not bikeshedding when the bikeshed's color will actually have concrete effects on adoption.
Not taking a stance either way on the name, but that is the definition of bike-shedding (aka law of triviality). A committee won't vote for my nuclear plant because the bike shed is red. The bike shed's color has concrete effects on adoption.
EDIT: I would just like to acknowledge the irony of bike-shedding bike-shedding.
Alright, if you really want to unpack the metaphor:
The bikeshed story is to illustrate overemphasis on something that is trivial. It uses the example of a bikeshed color and a committee wanting to spend a lot of time on it because a) they care a little about it, and b) they understand it well enough for hard-headed members to wade into the dispute rather than trust experts.
It's a failure mode -- by stipulation -- because the bikeshed color doesn't matter beyond minor (but real) aesthetic feelings among the committee, which are far outweighed by the cost of high-level personnel devoting time to it. Had they been aware of the general dynamic of these things, they could entirely prevent the loss by moving on; it's purely an internal matter.
The bikeshed model ceases to demonstrate a failure mode if and when the bikeshed color has impacts far beyond things under the control of the committee. For example, if the majority of the world's people had a near-religious devotion to destroying facilities that house a blue bikeshed, and that fanaticism was hard to defend against, this would be a valid reason not to make the bikeshed blue, and would warrant the committee's attention.
I summarize such situations as "that's not bikeshedding", though of course, to be more technically correct, I should say "that situation does not illustrate the avoidable failure mode in the parable of the bikeshed".
Similarly, if adoption matters for more than just that committee -- if they need to convince numerous other committees to adopt the design -- it's likewise "not bikeshedding" because the first committee doesn't have control over all the other ones; with respect to the first, it's an external matter, and they can't stem the loss just by saying "hey, this is trivial".
Now, you are correct that, at a high enough level, this could work as a bikeshedding example: if you could simultaneously get the entire world to collectively agree on the non-importance of aesthetics in technical matters, and on what counts as technical vs aesthetic, then the world could play the role of that first committee, say "wow, this is trivial", and be done.
But if that were actually feasible, then that should be your product (producing universal agreement on matters where you have a logical proof-of-correctness), not a database!
> ...but that is the definition of bike-shedding (aka law of triviality)
> A committee won't vote for my nuclear plant because the bike shed is red.
> The bike shed's color has concrete effects on adoption.
Not exactly.
> Parkinson observed that a committee whose job is to approve plans for a
> nuclear power plant may spend the majority of its time on relatively
> unimportant but easy-to-grasp issues, such as what materials to use for
> the staff bikeshed, while neglecting the design of the power plant itself,
> which is far more important but also far more difficult to criticize constructively.
> -- https://en.wiktionary.org/wiki/bikeshedding
This part is key here:
> A reactor is so vastly expensive and complicated that an average person cannot
> understand it, so one assumes that those who work on it understand it. On the
> other hand, everyone can visualize a cheap, simple bicycle shed, so planning
> one can result in endless discussions because *everyone involved wants to add a
> touch and show personal contribution*.
> -- https://en.wikipedia.org/wiki/Law_of_triviality
> -- https://books.google.com/books?id=RsMNiobZojIC&pg=PA317
I need some additional hand-holding here if you don't mind, I don't see the difference.
If I were to rephrase those two excerpts:
> Parkinson observed that a committee whose job is to approve plans for a
> [globally distributed relational database] may spend the majority of its time on relatively
> unimportant but easy-to-grasp issues, such as what [the name is],
> while neglecting the design of the [globally distributed relational database] itself,
> which is far more important but also far more difficult to criticize constructively.
> A [globally distributed relational database] is so vastly expensive and complicated that an average person cannot
> understand it, so one assumes that those who work on it understand it. On the
> other hand, everyone can [read a name], so planning
> one can result in endless discussions because *everyone involved wants to add a
> touch and show personal contribution*.
It both provokes a negative reaction and is memorable. It is not clear which wins, and it isn't your job to decide. Yes, you have an opinion, but you may not be right.
I remember in the mid-2000s thinking that a particular politician couldn't possibly succeed with a Muslim sounding name. Turns out that a lot of people thought that. Yet Barack Hussein Obama managed to become President.
Your opinion has definitely been registered. Continuing to state it has no value.
It's not your company. You're (probably) not an equity holder. Have you personally been harmed by the name because your company wouldn't let you adopt it in spite of its technical merits? Are you worried it won't succeed because of the name and thus are fighting on the company's behalf for its survival?
> Most people ... and others you need to appeal to ... don't want anything to do with cockroaches
You're making so much of this up out of thin air.
> giving vital advice, for free
> As I say every time this comes up
As the parent said, the staff have already seen these messages. They have decided to keep the name. Advice is helpful, but once the decision is made, it's not. Let it go.
On the other hand, if I heard of a database called "CockroachDB" gaining ever-greater adoption, I'd pay close attention to it because it was clearly succeeding despite a marketing handicap.
It so far appears not to be hurting them. In the slightest.
This "warning" comes from the HN crowd every time something is posted about CockroachDB. I think it's time to LET IT GO.
I, for one, completely disagree with you, but that's because I have a different understanding of the relationship between the business side and engineering. We are already looked at as eccentric and strange people, and rarely if ever has an absurd technology name caused an issue.
Someone talking about "cockroach" is equivalent to talking about "unicorns" or "git." It's considerably less offensive than talk of "masters" and "slaves." If you think this is such a problem, then work on your salesmanship, as I wouldn't hesitate to talk to other departments or investors about this product.
I was a CTO up until I took medical leave this past October and I cannot stress how important salesmanship is to the role. I think your examples of other databases are hyperbole and not the point. You want them to be equivalent but they aren't. This comes down to what you can sell in your organization and if there is merit to it, then selling it should not be a problem.
One last point is other departments don't give a shit what the database technology is called unless it's something to put on their CV. Just call it the "database" as they most certainly will.
> It so far appears not to be hurting them. In the slightest.
I feel like that is tough to judge, because the public has only known them by one name as far as I know. If they switched to this name from another name and saw no difference, then we could surmise that the name has had no effect.
The end goal of a company is not to raise venture funding. So you cannot use "they raised capital" as proof that their name isn't a problem. Their name absolutely will hurt their adoption. Maybe the product is good enough that they'll still be successful, but if so, you would expect them to be even more successful if they didn't have such an off-putting name.
Did I say it was the end goal? It's merely a metric for a young company. What it means is that enough people have decided that there is a future, and that current revenue, growth, and expectations are being met or are substantial. Raising $53 million isn't easy. So I can say capital raised is a metric on which to base a judgement.
Your statement that it "absolutely will hurt adoption" is unqualified and nothing but opinion. And what exactly is "more successful?"
The handful of people who won't try this because of the name won't matter to their bottom line. If it's good enough then for even a large majority of those they'll end up using it anyway.
You're missing the point. Saying "they raised capital" is not a good counterargument to "it will hurt adoption". Your response would be a good counter to "they will never raise capital" or "no one will use this".
You can't know how many VCs didn't fund due to the name or how many tech decision-makers at companies will pass on this product due to the name. That being said, I doubt it will be/was significant in any case.
Pretty much any reasonable definition will do. For example, higher adoption is one metric that can be used to define success.
> Your statement that it "absolutely will hurt adoption" is unqualified and nothing but opinion.
It's an opinion that a lot of people share, judging from the HN threads I've seen about CockroachDB. And really, I shouldn't need to defend the idea that having a name that disgusts people will hurt adoption. It's just common sense. The only real question is how much damage the name will do. The better the product is, the more people will forgive things like bad names, but there will definitely be at least some level of damage.
In addition, if there are multiple products in the same category that are fairly close in quality, then subjective things like names will matter more. Maybe CockroachDB is significantly better than the alternatives right now (I really have no idea; this product category isn't something I know anything about), but if so, surely it won't remain "significantly better" forever. Other products will catch up, or new products will be created to compete, and we'll end up with several products that are similar, and once again naming will become more important.
And finally, you're completely ignoring the fact that a lot of decisions about tech stack aren't actually made by technical people. They're frequently made by managers rather than engineers. And when the decision is made by non-technical people, marketing (e.g. the name) is very important. Heck, even when the decision is made by engineers, marketing is important, because that's how you convince the engineers to spend the time investigating the product to see if it lives up to its claims or does what they need.
Speaking as an engineer, if tomorrow I suddenly have the need for a cloud-native NewSQL database, I'm probably not even going to look at CockroachDB, simply based on the name, unless someone else convinces me that it's clearly superior. I find the name very off-putting and I'd rather not be confronted with the mental imagery of cockroaches any time I use the product.
It will never be let go, because each new person is a new interaction with the system that prompts the same point again.
It's like those '*porn' subreddits. You can explain and explain till you're blue in the face why the subs are so named, but there will always be some sniggering discussion when they are introduced to new users, no matter how much you try to silence or control for it, because it's based on a natural response.
Capitalize all you like, but that's just how people work. :)
When people are bike shedding, they aren't doing it to waste time, they think that they are adding value because of `list of bad results of wrong colour here`.
So here's my question to you: could you be wrong about this having "concrete effects on adoption"? And if you are wrong, is this just bike shedding?
And to continue the bike shed metaphor, it's about people ignoring nuclear power plant design whose worst case scenario is nuclear meltdown. For CockroachDB 1.0, what's the equivalent, data loss? So are you discussing something technically trivial (colour is easy to understand) over the design (technically complex) that would prevent data loss? If the answer is yes, aren't you bike shedding like a champion?
tl;dr Bike shedders don't know they're bike shedding and think the discussion is very important.
With respect to your specific point: if we could resolve how much it matters, then yes, that would obviate the debate. But the bikeshedding metaphor doesn't add much there, because precisely what's in dispute is how much it matters.
Well, now I feel a bit like an asshole for how abrasive my response was; sorry. Kudos to you for not escalating.
I agree that it resolves to how much it matters, and I guess I disagree with you on how much it matters. How it relates to the bike shedding metaphor is starting to feel like a semantic argument, which is not something I want to continue.
In response to your escalated names like PubesDB... my opinion is that I agree I wouldn't work with them, not because of any internal disgust reaction, but because the name signals a level of maturity that I don't want in my stack. Some people might have the same reaction to Cockroaches.
>Well, now I feel a bit like an asshole for how abrasive my response was; sorry. Kudos to you for not escalating.
I didn't feel it was abrasive at all.
For my part, I'm just upset that I went to such great lengths (in the comment I linked) to unpack where the bikeshed metaphor does or doesn't apply, disentangling the various issues and merging them into a general understanding, right where that comment was needed, and yet that's the one that no one is responding to... (what's worse, it was downvoted less than a minute after I posted it).
>In response to your escalated names like PubesDB... my opinion is that I agree I wouldn't work with them, not because of any internal disgust reaction, but because the name signals a level of maturity that I don't want in my stack. Some people might have the same reaction to Cockroaches.
Right, like I said, "we're haggling over the details"; it should be regarded as a question of which names are so disgusting to be out of the question, yet people are dismissing the entire naming issue as "lol emotional primates".
It's not trolling. It's a legitimate warning, and they can choose to ignore the chorus at their own peril. The warnings get louder as they get more resistant to changing their name. Keeping the name for whatever reason IS going to cost them enterprise customers.
There's a difference between "legitimate" and "useful". If the top comment on every CockroachDB post was "hey y'all remember that Go's maps aren't thread-safe", that would certainly be a legitimate warning. But at the same time, the CockroachDB team have been coding in Go for years, and they obviously already know that. If those top comments frequently turn into big threads arguing about whether Go's maps should have been thread-safe, the whole thing goes from being questionably useful to seriously annoying. Same thing's happening with the name. They know.
So I do product marketing for a living, and have launched a whole whack of things with good and bad and boring names. The fact that CockroachDB is consistently on top of HN with each thing they do is pretty strong evidence that they're doing just fine with the name they have, and probably doing even better than if they had a milquetoast tech startup-esque name.
Also, they know what their sales cycles look like. They hear feedback from actual customers. They have people whose job it is to notice any advantage they could have along the way. And yet! They're still selling stuff, they're at 1.0, and they're still alive — with the name they have.
If the name were Milquetoast it would be awesome, because that was the cockroach from Bloom County. Or, was that the joke and I'm only just getting it late? ;)
I haven't really seen that many comments about the name, though?
So now the top thread is about how terrible HN is for bikeshedding instead of talking about the actual topic... except this top thread is also not talking about the actual topic. Worth considering, imo.
What's even wrong with the name anyway? It's certainly a lot better than the ridiculous ones like "PostgreSQL" and "MongoDB" and "Redis" (what do these words even mean?).
Unfortunately this is a version of the thing it's trying to stop, as is plain from the below. These balls of mud are immune to negation; they laugh at it and grow stronger.
The blog and other non-docs pages use hugo (http://gohugo.io/) and the docs use jekyll, but will be ported to hugo soon. We use github pages for hosting with cloudflare in front (for https on a custom domain).
There was a great session with Spencer Kimball (CockroachDB creator) and Alex Polvi (CoreOS) at the OpenStack Summit. It's a good overview and demo: https://youtu.be/PIePIsskhrw
I think this is the DB project of the year in the open source community. Cockroach Labs has made an incredible effort to develop and test a new database, and these guys are giving it away for free (I read about the Series B raise too ;)) for us to use.
Thanks for doing this. You're very much appreciated.
(BTW I love the name and the logo!!)
I've been following CockroachDB for quite a while. Great job on 1.0.
I've had a question for quite some time though (and I think there is an RFC for it on GitHub): do we still need to have a "seed node" that is run without the --join parameter, or can we run all the nodes with the same command line, with the cluster waiting for quorum to reconcile on its own?
Currently, you need to run one node without --join for the initial bootstrapping (as soon as this bootstrapping is complete, you can and should restart it with --join to get everything into a homogeneous configuration). I was hoping to make some changes here so you could start every node with --join from the beginning, but it was trickier than anticipated, so it didn't make the cut for 1.0. Watch for improvements here in a future release.
That's okay, for now, I run a simple StatefulSet where each pod checks whether the Service is reachable on port 26257 to determine if it should join or init the cluster.
It's not as nice as if it was handled by Cockroach itself, but it does the job.
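For illustration, the reachability check that StatefulSet pattern describes could be sketched roughly like this (a minimal sketch of the probe-then-decide logic; the specific host address and fallback behavior are assumptions for the example, not CockroachDB tooling):

```python
import socket

def choose_mode(host: str, port: int, timeout: float = 2.0) -> str:
    """Return 'join' if something already answers on host:port (an existing
    cluster this pod should join), otherwise 'init' to bootstrap a new one."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "join"
    except OSError:  # connection refused, timeout, DNS failure, ...
        return "init"

# Each pod would run this probe against the cluster Service on port 26257
# (here a local address, for illustration) before picking its startup flags:
print(choose_mode("127.0.0.1", 26257))
```

The pod then execs `cockroach start` with or without `--join` depending on the result, which is exactly the asymmetry the upcoming release aims to remove.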
short answer: nope. cockroachdb replicates data for availability, and in order to guarantee consistency across the replicas, it uses Raft[1] internally. Raft necessitates that a majority of the replicas remain available in order to operate. it ensures that a new 'leader' for each group of replicas is elected if the former leader fails, so that transactions can continue and affected replicas can rejoin their group once they're back online.
raft is premised on overlapping majorities, so to speak. in order to tolerate up to `n` node failures you'd need to run `2n + 1` instances (for nine nodes you'd tolerate up to four node failures).
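To make the arithmetic in that last sentence concrete, here is the standard Raft majority rule in a few lines (nothing CockroachDB-specific, just the quorum math):

```python
def tolerable_failures(replicas: int) -> int:
    # A group of 2n+1 replicas survives n failures: any majority of the
    # remaining nodes still overlaps every previously committed majority,
    # so no committed write can be lost or contradicted.
    return (replicas - 1) // 2

for r in (3, 5, 9):
    print(f"{r} replicas tolerate {tolerable_failures(r)} failure(s)")
```

So three replicas tolerate one failure, five tolerate two, and nine tolerate four, matching the parent comment's example.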
In an era where hot air and hip DB technologies prevail, I'd like to emphasize the fact that the CockroachDB engineers are consistently honest and down to earth, in all relevant HN posts.
This builds up my confidence in their tech, so much so that even though I had no real reason to try this new DB, I'm gonna find one! :D
Exactly! The confidence that the devs inspire by taking the time to explain the choices behind the tech, makes me want to find a project to test it out on.
Does the replication work cross-region, say US-East and US-West? or even cross continent? It sounds like the timing requires very short latency and might not work in these scenarios
The Jepsen test results basically show that latency caused by replica distance won't screw your data. On the other hand, clock drift can stop your system, or even potentially corrupt your data, depending on how fast such an incident can be detected/handled and on your workload/what you are doing.
Yes, it works. Your latency will just be correspondingly higher (due to the speed of light). We are constantly testing a cross-region (i.e. US-East and US-West) cluster and have periodically run tests on cross-continent clusters (US to Asia-Pacific).
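As a rough sense of "correspondingly higher": signals in fiber travel at about two-thirds the speed of light, so a back-of-envelope lower bound on round-trip time can be computed as below (the distances are loose assumptions, and real fiber routes are longer than great-circle paths, so actual latency is higher still):

```python
C_FIBER_KM_PER_S = 200_000  # ~2/3 of c, typical propagation speed in fiber

def min_rtt_ms(distance_km: float) -> float:
    """Theoretical lower bound on round-trip time over fiber, in milliseconds."""
    return 2 * distance_km / C_FIBER_KM_PER_S * 1000

print(f"US-East <-> US-West  (~4,000 km): {min_rtt_ms(4_000):.0f} ms")
print(f"US <-> Asia-Pacific (~10,000 km): {min_rtt_ms(10_000):.0f} ms")
```

Since a consensus write needs at least one round trip to a quorum, these tens of milliseconds put a physical floor under cross-region commit latency.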
I'm struggling to understand how this company has raised $50 million when DB companies with paying customers like RethinkDB and FoundationDB had to shut down.
They are gonna earn back $50 million by selling... a backup tool?
I think one major difference is that it's a drop in replacement for certain SQL products, plus a major selling point of NoSQL - good horizontal scaling.
RethinkDB and FoundationDB are great, but require a paradigm shift I think.
Curious why Mac is better supported than Windows. This is obviously something you'd run on a server. Do orgs run Mac servers? Is it just to support dev work for people too lazy to launch a VM? Sorry, Windows/Linux ops person here with very little awareness of Mac ecosystem.
It's not so much a matter of Mac > Windows but rather Mac+Linux+*nix > Windows.
This just comes down to the fact that Windows is a special snowflake that does everything differently. Sometimes for good reasons, but usually not.
Very interesting. I have to admit I've seen the product name a few times, but never took the time to have a look. I do have a few questions, though, if any of the engineering team are still around watching the discussion :-)
From the high availability page [1] in the docs:
> Cross-continent and other high-latency scenarios will be better supported in the future.
Do you have a specific timeline in mind? I've been working on an application that needs to be highly-available, and which uses Oracle right now. It seems like you can add all sorts of tools to the mix (RAC, DataGuard, etc), but there are always significant caveats around the capabilities of the resultant system. We're talking 1 to 2 TB of data total, tables of up to 100 million rows with 1 million rows added per day, distributed across three data centers (US, EU, Asia).
And regarding high availability in the context of application deployments, is there any documentation on the locking characteristics of DDL statements? I'm interested in the ability to modify the schema during an application deployment without having to bring down the system or implicitly locking users out. Apologies if I missed it somewhere on the website!
I don't have a specific timeline but it is something we will be focusing on in the following releases.
Regarding DDL statements, this blog post [1] has details. In a nutshell, online schema changes are possible; the changes become visible to transactions atomically (a concurrent transaction either sees the old schema, or the fully functional new schema).
Say you scaled up to 100 nodes for the holiday season, is there any way to tell how many/much storage/nodes you have to keep running in order to keep 3 backups and maintain your new post holiday load?
We don't have any auto-scaling, for either scaling up or down, but if you're using a deployment tool such as Kubernetes, I don't see why it wouldn't be fairly easy. And it might be a good idea to add a message in the admin UI if all of your nodes are experiencing high load.
By just looking at your max load over the last 24h, or perhaps the last week, it would be pretty easy to see when to scale down.
That being said, as long as you remove the cockroach nodes one at a time, it's pretty easy to scale down a cockroach cluster.
Since CockroachDB has eventually consistent reads, how would that affect my SaaS multi-user application? How long on average would I have to wait for reads to become consistent?