What is it with Spencer Kimball and naming things that gets people so upset? It's not like other company or product names are that good; we're just used to them.
Some high profile tech companies:
- Google: some propellerhead big-number joke (hey, I have a PhD and I don't know offhand how big a googolplex is...)
- Alphabet: Really? That out of ideas?
- Amazon: Some hot, snake- and insect-infested jungle? Why should I go there?
- Microsoft: At least it gives a hint what the company does, but really... (cue the penis jokes)
- Yahoo: WTF, some slang term I've never heard of before...
- Apple: Mmmm, are they organic and locally produced? Oh, they sell computers and phones? WTF?! (Yes, I've heard the backstory about Alan Turing and the poisoned apple which I guess puts me in a very small minority)
There are bad names, and then there are repulsive names.
All the examples you mention fall under "bad name", and they're not even objectively bad; I actually think they're great names, so it's subjective. And NONE of them are repulsive.
Then again, if you insist cockroaches are lovable creatures I have nothing more to say.
> All the examples you mention fall under "bad name", and they're not even objectively bad; I actually think they're great names, so it's subjective. And NONE of them are repulsive.
My argument was not that they are good or bad, but rather that we've come to associate positive things with the companies in question, and then we post-hoc come up with explanations for why the names are good, etc.
> Then again, if you insist cockroaches are lovable creatures I have nothing more to say.
I don't think they are lovable, no. But they are an evolutionary success story; they've been around for hundreds of millions of years, long before humans. And they'll be here after we humans have driven ourselves extinct in some nuclear holocaust, massive environmental disaster, or your favorite apocalyptic scenario.
And if you manage to squish one, there are hordes of 'em left; just like I'd like my DB to be, so actually I think it's a very good name! :)
There's no post-hoc for some of those names. Some of them were picked because they actually were good. Even something as bland as 'Microsoft' fit right in with the culture that spawned it. And the rest were picked because they were simple, neutral, and had the potential to be iconic brands.
Cockroach is not something someone picks because it is good. That's a name you pick to make a statement that your name doesn't 'technically' matter beyond the fact that it is memorable and associative.
While a little over the top... I do partly agree. In addition to GIMP, git bugs me a little bit; not enough to ever consider not using it, but the term is more or less offensive depending on the culture you're in.
I think it's one of the worst names I've ever heard, both because of the bug and because of the first syllable. That's not going to stop me from using it if I have to, but I'm certainly less interested in trying it out. And I would be embarrassed to put "Cockroach Expert" on my resume.
Disclaimer: I've come up with my fair share of bad names. SCM Breeze [1] makes me cringe now.
I'm sorry it comes across that way, but I'm not really "offended" by it. I just think it's a gross name and it gives me a bad feeling. It's not something I can really control.
Everything under "The Future" really excites me, especially the geo-partitioning features. That is something that I'm really looking forward to be using!
I really like the fact that the CockroachDB team recently did a detailed Jepsen test with Aphyr. The follow-up articles from both CockroachDB and Aphyr explaining the findings are very interesting to read. For those who might be interested -
Apparently Google used GPS/atomic clocks to keep time synced:
>> To alleviate the problems of large ε, Google's TrueTime (TT) employs GPS/atomic clocks to achieve tight-synchronization (ε=6ms), however the cost of adding the required support infrastructure can be prohibitive and ε=6ms is still a non-negligible time.
And CockroachDB created more of a hybrid version that works on commodity hardware.
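The hybrid idea can be sketched as a toy hybrid logical clock: pair a physical timestamp with a logical counter so events stay causally ordered even when wall clocks disagree. This is only a rough illustration of the technique (from the Kulkarni et al. HLC paper lineage), not CockroachDB's actual implementation, and the class/method names are made up:

```python
import time

class HybridLogicalClock:
    """Toy hybrid logical clock: a (wall time, logical counter) pair."""

    def __init__(self):
        self.wall = 0      # largest physical time observed so far (ns)
        self.logical = 0   # counter that breaks ties when wall time stalls

    def now(self):
        """Timestamp a local event; strictly increases on every call."""
        pt = time.time_ns()
        if pt > self.wall:
            self.wall, self.logical = pt, 0
        else:
            self.logical += 1
        return (self.wall, self.logical)

    def update(self, remote):
        """Merge a timestamp received from another node, preserving causality."""
        rw, rl = remote
        pt = time.time_ns()
        if pt > self.wall and pt > rw:
            # local physical clock is ahead of everything: reset the counter
            self.wall, self.logical = pt, 0
        elif rw > self.wall:
            # remote clock is ahead: adopt it and step past its counter
            self.wall, self.logical = rw, rl + 1
        elif self.wall > rw:
            self.logical += 1
        else:
            self.logical = max(self.logical, rl) + 1
        return (self.wall, self.logical)
```

The point is that no timestamp ever goes backwards, and a message from a node with a fast clock drags the receiver's clock forward instead of producing out-of-order history.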
Distributed systems programming sounds endlessly challenging as you are always balancing trade-offs.
Hey guys, I'm a fellow developer of distributed systems here.
First of all I think what you are doing is great.
My question is: what's the point of clocks at all? The current time is a very subjective matter, as I'm sure you know; the only real time is the point at which the cluster receives the request to commit. Anything else should be considered hearsay.
Specifically the time source of any client is totally meaningless since as you say further in the discussion that client machine times can be off by huge margins.
If you accept that, then one has to accept that individual machines within the cluster itself are prone to drift too, although I appreciate that one can attempt to correct for that.
Wouldn't you think, though, that what matters more is an order based on the bucketed time of arrival (with respect to the cluster)?
I don't see how given network delays anyone can be totally sure A is prior to B, atomic clocks or not.
What is important is first to commit.
[edit] Yes would love to talk privately about this topic @irfansharif
Hmm, I'm not sure I completely understand your question or your source of confusion here. Unless I'm grossly misunderstanding what you're stating, I think we might be conflating a couple of different subjects. I'm happy to discuss this further over e-mail (up on my profile now) to clear up any doubts on the matter (to the best of my limited knowledge).
When a single system is receiving messages, you pick an observed order of events that meets some definition of fairness, and you stick with it all the way through a transaction. By pretending A happens before B (even if you're not entirely sure) you can return a self-consistent result. And once you have that you can simplify a lot of engineering and make a lot of optimizations, so that the requests aren't just reliable but also timely.
Throw three more observers in, and how do you make sure that all of them observe the requests arriving in the same order? Not even the hardware can guarantee that packets arrive at 4 places in the same order, even if the hardware is arranged in a symmetrical fashion (which takes half the fun out of a clustered solution).
> Specifically the time source of any client is totally meaningless since as you say further in the discussion that client machine times can be off by huge margins.
Distributed systems like Cockroach shouldn't use the client's conception of current time for anything at all, except possibly to store it (_verbatim_, don't interpret it) and relay it back to the client or to other clients (and let the client interpret it however they want).
Why not simply have the cluster sync a time between themselves? The first node in the cluster gets the time, and as new nodes come online they set their own internal time via the cluster. That way, in a world with no NTP or atomic clocks, the system could continue to operate.
This doesn't account for clocks on different systems running at different rates, or for clocks that jump, especially on VMs and cloud instances, where this happens all the time.
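Back-of-the-envelope, you can see why a single initial sync isn't enough. Assuming a typical quartz oscillator error of around ±50 ppm (a ballpark figure, not a measured one), two nodes synced perfectly at startup can drift several seconds apart within a day:

```python
def drift_after(seconds, rate_ppm):
    """Offset accumulated by a clock running rate_ppm parts-per-million fast."""
    return seconds * rate_ppm / 1_000_000.0

# Two nodes, one running +50 ppm fast and the other -50 ppm slow:
# after one day (86,400 s) their mutual divergence is ~8.64 seconds,
# far beyond any sane max-offset threshold.
divergence = drift_after(86_400, 50) + drift_after(86_400, 50)
```

And that's before counting VM pauses and migrations, which add step changes on top of the steady drift.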
I don't really get why you would build a distributed database with dependency on wall time (unless you're Google and can stick atomic clock HW on every node). Why not use vector clocks? Am I missing something?
The section on lock-free distributed transactions in our design document[1] should answer your question, specifically the sub-section on hybrid logical clocks.
It may be a nitpick, but Google doesn't stick atomic clocks, or even just GPS clocks, into every node; just into every data center. The difference means it's actually perfectly feasible for very many other companies running DCs, or just machines in colos, to do the same. The big news was how they used that fact (that times are synchronized with an upper limit on how far the clocks in two nodes will diverge) as a very significant optimization in Spanner, one of their distributed databases.
Building a distributed database that can optionally benefit from the same optimization actually makes a great deal of sense. Your average hobbyist won't care, but spending a few extra kilobucks on hardware in a DC to get big throughput improvements out of your database system is a steal.
Distributed systems really are endlessly challenging, as you are always balancing trade-offs.
The CAP theorem still holds, so we pick which 2 out of 3 to be strengths and where to compromise as little as possible. It's a guaranteed 87.3% effective hair loss formula. I find Quiet Riot helps.
> When a node exceeds the clock offset threshold, it will automatically shut down to prevent anomalies.
If you're planning to run on VMware, be prepared to handle rather dramatic system clock shifts. I've seen shifts of up to 5 minutes during heavy backup windows. Not all customers might be willing to have their nodes go down due to system clock / NTP issues.
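The shutdown behavior quoted above amounts to something like the following sketch. Everything here is hypothetical (names, the peer-measurement scheme, the 500ms figure used as a stand-in threshold); the real check lives inside CockroachDB's clock monitoring, not in user code:

```python
MAX_OFFSET_MS = 500  # hypothetical threshold standing in for the configured max offset

def should_shut_down(peer_offsets_ms, max_offset_ms=MAX_OFFSET_MS):
    """A node compares its clock against each peer; if a majority of the
    measured offsets exceed the threshold, it exits rather than risk
    serving reads/writes that could violate consistency."""
    over = sum(1 for o in peer_offsets_ms if abs(o) > max_offset_ms)
    return over > len(peer_offsets_ms) // 2
```

So a single noisy peer measurement (one offset over threshold out of three) wouldn't take a node down, but a genuine local clock jump, which shows up against most peers at once, would.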
Yep, we've also had our share of troubles with noisy clocks in cloud environments, so that's something we're very aware of. Further down the road, we're considering a "clockless" mode, which of course isn't clockless, but depends less on the offset threshold: https://github.com/cockroachdb/cockroach/issues/14093
That said, even today, configuring a cluster with a fairly high maximum clock offset is feasible for many workloads.
The latter. NTP only checks and corrects clock offsets every so often. If the "hardware"[1] clock undergoes offset shifts at random times because of VM pauses, this won't get fixed until the next NTP sync.
This gets exacerbated in cloud settings where VMs get moved between physical machines or racks, since now it's not just the pause; it's that the clock is pointing to a new hardware time source.
[1] in quotes since it's viewed as a single piece of hardware to the software inside the VM.
Cassandra user here, in AWS. Clock drift is a big problem on VMs. NTP is not aggressive enough in these environments to keep clocks relatively in sync; we regularly had several hundred milliseconds of drift between nodes. As Cassandra is extremely clock-sensitive, this is a big problem. We ended up using chrony with very aggressive settings to keep things in the sub-ms range for the most part. But it's still possible to get "hiccups" where time will skip, especially if you reboot a VM.
Vanilla ntpd makes assumptions about the hardware clock (that drift is stable) that don't apply to virtualized clocks. Using the tsc clocksource may help as well.
* Search for fwenable-ntpd (https://www.v-front.de/2012/01/howto-use-esxi-5-as-ntp-serve...) and download the .vib (do a security audit on it - it's a zip file, I think - to ensure it is what you think it is). Install the .vib, which simply adds an ntp daemon option to the firewall ports. This works on v6.5.
* Run ntpd on Linux VMs, pointed at the hosts with the local clock fudge as a fallback
* For Windows VMs in a domain, set the AD DC with PDC emulator role to sync its clock to the host via the VM guest tools, leave the rest alone
* On your monitoring system make sure that it has an independent list of five sources and use plugins like ntp-peer for ntpds and ntp-time for Windows (Nagios/Icinga etc)
With the above recipe, ntpq -p <host> shows offsets less than 1 ms across the board for ntpds after stabilising.
I don't suppose anyone knows how to make a Windows NTP server permit queries? Googling does not seem to reveal anything insightful. I know how to do this for ntpd but am stuck with dealing with a Windows NTP server right now.
There are also a lot of RPC endpoints used for the admin UI that can be queried to get more fine-grained info. However, they're primarily for internal use and might change in the future.
What advantages do I have using Cockroach compared to Postgres, Cassandra, Rethink or MongoDB? (I know that all of them are completely different, that's part of the question)
So performance is complicated. Right now, we’re performance testing CockroachDB regularly, and everything is out in the open. Everything we do is tracked with a GitHub issue with the “perf:” prefix, if you want to follow along.
Blog posts (well, many) are in the works outlining our performance benchmarking. The situation on the ground is changing fast: our performance has improved rapidly over the past months, and each time we sit down to write a blog post, it quickly gets obsoleted. So, trust that we will have a blog post talking about performance very soon.
Anecdotally, our customers are not finding performance to be a bottleneck. I encourage you to set up a Cockroach cluster and try the various load generators (we've got the standard ones and a couple of homegrown ones in the repository).
From the linked website: "CockroachDB provides scale without sacrificing SQL functionality. It offers fully-distributed ACID transactions, zero-downtime schema changes, and support for secondary indexes and foreign keys". Significantly, a great deal of CockroachDB's design has been dedicated to surviving adverse network conditions (see the Jepsen references in other posts).
They are targeting MySQL/Postgres users; basically a post-CAP approach to RDBMS. But if you can work with eventual consistency, they are definitely not your first choice.
Yes, if very low-latency (i.e., P99 latency sub-5ms) reads and writes are critical to your application, CockroachDB should not be your first choice. That said, one of the primary motivations for CockroachDB is that most existing systems don't handle eventual consistency well. In our experience, most developers will eventually write code that assumes a consistent database, either accidentally or intentionally, because it works most of the time. Dealing with eventual consistency is hard.
Rather than "if you can work with eventual consistency, you should look elsewhere," the sentiment we're trying to cultivate is "if and only if your performance requirements can't work with strong consistency, then you should look elsewhere."
I support defaulting to consistency the way you posed it. The main reason is that safe-by-default construction has proven more effective for the average programmer over decades. The other approach has caused many disasters.
Meh, this is just PR; nothing is safe-by-default. It's not actually true that people eventually assume strong consistency, because eventual consistency forces a stricter way of thinking about state and time, kind of functional; you just can't escape it. It's strong consistency that lets you get sloppy, while making you forget how not-simple it is. It only exists inside the system, and if you have clients outside of the system, like web browsers, you don't have a two-phase commit protocol on a button click there, so you have to resort to that same functional way of thinking to at least try not to confuse anyone on retry. But that's clearly not happening in the wild; it's just too complex.
I don't think anyone goes back from eventual consistency. It's more appropriate for this asynchronous world, easier and more reliable.
Google disagreed on that last part. Their bright engineers kept screwing up with eventual consistency; it's why they built Spanner in the first place, followed by F1. So did customers of FoundationDB and Cockroach, despite free solutions being available for eventual consistency.
So, I'm not seeing it as so clear-cut in favor of eventual consistency.
Google never did, or bothered to do, much work on eventual consistency; they cannot possibly have much experience with it. CRDTs didn't come from them. And you know very well that customers do not care about any of this.
Their cloud storage was described as eventually consistent for apps needing a lot of performance when I looked into it. A quick Google of the offerings shows pages describing what tradeoffs are available to customers with each option. So, they not only know about it: they implemented it as a product feature. Their internal stores were strongly consistent with high performance, except AdWords on MySQL. That got moved to F1 for strongly consistent high performance. Spanner, which F1 uses, then got offered to cloud customers.
After re-reading the F1 paper, my mistake seems to be thinking they relied on eventually-consistent stuff internally. It appears that was just an option for 3rd party developers in their cloud products. Thanks for the peer review as I found some more stuff double checking. :)
Open source vs. not open source. Cockroach is still in its infancy vs. Spanner. I'm sure there are a variety of differences here, but they mostly aim to solve a similar problem with a slightly different approach.
I'm confused. What's the difference between 'Yes' and 'Optional' in the 'Commercial Version' row on the comparison chart? To me 'Yes' suggests there is only a commercial version, but clearly that's not true for CockroachDB.
[cockroachdb here] Yes! In addition to being highly scalable, CockroachDB also comes with built-in replication. That means that even with a smaller project that hasn't scaled yet, you still get the benefit of a more resilient database.
Also, CockroachDB is super easy to install and get started with!
I've come across many projects that are easy to get started with, but the main stuff to look for is in the details. Although MySQL might be easy to get into, for example, it takes time to learn the intricacies of query optimization and, importantly, what to do when SHTF, like when a table gets corrupted.
My question is, in your opinion, what does it take to become proficient in CockroachDB sufficiently enough to be comfortable using it in a high volume, high-uptime-required environment?
I can't speak for others, but at least for me the main attraction of CockroachDB is getting foolproof HA straight out of the box. That is something I think anyone can appreciate regardless of their dataset size.
Note that I haven't actually run CockroachDB yet, so I can't confirm if it really delivers on that promise, but I'm hopeful.
Cockroaches are highly resilient creatures. The name, I assume, is alluding to the goal of this database being a highly resilient system. What's the problem?
I think the name "Cockroach" was a really poor decision from a marketing standpoint. The team intended to convey durability, since cockroaches can live through anything. But when I think of a cockroach, I think, gross, disgusting, etc.
It's memorable. So if the product is really excellent and is needed by customers - then I think it could be a boon.
I mean Mongo has very bad associations for me in terms of childhood taunts and Blazing Saddles...but now the name really relates more to the product than to the original meaning.
The difference is that "mongo" does not have a universally-known meaning. Cockroaches and known throughout the world, and are disgusting throughout the world.
> I mean Mongo has very bad associations for me in terms of childhood taunts and Blazing Saddles...but now the name really relates more to the product than to the original meaning.
The difference is that the word "mongo" is an issue of the same word having different meanings in different dialects. Whereas, with "cockroach", it's the same intended meaning, but with different connotations.
Agreed. It's a very stupid name, memorable for all the wrong reasons: it detracts from the product, diverting attention to a name that elicits visceral disgust in a lot of people. The conversation then becomes about why the product is called that, instead of about the product's merits.
On second thought I'm not sure. Everyone will call it RoachDB for short anyway, but the full name has more impact. It shocks, which is a good thing. I was so focused on aesthetics that I didn't even consider strategy.
They can always spin off "RoachDB" as an enterprise option if they have any problems selling it due to the name.
Cockroaches are figuratively unkillable; resilient, surviving even in nuclear wastelands. I think the team was going for something like: you put your data in here and it will survive basically anything short of a multi-continent nuclear war.
This isn't the place to find out, but I'm curious as to the relative ratio of people who have this reaction.
I don't, at all - I'm vaguely positive towards the name, but in general don't care what things are called, so long as I can remember it. (Although I still maintain that "Paypal" is the stupidest name ever.)
I know people exist who will avoid things simply because they react negatively to the name. How prevalent is this? This isn't about overall product aesthetics/ergonomics/etc., just the name.
You're going to have a hard time convincing nontechnical management that they need to go with Cockroach instead of Oracle.
It's unfortunate the world works that way, but nevertheless, it works that way.
It could be the best database in the world. They did a real disservice to themselves by naming it after a bug people typically associate with filth, disease, and germs.
It's not at all unfortunate that the world works that way. What's really unfortunate is that the founder of this seemingly great database system has decided not to care about how human psychology works.
Here's a Wikipedia excerpt on cockroaches:
> They feed on human and pet food and can leave an offensive odor.[60] They can passively transport pathogenic microbes on their body surfaces, particularly in environments such as hospitals.[61][62] Cockroaches are linked with allergic reactions in humans.[63][64] One of the proteins that trigger allergic reactions is tropomyosin.[65] These allergens are also linked with asthma.[66] About 60% of asthma patients in Chicago are also sensitive to cockroach allergens. Studies similar to this have been done globally and all the results are similar. Cockroaches can live for a few days up to a month without food, so just because no cockroaches are visible in a home does not mean they are not there. Approximately 20-48% of homes with no visible sign of cockroaches have detectable cockroach allergens in dust.[67]
Human psychology is on their side; people just don't seem to understand it. Which is fine: most people are not marketing experts. FYI, it doesn't actually matter how much you dislike the name. When the time comes to choose between a silly, negative name that is unusual and very memorable because of that, and something boring you have seen just as often, you will trust the silly name more. And since the database choice for most people is a purely dogmatic one, the name gives Cockroach a slight competitive advantage (at the stage they are in).
If you make technology stack decisions based on your feelings rather than what the product actually does, then you shouldn't be employed as a decision-maker.
Feelings become reality. People care about what things are called. You just don't care because it doesn't bother you. But if it was a topic you were sensitive about or something you feel is inappropriate, you would feel otherwise. Everyone has their limits of what is going too far. It's almost as if we live in a society with people from different backgrounds. What this really hits on is subjective relativism, and that's dangerous for an entire society to operate on. Maybe Cockroach isn't that bad, maybe it grosses some people out. Fine, not that big of a deal here. What if it was called "BondageDB"?
My point was that the job of a technology decision-maker is to make decisions on the actual technical merits of various options, the costs and tradeoffs thereof.
If you are in that role, and you permit the name of a vendor to trump the actual merits of the vendor's product, you should never have been trusted with decision-making authority in the first place, and any competitors who don't harbor your particular emotional hangups will get the better of you, and you won't be long for your position anyway.
Cockroach Labs is not selling to the end-consumer. They're selling to people whose job it is to behave like Vulcans. In this particular market, it doesn't matter what the name is.
The point is that it doesn't work. Yes, the name makes the product stand out, but that benefit doesn't compensate for having your product associated with filth and disease.
There's a reason Toyota has never named a car 'The Cockroach' and a soft drink company has never released 'Cockroach Cola'.
But if there were two similar products, and one were named Roach, I would go with the other without much thought. The name is horrible. As long as they are the king of whatever they do, they can call themselves whatever they want, but handing competitors an automatic naming advantage didn't have to be the case.
Not MySQL, but we've tested and recommend the Ruby pg driver and the ActiveRecord ORM[1] (CockroachDB supports the PostgreSQL wire protocol). It should be 'plug and play' insofar as you simply point at any node in the running cluster when setting up ActiveRecord::Base.establish_connection.
As for our backup story, our doc page[2] on the subject should provide more details.
I have ported a MySQL-based ActiveRecord Rails app that was somewhat complicated to Postgres, and then on to CockroachDB. It works pretty well, so I'd give it a go. We're also committed to supporting ActiveRecord via the Postgres connector, so if you run into any bugs, we would do our best to fix them. I am personally invested in ActiveRecord support myself. At this point ORM support on CockroachDB is driven mostly by usage so please try it!
Your other questions are better answered on the blog post, but quickly:
* CockroachDB core comes with a `dump` command to backup your databases. CockroachDB Enterprise has blazingly fast _incremental_ cloud backup and restore, the kind that you might want for a very large deployment.
* Replication is managed under the hood by sharding the data into many ranges, each 64 MiB in size. Each range is replicated using Raft, and if a node goes down, the other replicas scattered across the cluster seamlessly take over and up-replicate a new replica to "heal" the cluster.
* The horizontal scaling is indeed plug and play - just add more nodes to the cluster and they'll automatically rebalance replicas across the cluster with no downtime and no additional configuration.
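The range-based sharding described above can be pictured as a sorted lookup table from key-space boundaries to replica sets. This is an illustrative toy only; the names and replica placements are made up, and the real system stores range boundaries in metadata ranges rather than a flat in-memory list:

```python
import bisect

class RangeMap:
    """Toy model of range-based sharding: the key space is split into
    contiguous ranges, each replicated on a set of nodes."""

    def __init__(self, split_keys, replicas_per_range):
        # split_keys: sorted start keys of every range after the first;
        # replicas_per_range: one list of node IDs per range.
        self.split_keys = sorted(split_keys)
        self.replicas = replicas_per_range

    def lookup(self, key):
        """Return the replica set responsible for `key`."""
        idx = bisect.bisect_right(self.split_keys, key)
        return self.replicas[idx]

# Three ranges covering [min, "g"), ["g", "p"), ["p", max),
# spread over four hypothetical nodes with replication factor 3:
rm = RangeMap(split_keys=["g", "p"],
              replicas_per_range=[[1, 2, 3], [2, 3, 4], [1, 3, 4]])
```

With this shape, "add a node" just means the rebalancer starts assigning some ranges a replica on the new node; no keys need to be rehashed, which is the property that makes scaling plug and play.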
Instead of just downvoting, how about refuting my claim?
I'm seriously curious what the disagreement is. These guys have already established that atomic clocks are unnecessary. I'm very interested in which use cases require them.
Serializability is all about ensuring a single consistent ordering of events. There are lots of algorithmic shortcuts you can take if all your nodes' clocks are precisely in sync.
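One concrete example of such a shortcut is Spanner-style "commit wait": given a known bound on clock uncertainty, a node assigns a commit timestamp and then simply waits out that bound before acknowledging, so that timestamp order is guaranteed to match real-time order. A minimal sketch (the function names are mine, and the real protocol does this inside the transaction coordinator):

```python
import time

def commit_with_wait(assign_timestamp, max_clock_uncertainty_s):
    """Assign a commit timestamp, then wait until every node's clock is
    guaranteed to have passed it before acknowledging the commit. The
    tighter the uncertainty bound (e.g. TrueTime's ~6ms), the cheaper
    this wait is -- which is exactly why the atomic clocks pay off."""
    ts = assign_timestamp()
    time.sleep(max_clock_uncertainty_s)  # wait out the uncertainty window
    return ts  # now safe to acknowledge
```

The cost of the shortcut is latency proportional to the uncertainty bound, which is why a 6ms bound buys you something a 500ms bound does not.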
I'm very familiar with the literature since I'm a distributed database developer.
If you investigate high-frequency trading you will understand that the quantum phenomena I'm talking about are not just me high on mushrooms but a real-world thing.
The only "time" relevant is the time when the cluster agrees an atomic, isolated transaction is time to commit from its own perspective.
Am I wrong in remembering that the HN guidelines used to say that you should not downvote someone's comment simply because you disagreed with it?
I went looking, and I don't see that in the current guidelines. I could be wrong about it being there before, but I was almost certain that it was at one point.
Seems like it used to say that you should only downvote comments that you think don't contribute anything of value to the conversation.
Just curious, because it seems to me that for quite a while now there have been a lot of comments that appear to get downvoted just because people don't agree with what the person said (and often there are no responses to counter, the person just gets downvoted).
I think you're thinking of somewhere else. The up/down votes are a way of agreeing or disagreeing without cluttering up the comments with a bunch of "me too"s or "nuh-uh"s.
I'm certain that I'm not thinking of somewhere else. I'm completely open to the possibility that I just remember it wrong, but I'm sure that it was HN that I was thinking of, and not another site.
Nope, I know I'm not thinking of Reddit (as I said in reply to another comment, I've spent almost no time on Reddit, and would not have seen their guidelines at all).
Nope. I've never spent much time on Reddit, and I'm certain I've never seen that page before. Perhaps I am thinking of comments other people made on HN in the past (who thought the policy was as I described).
There's never been such a policy on HN; you remembered it wrong, as have many before you. It's the same phenomenon that attributes pithy quotes to Einstein and makes Canadians think we have Miranda rights: people hang memories on the nearest pre-existing hook in the brain.
I don't have a full history of the guidelines, but the canonical link on this tends to be [0]. About nine years ago, PG thought downvote to disagree was perfectly reasonable. I don't think there's been any official change since then.
That said, I think many on HN do think that downvotes should be reserved for uncivil or unsubstantive comments as they don't contribute to the conversation. Some will still downvote for disagreement or for other reasons.
I think it's best not to let it bother you or worry about it, because there's not much you can do about it, other than to contribute as civilly, substantively, charitably, and in as good faith as you can.
Actually, WalMart cares, and so does T-Mobile. You probably care too, if you stop and think for a bit...
The concern here isn't just order of transactions, but also synchronization. For instance, WalMart might charge you twice for a transaction if it appears to have happened at different times when it arrives in different data centers.
Also, the comment "The higher frequency the transactions the more you get into quantum physics." isn't relevant here. This is more in the realm of relativity than quantum physics. Even so, we aren't currently at a point where we need to worry about transactions happening at relativistic speeds.
Ah, I think I see where you are confused: your arguments make more sense when dealing with a single, local database. The idea here is that you want to achieve atomicity, but you need to do it across multiple distributed databases, and you want a system whose components have exactly the same time in order to ensure consistency across each database.
Attempting to extend the landlord example... let's say that I'm your landlord and you have to pay me £1000 each month. You send the bank a message telling them to pay me the money. The bank may make several copies of that message and keep them around for its own reasons. Now, let's say there are employees at that bank whose job it is to go through all copies of all messages and make sure what they say is done. If they find a message from several months ago saying "transfer £1000 from you to me this month" and are somehow oblivious to which month it is, they may transfer an additional one thousand pounds even if it's already happened. It's not an exact analogy, but...
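In practice, the usual defense against that duplicate-copy problem is to deduplicate on a request identifier (an idempotency key): however many copies of the message the bank processes, the transfer is applied at most once. A minimal sketch of the idea, with hypothetical names:

```python
def make_ledger():
    processed = set()  # idempotency keys already applied

    def transfer(request_id, amount, apply_fn):
        """Apply a transfer at most once per request_id, no matter how
        many copies of the same message arrive."""
        if request_id in processed:
            return False  # duplicate copy; ignore it
        processed.add(request_id)
        apply_fn(amount)
        return True

    return transfer

balance = [0]
credit = lambda amt: balance.__setitem__(0, balance[0] + amt)

transfer = make_ledger()
transfer("2017-05-rent", 1000, credit)  # the real message
transfer("2017-05-rent", 1000, credit)  # a stale copy of it
# balance[0] is 1000, not 2000: the duplicate was ignored
```

Of course this only pushes the consistency problem onto the `processed` set itself, which in a distributed bank has to be replicated consistently; which is rather the point of the whole thread.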
First, this is awesome! Congrats to the team for reaching this milestone.
Secondly, I think the name is memorable and conveys exactly what it should. If I were ever on an engineering team that chose not to use CockroachDB due to being "grossed out" by the name, I wouldn't be on that engineering team for long. Perhaps someone can explain the knee-jerk reaction to it for me.
I had previously been a big supporter of their name, agreeing with some other posters that it promotes the durability of the system.
However, after a move last year, I was forced to live with cockroaches for approximately 6 months, after never encountering them prior to that.
Since then, I've completely switched camps. I can't see the name without being skeeved out. The reality of cockroaches is so absolutely repulsive that it changed my view a full 180°.
I moved out of that place in November, and haven't seen one since; I'm curious whether my aversion will fade over time.
Since there's a little side riff about the name going on I thought I'd throw in my 2 cents. Personally I love the name. I think it does a great job of conveying the spirit of the project and provides unlimited pun opportunities. Plus it's memorable, just like a real life roach encounter. Unfortunately I'm sure some people will discriminate against your DB on the basis of name alone. That's ludicrous, but that's our species for ya.
Choosing technologies based on first-hand review and first principles rather than things like Gartner magic quadrants, big company brand recognition, feature lists, and "serious" sounding names is a competitive advantage that startups often have over big businesses. The latter are forced by their procurement departments and other forces to use old, inferior, and more costly technology.
On the flip side though if I were in charge of CockroachDB I would look at doing something about the name. Maybe rename it something like "Resilient" as part of the "exit from beta" milestone. It's going to be a serious liability for them selling to the kinds of customers I described above, and unfortunately that's where most of the money is in these devops/infrastructure markets. The key to success is to make a superior product and then figure out how to sell it to pointy haired bosses. The latter often means making it look more boring than it actually is.
Fun factoid: scientists sometimes do this with grant proposals. I've had two scientists independently tell me that they often take cool, fascinating research proposals and "make them boring" to sell them to bureaucrats. "You have to hide all the interesting stuff and make it sound like you are doing boring incremental research. If you talk about anything 'revolutionary' you will never get funded."
I think the problem is worse: marketing/business people have convinced the world that this surface-level analysis is all we can expect of anyone. As other commenters have said: if the name of the DB solution influences your choice, then you're probably going to get what you deserve.
(Within reason. Someone on here actually said this argument is reasonable to have "because what would you do if they named it 'n-word'DB." Seriously.)
To me, cockroaches are such an unbelievably negative association that I don't think I could get over the name and work with this product, because I wouldn't want to be saying cockroach all the time.
To me, cockroaches aren't disgusting. And yes, I have used an outhouse in a 3rd world country where cockroaches were swarming up and out... But they just don't disgust me.
> I don't think I could get over the name and work with this product
That gives everyone competing with you a HUGE advantage. Making technical decisions based on the name of a product is the worst kind of decision making.
As the creator of a moderately popular open source project, I can attest that the name of the project is very important.
A common problem for open source projects is that the name is either not recognizable enough (e.g. too technical) or too generic (e.g. a simple English word, which makes it hard to search on Google).
In this case the name evokes negative emotions of fear and disgust which are not what you want to associate with a database.
Back in 2000, I used to enjoy an online streaming radio station called echo.com, and as a sort of reward for listening, you could earn Amazon gift certificates.
I tried googling for "Amazon echo gift certificates" but I couldn't quite find what I was looking for.
It's a bad name because this topic will come up every time it's discussed, forever. It's a distraction from other relevant issues like new features or how it performs.
Strangely, it seems to be helping them. Usually whenever there's an excellent product/article featured on HN, there's not much to say, so there are very few comments. CockroachDB seems like an excellent product, yet the firestorm about their name is fueling discussion, which amusingly might be leading to more upvotes from people who dislike that they're being discriminated against based on their name. It's counterintuitive internet behavior at its finest, similar to everyone complaining that Soylent was a terrible name.
I hope it is excellent and advances the state of the art, but it won't reach its full potential until it has a name people can use when talking to users, customers, and board members.
"Well first we collect all of the data in the Epidemic schema, run it through the Apocalypse pipeline to transform it into something that our Extinction servers can handle, and finally store it in CockroachDB."
Are there published benchmarks for multi-key operations and more complex SELECT statements? I apologize if I missed them.
I'm trying to determine whether there's a place for Cockroach within what I think are the constraints in the database space.
* Traditional SQL Databases
- Go to solution for every project until proven otherwise.
- Battle tested and unmatched features.
- Hugely optimized with incredible single node performance.
- Good replication and failover solutions.
* Cassandra
- Solved massive data insert and retention.
- Battle tested linear scalability to thousands of nodes.
- Good per node performance.
- Limited features.
It seems like many new databases tend to suffer from providing scale out but relatively poor per node performance so that a mid-size cluster still performs worse than a single node solution based on a traditional SQL database.
And if you genuinely need huge insert volumes, because of the per node performance you'd need an enormous cluster whereas Cassandra would deal with it quite comfortably.
[Cockroach Labs engineer here working on performance benchmarking]
We have load generators for YCSB (just raw key-value ops in a firehose) and TPC-H (very complicated read-only queries) running right now, and we're about to start running TPC-C queries (moderately complex queries in large volume) as well. You can follow along on our progress here: https://github.com/cockroachdb/loadgen
In the context of your dichotomy, we want to bridge that gap. We want the linear scalability of your second group along with the full feature-set of the first group.
We will be publishing our performance numbers, but we haven't so far because the product has been improving rapidly and our numbers quickly become obsolete. Rest assured, we will be publishing a series of blog posts very soon. Anecdotally, our beta customers are not finding that they need many more CockroachDB nodes than their existing database solutions, even with something as high-performance (but inconsistent) as Cassandra.
In a couple of years, I suspect they will rebrand to just "RoachDB". It conveys the same meaning while being less awkward to discuss with users/clients.
About nine months ago we made the decision to go with RethinkDB for our infrastructure in place of PostgreSQL (at least for live replicated data), but if this existed at the time we'd have seriously taken a look. We're pretty happy with RethinkDB but I plan on still taking a look at this so we have a backup option.
[cockroachdb here] We are big fans of RethinkDB, but also glad to hear that you'll explore CockroachDB. Let us know how it goes, and definitely file any issues / feature requests in our GitHub repo!
Just out of curiosity, do you mind elaborating a little bit on why not? It strikes me as something that would be very easy to implement in a database, is there a reason why so few databases have a mechanism to do this?
If it's about maintaining an open connection in order to notify the client, that part makes sense, but at the very least the changefeed itself should be toggleable and easy to query in any DB.
One of the challenges for us in implementing something like LISTEN/NOTIFY comes from our distributed nature: since a table is likely broken up across many nodes, you somehow need to aggregate changes from all of them back into a single change feed wherever the listener is, and in such a way that it doesn't create a single point of failure.
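To make that fan-in problem concrete, here's a toy Go sketch (my own illustration, not CockroachDB's actual design) that merges per-node change streams into a single feed. The single `out` channel is exactly the aggregation point described above: every change from every node must flow through it, which is why doing this without creating a bottleneck or single point of failure is hard.

```go
package main

import (
	"fmt"
	"sync"
)

// mergeFeeds fans in per-node change streams into one feed. This is a
// toy sketch of the aggregation problem, not CockroachDB's design: note
// that "out" is a single point every change must pass through.
func mergeFeeds(nodes ...<-chan string) <-chan string {
	out := make(chan string)
	var wg sync.WaitGroup
	for _, n := range nodes {
		wg.Add(1)
		go func(c <-chan string) {
			defer wg.Done()
			for change := range c {
				out <- change
			}
		}(n)
	}
	// Close the merged feed once every per-node stream is drained.
	go func() { wg.Wait(); close(out) }()
	return out
}

func main() {
	// Hypothetical per-node change streams.
	a, b := make(chan string), make(chan string)
	go func() { a <- "node1: UPDATE k1=v1"; close(a) }()
	go func() { b <- "node2: DELETE k2"; close(b) }()
	for change := range mergeFeeds(a, b) {
		fmt.Println(change)
	}
}
```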
It probably scales but how is the performance? If I need to load a couple billion rows and do a dozen joins in some analytics, is that one machine, a dozen, or 100?
Is it more for web apps, analytics, or what? When would I consider switching from e.g. Postgres to CockroachDB?
For just a couple billion rows and a dozen joins, a single node will suffice (with the caveat that you really want at least 3 nodes because CockroachDB is built for replication and fault-tolerance and you're not getting that with a single node cluster), but you'll get linear speedup as you add more machines.
Your performance on a single node should be on the same order of magnitude as doing this in Postgres right now. We are rapidly closing that gap, and intend to close it completely for TPC-H style queries, while retaining the linear performance speedup with more nodes.
The reason this gap isn't already closed is that we've been focused on transactional performance in distributed, fault-tolerant situations rather than analytics performance for 1.0. There is a lot of low-hanging optimization fruit in analytics scenarios that we haven't focused on yet and are just getting started on.
Thanks for your response. It sounds like CockroachDB might be an alternative to setting up an RDBMS for read replication once you need many connections.
On the feature FAQ, joins are described as "functional", which doesn't inspire a lot of confidence, but maybe it's just a perception thing. What exactly does "functional" mean?
A SQL DB without joins sounds a lot like just a NoSQL DB with a familiar query dialect.
If you are using Joins in an OLTP setting, everything should work absolutely as you might expect.
"Functional" is our caveat that if you run Joins across your data in an OLAP setting, it will work, but it may not be the most performant Join possible. For example, our query planner does not currently plan Merge-joins even if the appropriate secondary indices exist. So after a point (joining ~billions of rows of data) it no longer is as performant as it could be. Now we expect to roll out this particular fix within 6 months. However, optimizing 4 or 5-way nested Joins in OLAP-cube style settings isn't something we're going to be performant at for years. We need a lot more infrastructure built up before we start solving the kinds of problems revealed by, say, the Join Order Benchmark paper (http://www.vldb.org/pvldb/vol9/p204-leis.pdf).
No. Once your latency goes beyond single-digit seconds, performance will probably collapse. Too many subsystems would time out. In theory it could be made to work (with terrible performance, and extremely long commit waits due to having to wait until the remote planets get back to you), but I wouldn't architect a planet-spanning distributed database this way. We would probably have to go back to the drawing board and start from scratch.
You'd need to give up on consistency, because there is no such thing when the time of communication is long compared to interval of events. In the long run, ACID is dead.
Long answer: at their closest, Earth and Mars are about 54 million km apart; at their farthest, over 400 million km, with an average of around 225 million km. So the theoretical one-way latency varies between roughly 3 and 22 minutes.
CockroachDB uses synchronous replication via raft, and that latency would cause problems as would some other setting like our window sizes and their interaction with timeouts.
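For anyone checking the arithmetic, the one-way delay is just distance divided by the speed of light; a quick sketch (the distances are the commonly cited Earth-Mars extremes):

```go
package main

import "fmt"

func main() {
	const c = 299792.458 // speed of light in km/s
	distances := map[string]float64{
		"closest (~54M km)":   54e6,
		"farthest (~401M km)": 401e6,
	}
	for name, km := range distances {
		// one-way delay in minutes = distance / c / 60
		fmt.Printf("Earth-Mars %s: one-way delay ~%.1f minutes\n", name, km/c/60)
	}
}
```

And a round trip, which is what a synchronous commit would actually wait on, doubles those numbers.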
> CockroachDB uses synchronous replication via raft
Deep space aside, I wish the announcement just said that! I came back to HN for insight into the paragraph about "multi-active availability... an evolution in high availability from active-active replication". Marketing... sometimes... I tell you what.
More practically, I note this from Cockroach's document on "Deploy > Recommended Production Settings":
"When replicating across datacenters, it’s recommended to use datacenters on a single continent to ensure performance (inter-continent scenarios will improve in performance soon). Also, to ensure even replication across datacenters, it’s recommended to specify which datacenter each node is in using the --locality flag. If some of your datacenters are much farther apart than others, specifying multiple levels of locality (such as country and region) is recommended."
In short, IIUC, even _planetary_ deployment doesn't come for free (yet). Perhaps I'm just not well-enough versed yet in how people deal with globally-distributed databases, but I'd love to see the docs dig into this a bit more: practical limits of cluster deployment, recommended strategies and tools (if any) to replicate data between clusters, etc.
It looks like there is still no mechanism for change notification, which in our particular case is the only missing feature that prevents using it as a postgresql replacement.
Does anybody know if this feature is planned in the short or medium term ?
This feature is planned, but I cannot give you a concrete timeline. We want to do this right, and we need other parts in place to do this with high performance, in a transactionally consistent fashion, in the face of high contention, and for arbitrarily complicated "views".
I will say that this is the single feature that I personally am most invested in at the company, so it will happen.
Name doesn't bother me. It's memorable and I'd definitely consider using it, whether in a startup or enterprise. Better than "Postgres" -- how do you even pronounce that?
Pardon the nature of my question, but I'm really interested in what your experience has been so far building a database with Go. Has its runtime (the GC, for example) posed any issues for you so far? Looking at other RDBMSes, languages with manual memory management like C or C++ seem to be the go-to choice, so what were the reasons you chose Go?
I'm quite frankly amazed that Go's runtime is able to support a database with such demanding capabilities as CockroachDB!
More technically, here's a somewhat random set of thoughts on the subject:
The Go GC is performant and predictable, unlike the JVM GC. We do have some very memory-allocation-conscious code patterns to minimize the performance impact of working in a garbage-collected language runtime, but in the end it's not as bad as you might expect if your expectations are coming from the JVM world.
Library support is good. To quote our CEO, "Most of us on the team have done extensive work with C++ and Java in the past. At Google, C++ was the standard for building infrastructure and there are a lot of good reasons for that. It's fast and predictable. It would be a good choice for Cockroach, except that in the world outside of Google, in open source land, the supporting libraries for C++ are either terrible, incredibly heavyweight, or non-existent. We didn't want to rebuild everything which you take for granted at Google from scratch. It turns out that Go has many of the necessary libraries, and they're straightforward and very well written."
Basically, if Google's internal C++ libraries, tooling, style guides (and the tooling to enforce them) were available externally, we might have gone with C++.
Some of us are fans of Rust, but Rust sadly did not exist in a stable state when CockroachDB started. I'm not sure we would pick Rust were we to start today (tooling is still a concern there), but it would certainly be part of the discussion.
The native support for concurrency in Go is a huge plus. We use thousands of goroutines in CockroachDB, and that's been a huge blessing.
I can answer any more specific questions if you have them.
Thank you for the reply! You and the presentation video Ben posted covered pretty much all of my questions, and I'm going to keep an eye on the issue tracker regarding performance to see what interesting things you might run into and how you deal with them in Go!
This is more my personal opinion, and perhaps more revealing my ignorance on the existing equivalent tools in the Rust ecosystem, but here is a list of some of the Go tools we use when developing CockroachDB:
1. gofmt and goimports really help enforce a single uniform style. We don't really care what the style is, as long as it's consistent across our 30 engineers and 200k lines of code. We have hand-rolled more Cockroach-specific linters on top of this as well, but we could do that for Rust too.
2. go tool pprof is a great profiler. Being able to quickly dig into allocations, cpu usage, etc. is great, and we do so regularly. As a result, the overhead of the GC is minimized, since we can rapidly identify and mitigate the allocation overhead with the application of a few known patterns.
Now I don't know what the state of the art of Rust profiling is, but if we were to litigate Rust vs Go starting CockroachDB from scratch today, we'd probably pay close attention to what the answer is here. The Xooglers on this team have a tonne of C++ experience, were very happy with C++ profiling tools, and thought the Go profiler matched up to the best tools they had used previously. If there is a Rust equivalent, this isn't a problem.
3. Consistency of code (in both style, but also patterns used) across third party libraries is a concern. The existence of a single toolchain that enforces a single style in Go really helps keep the whole ecosystem healthy here. Even if tools exist for Rust, if they aren't universally used, that is not as powerful.
I honestly think that Rust would probably be a close contender if we litigated this question today. The TiDB folks use Rust for their KV side, but Go for their query engine, which is an interesting mix. If faced with this decision today, I personally would push for Rust; I'm not a fan of the Go type system's various limitations, which we are running into particularly as we write a more sophisticated query optimizer that has to do more classical programming languages reasoning. But I am one of the most junior engineers on the CockroachDB team, so I'm not sure I would prevail in this fight! :)
Thank you so much for the thorough answer! This is an area we're always working on, so it's helpful to know. Since you're not actively looking, I won't go into all the details, but if you ever are in the future, I'm happy to give you a rundown of the state of the art, whenever that is :)
Why do you think the Go GC is better than any of the JVM options? From what I've seen, while the Go GC is well tuned for low latency, by picking the right JVM GC parameters you can on balance get a better throughput/latency tradeoff. I'm just wondering if you have any reliable benchmarks or evidence to support what you're saying? I don't use either language for work, so I think you might have better information than I do.
I talk about this in the presentation I linked in another subthread (https://www.cockroachlabs.com/community/tech-talks/challenge...). The key to getting good performance out of any GC is to generate as little garbage as possible, and in our experience Go's better use of stack allocation and value types keeps many objects out of the garbage-collected heap. We've found that idiomatic Go programs tend to produce less garbage than similar Java programs, and in the presentation I discuss some tricks we use to get that even lower in critical paths. Admittedly, we're not JVM tuning wizards, so maybe there's more that could have been done on the JVM side.
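The value-types point can be demonstrated directly with `testing.AllocsPerRun`, which works outside test files too. This is a generic Go sketch (not code from CockroachDB): a slice of structs is one heap allocation, while a slice of pointers adds one tracked heap object per element.

```go
package main

import (
	"fmt"
	"testing"
)

type point struct{ x, y float64 }

// Package-level sinks force the results to escape to the heap,
// so the allocation counts below are meaningful.
var (
	sinkVals []point
	sinkPtrs []*point
)

// buildValues allocates a single backing array for 1000 points;
// the GC sees one object regardless of the element count.
func buildValues() {
	sinkVals = make([]point, 1000)
}

// buildPointers allocates the slice plus 1000 individual heap
// objects, each of which the GC must track and scan separately.
func buildPointers() {
	ps := make([]*point, 1000)
	for i := range ps {
		ps[i] = &point{float64(i), float64(i)}
	}
	sinkPtrs = ps
}

func main() {
	valAllocs := testing.AllocsPerRun(100, buildValues)
	ptrAllocs := testing.AllocsPerRun(100, buildPointers)
	fmt.Printf("value slice: %.0f allocs; pointer slice: %.0f allocs\n",
		valAllocs, ptrAllocs)
}
```

The same data laid out as values rather than pointers gives the collector orders of magnitude less work, which is the gist of the allocation-conscious patterns mentioned above.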
We write our own GPU algorithms and a Java Native Interface transpiler (e.g., we generate JNI bindings), as well as our own memory management.
We've found the JVM to be more than suitable. Granted - we wrote our own tooling and had reasons we can't move (those customers are a neat thing most people don't think about :D)
I understand why you guys went with Go, though. Congrats on pushing the limits of the runtime.
As I understand it, Java needs a complicated GC implementation because, by design, it produces a huge number of heap allocations -- lots of very short-lived little objects.
Much of Java's GC focus has been on correctly partitioning the heap so that long-lived objects can be less aggressively collected than short-lived ones. (An example of a challenging long-lived object is the entire set of classes used by a program, all of which need to be available to the runtime for reflection. For many bigger apps, the class hierarchy alone takes up many megabytes of RAM!)
Go can make use of the stack to a much larger degree (structs and arrays can be passed by value), and so it can get by with a much less advanced GC. As a result, Go team's main focus has been on reducing pause times more than anything else.
Overall we've been happy with the choice. The GC is sometimes a performance issue, but it's manageable (and Go gives you better tools to limit the cost of GC than many other garbage-collected languages)
We have started parallelizing our tests with the new subtest feature: leaktest in the top-level test, t.Parallel in the subtests. This means we only check for leaks in between batches of parallel subtests. This works OK for us for now since our slowest "test" is really a huge data-driven test suite, and that's the only place we're currently parallelizing, although it would be better if we could parallelize more of our tests.
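The shape of that pattern, with a hypothetical `checkLeaks` helper standing in for the real leaktest API, looks roughly like this: snapshot the goroutine count before the batch, run the parallel work, and only verify the count after the whole batch has finished.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"
)

// checkLeaks is a minimal stand-in for a leaktest-style helper (not the
// real leaktest API): it snapshots the goroutine count and returns a
// function that verifies the count returned to that baseline.
func checkLeaks() func() error {
	before := runtime.NumGoroutine()
	return func() error {
		// Give exiting goroutines a moment to be reaped.
		for i := 0; i < 50; i++ {
			if runtime.NumGoroutine() <= before {
				return nil
			}
			time.Sleep(10 * time.Millisecond)
		}
		return fmt.Errorf("leaked goroutines: %d -> %d",
			before, runtime.NumGoroutine())
	}
}

func main() {
	done := checkLeaks() // leak check wraps the whole batch
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) { // stands in for a parallel subtest
			defer wg.Done()
			_ = i * i
		}(i)
	}
	wg.Wait() // leaks are only checked once the parallel batch finishes
	if err := done(); err != nil {
		panic(err)
	}
	fmt.Println("no goroutine leaks")
}
```

The tradeoff described above falls out of this structure: a goroutine leaked by any one subtest is only detected at the batch boundary, not attributed to the specific subtest.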
How does Cockroach efficiently handle the shuffle step when data is on many nodes on the cluster and has to move to be joined? Does Cockroach need high capacity network links to function well?
I always see companies making the claim of linear speedup with more nodes, but surely that can't be the case if the nodes are geographically separated over anything less than gigabit links? Perhaps linear speedup with more nodes is only possible over high-speed connections? How high is that, exactly?
Congratulations to the team on the release! Introducing this kind of database is no easy task - thank you and great job, keep up the good work!
The short story is we do need high capacity network links to function well. By "high capacity" I mean at least double digit megabit links between your datacenters.
A query that inherently requires shuffling because the data is geographically distributed can't get past the bandwidth needs of performing the shuffle. At the very least, with the literal simplest query plan, you're going to need all the raw data to be transported to a single node/datacenter, and I doubt there's a query and network setup where that's more efficient than doing networked shuffles themselves.
I don't think you need gigabit networks, but you're certainly going to want at least 10 megabit links. We have not tried to benchmark scenarios where we are bandwidth constrained, so I can't tell you precisely what the minimums are. All the cloud scenarios we've tested (on GCE, Azure, AWS, DigitalOcean) are constrained on other dimensions (i.e. CPU cores, memory, disk IO).
That makes sense - I think part of the reason such databases are well suited to cloud operations is the guaranteed throughput of the cloud providers' own network backbones, which is almost impossible for any single "regular" organization to match, at least for the price. I think we are at a point where doing business without the cloud will become nearly (but not completely) impossible at huge scale with all these features.
Thank you very much for your detailed answer and good luck with the continued rollout!
15 years ago I was working on a similar distributed DB product. At the time, the idea was to send the query execution plan to each node to execute any filtering criteria and trim down the candidate row set. Then compute a Bloom filter on the join keys on the node with the largest candidate set (chosen using some heuristic statistics), and ship the Bloom filter to the other nodes with smaller data sets to greatly reduce the non-matching rows. The rows that survive the Bloom filter are highly likely to be joinable and are shipped back to the main joining node to perform the final join. A Bloom filter is the perfect compromise between size and speed.
I'd imagine CockroachDB is doing something similar for distributed join.
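Here's a minimal Go sketch of that idea, with a toy Bloom filter and hypothetical key sets (not how any particular database implements it): the large side builds a compact filter over its join keys, and the small side uses it to drop rows that can't possibly match before shipping anything across the network.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// bloom is a tiny Bloom filter; a real system would tune the bit count
// and number of hash functions to the expected key cardinality.
type bloom struct{ bits []uint64 }

func newBloom(m int) *bloom { return &bloom{bits: make([]uint64, (m+63)/64)} }

func (b *bloom) hashes(key string) [2]uint64 {
	h := fnv.New64a()
	h.Write([]byte(key))
	h1 := h.Sum64()
	h2 := h1>>33 | h1<<31 // cheap second hash derived from the first
	return [2]uint64{h1, h2}
}

func (b *bloom) add(key string) {
	n := uint64(len(b.bits) * 64)
	for _, h := range b.hashes(key) {
		b.bits[(h%n)/64] |= 1 << ((h % n) % 64)
	}
}

func (b *bloom) mayContain(key string) bool {
	n := uint64(len(b.bits) * 64)
	for _, h := range b.hashes(key) {
		if b.bits[(h%n)/64]&(1<<((h%n)%64)) == 0 {
			return false
		}
	}
	return true
}

func main() {
	// Node A holds the larger candidate set; build a filter over its keys.
	large := []string{"k1", "k2", "k3", "k4"}
	f := newBloom(1024)
	for _, k := range large {
		f.add(k)
	}
	// Ship the compact filter to node B, which prunes rows locally.
	small := []string{"k2", "k9", "k4", "k7"}
	var survivors []string
	for _, k := range small {
		if f.mayContain(k) { // false positives possible, false negatives not
			survivors = append(survivors, k)
		}
	}
	// Only the survivors are shipped back for the exact join.
	fmt.Println(survivors)
}
```

The filter is a few hundred bytes regardless of row count, which is why it's such a good fit for trimming network traffic before a distributed join.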
Haven't come across this idea before; interesting. Will definitely have to give it some more thought. Our 'distributed joins', so to speak, run through our distributed query execution model (DistSQL), setting up incremental 'stages' of computation with the results pipelined and plumbed through individual computations. Viewed through this model, our implementation more closely resembles the Grace Hash Join[1] algorithm. You might be interested in the PR[2] that landed this changeset; there's a cool visualization in one of the comments[3] showing the query execution plan.
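For contrast with the Bloom-filter approach, here's a toy single-process sketch of the Grace hash join idea (hypothetical data, not the DistSQL implementation): partition both inputs by a hash of the join key, then join each bucket pair independently. In a distributed setting, each bucket pair can be handled by a different node.

```go
package main

import "fmt"

// partition hashes each row's key into one of `buckets` partitions.
// Rows with equal keys always land in the same partition, so each
// bucket pair can be joined without looking at any other bucket.
func partition(rows []string, buckets int) [][]string {
	out := make([][]string, buckets)
	for _, r := range rows {
		h := 0
		for _, c := range r {
			h = h*31 + int(c)
		}
		out[h%buckets] = append(out[h%buckets], r)
	}
	return out
}

func main() {
	left := []string{"a", "b", "c", "d"}
	right := []string{"b", "d", "e"}
	const buckets = 3
	lp, rp := partition(left, buckets), partition(right, buckets)
	var matches []string
	for i := 0; i < buckets; i++ { // each bucket pair joins independently
		seen := map[string]bool{}
		for _, r := range lp[i] {
			seen[r] = true
		}
		for _, r := range rp[i] {
			if seen[r] {
				matches = append(matches, r)
			}
		}
	}
	fmt.Println(matches)
}
```

As the reply below notes, the cost of this scheme is that the full key set crosses the network during partitioning, whereas the Bloom-filter approach ships only a compact summary.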
The Grace Hash Join approach ships the entire join key set across the network. Even if each node just gets one partition of it, the aggregate network traffic is the entire set. For a small table, that's fine. A large table is going to really tax the network.
I've heard of pushdown techniques including function, predicate and aggregate pushdown in distributed relational engines before.
Another interesting idea I read about (I can't find it anywhere online) was called "join zippering". Basically you first request the cluster to solve a join by querying and streaming the key columns from a join predicate back into the cluster itself to identify which nodes have matches and then streaming the results from each node in parallel, and doing the join in the stream.
I agree! We have some semblance of pushdown filtering across aggregations, and some other interesting techniques are documented in the RFC[1] that first proposed the distributed execution model.
CockroachDB looks like a great alternative to PostgreSQL, congrats to the team for doing so much in such a short time. The wire protocol is compatible with Postgres, which allows re-using battle-tested Postgres clients. However it's a non-starter for my use case since it lacks array columns, which Postgres supports [0]. I also make use of fairly recent SQL features introduced in Postgres 9.4, but I'm not sure if there are major issues with compatibility.
I'm basically here to ask a similar question: whether this is aimed as a modern alternative to PostgreSQL, since they don't clearly state this in the OP news announcement.
To me, at least for now, it seems more like a SQL-enabled etcd or similar. They aren't currently claiming performance numbers that make it sound suitable for general-purpose relational database scenarios. A SQL-aware, etcd-like thing has a lot of appeal, though, and I assume the performance work is coming.
I'm an engineer on the SQL team at CockroachDB. We're very aware of our missing support for array column types - and in fact beginning to add support for arrays is one of my team's priorities for the next release cycle.
What kind of other recent SQL features introduced in Postgres 9.4 do you use? Postgres has a ton of features, as I'm sure you're aware, and while we strive for wire compatibility with Postgres it's not a goal of ours to implement support for every Postgres feature out there.
I double-checked my codebase and it looks like it's just JSONB, which CockroachDB also doesn't support [0]. Sorry to bother you about missing features, but there really are some things that prevent a smooth transition from Postgres.
Yep, JSONB is on our roadmap as well, although it won't come before array column type support. Thanks for the feedback - I'd personally love to see migrations from PostgreSQL to CockroachDB become seamless for more complex use cases as we continue development.
It occurred to me to migrate Odoo ERP to CockroachDB, scaling up the DB is one of our biggest challenges with some of our clients.
However Odoo leans heavily on Postgres, migration would be a lot of work I imagine. The first snag I've hit with CockroachDB is the lack of 'CREATE SEQUENCE'.
Plus, Odoo uses REPEATABLE READ + a hand-rolled system of locks for consistency, I'm not sure how that would play out with CockroachDB. In my experience some of the performance issues come more from long lived locks in the app than from sheer DB performance.
JSON/JSONB will come after array support. As far as I know, we don't have any concrete plans at this time to support listen/notify or spatial datatypes.
Very disappointed with HN turning into a 4chan/reddit-style trolling board about the name. Guys, we get it that you don't like the name. Can we please stop bikeshedding and move on? The people at CockroachDB have obviously seen all your messages but decided it's worth keeping the name. What more is there to talk about? Why not talk about the relative technical merits of this DB?
It's not bikeshedding when the bikeshed's color will actually have concrete effects on adoption. Most people -- i.e. in procurement, management, finance, and others you need to appeal to -- don't want anything to do with cockroaches. The idea disgusts them at a gut level, not something you can talk away.
HN users are giving vital advice, for free. Those who ignore it will have only themselves to blame.
As I say every time this comes up, would you be so dismissive about critics of naming a product PubesDB? Or GonorrheaDB? Or [n-word]DB? Then you agree that disgust-invoking connotations of the name matter, and we're just haggling over the details.
Ubuntu, Mongo, Swagger (edit: Hadoop also) ... they're weird, sure, but they don't evoke the visceral feeling of disgust that cockroaches do.
> procurement, management, finance, and others you need to appeal to
They don't need to appeal to any of these suits. Just the technical decision-makers, whose express job it is to choose solutions on their technical merits, not their spurious emotional reactions.
The sad fact, though, is that many organizations do not have technical people at the decision-making level.
So these suits you speak of won't be able to get past the product name long enough to hear any technical merits of why this technology should ever be considered, due to dysfunctional leadership not even having a Chief Technology Officer or Chief Information Officer role at the senior leadership level. A lot of organizations outsource because they don't want to hire/pay for this in house. It also shifts responsibility away, giving the CEO, COO, CFO, etc. the ability to point fingers at an outside entity.
That's a double whammy: internal staff can't sell/justify it to management, and outside IT providers/contractors can't sell it either.
So while they may be surviving with the current name they have, that does not mean they wouldn't be crushing the market share with a different name. If they are getting negative comments about the product name, then that's a warning that they should do market research to find out how many people would avoid the product because of the name.
But what the hell do I know, I'm making yet another HN comment post.
"Deragatory" isn't the problem; the issue is whether it invokes visceral feelings of disgust. Many terms can be used as an insult, but are still tolerable as a name because a) they have non-insulting usages, and b) the emotional response does not rise to the level of "visceral disgust".
The Spanish Wikipedia suggests many usages of the term "mongo", which probably wouldn't persist if the term was so repulsive: https://es.wikipedia.org/wiki/Mongo
I think SilasX is not suggesting that it isn't a insult, but rather that a word being an insult is not what really matters as far as naming is concerned. What matters, according to him, is whether the word automatically elicits a strong negative emotional reaction. That a lot of words that elicit such a reaction are used as insults is mostly incidental to the argument.
If calling your product RetardedDB or MentallyChallengedDB in a professional setting is OK by his standards because it's not spelled in English, then I'm OK with that. That's what mongo means in Spanish, btw.
Okay, there are separate issues going on here; let me try to clarify:
Is "mongo" the equivalent of English "retard", in terms of being a low-class insult that invokes a visceral reaction among the majority of the population?
I didn't believe that at first; if so, why didn't anyone ever put it in Wikipedia? English has "retard" (in the pejorative sense):
If it's merely an insult with numerous other meanings, I don't think it's comparable.
But let's assume it is equivalent to "retard". In that case, I would agree that it shouldn't be used as a name. But you have to pick your battles: all words will have that trait in some language. For my part, I would consider the Spanish-speaking market big enough not to expect them to buy [the equivalent of] RetardDB. So I agree there.
Edit: I agree with the sibling commenter networked's points.
And that's my point. You're only offended because it's in English (and that's fair). But no matter what name you use, it will offend someone. CucarachaDB will fly under the radar.
Maybe you have to be culturally immersed to know those things. Mongo, mongol and mongólico are the terms you should research.
First of all, Wikipedia isn't infallible. Second, genkaos and I grew up on different sides of the world. Culturally different, and yet, in our own respective cultures we learned, however wrong it is, the implicit meaning of mongo when used derogatorily. It doesn't have several meanings as you pointed out; it has one, which does not mean it is the same for genkaos. As you pointed out in the Wikipedia link, it refers to a certain type of people. When used as an insult towards a person, whether that person is white, Hispanic, black, whatever, it is implied that that person is that sort of person and a retard. Thus my comment about it being racist as well.
Now, even if I had associated MongoDB with that explanation, and now that I do remember its inherent meaning in a certain context, I take no offense at it, since the people behind MongoDB didn't have that intent. Obviously this is an assumption on my part.
Let us not get derailed from the main point, which is the 'visceral' feeling that CockroachDB elicits in so many people, as you mentioned in several comments. It is true; it happens to me as well. But not from the word itself, only when I'm around one. Those feelings of fear when around one are irrational. I don't remember the explanation for why it's irrational; I've never worked in the field of psychology.
My mother tongue is English, so I never made that association. But I've lived all my life in a Spanish speaking country. Mongo is usually used in the context to mean "retard".
Edit: Now that my memory kicked in, it's racist as well.
> It's not bikeshedding when the bikeshed's color will actually have concrete effects on adoption.
Not taking a stance either way on the name, but that is the definition of bike-shedding (aka law of triviality). A committee won't vote for my nuclear plant because the bike shed is red. The bike shed's color has concrete effects on adoption.
EDIT: I would just like to acknowledge the irony of bike-shedding bike-shedding.
Alright, if you really want to unpack the metaphor:
The bikeshed story is to illustrate overemphasis on something that is trivial. It uses the example of a bikeshed color and a committee wanting to spend a lot of time on it because a) they care a little about it, and b) they understand it well enough for hard-headed members to wade into the dispute rather than trust experts.
It's a failure mode -- by stipulation -- because the bikeshed color doesn't matter beyond minor (but real) aesthetic feelings among the committee, which are far outweighed by the cost of high-level personnel devoting time to it. Had they been aware of the general dynamic of these things, they could entirely prevent the loss by moving on; it's purely an internal matter.
The bikeshed model ceases to demonstrate a failure mode if and when the bikeshed color has impacts far beyond things under the control of the committee. For example, if the majority of the world's people had a near-religious devotion to destroying facilities that house a blue bikeshed, and that fanaticism was hard to defend against, this would be a valid reason not to make the bikeshed blue, and would warrant the committee's attention.
I summarize such situations as "that's not bikeshedding", though of course, to be more technically correct, I should say "that situation does not illustrate the avoidable failure mode in the parable of the bikeshed".
Similarly, if adoption matters for more than just that committee -- if they need to convince numerous other committees to adopt the design -- it's likewise "not bikeshedding" because the first committee doesn't have control over all the other ones; with respect to the first, it's an external matter, and they can't stem the loss just by saying "hey, this is trivial".
Now, you are correct that, at a high enough level, this could work as a bikeshedding example: if you could simultaneously get the entire world to collectively agree on the non-importance of aesthetics in technical matters, and on what counts as technical vs aesthetic, then the world could play the role of that first committee, say "wow, this is trivial", and be done.
But if that were actually feasible, then that should be your product (producing universal agreement on matters where you have a logical proof-of-correctness), not a database!
> ...but that is the definition of bike-shedding (aka law of triviality)
> A committee won't vote for my nuclear plant because the bike shed is red.
> The bike shed's color has concrete effects on adoption.
Not exactly.
> Parkinson observed that a committee whose job is to approve plans for a
> nuclear power plant may spend the majority of its time on relatively
> unimportant but easy-to-grasp issues, such as what materials to use for
> the staff bikeshed, while neglecting the design of the power plant itself,
> which is far more important but also far more difficult to criticize constructively.
> -- https://en.wiktionary.org/wiki/bikeshedding
This part is key here:
> A reactor is so vastly expensive and complicated that an average person cannot
> understand it, so one assumes that those who work on it understand it. On the
> other hand, everyone can visualize a cheap, simple bicycle shed, so planning
> one can result in endless discussions because *everyone involved wants to add a
> touch and show personal contribution*.
> -- https://en.wikipedia.org/wiki/Law_of_triviality
> -- https://books.google.com/books?id=RsMNiobZojIC&pg=PA317
I need some additional hand-holding here if you don't mind, I don't see the difference.
If I were to rephrase those two excerpts:
> Parkinson observed that a committee whose job is to approve plans for a
> [globally distributed relational database] may spend the majority of its time on relatively
> unimportant but easy-to-grasp issues, such as what [the name is],
> while neglecting the design of the [globally distributed relational database] itself,
> which is far more important but also far more difficult to criticize constructively.
> A [globally distributed relational database] is so vastly expensive and complicated that an average person cannot
> understand it, so one assumes that those who work on it understand it. On the
> other hand, everyone can [read a name], so planning
> one can result in endless discussions because *everyone involved wants to add a
> touch and show personal contribution*.
It both provokes a negative reaction and is memorable. It is not clear which wins, and it isn't your job to decide. Yes, you have an opinion, but you may not be right.
I remember in the mid-2000s thinking that a particular politician couldn't possibly succeed with a Muslim sounding name. Turns out that a lot of people thought that. Yet Barack Hussein Obama managed to become President.
Your opinion has definitely been registered. Continuing to state it has no value.
It's not your company. You're (probably) not an equity holder. Have you personally been harmed by the name because your company wouldn't let you adopt it in spite of its technical merits? Are you worried it won't succeed because of the name and thus are fighting on the company's behalf for its survival?
> Most people ... and others you need to appeal to ... don't want anything to do with cockroaches
You're making so much of this up out of thin air.
> giving vital advice, for free
> As I say every time this comes up
As the parent said, the staff have already seen these messages. They have decided to keep the name. Advice is helpful, but once the decision is made, it's not. Let it go.
On the other hand, if I heard of a database called "CockroachDB" gaining ever-greater adoption, I'd pay close attention to it because it was clearly succeeding despite a marketing handicap.
It so far appears not to be hurting them. In the slightest.
This "warning" comes from the HN crowd every time something is posted about CockroachDB. I think it's time to LET IT GO.
I, for one, completely disagree with you, but that's because I have a different understanding of the relationship between the business side and engineering. We are already looked at as eccentric and strange people, and rarely if ever has an absurd technology name caused an issue.
Someone talking about "cockroach" is equivalent to talking about "unicorns" or "git." It's considerably less offensive than talk of "masters" and "slaves." If you think this is such a problem, then work on your salesmanship, as I wouldn't hesitate to talk to other departments or investors about this product.
I was a CTO up until I took medical leave this past October and I cannot stress how important salesmanship is to the role. I think your examples of other databases are hyperbole and not the point. You want them to be equivalent but they aren't. This comes down to what you can sell in your organization and if there is merit to it, then selling it should not be a problem.
One last point is other departments don't give a shit what the database technology is called unless it's something to put on their CV. Just call it the "database" as they most certainly will.
> It so far appears not to be hurting them. In the slightest.
I feel like that is tough to judge, because the public has only known them by one name as far as I know. If they switched to this name from another name and saw no difference, then we could surmise that the name has had no effect.
The end goal of a company is not to raise venture funding. So you cannot use "they raised capital" as proof that their name isn't a problem. Their name absolutely will hurt their adoption. Maybe the product is good enough that they'll still be successful, but if so, you would expect them to be even more successful if they didn't have such an off-putting name.
Did I say it was the end goal? It's merely a metric for a young company. What it means is that enough people have decided that there is a future, and that current revenue, growth, and expectations are being met or are substantial. Raising $53 million isn't easy. So I can say capital raised is a metric on which to base a judgement.
Your statement that it "absolutely will hurt adoption" is unqualified and nothing but opinion. And what exactly is "more successful?"
The handful of people who won't try this because of the name won't matter to their bottom line. If it's good enough then for even a large majority of those they'll end up using it anyway.
You're missing the point. Saying "they raised capital" is not a good counterargument to "it will hurt adoption". Your response would be a good counter to "they will never raise capital" or "no one will use this".
You can't know how many VCs didn't fund due to the name or how many tech decision-makers at companies will pass on this product due to the name. That being said, I doubt it will be/was significant in any case.
Pretty much any reasonable definition will do. For example, higher adoption is one metric that can be used to define success.
> Your statement that it "absolutely will hurt adoption" is unqualified and nothing but opinion.
It's an opinion that a lot of people share, judging from the HN threads I've seen about CockroachDB. And really, I shouldn't need to defend the idea that having a name that disgusts people will hurt adoption. It's just common sense. The only real question is how much damage the name will do. The better the product is, the more people will forgive things like bad names, but there will definitely be at least some level of damage.
In addition, if there are multiple products in the same category that are fairly close in quality, then subjective things like names will matter more. Maybe CockroachDB is significantly better than the alternatives right now (I really have no idea; this product category isn't something I know anything about), but if so, surely it won't remain "significantly better" forever. Other products will catch up, or new products will be created to compete, and we'll end up with several products that are similar, and once again naming will become more important.
And finally, you're completely ignoring the fact that a lot of decisions about tech stack aren't actually made by technical people. They're frequently made by managers rather than engineers. And when the decision is made by non-technical people, marketing (e.g. the name) is very important. Heck, even when the decision is made by engineers, marketing is important, because that's how you convince the engineers to spend the time investigating the product to see if it lives up to its claims or does what they need.
Speaking as an engineer, if tomorrow I suddenly have the need for a cloud-native NewSQL database, I'm probably not even going to look at CockroachDB, simply based on the name, unless someone else convinces me that it's clearly superior. I find the name very off-putting and I'd rather not be confronted with the mental imagery of cockroaches any time I use the product.
It will never be let go, because each new person is a new interaction with the system that prompts the same point again.
It's like those '*porn' subreddits. You can explain and explain till you're blue in the face why the subs are so named, but there will always be some sniggering discussion when they are introduced to new users, no matter how much you try to silence or control for it, because it's based on a natural response.
Capitalize all you like, but that's just how people work. :)
When people are bike shedding, they aren't doing it to waste time, they think that they are adding value because of `list of bad results of wrong colour here`.
So here's my question to you: could you be wrong about this having "concrete effects on adoption"? And if you are wrong, is this just bike shedding?
And to continue the bike shed metaphor, it's about people ignoring nuclear power plant design whose worst case scenario is nuclear meltdown. For CockroachDB 1.0, what's the equivalent, data loss? So are you discussing something technically trivial (colour is easy to understand) over the design (technically complex) that would prevent data loss? If the answer is yes, aren't you bike shedding like a champion?
tl;dr Bike shedders don't know they're bike shedding and think the discussion is very important.
With respect to your specific point: if we could resolve how much it matters, then yes, that would obviate the debate. But the bikeshedding metaphor doesn't add much there, because precisely what's in dispute is how much it matters.
Well, now I feel a bit like an asshole for how abrasive my response was; sorry. Kudos to you for not escalating.
I agree that it resolves to how much it matters, and I guess I disagree with you on how much it matters. How it relates to the bike shedding metaphor is starting to feel like a semantic argument, which is not something I want to continue.
In response to your escalated names like PubesDB... my opinion is that I agree I wouldn't work with them, not because of any internal disgust reaction, but because the name signals a level of maturity that I don't want in my stack. Some people might have the same reaction to Cockroaches.
>Well, now I feel a bit like an asshole for how abrasive my response was; sorry. Kudos to you for not escalating.
I didn't feel it was abrasive at all.
For my part, I'm just upset that I went to such great lengths (in the comment I linked) to unpack where the bikeshed metaphor does or doesn't apply, disentangling the various issues and merging them into a general understanding, right where that comment was needed, and yet that's the one that no one is responding to... (what's worse, it was downvoted less than a minute after I posted it).
>In response to your escalated names like PubesDB... my opinion is that I agree I wouldn't work with them, not because of any internal disgust reaction, but because the name signals a level of maturity that I don't want in my stack. Some people might have the same reaction to Cockroaches.
Right, like I said, "we're haggling over the details"; it should be regarded as a question of which names are so disgusting to be out of the question, yet people are dismissing the entire naming issue as "lol emotional primates".
It's not trolling. It's a legitimate warning, and they can choose to ignore the chorus at their own peril. The warnings get louder as they get more resistant to changing their name. Keeping the name for whatever reason IS going to cost them enterprise customers.
There's a difference between "legitimate" and "useful". If the top comment on every CockroachDB post was "hey y'all remember that Go's maps aren't thread-safe", that would certainly be a legitimate warning. But at the same time, the CockroachDB team have been coding in Go for years, and they obviously already know that. If those top comments frequently turn into big threads arguing about whether Go's maps should have been thread-safe, the whole thing goes from being questionably useful to seriously annoying. Same thing's happening with the name. They know.
So I do product marketing for a living, and have launched a whole whack of things with good and bad and boring names. The fact that CockroachDB is consistently on top of HN with each thing they do is pretty strong evidence that they're doing just fine with the name they have, and probably doing even better than if they had a milquetoast tech startup-esque name.
Also, they know what their sales cycles look like. They hear feedback from actual customers. They have people whose job it is to notice any advantage they could have along the way. And yet! They're still selling stuff, they're at 1.0, and they're still alive — with the name they have.
If the name were Milquetoast it would be awesome, because that was the cockroach from Bloom County. Or, was that the joke and I'm only just getting it late? ;)
I haven't really seen that many comments about the name, though?
So now the top thread is about how terrible HN is for bikeshedding instead of talking about the actual topic... except this top thread is also not talking about the actual topic. Worth considering, imo.
What's even wrong with the name anyway? It's certainly a lot better than the ridiculous ones like "PostgreSQL" and "MongoDB" and "Redis" (what do these words even mean?).
Unfortunately this is a version of the thing it's trying to stop, as is plain from the below. These balls of mud are immune to negation; they laugh at it and grow stronger.
The blog and other non-docs pages use hugo (http://gohugo.io/) and the docs use jekyll, but will be ported to hugo soon. We use github pages for hosting with cloudflare in front (for https on a custom domain).
There was a great session with Spencer Kimball (CockroachDB creator) and Alex Polvi (CoreOS) at the OpenStack Summit. It's a good overview and demo: https://youtu.be/PIePIsskhrw
I think this is the DB project of the year in the open source community. Cockroach Labs has made an incredible effort to develop and test a new database, and these guys are giving it away for free (I read about the Series B raise too ;)) for us to use.
Thanks for doing this. You're very much appreciated.
(BTW I love the name and the logo!!)
I've been following CockroachDB for quite a while. Great job on 1.0.
I've had a question for quite some time though (and I think there is an RFC for it on GitHub): do we still need to have a "seed node" that is run without the --join parameter, or can we run all the nodes with the same command line, with the cluster waiting for quorum to reconcile on its own?
Currently, you need to run one node without --join for the initial bootstrapping (as soon as this bootstrapping is complete, you can and should restart it with --join to get everything into a homogeneous configuration). I was hoping to make some changes here so you could start every node with --join from the beginning, but it was trickier than anticipated, so it didn't make the cut for 1.0. Watch for improvements here in a future release.
That's okay, for now, I run a simple StatefulSet where each pod checks whether the Service is reachable on port 26257 to determine if it should join or init the cluster.
It's not as nice as if it was handled by Cockroach itself, but it does the job.
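For illustration, the reachability check that StatefulSet pattern describes could be sketched roughly like this (a minimal sketch of the probe-then-decide logic; the specific host address and fallback behavior are assumptions for the example, not CockroachDB tooling):

```python
import socket

def choose_mode(host: str, port: int, timeout: float = 2.0) -> str:
    """Return 'join' if something already answers on host:port (an existing
    cluster this pod should join), otherwise 'init' to bootstrap a new one."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "join"
    except OSError:  # connection refused, timeout, DNS failure, ...
        return "init"

# Each pod would run this probe against the cluster Service on port 26257
# (here a local address, for illustration) before picking its startup flags:
print(choose_mode("127.0.0.1", 26257))
```

The pod then execs `cockroach start` with or without `--join` depending on the result, which is exactly the asymmetry the upcoming release aims to remove.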
short answer: nope. cockroachdb replicates data for availability, and in order to guarantee consistency across the replicas, it uses Raft[1] internally. Raft necessitates that a majority of the replicas remain available in order to operate. it ensures that a new 'leader' for each group of replicas is elected if the former leader fails, so that transactions can continue and affected replicas can rejoin their group once they're back online.
raft is premised on overlapping majorities, so to speak. in order to tolerate up to `n` node failures you'd need to run `2n + 1` instances (for nine nodes you'd tolerate up to four node failures).
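To make the arithmetic in that last sentence concrete, here is the standard Raft majority rule in a few lines (nothing CockroachDB-specific, just the quorum math):

```python
def tolerable_failures(replicas: int) -> int:
    # A group of 2n+1 replicas survives n failures: any majority of the
    # remaining nodes still overlaps every previously committed majority,
    # so no committed write can be lost or contradicted.
    return (replicas - 1) // 2

for r in (3, 5, 9):
    print(f"{r} replicas tolerate {tolerable_failures(r)} failure(s)")
```

So three replicas tolerate one failure, five tolerate two, and nine tolerate four, matching the parent comment's example.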
In an era where hot air and hip DB technologies prevail, I'd like to emphasize the fact that the CockroachDB engineers are consistently honest and down to earth, in all relevant HN posts.
This builds up my confidence in their tech, so much so that even though I had no real reason to try this new DB, I'm gonna find one! :D
Exactly! The confidence that the devs inspire by taking the time to explain the choices behind the tech, makes me want to find a project to test it out on.
Does the replication work cross-region, say US-East and US-West? or even cross continent? It sounds like the timing requires very short latency and might not work in these scenarios
The Jepsen test results basically show that latency caused by replica distance won't screw your data. On the other hand, clock drift can stop your system, or even potentially corrupt your data, depending on how fast such an incident can be detected/handled and on your workload/what you are doing.
Yes, it works. Your latency will just be correspondingly higher (due to the speed of light). We are constantly testing a cross-region (i.e. US-East and US-West) cluster and have periodically run tests on cross-continent clusters (US to Asia-Pacific).
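As a rough sense of "correspondingly higher": signals in fiber travel at about two-thirds the speed of light, so a back-of-envelope lower bound on round-trip time can be computed as below (the distances are loose assumptions, and real fiber routes are longer than great-circle paths, so actual latency is higher still):

```python
C_FIBER_KM_PER_S = 200_000  # ~2/3 of c, typical propagation speed in fiber

def min_rtt_ms(distance_km: float) -> float:
    """Theoretical lower bound on round-trip time over fiber, in milliseconds."""
    return 2 * distance_km / C_FIBER_KM_PER_S * 1000

print(f"US-East <-> US-West  (~4,000 km): {min_rtt_ms(4_000):.0f} ms")
print(f"US <-> Asia-Pacific (~10,000 km): {min_rtt_ms(10_000):.0f} ms")
```

Since a consensus write needs at least one round trip to a quorum, these tens of milliseconds put a physical floor under cross-region commit latency.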
I'm struggling to understand how this company has raised $50 million when DB companies with paying customers like RethinkDB and FoundationDB had to shut down.
They are gonna earn back $50 million by selling... a backup tool?
I think one major difference is that it's a drop in replacement for certain SQL products, plus a major selling point of NoSQL - good horizontal scaling.
RethinkDB and FoundationDB are great, but require a paradigm shift I think.
Curious why Mac is better supported than Windows. This is obviously something you'd run on a server. Do orgs run Mac servers? Is it just to support dev work for people too lazy to launch a VM? Sorry, Windows/Linux ops person here with very little awareness of Mac ecosystem.
It's not so much a matter of Mac > Windows but rather Mac+Linux+*nix > Windows.
This just comes down to the fact that Windows is a special snowflake that does everything differently. Sometimes for good reasons, but usually not.
Very interesting. I have to admit I've seen the product name a few times, but never took the time to have a look. I do have a few questions, though, if any of the engineering team are still around watching the discussion :-)
From the high availability page [1] in the docs:
> Cross-continent and other high-latency scenarios will be better supported in the future.
Do you have a specific timeline in mind? I've been working on an application that needs to be highly-available, and which uses Oracle right now. It seems like you can add all sorts of tools to the mix (RAC, DataGuard, etc), but there are always significant caveats around the capabilities of the resultant system. We're talking 1 to 2 TB of data total, tables of up to 100 million rows with 1 million rows added per day, distributed across three data centers (US, EU, Asia).
And regarding high availability in the context of application deployments, is there any documentation on the locking characteristics of DDL statements? I'm interested in the ability to modify the schema during an application deployment without having to bring down the system or implicitly locking users out. Apologies if I missed it somewhere on the website!
I don't have a specific timeline but it is something we will be focusing on in the following releases.
Regarding DDL statements, this blog post [1] has details. In a nutshell, online schema changes are possible; the changes become visible to transactions atomically (a concurrent transaction either sees the old schema, or the fully functional new schema).
Say you scaled up to 100 nodes for the holiday season, is there any way to tell how many/much storage/nodes you have to keep running in order to keep 3 backups and maintain your new post holiday load?
We don't have any auto-scaling, for either scaling up or down, but if you're using a deployment tool such as Kubernetes, I don't see why it wouldn't be fairly easy. And it might be a good idea to add a message in the admin UI if all of your nodes are experiencing high load.
By just looking at your max load over the last 24h, or perhaps the last week, it would be pretty easy to see when to scale down.
That being said, as long as you remove the cockroach nodes one at a time, it's pretty easy to scale down a cockroach cluster.
Since CockroachDB has eventually consistent reads, how would that affect my SaaS multi-user application? How long on average would I have to wait for reads to become consistent?