
Do you have any tips on how to convince management they don't have big data? My company's largest database has been in production for 10 years and could fit into the RAM on my dev machine, yet I'm constantly pushing back against "big data" stacks. This post made me laugh because "big data" and Hadoop were mentioned in our standup meeting yesterday morning.



"If you have to ask if you have big data, you don't have big data. When you have big data, you know you have big data."

Seriously, you can launch a server that has almost a terabyte of RAM! Is your business doing enough to make that database even blink? "My sources say no".


Ugh. I had a boss years back who insisted on using "big data" to refer to our analytics and reporting work, which was nowhere near big data in terms of size (we had maybe a million rows across all the tables in our database). I tried fruitlessly for months to explain to him that anyone who really knows what "big data" means would immediately see through his bullshit.

Indeed. For more real-world examples of people who thought that they had "big data", but didn't, see https://news.ycombinator.com/item?id=6398650 ("Don't use Hadoop - your data isn't that big"). The linked essay has:

> They handed me a flash drive with all 600MB of their data on it (not a sample, everything). For reasons I can't understand, they were unhappy when my solution involved pandas.read_csv rather than Hadoop.

User w_t_payne commented:

> I have worked for at least 3 different employers that claimed to be using "Big Data". Only one of them was really telling the truth.

> All of them wanted to feel like they were doing something special.

I think that last line is critical to understanding why a CIO might feel this way.


Cute but "Big Data" is really just data that's not in the building and isn't feasible to just move around from one machine to another in your department.

It's slightly mystifying. The only company I've worked at that did "big data" _really_ well just plugged a few TB of RAM into some sharded databases and got on with life.

Usually when I tell that story, I get a lot of objections that the solution won't scale and that they must not have really had big data. Truth be told, those objections come from people who are used to working with data on a fraction of the scale that this company did.

That said, it's not a turnkey solution. This company also was more meticulous about data engineering than others, and that certainly had its own cost.


Yes, I saw this and got a little disillusioned at first, but after looking carefully, this is not big data: their entire dataset fits in RAM. The last resort only comes into play when your dataset can't fit in RAM. Sadly, I agree that most companies don't know when data is really big data. Most of the time it's just medium data. And I agree about the overhead costs.

At least 5 years ago, I was dealing with 50TB databases on hundreds of TB of physical storage. We didn't call it big data; it was just data. We didn't call ourselves data scientists; we were just database guys, DBAs and devs.

Nowadays a million rows is "big data" and only a "data scientist" can handle it. These guys are a joke.


I've worked on 50TB relational databases, and I don't consider myself to be a "big data" guy.

I think the problem is that people don't really appreciate how big your data has to be to be Big Data.

I don’t know why these ‘you don’t have big data’ type articles bother me so much, but they really do. I know it isn’t saying NO ONE has big data, but I feel defensive anyway. Some of us DO work for companies that need to process millions of log lines a second. I know the article is not for me, but it still feels like a dismissal of our real, actual, big data problem.

Thank you MS Research for a dose of sanity. "Big data" seems very potent as far as marketing buzzwords go. It plays on people's ignorance and the general sentiment of "too much information".

I'll be keeping this pdf in my "rebuttals to idiocy" folder.

There are some industries that certainly do have "big data" (Wikipedia has some definitions for "big data" that include size ranges, for whatever that's worth), but it does not seem like companies with "big data" are the only targets of "big data" marketing. And from what I know about available solutions, if I really had a "big data" problem (e.g., 100 terabytes, not 100 gigabytes) then I would not be choosing Hadoop. I also would not choose SQL or "NoSQL". But that's just me. Some of the best solutions I've found have nearly zero marketing. Go figure.


I think many people are confused about what "big data" means.

I work for an analytics consulting company, and many of our clients want us to use Hadoop with their data. They've heard that Hadoop is the standard for big data, and they associate "big data" with machine learning.

But the data they want us to put in Hadoop is usually small enough to work with in RAM on my laptop.
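For what it's worth, the "does it fit in RAM?" question can be turned into a rough back-of-the-envelope check. The sketch below is only an illustration: the function name `fits_in_ram`, the 5x in-memory overhead factor, and the 50% headroom are all assumptions of mine, not measured numbers, and the machine sizes are hypothetical.

```python
def fits_in_ram(raw_bytes, ram_bytes, overhead=5.0, headroom=0.5):
    """Rough check: does a dataset plausibly fit in memory?

    overhead: ASSUMED blow-up from on-disk size to in-memory size
              (parsed objects, indexes, intermediate copies).
    headroom: ASSUMED fraction of RAM we are willing to give the data.
    """
    return raw_bytes * overhead <= ram_bytes * headroom


MB = 1024 ** 2
GB = 1024 ** 3
TB = 1024 ** 4

laptop_ram = 16 * GB   # hypothetical laptop
server_ram = 1 * TB    # hypothetical large single server

small_ok = fits_in_ram(600 * MB, laptop_ram)   # a 600 MB dataset on the laptop
large_ok = fits_in_ram(2 * TB, server_ram)     # 2 TB raw on a 1 TB server

print(small_ok, large_ok)
```

With these assumed factors, the 600 MB dataset passes easily and the 2 TB one does not, which is about where a disk-backed tool or more machines starts to be a reasonable conversation.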


It's very hard because they see "big data" as something that important people and important companies do. When you say "We don't have big data!", it translates to "We aren't that important!" This, of course, makes everyone very angry.

Be mindful of people looking to introduce big data without justification. They are playing a game of some sort (maybe just personal resume value, or maybe a larger vie for power), and you are positioning yourself as their opponent when you try to stop the proposal they're pushing. Do not go into this naively.


Slap a TB of RAM on a single machine and call it a day. Not big data.

But the underlying problem is the same - companies that use Big Data tech are clueless about data management. You can use unicorns - it's not going to do anything. "Garbage in, garbage out" is a timeless principle.

> You probably just need better buzzwords (and ideally the background to back it up) -- NoSQL, big data, MongoDB, Hadoop, etc.

Are there as many data scientists who don't work on Big Data?


I remember the hype around BigData. I was in those meetings where vendors pitched their product. Our director would ask, "Do you do Big Data?" Any vendor who said no was immediately dismissed.

I still don't know what the answer to that question was supposed to be. We scraped coupons from our competitors then displayed them on our websites.


COMPLETELY agree. In companies that don't have data as a core competency, "big data" ends up being this business buzzword thrown about because their data is too big for their current set of tools... whether it's R or even Excel or what not.

As a math/stats guy who picked up more programming along the way, I personally think it's MUCH easier to teach a DB guy some business sense than it is for a business analyst to have Hadoop drilled into them. Of course, the downsides of a coder without sufficient savvy are harder to detect than a numbers guy who can't make his program work, and therein lies your problem.


This is so true. I do business intelligence at Amazon, and I've seen this play out millions of times over. The fetishization of big data ends up meaning that everybody thinks their problem needs big data. After 4 years in a role where I am expected to use big data clusters regularly, I've really only needed them twice. To be fair, in a complex environment with multiple data sources (databases, flat files, Excel docs, service logs), ETL can get really absurdly complicated. But that is still no excuse to introduce big data if your data isn't actually big.

I really hate pat-myself-on-the-back stories, but I'm really proud of this moment, so I'm gonna share. One time a principal engineer came to me with a data analysis request and told me that the data would be available to me soon, only to come to me an hour later with the bad news that the data was 2 terabytes and I'd probably have to spin up an EMR cluster. I borrowed a spinning disk USB drive, loaded all the data into a SQLite database, and had his analysis done before he could even set up a cluster with Spark. The proud moment comes when he tells his boss that we already had the analysis done despite his warning that it might take a few days because "big data". It was then that I got to tell him about this phenomenal new technology called SQLite and he set up a seminar where I got to teach big data engineers how to use it :)

P.S. If you do any of this sort of large dataset analysis in SQLite, upgrade to the latest version with every release, even if it means you have to `make; make install;`. Seemingly every new release since about 3.8.0 has given me usable new features and noticeable query optimizations that are relevant for data analysis with large queries.
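For anyone curious what the "load it into SQLite and query it" workflow might look like, here is a minimal sketch using only Python's standard library. It is not the commenter's actual code: the table name, the column names, and the tiny inline sample are made up for illustration, and a real run would point `sqlite3.connect` at a file on the big disk and read from the actual export instead of a string.

```python
import csv
import io
import sqlite3

# Hypothetical sample standing in for a large export; a real file would be
# opened with open(path, newline="") instead of io.StringIO.
SAMPLE_CSV = """service,status,latency_ms
checkout,200,120
checkout,500,900
search,200,45
search,200,55
"""


def load_csv(conn, table, handle, batch_size=10_000):
    """Stream rows from a CSV handle into a SQLite table in batches."""
    reader = csv.reader(handle)
    header = next(reader)
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
    insert = f'INSERT INTO "{table}" VALUES ({placeholders})'

    batch, total = [], 0
    for row in reader:
        batch.append(row)
        if len(batch) >= batch_size:
            conn.executemany(insert, batch)
            total += len(batch)
            batch.clear()
    if batch:  # flush the final partial batch
        conn.executemany(insert, batch)
        total += len(batch)
    conn.commit()
    return total


conn = sqlite3.connect(":memory:")  # use a file path for data larger than RAM
rows_loaded = load_csv(conn, "requests", io.StringIO(SAMPLE_CSV))

# Columns are untyped here, so cast before doing arithmetic on them.
summary = conn.execute(
    "SELECT service, COUNT(*), AVG(CAST(latency_ms AS REAL)) "
    "FROM requests GROUP BY service ORDER BY service"
).fetchall()

print(sqlite3.sqlite_version)  # version of the SQLite library Python is linked against
print(rows_loaded, summary)
```

Batching the inserts keeps memory use flat regardless of file size, and `sqlite3.sqlite_version` is a quick way to see whether your Python is linked against an old SQLite build before deciding an upgrade is worth it.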
