"Data model" isn't a confusing term, and "data modeller" is a profession. They work on data models.

This article just reinforces the management belief that data scientists spend most of their time building models. In reality, data scientists spend the majority of their time munging and wrangling disparate data types (which are typically a total mess), understanding the data well enough to fix problems, translating business requirements into code, and converting model outputs into human-understandable presentations. Building models is a trivial part of the effort for most projects, typically 10% of the time or less. Most organizations will eventually move over to AutoML-type solutions, but they will be shocked when they fail to achieve any significant gains in productivity, because they're optimizing one of the shortest steps in the whole process.

Data engineering is not data science. Data engineers deliver the data for data scientists, data scientists use the data in models.

Someone who can build models and someone who can scale them are two different people in two different professions: data scientist vs. data engineer.

Yes. Ideally, a data scientist creates the model or logic, while a data engineer handles deploying it.

Data science is a complicated profession, wouldn't you agree?

My guess (and this is a reasonably educated guess): a data scientist creates models and usually has an advanced math or stats degree (or similar). A data analyst uses models created by data scientists, and often has a business/econ or other undergraduate degree.

Of course this is a generalization, but in my perusal of job openings over the past year, this seems to be roughly what companies mean.


That seems like a pretty inaccurate job title, then. Data engineers are people who work with data pipelines, storage, and schemas. They can lean more towards software engineering or towards analytics with dashboarding/machine learning, but their primary responsibilities are pipelines, storage, and schemas.

"Just look around in the finance and insurance sector for data modeling."

That is a very narrow niche.


Ill-defined though it may be, there's still an understandable difference between data science and data engineering.

Sure, anyone can say they are a data engineer.

Isn't it typical for data engineers to be paid less (way less?) than the modelers?

This is a deeply misleading (though somewhat accurate) comment.

It's misleading because the 70% above (who may be called data scientists) are not actually data scientists; at best, they are data analysts.

In general, the core difference between data scientists and data analysts is that the former can code in at least one language (SQL doesn't count, unfortunately).

However, because the term "data science" became so popular, everyone rebranded their analyst roles as data scientist roles, leading to this concern.

Additionally, the post I'm replying to is pretty biased, as the OP talks about productionising models. While this is a major facet of DS work, it's not the whole thing. TBH, I can find people to productionise models a lot quicker than I can find people who can figure out what to model, and how to measure it.

Some of those people are most comfortable with Excel, and while I'd prefer they used a different tool, I can't argue with their output.

Also, the OP here is focused on deployment of Python ML models, which again is a subset of a very, very broad field.

That being said, I agree with most of the categorisations, except that I'd say the two critical attributes of good data scientists are a strong background in statistics and data common sense.

Data common sense is a weird attribute: the ability to look at the numbers and judge whether they are reasonable. For example, if you are running a mobile gaming company and see an ARPU of $5, something has either gone horribly wrong, or you're going to be a billionaire (assuming you have equity).

This attribute is actually not that common amongst DS people, so it tends to be the limiting factor, rather than ability with containers and deployment (which I do agree is very important).
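A minimal Python sketch of the kind of sanity check described above, assuming ARPU is computed as total revenue divided by the number of users over the same period. The function names `arpu` and `arpu_looks_plausible`, the revenue and user figures, and the "plausible" range are all hypothetical, chosen only for illustration; the range is deliberately left as a parameter because what counts as reasonable depends on your own domain knowledge.

```python
# Hypothetical sketch of a "data common sense" check on ARPU.
# ARPU (average revenue per user) = total revenue / number of users.
# The plausible [low, high] range is a hypothetical, caller-supplied parameter;
# no real industry benchmark is assumed here.


def arpu(total_revenue: float, users: int) -> float:
    """Average revenue per user over the same period as total_revenue."""
    if users <= 0:
        raise ValueError("users must be positive")
    return total_revenue / users


def arpu_looks_plausible(total_revenue: float, users: int,
                         low: float, high: float) -> bool:
    """Return True if ARPU falls inside the caller-supplied [low, high] range."""
    value = arpu(total_revenue, users)
    return low <= value <= high


if __name__ == "__main__":
    # Hypothetical figures for illustration only.
    value = arpu(500_000.0, 100_000)
    print(f"ARPU = ${value:.2f}")  # ARPU = $5.00
    # Hypothetical expected range; a value outside it is a cue to go check the data.
    print(arpu_looks_plausible(500_000.0, 100_000, low=0.0, high=1.0))  # False
```

The point is not the arithmetic, which is trivial, but making the "does this number make sense?" question an explicit step rather than something you hope someone notices.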


> I'd call myself more of a "data plumber"

I think the actual term is Data Engineer.


Presumably one of those Data Scientist instances is meant to be Data Engineer?

But that is just an assumption. I have worked as a data scientist for 5+ years, and from a practical point of view, it is not just data wrangling. It is worth mentioning that, following that logic, we assume that a programmer fully understands how to develop a model in production and how to handle it in edge cases, which is not true.

Analytics Engineer is a clear one for this, as teej said.

The title is strongly associated with the dbt community, so it could imply you’re using dbt for your data modeling (not necessarily a bad thing, as it sounds like it would be a good tool for your use case).


I agree with you. Is this the type of work a data engineer does? I actually kinda like that.

I think that is OP's intent. Data Scientists, Data Engineers, Data Analysts, Data X

Modeling is just a small part of data science (the percentage of time I've spent modeling as a data scientist is in the single digits).

Automating modeling is a bit easier than automating the other parts, though.