This article just reinforces that management thinks that data scientists spend most of their time building models. In reality, data scientists spend the majority of their time munging and wrangling disparate data types (that are typically a total mess), understanding the data well enough to fix problems, translating business requirements to code, and converting model outputs to human-understandable presentations. Building models is a trivial amount of the effort for most projects, typically 10% or less of the time. Most organizations will eventually move over to AutoML-type solutions, but then will be shocked when they fail to achieve any significant gains in productivity because they're optimizing one of the shortest steps in the whole process.
My guess (and this is a reasonably educated guess): a data scientist creates models and usually has an advanced math or stats degree (or similar). A data analyst uses models created by data scientists, and often has a business/econ or other undergraduate degree.
Of course this is a generalization, but in my perusal of job openings over the past year, this seems to be roughly what companies mean.
That seems like a pretty inaccurate job title then. Data Engineers are people working with data pipelines, storage, and schemas. They can lean towards more software engineering or towards analytics with dashboarding/machine learning but their primary responsibilities are the former.
This is a deeply misleading (though somewhat accurate) comment.
It's misleading because the 70% above (who may be called data scientists) are not actually data scientists; at best they are data analysts.
In general, the core difference between data scientists and data analysts is that the former can code in at least one language (SQL doesn't count, unfortunately).
However, because the term data science became so popular, everyone re-branded their analyst roles as data scientists, leading to this concern.
Additionally, the post I'm replying to is pretty biased, as the OP talks about productionising models. While this is a major facet of DS work, it's not the whole thing. TBH, I can find people to productionise models a lot quicker than I can find people who can figure out what to model, and how to measure it.
Some of those people are most comfortable with Excel, and while I'd prefer they used a different tool, I can't argue with their output.
Also, the OP here is focused on deployment of Python ML models, which again is a subset of a very, very broad field.
That being said, I agree with most of the categorisations, except that the two critical attributes of good data scientists are a strong background in statistics and data common sense.
Data common sense is a weird attribute: you look at the numbers and check whether they are reasonable. For example, if you are running a mobile gaming company and see an ARPU of $5, something has either gone horribly wrong, or you're going to be a billionaire (assuming you have equity).
This attribute is actually not that common amongst DS people, so it tends to be the limiting factor, rather than ability with containers and deployment (which I do agree is very important).
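To make the ARPU example concrete, here's a rough sketch of that kind of sanity check in Python. The function names and the "plausible" range are purely hypothetical, picked for illustration; real bounds depend on your product, market, and reporting period.

```python
def arpu(total_revenue: float, active_users: int) -> float:
    """Average revenue per user: total revenue divided by active users."""
    if active_users <= 0:
        raise ValueError("active_users must be positive")
    return total_revenue / active_users


def looks_plausible(value: float, low: float, high: float) -> bool:
    """Return True when a metric falls inside the range you would expect."""
    return low <= value <= high


# Hypothetical bounds for illustration only; not an industry benchmark.
PLAUSIBLE_ARPU_RANGE = (0.01, 1.00)

# Made-up figures that reproduce the $5 ARPU from the example above.
value = arpu(total_revenue=5000.0, active_users=1000)
if not looks_plausible(value, *PLAUSIBLE_ARPU_RANGE):
    print(f"ARPU of ${value:.2f} is outside the expected range; check the pipeline first")
```

The code is trivial. The hard part is knowing what range to put in, and that is the bit that comes from experience with the domain.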
But it is just an assumption. I have worked as a data scientist for 5+ years, and from a practical point of view it is not just data wrangling. It is worth mentioning that this logic assumes programmers fully understand how to develop a model in production and how to handle it in edge cases, which is not true.
Analytics Engineer is a clear one for this, as teej said.
The title is strongly associated with the dbt community, so it could imply you’re using dbt for your data modeling (not necessarily a bad thing, as it sounds like it would be a good tool for your use case).