I suspect the long term plan is to create a training dataset labelled by the 3000 people and, when they have sufficient training data, let machine learning / AI take over.
It sounds to me like they want people to learn about deep learning and TensorFlow. In the long term there will be more professionals and salaries will go down. That is how they will make a profit from this type of thing.
And companies that can generate the training data at scale will be the highest value ones.
These LLMs are the shiniest thing in AI/ML right now because the data they use for training is already freely available, and massive in scale, via the internet.
There's an entire universe of data that hasn't been collected, curated, and leveraged in the right way yet:
- The DNA sequences of all living things
- The DNA sequences of all of humanity
- The CAD files for all manufactured goods
- The motion data for humans working manual jobs
- The words spoken by individual people across their entire lifetimes
People keep saying that training is tricky, which is true: you need people with experience. But those people aren't so rare that this will be a moat forever, since you can very much hire the ones who know how to do it.
My experience with ML projects is that while there is churn in the modeling, most of the effort for a long-lived system still goes into the data. Yet engineers would much rather work on the modeling/infra pieces than on data quality.
Which is to say, I have a lot of skepticism that this is a long term business.
In my personal opinion, 'AI' at this point is about augmenting human action to reduce costs (time, materials, human attention, compute, etc.), and if you know what you're doing, it actually works and can make you money.
My group works extremely heavily in this space. We use a combination of human annotation and ML to speed up human annotation and improve the products of the ML component. Rinse, wash hands, recur until 95% of predictions are 95% accurate or better. Use ML to find the 5% of predictions that aren't up to snuff and lay hands on them (this is the part where you have to pay people). There is nothing shameful about including humans in the process.
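The loop described above (ML proposes labels, confident predictions are accepted, the uncertain slice goes to paid human annotators) can be sketched roughly like this. Everything here is a placeholder: the model, the annotator, and the 95% confidence cutoff are illustrative, not any particular group's implementation.

```python
# Hypothetical human-in-the-loop annotation cycle: trust the model
# where it is confident, route the rest to a human annotator.

def human_in_the_loop(examples, model, annotate, cutoff=0.95):
    """Label `examples`; `model(x)` returns (label, confidence),
    `annotate(x)` is the (paid) human fallback."""
    labeled = []
    for x in examples:
        label, conf = model(x)
        if conf >= cutoff:
            labeled.append((x, label))        # trust the model
        else:
            labeled.append((x, annotate(x)))  # pay a person
    return labeled

# Toy usage: a "model" that is only confident about even numbers.
model = lambda x: (x % 2 == 0, 0.99 if x % 2 == 0 else 0.50)
annotate = lambda x: x % 2 == 0  # the human always gets it right
print(human_in_the_loop(range(4), model, annotate))
```

In a real system the corrected labels would be fed back into training, which is the "rinse and recur" step: each pass shrinks the low-confidence slice that needs human attention.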
AI and 'Big Data' (as trends) don't really overlap in my view. Training these LLMs does require a huge amount of data, but that's very different from spinning up a Spark cluster and writing badly performing Python code to process something that could easily have been done in reasonable time on a decent workstation with 128 GB of RAM and a large hard drive/SSD. That was a large part of what the hype train was a few years ago.
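To illustrate the "decent workstation" point: many of those jobs were simple aggregations that stream through a file in constant memory, no cluster needed. A minimal sketch, with a made-up file layout and column name:

```python
# Sketch: aggregate a large CSV by streaming it row by row.
# Memory use stays constant regardless of file size, so a single
# workstation handles files far larger than RAM. The file path and
# column name below are hypothetical.
import csv
from collections import Counter

def count_by_column(path, column):
    counts = Counter()
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            counts[row[column]] += 1  # one row in memory at a time
    return counts
```

Anything reducible to a streaming pass or a sort (and most ETL-style jobs are) works this way; the cluster only earns its keep when the working set genuinely exceeds one machine.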