One aspect is getting the latest knowledge into the model. Given enough demand, they could offer a special subscription service where you get the latest model every N units of time.
Or, maybe a finetuned version for your particular dataset?
Of course I have no idea; I'm just speculating.
EDIT: I'm speculating they might just be investing some marketing budget into this model, hoping it will capture enough of the target audience to upsell related services in the future.
I wonder if there might already be enough datasets lying around that, even if we stopped making new ones, models could generate entire new datasets on their own, process/label them on their own, and produce new and improved models in a closed feedback loop. The only data that's really needed going forward is information on current events.
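Something like this toy loop, where every function is a hypothetical placeholder rather than anyone's actual pipeline:

```python
# Toy sketch of a closed self-training loop. `model` is assumed to expose
# generate_examples / label / score / retrain; all of these are hypothetical
# stand-ins, not a real API.

def self_training_loop(model, seed_dataset, rounds=3):
    dataset = list(seed_dataset)
    for _ in range(rounds):
        candidates = model.generate_examples(dataset)            # synthesize new examples
        labeled = [model.label(c) for c in candidates]           # model labels its own output
        accepted = [x for x in labeled if model.score(x) > 0.8]  # quality gate
        dataset.extend(accepted)
        model = model.retrain(dataset)                           # fold the survivors back in
    return model, dataset
```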
This is probably naive, but I'm imagining something like the US Library of Congress providing these models in the future. E.g., some federally funded program to procure or create enormous datasets and train models on them.
This is a great question. Actually, I have two opinions about this:
- It is not economically useful for foundation model providers to build this, or at least integrate it with their products, because it would cost them double. They would prefer RLHF and something intrinsic to their models.
- Our dataset is proprietary. We had to collect and write a huge number of prompts to fine-tune it well. And our model is not 100% LLM; we use a version of an evolutionary algorithm to control quality (a rough sketch of the idea is below).
I would say we have a tiny moat of data and algorithms. But at the end of the day, it is about delivering something people find useful. If we can be replaced, so be it. But it works for now.
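As a rough illustration only (hypothetical stand-ins, not our actual code): each generation you score candidate outputs with a fitness function, keep the best, and mutate them to produce the next generation.

```python
import random

# Generic evolutionary quality control over candidate outputs.
# `generate`, `mutate`, and `fitness` are hypothetical stand-ins for an LLM
# sampler, a perturbation step, and a quality scorer.

def evolve(generate, mutate, fitness, population_size=20, generations=5):
    population = [generate() for _ in range(population_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        survivors = ranked[: population_size // 4]            # keep the top quarter
        children = [mutate(random.choice(survivors))          # refill by mutating survivors
                    for _ in range(population_size - len(survivors))]
        population = survivors + children
    return max(population, key=fitness)                       # best candidate wins
```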
That's amazing! This model is a huge opportunity to create annotated data (with decent quality) for just a few dollars. People will iterate more quickly with this kind of foundation model.
I don't think they have been doing it for 4 years, as AutoML is quite recent. The idea may have been around before, but no one had published any paper about it until now.
Bear in mind that this service creates the architecture of the model for you. I think ClarifyAI has a predefined model that is fine-tuned with your data, which is not even similar.
They probably won't share how they did it, but there's been a lot of research over the past 6 months showing that you don't have to retrain the entire model to add new sources. I know nothing about this stuff, but my limited understanding from blog posts is that it's easier than anyone had thought to add new data to a pre-existing model.
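One concrete family of techniques in that vein is parameter-efficient fine-tuning, e.g. LoRA adapters, which freeze the original weights and train a small add-on. A minimal sketch with Hugging Face's peft library (the model name and hyperparameters are just placeholders, and this is only one of the approaches those blog posts might mean):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Attach small LoRA adapter matrices to a frozen pre-trained model;
# only the adapters get trained on the new data.
base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")  # placeholder base model
config = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections in OPT
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% of the parameters
# ...then run an ordinary fine-tuning loop on the new data; the base weights stay frozen.
```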
Because having large datasets allows me to merge the data with proprietary sets, or improve my own predictive models.
I have real estate models, energy trading models, cryptocurrency models, mechanical-failure prediction models, insurance models, sentiment models, and clustering models.
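Concretely, the "merge" part is often just a join on a shared key; a toy sketch with made-up file and column names:

```python
import pandas as pd

# Hypothetical example: enrich a proprietary dataset with a large external one
# keyed on the same identifier, then feed the result to an existing model.
proprietary = pd.read_csv("my_listings.csv")          # placeholder file
external = pd.read_csv("public_energy_prices.csv")    # placeholder file

enriched = proprietary.merge(external, on="region_id", how="left")
# `enriched` now carries the external columns as extra features.
```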
Do we need the model to be continuously updated from data sources, or is it good enough that they can now figure out, either by themselves or with some prompting, when they need to search the web and find current information?
I feel like within 6 months the models will have adapted to not need these "clever" tricks. Presumably, if for many cases the trick is to say "Let's think step by step", that's something the model can learn to do on its own without the prompt.
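For reference, the trick in question is literally just appending a trigger phrase to the prompt; a minimal sketch:

```python
def zero_shot_cot(question: str) -> str:
    # The "clever trick": append the zero-shot chain-of-thought trigger phrase
    # so the model writes out intermediate reasoning before answering.
    return f"Q: {question}\nA: Let's think step by step."

print(zero_shot_cot("If a train leaves at 3pm and travels for 2 hours, when does it arrive?"))
```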
The really interesting thing will be feeding alternative data into these models. Whether it's a certain structured corpus, siloed enterprise data, or personal data.
Maybe the real game changer in the future will be the ability to train the same model on very different kinds of inputs: video, images, text, audio... Imagine all the data cleaning tasks being automated as well: you just feed the model PDFs, and a support model automatically extracts all the relevant metadata... or you'll probably just be able to select a set of books from an online library and your model will train on them too (for a non-trivial subscription, of course, lol).
Awesome, can’t wait to try this. I wish the big AI labs would make more frequent model improvements, like on a monthly cadence, as they continue to train and improve stuff. Also seems like a good way to do A/B testing to see which models people prefer in practice.
That's exactly the target use case. Models make online predictions as part of Postgres queries, and can be periodically retrained on a cadence that makes sense for the particular data set. In my experience the real value of retraining at a fixed cadence is that you learn quickly when your data set changes, and have fewer changes to work through when a data bug or anomaly is introduced into the ecosystem. Models that aren't routinely retrained tend to die in a catastrophic manner when business logic changes and their ingestion pipeline hasn't been updated since they were originally created.
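A minimal sketch of that kind of fixed-cadence retrain job (the data-loading, training, and metric helpers are hypothetical placeholders; in a Postgres-centric setup they would just wrap SQL calls):

```python
import datetime
import logging

# Hypothetical scheduled retraining job, meant to be run by cron on a fixed
# cadence. Regular retrains surface data/schema changes early instead of
# letting a stale model fail catastrophically much later.

def retrain_job(load_training_data, train_model, evaluate, baseline_metric):
    data = load_training_data()            # e.g. a query over the latest rows
    model = train_model(data)              # retrain on the fresh snapshot
    metric = evaluate(model, data)         # score on held-out data
    if metric < baseline_metric * 0.95:    # large drop => possible data bug/anomaly
        logging.warning("Metric dropped from %.3f to %.3f on %s; check the pipeline",
                        baseline_metric, metric, datetime.date.today())
        return None                        # keep serving the previous model
    return model                           # promote the new model
```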
Stability.ai, and probably others too, are already working on video and audio models, and I think I've also heard of a service to train/finetune a model on your own dataset.