I'm willing to take a punt and guess that the models they're using rely on short-term/isolated data, not 10 years' worth of your entire browsing history.
I tried it, and the results aren't up to par with whatever I'm searching for. They'll be forced to relinquish data like all the other companies once they get big enough.
I'm not saying whether this destruction is happening or not, but the data they offer to support their claims is dreadful: all circumstantial viewing of links and pages, Google news trends, and Alexa profiles. Sadly, these days claims that may be correct can be dismissed when the supporting data is weak (and vice versa).
Sorry, I wasn't clear. I meant, is there any reason to believe that they'd not filter out those kinds of exchanges before updating the model? I am absolutely certain they don't train on user data indiscriminately because so much of it would be garbage.
Just playing devil's advocate here: They mention "aggregate user behavior", so they could be building a large, aggregated Markov chain that stores no user data whatsoever -- just site transition data for the world.
I haven't read in-depth analysis of how they do their stuff, though.
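For the sake of that devil's-advocate argument, here's a minimal sketch of what an "aggregate, no user data" approach could look like: per-session click streams are collapsed into site-to-site transition counts, and only the counts (a Markov model) are kept. This is purely illustrative; the site names and functions are hypothetical, and nothing here is based on how the company actually works.

```python
from collections import defaultdict

def build_transition_counts(sessions):
    """Collapse per-session click streams into aggregate
    site-to-site transition counts. The individual sessions
    are not stored -- only the global counts survive."""
    counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        # Count each consecutive (source, destination) pair.
        for src, dst in zip(session, session[1:]):
            counts[src][dst] += 1
    return counts

def transition_probs(counts):
    """Normalize counts into Markov transition probabilities."""
    probs = {}
    for src, dsts in counts.items():
        total = sum(dsts.values())
        probs[src] = {dst: n / total for dst, n in dsts.items()}
    return probs

# Two hypothetical anonymous sessions (illustrative site names).
sessions = [
    ["news.example", "blog.example", "shop.example"],
    ["news.example", "shop.example"],
]
model = transition_probs(build_transition_counts(sessions))
# model["news.example"] == {"blog.example": 0.5, "shop.example": 0.5}
```

The point is that once the counts are aggregated, you can answer "where do visitors of site X tend to go next?" without retaining any individual's history.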
From a research standpoint this data set is much less interesting than a bunch of students/faculty/bots/apps clicking and surfing their way around the whole Internets.
Who the hell paid 60M for their garbage data? This can't be real. Also another reason why scraping content to use for models has to be free for everyone or incumbents are gonna pull stuff like this.
If that's what's actually happening on the back end, they're doing a pretty bad job with my recommendations. What I suspect is really happening, though, is that the data collection and analytics are being used to optimize revenue, whether it be from advertising, minimizing production costs or ascertaining just how much bullshit users will put up with, and not to build or release higher quality products. That, or the data is sold.
They aren't good. Hence the dragnet approach: collect all the data now, then wait for Google or some such entity to come up with the mining methodology.
Notice how the actual data on these pages is exactly the same and from 2022-2023?
This is what I mean by making up links that don't work. The links further down the page don't even go back that far.
You ought to inspect the links before posting them as proof. By not doing so, you demonstrate the limitations and fallacies of humans putting too much faith in these tools.
Are you ready to rethink your exuberance for ChatGPT?
---
I performed two searches, both from image search, and it took 1 minute to find a good graph. I suspect ChatGPT took longer to write its combined, inaccurate responses, judging by how slowly it streams answers back.
I understand how Compete works. I'm just showing you one immediate example of how comically wrong their data often is, both in absolute value and in misreporting trends. This has always been my experience with their data on both small and large sites, unless you use their paid product that allows you to self-report. Kind of reminiscent of BBB/DUNs/Yelp type protection rackets actually.