
I'm willing to take a punt and guess that the models they're using rely on short-term, isolated data, not 10 years' worth of your entire browsing history.



They also state their data is 2 months old.

I tried it, and the results aren't up to par with whatever I'm searching. They'll be forced to relinquish data like all the other companies when they get big enough.

I'm not saying whether this destruction is happening or not, but the data they offer to support their claims is dreadful: it's all circumstantial link and page views, Google News trends, and Alexa profiles. Sadly, these days claims that may well be correct get ignored when the supporting data is weak (and vice versa).

Sorry, I wasn't clear. I meant, is there any reason to believe that they'd not filter out those kinds of exchanges before updating the model? I am absolutely certain they don't train on user data indiscriminately because so much of it would be garbage.

Yeah, but they're still providing a dataset that's just plain bad. It's hardly relevant how many sites link to some other site if that site is dead.

The Flash viewer and ridiculous TOS don't give me much confidence in their data.

It’s bad data. They changed the way they collect data when the price of API access went up. The drop you see in July 2023 isn’t real.

Just playing devil's advocate here: They mention "aggregate user behavior", so they could be building a large, aggregated Markov chain that stores no user data whatsoever -- just site transition data for the world.

I haven't read in-depth analysis of how they do their stuff, though.
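For illustration only, here's a minimal sketch of what such an aggregate transition store could look like. This is pure speculation, not their actual implementation; the class and site names are hypothetical. The point is that only (from_site, to_site) counts are kept, with no per-user records at all:

```python
from collections import defaultdict

class TransitionChain:
    """Hypothetical aggregate first-order Markov chain over site transitions.
    Stores only global counts per (from_site, to_site) pair -- no user data."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def record(self, from_site, to_site):
        # Every observed navigation just bumps an aggregate counter.
        self.counts[from_site][to_site] += 1

    def transition_prob(self, from_site, to_site):
        # Estimated probability of going from from_site to to_site.
        total = sum(self.counts[from_site].values())
        return self.counts[from_site][to_site] / total if total else 0.0

chain = TransitionChain()
chain.record("news.ycombinator.com", "example.com")
chain.record("news.ycombinator.com", "example.com")
chain.record("news.ycombinator.com", "other.org")
print(chain.transition_prob("news.ycombinator.com", "example.com"))  # 2/3
```

A structure like this would support "aggregate user behavior" claims literally: you can answer "where do visitors of site X go next?" without ever being able to reconstruct any individual's browsing history.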


How do you expect to get that data? It's buried in their Mediamath and Optimizely accounts.

From a research standpoint this data set is much less interesting than a bunch of students/faculty/bots/apps clicking and surfing their way around the whole Internets.

Even with analytics, it doesn't seem like they know.

It just gives a false confidence in bullshit.

Most internet analytics and the billions spent on data mining is all for naught.


Not talking about raw data. Clearly that isn't needed for the web. But my point is that it got thrown out with the bathwater.

Who the hell paid $60M for their garbage data? This can't be real. It's also another reason why scraping content to use for models has to be free for everyone, or incumbents are gonna pull stuff like this.

If that's what's actually happening on the back end, they're doing a pretty bad job with my recommendations. What I suspect is really happening, though, is that the data collection and analytics are being used to optimize revenue, whether it be from advertising, minimizing production costs or ascertaining just how much bullshit users will put up with, and not to build or release higher quality products. That, or the data is sold.

They aren't good. Hence the dragnet approach of collecting all data, and then waiting for Google or some such entity to come up with the research for mining methodology.

Notice how the actual data on these pages is exactly the same and from 2022-2023?

This is what I mean by making up links that don't work. The links further down the page don't even go back that far.

You ought to inspect the links before posting them as proof. By not doing so, you demonstrate the limitations and fallacies of humans putting too much faith in these tools.

This graph made 7 years ago would appear to contradict the numbers ChatGPT gave you: https://www.reddit.com/r/dataisbeautiful/comments/3s3c8o/rel...

Are you ready to rethink your exuberance for ChatGPT?

---

I performed two searches, both from image search, and it took 1 minute to find a good graph. I suspect ChatGPT took longer to write its combined, inaccurate responses, judging by how slowly it streams the answers back.

1. windows version usage by year, 2000 - 2010

2. operating system usage by year, 2000 - 2010


I also do not trust their data as much.

I understand how Compete works. I'm just showing you one immediate example of how comically wrong their data often is, both in absolute value and in misreporting trends. This has always been my experience with their data on both small and large sites, unless you use their paid product that allows you to self-report. Kind of reminiscent of BBB/DUNs/Yelp-type protection rackets, actually.

This is no surprise to me and I wager that far worse is happening with your data.
