Google pays publishers to test AI tool that scrapes sites to craft new content (www.adweek.com)
5 points by vincent_s on 2024-02-28 | 72 comments




A friend told me, even a few years before ChatGPT, that he used AI to generate the daily horoscopes in his online newspaper, using far simpler models.

Nobody ever noticed.


Well, horoscopes do tend to be general enough that they might as well apply to nearly everybody, so I'd believe it.

Didn't they notice the accuracy of the predictions went down?

/s


I did. I was supposed to be frozen solid underwater, and burned to a crisp. None of that happened.

On average so...

one day i had a fortune cookie tell me something nasty like that

"complete loss, no prosperity, utter ruin. " !?


The AI might as well look like this

   old_horoscopes[rand() % N]
and nobody will ever notice.

Content “mixers” that stitched together copy and rearranged sentences were popular in black hat SEO for years. Early GPT models have been used to generate SEO and marketing copy since GPT-2.

This has been going on for a long time. The same low-quality sites that you ignored 2 years ago are just getting slightly better content that you’ll never read.


Text generation using word/sentence/phrase-level ad-libs is so simple it's a common homework assignment for first-year programmers. You really don't need much.

There are even tools like Tracery that let you construct grammars and vocabularies as a form of textual generative art.
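
For illustration, here is a toy Tracery-style expander in Python (a sketch of the idea, not the actual Tracery library; the grammar contents are made up):

    import random

    # A toy grammar: "#symbol#" placeholders are expanded recursively,
    # Tracery-style, by picking a random rule for each symbol.
    GRAMMAR = {
        "origin": ["Today, #mood# energy surrounds your #topic#."],
        "mood": ["radiant", "uncertain", "bold", "quiet"],
        "topic": ["career", "friendships", "finances", "creative side"],
    }

    def expand(symbol, grammar):
        rule = random.choice(grammar[symbol])
        # Replace each "#name#" placeholder with a recursive expansion.
        while "#" in rule:
            start = rule.index("#")
            end = rule.index("#", start + 1)
            name = rule[start + 1:end]
            rule = rule[:start] + expand(name, grammar) + rule[end + 1:]
        return rule

    print(expand("origin", GRAMMAR))  # e.g. "Today, bold energy surrounds your finances."

A few dozen lines like this will happily churn out endless horoscope-grade copy.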


20 years ago (HTML was still king, JavaScript hardly used) I worked on a horoscope product. The data provider had all the daily/weekly/monthly horoscopes for the coming year ready. Whatever their process was (manual, I assume), it must have been straightforward.

"Handmade" or "manmade" is about to have a whole new relevance on the internet.

I'm sure I'm in the minority, but the mere suggestion that some purportedly creative "content" (ugh, that word sounds like saying "sausage meat") is AI-generated makes me completely lose interest. Soon enough that might make me lose interest in whole swathes of the internet, but I can't deny that's been an ongoing process for a while now anyway.


Nothing wrong with AI generated content if it's info I need and it's factually correct.

The problem I have is that it's usually junk fluff pieces with way too much text padded in for SEO.

I feel anything computer-generated should be short bullet points or comparison tables, not long-form text, as it's just not good at that.

For example, think of a blog post where the author walks you through their thought process, trial and error, and ultimate solution. That is very human and relatable, something an LLM can approximate or copy but never make genuine. In that case I'd prefer just the raw solution, not a fake and deceptive walkthrough.


I fully expect the norm to become some LLM layer that takes such fluff and condenses it into fact-dense text, maybe as a browser plugin.

If that happens the next generation of SEO will be creating content designed to bubble up through that layer.

Of course! But at least there will be less text to skim through.

Except translated into other languages with zero effort by the publisher.

I agree with the sentiment; at the same time, commercial drivers will fill the pipes with low-quality content quickly, because $MONEY. The literate, subtle, and artsy are once again buried in commercial content. The only way forward is rebuilding chains of editorial recommendation. Ordinary browsing will become a junk-fest. It appears Pandora's box has been opened once again.

What was really jarring recently was to browse literate, subtle, artsy content only to be repeatedly interrupted by genuinely offensive and/or borderline-scam ads, then dropped back into the chosen content. For real.


Interesting, for me this is only true as long as the AI generated content is worse.

There are some things where I really care about the human experience, but for much content all that matters to me is whether it's a joy to ride.


Yeah, right now most human-written content isn't that good either. Quality writing has largely been abandoned in favor of verbosity and formality; it offends no one and often lacks substance or that human touch. I'm guessing AI content will be about as flavorless. But time will tell.

Yeah. It is difficult to see AI supplanting humans for the things you go outside for, but any human involvement on the internet has always just been an implementation detail.

After their recent over-censorship of Gemini, I look forward to Google's fair, even-handed, completely unbiased approach to the delicate topic of race relations and tensions as the world's population slowly seems to descend into more hatred for each other over silly differences. /s

agreed, but i see this as an opportunity for creative people to make art and communicate things that a computer can't. i'm skeptical that an AI can write a great novel about matters of the heart, one that feels like it's alive (not without trying). i think, in the age of generic AI content, some people will be seeking authenticity even more than we do now.

Creative people hate this trend and don't work that way (see possible optimizations via the grateful computer overlords). Source: I know a lot of artists and writers.

The creative classes already trusted this stuff the first time around, in 2008 and after, and it ate their lunch.


i think you're missing what i said.

i'm a writer. the only thing i can do is keep making my art with insight into the human condition, an insight that an ai can try to mimic but will never be able to replace. and i think that people will seek out human-created art because they're bored by AI mimicry; they'll want a real connection with a piece of art or a novel, etc.

the real ones will continue to make their art regardless of market success, a payday, or technological changes


I can really understand the fear of having your paychecks stop. Many artists will continue to be artists, but some found jobs doing what they love, and there is a shakeup going on. Everything will settle at some point, but the transition will be rough for many.

you're right. it's hard for me to be sympathetic towards that though, because i chose not to make writing my day job.

but my partner is a journalist and we're not over here freaking out. perpetual layoffs have always been a problem in media during our lifetime. we'll adapt.


And new ones are created. I started selling AI art but never once sold a piece of human-made art. Art is evolving, and so must artists.

Yea all creative people are the same. No creative person would ever be interested in new mediums of expression.

The truly creative are just looking to do the same things over and over lol.


So as long as creative people can overturn the momentum of the largest corporations sinking billions of financial and political capital into making AI synonymous with computing, we'll be OK? /s


If the content is actually intended to be informative, then it's a poisoned well unless it's been manually vetted for hallucination. If it's entertainment though, then if it works it works.

We may be in the minority, but there are plenty of people who share our opinion on this.

Whole swaths of the internet became uninteresting to me a long time ago. AI promises to make the rest of it worthless to me as well.

The fact that AI models are being trained on public-facing web content without the consent of the authors has already driven me and others to remove all our content from the public-facing web, as there is no other way to protect ourselves. That eliminates a large part of the value of the web right there.


What strategies did you use? Other than quitting social media, blocking crawlers and enforcing an "email-wall" is what I just started to do with my personal website. I feel that it's hard to stay on top of "pretty please" robots.txt requests, especially with new, undeclared ones. I prefer word-of-mouth diffusion of links, skipping the algorithmic promotion game altogether. I decided that I won't ever rely on my online writings, recordings, or photography as main income again (I did that once: it worked, but it felt gradually poisoned by the need to build engagement.) But it feels weird that we got to the point where I have to shield a blog from bulimic information machines that pollute valuable knowledge, and I wonder what the right strategy is.
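
For reference, the "pretty please" approach looks roughly like this in robots.txt (GPTBot, Google-Extended, and CCBot are a few of the publicly documented AI-crawler user agents; the list goes stale quickly, which is exactly the problem):

    # Politely ask known AI-training crawlers to stay out.
    # Compliance is entirely voluntary, and new or undeclared bots won't match these rules.
    User-agent: GPTBot
    Disallow: /

    User-agent: Google-Extended
    Disallow: /

    User-agent: CCBot
    Disallow: /

    # Everyone else (regular search engines, feed readers, etc.)
    User-agent: *
    Allow: /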

> What strategies did you use?

I see no other strategies that could possibly work. All it takes is one crawler to get through or ignore your defenses and your work has been used to help train a model. At that point, the horse is out of the barn. Any partial defense seems the same as no defense to me on this point.

But it depends on what you're trying to protect. My concern isn't actually that my content may get reproduced through an LLM. I just don't want any of my work to be used to develop or improve these models, period.


Yep, we agree on the ethical stance. It's not that the uniqueness of my work is lost or anything like that. It's the choice not to feed them.

You might be in the minority in the context of this site's bubble but I don't think you're in the minority overall.

The best solution I've seen so far is to make the publisher liable for "generative" content if they can't point to a human creator. Also, deny any copyright protection unless you can point to a transformative human creator. There might be people "faking" the creation, but then it's not a corp, just a human.

I was thinking just the other day that a recognisable 'AI Free' badge on sites that want to assure people that none of their content is AI generated could gain a lot of traction.

If the badge ever gained traction, every garbage AI site would slap one on.

The web has been headed down this road for years. We've all seen reputable blogs sink to publishing "Here are our favourite smart speakers for Xmas", to keep the lights on. What is that but a soulless piece of data-driven sponsored content pushing products with the highest affiliate payout rates?

Google only cares about these types of sites now because they are the ideal customer for Google Ads. They churn out content constantly and advertise heavily, with Google getting a cut of every click and impression.

And let's face it, YouTube is basically QVC for electronics, toys and makeup, all multi-billion $ industries that Google is happy to slap ads on top of.


The era of the artisanal internet is upon us! Everything AI, and a couple of folks still doing it like in the good old days, haha.

I haven't touched the spice in years, but I foresee some kind of content Butlerian Jihad very soon. It's already started with music, where YouTube is filled with AI-generated 'songs' from e.g. Sia, so much so that I stopped using YouTube for anything but recordings of live shows, and I guess soon that will be enshittified too. Bye, YouTube.

Maybe it's very little in the scheme of things, but they're not getting a 'view' or click from me.


What happens once the stipend goes away? What happens when the tool stops being “free”? I know these publishers are desperate for lifelines at the moment but I hope they are thinking ahead to the time that Google no longer needs them or their content.

More like: what if, long term, Google cuts out the publishers? I mean, when someone googles X, just show or generate the article right then. They only need the publishers now to get a human feedback loop.

They still want social and Kagi traffic. But the lack of Google traffic will turn on them.

There is exactly one form of organization that wants the sort of Gleichschaltung that comes from this sort of tool, and it is not funded voluntarily.

Google is doing the thing that it punishes other publishers for doing? Whaaaaa? /s

I don't really see how this isn't just content theft at this point. Pointing at "inspiration sites" and just rewording their content feels pretty scummy at best.

At what point are content creators and publishers going to be paid, or given the option to block, the AI scraping tools that so blatantly use their material to generate income for other companies and publishers?


You wouldn't download a car.

> Pointing at "inspiration sites" and just rewording their content

Sounds like Reddit, minus the AI part. Any time someone there might be on the verge of an original thought, it gets shot down with something akin to "[citation needed]", followed by strong social pressure for the user to stop participating if they can't manage to just reword "inspiration sites".

What's special about AI here?


Scope

What kind of scope?

Why are they funding this? I don't want to read AI-generated news.

There are some sites that don't use AdSense.

I guess Google wants ad farms with AdSense copying their content?


There is no way this behaviour would be illegal, right?

> I don't want to read AI-generated news.

I didn't think you were right, but you are:

> create aggregated content more efficiently by indexing recently published reports generated by other organizations, like government agencies and neighboring news outlets, and then summarizing and publishing them as a new article.


Would anyone? They don't care. It's so beyond the pale now it's almost gone into satire.

What is being scraped:

> using factual content from public data sources—like a local government’s public information office or health authority.

edit: what is also being scraped:

> like government agencies and neighboring news outlets

I yield to the pessimists.


> I yield to the pessimists.

To give you an optimistic perspective: I don't think it qualifies as pessimism to point out obvious issues with some of the "approaches" currently developing in this field. If enough people do it, maybe something will change for the better?


I don't think this is a new problem. It was solved with copyright laws, and you can now enjoy original output. The law just needs to catch up (and hopefully quickly).

I wonder who curates the AI blocklist, and how skipping sites will affect the internal bias.

Can't wait for Google to die out. Glad to see they are on their way.

Ha Ha! This is the exact sentiment HN had about Meta/Facebook about 18 months ago.

This entire thread is a reminder of how out of touch and naive most HN commentary is when it strays outside its very narrow field of expertise.


Facebook usage/image has dropped. Fights with Congress. The average person is moving to other socials. Mark's planned run for president is dead. The metaverse stalled. Still, they have plenty of money, ad revenue, and other social networks they control.

IBM is still around. It took Yahoo forever to die. The corpses of AltaVista and Ask Jeeves still make money.



> It’s hard to argue that stealing people’s work supports the mission of the news. This is not adding any new information to the mix.

Not so different from current publications. Most human-written news is also based on repackaging Twitter and press agencies; few journalists actually gather original content. It should be OK to copy news information from any source without consent; it's factual data, not creative work.


Ironically I had a side project that essentially summarized stories from RSS feeds and reposted them on my own blog. I provided full attribution and links to the original articles.

Applied for AdSense and was denied as a low-quality website. Not that I expected to make much anyway.

Now basically the same thing is an official Google project.
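
For the curious, the whole thing was roughly this shape (a simplified sketch, not the actual code; it assumes the feedparser library and fakes the "summary" step by keeping the first couple of sentences):

    import feedparser  # pip install feedparser

    # Placeholder feed URLs for illustration only.
    FEEDS = [
        "https://example.com/news/rss",
        "https://example.org/blog/feed.xml",
    ]

    def naive_summary(text, sentences=2):
        # Stand-in for a real summarizer: keep the first few sentences.
        parts = text.replace("\n", " ").split(". ")
        return ". ".join(parts[:sentences]).strip()

    def build_posts(feeds):
        posts = []
        for url in feeds:
            feed = feedparser.parse(url)
            for entry in feed.entries[:5]:
                posts.append({
                    "title": entry.get("title", "Untitled"),
                    "summary": naive_summary(entry.get("summary", "")),
                    "source": entry.get("link", url),  # full attribution + link back
                })
        return posts

    if __name__ == "__main__":
        for post in build_posts(FEEDS):
            print(post["title"])
            print(post["summary"])
            print("Source:", post["source"])
            print()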


It's cool when they do it, it's a problem when you do it

Podcast: Why Google Is Shit Now

This week we go long on Google News and AI, and why Google Search is worse now. We also discuss a phone spy tool that can monitor billions.

https://www.404media.co/podcast-why-google-is-shit-now-404-m...


This seems like a great way to poison the well. But maybe Google already plans on search dying out and they are scrambling to find the next big thing?

I could see this going horribly wrong, but honestly an LLM that scrapes local government data sources, police blotters, etc., highlights important or interesting information, and makes it understandable sounds pretty great.
