A friend told me, a few years before ChatGPT, that he used AI to generate the daily horoscopes in his online newspaper. Using far simpler models.
Content “mixers” that stitched together copy and rearranged sentences were popular in black hat SEO for years. The early GPT models have been used to generate SEO and marketing copy since GPT-2.
This has been going on for a long time. The same low-quality sites that you ignored 2 years ago are just getting slightly better content that you’ll never read.
Text generation using word/sentence/phrase-level ad-libs is so simple it's a common homework assignment for first-year programmers. You really don't need much.
There are even tools like Tracery that let you construct grammars and vocabularies as a form of textual generative art.
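To make the point concrete, here's a minimal Tracery-style generator in Python. The grammar, symbol names, and horoscope-flavored vocabulary are all made up for illustration; the technique is just recursive template expansion over a dictionary of alternatives.

```python
import random

# A tiny Tracery-style grammar: each symbol maps to a list of possible
# expansions, and "#symbol#" placeholders are replaced recursively.
GRAMMAR = {
    "origin": ["#sign#: #advice# #outlook#"],
    "sign": ["Aries", "Taurus", "Gemini"],
    "advice": ["Trust your instincts today.", "Avoid big decisions."],
    "outlook": ["Luck favors the patient.", "An old friend resurfaces."],
}

def expand(symbol, grammar, rng=random):
    """Pick an expansion for `symbol`, then recursively fill in
    any #placeholders# it contains."""
    text = rng.choice(grammar[symbol])
    while "#" in text:
        start = text.index("#")
        end = text.index("#", start + 1)
        inner = text[start + 1:end]
        text = text[:start] + expand(inner, grammar, rng) + text[end + 1:]
    return text

print(expand("origin", GRAMMAR))
```

That's the whole trick: a homework-sized loop can emit endless "unique" horoscopes, which is exactly why this kind of filler predates LLMs by decades.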
20 years ago (HTML was still king, JavaScript hardly used) I worked on a horoscope product. The data provider had all the daily/weekly/monthly horoscopes for the coming year ready in advance. Whatever their process was, I assume manual, it must have been straightforward.
"Handmade" or "manmade" is about to have a whole new relevance on the internet.
I'm sure I'm in the minority, but the mere suggestion that some purportedly creative "content" (ugh, that word sounds like saying "sausage meat") is AI generated makes me completely lose interest. Soon enough that might make me lose interest in whole swathes of the internet, but I can't deny that's been an ongoing process for a while anyway.
Nothing wrong with AI generated content if it's info I need and it's factually correct.
The problem I have is that it's usually junk fluff pieces with way too much text for SEO optimization.
I feel anything computer generated should be short bullet points or comparison tables, not long form text as it's just not good at that.
For example, think of a blog post where the author walks you through their thought process, trial and error, and ultimate solution. That is very human and relatable, something an LLM can approximate or copy but never genuinely produce. In that case I'd prefer just the raw solution, not a fake and deceptive walkthrough.
Except, translated to other languages with zero effort by the publisher.
I agree with the sentiment, but at the same time commercial drivers will fill the pipes with low-quality content quickly, because $MONEY. The literate, subtle, and artsy are once again buried in commercial content. The only way forward is rebuilding chains of editorial recommendation. Ordinary browsing will become a junk-fest. It appears Pandora's box has been opened once again.
What was really jarring recently was browsing literate, subtle, and artsy content only to be repeatedly interrupted by genuinely offensive and/or borderline-scam ads, then dropped back into the chosen content. Surreal.
Yeah, right now most human-written content isn't that good, either. Quality writing has largely been abandoned in favor of verbose, formal prose that offends no one and often lacks substance or any human touch. I'm guessing AI content will be about as flavorless. But time will tell.
Yeah. It is difficult to see AI supplanting humans for the things you go outside for, but any human involvement on the internet has always just been an implementation detail.
After their recent over-censorship of Gemini, I look forward to Google's fair, even-handed, completely unbiased approach to the delicate topic of race relations and tensions as the world's population slowly seems to descend into more hatred for each other over silly differences. /s
agreed, but i see this as an opportunity for creative people to make art and communicate things that a computer can't. i'm skeptical that an AI can write a great novel about matters of the heart, one that feels alive (not for lack of trying). i think, in the age of generic AI content, some people will seek authenticity even more than we do now.
Creative people hate this trend and don't work that way (see the possible "optimizations" offered by our grateful computer overlords). Source: I know a lot of artists and writers.
The creative classes already trusted this stuff the first round in 2008+, and it ate their lunch.
i'm a writer. the only thing i can do is keep making my art with the insight of the human condition, an insight that an ai can try to mimic but will never be able to replace. and i think that people will try to seek out human created art because they're bored by AI mimicry, they'll want real connection with a piece of art or a novel etc.
the real ones will continue to make their art regardless of market success, a payday, or technological changes
I can really understand the fear of having your paychecks stop. Many artists will continue to be artists, but some found jobs doing what they love and there is a shakeup going on. Everything will settle at some point, but the transition will be rough for many.
you're right. it's hard for me to be sympathetic towards that though, because i chose not to make writing my day job.
but my partner is a journalist and we're not over here freaking out. perpetual lay offs have always been a problem in media during our lifetime. we'll adapt.
So as long as creative people can overturn the momentum of the largest corporations sinking billions of financial and political capital into making AI synonymous with computing, we'll be OK? /s
If the content is actually intended to be informative, then it's a poisoned well unless it's been manually vetted for hallucination. If it's entertainment though, then if it works it works.
We may be in the minority, but there are plenty of people who share our opinion on this.
Whole swaths of the internet became uninteresting to me a long time ago. AI promises to make the rest of it worthless to me as well.
The fact that AI models are being trained on public-facing web content without the consent of the authors has already driven me and others to remove all our content from the public-facing web, as there is no other way to protect ourselves. That eliminates a large part of the value of the web right there.
What strategies did you use? Other than quitting social media, blocking crawlers and enforcing an "email-wall" is what I just started to do with my personal website. I feel that it's hard to stay on top of "pretty please" robots.txt requests, especially with new, undeclared ones. I prefer word-of-mouth diffusion of links, skipping the algorithmic promotion game altogether. I decided that I won't ever rely on my online writings, recordings, or photography as main income again (I did that once: it worked, but it felt gradually poisoned by the need to build engagement.) But it feels weird that we got to the point where I have to shield a blog from bulimic information machines that pollute valuable knowledge, and I wonder what the right strategy is.
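For what it's worth, the "pretty please" approach currently amounts to a robots.txt along these lines. The user-agent tokens below (GPTBot for OpenAI, CCBot for Common Crawl, Google-Extended for Google's AI training) are real published tokens, but as noted above, compliance is entirely voluntary, and undeclared crawlers ignore it:

```
# Ask known AI training crawlers to stay out.
# This is advisory only; enforcement requires server-side blocking.
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```

Which is exactly why an email-wall or word-of-mouth distribution ends up being the only defense with teeth.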
I see no other strategies that could possibly work. All it takes is one crawler to get through or ignore your defenses and your work has been used to help train a model. At that point, the horse is out of the barn. Any partial defense seems the same as no defense to me on this point.
But it depends on what you're trying to protect. My concern isn't actually that my content may get reproduced through an LLM. I just don't want any of my work to be used to develop or improve these models, period.
Best solution I've seen so far is to make the publisher liable for "generative" content, if they can't point to a human creator. Also, prevent any copyright protection unless you can point to a transformative human creator. There might be people "faking" the creation, but then it's not a corp, just a human.
I was thinking just the other day that a recognisable 'AI Free' badge on sites that want to assure people that none of their content is AI generated could gain a lot of traction.
The web has been headed down this road for years. We've all seen reputable blogs sink to publishing "Here are our favourite smart speakers for Xmas", to keep the lights on. What is that but a soulless piece of data-driven sponsored content pushing products with the highest affiliate payout rates?
Google only cares about these types of sites now, because they are the ideal customer for Google Ads. They churn out content constantly, they advertise heavily, with Google getting a cut of every click and impression.
And let's face it, YouTube is basically QVC for electronics, toys and makeup, all multi-billion $ industries that Google is happy to slap ads on top of.
I haven't touched the spice in years, but I foresee some kind of content Butlerian Jihad very soon. It's already started with music: YouTube is filled with AI-generated 'songs' from e.g. Sia, so much that I stopped using YouTube for anything but recordings of live shows, and I guess soon this will be enshittified too. Bye, YouTube.
Maybe very little in the scheme of things but not a 'view' or click from me.
What happens once the stipend goes away? What happens when the tool stops being “free”? I know these publishers are desperate for lifelines at the moment but I hope they are thinking ahead to the time that Google no longer needs them or their content.
More like: what if, long term, Google cuts out the publishers? When someone googles X, just show or generate the article on the spot. They only need the publishers now to get a human feedback loop.
I don't really see how this isn't just content theft at this point. Pointing at "inspiration sites" and just rewording their content feels pretty scummy at best.
At what point are content creators and publishers going to be paid for or given the option to block AI scraping tools using their material so blatantly to generate income for other companies and publishers?
> Pointing at "inspiration sites" and just rewording their content
Sounds like Reddit, minus the AI part. Any time someone might be on the verge of an original thought there it gets shot down with something akin to "[citation needed]", followed by strong social pressure for the user to stop participating if they can't manage just rewording "inspiration sites".
> create aggregated content more efficiently by indexing recently published reports generated by other organizations, like government agencies and neighboring news outlets, and then summarizing and publishing them as a new article.
To give you an optimistic perspective: I don't think it qualifies as pessimism to point out obvious issues with some of the "approaches" currently developing in this field. If enough people do it, maybe something will change for the better?
I don't think this is a new problem. It was solved with copyright laws, which is why you can enjoy original output now. The law just needs to catch up (and hopefully quickly).
Ha Ha! This is the exact sentiment HN had about Meta/Facebook about 18 months ago.
This entire thread is a reminder of how out of touch and naive most HN commentary is when it's outside its very narrow field of expertise.
Facebook usage/image has dropped. Fights with Congress. The average person is moving to other socials. Mark's rumored run for president is dead. The metaverse stalled. But they have plenty of money, ad revenue, and other social networks they control.
IBM is still around. It took Yahoo forever to die. The corpses of AltaVista and Ask Jeeves still make money.
> It’s hard to argue that stealing people’s work supports the mission of the news. This is not adding any new information to the mix.
Not so different from current publications. Most human-written news is also based on repackaging Twitter and press agencies. Few journalists actually gather original content. It should be OK to copy news information from any source without consent; it's factual, not creative, data.
Ironically I had a side project that essentially summarized stories from RSS feeds and reposted them on my own blog. I provided full attribution and links to the original articles.
Applied for AdSense and was denied as a low quality website. Not that I expected to make much anyway.
Now basically the same thing is an official Google project.
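The core of that side project is tiny. Here's a sketch of the same idea using only the standard library; the hardcoded feed stands in for a real fetch (e.g. via urllib), and the naive "first sentence of the description" summary is just an illustrative placeholder for whatever summarization you'd actually use:

```python
import xml.etree.ElementTree as ET

# A hardcoded RSS 2.0 feed standing in for a real network fetch.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>Example Blog</title>
  <item>
    <title>Launch day</title>
    <link>https://example.com/launch</link>
    <description>We shipped version 1.0 today. It took two years.</description>
  </item>
</channel></rss>"""

def summarize_feed(xml_text):
    """Extract each item's title and link, plus a naive one-sentence
    summary taken from the start of its description."""
    root = ET.fromstring(xml_text)
    posts = []
    for item in root.iter("item"):
        desc = item.findtext("description", default="")
        first_sentence = desc.split(". ")[0].rstrip(".") + "."
        posts.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),  # attribution link
            "summary": first_sentence,
        })
    return posts

for post in summarize_feed(SAMPLE_FEED):
    print(f"{post['title']} ({post['link']}): {post['summary']}")
```

Keeping the original link is what made it attribution rather than pure theft, which makes the AdSense rejection next to Google's own version of the same pipeline all the more ironic.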
I could see this going horribly wrong, but honestly a LLM that scrapes local government data sources, police blotters, etc, highlights important or interesting information and makes it understandable sounds pretty great.
Nobody ever noticed.