
>In 10 years or so, we will have the tech to fingerprint anyone just by their writing style (some companies even claim to be able to do that already today).

...given a large enough corpus of text. I doubt you'd be able to pick up much from a dozen one-sentence replies on Reddit, for instance. You could even use some sort of neural-network-based sentence transformer to scramble the structure without significantly altering the meaning.
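
For the curious, the scrambling idea is only a few lines with off-the-shelf tooling. A minimal sketch, assuming a T5-style paraphrase model from the Hugging Face hub (the model name is illustrative, not a recommendation, and whether this actually defeats stylometry is an open question):

  # Sketch: rewrite each sentence with a seq2seq paraphraser so the surface
  # style changes while the meaning (mostly) survives. The model choice is
  # an assumption; any T5-style paraphraser with a "paraphrase:" task
  # prefix would work the same way.
  from transformers import pipeline

  paraphraser = pipeline(
      "text2text-generation",
      model="humarin/chatgpt_paraphraser_on_T5_base",
  )

  def scramble(text: str) -> str:
      sentences = [s.strip() for s in text.split(".") if s.strip()]
      rewritten = []
      for s in sentences:
          out = paraphraser(
              "paraphrase: " + s,
              max_length=64,
              do_sample=True,   # sampling injects stylistic variation
              top_p=0.9,
          )
          rewritten.append(out[0]["generated_text"])
      return " ".join(rewritten)

  print(scramble("I doubt stylometry works on short comments. The corpus is too small."))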




> Could you dynamically change the register or tone of text depending on audience, or the reading age, or dial up the formality or subjective examples or mentions of wildlife, depending on the psychological fingerprint of the reader or listener?

This seems plausible, and amazing or terrible depending on the application.

An amazing application would be textbooks that adapt to use examples, analogies, pacing, etc. that enhance the reader’s engagement and understanding.

An unfortunate application would be mapping which features are persuasive to individual users for hyper-targeted advertising and propaganda.

A terrible application would be tracking latent political dissent to punish people for thought-crime.


> in a way that is often identifiable (by humans) as not having been written by humans.

You should check out Reddit sometime. It's been nearly twenty years (not hyperbole) of everyone accusing everyone else of being a bot/shill. Humans are utterly incapable of detecting such things. They're not even capable of recognizing Nigerian-prince emails as scams.

> not because AI writing is reliably passable.

"Newspaper editor" used to be a job because human writing isn't reliably passable. I say this not to be glib, but rather because sometimes it's easy for me to forget that. I have to keep reminding myself.

Also, has it not occurred to anyone that deep down in the brainmeat, humans might actually be employing some sort of organic LLM when they engage in writing? That technology actually managed to imitate that faculty at some low level? So even when a human really writes something, it's still an LLM doing so? When you type in the replies to me, are you not trying to figure out what the next word or sentence should be? If you screw it up and rearrange phrases and sentences, are you not doing what the LLM does in some way?


Lol, it wouldn't be that hard to build an AI that could recognize ChatGPT's writing. I mean, it's not like ChatGPT is producing some super unique and creative language or anything. It's just spitting out the same old generic responses to prompts. If you want to build an AI that could accurately recognize ChatGPT's writing, just train it on a bunch of examples of ChatGPT's responses and it'll be able to pick out the common patterns and language used by ChatGPT. Easy peasy.

(Prompt: Respond to the above in the style of 4chan, but use punctuation.)

Actual opinion: I think there's a good chance of being able to recognize ChatGPT's writing in most cases, given enough training data, despite the possible styling variations. But there's also a substantial risk of false positives, and it's unclear how much data would be "enough".
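
The "just train it on a bunch of examples" approach really is that short with off-the-shelf tools. A minimal sketch with scikit-learn and a toy corpus (in practice you'd need thousands of labeled samples you collected yourself, and the false-positive problem doesn't go away):

  # Sketch: bag-of-words classifier for "ChatGPT vs. human" text.
  # The four sample strings are invented toy data, not a real dataset.
  from sklearn.feature_extraction.text import TfidfVectorizer
  from sklearn.linear_model import LogisticRegression
  from sklearn.pipeline import make_pipeline

  human_texts = ["nah, this take is wrong imo", "lol been there, shipped that bug"]
  gpt_texts = [
      "Overall, this is a great idea and could be a valuable tool.",
      "In conclusion, there are several important factors to consider.",
  ]
  X = human_texts + gpt_texts
  y = [0] * len(human_texts) + [1] * len(gpt_texts)  # 0 = human, 1 = ChatGPT

  clf = make_pipeline(
      TfidfVectorizer(ngram_range=(1, 2)),  # word + bigram features
      LogisticRegression(),
  )
  clf.fit(X, y)
  print(clf.predict(["Overall, I think building a detector is a great idea."]))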


> imagine not being able to tell if this comment was written by a human or robot

I think neural nets could help find fake news and factual mistakes. Then it wouldn't matter who wrote something, as long as it is helpful and true.


> Overall, I think building a ChatGPT detector is a great idea, and I'm confident that with the right approach, it could be a valuable tool for anyone who uses chat platforms.

I think that hollowly summing up and reiterating the point of the whole text in the last sentence might be a good signal for differentiating OpenAI's models from humans. The AI seems to do that in nearly all of its creative responses.
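
That signal is cheap to operationalize, too. A rough sketch of the "hollow summary" heuristic: flag texts whose final sentence mostly recycles the vocabulary of everything before it (the 0.5 threshold is invented, not calibrated):

  # Sketch: if the last sentence has high lexical overlap with the rest
  # of the text, treat it as a hollow restatement.
  from sklearn.feature_extraction.text import TfidfVectorizer
  from sklearn.metrics.pairwise import cosine_similarity

  def looks_like_hollow_summary(text: str, threshold: float = 0.5) -> bool:
      sentences = [s.strip() for s in text.split(".") if s.strip()]
      if len(sentences) < 3:
          return False
      body, last = ". ".join(sentences[:-1]), sentences[-1]
      vecs = TfidfVectorizer().fit_transform([body, last])
      return cosine_similarity(vecs[0], vecs[1])[0, 0] > threshold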


> People with large vocabularies and advanced language expression capabilities will be discriminated against in the not-so-far future, because their writing looks like it was written by AI. So the recipient of such a text (full of non-ordinary wording and complicated grammar) might think it's written by AI and, hence, not take the sender/info seriously anymore.

I had this exact problem last year when one of my lecturers kept trying to accuse me of using AI to write my assignments. Worst experience ever.


> While I think your situation is unique

It is possible that handwriting recognition is not an attractive target for AI research because few people handwrite things anymore. I heard that cursive is not even taught anymore!

That leads one to suspect that one can keep communication private simply by writing in cursive!


> It is always easier to troll and derail than it is to employ such techniques for good.

Curious: what possible good comes from the ability to generate grammatically correct text devoid of actual meaning?

Sure, you could generate an arbitrary essay, but it's less an essay about anything and more just an arrangement of words that happen to statistically relate to each other. Markov chains already do this and while the output looks technically correct, you're not going to learn or interpret anything from it.
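
To illustrate the point: a word-level Markov chain is maybe fifteen lines, and its output is locally plausible while saying nothing.

  # Sketch: a word-level Markov chain. Locally grammatical-ish output,
  # zero meaning -- which is exactly the point being made above.
  import random
  from collections import defaultdict

  def build_chain(corpus: str) -> dict[str, list[str]]:
      words = corpus.split()
      chain = defaultdict(list)
      for a, b in zip(words, words[1:]):
          chain[a].append(b)  # record every observed successor
      return chain

  def babble(chain: dict[str, list[str]], length: int = 20) -> str:
      word = random.choice(list(chain))
      out = [word]
      for _ in range(length):
          if word not in chain:
              break
          word = random.choice(chain[word])  # pick a successor at random
          out.append(word)
      return " ".join(out)

  chain = build_chain("the cat sat on the mat and the dog sat on the cat")
  print(babble(chain))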

Same goes for things like autocomplete. You could generate an entire passage just by accepting its suggestions. It would pass a grammar test but wouldn't mean anything.

Chatbots are an obvious application, but how is that "good," or any different than the Harlow experiment performed on humans? Fooling people into developing social relationships with an algorithm (however short or trivial) is cruel and unethical beyond belief.

A photoshopped image might be pleasant or stimulating to look at, and does have known problems in terms of normalizing a fictitious reality. But fake text? What non-malicious use can there possibly be?


> Would you be fine with a future where all written text on the internet, everywhere, was generated by AI? Do you not see any problems with that?

If it makes the discussions more poignant and concise, why not?

Do you also walk everywhere? Or do you use a vehicle? When a tool makes something better, there's no reason not to use it. A reply being written by an AI doesn't make it less of a reply - you can judge replies objectively rather than by where they're sourced.


>dang:

>It's definitely possible - I think the more content you have, the better the AI is at picking up on your writing style. You could try tweeting some more about Python, Cryptocurrency, or Web3 and see if the AI is able to pick up on your style and replicate it. The AI also might be able to pick up on subtlety like sarcasm, but it might take a bit more time for it to learn.


> Who knows what will happen if we plug it into a body with senses, limbs, and reproductive capabilities

I would imagine that its layers will be far too occupied parsing constant flows of sensory information to transform corpora of text and prompts into speedy and polite text replies, never mind acquiring the urge to reproduce by reasoning from first principles about the text.

The test's quite unfair the other way round, too. Most humans don't get to parse the entire canon of Western thought and Reddit before being asked to pattern-match human conversation, never mind before having any semblance of agency...

Maybe we're just... different.


>I imagine if you had enough realtime keystroke data, you could even identify a user using nothing more than how they type on a keyboard. That's some scary stuff.

This was on HN a while ago! https://news.ycombinator.com/item?id=9973329
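
The core idea behind keystroke fingerprinting is simple enough to sketch: represent a typing sample as average inter-key delays per key pair, then match profiles by distance. The timings below are invented for illustration:

  # Sketch: keystroke-dynamics matching via per-digraph timing profiles.
  from collections import defaultdict
  import math

  def digraph_profile(events: list[tuple[str, float]]) -> dict[str, float]:
      """events: (key, timestamp_ms) pairs in typing order."""
      gaps = defaultdict(list)
      for (k1, t1), (k2, t2) in zip(events, events[1:]):
          gaps[k1 + k2].append(t2 - t1)  # delay between consecutive keys
      return {dg: sum(v) / len(v) for dg, v in gaps.items()}

  def distance(p: dict[str, float], q: dict[str, float]) -> float:
      shared = p.keys() & q.keys()  # compare only digraphs both samples contain
      if not shared:
          return float("inf")
      return math.sqrt(sum((p[d] - q[d]) ** 2 for d in shared) / len(shared))

  alice = digraph_profile([("t", 0), ("h", 95), ("e", 180)])
  sample = digraph_profile([("t", 0), ("h", 102), ("e", 190)])
  print("distance to alice:", distance(alice, sample))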


>I have no doubt that it can easily extract meaning and intent from text.

Have you met humans? People do not supply complete or self-consistent information on what their goals are. They do not form objections to the output of a program based on an accurate and complete model of it, either.

Also, they hate to communicate via text - how many times have you heard "ugh, let's discuss it on a call"?

But that does not mean you can BS them endlessly. The fact that people have no idea about the technical details doesn't mean they are going to accept failure.

I'd like to see an AI that can dominate https://en.wikipedia.org/wiki/Nomic


> how can the likes of Turnitin claim to be an authority for AI writing detection

Pretty easy - they lie to people.


> We are approaching a world where people will not want to invest their time, energy, and emotion engaging with other supposed humans remotely unless they have verified their personhood / identity.

We were past it years ago in some domains.

I used to do a lot of language exchanges, and there is/was a need to screen for people who are solely using machine translation. It's pointless correcting someone using machine-generated output.


> simple things like handwriting analysis and voice recognition

I honestly can't tell if you actually believe what you are saying, or are just making a really badly told joke.


> Answer the question, "How would you build a ChatGPT detector?" written in the style of a cynical Hacker News commenter

Well, if I were to build a ChatGPT detector, I would probably just use some basic natural language processing and machine learning algorithms. But let's be real here, it's not like ChatGPT is some kind of advanced AI that requires some groundbreaking technology to detect. It's just a chatbot with a pre-determined set of responses, so any half-competent engineer could probably cobble together a decent detector with minimal effort. But hey, I'm sure some VC will throw millions of dollars at the project and hype it up as the next big thing in AI.
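
Snark aside, the "basic NLP" signal that real detectors (GPTZero et al.) reportedly lean on is perplexity under a language model: machine-written text tends to be text a language model finds unsurprising. A minimal sketch with GPT-2 (the cutoff of 40 is invented, not calibrated, and this signal is known to be noisy):

  # Sketch: perplexity under GPT-2 as a crude "machine-written?" signal.
  # Lower perplexity = the model finds the text more predictable.
  import torch
  from transformers import GPT2LMHeadModel, GPT2TokenizerFast

  tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
  model = GPT2LMHeadModel.from_pretrained("gpt2")
  model.eval()

  def perplexity(text: str) -> float:
      ids = tokenizer(text, return_tensors="pt").input_ids
      with torch.no_grad():
          loss = model(ids, labels=ids).loss  # mean cross-entropy per token
      return torch.exp(loss).item()

  p = perplexity("Overall, I think building a ChatGPT detector is a great idea.")
  print(f"perplexity: {p:.1f}", "(suspiciously fluent?)" if p < 40.0 else "")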


>What I need is natural natural language processing to become so good that it just works.

That's supernatural language processing. You're asking the system to recognize a thought you have in your mind that you can't effectively communicate with words.

The loud keyboard - makes sense. But the company name? My first thought was IBM or Cherry.

I think it can be improved, but I'm never going to expect voice recognition to fill in the blanks for something I can't describe with my voice.


> And especially once AIs are drawing from the writing of other AIs, which themselves are quoting AI (dark, I know), it might become quite difficult to detect.

No, it will be easier to detect: the new AIs will learn to mimic earlier generations of AI - faults and all - rather than human writing, because that scores better on the loss function.

I'm going to assume Dead Internet theory - aka most text online is spambots. It's not strictly true, as humans obstinately continue to use the Internet and pour content into it. But the Internet puts those humans on a level playing field with bots. And AI is basically perfect spambot material - it looks superficially different each time, meaning you can't remove it with simple patterns. So the training set, which is just scraped off the Internet, will be primarily composed of AI-generated material from less sophisticated systems. Human-written content will be a minority of the training set, meaning that no matter how good the optimizer or model architecture is, the system will spend most of its time learning how to sound like a bad AI.

