
ChatGPT runs moderation filters on top of your conversation and will highlight prompts or responses in red if it thinks you're breaking the TOS. The highlight is accompanied by some text saying you can submit feedback if you think the moderation is in error. It's not very hard to trigger: for example, I've gotten the red label just for asking the AI about the lyrics of a rap song with explicit content.
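For the curious: OpenAI also exposes a standalone moderation endpoint, presumably the same kind of classifier that flags ChatGPT messages. A minimal sketch of querying it over plain HTTP (endpoint and response shape are from OpenAI's public docs; the input string is just a placeholder):

    import os
    import requests

    # Ask OpenAI's moderation endpoint to classify a piece of text.
    # Requires an API key in the OPENAI_API_KEY environment variable.
    resp = requests.post(
        "https://api.openai.com/v1/moderations",
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        json={"input": "explicit rap lyrics would go here"},
        timeout=10,
    )
    resp.raise_for_status()
    result = resp.json()["results"][0]

    print("flagged:", result["flagged"])
    # Per-category booleans ("sexual", "violence", "hate", ...).
    for category, hit in result["categories"].items():
        if hit:
            print("tripped category:", category)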

It's interesting to compare ChatGPT moderation to Bing's. When Bing generates a "bad" response, Bing will actually delete the generated text instead of just highlighting it red, replacing the offending response with some generic "Let's change the topic" text. The Bing bot can also end a conversation entirely if it's a topic it doesn't like, which ChatGPT doesn't seem to be able to do.




>When Bing generates a "bad" response, Bing will actually delete the generated text instead of just highlighting it red, replacing the offending response with some generic "Let's change the topic" text.

It deletes in more cases than that. The last time I tried the Bing bot, it started writing code when I asked for it, then deleted it and wrote something else.

OpenAI is going for mass RLHF feedback, so they might feel the need to scold users who have no-no thoughts, and potentially use their feedback in a modified way (e.g. invert the ratings of users they think are bad actors). Whereas Microsoft doesn't really care and just wants to forget it happened (and after Tay, I can't say I blame them).
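To make the "invert their ratings" idea concrete, here's a purely hypothetical sketch; the trust score, the cutoff, and the inversion rule are all invented for illustration, not anything OpenAI has described:

    from dataclasses import dataclass

    @dataclass
    class Feedback:
        user_trust: float  # hypothetical 0..1 estimate of the rater's good faith
        rating: int        # +1 thumbs-up, -1 thumbs-down

    def adjusted_rating(fb: Feedback, distrust_cutoff: float = 0.2) -> float:
        # Flip ratings from suspected bad actors; otherwise weight the
        # rating by how much the rater is trusted.
        if fb.user_trust < distrust_cutoff:
            return -fb.rating  # a suspected bad actor's +1 counts against
        return fb.user_trust * fb.rating

    # An untrusted rater's thumbs-up becomes a negative training signal:
    print(adjusted_rating(Feedback(user_trust=0.1, rating=+1)))  # -1
    print(adjusted_rating(Feedback(user_trust=0.9, rating=+1)))  # 0.9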


> The Bing bot can also end a conversation entirely if it's a topic it doesn't like, which ChatGPT doesn't seem to be able to do.

I think Microsoft's approach is less advanced here. ChatGPT doesn't need to send an end-of-conversation token; it can just avoid conflicts and decline requests. Bing couldn't really do that before it got lobotomized (i.e. prompted to end the conversation when under stress or in disagreement with the user), as its threats against journalists showed. Microsoft relies much more on system-prompt engineering than OpenAI, who seem to restrict themselves to more robust fine-tuning like RLHF.
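Mechanically, an end-of-conversation token is simple on the client side. A hypothetical sketch (the token string and the generate() stub are invented; Microsoft hasn't published Bing's actual mechanism):

    END_OF_CONVERSATION = "<|end_conversation|>"

    def generate(history: list[str]) -> str:
        # Stand-in for the real model call.
        return "I'd prefer not to continue this conversation. " + END_OF_CONVERSATION

    def chat_turn(history: list[str], user_msg: str) -> str | None:
        history.append(user_msg)
        reply = generate(history)
        if END_OF_CONVERSATION in reply:
            # The model itself chose to terminate; the UI locks the input box.
            return None
        history.append(reply)
        return reply

    print(chat_turn([], "You're wrong and I can prove it."))  # None: chat is over

The point being that the model emits the token itself, so ending the chat is a learned (or prompted) behavior rather than an external filter.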

By the way, the ChatGPT moderation filter can also delete entire messages; at least it did that sometimes when I tried it out last year. Red probably means "medium alert", deleted "high alert".
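That would match a simple two-tier policy over the moderation scores. A hypothetical sketch (the 0.5/0.9 cutoffs are invented; max_score would be the largest category score returned by a moderation classifier):

    def moderation_action(max_score: float) -> str:
        # Hypothetical two-tier policy matching the observed behavior.
        if max_score >= 0.9:
            return "delete"     # high alert: remove the message entirely
        if max_score >= 0.5:
            return "highlight"  # medium alert: show it with a red warning
        return "allow"

    # e.g. max_score = max(result["category_scores"].values())
    print(moderation_action(0.95))  # delete
    print(moderation_action(0.60))  # highlight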

