I would also like to add an algorithm to generate those images. Maybe just randomly select nouns for the title, but pass augmented/cleaned versions of the title to some generator + CLIP to autogenerate fitting images.
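A rough sketch of how that pipeline might be wired together, in Python. Everything named here is hypothetical: the `NOUNS` list, the prompt template in `clean_prompt`, and the `generate` and `score` callables, which stand in for whatever image generator and CLIP-style text/image similarity scorer you end up using. It only shows the control flow (pick nouns, clean the title, generate a few candidates, keep the best-scoring one), not a real model integration.

```python
import random
import re
from typing import Any, Callable, Tuple

# Hypothetical word list; a real setup would draw on a much larger noun corpus.
NOUNS = ["lantern", "harbor", "orchard", "comet", "violin", "glacier"]


def random_title(rng: random.Random, n_words: int = 2) -> str:
    """Pick distinct nouns at random and join them into a title."""
    return " ".join(word.capitalize() for word in rng.sample(NOUNS, n_words))


def clean_prompt(title: str) -> str:
    """Strip punctuation, lowercase, and wrap the title in a prompt template."""
    words = re.sub(r"[^a-z0-9 ]+", " ", title.lower()).split()
    return "an illustration of " + " and ".join(words)


def best_image(
    prompt: str,
    generate: Callable[[str, int], Any],  # assumed: (prompt, seed) -> image
    score: Callable[[str, Any], float],   # assumed: CLIP-style similarity
    n_candidates: int = 4,
) -> Tuple[float, Any]:
    """Generate several candidates and return (score, image) for the best one."""
    candidates = [generate(prompt, seed) for seed in range(n_candidates)]
    scored = [(score(prompt, image), image) for image in candidates]
    return max(scored, key=lambda pair: pair[0])
```

In use, you would call `best_image(clean_prompt(random_title(random.Random())), generate, score)` with your own two callables plugged in.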
You can poison all your images with Glaze and Nightshade. Then you don't have to stop them from using them - they have to stop themselves from using them or their image generator will be useless. I don't know if there's a comparable system for text. If there were, it would probably be noticeable to humans.
You simply have a tool which extracts the text, transforms it however you please, and then puts it back as an image.
Then you still need some way to handle text, and image to text is not reliable.
Text simply has so much more distilled information. Images are nice for humans, but I can't imagine them as a storage format for programs.
I'd like to find a way to start with an embedding and have the computer generate some text that corresponds, at least approximately. There are tools that do that for images, right? Like Stable Diffusion, you can put an image in, get an embedding, then do gradient descent in latent space to find a new embedding, then generate a new image from that.
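For text, one catch is that words are discrete, so you can't take gradient steps on them directly the way you can on a continuous latent. A simple stand-in is to search: keep adding whichever word moves the text's embedding closest to the target. The sketch below is a toy under loud assumptions: `embed` is a made-up hash-based bag-of-words embedding, not a real model, and `invert` is a plain greedy search, not gradient descent. With a real encoder you would swap in its embedding function and a much larger vocabulary.

```python
import hashlib
import math
from typing import List, Sequence

DIM = 16  # toy embedding size (assumption)


def word_vector(word: str) -> List[float]:
    """Toy per-word vector derived from a hash; stands in for a real encoder."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    return [b / 127.5 - 1.0 for b in digest[:DIM]]


def embed(words: Sequence[str]) -> List[float]:
    """Toy text embedding: the sum of the word vectors (bag of words)."""
    vec = [0.0] * DIM
    for word in words:
        for i, x in enumerate(word_vector(word)):
            vec[i] += x
    return vec


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def invert(target: Sequence[float], vocab: Sequence[str], max_words: int = 5) -> List[str]:
    """Greedily add the word that most improves similarity to the target."""
    chosen: List[str] = []
    best = -1.0
    for _ in range(max_words):
        candidates = [(cosine(embed(chosen + [w]), target), w)
                      for w in vocab if w not in chosen]
        if not candidates:
            break
        score, word = max(candidates)
        if score <= best:  # no remaining word improves the match; stop
            break
        chosen.append(word)
        best = score
    return chosen
```

Calling `invert(embed(["cat", "sat"]), vocab)` returns a list of vocabulary words whose toy embedding approximates the target. Greedy search gives no guarantee of recovering the original words, which matches the "at least approximately" goal.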