
I still find it funny that we managed to get image generation working so much better than text generation.

If you care about veracity, then image generation works about as well as text. Frequently you can find details of the image that are just bizarrely wrong, such as hands or food or other basic things. It's the same basic problem: there's no intelligence behind what it's doing; it just regurgitates mostly realistic-seeming pixels that are pretty good at fooling the casual viewer.

Really, it's like those moths with eyespots on them: good at fooling the brain's heuristics but obviously not real.




I think image generation is an interesting case, because even if a human is always in the loop, and you need to try several times before you get a good image for your prompt of interest, that's likely still faster and cheaper than photoshopping exactly what you want (or certainly faster than hiring an illustrator). And the images produced are sometimes really quite good. A model which produces some amount of really messed up images can still be 'useful'.
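
To make this concrete, here's a minimal sketch of the "generate several, keep the first acceptable one" loop I mean. generate_image and looks_ok are hypothetical stand-ins, not any particular API:

    def generate_image(prompt: str):
        """Hypothetical stand-in for a text-to-image API call."""
        raise NotImplementedError

    def looks_ok(image) -> bool:
        """Hypothetical stand-in for the human-in-the-loop quality check."""
        raise NotImplementedError

    def best_of_n(prompt: str, n: int = 4):
        # Even a model that produces some messed-up images is useful
        # if one out of n attempts is good enough to keep.
        candidates = [generate_image(prompt) for _ in range(n)]
        keepers = [img for img in candidates if looks_ok(img)]
        return keepers[0] if keepers else None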

_However_ the kinds of failures it makes highlight that these models still lack basic background knowledge. I'm willing to let the stuff about compositionality slide -- that's asking kind of a lot. But I do draw a very straight line from DeepDream in 2015 producing worm-dogs with too many legs and eyes, through StyleGAN artifacts where the physical relationship between a person's face or clothing and the surroundings was messed up, to the freakish broken bodies that Stable Diffusion sometimes creates. Knowing about the structure of _images_ only tells you a limited amount about the structure of things that occupy 3D space, apparently. It knows what a John Singer Sargent portrait looks like, but it's not totally sure that humans have the same number of arms when they're hugging as when they're not.

In the same way, large language models know what text looks like, but not facticity.

So I don't know that an AI winter is called for. But maybe we should lean away from the optimistic assumption that we can keep getting better models by training on the kinds of data that are easiest to scrape?


Seems like that is another kind of labeling to me: is our generated image good enough to fool a human?

A very interesting take, but I'm not sure I agree. First, the post develops the idea using image generation, but then applies it to all "generative AI" without any argument in the post to support that extension.

As for images, this was already possible with good photo-editing skills, or even with a good backstory for a fuzzy photo (UFOs, the Loch Ness Monster, etc.). If anything, I am glad that it is even easier now with generative AI: I hope it forces more people to think through novel information and decide for themselves whether they accept it, or whether they want to research further. This is something we would do after reading a blog post, for example, so why should images be treated differently? We are too used to treating photos as proof or fact.


From what I've seen, AI-generated images tend to be locally consistent but holistically inconsistent.

Which works, because most people on the internet tend to be detail-oblivious!


AI image generators fail spectacularly at many kinds of prompts. Is this really remarkable?

I don't want to be overly pessimistic, but I have a strange feeling about AI image and text generators. Who can prove that they actually do what they claim?

I have a conspiracy theory: somebody found a way to quickly generate pictures in this style in a semi-manual fashion, and hired a bunch of people to fake this, Mechanical Turk style. I have never found an interactive generator that completely convinced me, neither for images nor for text. Either the results are really bad, or the responses take so long that a person could have made them.

But why would somebody fake something like this? There is a lot of money in AI. It seems totally plausible that someone would invest five or six figures to set up a fake demonstration in order to raise millions in investment.

Now, I want to make clear that while I get a big Theranos vibe from these kinds of demos, I don't want to accuse anybody concretely. I think a lot is technically possible and I do believe a lot of the research is real, but I also believe some demos are in "fake it till you make it" territory.


It's an image generator, so relying on it for any "factual" images is misguided.

Potentially, but I don't think so in practice, because most image generation is not trying to be photorealistic. Most models are trying to stylize their outputs, either by default (e.g. a Miyazaki-anime-like model) or in response to the prompt.

Using AI image generation for misinformation accounts for only a small minority of its use.

And if the output is stylized, it's going to be much more obvious that it has come from an AI model, because a human is going to have a much harder time reproducing a specific ML model's style on demand (e.g. if their art teacher asks them to sketch a face in the same style, to prove their homework wasn't faked).


I assume most images and videos on the internet are doctored, and have been since the invention of Photoshop.

I don't feel the need to check for reality. I've seen screenshots from games that look more realistic than some pictures I've taken (Forza on max settings at the right angle might as well be a photograph) and this trend will only continue.

This tool can be considered more of an automated version of meme communities, where dedicated members will spend an hour photoshopping muppets into historic events and making the pictures look absolutely believable. The only novelty I see is that the computer now does a lot (but not all) of the work for you.

There are nice opportunities here. If you need a stock photo of something very specific, you'll soon be able to generate one with the right query and the right AI. Small companies can generate fancy branding without paying a professional designer's fees, especially if all they need is a billboard and not a whole suite of office supplies. You can generate your own posters and decorations featuring interesting landscapes and scenes in any style you want.

The current iterations of these algorithms are quite limited in many aspects and sometimes uncanny or even horrifying, but I look forward to a future where I can imagine something, describe it, and have it rendered into digital art just like I pictured, without having to spend decades on honing my skills as an artist.


I'm not surprised that the generated imagery was sourced from already existing content. What is alarming is that it seems the AI simply regurgitated a source image almost unretouched! Where's the intelligence?

Reminds me of how AI image generators draw stuff slightly wrong.

Yeah, that's the unfortunate side effect of generated images: there's no comprehension behind the generation on the AI's side of things. People rarely come out correctly right now, but animals tend to do a bit better. Inanimate objects are mostly usable.

This is more of a proof of concept for us, and I'm trying to solve for some of this by adding voting and improved prompts on the back end. I'm hopeful that in a few weeks we'll have a much improved version out with fewer "horror show" images overall.
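
For the curious, the voting part is nothing fancy -- roughly this kind of score-and-filter logic (a sketch with illustrative names, not our actual back end):

    from collections import defaultdict

    votes = defaultdict(int)  # image_id -> net score from user votes

    def record_vote(image_id: str, upvote: bool) -> None:
        votes[image_id] += 1 if upvote else -1

    def worth_showing(image_id: str, threshold: int = 0) -> bool:
        # Images voted below the threshold stop being resurfaced, which
        # should filter out most of the "horror show" results over time.
        return votes[image_id] >= threshold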


I think the root problem is assuming that these generated images are representations of anything.

Nobody should.

They’re literally semi-random graphic artifacts that we humans give 100% of the meaning to.


Honestly, I find the pictures it generates very boring to look at; in fact, the original picture has a certain cute quality to it, despite being terse.

I think this applies to all AI generated images though.


No snark intended, but I think image generation is probably the least interesting thing AI is doing these days. The interesting thing is that AI is a tutor in a box.

There's a quality spectrum of AI-generated images, sure, but they're all equally artistically void.

Is it really "better" if the image is simulated nonsense? Better would be conceding that nothing is truly decipherable.

That's not how the AI works. It also ignores all the work from the language model that goes into the art. The language model can fill massive gaps in the image generation.

All the negative examples are also instructing the AI how to make an image, not just the most similar images.
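
Concretely, with classifier-free guidance the negative prompt steers the denoiser at every step; it isn't a lookup of similar images. A minimal sketch (assuming a Stable-Diffusion-style UNet and text-encoder embeddings; the call shape follows the common diffusers convention, but the names are illustrative):

    def guided_noise_prediction(unet, latents, t, cond_emb, neg_emb,
                                guidance_scale=7.5):
        # Predict noise under both the positive- and the negative-prompt
        # conditioning (cond_emb / neg_emb are text-encoder embeddings).
        noise_cond = unet(latents, t, encoder_hidden_states=cond_emb).sample
        noise_neg = unet(latents, t, encoder_hidden_states=neg_emb).sample
        # Push the prediction toward the positive prompt and away from
        # the negative one; the scale controls how hard it pushes.
        return noise_neg + guidance_scale * (noise_cond - noise_neg)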

This is a bad joke that reinforces a poor understanding of how image generation works.


I think it's not just that they're good enough that it's worthwhile discussing how they fail. It's also that they fail in the same way that has been the weak point of AI-generated images for years.

It's great and I love it, but the wow factor is fading and the easily recognizable problems remain.

