I agree with most of this, but I do disagree with the point about producing specific imagery. It's absolutely a skill one can develop. I spend a lot of time helping people learn to simplify their prompts and choose the right language for image generation AIs. For some reason people put a lot of unnecessary junk into them, I guess out of a form of superstition ("this sentence fragment worked well the last few times").
As the article mentions, the hybrid approach (using this as one tool in a chain of other tools) is the way forward.
There are concepts the AI simply will not grasp. For example, right now Midjourney struggles badly with "bulldozer", "centaur", "fantasy archer", etc. These will inevitably be fixed in new model versions with better training data, as similar gaps have been in the past.
The real struggle comes with either small details or semantic information. For example, it's hard to get it to make a lifelike, photographic scene with everything, including the background, in focus, even with "focus stacking"-type keywords. "Selfie" is about the best word we came up with, but unfortunately that has significant side effects lol. Perhaps there just aren't enough instances of people specifically describing that property in the training data, but honestly it's difficult even to learn the English words to describe these concepts!
As for small details, it is indeed true that the current approach will probably never scale to handle something like "six blue cubes with a red triangle on each, arranged in a pyramid shape, with a yellow ball balanced on top". But as the author points out, such things will likely be handled with a minimum of Photoshop skill, using assets generated individually.
None of which has anything to do with creativity or the original (and visually trained) thought required to conceive of imagery that's commercially useful - which is an actual skill learned through years of study and experience, and which is so routinely ignored by managers and IT people that you completely failed to mention it in your take on the technical issues with prompts.
The issue with prompts is not stacked cubes. It's more like this: Ask 10 software engineers or 10 people from the sales department or 3 people from upper management to come up with visual ideas for ads, and you will have a bunch of shit on black backgrounds, robots, anime, bad copies of things people have seen and subconsciously remember, and zero actual visual ideas that fundamentally work. Designers and illustrators have to fight against and override their unoriginality and terrible ideas all the fucking time just to make a decent product.
That's some really unwarranted hostility. I'm responding to a specific point from the article? We can't just go in circles having an "is it good or isn't it" argument...
Once again, I agree with most of what the author says, including the part about it being a tool in an illustrator's kit.
I mean, I've spent 25+ years as an art director, a.k.a. a diffusion model trying to generate what managers and salespeople think they see in their heads, and I tell you, they have no imagination. None whatsoever. As my brother, a photographer, used to say: The problem isn't having a cheap camera, it's who's behind it.
Also, the hostility towards a prompt expert adding another layer of technical "know how" into the process between requests and art in the name of justifying a new job title is entirely warranted.
Whatever, man. I'm not here to carve out a job title, or whatever it is you're accusing me of. I commented on a post to contribute my experience of helping others.
I feel like you're projecting a whole lot more onto me than what I'm actually saying.
>Also, the hostility towards a prompt expert adding another layer of technical "know how" into the process... is entirely warranted
I don't know either of you and I have no stake in this but as an outside observer I think you come off pretty unreasonable here still. You seem to think your hostility was justified because you've basically made this person the scapegoat for your frustration about this topic.
Also an outsider - I don't understand how they're being hostile here.
They're relaying their experience and saying that there's more to creating than describing something to be drawn, and that most people lack the training and knowledge of what goes into that.
It's not about learning prompts, it's about learning how to actually design... and then learning prompts.
I feel like the original commenter is taking things personally instead of seeing the point they're making through the example of their frustration working with others.
Interesting! From my view the responder came off as attacking unnecessarily and very angry. Honestly not sure how else to relay it though. Maybe it's all just how I'm reading it. Have a great one!
There are new models, such as those from Google, that work differently and handle things like counting much better than open-source models. I don't know if any of them are available yet, but there are papers on them, like Imagen and something better that came out afterwards.