Stable Diffusion Textual Inversion (github.com)
442 points by antman | 2022-08-29 16:08:00 | 169 comments




And in the blink of an eye, the career potential of every aspiring "Sr. Prompt Engineer" vanished into the whirlpool of automatable tasks.

On a more serious note, this opens up the door to exploring fixed points of txt2img->img2txt and img2txt -> txt2img. It may open the door to more model interpretability.
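To make that concrete, here is a toy sketch of iterating the loop until the caption stops changing. `txt2img` and `img2txt` are hypothetical callables standing in for whatever generation and captioning models you plug in; this is a sketch of the idea, not a claim about the linked project.

```python
# Toy sketch: search for a fixed point of the txt2img -> img2txt loop.
# `txt2img` and `img2txt` are hypothetical callables wrapping whatever
# generation and captioning models you have available.
def find_fixed_point(prompt, txt2img, img2txt, max_iters=10):
    image = None
    for _ in range(max_iters):
        image = txt2img(prompt)        # render the current prompt
        caption = img2txt(image)       # caption the rendered image
        if caption == prompt:          # the loop has stabilized
            break
        prompt = caption
    return prompt, image
```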


ELI5 - why has there been a cavalcade of Stable Diffusion spam on HN recently? What does it all mean?

You can now fire artists/designers and replace them with AI. Obviously, that's cheaper.

Someone will surely come by soon and tell us, “well actually… artists and graphic designers are irreplaceable.”

But for real, plenty of people are going to start rolling their own art and skipping the artist. Not Coca-Cola, but small to medium businesses doing a brochure or PowerPoint? Sure!


I think there's going to be plenty of work in stacking multiple AI prompts or manual retouching to fix rough spots. It automates a task, not a job. Some people won't use it at all and other people will use it only for reference - in the end doing everything by hand, as usual, because they have more control and because AI art has a specific smell to it and people will associate it with cheap.

But it's not just for art and design, it has uses in brainstorming, planning, and just to visualise your ideas and extend your imagination. It's a bicycle for the mind. People will eat it up, old copyrights and jobs be damned. It's a cyborg moment when we extend our minds with AI and it feels great. By the end of the decade we'll have mature models for all modalities. We'll extend our minds in many ways, and applications will be countless. There's going to be a lot of work created around it.


>AI art has a specific smell to it and people will associate it with cheap.

It might now, but I feel like that will be trained out of it a few more papers down the line


People rarely write assembly code nowadays because we mostly all use higher level abstractions that let us write more powerful code with fewer lines.

There are plenty of small shops now where somebody knows a little Photoshop and can eke out a design that they otherwise wouldn't be able to using pen and paper.

There are also professionals that use the Adobe suite to enhance the abilities they've cultivated for years.

AI art will simply be a tool that enhances artists but might take away some low hanging fruit jobs similar to how web frameworks pushed people out of the job of webmaster and into more specific roles.


If you automate part of a job, you need fewer people doing the job. We still have farmers, but we automated enough of the job to do the same work with a thousand times fewer farmers.

I think the first business to crack will be stock image sites like shutterstock.

Those were already used in a "let's find something that roughly fits what I want to communicate with this text" way.

Today I created a quick get well soon card using an image from the new Midjourney beta and I have to say the result was exactly as good as if I had used Shutterstock, but it took me much less time because the prompt created something that matched what I wanted on the third try.

Compared to sifting through pages and pages of vaguely relevant images, it's a clear win and a lot cheaper.


For sure it automates some work. For example, my sometime hobby of making silly photoshops looks like it will now be a whole lot easier... Visual memes can just be a sentence now. For more serious work I wonder... But it does give pause about what it means for other forms of work.

It'll be an interesting line to be sure.

Right now the tech still requires some nuance to be able to slap it all together into what I think most people would want.

While I expect the interface and the like to get a lot better, all good tutorials of this tech so far show many iterations over many different parts of an image to get something "cohesive". Blending those little mini iterations together is VASTLY easier than just making the whole thing, but it's not just plug and play for something professional.

Still there will be a huge dent in how long it takes to make certain styles of work and that will lower demand considerably, and there's a large market of artists who thrive on casual commissions which this might replace.


Weren’t the small to medium businesses already mostly using stock images anyway?

If anyone was to be worried I’d think it would be Getty Images.


There is plenty of reason for artists who are hired for one-of-a-kind work to be worried.

Getty Images will just get with the program and stock up on bazillions of AI generated images, indexed by the prompts used to generate them.

Someone looking for stock images doesn't want to deal with artists, photographers, or feeding prompt after prompt into some AI software while not quite getting the desired result.

If Getty makes it easier for someone to find some existing AI-generated image than to generate one, they still have something.

A lot of the AI images we see in online blogs and galleries have been curated; people tinkered for hours with the stuff and cherry-picked the best results. There could be some business model in that, at least for a while.


Getty would just be a cache with the prompts acting as the initial search index and buyers are just typing keywords in to buy cached images.

That is not exactly so, because the prompts are not reproducible input cases; you don't get the same image every time for a given prompt. An association between a prompt and, say, around ten images would be something resembling a cache.

The stable diffusion model just got officially released recently, and in the last week a lot of easy to install repositories have been forked off the main one, so it's very accessible for people to do this at home. Additionally, the model is very impressive and it's a lot of fun to use it.

How does it compare to DALL-E?

Worse at image cohesion and prompt matching, but competitive in terms of final image quality in the better cases.

Its image quality is often better, mainly because you can run it on your own machine and increase the quality/time controls. DALLE2 lacks settings and doesn't run enough diffusion steps to have fine detail.
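For what it's worth, here's a minimal sketch of that trade-off using the Hugging Face diffusers pipeline; the model id, prompt, and step counts are illustrative only, and the exact API may vary between library versions.

```python
import torch
from diffusers import StableDiffusionPipeline

# Sketch: running locally lets you trade time for quality by raising the
# number of denoising steps, something a hosted service usually fixes for you.
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

prompt = "a detailed oil painting of a lighthouse at dusk"
draft = pipe(prompt, num_inference_steps=25).images[0]   # fast, rougher detail
fine = pipe(prompt, num_inference_steps=150).images[0]   # slower, finer detail
draft.save("draft.png")
fine.save("fine.png")
```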

Having had access to DALL-E for a while, I find both Midjourney and Stable Diffusion to be quite a bit more powerful.

For weird inputs I find DALL-E to be better, but both have failed at really specific stuff that I've tried to create as art, e.g. a factory with human lungs.

With stable diffusion, it really creates some nice stuff for what is already available, like a pizza with specific toppings [0]. I've been using it to add pictures to any of the recipes that are added to my wiki site without a picture. I originally tried this with DALL-E with similar prompts and the results are less appetizing.

[0] - https://www.reciped.io/recipes/mushroom-and-onion-pizza/



It's an impressive new technology, and there's nothing else out there like it in terms of the model being publicly available and able to be run on consumer GPUs.

First, it was recently released, so there's novelty. Second, the code and model weights were also released, so it is open and extensible, which this community loves. Third, these high quality image generation models are mind blowing to most, and it's not hard to imagine how transformative it will be to the arts and design space.

If it has any greater meaning, we might all be a little nervous that it'll come for our jobs next, or some piece of them. First it came for the logo designers, but I was not a logo designer, and so on.


The other issue is that DALL-E and Google's Imagen made a big deal about _not_ giving broad access, making it an approval-only beta (DALL-E) or simply you-can't-have-it (Imagen).

So people were hyped up.



If you have evidence, let alone clear evidence, please let us know at hn@ycombinator.com. Please don't post insinuations about it here—those perceptions are a dime a dozen and the overwhelming majority turn out not to be supported by data.

This is in the site guidelines:

"Please don't post insinuations about astroturfing, shilling, bots, brigading, foreign agents and the like. It degrades discussion and is usually mistaken. If you're worried about abuse, email hn@ycombinator.com and we'll look at the data."

https://news.ycombinator.com/newsguidelines.html

https://hn.algolia.com/?sort=byDate&dateRange=all&type=comme...


We could always do img2txt via just CLIP embedding the image. The idea that you could hide/sell prompts is silly. (Having human interpretable img2txt is cool tho.)
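As a rough illustration of "img2txt via CLIP embedding", here's a sketch that embeds an image with OpenAI's CLIP and scores it against a few candidate captions; the file name and captions are placeholders.

```python
import torch
import clip  # from the openai/CLIP repository
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("photo.png")).unsqueeze(0).to(device)
candidates = ["a photo of a dog", "an oil painting of a castle", "a bowl of fruit"]
text = clip.tokenize(candidates).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)   # one embedding for the image
    text_features = model.encode_text(text)      # one embedding per candidate
    sims = torch.nn.functional.cosine_similarity(image_features, text_features)

# The best-scoring candidate is a crude "caption" for the image.
print(candidates[sims.argmax().item()])
```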


Are you literally a bot? This is the second time I've seen you link to replicate.com (this time the link isn't relevant to the parent).

Hmm, no, are you dang? Are you the link police? Fuck off, look at my history if you want, I'm trying to be helpful. I'm a happy customer of replicate, yes, but I also ordered a GPU to run on local... You say it's not relevant but actually I'm giving exactly the functionality that parent is asking, giving back a prompt without having to pay for it, that's what text2image does.

Also, replicate links you to GitHub, but in case you cannot run it locally (don't have a GPU at home) you can just run some free queries on it. What's to hate about that?

I don't care what you think. You can also check my other post on this thread where I link to the official paper, which has 11 upvotes.

Go bot yourself

Oh, also, very funny to have a 2m old account giving me lectures about what I'm allowed to post or not, you can't even downvote...


The parent is about im2text not text2im (i.e. converting an image to human readable text). Easy thing to misread. Apologies for upsetting you. Have a good day :)

So sorry about my prior comment, you're indeed right, I shared the wrong link.

This is what I wanted to share originally: https://replicate.com/methexis-inc/img2prompt

Have a nice day to you too, and sorry again for misreading the whole situation!


As someone mentioned in another comment, selling prompts isn't a joke https://promptbase.com/

It has a lot more joke potential if prompts can simply be generated from images.

Something about this just turns me off, not sure why.

Don't worry. People who can actually paint will mercilessly attack the insecurities of anyone who is too serious about prompts. The artists hardly made money before and the AI will just take more off the table. All they (we) will have left is spite.

A craft you can't earn money on is called a hobby. I have a lot of hobbies I don't earn a dime on. Why would art be any different?

I feel it too, the idea of rent seeking in the least difficult part of the process. But people calling themselves prompt engineers (without irony) is at least amusing, like a sanitation expert or a maintenance engineer.


The given example is shockingly similar to what may actually pop up on the HN front page.

What a great new possibility - generate a prompt, do the little project or whatever, write the blogpost, post the link. I guess status/karma/cred is the W?

Is PromptBase basically useless now?

The input image is just another kind of prompt.

These models have hundreds of input parameters, not just prompts. There are many ways to configure the different models, apply various techniques, and link up the processing stages.
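As an illustration of that parameter surface, here's a hedged diffusers sketch touching a few of the knobs beyond the prompt (scheduler, seed, steps, guidance, size, negative prompt); the names and values are illustrative, and availability varies by library version.

```python
import torch
from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4").to("cuda")
pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config)  # swap the sampler

generator = torch.Generator("cuda").manual_seed(1234)  # fixed seed makes the run repeatable
image = pipe(
    "a watercolor fox in a misty forest",
    negative_prompt="blurry, low quality",
    num_inference_steps=50,
    guidance_scale=7.5,     # how strongly the image is pushed toward the prompt
    height=512,
    width=512,
    generator=generator,
).images[0]
image.save("fox.png")
```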

Getting the best results in a short amount of time requires highly specialized knowledge. The job description isn't "prompt engineer" but something close to that.


How many years until we can generate a feature length film from a script?

I want to see the Batman film where the Joker gives Batman a coupon for new parents but it is expired. That should really be a real film in theatres.

I loled.

You 'might' enjoy this: Teen Titans fixing the timeline.

5

You could do storyboards from a shooting script* now, but generalizing to synthesizing character and camera movement as well as object physics is a ways off.

* A version of the script used mainly by director and cinematographer with details of each different angle to be used covering the scene.


It looks like this is trending towards making our dreams/thoughts reality, in that what we imagine can easily be turned into media - music, books, movies, etc. Pair this up with VR, 'the metaverse', and you literally get the ability to turn thoughts into personalized explorable realities. What happens after that?

* Do we get lost in it?

* Does today's 'professional' fiction become a lot less lucrative when we can create our own?

* Is there a way to leverage this technology to improve the human condition somehow?


I can create and explore realities using my imagination alone, though. I personally don't think having it become actual 2d or 3d art will have a lasting impact. It might be fun for a while, but it will get old.

It's kind of like your imagination on steroids, as the system creates worlds using your imagination as the seed and augments it with the summation of all the human creations used to train the network. Give Stable Diffusion a sentence, for example, and it will create something way beyond what you could have imagined and/or created on your own.

Stable diffusion is impressive, but still is a subset of what one _can_ imagine.

this probably already all happened before mate

I think it will encourage novel ideas in all forms of art. Genuinely new styles and expression will be scarce, because there aren't yet thousands of examples of them to train a model on.

We will also adjust to AI generated art like we have to other creative technologies, and the novelty will wear off. We will become good at identifying AI generated art and think of it as cheap.

Still, extremely exciting.


> Do we get lost in it?

That is one hypothesis for the Fermi paradox, the Kardashev scale, and the great filter. At some point all civilizations essentially create infinite dreams/thoughts/Matrix-style tech where we all retreat inward and have an infinite world to play with, essentially becoming gods in a virtual reality.


Nobody would be interested in it really, since everyone would have their own thing they want others to check out. It's like fan fiction collections online, or the 80% of DeviantArt that you really don't want to spend time with - only now everything looks Hollywood polished.

I suspect at least 10+ depending on your definition.

Tools like this will absolutely be used by professionals to cut out portions of the workload, but there's still a large gap between something like this and actually making a coherent, cohesive, consistent, paced, well framed and lit story from text alone.


There is also recent work by Google called DreamBooth, though similar to Imagen/Parti Google refuses to release any model or code.

https://dreambooth.github.io/


Yeah, they allude to supporting more than one token for the identifier, which would be nice

Textual inversion also supports more than one embedding per identifier; just change num_vectors_per_token in the yaml config. Example: https://www.reddit.com/r/StableDiffusion/comments/wzf1qk/sd_...
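For reference, here's a sketch of bumping that setting programmatically with OmegaConf (which the repo's training code uses); the config path and key layout are from memory of the textual_inversion repo, so verify them against your checkout.

```python
from omegaconf import OmegaConf

# Load the finetune config, raise the number of embedding vectors assigned to
# the placeholder token, and save a copy. Path and key names are assumptions.
cfg = OmegaConf.load("configs/stable-diffusion/v1-finetune.yaml")
cfg.model.params.personalization_config.params.num_vectors_per_token = 2
OmegaConf.save(cfg, "configs/stable-diffusion/v1-finetune-2vec.yaml")
```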

So is DreamBooth worth open-sourcing then, given textual inversion?

From the textual inversion guy's own comment on Twitter

>The objective is similar, but it's: (1) A different approach - they also fine tune the model itself, and they get much much better identity preservation!


DreamBooth retains higher fidelity since the model is finetuned, but to be honest I think textual inversion is actually more applicable, as you can just add some embeddings to inject new knowledge into the model rather than a whole new model just for a single concept (if you want to share it with others). Also, I have not seen DreamBooth being applied to replicate styles.

Sounds like there is a chance they might open source a version for Stable Diffusion. Let's see though.

From Twitter : >Awesome job! That really extends the applicability of powerful generative models nowadays. Could I ask if you have any timetable for releasing the code please? >We are working on plans for implementation on other open source models

https://twitter.com/jason_dingzc/status/1563578510958297089


Wow, this is pretty cool. Instead of turning a picture back into text, turn it into a unique concept expressed as variable S* that can be used in later prompts.

It's like how humans create new words for new ideas: use AI to process a visual scene and generate a unique 'word' for it that can be used in future prompts.

What would a 'dictionary' of these variables enable? AI with its own language, with orders of magnitude more words. Will a language be created that interfaces between all these image generation systems? Feels like just the beginning here.


Right on, you could see a marketplace of those mini pretrained weights to make styles much more available in a UI-like setting without needing to add the style manually... Very interesting.

This is a big deal! This adds a super power to communication, similar to how a photo is worth 1000 words. An inversion is worth 1000 diffusions!

I saw talk the other day about how these ML art models aren't really suited to something like illustrating a picture book, because they can synthesize a character once but can't reliably recreate that character in other situations.

Didn't take long for someone to resolve that issue!


It's not quite at that level yet. The paper introducing it recommends using only 5 images as the fine-tuning set so the results are not yet very accurate.

The model was released exactly one week ago. At this rate we'll be well past "there" in another week.

This doesn't feel too far off. With the img2img stuff you can give a picture of a character and the tool can spit out new images of that character or transpose them into new art styles.

It doesn't feel like this tech has hit any hard limits on things that are impossible or very far away yet. Every limitation seems to be getting broken at rapid pace.


Is there a colab or easy to use demo of this?

This tutorial on reddit is the closest thing so far: https://www.reddit.com/r/StableDiffusion/comments/wvzr7s/tut...

Note that generating "inappropriate" images on colab could result in your entire Google account getting banned. I wouldn't risk it.

It should be noted that the official repo now also supports Stable Diffusion: https://github.com/rinongal/textual_inversion.

Anyone else starting to feel uncomfortable with the rate of progress?


I'm not worried about unemployment, although that is a problem. I'm more worried about bad actors being able to flood the web (even more than it already is) with realistic-enough content that makes it utterly unusable and unreliable.

Imagine entire subreddits consisting of posts, comments, memes, and photos, and 100% of it is pro-[insert authoritarian regime] and it essentially only cost $1M to do it.


Honestly, I'm worried that shocking pornographic depictions of every woman who's ever posted her face online are coming. AI's first big splash in our society is going to be a traumatic sexual assault of all women.

How is drawing imaginary pictures sexual assault?

In the same way that speech is now considered violence by an unfortunately large amount of the population, I guess.

That's not assault, but it can definitely be used for harassment and other forms of social damage. I posted this excerpt from an article (on anonymous Telegram groups) a few days ago:

Filing charges is pointless, says Ezra. For two years, she's been harassed on Telegram. It started when she was sixteen: photoshopped nudes with her Snapchat account were circulated. They had taken selfies from her social media, and those of her family, and combined them with porn fragments. She doesn't know the perpetrator, but that person takes a lot of trouble to ruin her. "Nowadays, the boys have so many ways to make it look real." Source: De Groene Amsterdammer, 146/33, p. 21.


If anything, widespread use and understanding of this technology will help with situations like these. Teenagers in 10 years would absolutely not be impressed with a nude picture that has not been somehow verified as legitimate.

That's entirely unfounded optimism, or, less politely put, sticking your head in the sand. Hasn't the printing press shown how easy it is to slander? Has the internet taught you nothing about misinformation?

And do you propose to stop either to deter misinformation? Or maybe the pros outweigh the cons?

Many seem to forget that Photoshop exists. People have been taking others' faces and overlaying them onto all sorts of images for years. Nothing about this is new and it hasn't put society on fire.

I can do all of this right now with a handful of FREE Google accounts and Colab.

The only part that could be improved with money is getting dedicated mobile modems for unique IP addresses, to evade spam detection.


Yep this technology is super impressive, there’s a chance some tweak can turn it into something scary.

* Train a network on thousands of assembly instructions.

* Prompt: 'some bad weapon of this size and material'.

* Result: simple instructions for how to build it.


It’s actually incapable of doing any of that, for now. Deep learning can’t generalize and it doesn’t understand plans or schematics.

Lots of smart people have been trying to get it to have capabilities anywhere close to what you're describing for years now, with no luck.

We’re safe for now.


Neural networks generalize, otherwise they would not be as powerful as they are today (and I don't know how you can deny that). If your neural network does not generalize then the model is overfitted.

They don't generalize *well*. Deep learning as it is done currently is very limited in its ability to generalize to out-of-distribution tasks. This is a major area of discussion in research.

The type of generalization necessary to perform what the parent was talking about for instance (synthesizing schematics) is (currently) not possible.


It's impressive how all of this is quickly picking up steam thanks to the Stable Diffusion model being open source with pretrained weights available. It's like every week there's another breakthrough or two.

I think the main issue here is the computational cost, as - if I understand correctly - you basically have to do training for each concept you want to learn. Are pretrained embeddings available anywhere for common words?


100% agreed. That's one thing that really bothered me about "openai".

Did OpenAI (Dall-E) and Google (Imagen) know Stable Diffusion was coming?

I'm sure they were looking forward to many months of maintaining highly exclusive access and playing "too dangerous to release" games before SD completely upended the table.


Too dangerous to release, a codeword for selling access at 100% markup.

Bonus points: If dall-e rejects an output image because it thinks the image is inappropriate they won't show it. But they will still charge you for the prompt.

Is there even a public keyword list of words you’re not supposed to include?

I wonder if they keep that private to avoid people finding work arounds.

It’s perhaps not a simple list but an AI classifier, perhaps GPT-3 based.

Nuclear was apparently confirmed on the list, but I have recently used it to generate cool things around nuclear power. So I suppose it is like you say.

There is no public list. Someone on reddit compiled a very limited list here: https://old.reddit.com/r/dalle2/comments/wa3jt6/banned_words...

The problem is that they will automatically ban accounts that trigger the filter too much, so people would have to burn a whole lot of accounts to assemble an even remotely-complete list.


Holy shit that list is insane. A lot of those words being banned would limit you enormously if you're trying to create art.

But it never has been about art?

This list seems incomplete. I've had the filter triggered on words like 'thong' as in thong slippers as well.

How is this legal? If someone commissions a painting from you, you're not allowed to just say "I'm not giving you the painting, but I'm keeping your money." Why does doing it with a computer make it okay?

Anyone moderately plugged into the AI art scene knew for at least 2 months.

I'm guessing those teams didn't know, in the sense that they're AI researchers; in my own employment at Google, I've been regularly reminded that being a technologist and being someone obsessed with a technology and pursuing it socially are different things.

Even without knowing the precise individuals that'd do it, I knew in February that by August there would be an open source model challenging state of the art back then, if only because given 6 months _some_ open source team would try scaling to a bigger model.

Another thing to point out is these teams are descendants of open source; Katherine Crowson's open source breakthroughs led to substantial improvements in DALL-E. Everyone should be saying her name 1000x more often.* She also helped create Stable Diffusion specifically, in substantial ways.

* I think. Maybe I misunderstand the technology dramatically. But I think it's just poorly understood how much she's been involved.


> Another thing to point out is these teams are descendants of open source; Katherine Crowson's open source breakthroughs led to substantial improvements in DALL-E. Everyone should be saying her name 1000x more often.* She also helped create Stable Diffusion specifically, in substantial ways

Not only that, but OpenAI didn't seem to know their CLIP model could be used to generate images (via Advad's CLIP+VQGAN) at all, otherwise they wouldn't have released it. So they did unintentionally start the "AI art" movement even if they didn't release a trained DALLE.


Well, Google's paper showed you don't need CLIP anyway. T5 and other language models can be used instead.

CLIP isn't the true blocker to entry, the dataset and compute is.


StableDiffusion has an open dataset, was funded by one guy, and apparently took "much less than $600k" to train on AWS (https://twitter.com/EMostaque/status/1563965366061211660).

So it seems there actually aren't many barriers to entry at all. There's certainly a lot of legal questions, but if it's this easy to create your own model then it's hard to enforce anything…


I don't think there are actually major legal concerns. Copyright protects reproduction of a specific image. Looking at an image and producing something in a similar style is not copyright infringement, it's called being an artist. The law on this seems pretty clear.

The UK recently announced plans to make this completely explicit, to remove any remaining doubt: "For text and data mining, we plan to introduce a new copyright and database exception which allows TDM for any purpose. Rights holders will still have safeguards to protect their content, including a requirement for lawful access."

https://www.gov.uk/government/consultations/artificial-intel...


I was wondering about trademark issues with a model that can draw new pictures of Mickey Mouse/Homer Simpson/Hatsune Miku if prompted with their names.

I think the fact that these diffusion models are smaller and more compute-efficient than gigantic GPT models makes them easier to use and distribute.

BLOOM is out there, but not that many individuals have something like eight 3090s to host it, and inference is still incredibly slow nevertheless.


BLOOM also doesn’t have GPT-3’s RLHF tuning, so anyone who tries to ask it questions or give it instructions in the manner GPT-3 supports will be disappointed. You have to k-shot prompt it or fine-tune it yourself for it to be useful.
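A small sketch of what k-shot prompting a raw causal LM like BLOOM looks like: you show the task as completed examples rather than asking a question. The smaller bigscience/bloom-1b7 checkpoint is used here purely so the example is runnable on modest hardware.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "bigscience/bloom-1b7"  # a small BLOOM variant, just for illustration
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

# Few-shot prompt: demonstrate the task, then leave the last answer blank.
prompt = (
    "English: Good morning\nFrench: Bonjour\n"
    "English: Thank you\nFrench: Merci\n"
    "English: See you tomorrow\nFrench:"
)
inputs = tokenizer(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=10, do_sample=False)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```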

I don't know if they saw it coming or not, but frankly I'm glad it did.

This idea of "technology gatekeeping" sickens me. I'm tired to death of people saying some non-sensical horseshit like, "The technology is too dangerous to be turned over to the hoi polloi!!"

Give me a break... as if someone running StableDiffusion on their home system and creating naked centaur-women out of pictures of Kate Beckinsale and anime waifus out of Ariana Grande photos are going to cause the downfall of the modern era.

StableDiffusion didn't upend the table... StableDiffusion gave the plans to the printing press to every person out there that wants to learn how to make their own print shop... and more power to them all, I say. I've had more fun and learned more about AI models in the past week than I've had with AI in the past year, and I've been using img2img to feed my own art into SD to create whole new works that I've been able to touch up in Photoshop and upscale to print resolution.

This is truly the kind of computing revolution that I love to see, and that comes around all too infrequently. The good from this will far, far outweigh any negatives.


Technology gatekeeping is completely antithetical to the hacker spirit.

Hackers built all this technology. There's no way a handful of megacorps are going to take it all for themselves.


Right, like no megacorp ever prevented us from doing whatever we want for our smartphones. Oh wait. (No Pixels don't count, unless you can write your own TEE, your own sensor hub, and your own wake up word)

I'm not saying they can't try, and succeed for a while. But eventually, we will always break free.

Pixel exists (but apparently doesn't count because it's not perfect yet).

Librem exists.

PinePhone exists.

More will exist in the future.


The "Librem is to iPhone as Stable Diffusion is to DALL-E" analogy breaks down when you consider that Librem phone works about 10% as well as an iPhone, whereas SD works 110% as well as DALL-E.

Linux also used to be a "hobby" OS. Now it powers the internet. Things change.

It took Linux a couple of decades to get to that point. And it had immense business value in having such a massive infrastructure open for everyone.

For hardware our world is not there yet, and won't be for the foreseeable future.


"open source" hardware is never going to work the same way as open source software does. Hardware is fundamentally capital-intensive to produce. Software can be produced (compiled) using hardware that many people have readily available. This is a fundamental, intractable difference.

It's the difference between free knitting patterns and free cardigans.


> This idea of "technology gatekeeping" sickens me. I'm tired to death of people saying some non-sensical horseshit like, "The technology is too dangerous to be turned over to the hoi polloi!!"

I think that misrepresents OpenAI's attitude. As I see it, their claim is closer to "let's discuss whether the stable door should be closed before we find out the hard way what makes the horse bolt".

Given how much trouble we already get from the Gell-Mann amnesia effect, and how many people take spirits and horoscopes seriously, it seems entirely plausible to me that some highly realistic centaur picture could be used as a casus belli for a popular uprising that effectively ends a nation.

(Similar rumours abound even without this tech, c.f. Catherine the Great or Malleus Maleficarum etc.; I suspect arbitrary photorealistic pictures make that kind of drama much more likely to occur and to stick harder when it does, but this suspicion is not strongly held).

Edit:

I want to add that my concerns about this tech are less about the general public (most people are basically decent) and more about the few percent who hate or fear, who now have a much easier time promoting their views (the possibility having always existed is different from it being cheap). They're also about those who don't realise the images are generated to fit the text and instead think it's a search engine of existing images, which appears to be a common view judging by the type of complaint certain artists have about any given demonstration of the tech (though public figures complaining about Google search results without knowing they're personalised is also a thing, even for actual search).


> I think the main issue here is the computational cost, as - if I understand correctly - you basically have to do training for each concept you want to learn. Are pretrained embeddings available anywhere for common words?

I just read (skimmed) through the paper.

That's in fact the key idea here: that the training model is untouched. Using the existing, trained model, they use this "inversion" procedure to discover some word that acts as a stable reference for a concept expressed in some images exposed to the model, which the model will understand as a reference to that concept.

There is a pretrained model with those common words, which knows how to do things like, say, "hamburger in the style of Picasso".

Now, without such a model having been trained on the works of some artist, or other images, it's possible, using a few samples (merely five or so), to uncover a latent word in the model which refers to the concept that those samples have in common. That word is stable in the sense that you can compose it with other words in prompts, and it really seems to denote the concept in those sample images.

In the paper the researchers consistently refer to such a word as the meta-variable S*, and use it in prompts like "flying monkey in the style of S*".

What I couldn't spot in the paper is a concrete example of what the S* word actually looks like for given examples. I'm guessing that it's some sort of gibberish. According to the concrete usage instructions, the process produces an embeddings.pt file, which you then upload, allowing you to use the pseudo-word * (asterisk) to refer to the concept.
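For a sense of how such a learned vector gets wired in downstream, here is a hedged sketch using diffusers: add a placeholder token, resize the text encoder's embedding table, and copy the vector in. The layout of embeddings.pt varies by training script, so the torch.load line is an assumption you'd adapt.

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")

placeholder = "<S*>"               # the pseudo-word you will type in prompts
vec = torch.load("embeddings.pt")  # assumption: a single 768-dim vector; adapt to your file's layout

pipe.tokenizer.add_tokens(placeholder)
token_id = pipe.tokenizer.convert_tokens_to_ids(placeholder)
pipe.text_encoder.resize_token_embeddings(len(pipe.tokenizer))
pipe.text_encoder.get_input_embeddings().weight.data[token_id] = vec

image = pipe(f"a flying monkey in the style of {placeholder}").images[0]
image.save("monkey.png")
```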

People have been intuitively experimenting with gibberish words in prompts, discovering some stable behavior that seems to correspond to words that the AI "came up" with by itself (like a child, some have noted). This research seems like methodical way of discovering those internal words.


Do latent variables "look like" anything at all? Like in a PCA, for example, a factor is some latent heuristic, but does it even have an actual value?

Looking at this some more, I may have a slightly less flawed high level understanding. There is never actually a concrete word. There is an "embedding" represented as an abstract vector, and that is forcibly associated with a pseudo-word like *. That * just recalls the vector; there is no intermediate gibberish that has a word representation: that vector is the gibberish.

I have seen images at OpenArt with prompts that consisted entirely of different types of whitespace. They were haunting images of humanlike shapes. My assumption was that the prompt hit some odd vocabulary that had been trained to some concepts. Is that impossible?

I think that’s a pretty apt comparison. A latent variable (or latent factor in PCA terms) is (basically) a direction in a n-dimensional space, where n is the length of the vector. The direction is correlated with some type of variance in the input data. Oftentimes this represents something that has some useful meaning (“dogness” vs “catness”, for example), but it could also just represent a correlation that has no interpretable meaning.

This is probably a dumb question but if we’re talking about language embeddings, are the latent vectors deterministically out of vocabulary? Is there any possibility of collision with an in-vocab n-gram’s vector?
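One way to probe that empirically is to compare a learned vector against the text encoder's vocabulary embeddings by cosine similarity; a true collision would show up as similarity 1.0, while in practice you'd expect merely-nearby neighbours. A sketch, assuming the SD v1 CLIP text encoder and a single learned vector whose file layout you adapt:

```python
import torch
from transformers import CLIPTextModel, CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

vec = torch.load("embeddings.pt")  # assumption: a single 768-dim learned vector

vocab = text_encoder.get_input_embeddings().weight  # (vocab_size, 768)
sims = torch.nn.functional.cosine_similarity(vocab, vec.reshape(1, -1), dim=-1)

top = sims.topk(5)  # nearest in-vocabulary tokens to the learned embedding
for score, idx in zip(top.values.tolist(), top.indices.tolist()):
    print(f"{tokenizer.convert_ids_to_tokens(idx)}: {score:.3f}")
```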

Speaking of computational cost, I wonder how much electricity all this is using.

> I think the main issue here is the computational cost, as - if I understand correctly - you basically have to do training for each concept you want to learn. Are pretrained embeddings available anywhere for common words?

The basic SD model should have all the common words covered; this model's goal is to find a new concept that doesn't exist visually or textually in the dataset, like for example your own face, or a character you designed yourself. Note that this might not be possible to do: the corpus of data or the size of the model might not hold enough information to represent certain concepts, or at least represent them in detail. For example, if you give it pictures of your dog, it might not look quite like your dog during generation, even though those details existed in the pictures you gave the model.

If you want personalization that is also highly detailed, you'll have to fine-tune the model itself with your own concepts; Google has detailed how they did their own fine-tuning and called it DreamBooth [1].

[1] https://dreambooth.github.io/


RIP promptbase.com - they had a good run.

They won't even need to renew the domain name when it expires on 2023-02-28.



Hey the colab link is dead :(

People seem to have missed the point of what this is.

This is not "work out what the prompt is for an image"

Instead it lets you give the model a "thing" (as an image) and then use that thing in your prompts.

So for example they give it a picture of a statue and name it "S", and then can say "Elmo sitting in the same pose as S" and it correctly generates it.


How far are we away from throwing a screenplay and some reference scenes at an AI and getting back a full movie? 10 years? Doesn't seem very far away now...

> https://github.com/rinongal/textual_inversion

Netflix UI of 2033: "I want an episode of Seinfeld with X, Y, Z..." Starting a brand new (never aired) AI-generated episode now.


If you can ask for anything, why would it be Seinfeld?

AI: remake seasons 5-8 of game of thrones, based on GRRM's final two books which he finally finished.

AI: remake seasons 5-8 of game of thrones, based on the critical consensus version of the final two books autocompleted by GRRM-style emulating generative network.

I wouldn't, tbh. It's just that the example someone else gave used Seinfeld, and I was trying to be relatable to an American audience; I've never seen it. LOL

Just make sure not to ask for a Moriarty that can beat Data.

What I want to know here is can I seed it with a human face? If so, this really becomes a breakthrough for fast meme generation / other more nefarious uses. That will really be the thing that opens the floodgates on this.

Yes, you can. For well-known people you can already upload a meme and ask SD to replace the face.

I replaced a Biden meme with Shrek just yesterday using Stable Diffusion image2image in Colab.
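A hedged sketch of that img2img workflow with diffusers: start from an existing picture and let the prompt steer the re-generation. File names, prompt, and strength are illustrative, and older diffusers versions name the image argument init_image.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

init = Image.open("meme.png").convert("RGB").resize((512, 512))
out = pipe(
    prompt="Shrek, green ogre, detailed face, photo",
    image=init,        # older versions of the pipeline call this init_image
    strength=0.6,      # how far the result may stray from the original
    guidance_scale=7.5,
).images[0]
out.save("meme_shrek.png")
```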


This is escalating quickly

I'm experimenting with generative art with all these new models, and what's coming out is wild and beautiful.

You can basically make a full AI experience now as a one-man show, world-building with capabilities previously available only to Marvel or Disney...

This might be a better link: https://textual-inversion.github.io/

Also this reddit tutorial might be useful to you https://www.reddit.com/r/StableDiffusion/comments/wvzr7s/tut...


Genuinely mind blowing stuff.

That link is better in my opinion.

Showing multiple variables and styles, Sx in the style of Sy; it's really amazing.


The fact that they chose to recreate Qinni's artstyle, who passed away just a couple of years ago and had a notable online following that still remembers her, makes me slightly uncomfortable; feels too soon for me, I guess.

Also, if models start to accept prompts like "in the style of Qinni", surely we're back to the copyright debate. They get away because everyone's art is mixed into a single model, but once distilling someone's artstyle is a feature…


Art style is not copyrightable as far as I understand. It could be trademarked though if specific enough.

I'm fearful of these algorithms, how can I ensure my economic status will not be affected? Any stock tickers to buy?

Amazon sells compute, and that's always needed for this sort of thing. They may develop new AI accelerator stacks (including racked hardware) in the future as well, so sure. I don't want to be one of those "the Internet is a fad" people, but I'm absolutely not telling you to do anything rash; go do your own research. Even if you bet as well as you can, the dot-com crash wiped out people like you, and that's assuming you get the general picture right.

How can you ensure your economic status will not be affected?

How were farriers able to ensure their economic status wasn't affected by cars?

How were switchboard operators able to ensure their economic status wasn't affected by automatic switching?

How were travel agents able to ensure their economic status wasn't affected by Expedia, Priceline, and the dozen other sites out there?

No one can know when or how their job is going to go extinct. You just need to pay attention to the changing winds of technology and adjust your ship's sail accordingly.



Invest in relevant technologies

e.g. GPU compute cycle companies (e.g. nvidia/amd/intel/arm/etc..) or invest in cloud compute companies (Amazon/Microsoft/Google/etc..)

Or learn relevant skills and try to have relevant marketable skills. (Always be learning)


Any courses you have completed recently?

> how can I ensure my economic status will not be affected?

You cannot.

It sucks, but it's inevitable. The same thing has been happening ever since the industrial revolution. Old skills become obsolete, and nobody cares what happens to people who've invested their entire lives into something that is no longer profitable.


It's interesting that, in a certain phase of the learning process, words in natural language already work this way. Before a language learner has enough knowledge to understand word etymology, they have to just form an association between a word and a certain set of sensory experiences that the word is meant to represent.

Seems like we have the same thing happening here. The input images are the sensory experiences. The dummy word "S*" is the linguistic symbol that is attached to them.



I wonder how this will impact the market opportunity for DALL-E, which has already lost quite a bit of wind thanks to its weird monetisation model (compared to the more affordable Midjourney, which does use Stable Diffusion).

I mean, what stops us from building a life-imitates-art system where we have speech2img, like in Westworld (the narrative creating scenes)?

I guess I hope someone reads this and picks it up. Maybe coupled with a VR set?


Is there an easily self-hostable all in one text to image thing?


Tangential: I've set up a Discord Bot that turns your text prompt into images using Stable Diffusion.

You can invite the bot to your server via https://discord.com/api/oauth2/authorize?client_id=101337304...

Talk to it using the /draw Slash Command.
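Not the author's code, but a minimal sketch of how a /draw slash command can be wired up with discord.py 2.x; generate_image is a hypothetical helper wrapping a Stable Diffusion call, and the token is a placeholder.

```python
import discord
from discord import app_commands

intents = discord.Intents.default()
client = discord.Client(intents=intents)
tree = app_commands.CommandTree(client)

@tree.command(name="draw", description="Generate an image from a text prompt")
async def draw(interaction: discord.Interaction, prompt: str):
    await interaction.response.defer()     # generation takes longer than the 3-second reply window
    path = generate_image(prompt)          # hypothetical helper running Stable Diffusion
    await interaction.followup.send(file=discord.File(path))

@client.event
async def on_ready():
    await tree.sync()                      # register the slash command

client.run("YOUR_BOT_TOKEN")               # placeholder token
```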

It's very much a quick weekend hack, so no guarantees whatsoever. Not sure how long I can afford the AWS g4dn instance, so get it while it's hot.

Oh and get your prompt ideas from https://lexica.art if you want good results.

PS: Anyone knows where to host reliable NVIDIA-equipped VMs at a reasonable price?

