I think this happens a lot with famous images since that image will be in the training set hundreds of times.

Even if deduplication efforts are made, that painting will still appear in the background of movie shots, etc.
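One common way to deduplicate image training sets is perceptual hashing, which catches near-exact copies (crops, recompressions, resizes). A minimal sketch, assuming the Pillow and imagehash packages; the directory path and distance threshold are illustrative guesses:

    # Flag near-duplicate images via perceptual hashes (Hamming distance).
    # Assumes `pip install pillow imagehash`; path/threshold are hypothetical.
    from pathlib import Path

    import imagehash
    from PIL import Image

    def find_near_duplicates(image_dir: str, max_distance: int = 5):
        """Return (path, earlier_path) pairs whose pHashes nearly match."""
        seen = []        # (hash, path) pairs already accepted as unique
        duplicates = []
        for path in sorted(Path(image_dir).glob("*.jpg")):
            h = imagehash.phash(Image.open(path))
            for seen_hash, seen_path in seen:
                if h - seen_hash <= max_distance:  # bit difference
                    duplicates.append((path, seen_path))
                    break
            else:
                seen.append((h, path))
        return duplicates

Note that this only catches close copies of the same image; a painting visible in the background of a movie still hashes completely differently, which is exactly why it slips past deduplication.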




They do sometimes, or at least they used to. I have some (very limited) visual art training, and one of the things I/we did in class was manually mash up existing works. In my case I smushed together The Persistence of Memory and the Arnolfini Portrait. It was pretty clearly a copy: I divided the work into squares and poorly replicated the Arnolfini Portrait square by square.

From what I've seen on the art side of things, the more a certain work has been copied in the real world (and thus appears in the training set multiple times), the more likely it is you can get a very close copy out of the model with the right prompts.

For example, I'm pretty sure this is why some models turn up a near-exact version of Girl with a Pearl Earring.


Their titles. Plus the training data, which would have many copies and variations of each painting.

My 2 cents would be:

Ensure that the art is not being resized out of its original proportions. Just like how nobody wants to see a pan-and-scan movie anymore, make sure you aren't robbing the art of its compositional power by changing its dimensions.

Also, as a visual artist myself, I have actually noticed in the past that having a piece of art as my wallpaper has helped me build a better understanding of a master's work through repeated exposure. I'm just brainstorming here, but it would be cool if, say, over the course of a month I was shown 4 or 5 different pieces (I prefer a long exposure in order to allow my interpretation time to evolve and grow) and each of these pieces were related somehow. It could become a bit of a puzzle to figure out what the last work had in common with this week's.


Some very famous paintings it can almost reproduce, like the Mona Lisa and The Last Supper. It can't get them quite right, but I think it's close enough to be considered copying. So there might be a copyright-infringing instance of that, but I haven't seen one yet.

Actually that's partly how it works.

A trained model holds relationships between patterns/colours in artwork and their affinity to the other images in the model (ignoring the English tagging of image data within the model for a minute). In this way, it holds relationships between millions of images and their degrees of similarity (i.e. affinity weightings of the patterns within them) in one big blob (the model).

When you ask for a dragon by $ARTIST, it will find within its model an area of data with high affinity to a dragon and to $ARTIST. What has been glossed over in the discussion here is that there are millions of other bits of related imagery - with lower affinity - from lots of unrelated artwork, and these give the generated image its uniqueness. Because of this, you can never recreate the original image 1:1; it's always diluted by relationships to the huge mass of other training data. E.g. a colour from a dinosaur exhibit in a museum may be incorporated because it looks like a dragon, along with many other minor traits from millions of other images, chosen according to the seed and other random values.

Another interesting point: a picture of a smiling dark-haired woman would have high affinity with the Mona Lisa, but when you prompt for the Mona Lisa you may get parts of that picture back rather than the patterns from the actual Mona Lisa*, even though the result looks the same. That, arguably, is no longer the copyrighted data, since you never actually got the Mona Lisa back.

* NB: this is a contrived example, since in SD the real Mona Lisa weightings will outnumber the individual dark-haired woman's many times over. However, the concept might be (more) applicable to minor artists whose work is not popular enough to form a significant amount of weighting in the training data.
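A rough way to make this "affinity" idea concrete is to compare CLIP embeddings (CLIP is the text encoder used to condition Stable Diffusion v1, so it's a reasonable proxy, though the generative weights themselves are more complicated). A minimal sketch assuming the transformers and torch packages; the image files are hypothetical:

    # Cosine similarity between a prompt and images ~ the "affinity" above.
    # Assumes `pip install transformers torch pillow`; image paths are made up.
    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    images = [Image.open(p) for p in ("mona_lisa.jpg", "dark_haired_woman.jpg")]
    inputs = processor(text=["a portrait of a smiling dark-haired woman"],
                       images=images, return_tensors="pt", padding=True)

    with torch.no_grad():
        text_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                           attention_mask=inputs["attention_mask"])
        img_emb = model.get_image_features(pixel_values=inputs["pixel_values"])

    # Both images score highly against the prompt, even though only one of
    # them is actually the Mona Lisa - that is the shared "affinity".
    print(torch.nn.functional.cosine_similarity(text_emb, img_emb))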


How different[0] is that from a painter viewing ~half a dozen Getty images, then repainting them in combination to near (though not pixel-perfect) detail? Afaik[1] the hypothetical painter has not committed infringement on the source works and has put sufficient creative effort into the new one for it to be copyrightable.

[0]Granted it differs a little, but not by that much either.

[1]IANAL


Models like Midjourney and Stable Diffusion can generate very close renditions of some very well-known paintings - and if a model has enough of a conceptual understanding of even a lesser-known painting, it can do a pretty impressive rendition of that as well.

But it's virtually impossible for these models to make an exact replica - a photocopy - of an existing painting, because that would likely violate basic information-theoretic limits: the model is not a lossless compression engine. Paintings like Girl with a Pearl Earring appear so frequently in the datasets that the models tend to overfit on them - which is actually not something you want when designing a model; it tends to create issues for you. But that's why a painting like that can be simulated somewhat accurately. Even then, it's never going to be 100%.
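One crude way to check that "never 100%" claim is to measure structural similarity (SSIM) between a generation and the original: an overfit model scores high, but not 1.0. A minimal sketch assuming scikit-image, NumPy, and Pillow; the file names are hypothetical:

    # Compare a generated image against the original painting with SSIM.
    # Assumes `pip install scikit-image pillow numpy`; paths are made up.
    import numpy as np
    from PIL import Image
    from skimage.metrics import structural_similarity

    def ssim_score(path_a: str, path_b: str, size=(256, 256)) -> float:
        """SSIM over grayscale copies; 1.0 would be a structural pixel match."""
        a = np.asarray(Image.open(path_a).convert("L").resize(size))
        b = np.asarray(Image.open(path_b).convert("L").resize(size))
        return structural_similarity(a, b)

    print(ssim_score("girl_with_a_pearl_earring.jpg", "sd_output.png"))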


Humans don't usually do stroke for stroke copies of paintings. Or pixel for pixel sampling of photos, unless they get rights to the sources.

Use the model to generate image variations, and filter out those that look too similar to the original. Then you can replace the original artworks in the training set. Also remove artist names from the text; you can later create new style IDs. This will make it harder to duplicate the exact expression of an original work while still learning the ideas and visual styles in a more abstracted way (see the sketch below).

For all the non-problematic training images you can use the originals. Some artists might want their names to become popular as style keywords.
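A minimal sketch of the variation-and-filter step described above, using an img2img Stable Diffusion pipeline from diffusers and a perceptual-hash distance as the "too similar" test. The model id is real, but the prompt, paths, and threshold are illustrative guesses:

    # Generate img2img variations, then keep only those far enough from the
    # original. Assumes `pip install diffusers transformers torch imagehash
    # pillow` and a CUDA GPU; threshold/prompt/paths are hypothetical.
    import imagehash
    import torch
    from diffusers import StableDiffusionImg2ImgPipeline
    from PIL import Image

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    original = Image.open("original_artwork.png").convert("RGB").resize((512, 512))
    original_hash = imagehash.phash(original)
    MIN_DISTANCE = 8  # bits of pHash difference required to keep a variation

    variations = pipe(
        prompt="a landscape painting",   # generic text, artist name removed
        image=original,
        strength=0.7,                    # how far to drift from the original
        num_images_per_prompt=4,
    ).images

    kept = [v for v in variations
            if imagehash.phash(v) - original_hash > MIN_DISTANCE]

A hash distance is a blunt similarity test; an embedding-based check (as in the CLIP sketch earlier in the thread) would also catch compositional copies that survive re-rendering.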


Doesn't work if we can fairly find the artists of that picture - like the artists who made the pictures used in the training set.

I think it boils down to one question: can you prompt the model to show mostly unchanged pictures from artists? If so, it's definitely problematic. If not, then I don't have enough knowledge of the topic to give a strong opinion. (My previous answer was just a use case that fits your argument.)

Right. Apart from some (extremely famous) pieces of art that have been heavily repeated in the dataset, you're not going to be able to come close to recreating something directly.

That's absolutely not true.

You might get replicas of incredibly popular works like the Mona Lisa due to overfitting, but that's it.

If I am wrong, please do provide an example, as this is very relevant and interesting (and impossible from an information-theory point of view).


This is patently, obviously _wrong_ for anyone who has tried learning any artistic skill in their life. Sorry to be this straightforward, but it gets on my nerves every time I read it.

If you tried learning, let's say, the chiaroscuro technique from Caravaggio, you'd be analyzing the way the painter simulated volumetric space by using white and dark tones in place of natural lighting and shadows. You wouldn't even think of splitting the whole painting into puzzle-sized pieces and checking how similar those look when put next to one another.

Given somewhat decent painting skills, you'd be able to steadily apply this technique for the rest of your life just by looking at a very small sample of Caravaggio's corpus.

On the other hand, if you tried removing even just a single work from the original Stable Diffusion dataset used to generate your painting, it would be absolutely impossible to recreate a similar enough picture, even starting from the same prompt and seed values.

Given how smart some of the people working on this are, I'm starting to believe they're intentionally playing dumb to make sure nobody is going to ask them to prove this during a copyright infringement case.


Let's say we could replicate the painting stroke by stroke. Let's say we also use the same painter. Still, it doesn't matter. A copy is a copy. It's not the same as the initial incarnation of that artifact, physical or digital.

That's the thing though: it's not uncommon for ML-generated images to have major elements that are almost 1:1 copies of existing pieces, especially if the prompt includes an artist's name or a style closely tied to a particular artist - going well beyond an artist using existing pieces for reference/inspiration.

It seems that if they constructed scenes appropriate to the artist, the forgeries would be even more convincing?

For example, Bob Ross wasn't necessarily famous for portraits. I'm sure if you gave it a picture of a landscape and used Bob Ross's paintings as the learning model, you would get a fairly close forgery.

All in all very cool tech!


There are lots of non-unique paintings.

Artists often explore the same theme in steps and create several versions of the same picture. There are four versions of The Scream by Munch, for example.

Some kinds of art are created to be duplicated: video art and computer art, by design. Painters living long before the computer age still managed to create thousands of "identical" copies of their pictures as a standard procedure. Auctions are full of lithographs, sometimes printed decades after the death of the artist.
