But it's not showing things directly, is it? It's interpreting a frequency and converting that to an image. Is it really that different from a music visualiser?
> It's the way in which audio and spectrogram are both representational while being encoded by the same information.
They're not, really. If you've spent much time reading spectrograms, you can look at the images and see which part is the sound. Occasionally the picture and sound are "chosen" to line up (notably the birdsong, disguised as distant blurry flower petals – notably not the large flowers, which don't really sound like birdsong at all), but usually the sound appears as clear, visually-distinct banding (e.g. the kittens meowing), and the image shows up as very audible distortion (e.g. the tigers).
Choosing sounds and images that line up properly might not be something a human has tried before, but I can already see how I'd do it, if I were better at drawing. It's like a multimedia ambigram: https://en.wikipedia.org/wiki/Ambigram. The diffusion model is missing a lot of tricks: I could get similar quality by making a collage of clip art.
Edit: just saw the third "painting of a farm" one. That one is actually quite good, the way it uses the solid space in the barns. The first "painting of a farm" has the right idea (if I may anthropomorphise the system), working the birdsong in as clouds and trees, but its execution is severely lacking.
> I've been trying to find a "proper" connection between audible sound and visible shape, a connection that would not only preserve all the information, but would also properly visualize the "symmetry" in sound, so that messy sound would turn into messy images and harmonic sound would turn into visually appealing images.
It is very exciting to come across others who are also interested in this topic. I am also very interested in the shape of sound but I have spent less time on empirical observations and more on imagining an abstract logic of numbers which can be visualized and heard. Real sound visualizations are also interesting to me but I decided to focus on abstract ideals because I thought it would be appropriate for a video game.
"Photoshop for audio" seems so obvious that I'm surprised we haven't seen this before. (After all, the underlying technology has been around for a while now.)
After listening to Aphex Twin, I once asked a friend of mine in uni whether it was hard to write a program that generates audio from image files. Later that day he had this completed: http://134.74.16.64/wwwa/web/hardware/soundmural/
Edit: since then, I've researched the topic more (signal decomposition and transforms) and understand this a lot better :)
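For anyone curious how an image-to-audio program like that can work, here is a minimal sketch of one common approach (additive synthesis), not a reconstruction of the linked program. Each image row is treated as one sine wave, each column as one time slice, and pixel brightness as loudness. The function name `image_to_samples`, the frequency range, and the slice length are all assumptions chosen for illustration.

```python
import math


def image_to_samples(image, sample_rate=8000, samples_per_col=400,
                     f_min=200.0, f_max=3000.0):
    """Turn a grayscale image into audio samples by additive synthesis.

    image: list of rows, each a list of brightness values in [0, 1].
           The top row maps to f_max, the bottom row to f_min
           (a hypothetical mapping; real tools often use a log scale).
    Returns a list of floats in [-1, 1].
    """
    n_rows = len(image)
    n_cols = len(image[0])

    # One sine frequency per row, spaced linearly from f_max down to f_min.
    if n_rows == 1:
        freqs = [f_min]
    else:
        step = (f_max - f_min) / (n_rows - 1)
        freqs = [f_max - step * r for r in range(n_rows)]

    samples = []
    for c in range(n_cols):
        for i in range(samples_per_col):
            # Absolute time keeps each sine's phase continuous across columns.
            t = (c * samples_per_col + i) / sample_rate
            value = sum(image[r][c] * math.sin(2 * math.pi * freqs[r] * t)
                        for r in range(n_rows))
            # Dividing by the row count keeps the sum inside [-1, 1].
            samples.append(value / n_rows)
    return samples


# A 2x2 "image": a high tone in the first slice, then a low tone.
tiny_image = [[1.0, 0.0],
              [0.0, 1.0]]
audio = image_to_samples(tiny_image)
```

Viewing the result in a spectrogram should show the bright pixels roughly where they were drawn, though the exact appearance depends on the analysis window the viewer uses.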
And really, the image doesn't divulge much: the audio was modified, recorded on an unknown device, and affected by compression. But there are hints.
This is interesting, but fairly easy to confuse. It would be especially interesting to see what results come up when you use modified "artistic" spectrograms like that of Windowlicker by Aphex Twin [1]. One thing I've learned from years of working with audio and images is that images are horrible representations of audio (other than for temporal changes).
> This means the frequencies emitted are very high (5 samples per period is 19.2 kHz) and it seems the audio output is being low-pass filtered, resulting in silly wobbly lines.
That effect actually looks amazing. I'd totally play a game with that aesthetic.
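As a quick check on the quoted numbers: a tone that repeats every N samples has frequency sample_rate / N, so "5 samples per period is 19.2 kHz" implies a 96 kHz sample rate. That rate is inferred from the arithmetic, not stated in the quote, and the helper names below are just for illustration.

```python
def tone_frequency(sample_rate_hz, samples_per_period):
    """Frequency of a waveform that repeats every `samples_per_period` samples."""
    return sample_rate_hz / samples_per_period


def implied_sample_rate(frequency_hz, samples_per_period):
    """Sample rate needed for a tone of `frequency_hz` to span that many samples."""
    return frequency_hz * samples_per_period


# Inferred, not stated in the quote: 19.2 kHz at 5 samples per period.
rate = implied_sample_rate(19_200, 5)

# Sanity check in the other direction.
freq = tone_frequency(rate, 5)

# The Nyquist limit is half the sample rate; the tone sits below it,
# so it is representable, but with only 5 points per cycle any output
# filtering will visibly reshape the waveform.
nyquist = rate / 2
```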
Me in the Stable Diffusion discord, 10/24/2022
The ppl saying this was a genius idea should go check out my other ideas