
"https://en.wikipedia.org/wiki/Spectrogram - can we already do sound via image? probably soon if not already"

Me in the Stable Diffusion Discord, 10/24/2022

The people saying this was a genius idea should go check out my other ideas
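What does "sound via image" mean concretely? A spectrogram is short-time Fourier transform magnitudes laid out as an image, with frequency on one axis and time on the other. Below is a minimal pure-Python sketch. It assumes a periodic Hann window, a naive DFT and arbitrary frame/hop sizes (64/32 samples). Real tools use an FFT and much larger frames.

```python
import cmath
import math

def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def dft_magnitudes(frame):
    """Magnitudes of DFT bins 0..N/2 of a real-valued frame (naive O(N^2), illustration only)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]

def spectrogram(signal, frame_size=64, hop=32):
    """List of columns, one per frame; each column holds the bin magnitudes.
    Columns = time, entries = frequency: this grid is the 'image'."""
    window = hann(frame_size)
    columns = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = [s * w for s, w in zip(signal[start:start + frame_size], window)]
        columns.append(dft_magnitudes(frame))
    return columns

# Demo: a 1 kHz sine at 8 kHz. With 64-point frames each bin spans 8000/64 = 125 Hz,
# so the energy should peak in bin 1000/125 = 8.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(512)]
spec = spectrogram(tone)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), peak_bin)  # → 15 8
```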




This is a brilliant idea.

Also, spectrograms will never generate plausible high-quality audio (I think).

So I think the next move is to map the generated audio back onto synthesizers and samples via MIDI…
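Mapping generated audio back onto synths via MIDI needs, at minimum, a way to turn detected frequencies into note numbers. Below is a minimal sketch of that step. It assumes 12-tone equal temperament with A4 = 440 Hz = MIDI note 69. The helper names are hypothetical, and pitch detection itself isn't shown.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def freq_to_midi(freq_hz, a4_hz=440.0):
    """Nearest MIDI note for a frequency (hypothetical helper; assumes 12-TET, A4 = MIDI 69)."""
    return round(69 + 12 * math.log2(freq_hz / a4_hz))

def midi_to_name(note):
    """MIDI note number to a name such as 'A4' (convention: middle C = 60 = 'C4')."""
    return f"{NOTE_NAMES[note % 12]}{note // 12 - 1}"

print(freq_to_midi(440.0), midi_to_name(freq_to_midi(261.63)))  # → 69 C4
```

A real transcription step would also need onset detection and velocity from the amplitude envelope, but this is the core of the note mapping.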


Also check the similar work on arxiv:

Multi-instrument Music Synthesis with Spectrogram Diffusion:

https://arxiv.org/abs/2206.05408


You can embed images in spectrograms, though it might sound weird.

Reminds me of this article on Aphex Twin (and others) embedding images in spectrograms: http://www.bastwood.com/aphex.php

You can try interpreting images as spectrograms, but the result will be a cacophonic mess.

There’s a reason nobody does this (other than avant-garde experimental composers, maybe, but they are looking for cacophony).
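Interpreting an image as a spectrogram can be sketched as additive synthesis. Each image row drives a sine oscillator at a fixed frequency, and each pixel's brightness sets that oscillator's amplitude for one time slice. This is an assumption-laden illustration, not how any particular tool does it. It assumes a linear frequency axis from 200 Hz to 3 kHz, 50 ms per column and grayscale values in 0..1, and `image_to_audio` is a hypothetical name.

```python
import math

def image_to_audio(image, sample_rate=8000, col_duration=0.05, f_min=200.0, f_max=3000.0):
    """Hypothetical sketch: treat a grayscale image (list of rows, values 0..1) as a
    magnitude spectrogram. Row 0 (top) maps to f_max, the last row to f_min,
    and each column plays for col_duration seconds."""
    n_rows, n_cols = len(image), len(image[0])
    # Linearly spaced frequencies, highest at the top as spectrograms are usually drawn.
    if n_rows == 1:
        freqs = [f_min]
    else:
        freqs = [f_max - (f_max - f_min) * r / (n_rows - 1) for r in range(n_rows)]
    samples_per_col = round(sample_rate * col_duration)
    out = []
    for c in range(n_cols):
        for i in range(samples_per_col):
            # Global time keeps each oscillator's phase continuous across columns.
            t = (c * samples_per_col + i) / sample_rate
            s = sum(image[r][c] * math.sin(2 * math.pi * freqs[r] * t) for r in range(n_rows))
            out.append(s / n_rows)  # crude normalisation so |sample| <= 1
    return out

# A diagonal line from bottom-left to top-right should come out as a rising stepped tone.
image = [[1.0 if r == 3 - c else 0.0 for c in range(4)] for r in range(4)]
audio = image_to_audio(image)
print(len(audio))  # → 1600
```

The hard amplitude steps between columns will click. A real tool would crossfade or window each column, and arbitrary pictures still produce the dense, inharmonic clusters described above.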


I'm sure it's a cool project.

But it's not showing things directly, is it? It's interpreting frequencies and converting them to an image. Is it really that different from a music visualiser?


It would be interesting to decode the visuals into sound.

> It's the way in which audio and spectrogram are both representational while being encoded by the same information.

They're not, really. If you've spent much time reading spectrograms, you can look at the images and see which part is the sound. Occasionally the picture and sound are "chosen" to line up (notably the birdsong, disguised as distant blurry flower petals – notably not the large flowers, which don't really sound like birdsong at all), but usually the sound appears as clear, visually distinct banding (e.g. the kittens meowing), and the image shows up as very audible distortion (e.g. the tigers).

Choosing sounds and images that line up properly might not be something a human has tried before, but I can already see how I'd do it, if I were better at drawing. It's like a multimedia ambigram: https://en.wikipedia.org/wiki/Ambigram. The diffusion model is missing a lot of tricks: I could get similar quality by making a collage of clip art.

Edit: just saw the third "painting of a farm" one. That one is actually quite good, the way it uses the solid space in the barns. The first "painting of a farm" has the right idea (if I may anthropomorphise the system), working the birdsong in as clouds and trees, but its execution is severely lacking.



Reminds me of Aphex Twin (and some other artists) embedding images in the spectrograms of their songs:

https://www.magneticmag.com/2012/08/the-aphex-face-visualizi...


> I've been trying to find a "proper" connection between audible sound and visible shape, a connection that would not only preserve all the information, but would also properly visualize the "symmetry" in sound, so that messy sound would turn into messy images and harmonic sound would turn into visually appealing images.

It is very exciting to come across others who are also interested in this topic. I am also very interested in the shape of sound but I have spent less time on empirical observations and more on imagining an abstract logic of numbers which can be visualized and heard. Real sound visualizations are also interesting to me but I decided to focus on abstract ideals because I thought it would be appropriate for a video game.

Hope you don't mind me sending some emails.


I have been working in this space [0] for a number of years, and I agree, it's becoming extremely promising.

[0] https://sonicmultiplicities.audio


I prefer to make my spectrograms by hand. https://youtu.be/HT0HH_fc4ZU

Yes, exactly. If the headline had been, "A neat way to turn sounds into pretty pictures" I would have had no problem with it.

"Photoshop for audio" seems so obvious that I'm surprised we haven't seen this before. (After all, the underlying technology has been around for a while now.)

After listening to Aphex Twin, I once asked a friend of mine at uni whether it would be hard to write a program that generates audio from image files. Later that day he had this completed: http://134.74.16.64/wwwa/web/hardware/soundmural/

Edit: since then, I've researched the topic more, including signal decomposition and transforms, and I understand this a lot better :)


Sorry, it's not something that would fit in 80 characters or so. Start here:

https://en.wikipedia.org/wiki/Spectrogram

And really, the image doesn't divulge much: the audio was modified, recorded on an unknown device, and affected by compression. But there are hints.


This is interesting, but fairly easy to confuse. It would be especially interesting to see what results come up when you use modified "artistic" spectrograms like that of Windowlicker by Aphex Twin [1]. One thing I've learned from years of working with audio and images is that image representations of audio are horrible representations of it (other than for temporal changes).

The results are good though! Good work! :D

[1] http://twistedsifter.com/2013/01/hidden-images-embedded-into...
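One concrete reason image representations lose so much is that a magnitude spectrogram throws away phase. The small sketch below shows two different waveforms that produce identical magnitude spectra, which means they make the same "image". It uses pure Python, a naive DFT and a single 16-sample frame chosen for illustration.

```python
import cmath
import math

def dft_magnitudes(x):
    """Magnitudes of all DFT bins of a real sequence (naive O(N^2), illustration only)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n)]

N = 16
# Same two partials (bins 1 and 2) at the same amplitudes; only the phase of the second differs.
x1 = [math.cos(2 * math.pi * t / N) + math.cos(2 * math.pi * 2 * t / N) for t in range(N)]
x2 = [math.cos(2 * math.pi * t / N) + math.sin(2 * math.pi * 2 * t / N) for t in range(N)]

waveforms_differ = any(abs(a - b) > 1e-6 for a, b in zip(x1, x2))
same_magnitudes = all(math.isclose(p, q, abs_tol=1e-9)
                      for p, q in zip(dft_magnitudes(x1), dft_magnitudes(x2)))
print(waveforms_differ, same_magnitudes)  # → True True
```

Going from an edited spectrogram image back to audio therefore requires estimating the missing phase. Griffin-Lim is a common approach, and imperfect phase estimates are one source of smeary, "phasey" artefacts.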


>This means the frequencies emitted are very high (5 samples per period is 19.2 kHz) and it seems the audio output is being low pass filtered resulting in silly wobbly lines.

That effect actually looks amazing. I'd totally play a game with that aesthetic.
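The quoted numbers imply a sample rate. If one period lasts N samples at sample rate f_s, the tone's frequency is f = f_s / N. The working below goes backwards from that, assuming the quote's 19.2 kHz was computed this way; the sample rate itself isn't stated.

```latex
f = \frac{f_s}{N}
\quad\Longrightarrow\quad
f_s = N f = 5 \times 19.2\,\text{kHz} = 96\,\text{kHz},
\qquad
f_{\text{Nyquist}} = \frac{f_s}{2} = 48\,\text{kHz}.
```

The tone is representable (well below Nyquist) but sits near the top of the audible range. Low-pass filtering on the output path, as the quote suggests, would plausibly reshape it into those wobbly lines.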
