Hacker Read

EZ-Cheeze · 2022-12-15 11:22:41

"https://en.wikipedia.org/wiki/Spectrogram - can we already do sound via image? probably soon if not already"

Me in the Stable Diffusion discord, 10/24/2022

The ppl saying this was a genius idea should go check out my other ideas

whiddershins | karma 6988 | avg karma 3.65 · | 2022-12-16 06:21:51

This is a brilliant idea.

Also, spectrograms will never generate plausible high-quality audio. (I think)

So I think the next move is to map the generated audio back onto synthesizers and samples via MIDI …

xor99 | karma 530 | avg karma 2.62 · | 2022-12-15 18:47:53

Also check the similar work on arxiv:

Multi-instrument Music Synthesis with Spectrogram Diffusion:

https://arxiv.org/abs/2206.05408

theGnuMe | karma 918 | avg karma 0.67 · | 2022-12-15 15:07:47

You can embed images in spectrograms... it might sound weird, though.

nerme | karma 225 | avg karma 3.52 · | 2010-05-24 21:39:22+00:00

Reminds me of this article on Aphex Twin (and others) embedding images in spectrograms: http://www.bastwood.com/aphex.php

layoutIfNeeded | karma 698 | avg karma 0.63 · | 2021-05-03 16:48:32

You can try interpreting images as spectrograms, but the result will be a cacophonous mess.

There's a reason why nobody does this. (Other than avant-garde experimental composers, maybe, but they are looking for cacophony.)

feralimal | karma -15 | avg karma -0.03 · | 2021-04-08 20:25:45

I'm sure it's a cool project.

But it's not showing things directly, is it? It's interpreting a frequency and converting that to an image. Is it really that different to a music visualiser?

TooSmugToFail | karma 441 | avg karma 1.61 · | 2019-11-09 19:16:38

It would be interesting to decode the visuals into sound.

wizzwizz4 | karma 5694 | avg karma 1.61 · | 2024-05-21 21:30:17

> It's the way in which audio and spectrogram are both representational while being encoded by the same information.

They're not, really. If you've spent much time reading spectrograms, you can look at the images and see which part is the sound. Occasionally the picture and sound are "chosen" to line up (notably the birdsong, disguised as distant blurry flower petals – notably not the large flowers, which don't really sound like birdsong at all), but usually the sound appears as clear, visually-distinct banding (e.g. the kittens meowing), and the image shows up as very audible distortion (e.g. the tigers).

Choosing sounds and images that line up properly might not be something a human has tried before, but I can already see how I'd do it, if I were better at drawing. It's like a multimedia ambigram: https://en.wikipedia.org/wiki/Ambigram. The diffusion model is missing a lot of tricks: I could get similar quality by making a collage of clip art.

Edit: just saw the third "painting of a farm" one. That one is actually quite good, the way it uses the solid space in the barns. The first "painting of a farm" has the right idea (if I may anthropomorphise the system), working the birdsong in as clouds and trees, but its execution is severely lacking.

Joeboy | karma 5083 | avg karma 2.8 · | 2015-02-16 16:29:43+00:00

http://photosounder.com/ is another "Photoshop of Sound". https://www.youtube.com/watch?v=mVuX1POU6Dw

tekstar | karma 1969 | avg karma 6.15 · | 2021-01-15 11:13:06

Reminds me of Aphex Twin (and some other artists) embedding images in the spectrograms of their songs:

https://www.magneticmag.com/2012/08/the-aphex-face-visualizi...

Hippasus | karma 5 | avg karma 1.67 · | 2020-11-09 22:37:58+00:00

> I've been trying to find a "proper" connection between audible sound and visible shape, a connection that would not only preserve all the information, but would also properly visualize the "symmetry" in sound, so that messy sound would turn into messy images and harmonic sound would turn into visually appealing images.

It is very exciting to come across others who are also interested in this topic. I am also very interested in the shape of sound but I have spent less time on empirical observations and more on imagining an abstract logic of numbers which can be visualized and heard. Real sound visualizations are also interesting to me but I decided to focus on abstract ideals because I thought it would be appropriate for a video game.

Hope you don't mind me sending some emails.

AndrewUnmuted | karma 1646 | avg karma 2.0 · | 2019-12-11 22:43:15+00:00

I have been working in this space [0] for a number of years, and I agree, it's becoming extremely promising.

[0] https://sonicmultiplicities.audio

_spduchamp | karma 429 | avg karma 2.25 · | 2022-12-16 07:17:29

I prefer to make my spectrograms by hand. https://youtu.be/HT0HH_fc4ZU

lisper | karma 54803 | avg karma 4.63 · | 2016-07-25 18:51:16+00:00

Yes, exactly. If the headline had been, "A neat way to turn sounds into pretty pictures" I would have had no problem with it.

jwebb99 | karma 63 | avg karma 1.54 · | 2016-11-07 18:18:06

"Photoshop for audio" seems so obvious that I'm surprised we haven't seen this before. (After all, the underlying technology has been around for a while now.)

yan | karma 9736 | avg karma 5.28 · | 2010-02-05 14:52:21

After listening to Aphex Twin, I once asked a friend of mine in uni if it was hard to write a program that generates audio from image files. Later that day he had this completed: http://134.74.16.64/wwwa/web/hardware/soundmural/

Edit: since then, I've read more on the topic, on signal decomposition and transforms, and understand this a lot better :)
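
For anyone curious what such a program might look like, here is a minimal sketch. It is not the linked project's method, just one common approach (additive synthesis) under stated assumptions: the image is a grid of brightness values in 0..1, each pixel row drives one sine oscillator, each column is a short time slice, and rows are spaced linearly between two frequencies. The function name `image_to_samples` and all of its parameters are hypothetical.

```python
import math


def image_to_samples(image, sample_rate=8000, column_seconds=0.05,
                     f_min=200.0, f_max=3000.0):
    """Treat `image` (rows of brightness values in 0..1) as a spectrogram.

    Assumption: row 0 is the top of the image and maps to f_max; the last
    row maps to f_min, with linear spacing in between.
    """
    rows = len(image)
    cols = len(image[0])
    if rows == 1:
        freqs = [f_min]
    else:
        freqs = [f_max - (f_max - f_min) * r / (rows - 1) for r in range(rows)]

    samples_per_col = round(sample_rate * column_seconds)
    samples = []
    for c in range(cols):
        for n in range(samples_per_col):
            # Absolute time keeps each oscillator's phase continuous
            # from one column to the next.
            t = (c * samples_per_col + n) / sample_rate
            value = sum(image[r][c] * math.sin(2 * math.pi * freqs[r] * t)
                        for r in range(rows))
            # Divide by the row count so the output stays within [-1, 1].
            samples.append(value / rows)
    return samples


# A 2x2 "image": a high tone in the first slice, a low tone in the second.
samples = image_to_samples([[1.0, 0.0],
                            [0.0, 1.0]])
```

The resulting list of floats could be scaled to 16-bit integers and written out with the standard `wave` module. The brightness switches abruptly between columns here, so expect clicks; a real tool would smooth those transitions.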

mzs | karma 9728 | avg karma 3.35 · | 2017-10-12 22:55:51

Sorry, it's not something that would fit in 80 characters or so. Start here:

https://en.wikipedia.org/wiki/Spectrogram

And really, there is not much the image divulges: the audio was modified, recorded on an unknown device, and affected by compression. But there are hints.

madmax108 | karma 5472 | avg karma 8.6 · | 2017-08-02 11:16:20+00:00

This is interesting, but fairly easy to confuse. It would be especially interesting to see what results come up when you use modified "artistic" spectrograms like that of Windowlicker by Aphex Twin [1]. One thing I've learned from years of working with audio and images is that image representations of audio are horrible representations of it (other than for temporal changes).

The results are good though! Good work! :D

[1] http://twistedsifter.com/2013/01/hidden-images-embedded-into...

mistercow | karma 10714 | avg karma 3.06 · | 2014-12-29 21:31:53+00:00

> This means the frequencies emitted are very high (5 samples per period is 19.2 kHz) and it seems the audio output is being low-pass filtered, resulting in silly wobbly lines.

That effect actually looks amazing. I'd totally play a game with that aesthetic.
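
As a side note on the arithmetic in the quote: a tone that repeats every N samples has frequency sample_rate / N. The quote does not state the sample rate, so the 96 kHz below is only inferred from its own numbers (5 × 19.2 kHz), and the function name is made up for illustration.

```python
def tone_frequency_hz(sample_rate_hz, samples_per_period):
    """Frequency of a waveform that repeats every `samples_per_period` samples."""
    return sample_rate_hz / samples_per_period


# Assumption: 96 kHz is inferred from the quoted figures, not stated in the quote.
inferred_sample_rate_hz = 5 * 19_200

frequency_hz = tone_frequency_hz(inferred_sample_rate_hz, 5)
nyquist_hz = inferred_sample_rate_hz / 2  # highest frequency representable at this rate
```

At that rate 19.2 kHz is well below the Nyquist limit, which fits the quote's guess that the wobble comes from low-pass filtering in the output stage rather than from the samples themselves.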
