Hacker Read

rfmw19 | karma 6 | avg karma 0.67 · 2020-07-21 18:34:32+00:00

This article seems to approach the topic at a pretty low level. For another take, I recently worked on a hobby project that built a multi-label deep learning classifier using CNNs and got about 95% accuracy on the validation set. I have some background in signal processing, but I really just wanted to scratch the surface of deep learning. To be clear, it was just a fun project for me and I don't pretend to understand everything I did.

My goal was to detect instruments in complex, full-length music, as opposed to isolated single sources with no background noise. My approach was to generate Mel spectrograms for small sections of songs and then run the deep learning classifier on these images to build a list of labels for later use, e.g. generating playlists. I more than doubled the accuracy by adding to and curating my own dataset. I used ResNet50 as the base model and got decent results even when it was applied to spectrograms!
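To make the section-then-label idea concrete, here is a rough sketch of just the bookkeeping around the classifier, not the actual project code. The names `split_sections` and `song_labels`, the 5-second section length, the 0.5 threshold, and the averaging rule are all assumptions for illustration; the per-section probabilities are made-up numbers standing in for whatever the CNN would output.

```python
import numpy as np

def split_sections(samples, sample_rate, seconds=5.0):
    """Cut a 1-D audio signal into equal-length, non-overlapping sections."""
    size = int(sample_rate * seconds)
    count = len(samples) // size  # the trailing partial section is dropped
    return samples[: count * size].reshape(count, size)

def song_labels(section_probs, label_names, threshold=0.5):
    """Average per-section probabilities and keep labels at or above the threshold."""
    mean_probs = np.asarray(section_probs).mean(axis=0)
    return [name for name, p in zip(label_names, mean_probs) if p >= threshold]

# Toy input: 12 seconds of silence at 22050 Hz gives two full 5-second sections.
signal = np.zeros(22050 * 12)
sections = split_sections(signal, 22050, seconds=5.0)

# Hypothetical multi-label outputs, one row per section, one column per instrument.
probs = np.array([[0.9, 0.2, 0.6],
                  [0.8, 0.1, 0.3]])
labels = song_labels(probs, ["guitar", "piano", "drums"])
```

Averaging is only one way to combine sections; taking the max per label would instead flag an instrument that shows up in just one part of the song.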

I still want to do more analysis to figure out what kind of small features the model was actually picking up on. I also want to try experiments like scrambling the spectrogram time-wise and seeing if the results are still as accurate.
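The time-scrambling experiment could look something like the sketch below, assuming the spectrogram is a NumPy array shaped (mel bins, time frames). `scramble_time` and its `block` and `seed` parameters are hypothetical names; shuffling in blocks instead of single frames is just one choice, since the block size controls how much local temporal structure survives.

```python
import numpy as np

def scramble_time(spec, block=8, seed=0):
    """Shuffle blocks of time frames; spec has shape (n_mels, n_frames)."""
    rng = np.random.default_rng(seed)
    n_blocks = spec.shape[1] // block
    order = rng.permutation(n_blocks)
    # Reorder whole blocks of columns; frequency content per frame is untouched.
    blocks = [spec[:, i * block:(i + 1) * block] for i in order]
    # Frames that do not fill a complete block stay at the end, unshuffled.
    tail = spec[:, n_blocks * block:]
    return np.concatenate(blocks + [tail], axis=1)

# Toy spectrogram: 4 mel bins by 22 frames, shuffled in blocks of 5 frames.
spec = np.arange(4 * 22, dtype=float).reshape(4, 22)
scrambled = scramble_time(spec, block=5)
```

Running the classifier on both `spec` and `scrambled` and comparing the predictions would hint at whether the model relies on temporal order or mostly on which frequencies are present.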
