Hybrid-Net: Real-time audio source separation, generate lyrics, chords, beat (github.com)
8 points by herogary | 2024-03-26 12:42:17 | 67 comments




This is incredible

Indeed.

The only thing I've seen it get wrong in a few minutes of testing is French lyrics (try a Serge Gainsbourg song, for example).

But it is really, really amazing.


It presented some strange unscrollable tables, then threw an error when I tried to press play. One video was probably tabulated, but inaccessible in the player. Meh, maybe it will work once the HN effect wears off. I was searching for such a service lately, but everything required logging in or was "for business and education use, trust us, it works".

Here's a question for folks who work on DL for audio: what are folks using for vocoders these days?

I feel like that's where a lot of artifacts are introduced (at least for TTS) and the best methods a while ago were slow and autoregressive.


In recent years, there has been substantial advancement in vocoders for DL audio applications. WaveGAN and MelGAN have emerged as promising solutions, harnessing the power of generative adversarial networks (GANs) to produce high-fidelity audio. Furthermore, parallel-waveGAN and HiFi-GAN have showcased improved efficiency with quicker inference times while maintaining exceptional audio quality.
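For a sense of what the neural vocoders above improve on: the classic non-neural baseline is Griffin-Lim, which recovers a waveform from a magnitude spectrogram by iterating between the time and spectrogram domains while re-imposing the target magnitudes. A minimal numpy sketch (all parameter values are illustrative):

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    win = np.hanning(n_fft)
    starts = range(0, len(x) - n_fft + 1, hop)
    return np.array([np.fft.rfft(x[s:s + n_fft] * win) for s in starts])

def istft(spec, n_fft=512, hop=128):
    win = np.hanning(n_fft)
    out = np.zeros(hop * (len(spec) - 1) + n_fft)
    norm = np.zeros_like(out)
    for i, frame in enumerate(spec):
        # overlap-add each windowed frame, then normalize by the window energy
        out[i * hop:i * hop + n_fft] += np.fft.irfft(frame, n_fft) * win
        norm[i * hop:i * hop + n_fft] += win ** 2
    return out / np.maximum(norm, 1e-8)

def griffin_lim(mag, n_iter=50, n_fft=512, hop=128, seed=0):
    # start from random phase, then alternate between time and spectrogram
    # domains, re-imposing the target magnitude on every pass
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(mag.shape))
    for _ in range(n_iter):
        y = istft(mag * phase, n_fft, hop)
        phase = np.exp(1j * np.angle(stft(y, n_fft, hop)))
    return istft(mag * phase, n_fft, hop)
```

The phase it invents is what produces the characteristic "metallic" artifacts; GAN vocoders like HiFi-GAN learn to generate the waveform directly instead.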

Thanks!!

Wow, I remember hearing about the cocktail party problem a few years back and thought it might be near impossible to solve. This is a good time to be alive.

You can try it on their website https://lamucal.ai/


Great experience

It doesn't make sense to show the progress of AI processing for featured songs which have been seen by dozens of visitors and surely are cached. Is this trick related to copyright issues?

Hopefully no DMCA or something like that affects this project. A long time ago I wanted to build a lyrics translation website, but apparently you can get in trouble for using copyrighted lyrics.

The data sources come from YouTube or user-uploaded audio, and the lyrics are extracted from the audio using AI models.

There appears to be an issue with the lyric matching, which I'm guessing is based on caching the previously generated lyrics for different songs with the same name. I noticed it specifically on the song "Sweet Pea" by the Tobasco Donkeys, where it shows tabs and lyrics but they don't match the song.

Any details on the source separation performance (SDR and other BSS metrics)?

The SDR is 6.3 dB.
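For anyone unfamiliar with the metric: a plain (non-scale-invariant) SDR is just the ratio of reference signal power to residual error power, in dB. A minimal sketch; note the BSSEval variant used in MUSDB-style benchmarks additionally allows for distortion filters, so the numbers are not directly comparable:

```python
import numpy as np

def sdr_db(reference, estimate):
    # ratio of reference power to residual (distortion) power, in dB
    err = reference - estimate
    return 10 * np.log10(np.sum(reference ** 2) / np.maximum(np.sum(err ** 2), 1e-12))
```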

Am I missing something? What license is this released under?

It looks like the network structure of the model is provided, but the dataset is not.

No license, so the default "all rights reserved" applies. Can't really be used for anything.

Well done! Worked very well for a song I tried. This is a useful tool for learning music production and theory. Downloading the generated MIDI file, I only got the basic chords - that's useful, but it would be awesome to get MIDI for all the instruments. I will definitely be exploring this further, good stuff!

I'm a little confused, hopefully somebody can point me in the right direction. Can this be run locally, or are all the models proprietary and hidden somewhere? The repo seems to only contain the inference code.


I downloaded and tried their app to test the audio source separation feature, and ended up with five tracks (piano, vocals, drums, bass, and others). It sounds pretty good, but unfortunately there is no guitar track.

So cool!

I just tried it with a song with a fairly complicated chord progression - "Yesterday" by the Beatles. It did pretty well! But it got a couple of parts wrong.

Is there support for modifying the results of the chords/lyrics? I don't see it immediately.


We're about to roll out features for chord and lyric modifications, and we'll continue to optimize the model going forward.

Well, from the PoV of someone who tries to learn how to play the guitar I must say that all this AI frenzy managed to produce some useful tools ;-)

I checked https://lamucal.ai/ with some example MP3:

- lyrics are OK (although I've seen tools that managed to do better),

- chords recognition wasn't bad,

- the UI is a bit rough around the edges (and I managed to get some Unity-related errors),

- pitch-aware speed adjustments is always a great tool when someone tries to learn how to play the song,

- transposing can be useful as well (although the web application does not support it).

I'm using (and paid for) another similar application, although I primarily use it for track separation. I then import the tracks into Ardour and record my own guitar lines. I use just a minuscule percentage of the DAW's features, so if someone could provide an application with all those AI goodies coupled with recording ability, that would be wonderful.

That said, personally I've found that one way or another I need to listen to the song I'm trying to learn a lot, make notes, and break down the song structure (sections, strumming patterns, chords, etc.). And a good YouTube video that starts with a simple version of the song and then adds more and more features is often the best help to start with, at least at my current level.


Thank you for raising the issue. We are continuously optimizing our model, and we are also constantly gathering various UI and business-related bugs. We will continue to optimize and resolve them in the future.

I found the lyric alignment quite good for the rap song I tested. What tool has better lyrics alignment?

LOL https://lamucal.ai/songs/john-coltrane/john-coltrane-giant-s...

It's definitely on to something. I wonder how it would perform if it were trained on jazz.


Ha, Giant Steps was the first thing I tried as well. Not bad, certainly has some way to go.


Any chance of adding documentation for running it locally? venv / requirements.txt at the very least, or a docker image?

Tried it on a couple of my wife’s songs and she said it was quite inaccurate in terms of chords and tabs (the lyrics were pretty close though). This seems like one of those use cases where it’s not particularly useful until it gets above some minimum accuracy threshold.

Thanks for sharing your experience. We appreciate the feedback. It's clear that improving accuracy, especially with chords and tabs, is a priority for us. We're committed to enhancing the accuracy of our tool to meet your expectations and provide a more valuable experience.

At this point, I'm conflicted. The lyrics testing went well for me, but I'm not proficient in chords. Are the chord results really that poor? I wonder if I can use it to play my favorite songs in real life.

I play guitar/piano, and I concur that they’re not great. I tried “Vagabond (acoustic)” by Wolfmother. I figured it would have an easier time because it’s just a vocal and an acoustic guitar. Some of the notes in the tabs are right, but the melody is too simplistic. All of the embellishments are also missing. It’s interesting how the mile-high view isn’t so bad though. If I sat down to figure out the notes for a track no tabs exist for, this might look like a rough approximation of what I’d start with.

I tried the open source one that Spotify published a while ago on jazz trio music (just piano, double bass and drums) but it was pretty useless. My experiments with the trials of some commercial services, where you select an instrument and it extracts just that were much better.

The guitar separation model is currently under development. The test results are somewhat unsatisfactory due to the significant variation in guitar tones, especially for electric guitars, which adds to the difficulty of training.

I tried it on one of my own tunes:

https://lamucal.ai/songs/adrian-holovaty/adrian-holovaty-the...

The beats/chords were consistently a full beat off, and the chords were probably only 50% right. I chose this tune because (to my ears) the harmony is pretty clear.

Compare this to my own manually created transcription of the same tune, and it's night-and-day difference:

https://www.soundslice.com/slices/tpbwc/

Beat detection and chord detection are hard problems, likely due to a lack of diverse training data. Chordify (another site that does this, which has been around for ages) has roughly similar performance.
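As a toy illustration of why beat detection is brittle: a common classical baseline autocorrelates a spectral-flux onset envelope and picks the strongest lag in a plausible BPM window, and even that involves fragile choices (hop size, BPM range, avoiding harmonics of the true tempo). A numpy sketch with illustrative parameters, not anyone's production algorithm:

```python
import numpy as np

def estimate_tempo(x, sr, n_fft=1024, hop=512):
    # spectral-flux onset envelope: summed positive change in magnitude
    win = np.hanning(n_fft)
    frames = np.array([np.abs(np.fft.rfft(x[s:s + n_fft] * win))
                       for s in range(0, len(x) - n_fft + 1, hop)])
    flux = np.maximum(np.diff(frames, axis=0), 0.0).sum(axis=1)
    # autocorrelate and pick the strongest lag inside a 60-200 BPM window
    ac = np.correlate(flux, flux, mode="full")[len(flux) - 1:]
    fps = sr / hop
    lags = np.arange(int(fps * 60 / 200), int(fps * 60 / 60))
    best = lags[np.argmax(ac[lags])]
    return 60.0 * fps / best
```

On a clean click track this works; on real music with syncopation and tempo drift, the "strongest lag" is often a harmonic or off by a beat, which matches the "full beat off" symptom above.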

Full disclosure: I run Soundslice, a website built around synced sheet music, in which there's no automatic transcription involved (maybe someday, but the tech isn't good enough yet!). I've been following these developments for 15+ years.


Thanks for your feedback. We're currently in the process of adjusting our dataset and model to address issues with chords and rhythm. We're looking forward to providing you with a better experience in the future.

Sounds like your site could be a great training set. Someone should reach out to you; sounds like a good business opportunity.

Adrian, I made a Google Colab notebook to try a different beat detection algorithm with your tune. The results sound pretty good to me!

You can listen here:

https://colab.research.google.com/drive/1Pqgc9s-nBKxU_3Ap6K0...


Cool project, but it's not very accurate. I wouldn't charge for this yet

Thank you for providing feedback. We will continue to optimize and improve the model.

I tried a few songs and uploaded one of my favorite songs. The lyrics recognition result was excellent, but there were a few instances where some words were missing or incorrect. It would be great if there could be an option to edit the lyrics.

We are in the process of adding the lyrics editing feature, as well as chord and rhythm types. It will be released soon.

I've always wanted to implement an FFT from scratch and play with it to separate audio waves, but then a full-time job came along. I guess once you separate the vocals from everything else, you can just feed them to a speech-to-text model?

To be completely honest, as a human who doesn't speak English natively, I find some lyrics hard to understand. I've seen native English speakers have this problem too. I think it's only natural for an NN to make the same mistakes.


Source separation is commonly done by applying masks to the spectrogram; deep learning is used to train the masks for the different instruments. As you mentioned, this is the approach we follow in the subsequent steps.
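The masking step itself is simple once the network has produced per-source magnitude estimates: normalize them into soft masks that sum to one, and multiply each mask by the complex mixture spectrogram. A minimal sketch of that post-processing (not this project's actual code):

```python
import numpy as np

def soft_masks(est_mags):
    # est_mags: (n_sources, freq, time) magnitude estimates from a network;
    # normalizing makes the masks sum to 1 at every time-frequency bin
    total = np.sum(est_mags, axis=0, keepdims=True)
    return est_mags / np.maximum(total, 1e-8)

def apply_masks(mix_spec, masks):
    # multiply the complex mixture spectrogram by each real-valued mask;
    # the separated spectrograms sum back to the mixture by construction
    return masks * mix_spec[None, ...]
```

Each masked spectrogram is then inverted back to a waveform with an inverse STFT.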

In the Android app I consistently get "Downloading model file" stuck at exactly -60830200% . Tried clearing data and caches and changing the connection.

Thank you for your feedback. Could you please email us the information of your phone model and system version? We will investigate promptly. In the meantime, you can try exiting the program and re-entering to see if that helps. Please also check your network connection.

does anyone know if virtualdj.com uses the same technique?

It looks like a DAW, and I'm not very familiar with their source separation. The technology we use has been publicly released on GitHub.

I see a lot of “7M” chords in generated outputs, which isn’t a type of chord I’m familiar with. Is this meant to be “M7” (major 7)?

Yes, it's a major 7th chord.
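The distinction matters because "7" alone conventionally means a dominant 7th, which is a different chord. A small table of common qualities as semitone intervals above the root (a hypothetical naming scheme, just to illustrate the difference):

```python
# semitone intervals above the root for a few common chord qualities;
# "M7" (major 7th) and "7" (dominant 7th) differ only in the top note
CHORD_INTERVALS = {
    "": (0, 4, 7),        # major triad
    "m": (0, 3, 7),       # minor triad
    "7": (0, 4, 7, 10),   # dominant 7th
    "M7": (0, 4, 7, 11),  # major 7th (what "7M" appears to denote)
    "m7": (0, 3, 7, 10),  # minor 7th
}

def spell(root, quality, names=("C", "C#", "D", "D#", "E", "F",
                                "F#", "G", "G#", "A", "A#", "B")):
    # spell out the pitch names of a chord from its root and quality
    base = names.index(root)
    return [names[(base + i) % 12] for i in CHORD_INTERVALS[quality]]
```

So CM7 is C-E-G-B, while C7 is C-E-G-Bb; showing "7M" risks readers playing the wrong one.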


My partner plays violin and was looking for a version of Lindsey Stirling's Crystallize without the violin part (or rather, with it turned down.)

I found plenty of tools online to do this, but they were all credit-based and a bit annoying to use. I eventually found they were mostly using Demucs from Facebook: https://github.com/facebookresearch/demucs

Really nice tool if you need simple splitting of things like drums, vocals, etc. It's not perfect, but it's a great start.


Yes, the Demucs demixing model has excellent SNR performance, but it is computationally intensive. It also incorporates a randomized mechanism, so it produces different spectrograms on each run.

You can run Demucs directly in your browser (through WASM) on my website: https://freemusicdemixer.com/

No usage credits or cost, since it's all on your computer.


I would think that by comparison to image models synthetic data would be relatively easy to generate for audio model training. I’m curious then why it continues to be so difficult to build a nearly flawless audio separation model. Is synthetic data being widely used? Is it just too hard of a problem to train even with this data? I don’t have a good sense of what the most challenging aspects are of audio models.

Unlike images, audio signals are time-dependent and have complex temporal dynamics, making it more challenging to generate realistic synthetic data that captures the nuances of real-world audio. Meanwhile, the complex nature of audio signals, the scarcity of high-quality training data, and the subjective evaluation of audio quality collectively contribute to the ongoing challenges in building near-flawless audio separation models.
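Synthetic mixtures are in fact widely used for separation training: since summing waveforms is exactly how a mix is made, you can take isolated stems, scale each by a random gain, and sum them, using the sum as the input and the scaled stems as targets. A minimal sketch of that augmentation (gain range is illustrative):

```python
import numpy as np

def make_mixture(stems, rng, gain_db_range=(-6.0, 6.0)):
    # scale each isolated stem by a random gain and sum:
    # the sum is the training input, the scaled stems are the targets
    gains = 10.0 ** (rng.uniform(*gain_db_range, size=len(stems)) / 20.0)
    scaled = [g * s for g, s in zip(gains, stems)]
    return np.sum(scaled, axis=0), scaled
```

The catch is the one described above: naive sums of stems recorded in isolation don't capture shared room acoustics, bleed between microphones, or mastering effects, so models trained only on them transfer imperfectly to real recordings.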

It doesn't seem very well pleased with The Dance of Eternity by Dream Theater. But it was quite impressive in some aspects with the likes of Nirvana.

Future music students are going to be fortunate to have these kinds of tools. An instant split and full analysis of any song. Remixes and backing tracks on tap, etc.

That said, we older students learned a lot from the process of doing this manually ourselves. Those lessons can still be learned and others besides, but the dynamics of the learning change with the tech.


Yes, let’s throw probably the most intricate progressive metal composition with over 100 time signature changes at it. I like the ambition! I tried it with “Glory” by John Legend and Common, which is literally 5 chords in C major scale and it stumbled. It identified (most of) the chords correctly but missed a bunch of them and the timing was off. It is pretty impressive anyway.

Finally, a new development in lyric transcription! I’ve been waiting to see this for ages!

  def get_lyrics(waveform, sr, cfg):
      # asr and wav2vec2
      raise NotImplementedError()


Insanely cool!

One bit of feedback: with this track [0], I noticed that the lyrics highlighting eventually fell behind the YouTube audio.

[0] https://lamucal.ai/songs/def-leppard/pour-some-sugar-on-me-e...


I tried something similar last week [https://guitar2tabs.klang.io], so I just ran yours with the same song [Surfin St. Helens by The Volcanos] to see how they compared.

* Lamucal did better on the chords--to my ear they're not perfect [e.g. D vs. D minor], but guitar2tabs couldn't do chords at all.

* Guitar2tabs did pretty well at figuring out the melody; it couldn't grok the rhythm, but got a pretty decent sequence of notes. Lamucal didn't even try; the "tab" is just a list of chords.

One very nice feature of guitar2tabs is that you can play either the original audio or the extracted music, so you can hear how close it is. I'd recommend adding that--just playing the original is still useful, to check how it syncs with the extracted music, but playing the extraction is more so.


So far this [0] is the ONLY "AI stems splitter" I've found that really works great. Doesn't provide lyrics or chords, but it correctly splits any track in 4 stems (music, voice, drums, fx).

[0] https://vocalremover.org/splitter-ai


Couldn’t do Master of Puppets
