Hybrid-Net: Real-time audio source separation, generate lyrics, chords, beat (github.com)
8 points by herogary | 2024-03-26 12:42:17 | 67 comments




This is incredible

Indeed.

The only thing I've seen it get wrong in a few minutes of testing is French lyrics (try a Serge Gainsbourg song, for example).

But it is really, really amazing.


It presented some strange unscrollable tables, then threw an error when I tried to press play. One video was probably tabulated, but inaccessible in the player. Meh, maybe it will work once the HN effect wears off. I was searching for such a service lately, but everything required logging in or was "for business and education use, trust us, it works".

Here's a question for folks who work on DL for audio: what are folks using for vocoders these days?

I feel like that's where a lot of artifacts are introduced (at least for TTS) and the best methods a while ago were slow and autoregressive.


In recent years, there has been substantial advancement in vocoders for DL audio applications. WaveGAN and MelGAN have emerged as promising solutions, harnessing the power of generative adversarial networks (GANs) to produce high-fidelity audio. Furthermore, parallel-waveGAN and HiFi-GAN have showcased improved efficiency with quicker inference times while maintaining exceptional audio quality.
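For a sense of what the neural vocoders above improve on: the classic non-neural baseline is Griffin-Lim, which recovers a waveform from a magnitude spectrogram by iterating between the time and spectrogram domains while re-imposing the target magnitudes. A minimal numpy sketch (all parameter values are illustrative):

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    win = np.hanning(n_fft)
    starts = range(0, len(x) - n_fft + 1, hop)
    return np.array([np.fft.rfft(x[s:s + n_fft] * win) for s in starts])

def istft(spec, n_fft=512, hop=128):
    win = np.hanning(n_fft)
    out = np.zeros(hop * (len(spec) - 1) + n_fft)
    norm = np.zeros_like(out)
    for i, frame in enumerate(spec):
        # overlap-add each windowed frame, then normalize by the window energy
        out[i * hop:i * hop + n_fft] += np.fft.irfft(frame, n_fft) * win
        norm[i * hop:i * hop + n_fft] += win ** 2
    return out / np.maximum(norm, 1e-8)

def griffin_lim(mag, n_iter=50, n_fft=512, hop=128, seed=0):
    # start from random phase, then alternate between time and spectrogram
    # domains, re-imposing the target magnitude on every pass
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(mag.shape))
    for _ in range(n_iter):
        y = istft(mag * phase, n_fft, hop)
        phase = np.exp(1j * np.angle(stft(y, n_fft, hop)))
    return istft(mag * phase, n_fft, hop)
```

The phase it invents is what produces the characteristic "metallic" artifacts; GAN vocoders like HiFi-GAN learn to generate the waveform directly instead.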

Thanks!!

Wow, I remember hearing about the cocktail party problem a few years back and thought it might be near impossible to solve. This is a good time to be alive.

You can try it on their website https://lamucal.ai/


Great experience

It doesn't make sense to show the progress of AI processing for featured songs which have been seen by dozens of visitors and surely are cached. Is this trick related to copyright issues?

Hopefully no DMCA or something like that affects this project. A long time ago I wanted to build a lyrics translation website, but apparently you can get in trouble for using copyrighted lyrics.

The data sources come from YouTube or user-uploaded audio, and the lyrics are extracted from the audio using AI models.

There appears to be an issue with the lyric matching, which I'm guessing is based on caching the previously generated lyrics for different songs with the same name. I noticed it specifically on the song "Sweet Pea" by the Tobasco Donkeys, where it shows tabs and lyrics but they don't match the song.

Any details on the source separation performance (SDR and other BSS metrics)?

The SDR is 6.3 dB.
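For anyone unfamiliar with the metric: a plain (non-scale-invariant) SDR is just the ratio of reference signal power to residual error power, in dB. A minimal sketch; note the BSSEval variant used in MUSDB-style benchmarks additionally allows for distortion filters, so the numbers are not directly comparable:

```python
import numpy as np

def sdr_db(reference, estimate):
    # ratio of reference power to residual (distortion) power, in dB
    err = reference - estimate
    return 10 * np.log10(np.sum(reference ** 2) / np.maximum(np.sum(err ** 2), 1e-12))
```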

Am I missing something? What license is this released under?

It looks like the network structure of the model is provided, but the dataset is not.

No license, so the default "all rights reserved" applies. Can't really be used for anything.

Well done! Worked very well for a song I tried. This is a useful tool for learning music production and theory. Downloading the generated MIDI file, I only got the basic chords - that's useful, but it would be awesome to get MIDI for all the instruments. I will definitely be exploring this further, good stuff!

I'm a little confused, hopefully somebody can point me in the right direction. Can this be run locally, or are all the models proprietary and hidden somewhere? The repo seems to only contain the inference code.


I downloaded and tried their app to test the audio source separation feature, and ended up with five tracks (piano, vocals, drums, bass, and others). It sounds pretty good, but unfortunately there is no guitar track.

So cool!

I just tried it with a song with a fairly complicated chord progression - "Yesterday" by the Beatles. It did pretty well! But it got a couple of parts wrong.

Is there support for modifying the results of the chords/lyrics? I don't see it immediately.


We're about to roll out features for chord and lyric modifications, and we'll continue to optimize the model going forward.

Well, from the PoV of someone who tries to learn how to play the guitar I must say that all this AI frenzy managed to produce some useful tools ;-)

I checked https://lamucal.ai/ with some example MP3:

- lyrics are OK (although I've seen tools that managed to do better),

- chords recognition wasn't bad,

- the UI is a bit rough around the edges (and I managed to get some Unity-related errors),

- pitch-aware speed adjustments is always a great tool when someone tries to learn how to play the song,

- transposing can be useful as well (although the web application does not support it).

I'm using (and paid for) another similar application, although I primarily use it for track separation. I then import the tracks into Ardour and record my own guitar lines. I use just a minuscule percentage of the DAW's features, so if someone could provide an application with all those AI goodies coupled with recording ability, that would be wonderful.

That said, personally I've found that one way or another I need to listen to the song I'm trying to learn a lot, make notes, and break down the song structure (sections, strumming patterns, chords, etc.). And a good YouTube video that starts with a simple version of the song and then adds more and more features is often the best help to start with, at least at my current level.


Thank you for raising the issue. We are continuously optimizing our model, and we are also constantly gathering various UI and business-related bugs. We will continue to optimize and resolve them in the future.

I found the lyric alignment quite good for the rap song I tested. What tool has better lyrics alignment?

LOL https://lamucal.ai/songs/john-coltrane/john-coltrane-giant-s...

It's definitely on to something. I wonder how it would perform if it were trained on jazz.


Ha, Giant Steps was the first thing I tried as well. Not bad, certainly has some way to go.


Any chance of adding documentation for running it locally? venv / requirements.txt at the very least, or a docker image?

Tried it on a couple of my wife’s songs and she said it was quite inaccurate in terms of chords and tabs (the lyrics were pretty close though). This seems like one of those use cases where it’s not particularly useful until it gets above some minimum accuracy threshold.

Thanks for sharing your experience. We appreciate the feedback. It's clear that improving accuracy, especially with chords and tabs, is a priority for us. We're committed to enhancing the accuracy of our tool to meet your expectations and provide a more valuable experience.

At this point, I'm conflicted. The lyrics testing went well for me, but I'm not proficient in chords. Are the chord results really that poor? I wonder if I can use it to play my favorite songs in real life.

I play guitar/piano, and I concur that they’re not great. I tried “Vagabond (acoustic)” by Wolfmother. I figured it would have an easier time because it’s just a vocal and an acoustic guitar. Some of the notes in the tabs are right, but the melody is too simplistic. All of the embellishments are also missing. It’s interesting how the mile-high view isn’t so bad though. If I sat down to figure out the notes for a track no tabs exist for, this might look like a rough approximation of what I’d start with.

I tried the open source one that Spotify published a while ago on jazz trio music (just piano, double bass and drums) but it was pretty useless. My experiments with the trials of some commercial services, where you select an instrument and it extracts just that were much better.

The guitar separation model is currently under development. The test results are somewhat unsatisfactory due to the significant variation in guitar tones, especially for electric guitars, which adds to the difficulty of training.

I tried it on one of my own tunes:

https://lamucal.ai/songs/adrian-holovaty/adrian-holovaty-the...

The beats/chords were consistently a full beat off, and the chords were probably only 50% right. I chose this tune because (to my ears) the harmony is pretty clear.

Compare this to my own manually created transcription of the same tune, and it's night-and-day difference:

https://www.soundslice.com/slices/tpbwc/

Beat detection and chord detection are hard problems, likely due to a lack of diverse training data. Chordify (another site that does this, which has been around for ages) has roughly similar performance.
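As a toy illustration of why beat detection is brittle: a common classical baseline autocorrelates a spectral-flux onset envelope and picks the strongest lag in a plausible BPM window, and even that involves fragile choices (hop size, BPM range, avoiding harmonics of the true tempo). A numpy sketch with illustrative parameters, not anyone's production algorithm:

```python
import numpy as np

def estimate_tempo(x, sr, n_fft=1024, hop=512):
    # spectral-flux onset envelope: summed positive change in magnitude
    win = np.hanning(n_fft)
    frames = np.array([np.abs(np.fft.rfft(x[s:s + n_fft] * win))
                       for s in range(0, len(x) - n_fft + 1, hop)])
    flux = np.maximum(np.diff(frames, axis=0), 0.0).sum(axis=1)
    # autocorrelate and pick the strongest lag inside a 60-200 BPM window
    ac = np.correlate(flux, flux, mode="full")[len(flux) - 1:]
    fps = sr / hop
    lags = np.arange(int(fps * 60 / 200), int(fps * 60 / 60))
    best = lags[np.argmax(ac[lags])]
    return 60.0 * fps / best
```

On a clean click track this works; on real music with syncopation and tempo drift, the "strongest lag" is often a harmonic or off by a beat, which matches the "full beat off" symptom above.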

Full disclosure: I run Soundslice, a website built around synced sheet music, in which there's no automatic transcription involved (maybe someday, but the tech isn't good enough yet!). I've been following these developments for 15+ years.


Thanks for your feedback. We're currently in the process of adjusting our dataset and model to address issues with chords and rhythm. We're looking forward to providing you with a better experience in the future.

Sounds like your site could be a great training set. Someone should reach out to you; sounds like a good business opportunity.

Adrian, I made a Google Colab notebook to try a different beat detection algorithm with your tune. The results sound pretty good to me!

You can listen here:

https://colab.research.google.com/drive/1Pqgc9s-nBKxU_3Ap6K0...


Cool project, but it's not very accurate. I wouldn't charge for this yet

Thank you for providing feedback. We will continue to optimize and improve the model.

I tried a few songs and uploaded one of my favorite songs. The lyrics recognition result was excellent, but there were a few instances where some words were missing or incorrect. It would be great if there could be an option to edit the lyrics.

We are in the process of adding the lyrics editing feature, as well as chord and rhythm types. It will be released soon.

I've always wanted to implement an FFT from scratch and play with it to separate audio waves, but then a full-time job came along. I guess once you separate the vocals from everything else, you can just feed them to a speech-to-text model?

To be completely honest, as a human who doesn't speak English natively, I find some lyrics hard to understand. I've seen native English speakers have this problem too. I think it's only natural for an NN to make the same mistakes.


Source separation is commonly done by applying masks to the spectrogram; deep learning is used to train the masks for the different instruments. As you mentioned, this is the approach we follow in the subsequent steps.
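The masking step itself is simple once the network has produced per-source magnitude estimates: normalize them into soft masks that sum to one, and multiply each mask by the complex mixture spectrogram. A minimal sketch of that post-processing (not this project's actual code):

```python
import numpy as np

def soft_masks(est_mags):
    # est_mags: (n_sources, freq, time) magnitude estimates from a network;
    # normalizing makes the masks sum to 1 at every time-frequency bin
    total = np.sum(est_mags, axis=0, keepdims=True)
    return est_mags / np.maximum(total, 1e-8)

def apply_masks(mix_spec, masks):
    # multiply the complex mixture spectrogram by each real-valued mask;
    # the separated spectrograms sum back to the mixture by construction
    return masks * mix_spec[None, ...]
```

Each masked spectrogram is then inverted back to a waveform with an inverse STFT.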

In the Android app I consistently get "Downloading model file" stuck at exactly -60830200% . Tried clearing data and caches and changing the connection.

Thank you for your feedback. Could you please email us the information of your phone model and system version? We will investigate promptly. In the meantime, you can try exiting the program and re-entering to see if that helps. Please also check your network connection.

does anyone know if virtualdj.com uses the same technique?

It looks like a DAW, and I'm not very familiar with their source separation. The technology we use has been publicly released on GitHub.

I see a lot of “7M” chords in generated outputs, which isn’t a type of chord I’m familiar with. Is this meant to be “M7” (major 7)?

Yes, it's a major 7th chord.
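The distinction matters because "7" alone conventionally means a dominant 7th, which is a different chord. A small table of common qualities as semitone intervals above the root (a hypothetical naming scheme, just to illustrate the difference):

```python
# semitone intervals above the root for a few common chord qualities;
# "M7" (major 7th) and "7" (dominant 7th) differ only in the top note
CHORD_INTERVALS = {
    "": (0, 4, 7),        # major triad
    "m": (0, 3, 7),       # minor triad
    "7": (0, 4, 7, 10),   # dominant 7th
    "M7": (0, 4, 7, 11),  # major 7th (what "7M" appears to denote)
    "m7": (0, 3, 7, 10),  # minor 7th
}

def spell(root, quality, names=("C", "C#", "D", "D#", "E", "F",
                                "F#", "G", "G#", "A", "A#", "B")):
    # spell out the pitch names of a chord from its root and quality
    base = names.index(root)
    return [names[(base + i) % 12] for i in CHORD_INTERVALS[quality]]
```

So CM7 is C-E-G-B, while C7 is C-E-G-Bb; showing "7M" risks readers playing the wrong one.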


My partner plays violin and was looking for a version of Lindsey Stirling's Crystallize without the violin part (or rather, with it turned down.)

I found plenty of tools online to do this, but they were all credit-based and a bit annoying to use. I eventually found they were mostly using Demucs from Facebook: https://github.com/facebookresearch/demucs

Really nice tool if you need simple splitting of things like drums, vocals, etc. It's not perfect, but it's a great start.


Yes, the Demucs demixing model has excellent SNR performance, but it is computationally intensive. It also incorporates a randomized mechanism, so it produces different spectrograms on each run.

You can run Demucs directly in your browser (through WASM) on my website: https://freemusicdemixer.com/

No usage credits or cost, since it's all on your computer.


I would think that by comparison to image models synthetic data would be relatively easy to generate for audio model training. I’m curious then why it continues to be so difficult to build a nearly flawless audio separation model. Is synthetic data being widely used? Is it just too hard of a problem to train even with this data? I don’t have a good sense of what the most challenging aspects are of audio models.

Unlike images, audio signals are time-dependent and have complex temporal dynamics, making it more challenging to generate realistic synthetic data that captures the nuances of real-world audio. Meanwhile, the complex nature of audio signals, the scarcity of high-quality training data, and the subjective evaluation of audio quality collectively contribute to the ongoing challenges in building near-flawless audio separation models.
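Synthetic mixtures are in fact widely used for separation training: since summing waveforms is exactly how a mix is made, you can take isolated stems, scale each by a random gain, and sum them, using the sum as the input and the scaled stems as targets. A minimal sketch of that augmentation (gain range is illustrative):

```python
import numpy as np

def make_mixture(stems, rng, gain_db_range=(-6.0, 6.0)):
    # scale each isolated stem by a random gain and sum:
    # the sum is the training input, the scaled stems are the targets
    gains = 10.0 ** (rng.uniform(*gain_db_range, size=len(stems)) / 20.0)
    scaled = [g * s for g, s in zip(gains, stems)]
    return np.sum(scaled, axis=0), scaled
```

The catch is the one described above: naive sums of stems recorded in isolation don't capture shared room acoustics, bleed between microphones, or mastering effects, so models trained only on them transfer imperfectly to real recordings.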

It doesn't seem very well pleased with The Dance of Eternity by Dream Theater. But it was quite impressive in some aspects with the likes of Nirvana.

Future music students are going to be fortunate to have these kinds of tools. An instant split and full analysis of any song. Remixes and backing tracks on tap, etc.

That said, we older students learned a lot from the process of doing this manually ourselves. Those lessons can still be learned and others besides, but the dynamics of the learning change with the tech.


Yes, let’s throw probably the most intricate progressive metal composition with over 100 time signature changes at it. I like the ambition! I tried it with “Glory” by John Legend and Common, which is literally 5 chords in C major scale and it stumbled. It identified (most of) the chords correctly but missed a bunch of them and the timing was off. It is pretty impressive anyway.

Finally, a new development in lyric transcription! I’ve been waiting to see this for ages!

  def get_lyrics(waveform, sr, cfg):
      # asr and wav2vec2
      raise NotImplementedError()


Insanely cool!

One bit of feedback: with this track [0], I noticed that the lyrics highlighting eventually fell behind the YouTube audio.

[0] https://lamucal.ai/songs/def-leppard/pour-some-sugar-on-me-e...


I tried something similar last week [https://guitar2tabs.klang.io], so I just ran yours with the same song [Surfin St. Helens by The Volcanos] to see how they compared.

* Lamucal did better on the chords--to my ear they're not perfect [e.g. D vs. D minor], but guitar2tabs couldn't do chords at all.

* Guitar2tabs did pretty well at figuring out the melody; it couldn't grok the rhythm, but got a pretty decent sequence of notes. Lamucal didn't even try; the "tab" is just a list of chords.

One very nice feature of guitar2tabs is that you can play either the original audio or the extracted music, so you can hear how close it is. I'd recommend adding that--just playing the original is still useful, to check how it syncs with the extracted music, but playing the extraction is more so.


So far this [0] is the ONLY "AI stems splitter" I've found that really works great. Doesn't provide lyrics or chords, but it correctly splits any track in 4 stems (music, voice, drums, fx).

[0] https://vocalremover.org/splitter-ai


Couldn’t do Master of Puppets
