What you're describing is more or less why noise suppression algorithms in general cannot really improve the intelligibility of speech. Unless they're given extra cues (like with a microphone array), there's nothing they can do in real time that will beat what the brain is capable of with "delayed decision" (sometimes you'll only understand a word 1-2 seconds after it's spoken). So the goal of noise suppression is really just making the speech less annoying when the SNR is high enough not to affect intelligibility.
That being said, I still have control over the tradeoffs the algorithm makes by changing the loss function, i.e. how different kinds of mistakes are penalized.
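To make that concrete, here's a toy sketch of the kind of knob I mean (illustrative only, not the actual RNNoise objective, and the weights are made up): a per-band loss that penalizes gains that cut into speech more heavily than gains that leave some noise behind.

    import numpy as np

    def weighted_gain_loss(predicted_gain, ideal_gain,
                           speech_penalty=4.0, noise_penalty=1.0):
        """Toy per-band loss, for illustration only.
        Gains below the ideal value attenuate speech (penalized harder);
        gains above it leave residual noise (penalized less)."""
        err = predicted_gain - ideal_gain
        weights = np.where(err < 0, speech_penalty, noise_penalty)
        return np.mean(weights * err ** 2)

    ideal = np.array([0.9, 0.8, 0.1])
    print(weighted_gain_loss(np.array([0.6, 0.8, 0.1]), ideal))  # over-suppression: costly
    print(weighted_gain_loss(np.array([0.9, 0.8, 0.4]), ideal))  # residual noise: cheaper

Change the two penalty weights and you get a different tradeoff between muffled speech and leftover noise.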
The RNNoise suppression is less appealing to my ear than the Speex suppression... But:
- the approach is pretty cool!
- as mentioned in the article, it might be very useful when applied to multiple speakers (conferencing)
- it might be very interesting for speech recognition software
Also, as a sound guy, when I have a noisy signal I sometimes suppress the noise a bit too heavily, and then mask the artifacts with some background music. I will definitely try that with the RNNoise suppression!
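For the curious, the masking trick is just a quiet mix under the denoised track; a rough sketch (the -18 dB level is arbitrary, adjust by ear):

    import numpy as np

    def mask_with_music(denoised, music, music_level_db=-18.0):
        """Mix low-level background music under a denoised track to hide
        suppression artifacts. The level here is arbitrary."""
        n = min(len(denoised), len(music))
        gain = 10.0 ** (music_level_db / 20.0)
        return denoised[:n] + gain * music[:n]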
I will say, using Krisp, it has the same problem that basically all of these 'AI'-based noise cancellers seem to exhibit: sound quality deteriorates when outside noise is suppressed, and people sometimes fall below the detection threshold and get completely cut out while talking in some scenarios.
It's still better than food noises, but I have noticed that as a disadvantage.
Personally, I can't filter out background noise properly.
This means I can understand a conversation _much_ more clearly if I'm wearing active noise cancelling headphones. Yes, it makes _you_ quieter, but it also means I'm not trying to pick out your speech from complicated background noises.
Noise cancelling doesn't work with speech that well. It's way better for steady background noise such as on a small plane if you're the pilot or on a large plane if you're a passenger.
NVIDIA has ML-based noise suppression functionality in the form of RTX Voice.
There is also Krisp.ai, a similar product for noise canceling; they have written up an overview of the difficulties involved on the NVIDIA Developer Blog, interestingly enough (it seems they were called 2hz.ai back then):
The point is that the noise the cancellation adds plus the noise it fails to remove ends up smaller than the noise you'd hear without cancellation, unless you are in an already-optimized environment.
I wonder whether there will be progress in higher-frequency cancelling, given that there's an engineering reason for limiting cancellation to lower frequencies. Current ANC makes outside speech quieter yet more intelligible, and that very much increases distraction for me.
Noise cancellation is designed to cut out the ambient noise - which is mostly white - of which there is a lot in a big city. It does this with phase cancellation of what is probably a fairly predictable waveform. Human voices, I imagine, are not that predictable. Maybe advances in machine learning and processing will be able to provide better cancellation techniques.
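To put rough numbers on why the predictable, low-frequency part is the easy part (a toy sketch, not a model of any real headset, and the 50 µs timing error is an assumption): even a small error in the anti-phase signal wrecks cancellation as the frequency goes up.

    import numpy as np

    fs = 48_000
    t = np.arange(fs) / fs          # one second of signal
    delay_s = 50e-6                 # assumed timing error between noise and anti-noise

    for f in (100, 300, 1000, 3000):
        noise = np.sin(2 * np.pi * f * t)
        anti = -np.sin(2 * np.pi * f * (t - delay_s))   # inverted, slightly late
        residual = noise + anti
        reduction_db = 20 * np.log10(np.std(residual) / np.std(noise))
        print(f"{f:>5} Hz: {reduction_db:6.1f} dB")

With that fixed timing error you get roughly -30 dB of residual at 100 Hz but barely -1 dB at 3 kHz, which is about where conversational speech lives.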
I'll bet most earbuds with active noise reduction and equalization could be tweaked to provide augmentation similar to a hearing aid. The current algorithms for active noise reduction are designed to suppress external sounds -- all they have to do is invert the logic and amplify external sounds. Just add EQ to make my wife's voice come in better.
a) Noise reduction usually requires a powered component, or some kind of neck brace to contain the battery required for active noise cancelling. This adds expense and weight.
b) Sometimes it's good to be aware of your surroundings! When I'm listening to music, noise is usually sufficiently blocked by the sheer volume of the track, and during podcasts and audiobooks I'm also not bothered by the sounds of the streets.
If anything, it's good to hear a siren, or the ramblings of a nearby crazy person in order to avoid them.
The only time I find noise cancellation useful is on airplanes.
If active noise canceling seems to lower conversation volume, it is because you've convinced yourself it should. No DSP located on your ear can analyze and cancel an unpredictable signal like conversation before it reaches your ear. It can be effective against drone sounds like motors, rushing air, etc., because the same cancellation signal works now as it did 100 ms ago. This is not true of human speech.
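A crude way to see the "100 ms ago" point (toy synthetic signals, not measurements): a steady drone is still almost perfectly correlated with itself 100 ms later, while a speech-like non-stationary signal is not.

    import numpy as np

    rng = np.random.default_rng(0)
    fs = 16_000
    t = np.arange(2 * fs) / fs
    lag = int(0.100 * fs)                      # 100 ms

    drone = np.sin(2 * np.pi * 120 * t)        # steady motor-like hum
    # Very rough stand-in for speech: noise whose envelope changes every ~50 ms
    envelope = np.repeat(rng.random(len(t) // 800 + 1), 800)[:len(t)]
    speechy = envelope * rng.standard_normal(len(t))

    def corr_at_lag(x, lag):
        return np.corrcoef(x[lag:], x[:-lag])[0, 1]

    print("drone:  ", round(corr_at_lag(drone, lag), 3))    # ~1: predictable
    print("speechy:", round(corr_at_lag(speechy, lag), 3))  # ~0: not predictable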
Opt for a pair of well-fitting in-ears. If you have the money, see an audiologist for a custom fit. With high-quality earbuds and a good seal, you can play music at a very low level and still 1) hear all its detail, and 2) not perceive outside sounds.
That's true, of course. But it's much harder to actively cancel higher frequencies. This is why noise cancelling works brilliantly on an airplane (relatively low frequency background noise) but it does almost nothing to filter out the sounds of conversations around you.
Sorry. I should have specified that I was talking specifically about background noise cancellation for voice input. I haven't seen any other form of this tech in action, so I can't comment on how good it is.
While the noise cancellation is active it will attempt to neutralize (destructively interfere with) sounds from the outside, including those generated by your speaker. You could indeed engage it adversarially with something like a spontaneous phase shift (so the interference becomes constructive, making the resulting signal louder) or by generating a frequency the ANC can't compensate for.
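In an idealized toy model (a sketch that assumes the canceller keeps emitting the now-stale anti-phase signal, not how any real headset behaves), the phase flip roughly doubles the level at the ear:

    import numpy as np

    fs = 48_000
    t = np.arange(fs // 10) / fs                # 100 ms
    f = 440.0

    outside = np.sin(2 * np.pi * f * t)          # what the external speaker plays
    anti = -outside                              # idealized ANC output, locked to it

    # Normal operation: destructive interference, near silence at the ear.
    print("cancelled RMS:", np.std(outside + anti).round(3))

    # Adversarial move: the speaker suddenly flips phase; the stale anti-noise
    # now adds constructively, ~2x louder than the original.
    flipped = -outside
    print("flipped   RMS:", np.std(flipped + anti).round(3))
    print("original  RMS:", np.std(outside).round(3))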