
Neural Network/deep learning for crowd noise removal


Recommended Posts

This just occurred to me as a thing that could be done, so I thought I'd post on here to see if anyone knows anything about this subject, as basically all I know is what I've read in Wired articles. So possibly even less than nothing.

 

So any ideas? It would be really nice to clean all chatter etc off of bootlegs...autechre 2008, 2010, 2016 I am looking at you.

I have worked with neural networks and machine learning stuff a bit, although I feel I probably know fuck all about deep learning (convolutional neural networks). I think I can grasp the general concept though. 

 

Generally this stuff is done by training the neural network with samples of crowd noise and samples of music, so the thing would eventually be able to tell the difference between the two. However it seems really difficult to get the data for training, because you probably don't have a lot of noise recorded from the same event that you are trying to clean up and using sound from a different event may throw off the neural network (different crowd type, room acoustics, sound system etc.). Similarly you would ideally need "pure" samples of the same audio that was played at the event (although maybe album versions would do in a pinch if the neural network is good enough).
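One way around the training-data problem is to synthesize pairs yourself: mix known-clean audio with separately recorded crowd noise at a controlled signal-to-noise ratio, then train on (mixture, clean) pairs. A minimal numpy sketch of that mixing step, with a sine wave and white noise standing in for music and crowd (both stand-ins are illustrative assumptions, not anything from a real dataset):

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Mix clean audio with noise at a target signal-to-noise ratio (dB),
    returning the mixture and the scaled noise. (mixture, clean) pairs can
    then serve as supervised training data."""
    # Tile/truncate the noise to match the clean signal's length.
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[:len(clean)]
    # Scale the noise so that 10*log10(P_clean / P_noise) equals snr_db.
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    scaled = noise * np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + scaled, scaled

# Stand-ins: a 440 Hz sine for "music", white noise for "crowd".
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
crowd = rng.normal(size=8000)
mixture, scaled = mix_at_snr(clean, crowd, snr_db=10.0)
snr = 10 * np.log10(np.mean(clean ** 2) / np.mean(scaled ** 2))
print(round(snr, 3))  # → 10.0
```

Convolving the clean side with a room impulse response before mixing would be one crude way to approximate the venue sound mentioned below.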

 

Assuming you get all the data you need, what you basically want is to feed in the training data - samples of crowd noise and "pure" audio (maybe fed through some reverb to approximate the venue sound) - and build the model, then feed in the noisy audio. I think the model would then basically go through every sample (realistically more like 1-2 seconds of audio at a time) and add/subtract what it thinks is the "noise" part of the signal. On paper this looks simple enough, but I think there could be some audible artifacts: for instance, when there's a quieter part and some asshole hollers over it in the recording, the neural network is going to have a hell of a time recreating the audio that should actually be there. Basically, what I want to say is that the model may well add unpredictable audio artifacts to the result no matter what, so even if you successfully remove the noise, there will still be some extra sound that wasn't actually played by any artist.
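The "subtract what it thinks is the noise part" idea actually has a classical, non-neural baseline: spectral subtraction, which estimates the average noise magnitude per frequency bin and subtracts it from every frame. A toy numpy sketch (the STFT sizes, the 5% spectral floor, and the sine-plus-white-noise demo signals are all my own illustrative choices):

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win for i in range(0, len(x) - n_fft, hop)]
    return np.fft.rfft(np.array(frames), axis=1)

def istft(spec, n_fft=512, hop=128):
    win = np.hanning(n_fft)
    out = np.zeros((len(spec) - 1) * hop + n_fft)
    for i, frame in enumerate(np.fft.irfft(spec, n=n_fft, axis=1)):
        out[i * hop:i * hop + n_fft] += frame * win
    return out / 1.5  # Hann at 75% overlap: squared windows sum to ~1.5

def spectral_subtract(noisy, noise_profile, floor=0.05):
    """Subtract the average noise magnitude per frequency bin, keeping a
    small spectral floor to limit 'musical noise' artifacts."""
    S = stft(noisy)
    noise_mag = np.abs(stft(noise_profile)).mean(axis=0)
    mag = np.maximum(np.abs(S) - noise_mag, floor * np.abs(S))
    return istft(mag * np.exp(1j * np.angle(S)))

# Toy demo: a sine "concert" buried in white-noise "crowd".
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000
clean = np.sin(2 * np.pi * 440 * t)
noisy = clean + 0.3 * rng.normal(size=len(t))
denoised = spectral_subtract(noisy, 0.3 * rng.normal(size=4000))
```

This is also where the artifacts described above come from: whatever survives the subtraction at quiet moments is left as isolated spectral peaks, the watery "musical noise" you hear from crude denoisers.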

 

However, all this is back-of-the-envelope theory; it would be nice to have the opportunity to actually try it out in real life. There's some amazing shit possible with deep learning neural networks, so I would not be surprised if someone puts together a thing for noise removal. I haven't really checked the literature, but I think there is definitely something being researched.

I'm guessing a recurrent neural network would be the way to go, though I have no idea how much training data would be necessary, nor how one would go about picking hyperparameters.

A comathematician is a device for turning cotheorems into ffee.


 

My guess/intuition: don't bother.

 

Unlikely to outperform already existing methods. But who knows? Whatever technique you're going to use, it'll probably do two things:

1. Extract sounds from waveform

2. Interpolate the changed waveform to restore the assumed ideal waveform without the extracted sounds.
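The two steps above can be sketched in the time domain with the crudest possible "extraction": flag outlier samples (a loud holler or click) and linearly interpolate across the gap. Everything here (the RMS-based threshold, the synthetic 5-sample click) is a hypothetical stand-in:

```python
import numpy as np

def extract_and_restore(x, thresh=4.0):
    """Step 1: flag samples whose amplitude is an outlier relative to the
    overall RMS (the 'extracted' sound). Step 2: linearly interpolate
    across the gap to restore an assumed ideal waveform."""
    rms = np.sqrt(np.mean(x ** 2))
    bad = np.abs(x) > thresh * rms
    good = np.flatnonzero(~bad)
    restored = x.copy()
    restored[bad] = np.interp(np.flatnonzero(bad), good, x[good])
    return restored, bad

# Toy demo: a loud 5-sample "holler" on top of a sine wave.
t = np.arange(16000) / 16000
clean = np.sin(2 * np.pi * 440 * t)
x = clean.copy()
x[1000:1005] += 10.0
restored, bad = extract_and_restore(x)
```

Both steps bake in assumptions, exactly as argued: what counts as an outlier, and that the underlying waveform is smooth enough to interpolate.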

 

Both steps use implicit assumptions and estimations, regardless of the technique you're going to use - deep neural nets or something else (think of the techniques used in Photoshop to edit images, which is a similar problem to editing sounds). My guess is that a complex/expensive technique like deep neural nets will hardly, if at all, outperform a simple technique that's already available.

 

My reasoning is that it's unlikely you'll find a solution which can model "crowd noise" better than what's already out there. Deep neural nets tend to depend on lots of data and need problems with a static set of rules, like games or language. When it comes to crowd noise, and especially autechre gigs, I'm afraid you'll be limited by a lack of static rules defining crowd noise, or defining autechre to not be noise.

 

You could try though. But be prepared to spend a lot of time without a likely benefit, other than experience and lessons learned.

 

In the next tweet is a link to an interesting article on the possibilities/limitations of deep neural nets. It's non-technical/readable.

 

What you should take away from it, imo: you're not learning to label images or play a game. To an extent you're trying to label noise. But additionally, you're also predicting sounds, as you need to restore the sounds after extracting the noise. Or, put the other way around, you'll be predicting what the noise would sound like without the music. So you'll be predicting crowd noise, which might be more problematic, as it doesn't look like that will follow strict rules. Less so than music, arguably. But who knows, right?

Hmmmm. Yes, it's a rabbit hole.

 

I was more hoping that someone had already done the heavy lifting on this one.

 

I've got a mate who's a lecturer in music tech at a university, I might see if he can get his students on it. That's what they're for, right?

 

Then, the clean autechre recordings can be mine. ALL MINE. Mwhahahaha etc

Izotope's RX repair tools (particularly De-noise & Dialogue Isolate) are probably the closest we have at the moment:

 

I haven't eaten a Wagon Wheel since 07/11/07... ilovecubus.co.uk - 25ml of mp3 taken twice daily.

  On 1/4/2018 at 10:30 AM, mcbpete said:

Izotope's RX repair tools (particularly De-noise & Dialogue Isolate) are probably the closest we have at the moment:

 

 

 

Yeah, that's what I would recommend. I recently used it to (partially) remove crowd noise from an interview.

Doing a little reading around, it seems the majority of the research going on at the moment is geared around isolating the human voice from background noise (usually other human voices). Some really impressive stuff, including some that can run faster than realtime on a Raspberry Pi.

 

However, I couldn't find anything regarding the separation of musical information from background noise specifically. I would think it's theoretically possible (especially with music like autechre's that is more or less all synthetic), but not without great difficulty, and with a low likelihood of reaching the quality I'm hankering after - artifacts are likely to be almost as irritating as that guy chuntering away in the background. Also, I imagine preserving the sound of the space would be very difficult.

 

I guess in my simplistic pre-sleep mind I was hoping for a process like: input dataset A (autechre oeuvre) > input dataset B (crowd noise) > run iterative learning processes on the datasets > remove B from A. Alas, it turns out that cutting-edge tech is more complicated than that! Who would have known?
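Funnily enough, that A/B pipeline isn't far off a classical Wiener filter: estimate the average power spectrum of dataset A and of dataset B, form a per-frequency gain that keeps bins where A dominates, and apply it to the noisy recording. A toy numpy sketch, with a two-tone signal standing in for the autechre oeuvre and white noise for the crowd (both stand-ins assumed purely for illustration; real systems estimate the spectra over many short frames, not one long FFT):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 16384
t = np.arange(n) / 16000

# Dataset A: tonal stand-in "music"; dataset B: broadband stand-in "crowd".
music = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 660 * t)
crowd = 0.5 * rng.normal(size=n)

# "Learn" from the two datasets: average power spectra -> Wiener gain.
p_a = np.abs(np.fft.rfft(music)) ** 2
p_b = np.abs(np.fft.rfft(crowd)) ** 2
gain = p_a / (p_a + p_b)  # near 1 where A dominates, near 0 where B does

# "Remove B from A": apply the learned gain to a fresh noisy mixture.
noisy = music + 0.5 * rng.normal(size=n)
denoised = np.fft.irfft(np.fft.rfft(noisy) * gain, n=n)
```

The catch is the same one raised earlier in the thread: this only works to the extent that the "crowd" spectrum is stable and distinct from the music, which is exactly what's dubious at an autechre gig.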

Edited by EXTRASUPER81
  On 1/4/2018 at 12:07 PM, EXTRASUPER81 said:

including some that can run faster than realtime

it's really impressive that investigation of noise removal led to the discovery of time travel

  On 1/4/2018 at 12:24 PM, span said:

 

  On 1/4/2018 at 12:07 PM, EXTRASUPER81 said:

including some that can run faster than realtime

it's really impressive that investigation of noise removal led to the discovery of time travel
Boom

 

The latest Raspberry Pi is really powerful.

Edited by EXTRASUPER81

Pretty impossible but go for it, I think you're more likely to get a direct recording off an artist if they see you trying to train neural networks, right?

  On 1/5/2018 at 12:51 AM, fenton said:

Pretty impossible but go for it, I think you're more likely to get a direct recording off an artist if they see you trying to train neural networks, right?

 

I am thinking that if you train a neural network on some artist's material, you can then generate brand new stuff that sounds like that artist.

Also - I think this has been discussed here already - you could in theory take recordings of some famous pianist or guitarist, annotate them with note data, train your network and feed it your own piano roll to automagically have your solos played in their style. If it works for text, it might work for other audio too. Might be worth trying out some day.
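As a far simpler stand-in for that idea, "train on annotated notes, generate in that style" can be sketched with a first-order Markov chain over MIDI note numbers (the toy "solo" below is made up; a real attempt would use an RNN and far more data):

```python
import random
from collections import defaultdict

def train_markov(notes):
    """Build a first-order Markov model: note -> list of observed successors."""
    model = defaultdict(list)
    for a, b in zip(notes, notes[1:]):
        model[a].append(b)
    return model

def generate(model, start, length, seed=0):
    """Walk the model, sampling each next note from observed successors."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        successors = model.get(out[-1])
        if not successors:
            break  # dead end: note never had a successor in training data
        out.append(rng.choice(successors))
    return out

# Toy "annotated solo" as MIDI note numbers (made up for illustration).
solo = [60, 62, 64, 62, 60, 64, 65, 64, 62, 60]
model = train_markov(solo)
print(generate(model, 60, 8, seed=42))
```

Every generated transition was actually observed in the training sequence, which is the Markov equivalent of "playing in their style" - and also why it can't produce anything genuinely new.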

I think most artists' music is too complex to just ram the audio into an NN and expect it to shit out something recognizably theirs.

 

That said, I reckon it stands a chance against Nickelback.

  • 2 months later...

Been reading about neural networks and machine learning. So incredible.

Edited by Lane Visitor