Rust Audio

Splitting an audio stream based on volume/silence

I’ve got an arbitrary audio stream (it might come from the microphone jack, an audio file, etc.) which contains short periods of speech followed by silence/static. What would be the best way to split the incoming stream up so I can save each burst of speech to disk? Are there any existing frameworks or libraries that would let me do this sort of audio processing?

For context, I want to build a tool that records snippets of radio chatter, to make it easier to track what was said when.

Hi, welcome to this forum!

At this moment, there isn’t a go-to default framework in Rust. The only “framework” that I know of is rsynth, which I maintain, but it’s in an early stage of development and it mainly focuses on creating plugins and applications for music production, so I wouldn’t recommend it for your application.

For your application, it’s important to know that writing audio to disk cannot be done in the real-time thread, because it may cause the application to stutter. So the strategy is to have two threads: one that does the real-time capturing of the audio and another thread which saves the audio to disk. This is not as straightforward as it sounds, because in the real-time audio-thread you are not allowed to lock a mutex or even to allocate memory if you want to avoid stuttering.

With this background out of the way, let’s have a look at what you need. For your application, you need three components:

  1. A crate to read the audio in real-time. I must admit that I’m no expert in this field. I usually use the jack crate, but while it works on other operating systems as well, it’s probably not a good match if you don’t use Linux. You can also have a look at this discussion about CPAL vs portaudio.
  2. A crate to send the audio from the real-time thread to the “disk thread”. Unfortunately, I don’t know of anything you can use for this that has already been stabilized. The problem here is that you need a synchronization mechanism that is wait-free (does not block and does not allocate memory). This PR is promising. This can be used in combination with “buffer juggling” (sending buffers back and forth between the threads) to send the audio to the “disk thread”. Let me know if you want me to explain “buffer juggling” in more detail.
  3. A way to save audio to disk. I use the hound crate for saving .wav files, but there are others.
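
For the third component, hound is the sensible choice. Purely to show what ends up on disk, here is a minimal std-only sketch that writes mono 16-bit PCM as a WAV stream. The function name `write_wav_i16` and the mono/16-bit layout are assumptions made for illustration and are not hound's API; the sketch only emits the canonical 44-byte header followed by the samples.

```rust
use std::io::{self, Write};

/// Writes mono 16-bit PCM samples as a minimal WAV (RIFF) stream.
/// Hypothetical helper for illustration; hound covers far more cases.
fn write_wav_i16<W: Write>(mut out: W, sample_rate: u32, samples: &[i16]) -> io::Result<()> {
    let data_len = (samples.len() * 2) as u32; // bytes of sample data
    let byte_rate = sample_rate * 2; // mono, 2 bytes per sample

    out.write_all(b"RIFF")?;
    out.write_all(&(36 + data_len).to_le_bytes())?; // size of everything after this field
    out.write_all(b"WAVE")?;
    out.write_all(b"fmt ")?;
    out.write_all(&16u32.to_le_bytes())?; // size of the fmt chunk
    out.write_all(&1u16.to_le_bytes())?; // format tag 1 = PCM
    out.write_all(&1u16.to_le_bytes())?; // channel count
    out.write_all(&sample_rate.to_le_bytes())?;
    out.write_all(&byte_rate.to_le_bytes())?;
    out.write_all(&2u16.to_le_bytes())?; // block align (bytes per frame)
    out.write_all(&16u16.to_le_bytes())?; // bits per sample
    out.write_all(b"data")?;
    out.write_all(&data_len.to_le_bytes())?;
    for s in samples {
        out.write_all(&s.to_le_bytes())?;
    }
    Ok(())
}
```

Passing a `std::io::BufWriter` wrapped around a `std::fs::File` as `out` would write the clip to disk; that call belongs in the “disk thread”, never in the real-time one.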

I hope this helps!

One more thing: if I were you, I would sidestep the whole real-time audio thing, at least for a while, and test the assumption that the audio can easily be split up into parts in a meaningful way by looking at silence. I would first start writing a batch application that does that and see if it works. Maybe such a batch application can already help you forward!
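
A batch version of that experiment could look roughly like the sketch below. It assumes the whole recording is already in memory as 16-bit samples; the name `split_on_silence` and the parameters `threshold` and `min_silence` are made up for illustration. It returns the index ranges that contain sound, treating any sufficiently long quiet run as a separator.

```rust
use std::ops::Range;

/// Returns the index ranges of `samples` that contain sound, treating any run
/// of at least `min_silence` consecutive quiet samples as a separator.
fn split_on_silence(samples: &[i16], threshold: i16, min_silence: usize) -> Vec<Range<usize>> {
    let limit = threshold.unsigned_abs();
    let mut segments = Vec::new();
    let mut start: Option<usize> = None; // start of the segment being built
    let mut quiet_run = 0usize; // consecutive quiet samples inside that segment

    for (i, &s) in samples.iter().enumerate() {
        if s.unsigned_abs() > limit {
            if start.is_none() {
                start = Some(i);
            }
            quiet_run = 0;
        } else if let Some(begin) = start {
            quiet_run += 1;
            if quiet_run >= min_silence {
                // The segment ends just after the last loud sample.
                segments.push(begin..i + 1 - quiet_run);
                start = None;
                quiet_run = 0;
            }
        }
    }
    // Close a segment that is still open when the input ends.
    if let Some(begin) = start {
        segments.push(begin..samples.len() - quiet_run);
    }
    segments
}
```

Running this over a few real recordings with different `threshold` and `min_silence` values is a cheap way to find out whether silence alone separates the transmissions well enough.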

The sound mechanism you describe is basically a noise gate. It’s what radio stations use to lower the music to background volume when someone talks. Thus if you only record when the gate is open, it skips the silent parts. Don’t know of any Rust implementations yet though.

Thank you for such a detailed response @PieterPenninckx! I was hoping to implement things so my noise gating algorithm (cheers for the terminology @scalarwaves, that’ll make searching easier) wouldn’t care whether batches of samples come from a WAV parser or the audio thread via some channel mechanism.

I’m guessing I’d implement the “noise gate” by buffering the last X samples and checking whether they are all within a certain noise threshold. I had something like this in mind, although at the moment I don’t have a nice set of test recordings to throw at it…

fn main() {
  let rx = spawn_audio_thread_and_get_fancy_spsc_channel();
  let mut noise_gate = NoiseGate::new(...);

  // Keep going until the audio thread hangs up its end of the channel.
  while let Ok(samples) = rx.recv() {
    noise_gate.on_audio_received(&samples);
    // "samples" gets dropped here and the buffer is returned to our
    // buffer pool
  }
}

struct NoiseGate<R> {
  noise_threshold: Sample,
  /// The number of samples below `noise_threshold` before we declare 
  /// the audio to be "silent".
  silence_length: usize,
  recorder: R,
  buffer: Vec<Sample>,
}

impl<R: Recorder> NoiseGate<R> {
  fn on_audio_received(&mut self, samples: &[Sample]) {
    self.buffer.extend(samples);

    let len = self.buffer.len();
    // Index where the trailing window of (at most) `silence_length`
    // samples starts; 0 while the buffer is still shorter than that.
    let sample_bound = len.saturating_sub(self.silence_length);

    if audio_below_threshold(&self.buffer[sample_bound..], self.noise_threshold) {
      // we've seen enough quiet samples, tell the recorder we've reached 
      // the end of a transmission
      self.recorder.end_of_transmission();
    } else {
      // tell the recorder to append more samples to the current recording
      // (starting a new recording if necessary).
      self.recorder.record(samples);
    }

    // make sure buffer only contains last `self.silence_length` samples
    self.buffer.drain(..sample_bound);
  }
}

trait Recorder {
  /// Append more samples to the current recording.
  fn record(&mut self, samples: &[Sample]);
  /// Reached the end of the samples, do necessary cleanup (e.g. flush to disk).
  fn end_of_transmission(&mut self);
}
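
The snippet above leaves `Sample` and `audio_below_threshold` undefined. One possible way to fill them in is sketched below, assuming signed 16-bit samples (the post never pins the type down). `MemoryRecorder` is a hypothetical in-memory `Recorder` that is handy for tests; the trait is repeated only so the block compiles on its own.

```rust
// Assumption: samples are signed 16-bit PCM.
type Sample = i16;

/// True when every sample's magnitude is at or below `threshold`.
/// Note that an empty slice counts as "quiet".
fn audio_below_threshold(samples: &[Sample], threshold: Sample) -> bool {
    let limit = threshold.unsigned_abs();
    samples.iter().all(|s| s.unsigned_abs() <= limit)
}

/// Same shape as the `Recorder` trait in the post above.
trait Recorder {
    fn record(&mut self, samples: &[Sample]);
    fn end_of_transmission(&mut self);
}

/// Hypothetical test double: collects each transmission in memory.
#[derive(Default)]
struct MemoryRecorder {
    current: Vec<Sample>,
    finished: Vec<Vec<Sample>>,
}

impl Recorder for MemoryRecorder {
    fn record(&mut self, samples: &[Sample]) {
        self.current.extend_from_slice(samples);
    }

    fn end_of_transmission(&mut self) {
        // The gate reports "end" on every quiet chunk, so skip empty clips.
        if !self.current.is_empty() {
            self.finished.push(std::mem::take(&mut self.current));
        }
    }
}
```

Because the gate calls `end_of_transmission` for every quiet chunk during a long silence, any `Recorder` has to tolerate repeated calls, which is why the empty check is there.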

I’m guessing “buffer juggling” is similar to how you’d do double/triple buffering in graphics? So you’d pre-allocate a handful of buffers (e.g. VecDeque<Vec<Sample>>), the producer (audio thread) grabs the next unused buffer and fills it, then passes it off to the consumer, which processes the data before moving the used buffer back to the pool.
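
To make the shape of that concrete, here is a rough sketch using two `std::sync::mpsc` channels: one carries filled buffers to the consumer, the other returns empty ones to the producer. Big caveat: the standard library channels are not guaranteed to be wait-free or allocation-free, so this only illustrates the data flow and is not something to run in a real audio callback. The function `juggle_buffers` and its parameters are invented for the example.

```rust
use std::sync::mpsc;
use std::thread;

/// Producer fills pre-allocated buffers; consumer drains them and hands them back.
/// Returns the sum of all samples the consumer saw, as a stand-in for "writing to disk".
fn juggle_buffers(pool_size: usize, buffer_len: usize, rounds: usize) -> i64 {
    assert!(pool_size > 0, "the producer needs at least one buffer");
    // `filled` carries full buffers to the consumer, `empty` returns them to the producer.
    let (filled_tx, filled_rx) = mpsc::channel::<Vec<i16>>();
    let (empty_tx, empty_rx) = mpsc::channel::<Vec<i16>>();

    // Allocate every buffer up front so the producer never has to grow one.
    for _ in 0..pool_size {
        empty_tx.send(Vec::with_capacity(buffer_len)).unwrap();
    }

    let producer = thread::spawn(move || {
        for round in 0..rounds {
            // A real audio callback would use a non-blocking receive and drop
            // audio when the pool is empty; blocking keeps this sketch simple.
            let mut buf = empty_rx.recv().unwrap();
            buf.clear();
            buf.extend(std::iter::repeat(round as i16).take(buffer_len));
            filled_tx.send(buf).unwrap();
        }
        // `filled_tx` is dropped here, which ends the consumer loop below.
    });

    let mut total = 0i64;
    for buf in filled_rx {
        total += buf.iter().map(|&s| i64::from(s)).sum::<i64>();
        // Return the buffer to the pool; ignore the error once the producer is gone.
        let _ = empty_tx.send(buf);
    }
    producer.join().unwrap();
    total
}
```

With a pool of two buffers the producer can fill one while the consumer is still busy with the other, which is the same idea as double buffering.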

Archive.org, specifically the Prelinger Archives, is always a good source of audio recordings. The pseudocode here looks just fine, though having some way to tweak the threshold setting in realtime will help.
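
One way to make the threshold adjustable while audio is flowing is to share it through an atomic, so the audio side reads it without taking a lock. This is only a sketch: `SharedThreshold` is a hypothetical type, and it assumes 16-bit samples and a target where 16-bit atomics are available.

```rust
use std::sync::atomic::{AtomicU16, Ordering};
use std::sync::Arc;

/// Threshold handle that a UI thread can update while the audio thread reads it.
#[derive(Clone)]
struct SharedThreshold(Arc<AtomicU16>);

impl SharedThreshold {
    fn new(initial: u16) -> Self {
        SharedThreshold(Arc::new(AtomicU16::new(initial)))
    }

    /// Called from the UI or control thread.
    fn set(&self, value: u16) {
        self.0.store(value, Ordering::Relaxed);
    }

    /// Called from the audio side: true if every sample is at or below the
    /// current threshold. The threshold is read once per chunk.
    fn is_quiet(&self, samples: &[i16]) -> bool {
        let limit = self.0.load(Ordering::Relaxed);
        samples.iter().all(|s| s.unsigned_abs() <= limit)
    }
}
```

Cloning the handle gives the UI its own reference to the same value, so a slider can call `set` while the gate keeps calling `is_quiet`.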

Yup. You’ve got it. I like the analogy you give here; I hadn’t thought of it that way before.

Thanks for all the help everyone! I actually did a write-up for implementing this Noise Gate and published the resulting crate to crates.io.

If you’ve got a couple of spare minutes, would you be able to skim through it and point out any mistakes or oversights I’ve made?