Rust Audio

A Minimal Hardware Abstraction Layer

Currently, the Rust ecosystem has a handful of audio driver wrappers. (I’m not delving too deeply into each crate; if you want to add details, leave a comment.)

(forgive the absence of links, discourse won’t let me post them)

  1. cpal
    • MacOS, Linux, Windows (and wasm?)
    • uses an “EventLoop” abstraction to handle multiple streams to a single device
    • Unclear on lock-free behavior in the audio thread
    • Some ideas about a “Host” API for multiple backends, unclear on its status
  2. alsa
    • Linux only
  3. jack
    • unclear if supported on MacOS
  4. coreaudio-rs
    • MacOS only
    • Wraps the AudioInput/Output audio units, not a direct wrapper around the C API (for the moment)
  5. portaudio
    • bindings to the PortAudio library
    • probably the most battle-tested audio library out there

I don’t think any of them are perfect. Ideally there should be something along these lines:

  • Support for MacOS, Windows, Linux, iOS, Android, Web
  • Absolute thread safety.
  • Support exclusive mode where available
  • Completely lock- and wait-free in any audio-thread code not supplied by the user (see the sketch after this list)
  • Enumerate available devices and initiate a stream on a device
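
To make the lock-free requirement concrete, here’s a minimal std-only sketch (names are mine, not proposed API) of a parameter crossing from a UI thread to the audio callback without locks:

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;

fn main() {
    // Shared gain parameter: std has no AtomicF32, so the f32 bit
    // pattern is stored in an AtomicU32.
    let gain = Arc::new(AtomicU32::new(1.0f32.to_bits()));

    // UI-thread side: changing the parameter is one atomic store, no lock.
    let ui_gain = Arc::clone(&gain);
    ui_gain.store(0.5f32.to_bits(), Ordering::Relaxed);

    // Audio-thread side: the callback only does an atomic load, so it
    // can never block waiting on the UI thread.
    let mut buffer = [0.25f32; 8];
    let g = f32::from_bits(gain.load(Ordering::Relaxed));
    for sample in buffer.iter_mut() {
        *sample *= g;
    }
    assert_eq!(buffer[0], 0.125);
}
```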

More debatable features:

  • Support for multiple driver backends in the same target
  • Multiple streams per device (personally, I’m against this)
  • Polling and callback API
  • Audio Buffer abstraction, with interleaving/deinterleaving methods
  • MIDI
  • Syncing input/output devices in a single callback
  • Sensible defaults, optional configuration for users. Easy things are easy, complex things are possible.

I’d like to advocate for an API similar to Timur Doumler’s work on libstdaudio: something a lot simpler than JUCE, cpal, or PortAudio.
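
As a taste of what “a lot simpler” could look like from the user’s side, here’s a rough sketch loosely modelled on libstdaudio’s examples; every identifier here is hypothetical:

```rust
// Every name below is hypothetical; this only sketches the intended shape.
#[derive(Debug)]
pub enum StreamError {
    NoDevice,
}

pub trait Stream {
    fn start(self) -> Result<(), StreamError>;
}

pub trait OutputDevice {
    type Stream: Stream;
    fn open_output_stream(
        &self,
        callback: impl FnMut(&mut [f32]) + Send + 'static,
    ) -> Result<Self::Stream, StreamError>;
}

pub trait AudioHal {
    type Device: OutputDevice;
    fn default_output_device(&self) -> Option<Self::Device>;
}

// "Easy things are easy": opening the default output and playing
// silence is a handful of lines.
pub fn play_silence<H: AudioHal>(hal: &H) -> Result<(), StreamError> {
    let device = hal.default_output_device().ok_or(StreamError::NoDevice)?;
    let stream = device.open_output_stream(|buffer: &mut [f32]| {
        for sample in buffer.iter_mut() {
            *sample = 0.0;
        }
    })?;
    stream.start()
}
```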

So, thoughts and opinions? Do we need a different HAL, or is cpal good enough for everyone?

No, cpal isn’t good enough for everyone; there clearly needs to be a better HAL. I agree with your list of what’s required. As for the debated features:

> Support for multiple driver backends in the same target

I can see an argument for this on Linux, maybe with JACK and ALSA, though many audio applications only support JACK in practice.

> Multiple streams per device (personally, I’m against this)

Sorry, could you elaborate on this? I’m not quite sure what you’re getting at. I can guess, but my guess may be wrong.

> Polling and callback API

Which would you prefer? I’m assuming polling, since a callback API could be built on top of it in a separate crate?

> Audio Buffer abstraction, with interleaving/deinterleaving methods

Separate crate. This library should probably offer it as an option while allowing people to substitute their own. My only question is: what do you get by default from CoreAudio/JACK/etc.?
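
For reference, the kind of helper being discussed is something like this sketch (not a proposed API; a real-time version would write into preallocated buffers rather than pushing into Vecs):

```rust
/// Copy interleaved frames [L0, R0, L1, R1, ...] into separate
/// per-channel buffers.
fn deinterleave(interleaved: &[f32], channels: usize, out: &mut [Vec<f32>]) {
    assert_eq!(out.len(), channels);
    assert_eq!(interleaved.len() % channels, 0);
    for (i, sample) in interleaved.iter().enumerate() {
        out[i % channels].push(*sample);
    }
}

fn main() {
    let interleaved = [0.0, 1.0, 0.1, 1.1]; // two stereo frames
    let mut out = vec![Vec::new(), Vec::new()];
    deinterleave(&interleaved, 2, &mut out);
    assert_eq!(out[0], [0.0, 0.1]); // left channel
    assert_eq!(out[1], [1.0, 1.1]); // right channel
}
```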

> MIDI

I feel very strongly that this should be in a separate crate. There is a cross-platform library (midir) that does this, though I don’t love the API.

> Syncing input/output devices in a single callback

No, that seems unnecessary.

> Sensible defaults, optional configuration for users. Easy things are easy, complex things are possible.

My problem with this is that there’s such a huge range of things people might want to do that I’m not sure how feasible it is. But maybe I’m being pessimistic. Either way, this seems like something to put off until late in the project.

> I can see an argument for this on Linux, maybe with JACK and ALSA

Windows is also a bit of a minefield (WASAPI, DirectSound, ASIO).

In terms of multiple streams, the EventLoop abstraction in cpal is (afaict) designed to allow multiple streams/callbacks on the same device. I think most drivers handle this internally, but it should probably be investigated in case there are hiccups with multiple streams from the same process.

> Which would you prefer? I’m assuming polling, since a callback API could be built on top of it in a separate crate?

In the C++ code I linked, they look at whether each driver API is polling or interrupt-driven (callback-based). Notably, WASAPI supports a polling API. From the cpal issues, I think this is relevant to the wasm targets as well. I’d prefer callbacks, since they’re more ubiquitous and usually lower latency, but polling may be necessary for total coverage. That said, it wouldn’t be horribly difficult to put a callback API on top of a polling backend, as sketched below.
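
As a sketch (the PollingBackend trait is hypothetical; a real adapter would also handle shutdown and thread priority):

```rust
use std::thread;

// Hypothetical polling backend: wait_for_buffer blocks until the driver
// wants more audio, then hands out the buffer to fill.
pub trait PollingBackend: Send + 'static {
    fn wait_for_buffer(&mut self) -> &mut [f32];
    fn submit(&mut self);
}

// Callback adapter: a dedicated thread polls in a loop and invokes the
// user callback for every buffer the backend produces.
pub fn run_with_callback<B, F>(mut backend: B, mut callback: F) -> thread::JoinHandle<()>
where
    B: PollingBackend,
    F: FnMut(&mut [f32]) + Send + 'static,
{
    thread::spawn(move || loop {
        let buffer = backend.wait_for_buffer();
        callback(buffer);
        backend.submit();
    })
}
```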

> This library should probably offer it as an option while allowing people to substitute their own. My only question is: what do you get by default from CoreAudio/JACK/etc.?

Well, the issue to me is that if users provide a callback to the drivers, the audio buffer needs to be an argument to that callback. If we want to keep things memory- and thread-safe, we need some kind of abstraction, since the drivers pass back a void* that you have to cast. CoreAudio is also a little different in that it doesn’t pass back a single buffer but an AudioBufferList*, which can contain more than one buffer. I’m not sure of the conditions under which that happens; RtAudio seems to assume there is one buffer, but there’s some confusing logic in its source code.
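
The usual way to contain that cast is a single trampoline; here’s a sketch against a made-up raw callback signature (real backends differ, and CoreAudio’s AudioBufferList needs more care than a flat pointer):

```rust
use std::os::raw::c_void;

// Made-up raw driver callback shape: a user-data pointer plus an
// interleaved buffer and its length in samples.
#[allow(dead_code)]
pub type RawCallback = extern "C" fn(user_data: *mut c_void, buffer: *mut f32, samples: usize);

// The typed callback users actually write.
pub type UserCallback = Box<dyn FnMut(&mut [f32]) + Send>;

// The single place where the void* cast and the raw-slice construction
// happen; everything beyond this point is safe, typed Rust.
pub extern "C" fn trampoline(user_data: *mut c_void, buffer: *mut f32, samples: usize) {
    let callback = unsafe { &mut *(user_data as *mut UserCallback) };
    let slice = unsafe { std::slice::from_raw_parts_mut(buffer, samples) };
    callback(slice);
}

// Registration would look roughly like:
//   let cb: UserCallback = Box::new(|buf| buf.fill(0.0));
//   let user_data = Box::into_raw(Box::new(cb)) as *mut c_void;
//   driver_set_callback(trampoline, user_data);
// with the HAL retaining ownership so the box is freed on stream teardown.
```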

Alright, here’s a draft.
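
In outline (a minimal sketch with placeholder names, not the full draft):

```rust
// Fixed-size device names and iterator-based enumeration, so the HAL
// layer itself never allocates.
pub struct DeviceName {
    bytes: [u8; 64],
    len: usize,
}

impl DeviceName {
    /// Borrow the name as &str; assumes the backend wrote valid UTF-8.
    pub fn as_str(&self) -> &str {
        core::str::from_utf8(&self.bytes[..self.len]).unwrap_or("<invalid utf-8>")
    }
}

pub trait Hal {
    type Device;
    type Devices: Iterator<Item = Self::Device>;

    /// Enumerate devices lazily: no Vec (and no allocation) inside the HAL.
    fn devices(&self) -> Self::Devices;

    /// Names are fixed-size byte arrays rather than String, again to keep
    /// the HAL layer allocation-free.
    fn device_name(&self, device: &Self::Device) -> DeviceName;
}
```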

The motivation for using [u8; 64] for the device name and iterators for enumerating things is to avoid allocations in the HAL layer. Thoughts?