MLow: Meta’s low bitrate audio codec

  • At Meta, we support real-time communication (RTC) for billions of people through our apps, including WhatsApp, Instagram, and Messenger.
  • We’re working to make RTC accessible by providing a high-quality experience for everyone, even those who might not have the fastest connections or the latest phones.
  • As more and more people have relied on our products to make calls over the years, we’ve been working on new ways to ensure all calls have solid audio quality.
  • We’ve built the Meta Low Bitrate (MLow) codec: a new tool that improves audio quality, especially for those on slow-speed connections.
Figure 1: Increasing complexity or bitrate usually improves quality, but good codecs achieve higher quality while balancing the other two.

RTC products use many building blocks to deliver the full experience, and one of the essential components is audio/video codecs. These codecs compress the captured audio/video data so it can be sent across the internet efficiently to the recipient, keeping the experience real time. For example, the size of raw audio captured for a typical call is 768 kbps (mono, sampling at 48kHz, bit depth 16), which modern codecs are able to compress down to 25-30 kbps. Often this compression comes at the cost of some quality (loss of information), but good codecs can strike a balance among the trio of quality, bitrate, and complexity by exploiting deep knowledge about the nature of the audio signal as well as by using psychoacoustics.
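The 768 kbps figure follows directly from the capture parameters. As a quick check, here is a minimal Python sketch of the arithmetic; the function name `raw_pcm_bitrate_kbps` is ours, chosen purely for illustration.

```python
def raw_pcm_bitrate_kbps(sample_rate_hz: int, bit_depth: int, channels: int = 1) -> float:
    """Bitrate of uncompressed PCM audio in kbps (1 kbps = 1000 bits per second)."""
    return sample_rate_hz * bit_depth * channels / 1000


# Capture parameters quoted in the text: mono, 48 kHz sampling, 16-bit samples.
raw_kbps = raw_pcm_bitrate_kbps(sample_rate_hz=48_000, bit_depth=16, channels=1)
print(f"raw PCM: {raw_kbps:.0f} kbps")

# Compression ratio needed to reach the codec bitrates mentioned in the text.
for codec_kbps in (30, 25):
    print(f"{codec_kbps} kbps -> about {raw_kbps / codec_kbps:.0f}x smaller than raw")
```

In other words, a codec running at 25-30 kbps shrinks the raw stream by roughly 26 to 31 times.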

Building a good codec is quite challenging, which is why we don’t see new codecs emerging very often. The last widely known, good open-source codec was Opus, released in 2012, which has become the codec of choice for a wide variety of applications on the internet. Meta has used Opus for all its RTC needs, and so far it has served us well, helping to deliver quality calls to billions of users across the globe.

Our motivation for building a new codec

Given the large scale of RTC usage in Meta products, we get to see how a codec performs in a range of network scenarios and how it affects the end user’s experience. In particular, we’ve observed that a significant chunk of calls have poor network connections throughout or for part of a call. Typically a bandwidth estimation (BWE) module detects the quality of the network, and as the network quality degrades, we need to lower the codec operating bitrate to avoid congesting the network and keep the audio flowing, which shifts the balance among quality, bitrate, and complexity described above. Complicating matters, conducting a video call despite poor network quality leaves little room for audio and pushes the audio bitrate further down. The lowest operating point for Opus is 6 kbps, at which it runs in NarrowBand mode (0 – 4kHz) and doesn’t adequately capture all the sound frequencies produced by human voices, and so doesn’t sound as clear or natural. Here is an example of how Opus sounds at 6 kbps and the corresponding reference file for comparison.

Raw reference signal:

Opus @ 6 kbps NarrowBand (NB): 

Over the last two years, we have seen the development of several new machine learning (ML)-based audio codecs that provide good quality audio at very low bitrates. In October 2022, Meta released Encodec, which achieves amazingly crisp audio quality at very low bitrates. While these AI/ML-based codecs are able to achieve great quality at low bitrates, it often comes at the expense of heavy computational cost. As a result, only the very high-end (expensive) mobile handsets are able to run these codecs reliably, while users on lower-end devices continue to experience audio quality issues in low-bitrate conditions. So the net impact of these newer, computationally expensive codecs is actually limited to a small portion of users.

A significant number of our users still use low-end devices. For example, more than 20 percent of our calls are made on ARMv7 devices, and tens of millions of daily calls on WhatsApp are on devices that are 10 or more years old. Given the readily available codec choices and our commitment to ensuring that all users, regardless of what device they’re on, have a high-quality calling experience, we clearly need a codec with very low compute requirements that still delivers high-quality audio at these lowest bitrates.

The MLow codec

We broke ground on the development of a new codec in late 2021. After nearly two years of active development and testing, we’re proud to announce the Meta Low Bitrate audio codec, aka MLow, which achieves two-times-better quality than Opus (POLQA MOS of 1.89 for Opus vs. 3.9 for MLow @ 6 kbps WB). Even more importantly, we’re able to achieve this great quality while keeping MLow’s computational complexity 10 percent lower than that of Opus.

Figure 2 below shows a MOS (Mean Opinion Score) plot on a 1-5 scale and compares the POLQA scores of Opus and MLow at various bitrates. As the chart makes evident, MLow has a huge advantage over Opus at the lowest bitrates, where it saturates quality faster than Opus.

Figure 2: POLQA score comparing Opus (WB) versus MLow at various bitrates across a large dataset of files.

We have already fully launched MLow to all Instagram and Messenger calls and are actively rolling it out on WhatsApp, and we’ve already seen incredible improvement in user engagement driven by better audio quality.

Here are some audio samples for you to listen to. We suggest that you use your favorite pair of headphones to appreciate the striking audio-quality differences.

Opus 6 kbps NB | MLow 6 kbps WB | Reference

Being able to encode high-quality audio at lower bitrates also unlocks more effective Forward Error Correction (FEC) strategies. Compared with Opus, with MLow we can afford to pack FEC at much lower bitrates, which significantly helps to improve the audio quality in packet loss scenarios.
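To see why room for redundancy matters, consider a deliberately simplified model. This is our assumption for illustration, not a description of how MLow or Opus actually implements FEC: every packet is lost independently with the same probability, and each audio frame is also carried as a redundant copy in some number of other packets. A frame is then unrecoverable only if every packet carrying it is lost. The function name is hypothetical.

```python
def unrecoverable_frame_rate(loss_rate: float, redundant_copies: int) -> float:
    """Fraction of frames lost for good, assuming independent packet loss.

    A frame travels in (1 + redundant_copies) different packets and is
    unrecoverable only when all of them are dropped.
    """
    if not 0.0 <= loss_rate <= 1.0:
        raise ValueError("loss_rate must be between 0 and 1")
    if redundant_copies < 0:
        raise ValueError("redundant_copies must be non-negative")
    return loss_rate ** (redundant_copies + 1)


# 30 percent packet loss, as in the audio samples discussed in this post.
for copies in (0, 1, 2):
    rate = unrecoverable_frame_rate(0.30, copies)
    print(f"{copies} redundant copies -> {rate:.1%} of frames unrecoverable")
```

Under this model, a single redundant copy at 30 percent packet loss cuts unrecoverable frames from 30 percent to about 9 percent. Real networks often lose packets in bursts, so actual gains are smaller, but the direction is the same: the fewer bits the primary encoding needs, the more of the budget is left for redundancy.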

Here are two audio samples at 14 kbps with heavy 30 percent receiver-side packet loss.

Opus:

Note that at these bitrates, Opus is not able to encode any inband FEC. It needs a minimum of 19 kbps to encode any inband FEC at 10 percent packet loss, which hurts the audio recovery.

MLow internals

MLow builds on the concepts of a classic CELP (Code Excited Linear Prediction) codec with advancements around excitation generation, parameter quantization, and coding schemes. Figure 3 is a high-level visual of how the codec works internally. On the left we have an input signal (raw PCM audio) feeding into the encoder, which splits the signal into two bands, one low-frequency and one high-frequency. Each band is then encoded separately while making use of shared information to achieve better compression. All the output is passed through a range encoder to further compress it and generate an encoded payload. The decoder does the exact reverse when given the payload to generate output audio signals.
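The post does not say which filter bank MLow uses for the band split, so the following is only an illustration of the general idea, using the simplest possible two-band analysis/synthesis pair (a Haar-style sum/difference split). The function names are hypothetical. It shows how a signal can be divided into a low band and a high band, each at half the sample rate, and then put back together.

```python
from typing import List, Tuple


def split_bands(samples: List[float]) -> Tuple[List[float], List[float]]:
    """Split an even-length signal into low and high bands at half the rate."""
    if len(samples) % 2 != 0:
        raise ValueError("expected an even number of samples")
    pairs = list(zip(samples[0::2], samples[1::2]))
    low = [(a + b) / 2 for a, b in pairs]   # pairwise average: slowly varying content
    high = [(a - b) / 2 for a, b in pairs]  # pairwise difference: rapidly varying content
    return low, high


def merge_bands(low: List[float], high: List[float]) -> List[float]:
    """Inverse of split_bands: rebuild each original sample pair."""
    if len(low) != len(high):
        raise ValueError("bands must have the same length")
    out: List[float] = []
    for l, h in zip(low, high):
        out.extend((l + h, l - h))
    return out


pcm = [1.0, 3.0, 2.0, 6.0, 5.0, 5.0]  # toy input, not real audio
low, high = split_bands(pcm)
print("low band :", low)
print("high band:", high)
print("round trip matches input:", merge_bands(low, high) == pcm)
```

Production codecs use longer filters with far better frequency separation. The point here is only that the two half-rate bands together carry the same information as the input, so an encoder is free to give each band its own bit budget.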

Figure 3: High-level MLow encoder and decoder architecture.

With these split-band optimizations, we’re able to encode the high band using very few bits, which lets MLow deliver SuperWideBand (32kHz sampling) using a much lower bitrate.
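The band names map to sampling rates through the Nyquist limit: audio sampled at a given rate can represent frequencies up to half that rate. The sketch below uses the NarrowBand range (0 – 4kHz, hence 8 kHz sampling) and the SuperWideBand rate (32 kHz) quoted in this post; the 16 kHz WideBand rate is the conventional value and is our assumption, since the post does not state it.

```python
# Sampling rates per band. The WideBand entry is an assumed, conventional value.
SAMPLE_RATES_HZ = {
    "NarrowBand": 8_000,
    "WideBand": 16_000,
    "SuperWideBand": 32_000,
}


def max_audio_frequency_hz(sample_rate_hz: int) -> float:
    """Highest frequency representable at this sampling rate (Nyquist limit)."""
    return sample_rate_hz / 2


for name, rate in SAMPLE_RATES_HZ.items():
    top_khz = max_audio_frequency_hz(rate) / 1000
    print(f"{name}: {rate / 1000:g} kHz sampling -> audio up to {top_khz:g} kHz")
```

Going from NarrowBand to SuperWideBand therefore quadruples the audio bandwidth the codec has to represent, which is why spending very few bits on the high band matters.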

What’s next?

MLow has drastically enhanced audio quality on low-end devices while still ensuring calls are end-to-end encrypted. We’re really excited about what we have accomplished in just the last two years, from developing a new codec to successfully shipping it to billions of users around the globe. We’re continuing to work on improving the audio recovery in heavy packet loss networks by pumping out more redundant audio, which MLow allows us to do efficiently. We’re excited to share more as we continue working to make it easier for all our users to make quality audio calls.