Advanced Audio Coding


Advanced Audio Coding (AAC) is an audio coding standard for lossy digital audio compression. It was developed by Dolby, AT&T, Fraunhofer and Sony, originally as part of the MPEG-2 specification, and was later improved under MPEG-4. AAC was designed to be the successor of the MP3 format and generally achieves higher sound quality than MP3 at the same bit rate. AAC-encoded audio files are typically packaged in an MP4 container, most commonly with the filename extension .m4a.
The basic profile of AAC is called AAC-LC. It is widely supported in the industry and has been adopted as the default or standard audio format on products including Apple's iTunes Store, Nintendo's Wii, DSi and 3DS and Sony's PlayStation 3. It is also supported on various other devices and software such as iPhone, iPod, PlayStation Portable and Vita, PlayStation 5, Android and older cell phones, digital audio players like Sony Walkman and SanDisk Clip, media players such as VLC, Winamp and Windows Media Player, and various in-dash car audio systems. It is used on Google Nest and Amazon Alexa devices and by the Spotify, Apple Music, YouTube and YouTube Music streaming services. AAC has been further extended into HE-AAC, which improves efficiency over AAC-LC. Another variant is AAC-LD.
AAC supports inclusion of 48 full-bandwidth audio channels in one stream plus 16 low frequency effects channels, up to 16 "coupling" or dialog channels, and up to 16 data streams. The quality for stereo is satisfactory for modest requirements at 96 kbit/s in joint stereo mode; however, hi-fi transparency demands data rates of at least 128 kbit/s. Tests of MPEG-4 audio have shown that AAC meets the ITU requirements referred to as "transparent" at 128 kbit/s for stereo and 384 kbit/s for 5.1 audio. AAC uses only a modified discrete cosine transform (MDCT) algorithm, giving it higher compression efficiency than MP3, which uses a hybrid coding algorithm that is part MDCT and part FFT.

History

Background

The discrete cosine transform, a type of transform coding for lossy compression, was proposed by Nasir Ahmed in 1972 and developed by Ahmed with T. Natarajan and K. R. Rao in 1973; they published their results in 1974. This led to the development of the modified discrete cosine transform, proposed by J. P. Princen, A. W. Johnson and A. B. Bradley in 1987, following earlier work by Princen and Bradley in 1986. The MP3 audio coding standard introduced in 1992 used a hybrid coding algorithm that is part MDCT and part FFT. AAC uses a purely MDCT algorithm, giving it higher compression efficiency than MP3. Development further advanced when Lars Liljeryd introduced a method that radically shrank the amount of information needed to store the digitized form of a song or speech.
AAC was developed with cooperation between AT&T Labs, Dolby, Fraunhofer IIS and Sony Corporation. AAC was officially declared an international standard by the Moving Picture Experts Group in April 1997. It is specified both as Part 7 of the MPEG-2 standard, and Subpart 4 in Part 3 of the MPEG-4 standard. Further companies have contributed to development in later years including Bell Labs, LG Electronics, NEC, Nokia, Panasonic, ETRI, JVC Kenwood, Philips, Microsoft, and NTT.

Standardization

In 1997, AAC was first introduced as MPEG-2 Part 7, formally known as ISO/IEC 13818-7:1997. This was a new part of MPEG-2, which already included an audio part, MPEG-2 Part 3 (formally ISO/IEC 13818-3), known as MPEG-2 BC (Backwards Compatible). MPEG-2 Part 7 is therefore also known as MPEG-2 NBC (Non-Backward Compatible), because it is not compatible with the MPEG-1 audio formats.
MPEG-2 Part 7 defined three profiles: Low-Complexity profile, Main profile and Scalable Sampling Rate (SSR) profile. The AAC-LC profile consists of a base format very much like AT&T's Perceptual Audio Coding format, with the addition of temporal noise shaping, the Kaiser window, a nonuniform quantizer, and a reworking of the bitstream format to handle up to 16 stereo channels, 16 mono channels, 16 low-frequency effect channels and 16 commentary channels in one bitstream. The Main profile adds a set of recursive predictors that are calculated on each tap of the filterbank. The SSR profile uses a 4-band PQMF filterbank, followed by four shorter filterbanks, in order to allow for scalable sampling rates.
In 1999, MPEG-2 Part 7 was updated and included in the MPEG-4 family of standards and became known as MPEG-4 Part 3, MPEG-4 Audio or ISO/IEC 14496-3:1999. This update included several improvements. One of these improvements was the addition of Audio Object Types which are used to allow interoperability with a diverse range of other audio formats such as TwinVQ, CELP, HVXC, speech synthesis and MPEG-4 Structured Audio. Another notable addition in this version of the AAC standard is Perceptual Noise Substitution. In that regard, the AAC profiles are combined with perceptual noise substitution and are defined in the MPEG-4 audio standard as Audio Object Types. MPEG-4 Audio Object Types are combined in four MPEG-4 Audio profiles: Main, Scalable, Speech and Low Rate Synthesis.
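As a rough illustration of how an Audio Object Type is signalled in practice, the sketch below decodes the first two bytes of an MPEG-4 AudioSpecificConfig, the small header that commonly accompanies AAC in MP4 files. The function name parse_audio_specific_config and the abbreviated name table are illustrative assumptions, not part of any library; escape values (object type 31, frequency index 15) and extension fields are deliberately not handled.

```python
# Sampling-frequency index table used by MPEG-4 Audio (indices 0 to 12).
SAMPLE_RATES = [96000, 88200, 64000, 48000, 44100, 32000,
                24000, 22050, 16000, 12000, 11025, 8000, 7350]

# Abbreviated, illustrative subset of Audio Object Type identifiers.
OBJECT_TYPES = {1: "AAC Main", 2: "AAC-LC", 3: "AAC-SSR",
                4: "AAC-LTP", 5: "SBR", 29: "PS"}


def parse_audio_specific_config(data: bytes) -> dict:
    """Decode the leading 13 bits of an AudioSpecificConfig (hypothetical helper).

    Layout assumed: 5 bits object type, 4 bits sampling-frequency index,
    4 bits channel configuration. Escape codes are rejected, not decoded.
    """
    if len(data) < 2:
        raise ValueError("need at least two bytes")
    bits = int.from_bytes(data[:2], "big")
    object_type = bits >> 11
    freq_index = (bits >> 7) & 0xF
    channel_config = (bits >> 3) & 0xF
    if object_type == 31 or freq_index >= len(SAMPLE_RATES):
        raise ValueError("escape values are not handled in this sketch")
    return {
        "object_type": object_type,
        "object_type_name": OBJECT_TYPES.get(object_type, "other"),
        "sample_rate": SAMPLE_RATES[freq_index],
        # Raw field value; it is not always equal to the channel count.
        "channel_config": channel_config,
    }


config = parse_audio_specific_config(bytes([0x12, 0x10]))
print(config)
```

The two bytes 0x12 0x10 are a frequently seen configuration: object type 2 (AAC-LC), 44.1 kHz, channel configuration 2 (stereo).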
The reference software for MPEG-4 Part 3 is specified in MPEG-4 Part 5 and the conformance bit-streams are specified in MPEG-4 Part 4. MPEG-4 Audio remains backward-compatible with MPEG-2 Part 7.
MPEG-4 Audio Version 2 defined new audio object types: the low delay AAC object type, the bit-sliced arithmetic coding object type, parametric audio coding using harmonic and individual lines plus noise (HILN), and error-resilient versions of object types. It also defined four new audio profiles: High Quality Audio Profile, Low Delay Audio Profile, Natural Audio Profile and Mobile Audio Internetworking Profile.
The HE-AAC Profile and AAC Profile were first standardized in ISO/IEC 14496-3:2001/Amd 1:2003. The HE-AAC v2 Profile was first specified in ISO/IEC 14496-3:2005/Amd 2:2006. The Parametric Stereo audio object type used in HE-AAC v2 was first defined in ISO/IEC 14496-3:2001/Amd 2:2004.
The current version of the AAC standard is defined in ISO/IEC 14496-3:2009.
AAC+ v2 is also standardized by ETSI as TS 102005.
The MPEG-4 Part 3 standard also contains other ways of compressing sound. These include lossless compression formats, synthetic audio and low bit-rate compression formats generally used for speech.

AAC's improvements over MP3

Advanced Audio Coding is designed to be the successor of MPEG-1 Audio Layer 3, known as the MP3 format, which was specified by ISO/IEC in 11172-3 and 13818-3.
Improvements include:
  • more sample rates than MP3;
  • up to 48 channels;
  • arbitrary bit rates and variable frame length, with a standardized constant bit rate mode using a bit reservoir;
  • higher efficiency and a simpler filter bank: AAC uses a pure MDCT, rather than MP3's hybrid coding;
  • higher coding efficiency for stationary signals;
  • higher coding accuracy for transient signals;
  • possibility to use a Kaiser-Bessel derived window function to eliminate spectral leakage at the expense of widening the main lobe;
  • much better handling of audio frequencies above 16 kHz;
  • more flexible joint stereo;
  • additional modules added to increase compression efficiency: TNS, backwards prediction, perceptual noise substitution, etc. These modules can be combined to constitute different encoding profiles.
Overall, the AAC format allows developers more flexibility to design codecs than MP3 does, and corrects many of the design choices made in the original MPEG-1 audio specification. This increased flexibility often leads to more concurrent encoding strategies and, as a result, to more efficient compression. This is especially true at very low bit rates where the superior stereo coding, pure MDCT, and better transform window sizes leave MP3 unable to compete.
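The Kaiser-Bessel derived (KBD) window mentioned in the list above can be built from an ordinary Kaiser window by cumulative summation. The following is a minimal sketch using only the standard library; the helper names bessel_i0 and kbd_window are illustrative, and the alpha values of 4 and 6 are the ones usually quoted for AAC's long and short windows, so treat them as assumptions here rather than as a statement of the standard.

```python
import math


def bessel_i0(x: float) -> float:
    """Zeroth-order modified Bessel function of the first kind (power series)."""
    term = 1.0
    total = 1.0
    k = 1
    while term > 1e-12 * total:
        term *= (x / (2.0 * k)) ** 2
        total += term
        k += 1
    return total


def kbd_window(length: int, alpha: float = 4.0) -> list[float]:
    """Kaiser-Bessel derived window of even length (hypothetical helper)."""
    if length <= 0 or length % 2:
        raise ValueError("length must be a positive even number")
    half = length // 2
    # Kaiser kernel with half + 1 points; its normalisation cancels below.
    kernel = []
    for n in range(half + 1):
        ratio = 2.0 * n / half - 1.0
        kernel.append(bessel_i0(math.pi * alpha * math.sqrt(max(0.0, 1.0 - ratio * ratio))))
    total = sum(kernel)
    # Rising half: square root of the normalised running sum of the kernel.
    rising = []
    running = 0.0
    for n in range(half):
        running += kernel[n]
        rising.append(math.sqrt(running / total))
    # Falling half is the mirror image of the rising half.
    return rising + rising[::-1]


window = kbd_window(256, alpha=6.0)
# Power-complementary check: w[n]^2 + w[n + N/2]^2 should be 1 for every n.
worst = max(abs(window[n] ** 2 + window[n + 128] ** 2 - 1.0) for n in range(128))
print(f"largest deviation from power complementarity: {worst:.2e}")
```

The check at the end verifies the property that makes the window usable with an overlapped MDCT: the squared window values half a block apart sum to one, so the overlapped halves of neighbouring blocks recombine without a gain change.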

Adoption

While the MP3 format has near-universal hardware and software support, primarily because MP3 was the format of choice during the crucial first few years of widespread music file-sharing/distribution over the internet, AAC remained a strong contender due to some unwavering industry support. Due to MP3's dominance, adoption of AAC was initially slow. The first commercialization was in 1997 when AT&T Labs launched a digital music store with songs encoded in MPEG-2 AAC. HomeBoy for Windows was one of the earliest available AAC encoders and decoders.
Dolby Laboratories took charge of AAC licensing in 2000. A new licensing model was launched by Dolby in 2002, while Nokia became a fifth co-licensor of the format. Dolby itself also marketed its own coding format, Dolby AC-3.
Nokia started supporting AAC playback on devices as early as 2001, but it was the exclusive use of AAC by Apple Computer for their iTunes Store which accelerated attention to AAC. Soon the format was also supported by Sony for their PlayStation Portable, and music-oriented cell phones from Sony Ericsson, beginning with the Sony Ericsson W800. The Windows Media Audio format, from Microsoft, was considered to be AAC's main competitor.
By 2017, AAC was considered to have become a de facto industry standard for lossy audio.

Functionality

AAC is a wideband audio coding algorithm that exploits two primary coding strategies to dramatically reduce the amount of data needed to represent high-quality digital audio:
  • Signal components that are perceptually irrelevant are discarded.
  • Redundancies in the coded audio signal are eliminated.
The actual encoding process consists of the following steps:
  • The signal is converted from time-domain to frequency-domain using forward modified discrete cosine transform. This is done by using filter banks that take an appropriate number of time samples and convert them to frequency samples.
  • The frequency domain signal is quantized based on a psychoacoustic model and encoded.
  • Internal error correction codes are added.
  • The signal is stored or transmitted.
  • In order to prevent corrupt samples, a modern implementation of the Luhn mod N algorithm is applied to each frame.
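The quantization step in the list above can be illustrated with a power-law (nonuniform) quantizer of the kind AAC is generally described as using. This is a simplified sketch, not the codec's actual procedure: the single step parameter stands in for AAC's per-band scale factors, no psychoacoustic model is involved, and the function names and the 0.4054 rounding offset (a value often quoted for AAC encoders) are assumptions.

```python
import math

ROUNDING_OFFSET = 0.4054  # rounding bias often quoted for AAC encoders (assumed)


def quantize(coeff: float, step: float) -> int:
    """Map a spectral coefficient to an integer index with a 3/4 power law."""
    magnitude = (abs(coeff) / step) ** 0.75
    return int(math.copysign(math.floor(magnitude + ROUNDING_OFFSET), coeff))


def dequantize(index: int, step: float) -> float:
    """Invert the power law: reconstruction levels spread out as |index| grows."""
    return math.copysign(abs(index) ** (4.0 / 3.0) * step, index)


for value in (0.3, 1.0, 5.0, -16.0, 100.0):
    index = quantize(value, step=1.0)
    print(value, index, round(dequantize(index, step=1.0), 3))
```

Because of the 3/4 exponent, the spacing between reconstruction levels widens for larger coefficients, so loud components are coded with coarser absolute steps than quiet ones. A larger step gives fewer distinct indices and therefore fewer bits, at the cost of more quantization noise.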
The MPEG-4 audio standard does not define a single or small set of highly efficient compression schemes but rather a complex toolbox to perform a wide range of operations from low bit rate speech coding to high-quality audio coding and music synthesis.
  • The MPEG-4 audio coding algorithm family spans the range from low bit rate speech encoding to high-quality audio coding.
  • AAC offers sampling frequencies between 8 kHz and 96 kHz and any number of channels between 1 and 48.
  • In contrast to MP3's hybrid filter bank, AAC uses the modified discrete cosine transform together with the increased window lengths of 1024 or 960 points.
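To make the MDCT more concrete, here is a minimal, deliberately slow sketch of a forward and inverse MDCT with 50% overlapped blocks. It uses a sine window and a tiny block size for readability; AAC's block sizes are far larger, and the function names, the direct O(N^2) summation and the 2/N scaling convention are choices made for this illustration only.

```python
import math


def mdct(block: list[float]) -> list[float]:
    """Forward MDCT: 2N windowed input samples produce N coefficients."""
    n_half = len(block) // 2
    return [
        sum(block[n] * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for n in range(2 * n_half))
        for k in range(n_half)
    ]


def imdct(coeffs: list[float]) -> list[float]:
    """Inverse MDCT: N coefficients produce 2N time-aliased samples."""
    n_half = len(coeffs)
    return [
        2.0 / n_half * sum(coeffs[k] * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
                           for k in range(n_half))
        for n in range(2 * n_half)
    ]


def sine_window(length: int) -> list[float]:
    """Sine window; its squares half a block apart sum to one."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def analysis_synthesis(signal: list[float], n_half: int) -> list[float]:
    """Window, transform, invert, window again and overlap-add each block."""
    window = sine_window(2 * n_half)
    output = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n_half + 1, n_half):
        block = [signal[start + n] * window[n] for n in range(2 * n_half)]
        restored = imdct(mdct(block))
        for n in range(2 * n_half):
            output[start + n] += restored[n] * window[n]
    return output


signal = [math.sin(0.3 * i) + 0.1 * i for i in range(32)]
rebuilt = analysis_synthesis(signal, n_half=4)
# Samples covered by two overlapping blocks are recovered; the edges are not.
error = max(abs(rebuilt[i] - signal[i]) for i in range(4, 28))
print(f"largest reconstruction error in the overlapped region: {error:.2e}")
```

Each block on its own is not invertible, since 2N samples are reduced to N coefficients. The aliasing introduced by one block is cancelled by its neighbours during overlap-add, which is why the transform can be critically sampled and still reconstruct the signal when no quantization is applied.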
AAC encoders can switch dynamically between a single MDCT block of 1024 points and 8 blocks of 128 points.
  • If a signal change or a transient occurs, 8 shorter windows of 128/120 points each are chosen for their better temporal resolution.
  • Otherwise, the longer 1024-point/960-point window is used by default, because its increased frequency resolution allows for a more sophisticated psychoacoustic model, resulting in improved coding efficiency.
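The trade-off between the two block sizes can be put into numbers. The short calculation below assumes a 48 kHz sampling rate (one of the rates AAC supports), that a block of N coefficients spans 2N input samples because of the 50% overlap, and that the coefficients are spread evenly from 0 Hz to half the sampling rate; the function name is illustrative.

```python
def block_resolution(sample_rate_hz: int, coefficients: int) -> tuple[float, float]:
    """Return (spacing between coefficients in Hz, window span in ms)."""
    bin_spacing_hz = (sample_rate_hz / 2) / coefficients
    window_span_ms = 1000.0 * (2 * coefficients) / sample_rate_hz
    return bin_spacing_hz, window_span_ms


for coefficients in (1024, 128):
    spacing, span = block_resolution(48000, coefficients)
    print(f"{coefficients:>4} coefficients: {spacing:.1f} Hz per bin, {span:.1f} ms window")
```

Under these assumptions the long block resolves frequency in steps of about 23.4 Hz but smears events over roughly 42.7 ms, while a short block has 187.5 Hz steps and spans only about 5.3 ms, which is why short blocks are preferred around transients.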