Immersive audio has everyone wrapped in sound

02/05/2024

Paul Mac looks at the state-of-the-art in immersive audio and gathers comments from technologists and creatives on its future potential.

Immersive audio, despite its current multiplicity of forms, variations, and standards, does seem to have settled on its own ambitious aim - a spherical, encompassing representation of sound in an environment - real or otherwise. It’s the experience of being inside a space - of experiencing that space, directional information, and discrete sources as if the speakers aren’t there at all. How close you get to that depends on many factors, from content creation at one end, to reproduction at the other; it depends on the limitations of the space you’re reproducing the content in, and it very much depends on budget.

In talking about the potential of immersive audio, we shouldn’t have to limit ourselves to bespoke, high-budget venues and sound design. It can add value anywhere there is an audio experience: cinema, games, virtual and augmented reality tech, TV and video streaming, sports broadcast, virtual acoustics, every music genre, museums, theatre, clubs, and yes - the large venue experiences; even headphone-led visitor experiences and tours.

Spherical entertainment
However, it is the grand gestures that make the biggest headlines and tend to lead the way. In cinema, for instance, early movie offerings like Brave and Gravity rang in Dolby Atmos to great effect. In the visitor experience space it’s unlikely you haven’t read about Sphere, Las Vegas - a cinematic experience designed by Sphere Studios in partnership with Holoplot.

The Sphere Immersive Sound system is claimed to be the world’s largest concert-grade audio system and was specifically developed for Sphere’s unique curved interior. The numbers are spectacular: approximately 1,600 permanently installed and 300 mobile Holoplot X1 Matrix Array loudspeaker modules and a total of 167,000 individually amplified loudspeaker drivers. According to the Holoplot press release: “This results in controlled, consistent, and crystal-clear concert-grade audio for audiences of up to 20,000 people, providing each audience member with a truly exceptional and personalised listening experience.” The entire sound system is completely hidden behind Sphere’s 160,000 square foot interior LED display plane, which wraps up, over, and around the audience.

The three i's
An exciting aspect of the current and developing soundscape of immersive audio is how interconnected the technologies and ideas are. For most purposes, we can divide approaches to immersive audio into channel-, scene-, and object-based audio. An example of channel-based immersive audio is surround sound - particularly when you add in the height channels in formats like 7.1.4. Here you have known, discrete panning locations, and sounds are distributed to those channels in higher or lower proportions. Object-based audio takes that a step further and does away with the speaker channels. Instead, it pans ‘objects’ with three-dimensional coordinates. In this way, you can have as many speakers as you like - all you have to do is render the coordinate panning accurately in the speaker array that’s being used.
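To make the object-based idea concrete, here is a minimal Python sketch of a renderer that turns one object's coordinates into a gain for each speaker in an arbitrary layout. It assumes a simple inverse-distance weighting with constant-power normalisation; the function name `render_gains` and the `rolloff` parameter are hypothetical, and commercial renderers use more sophisticated panning laws than this.

```python
import math

def render_gains(obj_pos, speaker_positions, rolloff=2.0, eps=1e-6):
    """Return one gain per speaker for an object at obj_pos (x, y, z).

    Illustrative assumption: gain falls off with distance to each speaker,
    then all gains are scaled so the total power (sum of squares) is 1.
    """
    weights = []
    for spk in speaker_positions:
        distance = math.dist(obj_pos, spk)          # Euclidean distance in 3D
        weights.append(1.0 / (distance + eps) ** rolloff)
    norm = math.sqrt(sum(w * w for w in weights))   # constant-power normalisation
    return [w / norm for w in weights]
```

The same object position can be passed to any speaker list, whether two speakers or two hundred, which is the point of the approach: the mix stores coordinates, and the layout is only needed at playback.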

For scene-based audio the most useful example is ambisonics - not least because it is fantastically adaptable and being used by YouTube and in the new open source IAMF container being championed by the Alliance for Open Media, as well as being part of the Fraunhofer MPEG-H 3D Audio specification. Ambisonics is a way of encoding a complete spherical soundfield in a relatively small number of channels. ‘1st Order’ Ambisonics comprises an omnidirectional channel, plus three figure-of-eight channels. You can raise the resolution of ambisonics through the orders by adding ‘spherical harmonics’, which also has the effect of widening the sweet spot. 1st order has four channels, 2nd order has nine, 3rd order has 16 channels. Ambisonic audio is completely speaker agnostic, has a conveniently low channel count compared to the object count in object-based audio, and can be mapped to pretty much any immersive speaker arrangement.
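The channel counts quoted above follow a simple rule: a full-sphere ambisonic signal of order N needs (N + 1) squared channels. The Python sketch below shows that rule alongside a basic first-order encoder. The encoder assumes the AmbiX convention (channel order W, Y, Z, X with SN3D normalisation, azimuth measured anticlockwise from the front); that convention and the function names are choices made for this illustration, not something specified in the article.

```python
import math

def ambisonic_channel_count(order: int) -> int:
    """Channels needed for a full-sphere ambisonic signal of the given order."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2

def encode_first_order(sample: float, azimuth_deg: float, elevation_deg: float):
    """Encode one mono sample into first-order ambisonics.

    Assumed convention: AmbiX (ACN channel order W, Y, Z, X; SN3D gains).
    Azimuth 0 is straight ahead, 90 is to the left; elevation 90 is overhead.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample                                   # omnidirectional component
    y = sample * math.sin(az) * math.cos(el)     # left-right figure-of-eight
    z = sample * math.sin(el)                    # up-down figure-of-eight
    x = sample * math.cos(az) * math.cos(el)     # front-back figure-of-eight
    return (w, y, z, x)
```

A source straight ahead lands only in W and X, a source directly to the left only in W and Y, and a source overhead only in W and Z, which is how four channels can carry direction for the whole sphere.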

Of course, we shouldn’t ignore the potential of headphone experiences either - encoding these formats to binaural audio can bring an immersive experience to visitor attractions that rely on headphones. Once we get into immersive audio standards, things get a little complicated. In the object-based realm, Dolby Atmos started things off in cinema and allowed it to ‘trickle down’ into broadcast, the home, and further into most devices. DTS:X is there with a comparable option, and there is also the Auro-3D format. Of course, all of these are licensed technologies, and Dolby Atmos, for example, requires compliance in the mix studio and at the replay site, along with Dolby-specific processes.

IAMF
Possibly the most interesting standard to emerge recently is IAMF (Immersive Audio Model and Formats), developed by the Alliance for Open Media (AOM) and heavily promoted by Google and Samsung at this year’s CES. As a container for packaging channel and ambisonic information it shows a lot of potential. First, the companies behind it are powerful; other members of the alliance include Apple, Netflix, Amazon, Microsoft, and Tencent. Second, a big motivation must be to free themselves from the costs and restrictions involved in licensed technologies. But another angle is certainly creative. Vaudeville Sound has been working with Google and Samsung on the early proofs of concept for IAMF. As head of immersive and a key figure in Vaudeville’s research and development in the immersive space, Mirko Vogel had a front seat. The production company was behind the Shutterstock ambisonic sound library - currently standing at about 60,000 assets.

This, says Vogel, led to conversations with Google and AOM. “They told us they were building the ability for every device and every platform to be able to stream high resolution and higher order ambisonic content, as well as channel based content. So I was like… ‘that’s very, very exciting’!”

The team at Vaudeville partnered with AOM, Samsung, and Google to create IAMF demos that were showcased at CES earlier this year. The demos were played out on soundbars and additional speakers and were received particularly well - gratifying for Vogel, who describes the experience: “We had this complete, accurate sphere around us. The sound moved really accurately across the room, and across the ceiling. It was one of the most immersive home theatre experiences that we’ve had. The speakers disappeared.”

The real kicker here though is that IAMF is built to be open source. It will cost no money to use the all-important codecs and API: “It allows anyone to be able to build IAMF into their own system - to build an IAMF decoder or encoder into their processor or software, for example.” To scale this for larger spaces, live venues, and so on is something that excites Vogel. “The idea of being able to reproduce a high order ambisonic soundfield in a live setting is extremely exciting and something that we really want to work on,” he says. “There’s a whole bunch of exciting ideas here from live esports and live streaming to installation and museum spaces.”

The tech viewpoint
A number of sound reinforcement / PA companies also have their own specific object-based formats and products designed to complement their loudspeaker products. Adamson’s FletcherMachine does object-based mixing in conjunction with its Remote software, d&b has Soundscape, and Meyer Sound’s Spacemap Go tool uses the Galileo Galaxy platform for immersive sound design and mixing.

L-Acoustics’ approach is to provide the whole ecosystem. Scott Sugden, the company’s director of product management, solutions, explains: “L-Acoustics offers a full solution for immersive audio for live events, from hardware and software that enable the live event, and software and education that supports creators and content creation. The workflow of system designers starts with our industry leading Soundvision 3D modelling software, providing analytics to evaluate the performance of an L-ISA system design for the entire audience.

“For creators, engineers and artists, our L-ISA software is available for free and can be used to create music, shows, events and more, starting at home where you can prepare an immersive mix with nothing but a laptop and a set of headphones. The exact same mix and software can scale effortlessly to a studio or a live event with thousands of people.”

The L-ISA Studio software is available free and incorporates binaural encoding and head tracking capability, so you can create and experiment with immersive audio on headphones, with a laptop.

Aki Mäkivirta, R&D director at Genelec, sees his and the business’s role as an ‘enabler’ for the myriad of formats and options already on the market and on their way: “Our approach is that we enable things. Do we want to create our own immersive process? At the moment I don’t think it’s really necessary because there are already so many - and there are going to be more in the future, so there will be no shortage. No matter what kind of audio system you need, we are ready for that. We already have the technology that can enable you to have what you need.” The Genelec line of Smart IP two-way speakers and the new subwoofer are Dante and AES67-connected, powered over Ethernet, with configuration, supervision, and calibration tools to ensure accuracy. One of the standout features is the level these products can achieve with just PoE as a supply: “We have the capacity to store some of that power so that it can cope with the peak levels - it works because the ratio between the peaks and the RMS value of normal music or speech - the ‘crest factor’ - is high. Even the subwoofer works this way at about 30W and is pretty loud: 106dB at one metre.”
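The crest factor Mäkivirta mentions is straightforward to measure. The Python sketch below computes it in decibels for a block of samples and converts it into the ratio of mean power to instantaneous peak power, which indicates how far below the peaks the sustained supply demand sits. The function names are hypothetical, and the sketch illustrates the arithmetic only; it does not describe how Genelec's power storage is actually implemented.

```python
import math

def crest_factor_db(samples):
    """Peak-to-RMS ratio of a block of audio samples, in decibels."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(peak / rms)

def mean_to_peak_power_ratio(crest_db):
    """Mean power divided by instantaneous peak power for a given crest factor.

    Power scales with amplitude squared, so the ratio is (rms / peak) ** 2.
    """
    return 10.0 ** (-crest_db / 10.0)
```

A steady sine wave has a crest factor of about 3dB, so its mean power is half its peak power. Material with a higher crest factor has a lower mean-to-peak ratio, which is why a modest continuous supply plus some stored energy can cover short peaks.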

Mäkivirta acknowledges that, in turn, Audio over IP has been a great enabler for immersive audio, simply because one network connection for each speaker is all you need. For content where higher - or appropriate - speaker density can be crucial, this is key: “When people go to audio over IP, you are no longer limited by a fixed channel count,” he notes. “The only limit would be the performance of your network. Also, you can deliver audio to the loudspeakers in a very controlled fashion as audio over IP supports a high degree of synchronisation control between loudspeakers - accurate level and timing is relatively straightforward.”
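To get a rough feel for how network performance bounds channel count, the Python sketch below estimates the raw audio payload for a number of uncompressed channels. The defaults of 48kHz and 24-bit are assumptions chosen for illustration rather than figures from Genelec, and the result leaves out packet headers and any other protocol overhead, so real network load will be higher.

```python
def audio_payload_mbps(channels, sample_rate_hz=48_000, bit_depth=24):
    """Raw PCM payload in megabits per second, excluding network overhead."""
    return channels * sample_rate_hz * bit_depth / 1_000_000
```

Under those assumptions a single channel needs a little over 1 Mbit/s of payload, so speaker counts in the dozens remain a small fraction of a gigabit link before overhead is counted.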

Beam me every way
Holoplot’s technology uses 3D Audio-Beamforming and Wave Field Synthesis with distinctive Matrix Array loudspeakers to beam multiple sound fields simultaneously and create tightly controlled areas of sound, and even bounce sound off of reflective surfaces to create realistic phantom sources. “That way you can have an immersive experience from a system positioned in front of you,” says Natalia Szczepanczyk, segment manager for immersive and experiential applications at Holoplot. “The Lightroom project in London is a good example of that. They only gave us two speaker locations - one on each end of the room - and asked for immersive sound. We’re directing very focused beams of sound into the sidewalls, and as an audience member you hear both the direct sound, and sound coming out of the walls. So it’s a very cool way to include that immersive element.”

Szczepanczyk notes that even Holoplot tech can’t steer all frequencies, and that reflections would still be reliant on the reflecting surface, so loudspeaker position for full bandwidth is still important. However, as no venue or space is ever ideal, Holoplot can do an incredible job of accounting for imperfections - and creatively, the possibilities are exciting: “Whereas a normal loudspeaker might act like a light bulb, we can be more like a spotlight; we can choose to just illuminate one actor or one audience member or many different subjects at the same time.”

Immersive on demand
However immersive audio is delivered, it is true the demand is coming from the audience, who are rightly being spoiled by a proliferation of immersive content from smart devices, games, and home cinema. Expectation is at an all-time high. “I think audiences expect more,” says Szczepanczyk. “It’s not just the higher quality now that people expect, but I think it’s also the intricacy of sound design - how immersive and how interesting the sound is… It’s no longer okay to just have a sound as a blanket kind of cover.

“We’re seeing a lot of installations where the audience has agency; they can move around freely. The show itself is no longer just happening in front of us. There is a huge opportunity for sound to be the thing that directs us.” She notes an example in VR, where it’s difficult to direct the participant to anything behind them, without using sound. Sugden agrees: “Immersive audio can serve a number of purposes for an artist, event or tour, and in some situations, it even becomes a key storytelling medium.

“The best example might be a play or musical where audio is a critical messaging medium that can support or lead the narrative. This could be something as simple as a sound effect or more immersive soundscape to put the audience in the show, to having the actor’s voice directly connect with their position.

“In other places immersive audio is a connecting technology that brings the audience and the artist(s) closer together. Shows often feel small and intimate when it feels like the artist is performing for just a few, and the effect of perfectly connecting the senses of hearing and seeing at a concert, festival, or special event changes how the audience connects with the performance.”

“The potential is certainly there,” adds Mäkivirta. “The amount of material is increasing all the time and I think that creative applications for immersive technology will increase in the future - especially with the potential impact of AI helping to create content. It may be that the time taken over creating this kind of content can be reduced dramatically.

“Somebody asked me once when this fashion for immersive audio is going to pass. I said ‘probably never’, because that’s how we hear audio. So it will never pass in that sense.”