AES 148 – A Digital Vienna

Written jointly by Aggela Mourgela and JT Colonel

#VirtualVienna

The AES hosted its 148th international conference virtually this year. Despite the circumstances we find ourselves in due to COVID-19, the organizers put together an excellent program filled with informative talks, tutorials, and demonstrations. Below is a round-up of our favourite presentations, which run the gamut from incredibly technical talks on finite-precision arithmetic to highly creative demonstrations of an augmented reality installation.

Tuesday

The first session on Tuesday morning, Active Sensing and Slow Listening, was held by Thomas Lund and Susan E. Rogers, who discussed the principles of active sensing and slow listening as well as their role in pro audio product development. Lund kicked the session off by introducing the theory behind sound cognition and discussing the afferent and efferent functions of the brain with regard to sound perception. Rogers then picked up the session, describing the auditory pathway and its bidirectionality in more detail and presenting the parts of the brain engaged in sonic cognition. She touched on proprioception (the awareness of our bodies), interoception (the awareness of our internal feelings), and the role of expectation when studying our responses to sound. To conclude, both presenters pointed out that we should not treat listening as passive or unidirectional: both external and internal factors influence the way we hear.

Diagram showing the development of the tympanic ear across different geologic eras discussed in the Active Sensing and Slow Listening demonstration

Later in the day, Jamie Angus presented Audio Signal Processing in the Real World: Dealing with the Effects of Finite Precision. At the center of the talk was a fundamental question: how does finite precision affect audio processing? Angus went into full detail on different finite-precision arithmetics, such as fractional and floating-point representations, and derived how the noise these systems introduce impacts filter design.
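To get a feel for what finite precision does, here is a minimal sketch (our own illustration, not code from Angus’s talk) that rounds a sine wave onto a fractional fixed-point grid and measures the resulting noise floor, which shrinks by roughly 6 dB per bit:

```python
import numpy as np

def quantize_fixed_point(x, bits):
    """Round a [-1, 1) signal onto a signed fractional fixed-point grid with `bits` total bits."""
    step = 2.0 ** -(bits - 1)                      # quantization step of a Q1.(bits-1) format
    return np.clip(np.round(x / step) * step, -1.0, 1.0 - step)

fs = 48_000
t = np.arange(fs) / fs
x = 0.5 * np.sin(2 * np.pi * 997 * t)              # a sine well inside full scale

for bits in (8, 16, 24):
    e = quantize_fixed_point(x, bits) - x          # quantization error signal
    snr = 10 * np.log10(np.mean(x ** 2) / np.mean(e ** 2))
    print(f"{bits:2d}-bit fractional: SNR ≈ {snr:5.1f} dB")
```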

The 3rd MATLAB Student Design Competition was hosted by Gabriele Bunkheila. Using the example of a stereo width expander, Bunkheila demonstrated the process of turning a simple offline MATLAB script into a real-time audioPlugin class using MATLAB’s built-in Audio Test Bench app. He then proceeded to cover C++ code generation, validation, and export of the code as a VST plugin for use in a conventional digital audio workstation. Bunkheila also demonstrated simple GUI generation using MATLAB’s audioPluginInterface functionality.
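Bunkheila’s demo was written in MATLAB against the audioPlugin class, but the underlying DSP is easy to sketch in any language. Below is a rough Python/NumPy version of the mid/side processing a stereo width expander performs; the function name and `width` parameter are our own illustrative choices, not taken from the session:

```python
import numpy as np

def expand_stereo_width(stereo, width=1.5):
    """Mid/side stereo width expansion.

    stereo : array of shape (n_samples, 2) holding left/right samples in [-1, 1]
    width  : 1.0 leaves the image unchanged, >1.0 widens it, 0.0 collapses to mono
    """
    left, right = stereo[:, 0], stereo[:, 1]
    mid = 0.5 * (left + right)              # content common to both channels
    side = 0.5 * (left - right)             # content that differs between channels
    side = width * side                     # scaling the side signal changes perceived width
    out = np.stack([mid + side, mid - side], axis=1)
    return np.clip(out, -1.0, 1.0)          # naive clipping to guard against overs
```

In the MATLAB workflow, per-frame processing along these lines would live inside the plugin class, which is what lets the same code run in the Audio Test Bench, generate C++, and export as a VST.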

Wednesday

On Wednesday, Thomas Lund and Hyunkook Lee discussed the shift from stereo to immersive multi-channel audio in their talk Goodbye Stereo. First, Lund covered the basics of spatial perception, the limitations of stereo recording and reproduction, frequency-related aspects of spatial audio, and the standards being implemented in immersive audio. Lee went on to discuss the psychoacoustic principles that apply to immersive audio as well as the differences between stereo and 3D. He expanded on limitations arising from microphone placement and internal characteristics, and proceeded to discuss microphone array configurations that his research group is working on. The presentation was followed by a set of truly impressive immersive recordings made in various venues with different microphone configurations, which the audience was prompted to experience over headphones. Lee finished by introducing a database of 3D recordings, including room impulse responses, that will be made available for spatial audio research.

In his talk The Secret Life of Low Frequencies, Bruce Black discussed the trials and tribulations of acoustically treating rooms while paying special attention to their low-frequency response. Black covered the particle-propagation and wave-propagation models of sound transmission, and how each requires different treatment. He called specific attention to how the attenuation of a sound’s low frequencies can change over the course of 200–400 ms within a room. Black went on to show how Helmholtz resonators can be strategically placed in a space to smooth these uneven attenuations.
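As a back-of-the-envelope illustration of why these devices suit low-frequency problems (our sketch, not Black’s), the classic lumped-element approximation tunes a resonator to f0 = (c/2π)·sqrt(A/(V·L_eff)), so even a modest opening on a large cavity lands well below 100 Hz:

```python
import numpy as np

def helmholtz_resonance(neck_area, neck_length, cavity_volume, c=343.0):
    """Approximate resonant frequency (Hz) of a Helmholtz resonator.

    neck_area     : cross-sectional area of the neck opening (m^2)
    neck_length   : physical length of the neck (m)
    cavity_volume : enclosed cavity volume (m^3)
    c             : speed of sound (m/s)
    """
    radius = np.sqrt(neck_area / np.pi)
    effective_length = neck_length + 1.7 * radius   # rough end-correction for the neck
    return (c / (2 * np.pi)) * np.sqrt(neck_area / (cavity_volume * effective_length))

# Example: a 10 cm diameter, 5 cm deep opening on a 50-litre cavity tunes to roughly 59 Hz
print(helmholtz_resonance(neck_area=np.pi * 0.05 ** 2, neck_length=0.05, cavity_volume=0.050))
```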

Marisa Hoeschele gave a very interesting keynote lecture on Audio from a Biological Perspective. Hoeschele began by discussing the concept of addressing human sounds from the perspective of a “visiting alien,” with humans studied as just another species on the planet. She discussed observations on shared emotional information and how we can identify sonic attributes corresponding to stress level and excitement across species. She then proceeded to discuss the ways in which we can study musicality as an innate human characteristic, as well as its commonalities across cultures. Hoeschele concluded by discussing the ways other animals can inform us about musicality, giving examples of experiments on animals’ ability to respond to musical attributes such as octave equivalence, and of searches for correlations with human behavior.

Thursday

On Thursday, Brian Gibbs gave a workshop on spatial audio mixing, using a demo mix of Queen’s Bohemian Rhapsody. He began his presentation with a short discussion of the basics of spatial audio, covering both recording spatial audio from scratch and spatializing audio recorded in the studio. Gibbs also talked about the AmbiX and FuMa ambisonic formats, as well as higher-order ambisonics and the MPEG-H format. He then introduced the importance of loudness, giving a brief talk and demonstration of LUFS-based metering. Finally, he discussed the importance of knowing the platform or format your work will end up on, emphasizing the differing requirements of streaming services and playback devices. He ended the workshop with a listening session in which he played Bohemian Rhapsody for the audience, alternating between mono, stereo, and static 360 audio mixes.
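For readers who want to experiment with loudness measurement themselves, here is a small sketch using the third-party pyloudnorm and soundfile packages (our own example, not tools Gibbs referenced), which implement a BS.1770-style meter; the -14 LUFS target below is simply a commonly cited streaming reference level:

```python
import soundfile as sf          # third-party: reads audio files into NumPy arrays
import pyloudnorm as pyln       # third-party: ITU-R BS.1770-style loudness metering

data, rate = sf.read("mix.wav")                 # shape (n_samples, n_channels), floats in [-1, 1]
meter = pyln.Meter(rate)                        # K-weighted meter at the file's sample rate
loudness = meter.integrated_loudness(data)      # integrated loudness in LUFS
print(f"Integrated loudness: {loudness:.1f} LUFS")

# Normalize toward a commonly cited streaming target of -14 LUFS
normalized = pyln.normalize.loudness(data, loudness, -14.0)
```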

Later, Thomas Aichinger presented Immersive Storytelling: Narrative Aspects in AR Audio Applications. Aichinger outlined the development and implementation of “SONIC TRACES,” an AR installation originally intended to run in Vienna alongside the conference, in which participants navigate an audio-based story set in the Heldenplatz. Aichinger described how his team overcame difficulties such as GPS tracking of users in the plaza, incorporating six degrees of freedom in motion tracking, and tweaking signal attenuation to suit the narrative.

Thomas Aichinger’s rendering of the Heldenplatz in Unity, which is the game engine he and his team used to construct the AR experience
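Aichinger did not share implementation code, but the kind of attenuation tweaking he described can be pictured with a simple clamped inverse-distance gain model, sketched below with illustrative parameter names of our own; game engines such as Unity expose similar rolloff controls on their audio sources:

```python
import numpy as np

def distance_gain(distance, ref_distance=1.0, rolloff=1.0, min_gain=0.05):
    """Clamped inverse-distance attenuation for a virtual sound source.

    distance     : listener-to-source distance in metres
    ref_distance : distance at which the gain is 1.0
    rolloff      : >1 fades sources out faster, <1 keeps them audible for longer
    min_gain     : floor so narratively important sources never disappear entirely
    """
    d = np.maximum(distance, ref_distance)      # clamp inside the reference distance
    gain = ref_distance / (ref_distance + rolloff * (d - ref_distance))
    return np.maximum(gain, min_gain)

# A narrator kept present across the plaza vs. ambience that falls away quickly
distances = np.array([1.0, 10.0, 50.0])
print(distance_gain(distances, rolloff=0.3))    # gentle rolloff
print(distance_gain(distances, rolloff=2.0))    # aggressive rolloff
```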

Friday

On the final day of the conference, Gabriele Bunkheila gave a keynote speech on Deep Learning for Audio Applications – Engineering Best Practices for Data. While deep learning applications are most frequently implemented in Python, Bunkheila made a compelling case for using MATLAB. He pointed out the discrepancies between deep learning approaches in academia and in industry: namely, that academia focuses primarily on theorizing and developing new models, whereas industry devotes much more time to dataset construction and scrubbing. He also argued that deep learning for audio should not necessarily follow the best practices laid out by the image recognition community. For example, when adding noise to an audio dataset, one ought to simulate the environment in which the system will actually be deployed, so for audio applications it often makes more sense to apply reverb or mix in background speech to obscure the signal than to add Gaussian noise.
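As a concrete illustration of that advice, here is a small sketch (our own, not from the keynote) of a deployment-aware augmentation step: convolve a clean example with a room impulse response and mix in interfering speech at a chosen SNR, rather than sprinkling on Gaussian noise:

```python
import numpy as np
from scipy.signal import fftconvolve

def augment_with_room_and_speech(clean, rir, interference, snr_db=10.0):
    """Augment a clean recording the way it might actually be heard at deployment time.

    clean        : 1-D array, the clean training example
    rir          : 1-D array, a measured or simulated room impulse response
    interference : 1-D array of background speech, at least as long as `clean`
    snr_db       : desired signal-to-interference ratio in dB
    """
    # Simulate the room by convolving with the impulse response (keep the original length)
    reverberant = fftconvolve(clean, rir)[: len(clean)]

    # Scale the interfering speech so the mix hits the requested SNR
    noise = interference[: len(clean)]
    sig_power = np.mean(reverberant ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(sig_power / (noise_power * 10 ** (snr_db / 10.0)))
    mixed = reverberant + scale * noise

    # Keep the result within [-1, 1] for downstream feature extraction
    return mixed / np.maximum(1.0, np.max(np.abs(mixed)))
```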

In Closing

Though this conference looked nothing like what anyone expected at the beginning of this year, the AES demonstrated its ability to adapt to new paradigms and technologies. Here’s hoping the AES will be able to resume in-person conferences soon. In the meantime, it will continue its strong tradition of providing a platform for groundbreaking audio technologies and educational opportunities virtually.