Research highlights for the AES Show Fall 2020



We try to write a preview of the technical track for almost all recent Audio Engineering Society (AES) Conventions, see our entries on the 142nd, 143rd, 144th, 145th147th and 148th Conventions. Like the 148th Convention, the 149th convention, or just the AES Show, is an online event. But one challenge with these sorts of online events is that anything not on the main live stream can get overlooked. The technical papers are available on demand. So though many people can access them, perhaps more than would attend the presentation in person if possible. But they don’t have the feel of an event.

Hopefully, I can give you some idea of the exciting nature of these technical papers. And they really do present a lot of cutting edge and adventurous research. They unveil, for the first time some breakthrough technologies, and both surprising and significant advances in our understanding of audio engineering and related fields.

This time, since all the research papers are available throughout the Convention and beyond, starting Oct. 28th, I haven’t organised them by date. Instead, I’ve divided them into the regular technical papers (usually longer, with more reviewing), and the Engineering Briefs, or E-briefs. The E-briefs are typically smaller, often presenting work-in-progress, late-breaking or just unusual research. Though this time, the unusual appears in the regular papers too.

But first… listening tests. Sooner or later, almost every researcher has to do them. And a good software package will help the whole process run easier. There are two packages presented at the convention. Dale Johnson will present the next generation of a high quality one in the E-Brief ‘HULTI-GEN Version 2 – A Max-based universal listening test framework’. And Stefan Gorzynski will present the paper ‘A flexible software tool for perceptual evaluation of audio material and VR environments’.


A must for audio educators is Brett Leonard’s ‘A Survey of Current Music Technology & Recording Arts Curriculum Order’. These sorts of programs are often ‘made up’ based on the experience and knowledge of the people involved. Brecht surveyed 35 institutions and analysed the results to establish a holistic framework for the structure of these degree programmes.

The idea of time-stretching as a live phenomenon might seem counterintuitive. For instance, how can you speed up a signal if its only just arriving? And if you slow it down, then surely after a while it lags far enough behind that it is no longer ‘live’. A novel solution is explored in Colin Malloy’s ‘An approach for implementing time-stretching as a live realtime audio effect

The wonderfully titled ‘A Terribly Good Speaker: Understanding the Yamaha NS-10 Phenomenon,’ is all about how and why a low quality loudspeaker with bad reviews became seen as a ‘must have’ amongst many audio professionals. It looks like this presentation will have lessons for those who study marketing, business trends and consumer psychology in almost any sector, not just audio.

Just how good are musicians at tuning their instruments? Not very good, it seems. Or at least, that was what was found out in ‘Evaluating the accuracy of musicians and sound engineers in performing a common drum tuning exercise’, presented by Rob Toulson. But before you start with your favourite drummer joke, note that the participants were all experienced musicians or sound engineers, but not exclusively drummers. So it might be that everyone is bad at drum tuning, whether they’re used to carrying drumsticks around or not.

Matt Cheshire’s ‘Snare Drum Data Set (SDDS): More snare drums than you can shake a stick at’ is worth mentioning just for the title.

Champ Darabundit will present some interesting work on ‘Generalized Digital Second Order Systems Beyond Nyquist Frequency’, showing that the basic filter designs can be tuned to do a lot more than just what is covered in the textbooks. Its interesting and good work, but I have a minor issue with it. The paper only has one reference that isn’t a general overview or tutorial. But there’s lots of good, relevant related work, out there.

I’m involved in only one paper at this convention (shame!). But its well worth checking out. Angeliki Mourgela is presenting ‘Investigation of a Real-Time Hearing Loss Simulation for Audio Production’. It builds on an initial hearing loss simulator she presented at the 147th Convention, but now its higher quality, real-time and available as a VST plugin. This means that audio producers can easily preview what their content would sound like to most listeners with hearing loss.

Masking is an important and very interesting auditory phenomenon. With the emergence of immersive sound, there’s more and more research about spatial masking. But questions come up, like whether artificially panning a source to a location will result in masking the same way as actually placing a source at that location. ‘Spatial auditory masking caused by phantom sound images’, presented by Masayuki Nishiguchi, will show how spatial auditory masking works when sources are placed at virtual locations using rendering techniques.

Technical papers

There’s a double bill presented by Hsein Pew, ‘Sonification of Spectroscopic analysis of food data using FM Synthesis’ and ‘A Sonification Algorithm for Subjective Classification of Food Samples.’ They are unusual papers, but not reallly about classifying food samples. The focus is on the sonification method, which turns data into sounds, allowing listeners to easily discriminate between data collections.

Wow. When I first saw Moorer in the list of presenting authors, I thought ‘what a great coincidence that a presenter has the same last name as one of the great legends in audio engineering. But no, it really is James Moorer. We talked about him before in our blog about the greatest JAES papers of all time. And the abstract for his talk, ‘Audio in the New Millenium – Redux‘, is better than anything I could have written about the paper. He wrote, “In the author’s Heyser lecture in 2000, technological advances from the point of view of digital audio from 1980 to 2000 were summarized then projected 20 years into the future. This paper assesses those projections and comes to the somewhat startling conclusion that entertainment (digital video, digital audio, computer games) has become the driver of technology, displacing military and business forces.”

The paper with the most authors is presented by Lutz Ehrig. And he’ll be presenting a breakthrough, the first ‘Balanced Electrostatic All-Silicon MEMS Speakers’. If you don’t know what that is, you’re not alone. But its worth finding out, because this may be tomorrow’s widespread commercial technology.

If you recorded today, but only using equipment from 1955, would it really sound like a 65 year old recording? Clive Mead will present ‘Composing, Recording and Producing with Historical Equipment and Instrument Models’ which explores just that sort of question. He and his co-authors created and used models to simulate the recording technology and instruments, available at different points in recorded music history.

Degradation effects of water immersion on earbud audio quality,’ presented by Scott Beveridge, sounds at first like it might be very minor work, dipping earbuds in water and then listening to distorted sound from them. But I know a bit about the co-authors. They’re the type to apply rigorous, hardcore science to a problem. And it has practical applications too, since its leading towards methods by which consumers can measure the quality of their earbuds.

Forensic audio is a fascinating field, though most people have only come across it in film and TV shows like CSI, where detectives identify incriminating evidence buried in a very noisy recording. In ‘Forensic Interpretation and Processing of User Generated Audio Recordings’, audio forensics expert Rob Maher looks at how user generated recordings, like when many smartphones record a shooting, can be combined, synchronised and used as evidence.

Mark Waldrep presents a somewhat controversial paper, ‘Native High-Resolution versus Red Book Standard Audio: A Perceptual Discrimination Survey’. He sent out high resolution and CD quality recordings to over 450 participants, asking them to judge which was high resolution. The overall results were little better than guessing. But there were a very large number of questionable decisions in his methodology and interpretation of results. I expect this paper will get the online audiophile community talking for quite some time.

Neural networks are all the rage in machine learning. And for good reason- for many tasks, they outperform all the other methods. There are three neural network papers presented, Tejas Manjunath’s ‘Automatic Classification of Live and Studio Audio Recordings using Convolutional Neural Networks‘, J. T. Colonel’s (who is now part of the team behind this blog) ‘Low Latency Timbre Interpolation and Warping using Autoencoding Neural Networks’ and William Mitchell’s ‘Exploring Quality and Generalizability in Parameterized Neural Audio Effects‘.

The research team here did some unpublished work that seemed to suggest that the mix had only a minimal effect on how people respond to music for untrained listeners, but this became more significant with trained sound engineers and musicians. Kelsey Taylor’s research suggests there’s a lot more to uncover here. In ‘I’m All Ears: What Do Untrained Listeners Perceive in a Raw Mix versus a Refined Mix?’, she performed structured interviews and found that untrained listeners perceive a lot of mixing aspects, but use different terms to describe it.

No loudness measure is perfect. Even the well established ones, like ITU 1770 for broadcast content, or the Glasberg Moore auditory model of loudness perception, see here and, have been noted before. In ‘Using ITU-R BS.1770 to Measure the Loudness of Music versus Dialog-based Content’, Scott Norcross shows another issue with the ITU loudness measure, the difficulty in matching levels for speech and music.

Staying on the subject of loudness, Kazuma Watanabe presents ‘The Reality of The Loudness War in Japan -A Case Study on Japanese Popular Music’. This loudness war, the overuse of dynamic range compression, has resulted in lower quality recordings (and annoyingly loud TV and radio ads). It also led to measures like the ITU standard. Watanabe and co-authors measured the increased loudness over the last 30 years, and make a strong

Remember to check the AES E-Library which has all the full papers for all the presentations mentioned here, including listing all authors not just presenters. And feel free to get in touch with us. Josh Reiss (author of this blog entry), J. T. Colonel, and Angeliki Mourgela from the Audio Engineering research team within the Centre for Digital Music, will all be (virtually) there.

Congratulations, Dr. Marco Martinez Ramirez

Today one of our PhD student researchers, Marco Martinez Ramirez, successfully defended his PhD. The form of these exams, or vivas, varies from country to country, and even institution to institution, which we discussed previously. Here, its pretty gruelling; behind closed doors, with two expert examiners probing every aspect of the PhD. And it was made even more challenging since it was all online due to the virus situation.
Marco’s PhD was on ‘Deep learning for audio effects modeling.’

Audio effects modeling is the process of emulating an audio effect unit and seeks to recreate the sound, behaviour and main perceptual features of an analog reference device. Both digital and analog audio effect units  transform characteristics of the sound source. These transformations can be linear or nonlinear, time-invariant or time-varying and with short-term and long-term memory. Most typical audio effect transformations are based on dynamics, such as compression; tone such as distortion; frequency such as equalization; and time such as artificial reverberation or modulation based audio effects.

Simulation of audio processors is normally done by designing mathematical models of these systems. Its very difficult because it seeks to accurately model all components within the effect unit, which usually contains mechanical elements together with nonlinear and time-varying analog electronics. Most audio effects models are either simplified or optimized for a specific circuit or  effect and cannot be efficiently translated to other effects.

Marco’s thesis explored deep learning architectures for audio processing in the context of audio effects modelling. He investigated deep neural networks as black-box modelling strategies to solve this task, i.e. by using only input-output measurements. He proposed several different DSP-informed deep learning models to emulate each type of audio effect transformations.

Marco then explored the performance of these models when modeling various analog audio effects, and analyzed how the given tasks are accomplished and what the models are actually learning. He investigated virtual analog models of nonlinear effects, such as a tube preamplifier; nonlinear effects with memory, such as a transistor-based limiter; and electromechanical nonlinear time-varying effects, such as a Leslie speaker cabinet and plate and spring reverberators.

Marco showed that the proposed deep learning architectures represent an improvement of the state-of-the-art in black-box modeling of audio effects and the respective directions of future work are given.

His research also led to a new start-up company, TONZ, which build on his machine learning techniques to provide new audio processing interactions for the next generation of musicians and music makers.

Here’s a list of some of Marco’s papers that relate to his PhD research while a member of the Intelligent Sound Engineering team.

Congratulations again, Marco!