Hearing loss simulator – MATLAB Plugin Competition Gold Award Winner

Congratulations to Angeliki Mourgela, winner of the AES Show 2020 Student Competition for developing a MATLAB plugin. The aim of the competition was for students to ‘Design a new kind of audio production VST plugin using MATLAB Software and your wits’.

Hearing loss is a global phenomenon, with almost 500 million people worldwide suffering from it, a number only increasing with an ageing population. Hearing loss can severely impact the daily life of an individual, causing both functional and emotional difficulties and affecting their overall quality of life. Research efforts towards a better understanding of its physical and perceptual characteristics, as well as the development of new and efficient methods for audio enhancement, are an essential endeavour for the future.

Angeliki developed a real-time hearing loss simulator for use in audio production. It builds on a previous simulation, but is now real-time, low latency, and available as a stereo VST audio effect plug-in, with more control and more accurate modelling of hearing loss. It offers the option of customizing the threshold attenuation in each ear to match the listener’s audiogram. It also incorporates additional effects such as spectral smearing, rapid loudness growth and loss of temporal resolution.
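To give a flavour of how just one component of such a simulator might work, here is a minimal MATLAB sketch (not Angeliki’s actual code, which is on the File Exchange) that attenuates octave bands of a signal according to an example audiogram. The file name and audiogram values are placeholders, and the real plugin also models spectral smearing, loudness recruitment and temporal effects, which this ignores.

% Minimal sketch: attenuate octave bands according to an audiogram (dB HL per band).
% Illustrative only -- 'speech.wav' and the audiogram values are placeholders.
[x, fs] = audioread('speech.wav');
x = mean(x, 2);                              % mono for simplicity

fc     = [250 500 1000 2000 4000 8000];      % audiogram frequencies (Hz)
lossdB = [10  15  25   40   55   70];        % example audiogram (dB HL)

y = zeros(size(x));
for k = 1:numel(fc)
    f1 = fc(k) / sqrt(2);                    % approximate octave band edges
    f2 = min(fc(k) * sqrt(2), 0.95*fs/2);
    [b, a] = butter(2, [f1 f2] / (fs/2), 'bandpass');
    band = filter(b, a, x);
    y = y + band * 10^(-lossdB(k)/20);       % apply threshold attenuation per band
end

soundsc([x; y], fs);                         % original followed by the simulation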

In effect, it allows anyone to hear the world as it really sounds to someone with hearing loss. And it means that audio producers can easily preview what their content would sound like to most hearing-impaired listeners.

Here’s a video with Angeliki demonstrating the system.

Her plugin was also used in an episode of the BBC drama Casualty to let the audience hear the world as heard by a character with severe hearing loss.

You can download her code from the MathWorks file exchange and additional code on SoundSoftware.

Full technical details of the work and the research around it (in collaboration with myself and Dr. Trevor Agus of Queen’s University Belfast) were published in:

A. Mourgela, T. Agus and J. D. Reiss, “Investigation of a Real-Time Hearing Loss Simulation for Audio Production,” 149th AES Convention, 2020

Many thanks to the team from MathWorks for sponsoring and hosting the competition, and congratulations to all the other winners of the AES Student Competitions.

Research highlights for the AES Show Fall 2020


#AESShow

We try to write a preview of the technical track for almost all recent Audio Engineering Society (AES) Conventions; see our entries on the 142nd, 143rd, 144th, 145th, 147th and 148th Conventions. Like the 148th Convention, the 149th Convention, or just the AES Show, is an online event. But one challenge with these sorts of online events is that anything not on the main live stream can get overlooked. The technical papers are available on demand, so many people can access them, perhaps more than would attend the presentations in person. But they don’t have the feel of an event.

Hopefully, I can give you some idea of the exciting nature of these technical papers. And they really do present a lot of cutting-edge and adventurous research. They unveil, for the first time, some breakthrough technologies, and both surprising and significant advances in our understanding of audio engineering and related fields.

This time, since all the research papers are available throughout the Convention and beyond, starting Oct. 28th, I haven’t organised them by date. Instead, I’ve divided them into the regular technical papers (usually longer, with more reviewing), and the Engineering Briefs, or E-briefs. The E-briefs are typically smaller, often presenting work-in-progress, late-breaking or just unusual research. Though this time, the unusual appears in the regular papers too.

But first… listening tests. Sooner or later, almost every researcher has to do them. And a good software package will help the whole process run more smoothly. There are two packages presented at the convention. Dale Johnson will present the next generation of a high-quality one in the E-Brief ‘HULTI-GEN Version 2 – A Max-based universal listening test framework’. And Stefan Gorzynski will present the paper ‘A flexible software tool for perceptual evaluation of audio material and VR environments’.

E-Briefs

A must for audio educators is Brett Leonard’s ‘A Survey of Current Music Technology & Recording Arts Curriculum Order’. These sorts of programs are often ‘made up’ based on the experience and knowledge of the people involved. Brett surveyed 35 institutions and analysed the results to establish a holistic framework for the structure of these degree programmes.

The idea of time-stretching as a live phenomenon might seem counterintuitive. For instance, how can you speed up a signal if it’s only just arriving? And if you slow it down, then surely after a while it lags far enough behind that it is no longer ‘live’. A novel solution is explored in Colin Malloy’s ‘An approach for implementing time-stretching as a live realtime audio effect’.

The wonderfully titled ‘A Terribly Good Speaker: Understanding the Yamaha NS-10 Phenomenon,’ is all about how and why a low quality loudspeaker with bad reviews became seen as a ‘must have’ amongst many audio professionals. It looks like this presentation will have lessons for those who study marketing, business trends and consumer psychology in almost any sector, not just audio.

Just how good are musicians at tuning their instruments? Not very good, it seems. Or at least, that was what was found out in ‘Evaluating the accuracy of musicians and sound engineers in performing a common drum tuning exercise’, presented by Rob Toulson. But before you start with your favourite drummer joke, note that the participants were all experienced musicians or sound engineers, but not exclusively drummers. So it might be that everyone is bad at drum tuning, whether they’re used to carrying drumsticks around or not.

Matt Cheshire’s ‘Snare Drum Data Set (SDDS): More snare drums than you can shake a stick at’ is worth mentioning just for the title.

Champ Darabundit will present some interesting work on ‘Generalized Digital Second Order Systems Beyond Nyquist Frequency’, showing that the basic filter designs can be tuned to do a lot more than just what is covered in the textbooks. It’s interesting and good work, but I have a minor issue with it. The paper only has one reference that isn’t a general overview or tutorial. But there’s lots of good, relevant related work out there.

I’m involved in only one paper at this convention (shame!). But it’s well worth checking out. Angeliki Mourgela is presenting ‘Investigation of a Real-Time Hearing Loss Simulation for Audio Production’. It builds on an initial hearing loss simulator she presented at the 147th Convention, but now it’s higher quality, real-time and available as a VST plugin. This means that audio producers can easily preview what their content would sound like to most listeners with hearing loss.

Masking is an important and very interesting auditory phenomenon. With the emergence of immersive sound, there’s more and more research about spatial masking. But questions come up, like whether artificially panning a source to a location will result in masking the same way as actually placing a source at that location. ‘Spatial auditory masking caused by phantom sound images’, presented by Masayuki Nishiguchi, will show how spatial auditory masking works when sources are placed at virtual locations using rendering techniques.

Technical papers

There’s a double bill presented by Hsein Pew: ‘Sonification of Spectroscopic analysis of food data using FM Synthesis’ and ‘A Sonification Algorithm for Subjective Classification of Food Samples.’ They are unusual papers, but not really about classifying food samples. The focus is on the sonification method, which turns data into sounds, allowing listeners to easily discriminate between data collections.

Wow. When I first saw Moorer in the list of presenting authors, I thought ‘what a great coincidence that a presenter has the same last name as one of the great legends in audio engineering.’ But no, it really is James Moorer. We talked about him before in our blog about the greatest JAES papers of all time. And the abstract for his talk, ‘Audio in the New Millennium – Redux’, is better than anything I could have written about the paper. He wrote, “In the author’s Heyser lecture in 2000, technological advances from the point of view of digital audio from 1980 to 2000 were summarized then projected 20 years into the future. This paper assesses those projections and comes to the somewhat startling conclusion that entertainment (digital video, digital audio, computer games) has become the driver of technology, displacing military and business forces.”

The paper with the most authors is presented by Lutz Ehrig. And he’ll be presenting a breakthrough, the first ‘Balanced Electrostatic All-Silicon MEMS Speakers’. If you don’t know what that is, you’re not alone. But it’s worth finding out, because this may be tomorrow’s widespread commercial technology.

If you recorded today, but only using equipment from 1955, would it really sound like a 65-year-old recording? Clive Mead will present ‘Composing, Recording and Producing with Historical Equipment and Instrument Models’, which explores just that sort of question. He and his co-authors created and used models to simulate the recording technology and instruments available at different points in recorded music history.

‘Degradation effects of water immersion on earbud audio quality,’ presented by Scott Beveridge, sounds at first like it might be very minor work, dipping earbuds in water and then listening to distorted sound from them. But I know a bit about the co-authors. They’re the type to apply rigorous, hardcore science to a problem. And it has practical applications too, since it’s leading towards methods by which consumers can measure the quality of their earbuds.

Forensic audio is a fascinating field, though most people have only come across it in film and TV shows like CSI, where detectives identify incriminating evidence buried in a very noisy recording. In ‘Forensic Interpretation and Processing of User Generated Audio Recordings’, audio forensics expert Rob Maher looks at how user generated recordings, like when many smartphones record a shooting, can be combined, synchronised and used as evidence.

Mark Waldrep presents a somewhat controversial paper, ‘Native High-Resolution versus Red Book Standard Audio: A Perceptual Discrimination Survey’. He sent out high resolution and CD quality recordings to over 450 participants, asking them to judge which was high resolution. The overall results were little better than guessing. But there were a very large number of questionable decisions in his methodology and interpretation of results. I expect this paper will get the online audiophile community talking for quite some time.

Neural networks are all the rage in machine learning. And for good reason: for many tasks, they outperform all the other methods. There are three neural network papers presented: Tejas Manjunath’s ‘Automatic Classification of Live and Studio Audio Recordings using Convolutional Neural Networks’, J. T. Colonel’s (who is now part of the team behind this blog) ‘Low Latency Timbre Interpolation and Warping using Autoencoding Neural Networks’ and William Mitchell’s ‘Exploring Quality and Generalizability in Parameterized Neural Audio Effects’.

The research team here did some unpublished work that seemed to suggest that, for untrained listeners, the mix had only a minimal effect on how people respond to music, but the effect became more significant for trained sound engineers and musicians. Kelsey Taylor’s research suggests there’s a lot more to uncover here. In ‘I’m All Ears: What Do Untrained Listeners Perceive in a Raw Mix versus a Refined Mix?’, she performed structured interviews and found that untrained listeners perceive a lot of mixing aspects, but use different terms to describe them.

No loudness measure is perfect. Even the well-established ones, like ITU-R BS.1770 for broadcast content or the Glasberg & Moore auditory model of loudness perception, have issues, as has been noted before (see http://www.aes.org/e-lib/browse.cfm?elib=16608 and http://www.aes.org/e-lib/browse.cfm?elib=17098). In ‘Using ITU-R BS.1770 to Measure the Loudness of Music versus Dialog-based Content’, Scott Norcross shows another issue with the ITU loudness measure, the difficulty in matching levels for speech and music.

Staying on the subject of loudness, Kazuma Watanabe presents ‘The Reality of The Loudness War in Japan – A Case Study on Japanese Popular Music’. This loudness war, the overuse of dynamic range compression, has resulted in lower quality recordings (and annoyingly loud TV and radio ads). It also led to measures like the ITU standard. Watanabe and co-authors measured the increase in loudness over the last 30 years, and make a strong case that the loudness war has been a reality in Japanese popular music too.

Remember to check the AES E-Library which has all the full papers for all the presentations mentioned here, including listing all authors not just presenters. And feel free to get in touch with us. Josh Reiss (author of this blog entry), J. T. Colonel, and Angeliki Mourgela from the Audio Engineering research team within the Centre for Digital Music, will all be (virtually) there.

Congratulations, Dr. Marco Martinez Ramirez

Today one of our PhD student researchers, Marco Martinez Ramirez, successfully defended his PhD. The form of these exams, or vivas, varies from country to country, and even institution to institution, which we discussed previously. Here, it’s pretty gruelling: behind closed doors, with two expert examiners probing every aspect of the PhD. And it was made even more challenging since it was all online due to the virus situation.
Marco’s PhD was on ‘Deep learning for audio effects modeling.’

Audio effects modeling is the process of emulating an audio effect unit, and seeks to recreate the sound, behaviour and main perceptual features of an analog reference device. Both digital and analog audio effect units transform characteristics of the sound source. These transformations can be linear or nonlinear, time-invariant or time-varying, and with short-term or long-term memory. The most typical audio effect transformations are based on dynamics, such as compression; tone, such as distortion; frequency, such as equalization; and time, such as artificial reverberation or modulation-based audio effects.
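As a toy illustration of two of these transformation classes (my own example, not from the thesis), the MATLAB snippet below applies a memoryless nonlinearity (distortion) and a linear time-varying modulation (tremolo) to a test tone.

% Illustrative examples of two transformation classes mentioned above.
fs = 44100;
t  = (0:fs-1)'/fs;
x  = 0.5*sin(2*pi*220*t);            % a simple test tone

% Nonlinear, memoryless (tone/distortion): waveshaping with tanh
drive  = 8;
y_dist = tanh(drive * x) / tanh(drive);

% Linear, time-varying (modulation): tremolo with a 5 Hz LFO
lfo    = 0.5 * (1 + sin(2*pi*5*t));
y_trem = x .* lfo;

% Nonlinear effects with memory couple a nonlinearity to internal state,
% e.g. the envelope follower inside a compressor or a tube amp's feedback path.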

Simulation of audio processors is normally done by designing mathematical models of these systems. It’s very difficult, because it seeks to accurately model all components within the effect unit, which usually contains mechanical elements together with nonlinear and time-varying analog electronics. Most audio effects models are either simplified or optimized for a specific circuit or effect, and cannot be efficiently translated to other effects.

Marco’s thesis explored deep learning architectures for audio processing in the context of audio effects modelling. He investigated deep neural networks as black-box modelling strategies to solve this task, i.e. using only input-output measurements. He proposed several different DSP-informed deep learning models to emulate each type of audio effect transformation.
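To make the black-box idea concrete, here is a toy MATLAB sketch (requiring the Deep Learning Toolbox) that learns a simple tanh distortion purely from paired input and output frames. It is only meant to illustrate the training setup; Marco’s models are far more elaborate, DSP-informed architectures applied to real devices.

% Toy black-box modelling sketch: learn an effect from input/output pairs alone.
N = 200; T = 1024;
X = cell(N,1); Y = cell(N,1);
for n = 1:N
    x    = 0.5*randn(1, T);              % input fed to the "device"
    X{n} = x;
    Y{n} = tanh(4*x);                    % measured output of the "device"
end

layers = [
    sequenceInputLayer(1)
    lstmLayer(32)
    fullyConnectedLayer(1)
    regressionLayer];

options = trainingOptions('adam', 'MaxEpochs', 10, ...
    'MiniBatchSize', 16, 'Verbose', false);

net   = trainNetwork(X, Y, layers, options);     % sequence-to-sequence regression
xtest = 0.5*randn(1, T);
yhat  = predict(net, {xtest});                   % compare yhat{1} to tanh(4*xtest)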

Marco then explored the performance of these models when modeling various analog audio effects, and analyzed how the given tasks are accomplished and what the models are actually learning. He investigated virtual analog models of nonlinear effects, such as a tube preamplifier; nonlinear effects with memory, such as a transistor-based limiter; and electromechanical nonlinear time-varying effects, such as a Leslie speaker cabinet and plate and spring reverberators.

Marco showed that the proposed deep learning architectures represent an improvement on the state of the art in black-box modeling of audio effects, and his thesis sets out the respective directions for future work.

His research also led to a new start-up company, TONZ, which builds on his machine learning techniques to provide new audio processing interactions for the next generation of musicians and music makers.

Here’s a list of some of Marco’s papers that relate to his PhD research while a member of the Intelligent Sound Engineering team.

Congratulations again, Marco!

We want you to take part in a listening test

Hi everyone,

As you may know, we do lots of listening tests (audio evaluation experiments) to evaluate our research, or to gather data about perception and preference that we use in the systems we create.

We would like to invite you to participate in a study on the perception of sound effects.

Participation just requires a computer with audio output, a reliable internet connection and an internet browser (it definitely works on Chrome, and should work on most other browsers). The experiment shouldn’t take more than half an hour at most. You don’t have to be an expert to take part. And I promise it will be interesting!

You can access the experiment by going to http://webprojects.eecs.qmul.ac.uk/ee08m037/WebAudioEvaluationTool/test.html?url=FXiveTest.xml

And if you’re curious, the listening test was created using the Web Audio Evaluation Tool, https://github.com/BrechtDeMan/WebAudioEvaluationTool [1,2].

Please direct any questions to joshua.reiss@qmul.ac.uk , r.selfridge@qmul.ac.uk or h.e.tez@qmul.ac.uk .

Once the experiment is finished, I’ll share the results in another blog entry.

Thanks!

[1] N. Jillings, D. Moffat, B. De Man, J. D. Reiss, R. Stables, ‘Web Audio Evaluation Tool: A framework for subjective assessment of audio,’ 2nd Web Audio Conf., Atlanta, 2016

[2] N. Jillings, B. De Man, D. Moffat and J. D. Reiss, ‘Web Audio Evaluation Tool: A Browser-Based Listening Test Environment,’ Sound and Music Computing (SMC), July 26 – Aug. 1, 2015

AES President-Elect-Elect!

“Anyone who is capable of getting themselves made President should on no account be allowed to do the job.” ― Douglas Adams, The Hitchhiker’s Guide to the Galaxy

So I’m sure you’ve all been waiting for this presidential election to end. No not that one. I’m referring to the Audio Engineering Society (AES)’s recent elections for their Board of Directors and Board of Governors.

And I’m very pleased and honored that I (that’s Josh Reiss, the main author of this blog) have been elected as President.

It’s actually three positions: in 2021 I’ll be President-Elect, in 2022 President, and in 2023 Past-President. Another way to look at it is that the AES always has three presidents: one planning for the future, one getting things done, and one imparting their experience and knowledge.

For those who don’t know, the AES is the largest professional society in audio engineering and related fields. It has over 12,000 members, and is the only professional society devoted exclusively to audio technology. It was founded in 1948 and has grown to become an international organisation that unites audio engineers, creative artists, scientists and students worldwide by promoting advances in audio and disseminating new knowledge and research.

My thanks to everyone who voted, to the AES in general, and to everyone who has said congratulations. And a big congratulations to all the other elected officers.

AES 148 – A Digital Vienna

Written jointly by Aggela Mourgela and JT Colonel

#VirtualVienna

The AES hosted its 148th international conference virtually this year. Despite the circumstances we find ourselves in due to covid-19, the conference put up an excellent program filled with informative talks, tutorials, and demonstrations. Below is a round-up of our favourite presentations, which run the gamut from incredibly technical talks regarding finite arithmetic systems to highly creative demonstrations of an augmented reality installation.

Tuesday

The first session on Tuesday morning, Active Sensing and Slow Listening, was held by Thomas Lund & Susan E. Rogers, discussing the principles of active sensing and slow listening as well as their role in pro audio product development. Lund kicked the session off by introducing the theory behind sound cognition and discussing the afferent and efferent functions of the brain with regards to sound perception. The session was then picked up by Rogers, who described the auditory pathway and its bidirectionality in more detail, presenting the parts of the brain engaged in sonic cognition. Rogers touched on the subjects of proprioception (the awareness of our bodies) and interoception (the awareness of our feelings), as well as the role of expectation when studying our responses to sound. To conclude, both presenters pointed out that we should not treat listening as passive or uni-dynamic: both external and internal factors influence the way we hear.

Diagram showing the development of the tympanic ear across different geologic eras discussed in the Active Sensing and Slow Listening demonstration

Later in the day, Jamie Angus presented on Audio Signal Processing in the Real World: Dealing with the Effects of Finite Precision. At the center of the talk was a fundamental question: how does finite precision affect audio processing? Angus went into full detail on different finite-precision arithmetics, e.g. fractional and floating-point, and derived how the noise introduced by these systems impacts filter design.

The 3rd MATLAB Student Design Competition was hosted by Gabriele Bunkheila. Using the example of a stereo width expander, Bunkheila demonstrated the process of turning a simple offline MATLAB script into a real-time audioPlugin class, using MATLAB’s built-in audio test benching app. He then proceeded to talk about C++ code generation, validation and export of the code into a VST plugin format, for use in a conventional digital audio workstation. Bunkheila also demonstrated simple GUI generation using MATLAB’s audioPluginInterface functionality.
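For readers who haven’t seen the workflow, a stereo width expander written as an audioPlugin class looks something like the sketch below. This is my own minimal example, not Bunkheila’s demo code, and ‘StereoWidener’ is just a name I picked.

% Save as StereoWidener.m -- a minimal real-time stereo width expander plugin.
classdef StereoWidener < audioPlugin
    properties
        Width = 1;                        % 1 = unchanged, >1 = wider
    end
    properties (Constant)
        PluginInterface = audioPluginInterface( ...
            audioPluginParameter('Width', 'Mapping', {'lin', 0, 2}));
    end
    methods
        function out = process(plugin, in)
            % Mid/side processing: scale the side signal by the Width parameter
            mid  = 0.5 * (in(:,1) + in(:,2));
            side = 0.5 * (in(:,1) - in(:,2)) * plugin.Width;
            out  = [mid + side, mid - side];
        end
    end
end

From a class like this, the Audio Test Bench app runs and tunes the plugin in real time, validateAudioPlugin checks it, and generateAudioPlugin exports it as a VST for use in a DAW.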

Wednesday

On Wednesday, Thomas Lund and Hyunkook Lee discussed the shift from stereo to immersive multi-channel audio in their talk Goodbye Stereo. First, Lund discussed the basics of spatial perception, the limitations of stereo in audio recording and reproduction, frequency-related aspects of spatial audio, and the standards being implemented in immersive audio. Lee went on to discuss the psychoacoustic principles that apply to immersive audio as well as the differences between stereo and 3D. He expanded on limitations arising from microphones due to placement or internal characteristics, and proceeded to discuss microphone array configurations that his research group is working on. The presentation was followed by a set of truly impressive immersive recordings, made in various venues with different microphone configurations, and the audience was prompted to use headphones to experience them. Lee finished by introducing a database of 3D recordings, which will include room impulse responses and be available for spatial audio research.

In his talk The Secret Life of Low Frequencies, Bruce Black discussed the trials and tribulations of acoustically treating rooms while paying special attention to their low-frequency response. Black discussed the particle propagation and wave propagation models of sound transmission, and how they each require different treatments. He called specific attention to how the attenuation across low frequencies of a sound can change over the course of 200-400 ms within a room. Black went on to show how Helmholtz resonators can be strategically placed in a space to smooth these uneven attenuations.

Marisa Hoeschele gave a very interesting keynote lecture on Audio from a Biological Perspective. Hoeschele began by discussing the concept of addressing human sounds from the perspective of the ‘visiting alien’, where humans are studied as yet another species on the planet. Hoeschele discussed observations on shared emotional information and how we can identify sonic attributes corresponding to stress level or excitement across species. She then proceeded to discuss the ways in which we can study musicality as an innate human characteristic, as well as commonalities across cultures. Hoeschele then discussed ways in which other animals can inform us about musicality, giving examples of experiments on animals’ ability to respond to musical attributes like octave equivalence, as well as searching for correlations with human behavior.

Thursday

On Thursday, Brian Gibbs gave a workshop on spatial audio mixing, using a demo mix of Queen’s Bohemian Rhapsody. He began his presentation with a short discussion of the basics of spatial audio, presenting the concepts of recording spatial audio from scratch and of spatialising audio recordings in the studio. Gibbs also talked about the AmbiX and FuMa renderers, and discussed higher-order ambisonics and the MPEG-H format. He then proceeded to introduce the importance of loudness, giving a brief talk and demonstration of using LUFS in metering. Finally, he discussed the importance of being aware of the platform or format your work is going to end up on, with emphasis on different streaming services and devices and their requirements. He ended the workshop with a listening session, where he presented Bohemian Rhapsody mixed alternately in mono, stereo and static 360 audio for his audience.

Later, Thomas Aichinger presented Immersive Storytelling: Narrative Aspects in AR Audio Applications. Meant to be an AR installation in Vienna coinciding with the conference, “SONIC TRACES” is a piece where participants navigate an audio-based story set in the Heldenplatz; Aichinger outlined its development and implementation. He described the difficulties his team overcame, such as GPS tracking of users in the plaza, incorporating six degrees of freedom in motion tracking, and tweaking signal attenuation to suit the narrative.

Thomas Aichinger’s rendering of the Heldenplatz in Unity, which is the game engine he and his team used to construct the AR experience

Friday

On the final day of the conference, Gabriele Bunkheila gave a keynote speech on Deep Learning for Audio Applications – Engineering Best Practices for Data. While deep learning applications are most frequently implemented in Python, Bunkheila made a compelling case for using MATLAB. He pointed out the discrepancies between deep learning approaches in academia and in industry: namely, that academia focuses primarily on theorizing and developing new models, whereas industry devotes much more time to dataset construction and scrubbing. Moreover, he mentioned how deep learning for audio should not necessarily follow the best practices laid out by the image recognition community. For example, when applying noise to an audio dataset, one ought to simulate the environment in which one expects to deploy the deep learning system. So, for audio applications it makes much more sense to obscure your signal by applying reverb or adding speech, rather than Gaussian noise.
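A rough MATLAB sketch of that kind of audio-specific augmentation is below; the file names are placeholders, and the clean recording, impulse response and background speech are assumed to share a sample rate and be long enough.

% Corrupt training audio with reverb and competing speech rather than Gaussian noise.
[x, fs] = audioread('clean_example.wav');      x = mean(x, 2);
[ir, ~] = audioread('room_impulse_response.wav');
[sp, ~] = audioread('background_speech.wav');

% "Reverb" corruption: convolve with a measured room impulse response
x_reverb = conv(x, ir(:,1));
x_reverb = x_reverb(1:numel(x));

% "Competing talker" corruption at roughly 10 dB SNR
sp     = sp(1:numel(x), 1);
snr_dB = 10;
gain   = sqrt(mean(x.^2)) / (sqrt(mean(sp.^2)) * 10^(snr_dB/20));
x_speech = x + gain * sp;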

In Closing

Though this conference looked nothing like what anyone expected at the beginning of this year, the AES demonstrated its ability to adapt to new paradigms and technologies. Here’s to hoping that the AES will be able to resume in-person conferences soon. In the meantime, the AES will continue its strong tradition of providing a platform for groundbreaking audio technologies and educational opportunities virtually.

Venturous Views on Virtual Vienna – a preview of AES 148

#VirtualVienna

We try to write a preview of the technical track for almost all recent Audio Engineering Society (AES) Conventions, see our entries on the 142nd, 143rd, 144th, 145th and 147th Conventions. But this 148th Convention is very different.

It is, of course, an online event. The Convention planning committee have put huge effort into putting it all online and making it a really engaging and exciting experience (and into massively reducing costs). There will be a mix of live streams, breakout sessions, interactive chat rooms and so on. But the technical papers will mostly be on-demand viewing, with Q&A and online dialog with the authors. This is great in the sense that you can view them and interact with the authors at any time, but it means that it’s easy to overlook really interesting work.

So we’ve gathered together some information about a lot of the presented research that caught our eye as being unusual, exceptionally high quality, or just worth mentioning. And every paper mentioned here will appear soon in the AES E-Library, by the way. Currently though, you can browse all the abstracts by searching the full papers and engineering briefs on the Convention website.

Deep learning and neural networks are all the rage in machine learning nowadays. A few contributions to the field will be presented by Eugenio Donati with ‘Prediction of hearing loss through application of Deep Neural Network’, Simon Plain with ‘Pruning of an Audio Enhancing Deep Generative Neural Network’, Giovanni Pepe’s presentation of ‘Generative Adversarial Networks for Audio Equalization: an evaluation study’, Yiwen Wang presenting ‘Direction of arrival estimation based on transfer function learning using autoencoder network’, and the author of this post, Josh Reiss, who will present work done mainly by sound designer/researcher Guillermo Peters, ‘A deep learning approach to sound classification for film audio post-production’. Related to this, check out the Workshop on ‘Deep Learning for Audio Applications – Engineering Best Practices for Data’, run by Gabriele Bunkheila of MathWorks (MATLAB), which will be live-streamed on Friday.

There’s enough work being presented on spatial audio that there could be a whole conference on the subject within the convention. A lot of that is in Keynotes, Workshops, Tutorials, and the Heyser Memorial Lecture by Francis Rumsey. But a few papers in the area really stood out for me. Toru Kamekawa investigated a big question with ‘Are full-range loudspeakers necessary for the top layer of 3D audio?’ Marcel Nophut’s ‘Multichannel Acoustic Echo Cancellation for Ambisonics-based Immersive Distributed Performances’ has me intrigued because I know a bit about echo cancellation and a bit about ambisonics, but have no idea how to do the former for the latter.

And I’m intrigued by ‘Creating virtual height loudspeakers using VHAP’, presented by Kacper Borzym. I’ve never heard of VHAP, but the original VBAP paper is the most highly cited paper in the Journal of the AES (1367 citations at the time of writing this).

How good are you at understanding speech from native speakers? How about when there’s a lot of noise in the background? Do you think you’re as good as a computer? Gain some insight into related research when viewing the presentation by Eugenio Donati on ‘Comparing speech identification under degraded acoustic conditions between native and non-native English speakers’.

There’s a few papers exploring creative works, all of which look interesting and have great titles. David Poirier-Quinot will present ‘Emily’s World: behind the scenes of a binaural synthesis production’. Music technology has a fascinating history. Michael J. Murphy will explore the beginning of a revolution with ‘Reimagining Robb: The Sound of the World’s First Sample-based Electronic Musical Instrument circa 1927’. And if you’re into Scandinavian instrumental rock music (and who isn’t?), Zachary Bresler’s presentation of ‘Music and Space: A case of live immersive music performance with the Norwegian post-rock band Spurv’ is a must.


Frank Morse Robb, inventor of the first sample-based electronic musical instrument.

But sound creation comes first, and new technologies are emerging to do it. Damian T. Dziwis will present ‘Body-controlled sound field manipulation as a performance practice’. And particularly relevant given the worldwide isolation going on is ‘Quality of Musicians’ Experience in Network Music Performance: A Subjective Evaluation,’ presented by Konstantinos Tsioutas.

Portraiture looks at how to represent or capture the essence and rich details of a person. Maree Sheehan explores how this is achieved sonically, focusing on Maori women, in an intriguing presentation on ‘Audio portraiture sound design- the development and creation of audio portraiture within immersive and binaural audio environments.’

We talked about exciting research on metamaterials for headphones and loudspeakers when giving previews of previous AES Conventions, and there’s another development in this area presented by Sebastien Degraeve in ‘Metamaterial Absorber for Loudspeaker Enclosures’

Paul Ferguson and colleagues look set to break some speed records, but any such feats require careful testing first, as in ‘Trans-Europe Express Audio: testing 1000 mile low-latency uncompressed audio between Edinburgh and Berlin using GPS-derived word clock’

Our own research has focused a lot on intelligent music production, and especially automatic mixing. A novel contribution to the field, and a fresh perspective, is given in Nyssim Lefford’s presentation of ‘Mixing with Intelligent Mixing Systems: Evolving Practices and Lessons from Computer Assisted Design’.

Subjective evaluation, usually in the form of listening tests, is the primary form of testing audio engineering theory and technology. As Feynman said, ‘if it disagrees with experiment, it’s wrong!’

And thus, there are quite a few top-notch research presentations focused on experiments with listeners. Minh Voong looks at an interesting aspect of bone conduction with ‘Influence of individual HRTF preference on localization accuracy – a comparison between regular and bone conducting headphones’. Realistic reverb in games is incredibly challenging because characters are always moving, so Zoran Cvetkovic tackles this with ‘Perceptual Evaluation of Artificial Reverberation Methods for Computer Games.’ The abstract for Lawrence Pardoe’s ‘Investigating user interface preferences for controlling background-foreground balance on connected TVs’ suggests that there’s more than one answer to that preference question. That highlights the need for looking deep into any data, and not just considering the mean and standard deviation, which often leads to Simpson’s Paradox. And finally, Peter Critchell will present ‘A new approach to predicting listener’s preference based on acoustical parameters,’ which addresses the need to accurately simulate and understand listening test results.

There are some talks about really rigorous signal processing approaches. Jens Ahren will present ‘Tutorial on Scaling of the Discrete Fourier Transform and the Implied Physical Units of the Spectra of Time-Discrete Signals.’ I’m excited about this because it may shed some light on a possible explanation for why we hear a difference between CD quality and very high sample rate audio formats.

The Constant-Q Transform represents a signal in the frequency domain, but with logarithmically spaced bins, so it is potentially very useful for audio. The last decade has seen a couple of breakthroughs that may make it far more practical. I was sitting next to Gino Velasco when he won the “best student paper” award for Velasco et al.’s “Constructing an invertible constant-Q transform with nonstationary Gabor frames.” Schörkhuber and Klapuri also made excellent contributions, mainly around implementing a fast version of the transform, culminating in a JAES paper, and the teams collaborated on a popular MATLAB toolbox. Now there’s another advance with Felix Holzmüller presenting ‘Computational efficient real-time capable constant-Q spectrum analyzer’.
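The ‘constant Q’ in the name just means that every bin has the same ratio of centre frequency to bandwidth. A few lines of MATLAB show how the log-spaced bins are laid out (the parameter values here are arbitrary):

% Logarithmically spaced bins: B bins per octave, each with the same Q.
fmin = 55;                    % lowest analysis frequency (Hz)
B    = 24;                    % bins per octave
K    = 6 * B;                 % six octaves of bins
k    = 0:K-1;
fk   = fmin * 2.^(k / B);     % bin centre frequencies
Q    = 1 / (2^(1/B) - 1);     % constant quality factor
bw   = fk / Q;                % bandwidth grows in proportion to frequency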

The abstract for Dan Turner’s ‘Content matching for sound generating objects within a visual scene using a computer vision approach’ suggests that it has implications for selection of sound effect samples in immersive sound design. But I’m a big fan of procedural audio, and think this could have even higher potential for sound synthesis and generative audio systems.

And finally, there’s some really interesting talks about innovative ways to conduct audio research based on practical challenges. Nils Meyer-Kahlen presents ‘DIY Modifications for Acoustically Transparent Headphones’. The abstract for Valerian Drack’s ‘A personal, 3D printable compact spherical loudspeaker array’, also mentions its use in a DIY approach. Joan La Roda’s own experience of festival shows led to his presentation of ‘Barrier Effect at Open-air Concerts, Part 1’. Another presentation with deep insights derived from personal experience is Fabio Kaiser’s ‘Working with room acoustics as a sound engineer using active acoustics.’ And the lecturers amongst us will be very interested in Sebastian Duran’s ‘Impact of room acoustics on perceived vocal fatigue of staff-members in Higher-education environments: a pilot study.’

Remember to check the AES E-Library which will soon have all the full papers for all the presentations mentioned here, including listing all authors not just presenters. And feel free to get in touch with us. Josh Reiss (author of this blog entry), J. T. Colonel, and Angeliki Mourgela from the Audio Engineering research team within the Centre for Digital Music, will all be (virtually) there.

Awesome student projects in sound design and audio effects

I teach classes in Sound Design and Digital Audio Effects. In both classes, the final assignment involves creating an original work that involves audio programming and using concepts taught in class. But the students also have a lot of free rein to experiment and explore their own ideas. The results are always great: lots of really cool ideas, many of which could lead to a publication, or would be great to listen to regardless of the fact that they were assignments.

The last couple of years, I posted about it here and here. Here are a few of the projects this year.

From the Sound Design class:

  • A procedural audio model of a waterfall. The code was small, involving some filtered noise sources with random gain changes, but the result was great (see the sketch after this list).
  • An interactive animation of a girl writing at a desk during a storm. There were some really neat tricks to get a realistic thunder sound.
  • A procedurally generated sound scene for a walk through the countryside. The student found lots of clever ways to generate the sounds of birds, bees, a river and the whoosh of a passing car.
  • New sound design replacing the audio track in a film scene. Check it out.
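Here is a minimal MATLAB sketch in the spirit of that waterfall model, my guess at the approach rather than the student’s code: broadband noise split into a few bands, each with a slowly varying random gain.

% Waterfall-style procedural audio: filtered noise with random gain trajectories.
fs = 44100; dur = 8;
n  = randn(dur*fs, 1);                         % broadband noise source

bands = [300 1200; 1000 4000; 3000 9000];      % rough spectral regions (Hz)
y = zeros(size(n));
for k = 1:size(bands, 1)
    [b, a] = butter(2, bands(k,:)/(fs/2), 'bandpass');
    src = filter(b, a, n);
    % random gain trajectory, updated 4 times a second and then smoothed
    g = repelem(0.5 + 0.5*rand(ceil(dur*4), 1), fs/4);
    g = filter(ones(2000,1)/2000, 1, g(1:numel(src)));
    y = y + g .* src;
end
soundsc(y, fs);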

And from the Digital Audio Effects class:

  • I don’t need to mention anything about the next one. Just read the student’s tweet.

 

  • Rainmaker, a VST plugin that takes an incoming signal and transforms it into a ‘rain’ like sound, starting above the listener and then floating down below.

  • A plugin implementation of the Karplus-Strong algorithm, except an audio sample is used to excite the string instead of a noise burst. It gives really interesting timbral qualities (see the sketch after this list).

  • Stormify, an audio plugin that enables users to add varying levels of rain and wind to the background of their audio, making it appear that the recording took place in inclement weather.
  • An all-in-one plugin for synthesising and sculpting drum-like sounds.
  • The Binaural Phase Vocoder, a VST/AU plugin whereby users can position a virtual sound source in a 3D space and process the sound through an overlap-add phase vocoder.
  • A multiband multi-effect consisting of three frequency bands and three effects on each band: delay, distortion, and tremolo. Despite the seeming complexity, the interface was straightforward and easy to use.
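For the Karplus-Strong project mentioned above, the idea can be sketched in a few lines of MATLAB (my own rough version, not the student’s plugin); ‘excite.wav’ is a placeholder for whatever sample excites the string.

% Karplus-Strong with a sampled excitation instead of a noise burst.
[ex, fs] = audioread('excite.wav');
f0  = 110;                               % target pitch (Hz)
N   = round(fs / f0);                    % delay-line length
exc = mean(ex(1:min(N, end), :), 2);     % first period of the sample as excitation
exc(end+1:N) = 0;                        % zero-pad to one period if needed

dur = 2;                                 % seconds
y   = zeros(round(dur*fs), 1);
y(1:N) = exc;
for n = N+2:numel(y)
    % averaging loss filter over the delay line, with slight extra damping
    y(n) = 0.996 * 0.5 * (y(n-N) + y(n-N-1));
end
soundsc(y, fs);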


There were many other interesting assignments, including several sonifications of images. But this selection really shows both the talent of the students and the possibilities to create new and interesting sounds.

Funded PhD studentships available in Data-informed Audience-centric Media Engineering

So it’s been a while since I’ve written a blog post. Life, work and, of course, the Covid crisis have made my time limited. But hopefully I’ll write more frequently in future.

The good news is that there are fully funded PhD studentships which you or others you know might be interested in. They are all around the concept of Data-informed Audience-centric Media Engineering (DAME). See https://dame.qmul.ac.uk/ for details.

Three studentships are available. They are all fully-funded, for four years of study, based at Queen Mary University of London, and starting January 2021. Two of the proposed topics, ‘Media engineering for hearing-impaired audiences’ and ‘Intelligent systems for radio drama production’, are supported by BBC and build on prior and ongoing work by my research team.

  • Media engineering for hearing-impaired audiences: This research proposes the exploration of ways in which media content can be automatically processed to deliver the content optimally for audiences with hearing loss. It builds on prior work by our group and our collaborator, the BBC, on the development of effective audio mixing techniques for broadcast audio enhancement [1,2,3]. It will form a deeper understanding of the effects of hearing loss on the perception and enjoyment of media content, and utilize this knowledge towards the development of intelligent audio production techniques and applications that could improve audio quality by providing efficient and customisable compensation. It aims to advance beyond current research [4], which does not yet fully take into account the artistic intent of the material, and requires an ‘ideal mix’ for normal-hearing listeners. So a new approach that both removes constraints and is more focused on the meaning of the content is required. This approach will be derived from natural language processing and audio informatics, to prioritise sources and establish requirements for the preferred mix.
  • Intelligent systems for radio drama production: This research topic proposes methods for assisting a human creator in producing radio dramas. Radio drama consists of both literary aspects, such as plot, story characters and environments, as well as production aspects, such as speech, music and sound effects. This project builds on recent, high-impact collaboration with the BBC [3,5] to greatly advance the understanding of radio drama production, with the goal of devising and assessing intelligent technologies to aid in its creation. The project will first be concerned with investigating rules-based systems for generating production scripts from story outlines, and producing draft content from such scripts. It will consider existing workflows for content production and where such approaches rely on heavy manual labour. Evaluation will be with expert content producers, with the goal of creating new technologies that streamline workflows and facilitate the creative process.

If you or anyone you know is interested, please look at https://dame.qmul.ac.uk/ . Consider applying and feel free to ask me any questions.

[1] A. Mourgela, T. Agus and J. D. Reiss, “Perceptually Motivated Hearing Loss Simulation for Audio Mixing Reference,” 147th AES Convention, 2019.

[2] L. Ward et al., “Casualty Accessible and Enhanced (A&E) Audio: Trialling Object-Based Accessible TV Audio,” 147th AES Convention, 2019.

[3] E. T. Chourdakis, L. Ward, M. Paradis and J. D. Reiss, “Modelling Experts’ Decisions on Assigning Narrative Importances of Objects in a Radio Drama Mix,” Digital Audio Effects Conference (DAFx), 2019.

[4] L. Ward and B. Shirley, “Personalization in object-based audio for accessibility: a review of advancements for hearing impaired listeners,” Journal of the Audio Engineering Society, 67 (7/8), 584-597, 2019.

[5] E. T. Chourdakis and J. D. Reiss, ‘From my pen to your ears: automatic production of radio plays from unstructured story text,’ 15th Sound and Music Computing Conference (SMC), Limassol, Cyprus, 4-7 July, 2018

Intelligent Music Production book is published


Ryan Stables is an occasional collaborator and all-around brilliant person. He started the annual Workshop on Intelligent Music Production (WIMP) in 2015. It’s been going strong ever since, with the 5th WIMP co-located with DAFx this past September. The workshop series focuses on the application of intelligent systems (including expert systems, machine learning and AI) to music recording, mixing, mastering and related aspects of audio production or sound engineering.

Ryan had the idea for a book about the subject, and myself (Josh Reiss) and Brecht De Man (another all-around brilliant person) were recruited as co-authors. What resulted was a massive amount of writing, editing, refining, re-editing and so on. We all contributed big chunks of content, but Brecht pulled it all together and turned it into something really high quality, giving a comprehensive overview of the field suitable for a wide range of audiences.

And the book is finally published today, October 31st! It’s part of the AES Presents series by Focal Press, a division of Routledge. You can get it from the publisher, from Amazon or any of the other usual places.

And here’s the official blurb

Intelligent Music Production presents the state of the art in approaches, methodologies and systems from the emerging field of automation in music mixing and mastering. This book collects the relevant works in the domain of innovation in music production, and orders them in a way that outlines the way forward: first, covering our knowledge of the music production processes; then by reviewing the methodologies in classification, data collection and perceptual evaluation; and finally by presenting recent advances on introducing intelligence in audio effects, sound engineering processes and music production interfaces.

Intelligent Music Production is a comprehensive guide, providing an introductory read for beginners, as well as a crucial reference point for experienced researchers, producers, engineers and developers.