The State of the Art in Procedural Audio

I am pleased to announce that we have just published a large review article on procedural audio.

Procedural audio refers to the real-time generation of sounds that can adapt to changing input parameters. In its pure form, no recorded samples are stored; sounds are generated from scratch. This has huge benefits for game audio and VR, since very little memory is required, and sound generation can be controlled by the game or virtual environment. We have talked about procedural audio a few times before on this blog.
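To give a flavour of what that means in practice, here's a minimal sketch of my own (a toy example, not something taken from the review): a wind-like sound built from filtered noise using the Web Audio API, where a single 'wind speed' parameter reshapes the sound while it plays.

```typescript
// Toy procedural audio example: wind-like sound from filtered white noise,
// steered in real time by a single parameter. Values are illustrative only.
const ctx = new AudioContext(); // in a browser, create this after a user gesture

// A few seconds of white noise in a buffer, looped as the raw source.
const noiseBuffer = ctx.createBuffer(1, 2 * ctx.sampleRate, ctx.sampleRate);
const data = noiseBuffer.getChannelData(0);
for (let i = 0; i < data.length; i++) data[i] = Math.random() * 2 - 1;

const noise = ctx.createBufferSource();
noise.buffer = noiseBuffer;
noise.loop = true;

// A band-pass filter and a gain shape the noise into something wind-like.
const filter = ctx.createBiquadFilter();
filter.type = "bandpass";
const gain = ctx.createGain();

noise.connect(filter).connect(gain).connect(ctx.destination);
noise.start();

// One control parameter: higher 'wind speed' means brighter and louder.
function setWindSpeed(speed: number) { // speed in 0..1
  const now = ctx.currentTime;
  filter.frequency.setTargetAtTime(200 + 1800 * speed, now, 0.1);
  gain.gain.setTargetAtTime(0.05 + 0.4 * speed, now, 0.1);
}
setWindSpeed(0.3); // in a game, this could be driven by the game state
```

No samples are stored; the game or virtual environment simply calls setWindSpeed as conditions change.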

This review article is by Pedro Pestana of the Catholic University of Portugal, and Dimitris Menexopoulos and Josh Reiss, both from Queen Mary University of London and part of the research team behind this blog.

The article, published in the Journal of the Audio Engineering Society, is quite a large one, attempting to cover the state of the whole field. It's also a bit of an homage to all the great researchers who created it.

Here’s the article, http://www.aes.org/e-lib/download.cfm/22346.pdf?ID=22346

With associated webpage https://dmenex.github.io/proceduralaudioreview/

And a YouTube video:

Enjoy!

Congratulations Dr. Angeliki Mourgela!

Today one of our PhD student researchers, Angeliki Mourgela, successfully defended her PhD. The form of these exams, or vivas, varies from country to country, and even institution to institution, as we discussed previously. Here, it's pretty gruelling: behind closed doors, with two expert examiners probing every aspect of the PhD.
Angeliki’s PhD was on ‘Perceptually Motivated, Intelligent Audio Mixing Approaches for Hearing Loss.’ Aspects of her research have been described in previous blog entries on hearing loss simulation and online listening tests. Angeliki is also a sound engineer and death metal musician.

As the population ages, hearing loss is becoming more and more of a concern. Yet mixing engineers, and sound engineers in general, rarely know how the content they produce would sound to listeners with hearing loss. Wouldn't it be great if they could, with the click of a button, hear in real time how their mix would sound to listeners with different hearing profiles? And could the content be automatically remixed so that the new mix sounds, to someone with hearing loss, as close as possible to how the original mix sounds to someone with normal hearing? That was the motivation for this research.

Angeliki's thesis explored perceptually motivated, intelligent approaches to audio mixing for listeners with hearing loss, using a hearing loss simulator as a referencing tool for manual and automatic audio mixing. She designed a real-time hearing loss simulation and tested its accuracy and effectiveness by conducting listening studies with participants with real and simulated hearing loss. The simulation was then used by audio engineering students and professionals, to see how engineers might combat the effects of hearing loss while mixing content.
The extracted practices were then used to inform intelligent audio production approaches for hearing loss.
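Her simulator is far more sophisticated than this, and grounded in audiology, but as a rough illustration of one ingredient, frequency-dependent threshold elevation, here's a toy Web Audio sketch that attenuates bands according to a made-up audiogram. All values are invented, and a real simulation also needs to model effects such as loudness recruitment and reduced frequency selectivity.

```typescript
// Very rough illustration only: attenuate bands according to an
// audiogram-like set of threshold shifts (values made up for this sketch).
const ctx = new AudioContext();

const audiogram = [ // frequency (Hz) -> threshold shift (dB)
  { freq: 250, lossDb: 5 },
  { freq: 1000, lossDb: 15 },
  { freq: 4000, lossDb: 40 },
  { freq: 8000, lossDb: 60 },
];

// Chain a peaking filter per band, each cutting by the threshold shift.
function buildSimulator(source: AudioNode): AudioNode {
  let node: AudioNode = source;
  for (const band of audiogram) {
    const peak = ctx.createBiquadFilter();
    peak.type = "peaking";
    peak.frequency.value = band.freq;
    peak.Q.value = 1.0;
    peak.gain.value = -band.lossDb;
    node = node.connect(peak);
  }
  return node;
}

// Usage: route the mix being monitored through the chain, e.g.
// buildSimulator(someSourceNode).connect(ctx.destination);
// (someSourceNode is whatever node carries the mix in your setup)
```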

Angeliki now works for RoEx Audio, a start-up company based partly on research done here. We discussed RoEx in a previous blog entry.

Here’s a video with Angeliki demonstrating an early version of her hearing loss simulator plugin.

The simulator won First Place in the Matlab Student Plugin Competition, at the 149th AES Convention, Oct. 2020. It was also used in an episode of the BBC drama Casualty to let the audience hear the world as heard by a character with severe hearing loss.

And finally, here are a few of her publications:

Many thanks also to Angeliki's collaborators, especially Dr. Trevor Agus, who offered great advice and proposed research directions, Dr. Lorenzo Picinali, who collaborated on some recent evaluation of hearing loss simulators, and Matt Paradis and others from the BBC, who supported this work.

Working with the Web Audio API out now: book, source code & videos

I was tempted to call this “What I did during lockdown”. Like many people, when Covid hit, I gave myself a project while working from home in isolation.

The Web Audio API provides a powerful and versatile system for controlling audio on the Web. It allows developers to generate sounds, select sources, add effects, create visualizations and render audio scenes in an immersive environment.
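For anyone who hasn't tried it, here's roughly what a first Web Audio program looks like (a generic example of my own, not one taken from the book): generate a tone, pass it through a simple effect, and play it.

```typescript
// Generate a tone, add a simple effect, and play it for one second.
const context = new AudioContext();

const osc = context.createOscillator(); // the sound source
osc.type = "sawtooth";
osc.frequency.value = 220; // Hz

const filter = context.createBiquadFilter(); // a simple effect
filter.type = "lowpass";
filter.frequency.value = 800;

osc.connect(filter).connect(context.destination);
osc.start();
osc.stop(context.currentTime + 1);
```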

In recent years I had developed a Love / Mild Annoyance (hate is too strong) relationship with the Web Audio API. I also noticed that there really wasn't a comprehensive and useful guide to it. Sure, there's the online specification and plenty of other documentation, but there's a whole lot that it leaves out, whether one wants to do FIR filtering or just needs the easiest way to record the output of an audio node, for instance. And there's nothing like a teaching book or a reference that one might keep handy while doing Web Audio programming.
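To give a flavour of those two examples, here's one way each of them can be approached (sketched from memory, not copied from the book's examples): FIR filtering by loading the filter taps into a ConvolverNode, and recording a node's output with a MediaStreamAudioDestinationNode and MediaRecorder.

```typescript
// Two tasks the spec doesn't spell out, sketched here rather than quoted
// from the book.
const ctx = new AudioContext();

// 1. FIR filtering: put the filter taps in an AudioBuffer and convolve.
function makeFIR(taps: number[]): ConvolverNode {
  const ir = ctx.createBuffer(1, taps.length, ctx.sampleRate);
  ir.getChannelData(0).set(taps);
  const convolver = ctx.createConvolver();
  convolver.normalize = false; // keep the taps exactly as given
  convolver.buffer = ir;
  return convolver;
}
const movingAverage = makeFIR([0.25, 0.25, 0.25, 0.25]); // 4-point FIR

// 2. Recording the output of any audio node.
const recDest = ctx.createMediaStreamDestination();
const recorder = new MediaRecorder(recDest.stream);
const chunks: Blob[] = [];
recorder.ondataavailable = (e) => chunks.push(e.data);
recorder.onstop = () => {
  const blob = new Blob(chunks); // ready to play back or offer for download
  console.log("recorded", blob.size, "bytes");
};

// Usage: someNode.connect(movingAverage).connect(recDest); recorder.start();
// (someNode is whatever node you want to filter and capture)
```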

So writing that book, titled Working with the Web Audio API, became my Covid project, and it's finally hit the shelves! It's part of the AES Presents book series, and the publisher's link to the book is:

https://www.routledge.com/Working-with-the-Web-Audio-API/Reiss/p/book/9781032118673

but you can find it through all the usual places to buy books. The accompanying source code is available at:

https://github.com/joshreiss/Working-with-the-Web-Audio-API

I think there are about 70 source code examples, covering every audio node and every important feature of the API.

And I've made about 20 instructional videos covering many aspects of it, on the YouTube channel:

https://tinyurl.com/y3mtauav

I’ll keep improving the github repository and YouTube channel whenever I get a chance. So please, check it out and let me know what you think. 🙂

Adaptive footstep sound effects

Adaptive footsteps plug-in released for Unreal and Unity game engines

From the creeping, ominous footsteps in a horror film to the thud and clunk of an armoured soldier in an action game, footstep sounds are one of the most widely sought-after sound effects in creative content. But to get realistic variation, one needs hundreds of different samples for each character, each foot, each surface, and at different paces. Even then, repetition becomes a problem.

So at nemisindo.com, we've developed a procedural model for generating footstep sounds without the use of recorded samples. We've released it as the Nemisindo Adaptive Footsteps plug-in for game engines, available in the Unity Asset Store and in the Unreal Marketplace. You can also try it out at https://nemisindo.com/models/footsteps.html. It offers a lot more than standard sample pack libraries: footsteps are generated in real time, based on intuitive parameters that you can control.
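The models on Nemisindo are far more detailed than this, but as a toy illustration of the sample-free idea, here's a sketch (not the actual Nemisindo code) in which each step is a short burst of enveloped, filtered noise, and surface, firmness and pace are just numbers you can change at run time.

```typescript
// Toy footstep sketch (not the Nemisindo model): each step is a short burst
// of filtered noise whose brightness and level hint at surface and firmness.
const ctx = new AudioContext();

function step(when: number, surfaceCutoffHz: number, firmness: number) {
  const dur = 0.12; // seconds
  const buf = ctx.createBuffer(1, Math.floor(dur * ctx.sampleRate), ctx.sampleRate);
  const d = buf.getChannelData(0);
  for (let i = 0; i < d.length; i++) d[i] = Math.random() * 2 - 1;

  const src = ctx.createBufferSource();
  src.buffer = buf;

  const lp = ctx.createBiquadFilter(); // duller filter suggests a softer surface
  lp.type = "lowpass";
  lp.frequency.value = surfaceCutoffHz;

  const env = ctx.createGain(); // sharp attack, quick decay
  env.gain.setValueAtTime(firmness, when);
  env.gain.exponentialRampToValueAtTime(0.001, when + dur);

  src.connect(lp).connect(env).connect(ctx.destination);
  src.start(when);
}

// A walking pace on a hard-ish surface, with slight random timing variation.
const pace = 0.5; // seconds between steps
for (let n = 0; n < 8; n++) {
  step(ctx.currentTime + 0.1 + n * pace + Math.random() * 0.02, 3000, 0.8);
}
```

Every step comes out slightly different, and changing the parameters changes the character of the walk, which is the basic appeal of the procedural approach.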

The plugin provides benefits that no other audio plugin does:

  • Customisable: 4 different shoe types, 7 surface types, and controls for pace, step firmness, steadiness, etc.
  • Convenient: Easy to set up, comes with 12 presets to get started in no time.
  • Versatile: Automatic and Manual modes can be added to any element in a game scene.
  • Lightweight: Uses very little disk space; the entire code takes about the same space as one footstep sample.

In a research paper soon to appear at the 152nd Audio Engineering Society Convention, we tried a different approach. We implemented multilayer neural network architectures for footstep synthesis and compared the results with real recordings and various sound synthesis methods, including Nemisindo’s online implementation. The neural approach is not yet applicable to most sound design problems, since it does not offer parametric control. But the listening test was very useful. It showed that Nemisindo’s procedural approach outperformed all other traditional sound synthesis approaches, and gave us insights that led to further improvements.

Here’s a short video introducing the Unity plugin:

And a video introducing it for Unreal:

And a nice tutorial video on how to use it in Unreal:

So please check it out. It's a big footstep forward in procedural and adaptive sound design (sorry, couldn't resist the wordplay 😁).

Submit your research paper to the 152nd AES Convention

The next Audio Engineering Society Convention will be in May, in The Hague, the Netherlands. It's expected to be the first major AES event with an in-person presence (though it has an online component too) since the whole Covid situation began. It will cover the whole field of audio engineering, with workshops, panel discussions, tutorials, keynotes, recording competitions and more. And attendees cover the full range of students, educators, researchers, audiophiles, professional engineers and industry representatives.
I'm always focused on the Technical Program for these events, where lots of new research is published and presented, and I expect this one to be great. Just based on some expected submissions that I know of, there are sure to be great papers on sound synthesis, game audio, immersive and spatial audio, higher quality and deeper understanding of audio effects, plenty of machine learning and neural networks, novel mixing and mastering tools, and lots of new psychoacoustics research.
And that’s just the ones I’ve heard about!
It's definitely not too late to submit your own work; see the Call for Submissions. The deadline for full paper submissions (Category 1) or abstract plus précis submissions (Category 2) is February 15th. The deadline for abstract-only submissions (Category 3) is March 1st. In all cases, you submit a full paper for the final version if accepted (though for Category 3 this is optional). So the main difference between the three categories is the depth of reviewing, from full peer review for initial paper submissions to 'light touch' reviewing for an initial abstract submission.
For those who aren't familiar with it, great research has been, and continues to be, presented at AES Conventions. The very first music composition on a digital computer was presented at the 9th AES Convention in 1957. Schroeder's reverberator first appeared there, the invention of the parametric equalizer was announced and explained there in 1972, and Farina's work on the swept sine technique for room response estimation was unveiled there and has since received over 1365 citations. Other famous firsts from the Technical Program include the introduction of feedback delay networks, Gardner's famous paper on zero-delay convolution, now used in almost all fast convolution algorithms, the unveiling of spatial audio object coding, and the Gerzon-Craven noise shaping theorem, which is at the heart of many A-to-D and D-to-A converters.
So please consider submitting your research there, and I hope to see you there too, whether virtually or in person.

AES Presidency – and without a coup attempt

Happy New Year everyone!

As of January 1st, I (that's Josh Reiss, the main author of this blog) am the president of the Audio Engineering Society. I wrote about being elected to this position before. It's an incredible honour.

For those who don't know, the Audio Engineering Society (AES) is the largest professional society in audio engineering and related fields, and the only professional society devoted exclusively to audio technology. It has over 10,000 members, hosts conferences, conventions and lots of other events, publishes a renowned journal, develops standards and so much more. It was founded in 1948 and has grown to become an international organisation that unites audio engineers, creative artists, scientists and students worldwide by promoting advances in audio and disseminating new knowledge and research.

Anyway, I expect an exciting and challenging year ahead in this role. And one of my first tasks was to deliver the President’s Message, sort of announcing the start of my term to the AES community and laying down some of my thoughts about it. You can read all about it here.

And I'm looking forward to seeing you all at the next AES Convention in The Hague in May, either in person or online.

The crack of thunder

Lightning, copyright James Insogna, 2011

The gaming, film and virtual reality industries rely heavily on recorded samples for sound design. This has inherent limitations since the sound is fixed from the point of recording, leading to drawbacks such as repetition, storage, and lack of perceptually relevant controls.

Procedural audio offers a more flexible approach by allowing the parameters of a sound to be altered and sound to be generated from first principles. A natural choice for procedural audio is environmental sounds. They occur widely in creative industries content, and are notoriously difficult to capture. On-location sounds often cannot be used due to recording issues and unwanted background sounds, yet recordings from sample libraries are rarely a good match to an environmental scene.

Thunder, in particular, is highly relevant. It provides a sense of the environment and location, but can also be used to supplement the narrative and heighten the tension or foreboding in a scene. There exist a fair number of methods to simulate thunder. But no one's ever actually sat down and evaluated these models. That's what we did in:

J. D. Reiss, H. E. Tez, R. Selfridge, 'A comparative perceptual evaluation of thunder synthesis techniques', to appear at the 150th Audio Engineering Society Convention, 2021.

We looked at all the thunder synthesis models we could find, and in the end were able to compare five models and a recording of real thunder in a listening test. And here's the key result:

This was surprising. None of the methods sound very close to the real thing. It didn't matter whether a physical model was used, which type of physical modelling approach it was, or whether an entirely signal-based approach was applied. And yet there are plenty of other sounds where procedural audio can sound indistinguishable from the real thing; see our previous blog post on applause.

We also played around with the code. It's clear that the methods could be improved. For instance, they all produced mono sounds (so we used a mono recording for comparison too), the physical models could be made much, much faster, and most of the models used very simplistic approximations of lightning. So there's a really nice PhD topic for someone to work on one day.
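To give a sense of what a naive signal-based approach looks like (and this is emphatically a toy sketch, not one of the evaluated models), here's a few lines that produce something vaguely thunder-like: a bright noise burst that darkens and decays into a low rumble.

```typescript
// Toy thunder-ish sketch: a noise burst for the initial crack, then a long,
// low-passed, decaying rumble. Not one of the models evaluated in the paper.
const ctx = new AudioContext();

const dur = 6; // seconds
const buf = ctx.createBuffer(1, dur * ctx.sampleRate, ctx.sampleRate);
const d = buf.getChannelData(0);
for (let i = 0; i < d.length; i++) d[i] = Math.random() * 2 - 1;

const src = ctx.createBufferSource();
src.buffer = buf;

const lp = ctx.createBiquadFilter();
lp.type = "lowpass";
const env = ctx.createGain();
const t = ctx.currentTime;

// Bright, loud onset for the crack...
lp.frequency.setValueAtTime(6000, t);
env.gain.setValueAtTime(1.0, t);
// ...then darken and decay into a low rumble over several seconds.
lp.frequency.exponentialRampToValueAtTime(120, t + 1.5);
env.gain.exponentialRampToValueAtTime(0.001, t + dur);

src.connect(lp).connect(env).connect(ctx.destination);
src.start(t);
```

It is recognisably 'thunder-like' but nowhere near the real thing, which illustrates why better models, and better evaluation, are needed.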

Besides showing the limitations of the current models, it also showed the need for better evaluation in sound synthesis research, and the benefits of making code and data available for others. On that note, we put the paper and all the relevant code, data, sound samples, etc. online at

And you can try out a couple of models at

What we did in 2020 (despite the virus!)

So this is a short year in review for the Intelligent Sound Engineering team. I won't focus on Covid-19, because that will be the focus of every other year in review. Instead, I'll just keep it brief with some highlights.

We co-founded two new companies, Tonz and Nemisindo. Tonz relates to some of our deep learning research and Nemisindo to our procedural audio research, though they’ll surely evolve into something greater. Keep an eye out for announcements from both of them.

I (Josh Reiss) was elected president of the Audio Engineering Society. It's a great honour. But I become President-Elect on January 1st 2021, and then President on January 1st 2022, so it's a slow transition into the role. I also gave a keynote at the 8th China Conference on Sound and Music Technology.

Angeliki Mourgela’s hearing loss simulator won the Gold Prize (first place) in the Matlab VST plugin competition. This work was also used to present sounds as heard by a character with hearing loss in the BBC drama Casualty.

J. T. Colonel and Christian Steinmetz gave an invited talk at the AES Virtual Symposium: Applications of Machine Learning in Audio.

We are continuing our collaboration with Yamaha, and started new grants with support from Innovate UK (Hookslam), industry, EPSRC and others. There are more in various stages of submission, review or finalising acceptance, so hopefully I can make proper announcements about them soon.

Christian Steinmetz and Ilias Ibnyahya started their PhDs with the team. Emmanouil Chourdakis, Alessia Milo and Marco Martinez completed their PhDs. Lauren Edlin, Angeliki Mourgela, J. T. Colonel and Marco Comunita are all progressing well through various stages of the PhD. Johan Pauwels and Hazar Tez are doing great work in postdoc positions, and Jack Walters and Luke Brosnahan are working wonders while interning with our spin-out companies. I'm sure I've left a few people out.

So, though the virus situation meant a lot of things were put on pause or fizzled out, we actually accomplished quite a lot in 2020.

And finally, here are our research publications from this past year:

Research highlights for the AES Show Fall 2020


#AESShow

We try to write a preview of the technical track for almost all recent Audio Engineering Society (AES) Conventions; see our entries on the 142nd, 143rd, 144th, 145th, 147th and 148th Conventions. Like the 148th Convention, the 149th Convention, or just the AES Show, is an online event. But one challenge with these sorts of online events is that anything not on the main live stream can get overlooked. The technical papers are available on demand, so many people can access them, perhaps more than would attend the presentations in person. But they don't have the feel of an event.

Hopefully, I can give you some idea of the exciting nature of these technical papers. And they really do present a lot of cutting-edge and adventurous research. They unveil, for the first time, some breakthrough technologies, and both surprising and significant advances in our understanding of audio engineering and related fields.

This time, since all the research papers are available throughout the Convention and beyond, starting Oct. 28th, I haven’t organised them by date. Instead, I’ve divided them into the regular technical papers (usually longer, with more reviewing), and the Engineering Briefs, or E-briefs. The E-briefs are typically smaller, often presenting work-in-progress, late-breaking or just unusual research. Though this time, the unusual appears in the regular papers too.

But first… listening tests. Sooner or later, almost every researcher has to do them. And a good software package will help the whole process run more smoothly. There are two packages presented at the convention. Dale Johnson will present the next generation of a high quality one in the E-Brief 'HULTI-GEN Version 2 – A Max-based universal listening test framework'. And Stefan Gorzynski will present the paper 'A flexible software tool for perceptual evaluation of audio material and VR environments'.

E-Briefs

A must for audio educators is Brett Leonard's 'A Survey of Current Music Technology & Recording Arts Curriculum Order'. These sorts of programs are often 'made up' based on the experience and knowledge of the people involved. Brett surveyed 35 institutions and analysed the results to establish a holistic framework for the structure of these degree programmes.

The idea of time-stretching as a live phenomenon might seem counterintuitive. For instance, how can you speed up a signal if it's only just arriving? And if you slow it down, then surely after a while it lags far enough behind that it is no longer 'live'. A novel solution is explored in Colin Malloy's 'An approach for implementing time-stretching as a live realtime audio effect'.

The wonderfully titled ‘A Terribly Good Speaker: Understanding the Yamaha NS-10 Phenomenon,’ is all about how and why a low quality loudspeaker with bad reviews became seen as a ‘must have’ amongst many audio professionals. It looks like this presentation will have lessons for those who study marketing, business trends and consumer psychology in almost any sector, not just audio.

Just how good are musicians at tuning their instruments? Not very good, it seems. Or at least, that was what was found out in ‘Evaluating the accuracy of musicians and sound engineers in performing a common drum tuning exercise’, presented by Rob Toulson. But before you start with your favourite drummer joke, note that the participants were all experienced musicians or sound engineers, but not exclusively drummers. So it might be that everyone is bad at drum tuning, whether they’re used to carrying drumsticks around or not.

Matt Cheshire’s ‘Snare Drum Data Set (SDDS): More snare drums than you can shake a stick at’ is worth mentioning just for the title.

Champ Darabundit will present some interesting work on 'Generalized Digital Second Order Systems Beyond Nyquist Frequency', showing that the basic filter designs can be tuned to do a lot more than just what is covered in the textbooks. It's interesting and good work, but I have a minor issue with it. The paper only has one reference that isn't a general overview or tutorial. But there's lots of good, relevant related work out there.

I'm involved in only one paper at this convention (shame!). But it's well worth checking out. Angeliki Mourgela is presenting 'Investigation of a Real-Time Hearing Loss Simulation for Audio Production'. It builds on an initial hearing loss simulator she presented at the 147th Convention, but now it's higher quality, real-time and available as a VST plugin. This means that audio producers can easily preview what their content would sound like to most listeners with hearing loss.

Masking is an important and very interesting auditory phenomenon. With the emergence of immersive sound, there’s more and more research about spatial masking. But questions come up, like whether artificially panning a source to a location will result in masking the same way as actually placing a source at that location. ‘Spatial auditory masking caused by phantom sound images’, presented by Masayuki Nishiguchi, will show how spatial auditory masking works when sources are placed at virtual locations using rendering techniques.

Technical papers

There's a double bill presented by Hsein Pew, 'Sonification of Spectroscopic analysis of food data using FM Synthesis' and 'A Sonification Algorithm for Subjective Classification of Food Samples.' They are unusual papers, but not really about classifying food samples. The focus is on the sonification method, which turns data into sounds, allowing listeners to easily discriminate between data collections.

Wow. When I first saw Moorer in the list of presenting authors, I thought 'what a great coincidence that a presenter has the same last name as one of the great legends in audio engineering.' But no, it really is James Moorer. We talked about him before in our blog about the greatest JAES papers of all time. And the abstract for his talk, 'Audio in the New Millennium – Redux', is better than anything I could have written about the paper. He wrote, “In the author’s Heyser lecture in 2000, technological advances from the point of view of digital audio from 1980 to 2000 were summarized then projected 20 years into the future. This paper assesses those projections and comes to the somewhat startling conclusion that entertainment (digital video, digital audio, computer games) has become the driver of technology, displacing military and business forces.”

The paper with the most authors is presented by Lutz Ehrig. And he'll be presenting a breakthrough, the first 'Balanced Electrostatic All-Silicon MEMS Speakers'. If you don't know what that is, you're not alone. But it's worth finding out, because this may be tomorrow's widespread commercial technology.

If you recorded today, but only using equipment from 1955, would it really sound like a 65 year old recording? Clive Mead will present ‘Composing, Recording and Producing with Historical Equipment and Instrument Models’ which explores just that sort of question. He and his co-authors created and used models to simulate the recording technology and instruments, available at different points in recorded music history.

'Degradation effects of water immersion on earbud audio quality,' presented by Scott Beveridge, sounds at first like it might be very minor work, dipping earbuds in water and then listening to distorted sound from them. But I know a bit about the co-authors. They're the type to apply rigorous, hardcore science to a problem. And it has practical applications too, since it's leading towards methods by which consumers can measure the quality of their earbuds.

Forensic audio is a fascinating field, though most people have only come across it in film and TV shows like CSI, where detectives identify incriminating evidence buried in a very noisy recording. In ‘Forensic Interpretation and Processing of User Generated Audio Recordings’, audio forensics expert Rob Maher looks at how user generated recordings, like when many smartphones record a shooting, can be combined, synchronised and used as evidence.

Mark Waldrep presents a somewhat controversial paper, ‘Native High-Resolution versus Red Book Standard Audio: A Perceptual Discrimination Survey’. He sent out high resolution and CD quality recordings to over 450 participants, asking them to judge which was high resolution. The overall results were little better than guessing. But there were a very large number of questionable decisions in his methodology and interpretation of results. I expect this paper will get the online audiophile community talking for quite some time.

Neural networks are all the rage in machine learning. And for good reason: for many tasks, they outperform all the other methods. There are three neural network papers presented: Tejas Manjunath's 'Automatic Classification of Live and Studio Audio Recordings using Convolutional Neural Networks', J. T. Colonel's (who is now part of the team behind this blog) 'Low Latency Timbre Interpolation and Warping using Autoencoding Neural Networks' and William Mitchell's 'Exploring Quality and Generalizability in Parameterized Neural Audio Effects'.

The research team here did some unpublished work that seemed to suggest that, for untrained listeners, the mix has only a minimal effect on how people respond to music, but that the effect becomes more significant with trained sound engineers and musicians. Kelsey Taylor's research suggests there's a lot more to uncover here. In 'I'm All Ears: What Do Untrained Listeners Perceive in a Raw Mix versus a Refined Mix?', she performed structured interviews and found that untrained listeners perceive a lot of mixing aspects, but use different terms to describe them.

No loudness measure is perfect. The limitations of even the well-established ones, like ITU-R BS.1770 for broadcast content or the Glasberg and Moore auditory model of loudness perception, have been noted before (see http://www.aes.org/e-lib/browse.cfm?elib=16608 and http://www.aes.org/e-lib/browse.cfm?elib=17098). In 'Using ITU-R BS.1770 to Measure the Loudness of Music versus Dialog-based Content', Scott Norcross shows another issue with the ITU loudness measure, the difficulty in matching levels for speech and music.

Staying on the subject of loudness, Kazuma Watanabe presents ‘The Reality of The Loudness War in Japan -A Case Study on Japanese Popular Music’. This loudness war, the overuse of dynamic range compression, has resulted in lower quality recordings (and annoyingly loud TV and radio ads). It also led to measures like the ITU standard. Watanabe and co-authors measured the increased loudness over the last 30 years, and make a strong

Remember to check the AES E-Library, which has the full papers for all the presentations mentioned here, and lists all authors, not just the presenters. And feel free to get in touch with us. Josh Reiss (author of this blog entry), J. T. Colonel, and Angeliki Mourgela, from the Audio Engineering research team within the Centre for Digital Music, will all be (virtually) there.

AES President-Elect-Elect!

“Anyone who is capable of getting themselves made President should on no account be allowed to do the job.” ― Douglas Adams, The Hitchhiker’s Guide to the Galaxy

So I’m sure you’ve all been waiting for this presidential election to end. No not that one. I’m referring to the Audio Engineering Society (AES)’s recent elections for their Board of Directors and Board of Governors.

And I'm very pleased and honoured that I (that's Josh Reiss, the main author of this blog) have been elected as President.

It's actually three positions: in 2021 I'll be President-Elect, in 2022 President, and in 2023 Past-President. Another way to look at it is that the AES always has three presidents, one planning for the future, one getting things done and one imparting their experience and knowledge.

For those who don't know, the AES is the largest professional society in audio engineering and related fields. It has over 12,000 members, and is the only professional society devoted exclusively to audio technology. It was founded in 1948 and has grown to become an international organisation that unites audio engineers, creative artists, scientists and students worldwide by promoting advances in audio and disseminating new knowledge and research.

My thanks to everyone who voted, to the AES in general, and to everyone who has said congratulations. And a big congratulations to all the other elected officers.