AES 148 – A Digital Vienna

Written jointly by Angeliki Mourgela and JT Colonel

#VirtualVienna

The AES hosted its 148th international convention virtually this year. Despite the circumstances we find ourselves in due to COVID-19, the convention put together an excellent program filled with informative talks, tutorials, and demonstrations. Below is a round-up of our favourite presentations, which run the gamut from deeply technical talks on finite-precision arithmetic to highly creative demonstrations of an augmented reality installation.

Tuesday

The first session on Tuesday morning, Active Sensing and Slow Listening, was held by Thomas Lund and Susan E. Rogers, who discussed the principles of active sensing and slow listening as well as their role in pro audio product development. Lund kicked the session off by introducing the theory behind sound cognition and the afferent and efferent functions of the brain with regard to sound perception. Rogers then picked up the session, describing the auditory pathway and its bidirectionality in more detail and presenting the parts of the brain engaged in sonic cognition. She touched on proprioception (the awareness of our bodies), interoception (the awareness of our feelings), and the role of expectation when studying our responses to sound. To conclude, both presenters pointed out that we should not treat listening as passive or unidirectional: both external and internal factors influence the way we hear.

Diagram showing the development of the tympanic ear across different geologic eras discussed in the Active Sensing and Slow Listening demonstration

Later in the day, Jamie Angus presented on Audio Signal Processing in the Real World: Dealing with the Effects of Finite Precision. At the center of the talk was a fundamental question: how does finite precision affect audio processing? Angus went into full detail on different finite-precision arithmetics, such as fractional (fixed-point) and floating-point representations, and derived how the noise introduced by these systems impacts filter design.
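
To get a feel for the kind of noise Angus was quantifying, here is a rough sketch (our own illustration, not material from the talk) of how word length sets the noise floor of a fixed-point signal:

```python
# A minimal sketch: quantize a test tone to different word lengths and
# estimate the resulting signal-to-noise ratio. Illustrative values only.
import numpy as np

fs = 48000
t = np.arange(fs) / fs
x = 0.5 * np.sin(2 * np.pi * 1000 * t)      # 1 kHz test tone at -6 dBFS

def quantize(signal, bits):
    """Round to a signed fixed-point grid with the given word length."""
    levels = 2.0 ** (bits - 1)
    return np.round(signal * levels) / levels

for bits in (8, 16, 24):
    error = quantize(x, bits) - x            # quantization error
    snr = 10 * np.log10(np.mean(x**2) / np.mean(error**2))
    print(f"{bits}-bit fixed point: SNR of roughly {snr:.0f} dB")
```

The same exercise for floating-point formats, with the error fed through a filter structure, is the territory the talk covered analytically.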

The 3rd MATLAB Student Design Competition was hosted by Gabriele Bunkheila. Using the example of a stereo width expander, Bunkheila demonstrated the process of turning a simple offline MATLAB script into a real-time audioPlugin class, using MATLAB’s built-in Audio Test Bench app. He then proceeded to talk about C++ code generation, validation, and export of the code to VST plugin format for use in a conventional digital audio workstation. Bunkheila also demonstrated simple GUI generation using MATLAB’s audioPluginInterface functionality.
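
For readers who have not met the example before, the DSP at the heart of a stereo width expander is simple mid/side scaling. A generic sketch of the idea in Python (not Bunkheila’s MATLAB plugin code; the width parameter is an arbitrary choice):

```python
import numpy as np

def widen(stereo, width=1.5):
    """Mid/side stereo width expander.

    stereo: array of shape (num_samples, 2); width > 1 widens, width < 1 narrows.
    """
    left, right = stereo[:, 0], stereo[:, 1]
    mid = 0.5 * (left + right)               # common (centre) content
    side = 0.5 * (left - right) * width      # scaled difference content
    return np.stack([mid + side, mid - side], axis=1)
```

The demonstration was about wrapping exactly this kind of routine in an audioPlugin class, exposing the width as a plugin parameter, and generating a VST from it.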

Wednesday

On Wednesday, Thomas Lund and Hyunkook Lee discussed the shift from stereo to immersive multi-channel audio in their talk Goodbye Stereo. First, Lund covered the basics of spatial perception, the limitations of stereo recording and reproduction, frequency-related aspects of spatial audio, and the standards being implemented in immersive audio. Lee went on to discuss the psychoacoustic principles that apply to immersive audio and the differences between stereo and 3D. He expanded on limitations arising from microphone placement and internal characteristics, and described the microphone array configurations his research group is working on. The presentation was followed by a set of truly impressive immersive recordings, made in various venues with different microphone configurations, which the audience was prompted to experience over headphones. Lee finished by introducing a database of 3D recordings, including room impulse responses, that will be made available for spatial audio research.

In his talk The Secret Life of Low Frequencies, Bruce Black discussed the trials and tribulations of acoustically treating rooms while paying special attention to their low-frequency response. Black covered the particle propagation and wave propagation models of sound transmission, and how each requires different treatment. He called specific attention to how the attenuation of a sound’s low frequencies can change over the course of 200-400 ms within a room. Black went on to show how Helmholtz resonators can be strategically placed in a space to smooth these uneven attenuations.
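
As a reminder of how such resonators are tuned, the standard textbook formula relates neck area, neck length and cavity volume to the resonant frequency. Here is a rough sketch with made-up dimensions (not values from Black’s talk):

```python
# Estimate the tuning of a Helmholtz resonator absorber. All dimensions are
# illustrative; the end-correction factor is a common rule of thumb.
import numpy as np

c = 343.0                    # speed of sound in air, m/s
neck_radius = 0.025          # m
neck_length = 0.10           # m
cavity_volume = 0.05         # m^3

neck_area = np.pi * neck_radius**2
effective_length = neck_length + 1.7 * neck_radius   # rough end correction
f0 = (c / (2 * np.pi)) * np.sqrt(neck_area / (cavity_volume * effective_length))
print(f"Resonant frequency: about {f0:.0f} Hz")      # roughly 29 Hz here
```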

Marisa Hoeschele gave a very interesting keynote lecture on Audio from a Biological Perspective. Hoeschele began by discussing the concept of addressing human sounds from the perspective of a “visiting alien,” where humans are studied as just another species on the planet. She discussed observations on shared emotional information and how we can identify sonic attributes corresponding to stress level or excitement across species. She then proceeded to discuss the ways in which we can study musicality as an innate human characteristic, as well as commonalities across cultures. Finally, Hoeschele discussed how other animals can inform us about musicality, giving examples of experiments on animals’ ability to respond to musical attributes such as octave equivalence, and the search for correlations with human behavior.

Thursday

On Thursday, Brian Gibbs gave a workshop on spatial audio mixing, using a demo mix of Queen’s Bohemian Rhapsody. He began with a short discussion of the basics of spatial audio, covering both recording spatial audio from scratch and spatializing audio recordings in the studio. Gibbs also talked about the AmbiX and FuMa ambisonic formats, higher order ambisonics, and MPEG-H. He then turned to the importance of loudness, giving a brief talk and demonstration of LUFS metering. Finally, he stressed the importance of knowing the platform or format your work will end up on, highlighting the different requirements of various streaming services and devices. He ended the workshop with a listening session, presenting Bohemian Rhapsody while alternating between mono, stereo, and static 360 audio mixes.
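
If you want to experiment with LUFS metering yourself, integrated loudness can be measured in a few lines; here is a minimal sketch assuming the pyloudnorm package (a generic illustration, not Gibbs’ workflow, and “mix.wav” is a placeholder path):

```python
import soundfile as sf
import pyloudnorm as pyln

data, rate = sf.read("mix.wav")               # load a mono or stereo mix
meter = pyln.Meter(rate)                      # ITU-R BS.1770 loudness meter
loudness = meter.integrated_loudness(data)    # integrated loudness in LUFS
print(f"Integrated loudness: {loudness:.1f} LUFS")
```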

Later, Thomas Aichinger presented Immersive Storytelling: Narrative Aspects in AR Audio Applications. He outlined the development and implementation of “SONIC TRACES,” an AR installation planned for Vienna to coincide with the conference, in which participants navigate an audio-based story set in the Heldenplatz. Aichinger described the difficulties his team overcame, such as GPS tracking of users in the plaza, incorporating six degrees of freedom in motion tracking, and tweaking signal attenuation to suit the narrative.

Thomas Aichinger’s rendering of the Heldenplatz in Unity, which is the game engine he and his team used to construct the AR experience
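
The attenuation tweaking Aichinger mentioned typically comes down to choosing a distance roll-off curve for each virtual source. Here is a minimal sketch of the kind of curve involved (the roll-off exponent and clamping are illustrative assumptions, not values from SONIC TRACES):

```python
import numpy as np

def distance_gain(distance_m, ref_dist=1.0, rolloff=1.0, min_dist=0.5):
    """Inverse-distance gain, clamped so very close sources don't blow up."""
    d = np.maximum(distance_m, min_dist)
    return (ref_dist / d) ** rolloff

for d in (1, 2, 5, 10):
    print(f"{d} m: {20 * np.log10(distance_gain(d)):5.1f} dB")
```

Flattening or steepening that curve is one way to keep a story audible (or deliberately distant) as listeners wander around a plaza.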

Friday

On the final day of the convention, Gabriele Bunkheila gave a keynote speech on Deep Learning for Audio Applications – Engineering Best Practices for Data. While deep learning applications are most frequently implemented in Python, Bunkheila made a compelling case for using MATLAB. He pointed out the discrepancies between deep learning approaches in academia and in industry: namely, that academia focuses primarily on theorizing and developing new models, whereas industry devotes much more time to dataset construction and scrubbing. Moreover, he argued that deep learning for audio should not necessarily follow the best practices laid out by the image recognition community. For example, when adding noise to an audio dataset, one ought to simulate the environment in which the deep learning system is expected to be deployed. So for audio applications it makes much more sense to apply reverb or mix in speech to obscure the signal, rather than adding Gaussian noise.
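
Here is a rough sketch of the kind of audio-appropriate augmentation Bunkheila was advocating (our own illustration, not his code): convolve the clean signal with a room impulse response and mix in background speech at a chosen signal-to-noise ratio, rather than sprinkling on Gaussian noise.

```python
import numpy as np
from scipy.signal import fftconvolve

def augment(clean, room_ir, background, snr_db=10.0):
    """Reverberate `clean` and mix in `background` at the given SNR (in dB)."""
    wet = fftconvolve(clean, room_ir)[: len(clean)]       # simulate the room
    bg = background[: len(wet)]
    gain = np.sqrt(np.mean(wet**2) / (np.mean(bg**2) * 10 ** (snr_db / 10)))
    return wet + gain * bg                                # noisy training example
```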

In Closing

Though this conference looked nothing like what anyone expected at the beginning of this year, the AES demonstrated its ability to adapt to new paradigms and technologies. Here’s to hoping that the AES will be able to resume in-person conferences soon. In the meantime, the AES will continue its strong tradition of providing a platform for groundbreaking audio technologies and educational opportunities virtually.

Venturous Views on Virtual Vienna – a preview of AES 148

#VirtualVienna

We try to write a preview of the technical track for almost every recent Audio Engineering Society (AES) Convention; see our entries on the 142nd, 143rd, 144th, 145th and 147th Conventions. But this 148th Convention is very different.

It is, of course, an online event. The Convention planning committee have put huge effort into putting it all online and making it a really engaging and exciting experience (and into massively reducing costs). There will be a mix of live streams, break-out sessions, interactive chat rooms and so on. But the technical papers will mostly be on-demand viewing, with Q&A and online dialogue with the authors. This is great in the sense that you can view them and interact with authors at any time, but it means that it’s easy to overlook really interesting work.

So we’ve gathered together some information about a lot of the presented research that caught our eye as being unusual, exceptionally high quality, or just worth mentioning. And every paper mentioned here will appear soon in the AES E-Library, by the way. Currently though, you can browse all the abstracts by searching the full papers and engineering briefs on the Convention website.

Deep learning and neural networks are all the rage in machine learning nowadays. A few contributions to the field will be presented by Eugenio Donati with ‘Prediction of hearing loss through application of Deep Neural Network’, Simon Plain with ‘Pruning of an Audio Enhancing Deep Generative Neural Network’, Giovanni Pepe’s presentation of ‘Generative Adversarial Networks for Audio Equalization: an evaluation study’, Yiwen Wang presenting ‘Direction of arrival estimation based on transfer function learning using autoencoder network’, and the author of this post, Josh Reiss, presenting work done mainly by sound designer/researcher Guillermo Peters, ‘A deep learning approach to sound classification for film audio post-production’. Related to this, check out the Workshop on ‘Deep Learning for Audio Applications – Engineering Best Practices for Data’, run by Gabriele Bunkheila of MathWorks (MATLAB), which will be live-streamed on Friday.

There’s enough work being presented on spatial audio that there could be a whole conference on the subject within the convention. A lot of that is in Keynotes, Workshops, Tutorials, and the Heyser Memorial Lecture by Francis Rumsey. But a few papers in the area really stood out for me. Toru Kamekawa investigated a big question with ‘Are full-range loudspeakers necessary for the top layer of 3D audio?’ Marcel Nophut’s ‘Multichannel Acoustic Echo Cancellation for Ambisonics-based Immersive Distributed Performances’ has me intrigued because I know a bit about echo cancellation and a bit about ambisonics, but have no idea how to do the former for the latter.

And I’m intrigued by ‘Creating virtual height loudspeakers using VHAP’, presented by Kacper Borzym. I’ve never heard of VHAP, but the original VBAP paper is the most highly cited paper in the Journal of the AES (1367 citations at the time of writing this).
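
As a refresher on VBAP itself (which VHAP presumably builds on), the core of the method is a small linear-algebra step: express the source direction in the basis formed by a loudspeaker pair. A minimal two-channel sketch, with illustrative speaker angles:

```python
import numpy as np

def vbap_pair(source_deg, spk1_deg=-30.0, spk2_deg=30.0):
    """Amplitude-panning gains for one loudspeaker pair, Pulkki-style."""
    def unit(deg):
        a = np.radians(deg)
        return np.array([np.cos(a), np.sin(a)])
    basis = np.column_stack([unit(spk1_deg), unit(spk2_deg)])  # speaker vectors
    g = np.linalg.solve(basis, unit(source_deg))               # raw gains
    return g / np.linalg.norm(g)                               # constant power

print(vbap_pair(0.0))    # source straight ahead: equal gains
print(vbap_pair(20.0))   # source near the +30 degree speaker: gain shifts there
```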

How good are you at understanding speech from native speakers? How about when there’s a lot of noise in the background? Do you think you’re as good as a computer? Gain some insight into related research when viewing the presentation by Eugenio Donati on ‘Comparing speech identification under degraded acoustic conditions between native and non-native English speakers’.

There are a few papers exploring creative works, all of which look interesting and have great titles. David Poirier-Quinot will present ‘Emily’s World: behind the scenes of a binaural synthesis production’. Music technology has a fascinating history. Michael J. Murphy will explore the beginning of a revolution with ‘Reimagining Robb: The Sound of the World’s First Sample-based Electronic Musical Instrument circa 1927’. And if you’re into Scandinavian instrumental rock music (and who isn’t?), Zachary Bresler’s presentation of ‘Music and Space: A case of live immersive music performance with the Norwegian post-rock band Spurv’ is a must.


Frank Morse Robb, inventor of the first sample-based electronic musical instrument.

But sound creation comes first, and new technologies are emerging to do it. Damian T. Dziwis will present ‘Body-controlled sound field manipulation as a performance practice’. And particularly relevant given the worldwide isolation going on is ‘Quality of Musicians’ Experience in Network Music Performance: A Subjective Evaluation,’ presented by Konstantinos Tsioutas.

Portraiture looks at how to represent or capture the essence and rich details of a person. Maree Sheehan explores how this is achieved sonically, focusing on Maori women, in an intriguing presentation on ‘Audio portraiture sound design- the development and creation of audio portraiture within immersive and binaural audio environments.’

We talked about exciting research on metamaterials for headphones and loudspeakers when giving previews of previous AES Conventions, and there’s another development in this area presented by Sebastien Degraeve in ‘Metamaterial Absorber for Loudspeaker Enclosures’.

Paul Ferguson and colleagues look set to break some speed records, but any such feats require careful testing first, as in ‘Trans-Europe Express Audio: testing 1000 mile low-latency uncompressed audio between Edinburgh and Berlin using GPS-derived word clock’.

Our own research has focused a lot on intelligent music production, and especially automatic mixing. A novel contribution to the field, and a fresh perspective, is given in Nyssim Lefford’s presentation of ‘Mixing with Intelligent Mixing Systems: Evolving Practices and Lessons from Computer Assisted Design’.

Subjective evaluation, usually in the form of listening tests, is the primary form of testing audio engineering theory and technology. As Feynman said, ‘if it disagrees with experiment, it’s wrong!’

And thus, there are quite a few top-notch research presentations focused on experiments with listeners. Minh Voong looks at an interesting aspect of bone conduction with ‘Influence of individual HRTF preference on localization accuracy – a comparison between regular and bone conducting headphones’. Realistic reverb in games is incredibly challenging because characters are always moving, so Zoran Cvetkovic tackles this with ‘Perceptual Evaluation of Artificial Reverberation Methods for Computer Games’. The abstract for Lawrence Pardoe’s ‘Investigating user interface preferences for controlling background-foreground balance on connected TVs’ suggests that there’s more than one answer to that preference question. That highlights the need to look deep into any data, not just the mean and standard deviation, since aggregated results can be misleading; the classic example is Simpson’s Paradox. And finally, Peter Critchell will present ‘A new approach to predicting listener’s preference based on acoustical parameters,’ which addresses the need to accurately simulate and understand listening test results.
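
For anyone who hasn’t run into Simpson’s Paradox before, here is a tiny illustration with made-up listening test numbers: interface A is preferred within each listener group, yet comes out behind once the groups are pooled.

```python
# Hypothetical preference counts: (wins, trials) per interface and listener group.
groups = {
    "expert": {"A": (8, 10),   "B": (70, 100)},
    "casual": {"A": (20, 100), "B": (1, 10)},
}
totals = {"A": [0, 0], "B": [0, 0]}
for group, results in groups.items():
    for ui, (wins, trials) in results.items():
        totals[ui][0] += wins
        totals[ui][1] += trials
        print(f"{group:>6} {ui}: {wins}/{trials} = {wins/trials:.0%}")
for ui, (wins, trials) in totals.items():
    print(f"pooled {ui}: {wins}/{trials} = {wins/trials:.0%}")   # the reversal
```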

There are some talks about really rigorous signal processing approaches. Jens Ahrens will present ‘Tutorial on Scaling of the Discrete Fourier Transform and the Implied Physical Units of the Spectra of Time-Discrete Signals.’ I’m excited about this because it may shed some light on a possible explanation for why we hear a difference between CD quality and very high sample rate audio formats.
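
As a taste of the scaling question, here is a minimal sketch showing that the raw DFT magnitude of a unit-amplitude sinusoid depends on the transform length until an appropriate factor is applied:

```python
import numpy as np

N, fs = 4096, 48000.0
k = 128                                       # a bin-centred frequency
f0 = k * fs / N                               # 1500 Hz for these values
x = np.sin(2 * np.pi * f0 * np.arange(N) / fs)

X = np.fft.rfft(x)
print(abs(X[k]))            # about N/2, i.e. it depends on the length
print(abs(X[k]) * 2 / N)    # about 1.0, the physical peak amplitude
```

Extending this kind of bookkeeping to power spectra, spectral densities and physical units is what the tutorial is about.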

The Constant-Q Transform represents a signal in the frequency domain, but with logarithmically spaced bins, so it is potentially very useful for audio. The last decade has seen a couple of breakthroughs that may make it far more practical. I was sitting next to Gino Velasco when he won the “best student paper” award for Velasco et al.’s “Constructing an invertible constant-Q transform with nonstationary Gabor frames.” Schörkhuber and Klapuri also made excellent contributions, mainly around implementing a fast version of the transform, culminating in a JAES paper, and the teams collaborated on a popular MATLAB toolbox. Now there’s another advance with Felix Holzmüller presenting ‘Computational efficient real-time capable constant-Q spectrum analyzer’.
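
If you just want to play with a constant-Q representation, fast implementations are already available in common toolboxes; here is a minimal sketch assuming the librosa package (nothing to do with Holzmüller’s analyzer):

```python
import numpy as np
import librosa

y, sr = librosa.load(librosa.ex("trumpet"))             # bundled example clip
C = librosa.cqt(y, sr=sr, fmin=librosa.note_to_hz("C2"),
                n_bins=84, bins_per_octave=12)          # 7 octaves, semitone bins
C_db = librosa.amplitude_to_db(np.abs(C), ref=np.max)   # magnitude in dB
print(C.shape)                                          # (n_bins, n_frames)
```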

The abstract for Dan Turner’s ‘Content matching for sound generating objects within a visual scene using a computer vision approach’ suggests that it has implications for selection of sound effect samples in immersive sound design. But I’m a big fan of procedural audio, and think this could have even higher potential for sound synthesis and generative audio systems.

And finally, there’s some really interesting talks about innovative ways to conduct audio research based on practical challenges. Nils Meyer-Kahlen presents ‘DIY Modifications for Acoustically Transparent Headphones’. The abstract for Valerian Drack’s ‘A personal, 3D printable compact spherical loudspeaker array’, also mentions its use in a DIY approach. Joan La Roda’s own experience of festival shows led to his presentation of ‘Barrier Effect at Open-air Concerts, Part 1’. Another presentation with deep insights derived from personal experience is Fabio Kaiser’s ‘Working with room acoustics as a sound engineer using active acoustics.’ And the lecturers amongst us will be very interested in Sebastian Duran’s ‘Impact of room acoustics on perceived vocal fatigue of staff-members in Higher-education environments: a pilot study.’

Remember to check the AES E-Library which will soon have all the full papers for all the presentations mentioned here, including listing all authors not just presenters. And feel free to get in touch with us. Josh Reiss (author of this blog entry), J. T. Colonel, and Angeliki Mourgela from the Audio Engineering research team within the Centre for Digital Music, will all be (virtually) there.

Awesome student projects in sound design and audio effects

I teach classes in Sound Design and Digital Audio Effects. In both classes, the final assignment involves creating an original work that involves audio programming and using concepts taught in class. But the students also have a lot of free rein to experiment and explore their own ideas. The results are always great: lots of really cool ideas, many of which could lead to a publication, or would be great to listen to regardless of the fact that they were assignments.

The last couple of years, I posted about it here and here. Here are a few of the projects from this year.

From the Sound Design class:

  • A procedural audio model of a waterfall. The code was small, involving some filtered noise sources with random gain changes, but the result was great (a rough sketch of the idea appears after this list).
  • An interactive animation of a girl writing at a desk during a storm. There were some really neat tricks to get a realistic thunder sound.
  • A procedurally generated sound scene for a walk through the countryside. The student found lots of clever ways to generate the sounds of birds, bees, a river and the whoosh of a passing car.
  • New sound design replacing the audio track in a film scene. Check it out.
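
Here is roughly what that waterfall recipe looks like in code: a minimal sketch of band-passed noise with slowly drifting random gains (my own illustration, not the student’s implementation).

```python
import numpy as np
from scipy.signal import butter, lfilter

fs, dur = 44100, 5.0
n = int(fs * dur)

def band_noise(low_hz, high_hz):
    """White noise filtered into one frequency band."""
    b, a = butter(2, [low_hz / (fs / 2), high_hz / (fs / 2)], btype="band")
    return lfilter(b, a, np.random.randn(n))

def drifting_gain(rate_hz=2.0, depth=0.5):
    """A slowly varying random gain, smoothed to avoid clicks."""
    steps = np.repeat(np.random.rand(int(dur * rate_hz) + 1), int(fs / rate_hz))[:n]
    b, a = butter(1, 1.0 / (fs / 2))
    return 1.0 - depth + depth * lfilter(b, a, steps)

waterfall = (band_noise(400, 1200) * drifting_gain()
             + 0.6 * band_noise(1500, 6000) * drifting_gain()
             + 0.3 * band_noise(80, 300) * drifting_gain())
waterfall /= np.max(np.abs(waterfall))                  # normalise for playback
```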

And from the Digital Audio Effects class:

  • I don’t need to mention anything about the next one. Just read the student’s tweet.

 

  • Rainmaker, a VST plugin that takes an incoming signal and transforms it into a ‘rain’ like sound, starting above the listener and then floating down below.

  • A plugin implementation of the Karplus-Strong algorithm, except an audio sample is used to excite the string instead of a noise burst. It gives really interesting timbral qualities (see the sketch after this list).

  • Stormify, an audio plugin that enables users to add varying levels of rain and wind to the background of their audio, making it appear that the recording took place in inclement weather.
  • An all-in-one plugin for synthesising and sculpting drum-like sounds.
  • The Binaural Phase Vocoder, a VST/AU plugin whereby users can position a virtual sound source in a 3D space and process the sound through an overlap-add phase vocoder.
  • A multiband multi-effect consisting of three frequency bands and three effects on each band: delay, distortion, and tremolo. Despite the seeming complexity, the interface was straightforward and easy to use.
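
And here is a minimal sketch of the sample-excited Karplus-Strong idea mentioned in the list above (my own illustration, not the student’s plugin):

```python
import numpy as np

def karplus_strong(excitation, fs, freq, dur=2.0, damping=0.996):
    """Plucked-string synthesis where the delay line is seeded from a sample."""
    period = int(round(fs / freq))
    out = np.zeros(int(fs * dur))
    # Seed the delay line with the start of the sample instead of white noise.
    delay = np.resize(np.asarray(excitation, dtype=float)[:period], period)
    for i in range(len(out)):
        out[i] = delay[i % period]
        nxt = delay[(i + 1) % period]
        delay[i % period] = damping * 0.5 * (out[i] + nxt)   # averaging filter
    return out / (np.max(np.abs(out)) + 1e-12)
```

Swapping the excitation changes the attack and colour of the ‘string’, which is where the interesting timbres come from.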


There were many other interesting assignments, including several sonifications of images. But this selection really shows both the talent of the students and the possibilities to create new and interesting sounds.

Funded PhD studentships available in Data-informed Audience-centric Media Engineering

So it’s been a while since I’ve written a blog post. Life, work, and of course the Covid crisis have limited my time. But hopefully I’ll write more frequently in future.

The good news is that there are fully funded PhD studentships which you or others you know might be interested in. They are all around the concept of Data-informed Audience-centric Media Engineering (DAME). See https://dame.qmul.ac.uk/ for details.

Three studentships are available. They are all fully-funded, for four years of study, based at Queen Mary University of London, and starting January 2021. Two of the proposed topics, ‘Media engineering for hearing-impaired audiences’ and ‘Intelligent systems for radio drama production’, are supported by BBC and build on prior and ongoing work by my research team.

  • Media engineering for hearing-impaired audiences: This research will explore ways in which media content can be automatically processed to deliver that content optimally for audiences with hearing loss. It builds on prior work by our group and our collaborator, the BBC, on developing effective audio mixing techniques for broadcast audio enhancement [1,2,3]. It will form a deeper understanding of the effects of hearing loss on the perception and enjoyment of media content, and use this knowledge to develop intelligent audio production techniques and applications that could improve audio quality by providing efficient and customisable compensation. It aims to advance beyond current research [4], which does not yet fully take into account the artistic intent of the material and requires an ‘ideal mix’ for normal-hearing listeners. So a new approach that both removes these constraints and is more focused on the meaning of the content is required. This approach will draw on natural language processing and audio informatics to prioritise sources and establish requirements for the preferred mix.
  • Intelligent systems for radio drama production: This research topic proposes methods for assisting a human creator in producing radio dramas. Radio drama consists of both literary aspects, such as plot, story characters and environments, and production aspects, such as speech, music and sound effects. This project builds on recent, high-impact collaboration with the BBC [3, 5] to greatly advance the understanding of radio drama production, with the goal of devising and assessing intelligent technologies to aid in its creation. The project will first investigate rules-based systems for generating production scripts from story outlines and producing draft content from such scripts. It will consider existing workflows for content production and where such approaches rely on heavy manual labour. Evaluation will be with expert content producers, with the goal of creating new technologies that streamline workflows and facilitate the creative process.

If you or anyone you know is interested, please look at https://dame.qmul.ac.uk/ . Consider applying and feel free to ask me any questions.

[1] A. Mourgela, T. Agus and J. D. Reiss, “Perceptually Motivated Hearing Loss Simulation for Audio Mixing Reference,” 147th AES Convention, 2019.

[2] L. Ward et al., “Casualty Accessible and Enhanced (A&E) Audio: Trialling Object-Based Accessible TV Audio,” 147th AES Convention, 2019.

[3] E. T. Chourdakis, L. Ward, M. Paradis and J. D. Reiss, “Modelling Experts’ Decisions on Assigning Narrative Importances of Objects in a Radio Drama Mix,” Digital Audio Effects Conference (DAFx), 2019.

[4] L. Ward and B. Shirley, “Personalization in Object-Based Audio for Accessibility: A Review of Advancements for Hearing Impaired Listeners,” Journal of the Audio Engineering Society, 67 (7/8), 584-597, 2019.

[5] E. T. Chourdakis and J. D. Reiss, “From My Pen to Your Ears: Automatic Production of Radio Plays from Unstructured Story Text,” 15th Sound and Music Computing Conference (SMC), Limassol, Cyprus, 4-7 July 2018.

Intelligent Music Production book is published


Ryan Stables is an occasional collaborator and all-around brilliant person. He started the annual Workshop on Intelligent Music Production (WIMP) in 2015. It’s been going strong ever since, with the 5th WIMP co-located with DAFx this past September. The workshop series focuses on the application of intelligent systems (including expert systems, machine learning and AI) to music recording, mixing, mastering and related aspects of audio production and sound engineering.

Ryan had the idea for a book on the subject, and I (Josh Reiss) and Brecht De Man (another all-around brilliant person) were recruited as co-authors. What resulted was a massive amount of writing, editing, refining, re-editing and so on. We all contributed big chunks of content, but Brecht pulled it all together and turned it into something really high quality: a comprehensive overview of the field, suitable for a wide range of audiences.

And the book is finally published today, October 31st! It’s part of the AES Presents series by Focal Press, a division of Routledge. You can get it from the publisher, from Amazon, or any of the other usual places.

And here’s the official blurb:

Intelligent Music Production presents the state of the art in approaches, methodologies and systems from the emerging field of automation in music mixing and mastering. This book collects the relevant works in the domain of innovation in music production, and orders them in a way that outlines the way forward: first, covering our knowledge of the music production processes; then by reviewing the methodologies in classification, data collection and perceptual evaluation; and finally by presenting recent advances on introducing intelligence in audio effects, sound engineering processes and music production interfaces.

Intelligent Music Production is a comprehensive guide, providing an introductory read for beginners, as well as a crucial reference point for experienced researchers, producers, engineers and developers.

 

Fellow of the Audio Engineering Society

The Audio Engineering Society’s Fellowship Award is given to ‘a member who has rendered conspicuous service or is recognized to have made a valuable contribution to the advancement in or dissemination of knowledge of audio engineering or in the promotion of its application in practice’.

Today at the 147th AES Convention, I was given the Fellowship Award for valuable contributions to, and for encouraging and guiding the next generation of researchers in, the development of audio and musical signal processing.

This is quite an honour, of which I’m very proud. And it puts me in some excellent company. A lot of greats have become Fellows of the AES (Manfred Schroeder, Vesa Valimaki, Poppy Crum, Bob Moog, Richard Heyser, Leslie Ann Jones, Gunther Thiele and Richard Small…), which also means I have a lot to live up to.

And thanks to the AES,

Josh Reiss

Radical and rigorous research at the upcoming Audio Engineering Society Convention


We previewed the 142nd, 143rd, 144th and 145th Audio Engineering Society (AES) Conventions, which we also followed with wrap-up discussions. Then we took a break, but now we’re back to preview the 147th AES Convention, October 16 to 19 in New York. As before, the Audio Engineering research team here aim to be quite active at the convention.

We’ve gathered together some information about a lot of the research-oriented events that caught our eye as being unusual, exceptionally high quality, events we’re involved in or attending, or just worth mentioning. And this Convention will certainly live up to the hype.

Wednesday October 16th

When I first read the title of the paper ‘Evaluation of Multichannel Audio in Automobiles versus Mobile Phones’, presented at 10:30, I thought it was a comparison of multichannel automotive audio versus the tinny, quiet mono or barely-stereo sound from a phone. But it’s actually comparing the results of a listening test for stereo vs multichannel in a car with the results of a listening test for stereo vs multichannel of the same audio, but played from a phone and rendered over headphones. And the results look quite interesting.

Deep neural networks are all the rage. We’ve been using DNNs to profile a wide variety of audio effects. Scott Hawley will be presenting some impressive related work at 9:30, ‘Profiling Audio Compressors with Deep Neural Networks.’

We previously presented work on digital filters that closely match their analog equivalents. We pointed out that such filters can have cut-off frequencies beyond Nyquist, but did not explore that aspect. ‘Digital Parametric Filters Beyond Nyquist Frequency‘, at 10 am, investigates this idea in depth.

I like a bit of high quality mathematical theory, and that’s what you get in Tamara Smyth’s 11:30 paper ‘On the Similarity between Feedback/Loopback Amplitude and Frequency Modulation‘, which shows a rather surprising (at least at first glance) equivalence between two types of feedback modulation.

There’s an interesting paper at 2pm, ‘What’s Old Is New Again: Using a Physical Scale Model Echo Chamber as a Real-Time Reverberator‘, where reverb is simulated not with impulse response recordings, or classic algorithms, but using scaled models of echo chambers.

At 4 o’clock, ‘A Comparison of Test Methodologies to Personalize Headphone Sound Quality‘ promises to offer great insights not just for headphones, but into subjective evaluation of audio in general.

There are so many deep learning papers, but the 3-4:30 poster ‘Modal Representations for Audio Deep Learning’ stands out from the pack. Deep learning for audio most often works with raw spectrogram data. But this work proposes learning modal filterbank coefficients directly, and the authors find it gives strong results for classification and generative tasks. Also in that session, ‘Analysis of the Sound Emitted by Honey Bees in a Beehive’ promises to be an interesting and unusual piece of work. We talked about their preliminary results in a previous entry, but now they’ve used some rigorous audio analysis to make deep and meaningful conclusions about bee behaviour.

Immerse yourself in the world of virtual and augmented reality audio technology today, with some amazing workshops, like Music Production in VR and AR, Interactive AR Audio Using Spark, Music Production in Immersive Formats, ISSP: Immersive Sound System Panning, and Real-time Mixing and Monitoring Best Practices for Virtual, Mixed, and Augmented Reality. See the Calendar for full details.

Thursday, October 17th

‘An Automated Approach to the Application of Reverberation’, at 9:30, is the first of several papers from our team, and essentially does for algorithmic reverb something similar to what ‘Parameter Automation in a Dynamic Range Compressor’ did for the dynamic range compressor.

Why do public address (PA) systems for large venues sound so terrible? There are actually regulations for speech intelligibility, but these are only measured in empty stadiums. At 11 am, ‘The Effects of Spectators on the Speech Intelligibility Performance of Sound Systems in Stadia and Other Large Venues’ looks at the real-world challenges when the venue is occupied.

Two highlights of the 9-10:30 poster session stand out. ‘Analyzing Loudness Aspects of 4.2 Million Musical Albums in Search of an Optimal Loudness Target for Music Streaming’ is interesting, not just for the results, applications and research questions, but also for the fact that it involved 4.2 million albums. Wow! And there’s a lot more to audio engineering research than one might think: how about using acoustic sensors to enhance autonomous driving systems, which is a core application of ‘Audio Data Augmentation for Road Objects Classification’?

Audio forensics is a fascinating world, where audio engineering is often applied in unusual but crucial situations. One such situation is explored at 2:15 in ‘Forensic Comparison of Simultaneous Recordings of Gunshots at a Crime Scene’, which involves looking at several high-profile, real-world examples.

Friday, October 18th

There are two papers looking at new interfaces for virtual reality and immersive audio mixing, ‘Physical Controllers vs. Hand-and-Gesture Tracking: Control Scheme Evaluation for VR Audio Mixing‘ at 10:30, and ‘Exploratory Research into the Suitability of Various 3D Input Devices for an Immersive Mixing Task‘ at 3:15.

At 9:15, J. T. Colonel from our group looks into the features that relate, or don’t relate, to preference for multitrack mixes in ‘Exploring Preference for Multitrack Mixes Using Statistical Analysis of MIR and Textual Features‘, with some interesting results that invalidate some previous research. But don’t let negative results discourage ambitious approaches to intelligent mixing systems, like Dave Moffat’s (also from here) ‘Machine Learning Multitrack Gain Mixing of Drums‘, which follows at 9:30.

Continuing this theme of mixing analysis and automation is the poster ‘A Case Study of Cultural Influences on Mixing Preference—Targeting Japanese Acoustic Major Students‘, shown from 3:30-5, which does a bit of meta-analysis by merging their data with that of other studies.

Just below, I mention the need for multitrack audio data sets. Closely related, and also much needed, is this work on ‘A Dataset of High-Quality Object-Based Productions‘, also in the 3:30-5 poster session.

Saturday, October 19th

We’re approaching a world where almost every surface can be a visual display. Imagine if every surface could be a loudspeaker too. Such is the potential of metamaterials, discussed in ‘Acoustic Metamaterial in Loudspeaker Systems Design‘ at 10:45.

Another session, 9 to 11:30, has lots of interesting presentations about music production best practices. At 9, Amandine Pras presents ‘Production Processes of Pop Music Arrangers in Bamako, Mali’. I doubt there will be many people at the convention who’ve thought about how production is done there, but I’m sure there will be lots of fascinating insights. This is followed at 9:30 by ‘Towards a Pedagogy of Multitrack Audio Resources for Sound Recording Education’. We’ve published a few papers on multitrack audio collections, sorely needed for researchers and educators, so it’s good to see more advances.

I always appreciate filling the gaps in my knowledge. And though I know a lot about sound enhancement, I’ve never dived into how it’s done and how effective it is in soundbars, now widely used in home entertainment. So I’m looking forward to the poster ‘A Qualitative Investigation of Soundbar Theory’, shown 10:30-12. From the title and abstract though, this feels like it might work better as an oral presentation. Also in that session, the poster ‘Sound Design and Reproduction Techniques for Co-Located Narrative VR Experiences’ deserves special mention, since it won the Convention’s Best Peer-Reviewed Paper Award and promises to be an important contribution to the growing field of immersive audio.

It’s wonderful to see research make it into ‘product’, and ‘Casualty Accessible and Enhanced (A&E) Audio: Trialling Object-Based Accessible TV Audio’, presented at 3:45, is a great example. Here, new technology to enhance broadcast audio for those with hearing loss was trialled for a popular BBC drama, Casualty. This is of extra interest to me since one of the researchers here, Angeliki Mourgela, does related research, also in collaboration with the BBC. And one of my neighbours is an actress who appears on that TV show.

I encourage the project students working with me to aim for publishable research. Jorge Zuniga’s ‘Realistic Procedural Sound Synthesis of Bird Song Using Particle Swarm Optimization’, presented at 2:30, is a stellar example. He created a machine learning system that uses bird sound recordings to find settings for a procedural audio model. It’s a great improvement over other methods, and opens up a whole field of machine learning applied to sound synthesis.

At 3 o’clock in the same session is another paper from our team, Angeliki Mourgela presenting ‘Perceptually Motivated Hearing Loss Simulation for Audio Mixing Reference‘. Roughly 1 in 6 people suffer from some form of hearing loss, yet amazingly, sound engineers don’t know what the content will sound like to them. Wouldn’t it be great if the engineer could quickly audition any content as it would sound to hearing impaired listeners? That’s the aim of this research.

About three years ago, I published a meta-analysis on perception of high resolution audio, which received considerable attention. But almost all prior studies dealt with music content, and there are good reasons to consider more controlled stimuli too (noise, tones, etc.). The poster ‘Discrimination of High-Resolution Audio without Music’ does just that. Similarly, the perceptual aspects of dynamic range compression are an oft-debated topic, for which we have performed listening tests, and this is rigorously investigated in ‘Just Noticeable Difference for Dynamic Range Compression via “Limiting” of a Stereophonic Mix’. Both posters are in the 3-4:30 session.

The full program can be explored on the Convention Calendar or the Convention website. Come say hi to us if you’re there! Josh Reiss (author of this blog entry), J. T. Colonel, Angeliki Mourgela and Dave Moffat from the Audio Engineering research team within the Centre for Digital Music, will all be there.

Congratulations, Dr. Will Wilkinson

This afternoon one of our PhD student researchers, Will Wilkinson, successfully defended his PhD. The form of these exams, or vivas, varies from country to country, and even institution to institution, which we discussed previously. Here, it’s pretty gruelling: behind closed doors, with two expert examiners probing every aspect of the PhD.
Will’s PhD was on ‘Gaussian Process Modelling for Audio Signals.’

Audio signals are characterised and perceived based on how their spectral make-up changes with time. Latent force modelling assumes these characteristics come about as a result of a common input function passing through some input-output process. Uncovering the behaviour of these hidden spectral components is at the heart of many applications involving sound, but is an extremely difficult task given the infinite number of ways any signal can be decomposed.

Will’s thesis studies the application of Gaussian processes to audio, which offer a way to specify probabilities for these functions whilst encoding prior knowledge about sound, the way it behaves, and the way it is perceived. Will advanced the theory considerably, and tested his approach for applications in sound synthesis, denoising and source separation tasks, among others.
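
As a toy illustration of the basic idea (using scikit-learn’s off-the-shelf GP regression, nowhere near the scale or sophistication of Will’s models): place a Gaussian process prior over a signal and recover a smooth estimate, with uncertainty, from noisy samples.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

t = np.linspace(0, 1, 200)[:, None]                 # time axis
clean = np.sin(2 * np.pi * 4 * t).ravel()           # "true" underlying signal
noisy = clean + 0.3 * np.random.randn(len(t))       # noisy observations

kernel = 1.0 * RBF(length_scale=0.05) + WhiteKernel(noise_level=0.1)
gp = GaussianProcessRegressor(kernel=kernel).fit(t, noisy)
denoised, std = gp.predict(t, return_std=True)      # posterior mean and std
```

Will’s work replaces the generic kernel with models that encode how real sounds behave, and scales the inference up to practical audio tasks.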

http://c4dm.eecs.qmul.ac.uk/audioengineering/latent-force-synthesis/ demonstrates some of his research applied to sound synthesis, and https://fxive.com/app/main-panel/Mammals.html is a real-time demonstration of his Master’s work on sound synthesis for mammalian vocalisations.

Here’s a list of all Will’s papers while a member of the Intelligent Sound Engineering team and the Machine Listening Lab.

Back end developer needed for sound synthesis start-up


FXive (fxive.com) is a real-time sound effect synthesis framework in the browser, spun-out from research developed at Queen Mary University of London by the team behind this blog. It is currently front-end only. We’d like to subcontract a backend developer to implement:

  • Sign-up, log-in and subscription system
  • Payment system for subscriptions (offering unlimited sound downloads) and for purchasing sounds individually

Additional functionalities can be discussed.

If you’re interested or know a web developer who might be interested, please get in touch with us at fxiveteam@gmail.com.

You can check out some of the sound effect synthesis models used in FXive in previous blog entries.

 

@c4dm @QMUL #backend #webdeveloper #nodejs

 

Nonlinear Audio Effects at ICASSP 2019

The Audio Engineering research team within the Centre for Digital Music is going to be present at ICASSP 2019.

Marco Martínez is presenting the paper ‘Modeling Nonlinear Audio Effects With End-to-end Deep Neural Networks‘, which can be found here.

Basically, nonlinear audio effects are widely used by musicians and sound engineers, yet most existing methods for nonlinear modeling are either simplified or optimized to a very specific circuit. In this work, we introduce a general-purpose deep learning architecture for generic black-box modeling of nonlinear and linear audio effects.

We show the model performing nonlinear modeling for distortion, overdrive, amplifier emulation, and combinations of linear and nonlinear audio effects.
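
For readers unfamiliar with these effects, the simplest example of such a nonlinearity is a memoryless waveshaper like the one below (a generic illustration; the network in the paper learns much richer behaviour, including memory effects, directly from input/output audio).

```python
import numpy as np

def overdrive(x, drive=5.0):
    """Soft-clipping waveshaper: higher drive gives more saturation."""
    return np.tanh(drive * x) / np.tanh(drive)
```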


You can listen to some audio samples here.

Details about the presentation:

Session: AASP-L6: Music Signal Analysis, Processing and Synthesis
Location: Meeting Room 1
Time: Thursday, May 16, 09:20 – 09:40 (Approximate)

Title: Modeling Nonlinear Audio Effects With End-to-end Deep Neural Networks
Authors: Marco A. Martinez Ramirez, Joshua D. Reiss