Venturous Views on Virtual Vienna – a preview of AES 148

#VirtualVienna

We try to write a preview of the technical track for almost all recent Audio Engineering Society (AES) Conventions; see our entries on the 142nd, 143rd, 144th, 145th and 147th Conventions. But this 148th Convention is very different.

It is, of course, an online event. The Convention planning committee have put huge effort into putting it all online and making it a really engaging and exciting experience (and into massively reducing costs). There will be a mix of live-streams, breakout sessions, interactive chat rooms and so on. But the technical papers will mostly be on-demand viewing, with Q&A and online dialog with the authors. This is great in the sense that you can view it and interact with authors any time, but it means that it’s easy to overlook really interesting work.

So we’ve gathered together some information about a lot of the presented research that caught our eye as being unusual, exceptionally high quality, or just worth mentioning. And every paper mentioned here will appear soon in the AES E-Library, by the way. Currently though, you can browse all the abstracts by searching the full papers and engineering briefs on the Convention website.

Deep learning and neural networks are all the rage in machine learning nowadays. A few contributions to the field will be presented by Eugenio Donati with ‘Prediction of hearing loss through application of Deep Neural Network’, Simon Plain with ‘Pruning of an Audio Enhancing Deep Generative Neural Network’, Giovanni Pepe’s presentation of ‘Generative Adversarial Networks for Audio Equalization: an evaluation study’, Yiwen Wang presenting ‘Direction of arrival estimation based on transfer function learning using autoencoder network’, and the author of this post, Josh Reiss, who will present work done mainly by sound designer/researcher Guillermo Peters, ‘A deep learning approach to sound classification for film audio post-production’. Related to this, check out the Workshop on ‘Deep Learning for Audio Applications – Engineering Best Practices for Data’, run by Gabriele Bunkheila of MathWorks (Matlab), which will be live-streamed on Friday.

There’s enough work being presented on spatial audio that there could be a whole conference on the subject within the convention. A lot of that is in Keynotes, Workshops, Tutorials, and the Heyser Memorial Lecture by Francis Rumsey. But a few papers in the area really stood out for me. Toru Kamekawa investigated a big question with ‘Are full-range loudspeakers necessary for the top layer of 3D audio?’ Marcel Nophut’s ‘Multichannel Acoustic Echo Cancellation for Ambisonics-based Immersive Distributed Performances’ has me intrigued because I know a bit about echo cancellation and a bit about ambisonics, but have no idea how to do the former for the latter.

And I’m intrigued by ‘Creating virtual height loudspeakers using VHAP’, presented by Kacper Borzym. I’ve never heard of VHAP, but the original VBAP paper is the most highly cited paper in the Journal of the AES (1367 citations at the time of writing this).

How good are you at understanding speech from native speakers? How about when there’s a lot of noise in the background? Do you think you’re as good as a computer? Gain some insight into related research when viewing the presentation by Eugenio Donati on ‘Comparing speech identification under degraded acoustic conditions between native and non-native English speakers’.

There are a few papers exploring creative works, all of which look interesting and have great titles. David Poirier-Quinot will present ‘Emily’s World: behind the scenes of a binaural synthesis production’. Music technology has a fascinating history. Michael J. Murphy will explore the beginning of a revolution with ‘Reimagining Robb: The Sound of the World’s First Sample-based Electronic Musical Instrument circa 1927’. And if you’re into Scandinavian instrumental rock music (and who isn’t?), Zachary Bresler’s presentation of ‘Music and Space: A case of live immersive music performance with the Norwegian post-rock band Spurv’ is a must.


Frank Morse Robb, inventor of the first sample-based electronic musical instrument.

But sound creation comes first, and new technologies are emerging to do it. Damian T. Dziwis will present ‘Body-controlled sound field manipulation as a performance practice’. And particularly relevant given the worldwide isolation going on is ‘Quality of Musicians’ Experience in Network Music Performance: A Subjective Evaluation,’ presented by Konstantinos Tsioutas.

Portraiture looks at how to represent or capture the essence and rich details of a person. Maree Sheehan explores how this is achieved sonically, focusing on Maori women, in an intriguing presentation on ‘Audio portraiture sound design – the development and creation of audio portraiture within immersive and binaural audio environments.’

We talked about exciting research on metamaterials for headphones and loudspeakers when giving previews of previous AES Conventions, and there’s another development in this area, presented by Sebastien Degraeve in ‘Metamaterial Absorber for Loudspeaker Enclosures’.

Paul Ferguson and colleagues look set to break some speed records, but any such feats require careful testing first, as in ‘Trans-Europe Express Audio: testing 1000 mile low-latency uncompressed audio between Edinburgh and Berlin using GPS-derived word clock’.

Our own research has focused a lot on intelligent music production, and especially automatic mixing. A novel contribution to the field, and a fresh perspective, is given in Nyssim Lefford’s presentation of ‘Mixing with Intelligent Mixing Systems: Evolving Practices and Lessons from Computer Assisted Design’.

Subjective evaluation, usually in the form of listening tests, is the primary form of testing audio engineering theory and technology. As Feynman said, ‘if it disagrees with experiment, it’s wrong!’

And thus, there are quite a few top-notch research presentations focused on experiments with listeners. Minh Voong looks at an interesting aspect of bone conduction with ‘Influence of individual HRTF preference on localization accuracy – a comparison between regular and bone conducting headphones’. Realistic reverb in games is incredibly challenging because characters are always moving, so Zoran Cvetkovic tackles this with ‘Perceptual Evaluation of Artificial Reverberation Methods for Computer Games.’ The abstract for Lawrence Pardoe’s ‘Investigating user interface preferences for controlling background-foreground balance on connected TVs’ suggests that there’s more than one answer to that preference question. That highlights the need to look deeply into the data, not just at the mean and standard deviation; otherwise one risks pitfalls like Simpson’s Paradox. And finally, Peter Critchell will present ‘A new approach to predicting listener’s preference based on acoustical parameters,’ which addresses the need to accurately simulate and understand listening test results.

There are some talks about really rigorous signal processing approaches. Jens Ahrens will present ‘Tutorial on Scaling of the Discrete Fourier Transform and the Implied Physical Units of the Spectra of Time-Discrete Signals.’ I’m excited about this because it may shed some light on a possible explanation for why we hear a difference between CD quality and very high sample rate audio formats.

The Constant-Q Transform represents a signal in the frequency domain, but with logarithmically spaced bins, which makes it potentially very useful for audio. The last decade has seen a couple of breakthroughs that may make it far more practical. I was sitting next to Gino Velasco when he won the “best student paper” award for Velasco et al.’s “Constructing an invertible constant-Q transform with nonstationary Gabor frames.” Schörkhuber and Klapuri also made excellent contributions, mainly around implementing a fast version of the transform, culminating in a JAES paper, and the teams collaborated on a popular Matlab toolbox. Now there’s another advance with Felix Holzmüller presenting ‘Computational efficient real-time capable constant-Q spectrum analyzer’.
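If you haven’t met it before, here’s a minimal sketch of what ‘logarithmically spaced bins’ means in practice, assuming the librosa Python library is available (the file path and analysis settings are placeholders, not anything from the papers above):

```python
import numpy as np
import librosa

# Placeholder path: any mono audio file will do
y, sr = librosa.load("example.wav", sr=None)

# Constant-Q transform: 84 bins at 12 per octave = 7 octaves above C1
C = np.abs(librosa.cqt(y, sr=sr, fmin=librosa.note_to_hz("C1"),
                       n_bins=84, bins_per_octave=12))

# The bin centre frequencies are geometrically spaced: each bin is 2**(1/12)
# times the previous one, unlike the linear spacing of an ordinary DFT
freqs = librosa.cqt_frequencies(n_bins=84, fmin=librosa.note_to_hz("C1"),
                                bins_per_octave=12)
print(freqs[:4])
print(freqs[1:5] / freqs[:4])   # constant ratio of 2**(1/12) ~= 1.0595
```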

The abstract for Dan Turner’s ‘Content matching for sound generating objects within a visual scene using a computer vision approach’ suggests that it has implications for selection of sound effect samples in immersive sound design. But I’m a big fan of procedural audio, and think this could have even higher potential for sound synthesis and generative audio systems.

And finally, there’s some really interesting talks about innovative ways to conduct audio research based on practical challenges. Nils Meyer-Kahlen presents ‘DIY Modifications for Acoustically Transparent Headphones’. The abstract for Valerian Drack’s ‘A personal, 3D printable compact spherical loudspeaker array’, also mentions its use in a DIY approach. Joan La Roda’s own experience of festival shows led to his presentation of ‘Barrier Effect at Open-air Concerts, Part 1’. Another presentation with deep insights derived from personal experience is Fabio Kaiser’s ‘Working with room acoustics as a sound engineer using active acoustics.’ And the lecturers amongst us will be very interested in Sebastian Duran’s ‘Impact of room acoustics on perceived vocal fatigue of staff-members in Higher-education environments: a pilot study.’

Remember to check the AES E-Library which will soon have all the full papers for all the presentations mentioned here, including listing all authors not just presenters. And feel free to get in touch with us. Josh Reiss (author of this blog entry), J. T. Colonel, and Angeliki Mourgela from the Audio Engineering research team within the Centre for Digital Music, will all be (virtually) there.

Intelligent Music Production book is published


Ryan Stables is an occasional collaborator and all around brilliant person. He started the annual Workshop on Intelligent Music Production (WIMP) in 2015. It’s been going strong ever since, with the 5th WIMP co-located with DAFx this past September. The workshop series focuses on the application of intelligent systems (including expert systems, machine learning and AI) to music recording, mixing, mastering and related aspects of audio production or sound engineering.

Ryan had the idea for a book about the subject, and myself (Josh Reiss) and Brecht De Man (another all around brilliant person) were recruited as co-authors. What resulted was a massive amount of writing, editing, refining, re-editing and so on. We all contributed big chunks of content, but Brecht pulled it all together and turned it into something really high quality, giving a comprehensive overview of the field suitable for a wide range of audiences.

And the book is finally published today, October 31st! It’s part of the AES Presents series by Focal Press, a division of Routledge. You can get it from the publisher, from Amazon or any of the other usual places.

And here’s the official blurb:

Intelligent Music Production presents the state of the art in approaches, methodologies and systems from the emerging field of automation in music mixing and mastering. This book collects the relevant works in the domain of innovation in music production, and orders them in a way that outlines the way forward: first, covering our knowledge of the music production processes; then by reviewing the methodologies in classification, data collection and perceptual evaluation; and finally by presenting recent advances on introducing intelligence in audio effects, sound engineering processes and music production interfaces.

Intelligent Music Production is a comprehensive guide, providing an introductory read for beginners, as well as a crucial reference point for experienced researchers, producers, engineers and developers.

 

Fellow of the Audio Engineering Society

The Audio Engineering Society’s Fellowship Award is given to ‘a member who has rendered conspicuous service or is recognized to have made a valuable contribution to the advancement in or dissemination of knowledge of audio engineering or in the promotion of its application in practice’.

Today at the 147th AES Convention, I was given the Fellowship Award for valuable contributions to, and for encouraging and guiding the next generation of researchers in, the development of audio and musical signal processing.

This is quite an honour, of which I’m very proud. And it puts me in some excellent company. A lot of greats have become Fellows of the AES (Manfred Schroeder, Vesa Valimaki, Poppy Crum, Bob Moog, Richard Heyser, Leslie Ann Jones, Gunther Thiele and Richard Small…) which also means I have a lot to live up to.

And thanks to the AES,

Josh Reiss

Radical and rigorous research at the upcoming Audio Engineering Society Convention


We previewed the 142nd, 143rd, 144th and 145th Audio Engineering Society (AES) Conventions, which we also followed with wrap-up discussions. Then we took a break, but now we’re back to preview the 147th AES Convention, October 16 to 19 in New York. As before, the Audio Engineering research team here aim to be quite active at the convention.

We’ve gathered together some information about a lot of the research-oriented events that caught our eye as being unusual, exceptionally high quality, something we’re involved in or attending, or just worth mentioning. And this Convention will certainly live up to the hype.

Wednesday October 16th

When I first read the title of the paper ‘Evaluation of Multichannel Audio in Automobiles versus Mobile Phones‘, presented at 10:30, I thought it was a comparison of multichannel automotive audio versus the tinny, quiet mono or barely stereo from a phone. But it’s actually comparing results of a listening test for stereo vs multichannel in a car with results of a listening test for stereo vs multichannel for the same audio, but played from a phone and rendered over headphones. And the results look quite interesting.

Deep neural networks are all the rage. We’ve been using DNNs to profile a wide variety of audio effects. Scott Hawley will be presenting some impressive related work at 9:30, ‘Profiling Audio Compressors with Deep Neural Networks.’

We previously presented work on digital filters that closely match their analog equivalents. We pointed out that such filters can have cut-off frequencies beyond Nyquist, but did not explore that aspect. ‘Digital Parametric Filters Beyond Nyquist Frequency‘, at 10 am, investigates this idea in depth.

I like a bit of high quality mathematical theory, and that’s what you get in Tamara Smyth’s 11:30 paper ‘On the Similarity between Feedback/Loopback Amplitude and Frequency Modulation‘, which shows a rather surprising (at least at first glance) equivalence between two types of feedback modulation.

There’s an interesting paper at 2pm, ‘What’s Old Is New Again: Using a Physical Scale Model Echo Chamber as a Real-Time Reverberator‘, where reverb is simulated not with impulse response recordings, or classic algorithms, but using scaled models of echo chambers.

At 4 o’clock, ‘A Comparison of Test Methodologies to Personalize Headphone Sound Quality‘ promises to offer great insights not just for headphones, but into subjective evaluation of audio in general.

There are so many deep learning papers, but the 3-4:30 poster ‘Modal Representations for Audio Deep Learning‘ stands out from the pack. Deep learning for audio most often works with raw spectrogram data. But this work proposes learning modal filterbank coefficients directly, and they find it gives strong results for classification and generative tasks. Also in that session, ‘Analysis of the Sound Emitted by Honey Bees in a Beehive‘ promises to be an interesting and unusual piece of work. We talked about their preliminary results in a previous entry, but now they’ve used some rigorous audio analysis to make deep and meaningful conclusions about bee behaviour.

Immerse yourself in the world of virtual and augmented reality audio technology today, with some amazing workshops, like Music Production in VR and AR, Interactive AR Audio Using Spark, Music Production in Immersive Formats, ISSP: Immersive Sound System Panning, and Real-time Mixing and Monitoring Best Practices for Virtual, Mixed, and Augmented Reality. See the Calendar for full details.

Thursday, October 17th

‘An Automated Approach to the Application of Reverberation’, at 9:30, is the first of several papers from our team, and essentially does for algorithmic reverb what “Parameter Automation in a Dynamic Range Compressor” did for a dynamic range compressor.

Why do public address (PA) systems for large venues sound so terrible? There are actually regulations for speech intelligibility. But this is only measured in empty stadiums. At 11 am, ‘The Effects of Spectators on the Speech Intelligibility Performance of Sound Systems in Stadia and Other Large Venues‘ looks at the real world challenges when the venue is occupied.

There are two highlights of the 9-10:30 poster session. ‘Analyzing Loudness Aspects of 4.2 Million Musical Albums in Search of an Optimal Loudness Target for Music Streaming‘ is interesting, not just for the results, applications and research questions, but also for the fact that it involved 4.2 million albums. Wow! And there’s a lot more to audio engineering research than one might think. How about using acoustic sensors to enhance autonomous driving systems? That’s a core application of ‘Audio Data Augmentation for Road Objects Classification‘.

Audio forensics is a fascinating world, where audio engineering is often applied in unusual but crucial situations. One such situation is explored at 2:15 in ‘Forensic Comparison of Simultaneous Recordings of Gunshots at a Crime Scene‘, which involves looking at several high profile, real world examples.

Friday, October 18th

There are two papers looking at new interfaces for virtual reality and immersive audio mixing, ‘Physical Controllers vs. Hand-and-Gesture Tracking: Control Scheme Evaluation for VR Audio Mixing‘ at 10:30, and ‘Exploratory Research into the Suitability of Various 3D Input Devices for an Immersive Mixing Task‘ at 3:15.

At 9:15, J. T. Colonel from our group looks into the features that relate, or don’t relate, to preference for multitrack mixes in ‘Exploring Preference for Multitrack Mixes Using Statistical Analysis of MIR and Textual Features‘, with some interesting results that invalidate some previous research. But don’t let negative results discourage ambitious approaches to intelligent mixing systems, like Dave Moffat’s (also from here) ‘Machine Learning Multitrack Gain Mixing of Drums‘, which follows at 9:30.

Continuing this theme of mixing analysis and automation is the poster ‘A Case Study of Cultural Influences on Mixing Preference—Targeting Japanese Acoustic Major Students‘, shown from 3:30-5, which does a bit of meta-analysis by merging their data with that of other studies.

Just below, I mention the need for multitrack audio data sets. Closely related, and also much needed, is this work on ‘A Dataset of High-Quality Object-Based Productions‘, also in the 3:30-5 poster session.

Saturday, October 19th

We’re approaching a world where almost every surface can be a visual display. Imagine if every surface could be a loudspeaker too. Such is the potential of metamaterials, discussed in ‘Acoustic Metamaterial in Loudspeaker Systems Design‘ at 10:45.

Another session, 9 to 11:30, has lots of interesting presentations about music production best practices. At 9, Amandine Pras presents ‘Production Processes of Pop Music Arrangers in Bamako, Mali‘. I doubt there will be many people at the convention who’ve thought about how production is done there, but I’m sure there will be lots of fascinating insights. This is followed at 9:30 by ‘Towards a Pedagogy of Multitrack Audio Resources for Sound Recording Education‘. We’ve published a few papers on multitrack audio collections, sorely needed for researchers and educators, so it’s good to see more advances.

I always appreciate filling the gaps in my knowledge. And though I know a lot about sound enhancement, I’ve never dived into how it’s done and how effective it is in soundbars, now widely used in home entertainment. So I’m looking forward to the poster ‘A Qualitative Investigation of Soundbar Theory‘, shown 10:30-12. From the title and abstract though, this feels like it might work better as an oral presentation. Also in that session, the poster ‘Sound Design and Reproduction Techniques for Co-Located Narrative VR Experiences‘ deserves special mention, since it won the Convention’s Best Peer-Reviewed Paper Award, and promises to be an important contribution to the growing field of immersive audio.

It’s wonderful to see research make it into ‘product’, and ‘Casualty Accessible and Enhanced (A&E) Audio: Trialling Object-Based Accessible TV Audio‘, presented at 3:45, is a great example. Here, new technology to enhance broadcast audio for those with hearing loss was trialled for a popular BBC drama, Casualty. This is of extra interest to me since one of the researchers here, Angeliki Mourgela, does related research, also in collaboration with the BBC. And one of my neighbours is an actress who appears on that TV show.

I encourage the project students working with me to aim for publishable research. Jorge Zuniga’s ‘Realistic Procedural Sound Synthesis of Bird Song Using Particle Swarm Optimization‘, presented at 2:30, is a stellar example. He created a machine learning system that uses bird sound recordings to find settings for a procedural audio model. It’s a great improvement over other methods, and opens up a whole field of machine learning applied to sound synthesis.

At 3 o’clock in the same session is another paper from our team, Angeliki Mourgela presenting ‘Perceptually Motivated Hearing Loss Simulation for Audio Mixing Reference‘. Roughly 1 in 6 people suffer from some form of hearing loss, yet amazingly, sound engineers don’t know what the content will sound like to them. Wouldn’t it be great if the engineer could quickly audition any content as it would sound to hearing impaired listeners? That’s the aim of this research.

About three years ago, I published a meta-analysis on perception of high resolution audio, which received considerable attention. But almost all prior studies dealt with music content, and there are good reasons to consider more controlled stimuli too (noise, tones, etc). The poster ‘Discrimination of High-Resolution Audio without Music‘ does just that. Similarly, the perception of dynamic range compression is an oft-debated topic, for which we have performed listening tests, and this is rigorously investigated in ‘Just Noticeable Difference for Dynamic Range Compression via “Limiting” of a Stereophonic Mix‘. Both posters are in the 3-4:30 session.

The full program can be explored on the Convention Calendar or the Convention website. Come say hi to us if you’re there! Josh Reiss (author of this blog entry), J. T. Colonel, Angeliki Mourgela and Dave Moffat from the Audio Engineering research team within the Centre for Digital Music, will all be there.

Sneak preview of the research to be unveiled at the 145th Audio Engineering Society


We’ve made it a tradition on this blog to preview the technical program at the Audio Engineering Society Conventions, as we did with the 142nd, 143rd, and 144th AES Conventions. The 145th AES Convention is just around the corner, October 17 to 20 in New York. As before, the Audio Engineering research team behind this blog will be quite active at the convention.

These conventions have thousands of attendees, but aren’t so large that you get lost or overwhelmed. Away from the main exhibition hall is the Technical Program, which includes plenty of tutorials and presentations on cutting edge research.

So we’ve gathered together some information about a lot of the events that caught our eye as being unusual, exceptionally high quality, something we’re involved in or attending, or just worth mentioning. And this Convention will certainly live up to the hype. Plus, it’s a special one, the 70th anniversary of the founding of the AES.

By the way, I don’t think I mention a single loudspeaker paper below, but the Technical Program is full of them this time. You could assemble a full conference just from the loudspeaker papers. If you want to become an expert on loudspeaker research, this is the place to be.

Anyway, let’s dive right in.

Wednesday, October 17th

We know different cultures listen to music differently, but do they listen to audio coding artifacts differently? Find out at 9:30 when Sascha Disch and co-authors present On the Influence of Cultural Differences on the Perception of Audio Coding Artifacts in Music.

ABX, AB, MUSHRA… so many choices for subjective evaluation and listening tests, so little time. Which one to use, and which one gives the strongest results? Let’s put them all to the test while looking at the same question. This is what was done in Investigation into the Effects of Subjective Test Interface Choice on the Validity of Results, presented at 11:30. The results are strong, and surprising. Authors include former members of the team behind this blog, Nick Jillings and Brecht de Man, myself and frequent collaborator Ryan Stables.

From 10-11:30, Steve Fenton will be presenting the poster Automatic Mixing of Multitrack Material Using Modified Loudness Models. Automatic mixing is a really hot research area, one where we’ve made quite a few contributions. And a lot of it has involved loudness models for level balancing or fader settings. Someone really should do a review of all the papers focused on that, or better yet, a meta-analysis. Dr. Fenton and co-authors also have another poster in the same session, about a Real-Time System for the Measurement of Perceived Punch. Fenton’s PhD was about perception and modelling of punchiness in audio, and I suggested to him that the thesis should have just been titled ‘Punch!’

The researchers from Harman continue their analysis of headphone preference and quality with A Survey and Analysis of Consumer and Professional Headphones Based on Their Objective and Subjective Performances at 3:30. Harman obviously have a strong interest in this, but it’s rigorous, high quality research, not promotion.

In the 3:00 to 4:30 poster session, Daniel Johnston presents a wonderful spatial audio application, SoundFields: A Mixed Reality Spatial Audio Game for Children with Autism Spectrum Disorder. I’m pretty sure this isn’t the quirky lo-fi singer/songwriter Daniel Johnston.

Thursday, October 18th

There’s something bizarre about the EBU R128 / ITU-R BS.1770 specification for loudness measurements. It doesn’t give the filter coefficients as a function of sample rate. So, for this and other reasons, even though the actual specification is just a few lines of code, you have to reverse engineer it if you’re doing it yourself, as was done here. At 10 am, Brecht de Man presents Evaluation of Implementations of the EBU R128 Loudness Measurement, which looks carefully at different implementations and provides full implementations in several programming languages.
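For context on how compact the user-facing part of a loudness measurement can be, here’s a sketch using pyloudnorm, one of several open-source implementations (the file name is a placeholder; the K-weighting filters and gating from BS.1770 are all hidden inside the library):

```python
import soundfile as sf
import pyloudnorm as pyln

# Placeholder path: any WAV/FLAC file
data, rate = sf.read("mix.wav")             # float samples, shape (n,) or (n, channels)

meter = pyln.Meter(rate)                    # K-weighting and gating per ITU-R BS.1770
loudness = meter.integrated_loudness(data)  # integrated loudness in LUFS
print(f"Integrated loudness: {loudness:.1f} LUFS")
```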

Roughly one in six people in developed countries suffer some hearing impairment. If you think that seems too high, think how many wear glasses or contact lenses or had eye surgery. And given the sound exposure, I’d expect the average to be higher with music producers. But we need good data on this. Thus, Laura Sinnott’s 3 pm presentation on Risk of Sound-Induced Hearing Disorders for Audio Post Production Engineers: A Preliminary Study is particularly relevant.

Some interesting posters in the 2:45 to 4:15 session. Maree Sheehan’s Audio Portraiture – The Sound of Identity, an Indigenous Artistic Enquiry uses 3D immersive and binaural sound to create audio portraits of Maori women. It’s a wonderful use of state of the art audio technologies for cultural and artistic study. Researchers from the University of Alcala in Madrid present an improved method to detect anger in speech in Precision Maximization in Anger Detection in Interactive Voice Response Systems.

Friday, October 19th

There are plenty of interesting papers this day, but only one I’m highlighting. By coincidence, it’s my own presentation of work with He Peng, on Why Can You Hear a Difference between Pouring Hot and Cold Water? An Investigation of Temperature Dependence in Psychoacoustics. This was inspired by the curious phenomenon and initial investigations described in a previous blog entry.

Saturday, October 20th

Get there early on Saturday to find out about audio branding from a designer’s perspective in the 9 am Creative Approach to Audio in Corporate Brand Experiences.

Object-based audio allows broadcasters to deliver separate channels for sound effects, music and dialog, which can then be remixed on the client-side. This has high potential for delivering better sound for the hearing-impaired, as described in Lauren Ward’s Accessible Object-Based Audio Using Hierarchical Narrative Importance Metadata at 9:45. I’ve heard this demonstrated by the way, and it sounds amazing.

A big challenge with spatial audio systems is the rendering of sounds that are close to the listener. Descriptions of such systems almost always begin with ‘assume the sound source is in the far field.’ In the 10:30 to 12:00 poster session, researchers from the Chinese Academy of Science present a real advance in this subject with Near-Field Compensated Higher-Order Ambisonics Using a Virtual Source Panning Method.

Rob Maher is one of the world’s leading audio forensics experts. At 1:30 in Audio Forensic Gunshot Analysis and Multilateration, he looks at how to answer the question ‘Who shot first?’ from audio recordings. As is often the case in audio forensics, I suspect this paper was motivated by real court cases.

When visual cues disagree with auditory cues, which ones do you believe? Or conversely, does low quality audio seem more realistic if strengthened by visual cues? These sorts of questions are investigated at 2 pm in the large international collaboration Influence of Visual Content on the Perceived Audio Quality in Virtual Reality. Audio Engineering Society Conventions are full of original research, but survey and review papers are certainly welcomed, especially ones like the thorough and insightful HRTF Individualization: A Survey, presented at 2:30.

Standard devices for measuring auditory brainstem response are typically designed to work only with clicks or tone bursts. A team of researchers from Gdansk developed A Device for Measuring Auditory Brainstem Responses to Audio, presented in the 2:30 to 4 pm poster session.

 

Hopefully, I can also give a wrap-up after the Convention, as we did here and here.

Aeroacoustic Sound Effects – Journal Article

I am delighted to be able to announce that my article on Creating Real-Time Aeroacoustic Sound Effects Using Physically Informed Models is in this month’s Journal of the Audio Engineering Society. This is an invited article, following winning the best paper award at the Audio Engineering Society 141st Convention in LA. It is an open access article, so free for all to download!

The article extends the original paper by examining how the Aeolian tone synthesis models can be used to create a number of sound effects. The benefits of these models are that they produce plausible sound effects which operate in real-time. Users are presented with a number of highly relevant parameters to control the effects, which can be mapped directly to 3D models within game engines.

The basics of the Aeolian tone were given in a previous blog post. To summarise, a tone is generated when air passes around an object and vortices are shed behind it. Fluid dynamic equations are available which allow a prediction of the tone frequency based on the physics of the interaction between the air and object. The Aeolian tone is modelled as a compact sound source.
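The key relation is the Strouhal number: for flow past a cylinder, the shedding frequency is roughly f = St·U/d, with St around 0.2. A tiny illustrative sketch (the numbers are textbook approximations, not values from the article):

```python
def aeolian_tone_hz(airspeed_ms, diameter_m, strouhal=0.2):
    """Fundamental Aeolian tone for flow past a cylinder: f = St * U / d.

    St ~= 0.2 is the usual textbook value for a cylinder over a wide
    range of Reynolds numbers.
    """
    return strouhal * airspeed_ms / diameter_m

# A 1 cm diameter rod moving at 20 m/s (roughly the tip speed of a fast swing)
print(aeolian_tone_hz(20.0, 0.01))   # ~400 Hz
```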

To model a sword or similar object a number of these compact sound sources are placed in a row. A previous blog post describes this in more detail. The majority of compact sound sources are placed at the tip as this is where the airspeed is greatest and the greatest sound is generated.

The behaviour of a sword when being swung has to be modelled, and this is then used to control some of the parameters in the equations. This behaviour can be controlled by a game engine, making fully integrated procedural audio models.

The sword model was extended to include objects like a baseball bat and golf club, as well as a broom handle. The compact sound source of a cavity tone was also added in to replicate swords which have grooved profiles. Subjective evaluation gave excellent results, especially for thicker objects, which were perceived to be as plausible as pre-recorded samples.

The synthesis model could be extended to look at a range of sword cross sections as well as any influence of the material of the sword. It is envisaged that other sporting equipment which swing or fly through the air could be modelled using compact sound sources.

A propeller sound is one which is common in games and film, and is partially based on the sounds generated from the Aeolian tone and vortex shedding. As a blade passes through the air, vortices are shed at a specific frequency along its length. To model individual propeller blades, the profiles of a number of real blades were obtained, with specific span lengths (centre to tip) and chord lengths (leading edge to trailing edge).

Another major sound source is the loading sounds generated by the torque and thrust. A procedure for modelling these sounds is outlined in the article. Missing from the propeller model are distortion sounds. These are more associated with rotors which turn in the horizontal plane.

An important sound when hearing a propeller powered aircraft is the engine sound. The one taken for this model was based on one of Andy Farnell’s from his book Designing Sound. Once complete, a user is able to select an aircraft from a pre-programmed bank and set the flight path. If linked to a game engine, the physical dimensions and flight paths can all be controlled procedurally.

Listening tests indicate that the synthesis model was as plausible as an alternative method but still not as plausible as pre-recorded samples. It is believed that results may have been more favourable if modelling electric-powered drones and aircraft which do not have the sound of a combustion engine.

The final model exploring the use of the Aeolian tone was that of an Aeolian Harp. This is a musical instrument that is activated by wind blowing around the strings. The vortices that are shed behind a string can induce a mechanical vibration if they are around the frequency of one of the string’s natural harmonics. This produces a distinctive sound.

The digital model allows a user to synthesise a harp of up to 13 strings. Tension, mass density, length and diameter can all be adjusted to replicate a wide variety of string materials and harp sizes. Users can also control a wind model, modified from one presented in Andy Farnell’s book Designing Sound, with control over the amount of gusts. Listening tests indicate that the sound is not as plausible as pre-recorded ones but is as plausible as alternative synthesis methods.
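The matching condition is easy to sketch: the string speaks when the vortex shedding frequency lands near one of its natural harmonics. A rough numpy illustration using textbook string and Strouhal formulas (all values are illustrative, and this is not the article’s implementation):

```python
import numpy as np

def string_harmonics(tension_n, mass_per_len, length_m, n_harmonics=10):
    """Ideal-string natural frequencies: f_n = (n / 2L) * sqrt(T / mu)."""
    n = np.arange(1, n_harmonics + 1)
    return n / (2.0 * length_m) * np.sqrt(tension_n / mass_per_len)

def shedding_hz(wind_ms, diameter_m, strouhal=0.2):
    """Vortex shedding frequency behind the string: f = St * U / d."""
    return strouhal * wind_ms / diameter_m

harmonics = string_harmonics(tension_n=60.0, mass_per_len=1e-3, length_m=1.0)
excitation = shedding_hz(wind_ms=3.0, diameter_m=1e-3)

# The harp string is driven when the shedding frequency sits near a harmonic
nearest = harmonics[np.argmin(np.abs(harmonics - excitation))]
print(excitation, nearest)   # ~600 Hz excitation, nearest harmonic ~612 Hz
```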

The article describes the design processes in more detail, as well as the fluid dynamic principles each was developed from. All models developed are open source and implemented in Pure Data. Links to these are in the paper, as well as my previous publications. Demo videos can be found on YouTube.

Weird and wonderful research to be unveiled at the 144th Audio Engineering Society Convention


Last year, we previewed the 142nd and 143rd AES Conventions, which we followed with wrap-up discussions here and here. The next AES Convention is just around the corner, May 23 to 26 in Milan. As before, the Audio Engineering research team here aim to be quite active at the convention.

These conventions have thousands of attendees, but aren’t so large that you get lost or overwhelmed. Away from the main exhibition hall is the Technical Program, which includes plenty of tutorials and presentations on cutting edge research.

So we’ve gathered together some information about a lot of the events that caught our eye as being unusual, exceptionally high quality, something we’re involved in or attending, or just worth mentioning. And this Convention will certainly live up to the hype.

Wednesday May 23rd

From 11:15 to 12:45 that day, there’s an interesting poster by a team of researchers from the University of Limerick titled Can Visual Priming Affect the Perceived Sound Quality of a Voice Signal in Voice over Internet Protocol (VoIP) Applications? This builds on work we discussed in a previous blog entry, where they did a perceptual study of DFA Faders, looking at how people’s perception of mixing changes when the sound engineer only pretends to make an adjustment.

As expected given the location, there’s lots of great work being presented by Italian researchers. The first one that caught my eye is the 2:30-4 poster on Active noise control for snoring reduction. Whether you’re a loud snorer, sleep next to someone who is a loud snorer or just interested in unusual applications of audio signal processing, this one is worth checking out.

Do you get annoyed sometimes when driving and the road surface changes to something really noisy? Surely someone should do a study and find out which roads are noisiest, so that we can put a bit of effort into better road design and better in-vehicle equalisation and noise reduction? Well, now it’s finally happened with this paper in the same session on Deep Neural Networks for Road Surface Roughness Classification from Acoustic Signals.

Thursday, May 24

If you were to spend only one day this year immersing yourself in frontier audio engineering research, this is the day to do it.

How do people mix music differently in different countries? And do people perceive the mixes differently based on their different cultural backgrounds? These are the sorts of questions our research team here have been asking. Find out more in this 9:30 presentation by Amandine Pras. She led this Case Study of Cultural Influences on Mixing Practices, in collaboration with Brecht De Man (now with Birmingham City University) and myself.

Rod Selfridge has been blazing new trails in sound synthesis and procedural audio. He won the Best Student Paper Award at AES 141st Convention and the Best Paper Award at Sound and Music Computing. He’ll give another great presentation at noon on Physically Derived Synthesis Model of an Edge Tone which was also discussed in a recent blog entry.

I love the title of this next paper, Miniaturized Noise Generation System—A Simulation of a Simulation, which will be presented at 2:30pm by researchers from Intel Technology in Gdansk, Poland. This idea of a meta-simulation is not as uncommon as you might think; we do digital emulation of old analogue synthesizers, and I’ve seen papers on numerical models of Foley rain sound generators.

A highlight for our team here is our 2:45 pm presentation, FXive: A Web Platform for Procedural Sound Synthesis. We’ll be unveiling a disruptive innovation for sound design, FXive.com, aimed at replacing reliance on sound effect libraries. Please come check it out, and get in touch with the presenters or any members of the team to find out more.

Immediately following this is a presentation which asks Can Algorithms Replace a Sound Engineer? This is a question the research team here have also investigated a lot, you could even say it was the main focus of our research for several years. The team behind this presentation are asking it in relation to Auto-EQ. I’m sure it will be interesting, and I hope they reference a few of our papers on the subject.

From 9-10:30, I will chair a Workshop on The State of the Art in Sound Synthesis and Procedural Audio, featuring the world’s experts on the subject. Outside of speech and possibly music, sound synthesis is still in its infancy, but it’s destined to change the world of sound design in the near future. Find out why.

12:15 — 13:45 is a workshop related to machine learning in audio (a subject that is sometimes called Machine Listening), Deep Learning for Audio Applications. Deep learning can be quite a technical subject, and there’s a lot of hype around it. So a Workshop on the subject is a good way to get a feel for it. See below for another machine listening related workshop on Friday.

The Heyser Lecture, named after Richard Heyser (we discussed some of his work in a previous entry), is a prestigious evening talk given by one of the eminent individuals in the field. This one will be presented by Malcolm Hawksford, a man who has had a major impact on research in audio engineering for decades.

Friday

The 9:30 — 11 poster session features some unusual but very interesting research. A talented team of researchers from Ancona will present A Preliminary Study of Sounds Emitted by Honey Bees in a Beehive.

Intense solar activity in March 2012 caused some amazing solar storms here on Earth. Researchers in Finland recorded them, and some very unusual results will be presented in the same session with the poster titled Analysis of Reports and Crackling Sounds with Associated Magnetic Field Disturbances Recorded during a Geomagnetic Storm on March 7, 2012 in Southern Finland.

You’ve been living in a cave if you haven’t noticed the recent proliferation of smart devices, especially in the audio field. But what makes them tick, is there a common framework and how are they tested? Find out more at 10:45 when researchers from Audio Precision will present The Anatomy, Physiology, and Diagnostics of Smart Audio Devices.

From 3 to 4:30, there’s a Workshop on Artificial Intelligence in Your Audio. It follows on from a highly successful workshop we did on the subject at the last Convention.

Saturday

A couple of weeks ago, John Flynn wrote an excellent blog entry describing his paper on Improving the Frequency Response Magnitude and Phase of Analogue-Matched Digital Filters. His work is a true advance on the state of the art, providing digital filters with closer matches to their analogue counterparts than any previous approaches. The full details will be unveiled in his presentation at 10:30.

If you haven’t seen Mariana Lopez presenting research, you’re missing out. Her enthusiasm for the subject is infectious, and she has a wonderful ability to convey the technical details, their deeper meanings and their importance to any audience. See her one hour tutorial on Hearing the Past: Using Acoustic Measurement Techniques and Computer Models to Study Heritage Sites, starting at 9:15.

The full program can be explored on the Convention Calendar or the Convention website. Come say hi to us if you’re there! Josh Reiss (author of this blog entry), John Flynn, Parham Bahadoran and Adan Benito from the Audio Engineering research team within the Centre for Digital Music, along with two recent graduates Brecht De Man and Rod Selfridge, will all be there.

Analogue matched digital EQ: How far can you go linearly?

(Background post for the paper “Improving the frequency response magnitude and phase of analogue-matched digital filters” by John Flynn & Josh Reiss for AES Milan 2018)

Professional audio mastering is a field that is still dominated by analogue hardware. Many mastering engineers still favour their go-to outboard compressors and equalisers over digital emulations. As a practising mastering engineer myself, I empathise. Quality analogue gear has a proven track record in terms of sonic quality spanning about a century. Even though digital approximations of analogue tools have gotten better, particularly over the past decade, I too have tended to reach for analogue hardware. However, through my research at Queen Mary with Professor Josh Reiss, that is changing.

When modelling an analogue EQ, a lot of focus has been on modelling distortions and other non-linearities; we chose instead to look at the linear component. Have we reached a ceiling in terms of modelling an analogue prototype filter in the digital domain? Can we do better? We found that yes, there was room for improvement, and yes, we can do better.

The milestone of research in this area is Orfanidis’ 1997 paper “Digital parametric equalizer design with prescribed Nyquist-frequency gain“, the first major improvement over the bilinear transform, which has a renowned ‘cramped’ sound in the high frequencies. Basically, the bilinear transform is what all first-generation digital equalisers are based on. Its response towards 20 kHz drops sharply, giving a ‘closed/cramped’ sound. Orfanidis and later improvements by Massberg [9] & Gunness/Chauhan [10] give a much better approximation of an analogue prototype.
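If you want to see the cramping for yourself, here’s a small scipy sketch (plain textbook bilinear transform, illustrative filter settings, nothing from the papers above) comparing an analogue peaking prototype against its bilinear-transformed digital version. Near Nyquist the digital curve collapses back to 0 dB while the analogue prototype is still boosting:

```python
import numpy as np
from scipy import signal

fs = 44100.0
f0, gain_db, Q = 16000.0, 6.0, 0.7        # a broad high-frequency boost
A = 10 ** (gain_db / 40.0)
w0 = 2 * np.pi * f0

# Analogue peaking-EQ prototype (RBJ-style):
#   H(s) = (s^2 + (A*w0/Q) s + w0^2) / (s^2 + (w0/(A*Q)) s + w0^2)
b_s = [1.0, A * w0 / Q, w0 ** 2]
a_s = [1.0, w0 / (A * Q), w0 ** 2]

# Digitise with the plain bilinear transform (no pre-warping or decramping)
b_z, a_z = signal.bilinear(b_s, a_s, fs)

f = np.linspace(100.0, fs / 2, 2000)
_, h_analogue = signal.freqs(b_s, a_s, worN=2 * np.pi * f)
_, h_digital = signal.freqz(b_z, a_z, worN=f, fs=fs)

# Compare magnitudes (dB) at a few frequencies approaching Nyquist:
# the digital response is forced back to 0 dB at fs/2, i.e. 'cramped'
for fq in (10000, 16000, 20000, 22000):
    i = np.argmin(np.abs(f - fq))
    print(fq, 20 * np.log10(abs(h_analogue[i])), 20 * np.log10(abs(h_digital[i])))
```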


However, while [9],[10] improve the magnitude response, they don’t capture analogue phase. Bizarrely, the bilinear transform performs reasonably well on phase. So we knew it was possible.

So the problem is: how do you get a more accurate magnitude match to analogue than [9],[10]? While also getting a good match to phase? Many attempts, including complicated iterative Parks-McClellan filter design approaches, fell flat. It turned out that Occam was right; in this case a simple answer was the better answer.

By combining a matched-z transform, frequency sampling filter design and a little bit of clever coefficient manipulation, we achieved excellent results: a match to the analogue prototype to an arbitrary degree. At low filter lengths you get a filter that performs as well as [9],[10] in magnitude, but also matches analogue phase. By using longer filter lengths, the match to analogue is extremely precise in both magnitude and phase.
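The exact algorithm is in the paper; purely to illustrate the generic frequency-sampling ingredient (and emphatically not the paper’s method), here’s a rough numpy/scipy sketch that samples an analogue prototype’s complex response on an FFT grid and inverse-transforms it to an FIR. At the grid frequencies the FIR matches both magnitude and phase, and a longer filter gives a finer grid:

```python
import numpy as np
from scipy import signal

fs = 44100.0
N = 4096                                   # FIR length: longer means a finer match

# Same illustrative analogue peaking prototype as in the earlier sketch
f0, gain_db, Q = 16000.0, 6.0, 0.7
A = 10 ** (gain_db / 40.0)
w0 = 2 * np.pi * f0
b_s = [1.0, A * w0 / Q, w0 ** 2]
a_s = [1.0, w0 / (A * Q), w0 ** 2]

# Sample the analogue complex response on an FFT grid up to Nyquist
f = np.arange(N // 2 + 1) * fs / N
_, H_half = signal.freqs(b_s, a_s, worN=2 * np.pi * f)

# For a real FIR the DC and Nyquist bins must be real: keep just their real parts
H_half[0] = H_half[0].real
H_half[-1] = H_half[-1].real

# Frequency-sampling design: inverse real FFT of the half-spectrum gives an FIR
# whose response passes through every sampled point (magnitude and phase)
h = np.fft.irfft(H_half, n=N)

_, H_fir = signal.freqz(h, worN=f, fs=fs)
print(np.max(np.abs(H_fir - H_half)))      # essentially zero at the grid points
```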


 

Since writing this post, I have released the algorithm in a plugin with my mastering company and have been getting informal feedback from other mastering engineers about how this sounds in use.


Overall the word back has been overwhelmingly positive, with one engineer claiming it to be “the best sounding plugin EQ on the market to date”. It’s nice to know that those long hours staring at decibel error charts have not been in vain.

Are you heading to AES Milan next month? Come up and say hello!

 

Audio Engineering Society E-library

I try to avoid too much promotion in this blog, but in this case I think it’s justified. I’m involved in advancing a resource from a non-profit professional organisation, the Audio Engineering Society. They do lots and lots of different things, promoting the science, education and practice of all things audio engineering related. Among other things, they’ve been publishing research in the area for almost 70 years, and institutions can get full access to all the content in a searchable library. In recent posts, I’ve written about some of the greatest papers ever published there, Part 1 and Part 2, and about one of my own contributions.

In an ideal world, this would all be Open Access. But publishing still costs money, so the AES support both gold Open Access (free to all, but authors pay Article Processing Charges) and the traditional model, where it’s free to publish but individuals or institutions subscribe or articles can be purchased individually. AES members get free access. I could write many blog articles just about Open Access (should I?); it’s never as straightforward as it seems. At its best it is freely disseminating information for the benefit of all, but at its worst it’s like Pay to Play, a highly criticised practice in the music industry, and gives publishers an incentive to lower acceptance standards. But for now I’ll just point out that the AES does its absolute best to keep the costs down, regardless of publishing model, and the costs are generally much less than similar publishers.

Anyway, the AES realised that one of the most cost effective ways to get our content out to large communities is through institutional licenses or subscriptions. And we’re missing an opportunity here since we haven’t really promoted this option. And everybody benefits from it; wider dissemination of knowledge and research, more awareness of the AES, better access, etc. With this in mind, the AES issued the following press release, which I have copied verbatim. You can also find it as a tweet, blog entry or facebook post.


AES E-Library Subscriptions Benefit Institutions and Organizations

— The Audio Engineering Society E-Library is the world’s largest collection of audio industry resources, and subscriptions provide access to extensive content for research, product development and education — 

New York, NY, March 22, 2018 — Does your research staff, faculty or students deserve access to the world’s most comprehensive collection of audio information? The continuously growing Audio Engineering Society (AES) E-Library contains over 16,000 fully searchable PDF files documenting the progression of audio research from 1953 to the present day. It includes every AES paper published from every AES convention and conference, as well as those published in the Journal of the Audio Engineering Society. From the phonograph to MP3s, from early concepts of digital audio through its fulfillment as the mainstay of audio production, distribution and reproduction, to leading-edge realization of spatial audio and audio for augmented and virtual reality, the E-Library provides a gateway to both the historical and the forward-looking foundational knowledge that sustains an entire industry.  

The AES E-Library has become the go-to online resource for anyone looking to gain instant access to the vast amount of information gathered by the Audio Engineering Society through research, presentations, interviews, conventions, section meetings and more. “Our academic and research staff, and PhD and undergraduate Tonmeister students, use the AES E-Library a lot,” says Dr. Tim Brookes, Senior Lecturer in Audio & Director of Research Institute of Sound Recording (IoSR) University of Surrey. “It’s an invaluable resource for our teaching, for independent student study and, of course, for our research.” 

“Researchers, academics and students benefit from E-Library access daily,” says Joshua Reiss, Chair of the AES Publications Policy Committee, “while many relevant institutions – academic, governmental or corporate – do not have an institutional license of the AES E-library, which means their staff or students are missing out on all the wonderful content there. We encourage all involved in audio research and investigation to inquire if their libraries have an E-Library subscription and, if not, suggest the library subscribe.” 

E-Library subscriptions can be obtained directly from the AES or through journal bundling services. A subscription allows a library’s users to download any document in the E-Library at no additional cost. 

“As an international audio company with over 25,000 employees world-wide, the AES E-library has been an incredibly valuable resource used by Harman audio researchers, engineers, patent lawyers and others,” says Dr. Sean Olive, Acoustic Research Fellow, Harman International. “It has paid for itself many times over.” 

The fee for an institutional online E-Library subscription is $1800 per year, which is significantly less than equivalent publisher licenses. 

To search the E-library, go to http://www.aes.org/e-lib/

To arrange for an institutional license, contact Lori Jackson directly at lori.jackson@aes.org, or go to http://www.aes.org/e-lib/subscribe/.

 

About the Audio Engineering Society
The Audio Engineering Society, celebrating its 70th anniversary in 2018, now counts over 12,000 members throughout the U.S., Latin America, Europe, Japan and the Far East. The organization serves as the pivotal force in the exchange and dissemination of technical information for the industry. Currently, its members are affiliated with 90 AES professional sections and more than 120 AES student sections around the world. Section activities include guest speakers, technical tours, demonstrations and social functions. Through local AES section events, members experience valuable opportunities for professional networking and personal growth. For additional information visit http://www.aes.org.

Join the conversation and keep up with the latest AES News and Events:
Twitter: #AESorg (AES Official) 
Facebook: http://facebook.com/AES.org

Greatest JAES papers of all time, Part 2

Last week I revealed Part 1 of the greatest ever papers published in the Journal of the Audio Engineering Society (JAES). JAES is the premier peer-reviewed journal devoted exclusively to audio technology, and the flagship publication of the AES. This week, it’s time for Part 2. There’s little rhyme or reason to how I divided up and selected the papers, other than that I started by looking at the most highly cited ones according to Google Scholar. But all the papers listed here have had major impact on the science, education and practice of audio engineering and related fields.

All of the papers below are available from the Audio Engineering Society (AES) E-library, the world’s most comprehensive collection of audio information. It contains over 16,000 fully searchable PDF files documenting the progression of audio research from 1953 to the present day. It includes every AES paper published at a convention, conference or in the Journal. Members of the AES get free access to the E-library. To arrange for an institutional license, giving full access to all members of an institution, contact Lori Jackson directly, or go to http://www.aes.org/e-lib/subscribe/.

And without further ado, here are the rest of the Selected greatest JAES papers

More than any other work, this 1992 paper by Stanley Lipshitz and co-authors has resulted in the correct application of dither in music production. It’s one possible reason that digital recording quality improved after the early years of the Compact Disc (though the loudness wars reversed that trend). As renowned mastering engineer Bob Katz put it, “if you want to get your digital audio done just right, then you should learn about dither,” and there is no better resource than this paper.
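For anyone who hasn’t played with it, the core idea fits in a few lines: add low-level noise with a triangular probability density before reducing word length, so that the quantisation error stops being correlated with the signal. A minimal numpy sketch (signal level and seed are just illustrative):

```python
import numpy as np

def quantize_16bit(x, dither=True, rng=None):
    """Quantise a float signal in [-1, 1] to 16 bits, with optional TPDF dither."""
    rng = np.random.default_rng(0) if rng is None else rng
    lsb = 1.0 / 32768.0                      # one 16-bit quantisation step
    if dither:
        # TPDF dither: the sum of two independent uniform noises gives a
        # triangular PDF spanning +/- 1 LSB, decorrelating error from signal
        x = x + (rng.uniform(-0.5, 0.5, x.shape) + rng.uniform(-0.5, 0.5, x.shape)) * lsb
    return np.clip(np.round(x / lsb), -32768, 32767).astype(np.int16)

# A very quiet 1 kHz tone, about two thirds of an LSB in amplitude
t = np.arange(48000) / 48000.0
x = 2e-5 * np.sin(2 * np.pi * 1000.0 * t)

undithered = quantize_16bit(x, dither=False)   # gross distortion: a crude stepped wave
dithered = quantize_16bit(x, dither=True)      # the tone survives, buried in benign noise
print(np.count_nonzero(undithered), np.count_nonzero(dithered))
```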

According to Wikipedia, this 1993 paper coined the term Auralization as an analogy to visualization for rendering audible (imaginary) sound fields. This general research area of understanding and rendering the sound field of acoustic spaces has resulted in several other highly influential papers. Berkhout’s 1988 A holographic approach to acoustic control (575 citations) described the appealingly named acoustic holography method for rendering sound fields. In 1999, the groundbreaking Creating interactive virtual acoustic environments (427 citations) took this further, laying out the theory and challenges of virtual acoustics rendering, and paving the way for highly realistic audio in today’s Virtual Reality systems.

The Schroeder reverberator was first described here, way back in 1962. It has become the basis for almost all algorithmic reverberation approaches. Manfred Schroeder was another great innovator in the audio engineering field. A long transcript of a fascinating interview is available here, and a short video interview below.

These two famous papers are the basis for the Thiele/Small parameters. Thiele rigorously analysed and simulated the performance of loudspeakers in the first paper from 1971, and Small greatly extended the work in the second paper in 1972. Both had initially published the work in small Australian journals, but it didn’t get widely recognised until the JAES publications. These equations form the basis for much of loudspeaker design.
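As a tiny example of what these parameters let you do, here are the standard sealed-box alignment formulas (textbook relations, with made-up driver numbers, nothing taken from the papers themselves):

```python
import numpy as np

def sealed_box_alignment(fs_hz, qts, vas_litres, vb_litres):
    """Closed-box system resonance and Q from driver Thiele/Small parameters.

    alpha = Vas / Vb, fc = fs * sqrt(1 + alpha), Qtc = Qts * sqrt(1 + alpha).
    """
    alpha = vas_litres / vb_litres
    factor = np.sqrt(1.0 + alpha)
    return fs_hz * factor, qts * factor

# An illustrative woofer (fs = 30 Hz, Qts = 0.4, Vas = 60 l) in a 20 litre box
fc, qtc = sealed_box_alignment(30.0, 0.4, 60.0, 20.0)
print(fc, qtc)   # 60.0 Hz, 0.8: the box doubles the resonance and raises the Q
```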

Check out the dozens of YouTube videos about choosing and designing loudspeakers which make use of these parameters.

This is the first English language publication to describe the Haas effect, named after the author. Also called the precedence effect, it investigated the phenomenon that when the same signal is sent to two loudspeakers, a small delay between the speakers results in the sound appearing to come just from one speaker. It’s now widely used in sound reinforcement systems, and in audio production to give a sense of depth or more realistic panning (the Haas trick).
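As a toy illustration of the ‘Haas trick’ (values are purely illustrative), delaying one channel by a few milliseconds pulls the perceived image towards the undelayed loudspeaker even though both channels stay at equal level:

```python
import numpy as np

def haas_pan(mono, sr, delay_ms=15.0, delay_right=True):
    """Pseudo-pan a mono signal by delaying one channel (precedence/Haas effect).

    Delays of roughly 1-30 ms shift the image towards the earlier channel
    without any level difference between the loudspeakers.
    """
    d = int(round(sr * delay_ms / 1000.0))
    direct = np.concatenate([mono, np.zeros(d)])
    delayed = np.concatenate([np.zeros(d), mono])
    if delay_right:
        return np.stack([direct, delayed], axis=1)   # image pulls towards the left
    return np.stack([delayed, direct], axis=1)       # image pulls towards the right

sr = 48000
mono = 0.1 * np.random.default_rng(0).standard_normal(sr)
stereo = haas_pan(mono, sr, delay_ms=15.0)
print(stereo.shape)   # same content in both channels, one arriving 15 ms late
```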


This is the first ever research paper published in JAES. Published in August 1949, it set a high standard for rigour, while at the same time emphasising that many publications will have strong relevance not just to researchers, but to audiophiles and practitioners as well.

It described a new instrument for frequency response measurement and display. People just love impulse response and transfer function measurements, and some of the most highly cited JAES papers are on this topic: 1983’s An efficient algorithm for measuring the impulse response using pseudorandom noise (308 citations), Transfer-function measurement with maximum-length sequences (771 citations), the 2001 paper from a Brazil-based team, Transfer-function measurement with sweeps (722 citations), and finally Comparison of different impulse response measurement techniques (276 citations) in 2002. With a direct link between theory and new applications, these papers on maximum length sequence approaches and sine sweeps were major advances over the alternatives, and changed the way such measurements are made.
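To give a flavour of why sweeps took over, here is a bare-bones numpy sketch of the exponential-sweep approach (a generic textbook version, not any one paper’s exact procedure): generate the sweep, pass it through the system under test, then convolve the recording with the amplitude-compensated, time-reversed sweep to recover the impulse response.

```python
import numpy as np

def exp_sweep(f1, f2, duration, sr):
    """Exponential sine sweep from f1 to f2 Hz, plus its inverse filter."""
    t = np.arange(int(duration * sr)) / sr
    R = np.log(f2 / f1)
    sweep = np.sin(2 * np.pi * f1 * duration / R * (np.exp(t * R / duration) - 1.0))
    # Inverse filter: time-reversed sweep with exponential amplitude compensation
    inverse = sweep[::-1] * np.exp(-t * R / duration)
    return sweep, inverse

sr = 48000
sweep, inverse = exp_sweep(20.0, 20000.0, duration=2.0, sr=sr)

# Stand-in for the device/room under test: a delay plus a quieter reflection
system_ir = np.zeros(sr // 10)
system_ir[100] = 1.0
system_ir[4100] = 0.5
recording = np.convolve(sweep, system_ir)

# Deconvolve: the measured impulse response emerges near the end of the result
measured = np.convolve(recording, inverse)
peak = int(np.argmax(np.abs(measured)))
print(peak, len(sweep))   # main peak sits roughly len(sweep) samples in
```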

And the winner is… Ville Pulkki’s Vector Base Amplitude Panning (VBAP) paper! This is the most highly cited paper in JAES. Besides deriving the stereo panning law from basic geometry, it unveiled VBAP, an intuitive and now widely used spatial audio technique. Ten years later, Pulkki unveiled another groundbreaking spatial audio format, DirAC, in Spatial sound reproduction with directional audio coding (386 citations).
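For a single loudspeaker pair, the heart of VBAP is just a 2x2 matrix inversion: express the desired source direction in the basis formed by the two loudspeaker direction vectors, then normalise the gains. A compact sketch (the ±30 degree speaker angles are just the usual stereo example):

```python
import numpy as np

def vbap_pair_gains(source_deg, spk1_deg=30.0, spk2_deg=-30.0):
    """2-D VBAP gains for one loudspeaker pair (angles in degrees from front)."""
    def unit(deg):
        a = np.deg2rad(deg)
        return np.array([np.cos(a), np.sin(a)])
    L = np.column_stack([unit(spk1_deg), unit(spk2_deg)])  # loudspeaker base matrix
    g = np.linalg.solve(L, unit(source_deg))               # p = L @ g  =>  g = inv(L) @ p
    return g / np.linalg.norm(g)                           # constant-power normalisation

print(vbap_pair_gains(0.0))    # centre source: ~0.707 to each speaker
print(vbap_pair_gains(30.0))   # source at a speaker: [1, 0]
print(vbap_pair_gains(15.0))   # somewhere in between
```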