We’ve made it a tradition on this blog to preview the technical program at the Audio Engineering Society Conventions, as we did with the 142nd, 143rd, and 144th AES Conventions. The 145th AES Convention is just around the corner, October 17 to 20 in New York. As before, the Audio Engineering research team behind this blog will be quite active at the Convention.
These conventions have thousands of attendees, but aren’t so large that you get lost or overwhelmed. Away from the main exhibition hall is the Technical Program, which includes plenty of tutorials and presentations on cutting-edge research.
So we’ve gathered together some information about a lot of the events that caught our eye as being unusual, exceptionally high quality, ones we’re involved in or attending, or just worth mentioning. And this Convention will certainly live up to the hype. Plus, it’s a special one: the 70th anniversary of the founding of the AES.
By the way, I don’t think I mention a single loudspeaker paper below, but the Technical Program is full of them this time. You could have a full conference just on loudspeakers from them. If you want to become an expert on loudspeaker research, this is the place to be.
Anyway, let’s dive right in.
Wednesday, October 17th
We know different cultures listen to music differently, but do they listen to audio coding artifacts differently? Find out at 9:30 when Sascha Disch and co-authors present On the Influence of Cultural Differences on the Perception of Audio Coding Artifacts in Music.
ABX, AB, MUSHRA… so many choices for subjective evaluation and listening tests, so little time. Which one to use, and which one gives the strongest results? Let’s put them all to the test while looking at the same question. This is what was done in Investigation into the Effects of Subjective Test Interface Choice on the Validity of Results, presented at 11:30. The results are strong, and surprising. Authors include former members of the team behind this blog, Nick Jillings and Brecht de Man, as well as myself and frequent collaborator Ryan Stables.
From 10-11:30, Steve Fenton will be presenting the poster Automatic Mixing of Multitrack Material Using Modified Loudness Models. Automatic mixing is a really hot research area, one where we’ve made quite a few contributions. And a lot of it has involved loudness models for level balancing or fader settings. Someone really should do a review of all the papers focused on that, or better yet, a meta-analysis. Dr. Fenton and co-authors also have another poster in the same session, about a Real-Time System for the Measurement of Perceived Punch. Fenton’s PhD was about perception and modelling of punchiness in audio, and I suggested to him that the thesis should have just been titled ‘Punch!’
The researchers from Harman continue their analysis of headphone preference and quality with A Survey and Analysis of Consumer and Professional Headphones Based on Their Objective and Subjective Performances at 3:30. Harman obviously have a strong interest in this, but it’s rigorous, high-quality research, not promotion.
In the 3:00 to 4:30 poster session, Daniel Johnston presents a wonderful spatial audio application, SoundFields: A Mixed Reality Spatial Audio Game for Children with Autism Spectrum Disorder. I’m pretty sure this isn’t the quirky lo-fi singer/songwriter Daniel Johnston.
Thursday, October 18th
There’s something bizarre about the EBU R128 / ITU-R BS.1770 specification for loudness measurements. It doesn’t give the filter coefficients as a function of sample rate. So, for this and other reasons, even though the actual specification is just a few lines of code, you have to reverse engineer it if you’re doing it yourself, as was done here. At 10 am, Brecht de Man presents Evaluation of Implementations of the EBU R128 Loudness Measurement, which looks carefully at different implementations and provides full implementations in several programming languages.
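To give a flavour of what that reverse engineering involves, here is a minimal sketch that rebuilds the two K-weighting biquads (a high shelf followed by a high-pass) for an arbitrary sample rate. The centre frequencies, gain and Q values are not given in the standard itself. They are the analog-prototype parameters commonly quoted in open-source implementations, so treat them, and the function names, as assumptions rather than anything official. The check I’d suggest is that the result at 48 kHz lines up with the coefficient table printed in BS.1770.

```python
import math

# ASSUMPTION: analog-prototype parameters as commonly quoted in open-source
# loudness meters. They are NOT part of the BS.1770 text, which only lists
# digital coefficients for a 48 kHz sample rate.
SHELF_F0 = 1681.974450955533       # high-shelf centre frequency (Hz)
SHELF_GAIN_DB = 3.999843853973347  # high-shelf gain (dB)
SHELF_Q = 0.7071752369554196
SHELF_VB_EXP = 0.4996667741545416  # exponent relating band gain to shelf gain
HP_F0 = 38.13547087602444          # high-pass cutoff (Hz)
HP_Q = 0.5003270373238773


def shelf_coeffs(fs):
    """Stage 1 (high shelf): return (b, a) for sample rate fs in Hz."""
    k = math.tan(math.pi * SHELF_F0 / fs)  # bilinear-transform pre-warping
    vh = 10 ** (SHELF_GAIN_DB / 20)
    vb = vh ** SHELF_VB_EXP
    a0 = 1 + k / SHELF_Q + k * k
    b = [
        (vh + vb * k / SHELF_Q + k * k) / a0,
        2 * (k * k - vh) / a0,
        (vh - vb * k / SHELF_Q + k * k) / a0,
    ]
    a = [1.0, 2 * (k * k - 1) / a0, (1 - k / SHELF_Q + k * k) / a0]
    return b, a


def highpass_coeffs(fs):
    """Stage 2 (high-pass): return (b, a) for sample rate fs in Hz."""
    k = math.tan(math.pi * HP_F0 / fs)
    a0 = 1 + k / HP_Q + k * k
    # The numerator is left as [1, -2, 1], matching the table in the standard.
    b = [1.0, -2.0, 1.0]
    a = [1.0, 2 * (k * k - 1) / a0, (1 - k / HP_Q + k * k) / a0]
    return b, a


if __name__ == "__main__":
    for fs in (44100, 48000, 96000):
        print(fs, shelf_coeffs(fs), highpass_coeffs(fs))
```

Applying the two stages in series to each channel gives the K-weighted signal that the rest of the loudness measurement (mean square, channel weighting, gating) works on. Those later steps are not shown here.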
Roughly one in six people in developed countries suffer some hearing impairment. If you think that seems too high, think how many wear glasses or contact lenses or have had eye surgery. And given the sound exposure, I’d expect the rate to be higher among music producers. But we need good data on this. Thus, Laura Sinnott’s 3 pm presentation on Risk of Sound-Induced Hearing Disorders for Audio Post Production Engineers: A Preliminary Study is particularly relevant.
There are some interesting posters in the 2:45 to 4:15 session. Maree Sheehan’s Audio Portraiture – The Sound of Identity, an Indigenous Artistic Enquiry uses 3D immersive and binaural sound to create audio portraits of Maori women. It’s a wonderful use of state-of-the-art audio technologies for cultural and artistic study. Researchers from the University of Alcala in Madrid present an improved method to detect anger in speech in Precision Maximization in Anger Detection in Interactive Voice Response Systems.
Friday, October 19th
There are plenty of interesting papers this day, but only one I’m highlighting. By coincidence, it’s my own presentation of work with He Peng, on Why Can You Hear a Difference between Pouring Hot and Cold Water? An Investigation of Temperature Dependence in Psychoacoustics. This was inspired by the curious phenomenon and initial investigations described in a previous blog entry.
Saturday, October 20th
Get there early on Saturday to find out about audio branding from a designer’s perspective in the 9 am Creative Approach to Audio in Corporate Brand Experiences.
Object-based audio allows broadcasters to deliver separate channels for sound effects, music and dialog, which can then be remixed on the client side. This has high potential for delivering better sound for the hearing-impaired, as described in Lauren Ward’s Accessible Object-Based Audio Using Hierarchical Narrative Importance Metadata at 9:45. I’ve heard this demonstrated, by the way, and it sounds amazing.
A big challenge with spatial audio systems is the rendering of sounds that are close to the listener. Descriptions of such systems almost always begin with ‘assume the sound source is in the far field.’ In the 10:30 to 12:00 poster session, researchers from the Chinese Academy of Sciences present a real advance in this subject with Near-Field Compensated Higher-Order Ambisonics Using a Virtual Source Panning Method.
Rob Maher is one of the world’s leading audio forensics experts. At 1:30 in Audio Forensic Gunshot Analysis and Multilateration, he looks at how to answer the question ‘Who shot first?’ from audio recordings. As is often the case in audio forensics, I suspect this paper was motivated by real court cases.
When visual cues disagree with auditory cues, which ones do you believe? Or conversely, does low quality audio seem more realistic if strengthened by visual cues? These sorts of questions are investigated at 2 pm in the large international collaboration Influence of Visual Content on the Perceived Audio Quality in Virtual Reality. Audio Engineering Society Conventions are full of original research, but survey and review papers are certainly welcomed, especially ones like the thorough and insightful HRTF Individualization: A Survey, presented at 2:30.
Standard devices for measuring auditory brainstem response are typically designed to work only with clicks or tone bursts. A team of researchers from Gdansk developed A Device for Measuring Auditory Brainstem Responses to Audio, presented in the 2:30 to 4 pm poster session.