Our meta-analysis wins best JAES paper 2016!

Last year, we published an Open Access article in the Journal of the Audio Engineering Society (JAES) on “A meta-analysis of high resolution audio perceptual evaluation.”


I’m very pleased and proud to announce that this paper won the award for best JAES paper for the calendar year 2016.

We discussed the research a little bit while it was ongoing, and then in more detail soon after publication. The research addressed a contentious issue in the audio industry. For decades, professionals and enthusiasts have engaged in heated debate over whether high resolution audio (beyond CD quality) really makes a difference. So I undertook a meta-analysis to assess the ability to perceive a difference between high resolution and standard CD quality audio. Meta-analysis is a popular technique in medical research, but this may be the first time that it’s been formally applied to audio engineering and psychoacoustics. Results showed a highly significant ability to discriminate high resolution content in trained subjects, which had not previously been revealed. With over 400 participants in over 12,500 trials, it represented the most thorough investigation of high resolution audio so far.
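For readers unfamiliar with the idea, here is a minimal sketch of how results from several discrimination studies can be pooled and compared against chance. It uses simple fixed-effect, inverse-variance weighting, which is only one of several possible approaches and is not necessarily the method used in the paper. The function names and the study counts are hypothetical, invented purely for illustration.

```python
import math


def pooled_proportion(studies):
    """Fixed-effect inverse-variance pooling of proportion correct.

    studies: list of (correct, trials) tuples, one per study.
    Returns (pooled proportion, standard error of the pooled proportion).
    Assumes every study has a proportion strictly between 0 and 1,
    otherwise the binomial variance below would be zero.
    """
    total_weight = 0.0
    weighted_sum = 0.0
    for correct, trials in studies:
        p = correct / trials
        variance = p * (1 - p) / trials  # binomial variance of a proportion
        weight = 1.0 / variance          # more precise studies count for more
        total_weight += weight
        weighted_sum += weight * p
    return weighted_sum / total_weight, math.sqrt(1.0 / total_weight)


def z_against_chance(p, se, chance=0.5):
    """z-score of the pooled proportion relative to guessing."""
    return (p - chance) / se


# Hypothetical counts, NOT data from any real study.
studies = [(52, 100), (110, 200), (27, 50)]
p, se = pooled_proportion(studies)
z = z_against_chance(p, se)
```

The point of the weighting is that a large, precise study moves the pooled estimate more than a small one, which is why many individually inconclusive studies can still give a clear combined result.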

Since publication, this paper was covered broadly across social media, popular press and trade journals. Thousands of comments were made on forums, with hundreds of thousands of reads.

Here’s one popular independent YouTube video discussing it.

and an interview with Scientific American about it,

and some discussion of it in this article for Forbes magazine (which is actually about the lack of a headphone jack in the iPhone 7).

But if you want to see just how angry this research made people, check out the discussion on hydrogenaudio. Wow, I’ve never been called an intellectually dishonest placebophile apologist before 😉 .

In fact, the discussion on social media was full of misinformation, so I’ll try to clear up a few things here:

When I first started looking into this subject, it became clear that potential issues in the studies were a problem. One option would have been to just give up, but then I’d be adding no rigour to a discussion because I felt it wasn’t rigorous enough. It’s the same as not publishing because you don’t get a significant result, only now on a meta scale. And though I did not have a strong opinion either way as to whether differences could be perceived, I could easily be fooling myself. I wanted to avoid any of my own biases or judgement calls. So I set some ground rules.

  • I committed to publishing all results, regardless of outcome.
  • A strong motivation for doing the meta-analysis was to avoid cherry-picking studies. So I included all studies for which there was sufficient data for them to be used in meta-analysis.  Even if I thought a study was poor, its conclusions seemed flawed or it disagreed with my own conceptions, if I could get the minimal data to do meta-analysis, I included it. I then discussed potential issues.
  • Any choices regarding analysis or transformation of data were made a priori, regardless of the result of that choice, in an attempt to minimize any of my own biases influencing the outcome.
  • I did further analysis to look at alternative methods of study selection and representation.

I found the whole process of doing a meta-analysis in this field to be fascinating. In audio engineering and psychoacoustics, there is a wealth of studies investigating big questions, and I hope others will use similar approaches to gain deeper insights and perhaps even resolve some issues.


Exciting research at the upcoming Audio Engineering Society Convention


About five months ago, we previewed the last European Audio Engineering Society Convention, which we followed with a wrap-up discussion. The next AES convention is just around the corner, October 18th to 21st in New York. As before, the Audio Engineering research team here aim to be quite active at the convention.

These conventions are quite big, with thousands of attendees, but not so large that you get lost or overwhelmed. Away from the main exhibition hall is the Technical Program, which includes plenty of tutorials and presentations on cutting edge research.

So here, we’ve gathered together some information about a lot of the events that we will be involved in, attending, or we just thought were worth mentioning. And I’ve gotta say, the Technical Program looks amazing.


One of the first events of the Convention is the Diversity Town Hall, which introduces the AES Diversity and Inclusion Committee. I’m a firm supporter of this, and wrote a recent blog entry about female pioneers in audio engineering. The AES aims to be fully inclusive, open and encouraging to all, but that’s not yet fully reflected in its activities and membership. So expect to see some exciting initiatives in this area coming soon.

In the 10:45 to 12:15 poster session, Steve Fenton will present Alternative Weighting Filters for Multi-Track Program Loudness Measurement. We’ve published a couple of papers (Loudness Measurement of Multitrack Audio Content Using Modifications of ITU-R BS.1770, and Partial loudness in multitrack mixing) showing that well-known loudness measures don’t correlate very well with perception when used on individual tracks within a multitrack mix, so it would be interesting to see what Steve and his co-author Hyunkook Lee found out. Perhaps all this research will lead to better loudness models and measures.

At 2 pm, Cleopatra Pike will present a discussion and analysis of Direct and Indirect Listening Test Methods. I’m often sceptical when someone draws strong conclusions from indirect methods like measuring EEGs and reaction times, so I’m curious what this study found and what recommendations they propose.

The 2:15 to 3:45 poster session will feature the work with probably the coolest name, Influence of Audience Noises on the Classical Music Perception on the Example of Anti-cough Candies Unwrapping Noise. And yes, it looks like a rigorous study, using an anechoic chamber to record the sounds of sweets being unwrapped, and the signal analysis is coupled with a survey to identify the most distracting sounds. It reminds me of the DFA faders paper from the last convention.

At 4:30, researchers from Fraunhofer and the Technical University of Ilmenau present Training on the Acoustical Identification of the Listening Position in a Virtual Environment. In a recent paper in the Journal of the AES, we found that training resulted in a huge difference between participant results in a discrimination task, yet listening tests often employ untrained listeners. This suggests that maybe we can hear a lot more than studies suggest; we just don’t know how to listen and what to listen for.


If you were to spend only one day this year immersing yourself in frontier audio engineering research, this is the day to do it.

At 9 am, researchers from Harman will present part 1 of A Statistical Model that Predicts Listeners’ Preference Ratings of In-Ear Headphones. This was a massive study involving 30 headphone models and 71 listeners under carefully controlled conditions. Part 2, on Friday, focuses on development and validation of the model based on the listening tests. I’m looking forward to both, but puzzled as to why they weren’t put back-to-back in the schedule.

At 10 am, researchers from the Tokyo University of the Arts will present Frequency Bands Distribution for Virtual Source Widening in Binaural Synthesis, a technique which seems closely related to work we presented previously on Cross-adaptive Dynamic Spectral Panning.

From 10:45 to 12:15, our own Brecht De Man will be chairing and speaking in a Workshop on ‘New Developments in Listening Test Design.’ He’s quite a leader in this field, and has developed some great software that makes the setup, running and analysis of listening tests much simpler while keeping them rigorous.

In the 11-12:30 poster session, Nick Jillings will present Automatic Masking Reduction in Balance Mixes Using Evolutionary Computing, which deals with a challenging problem in music production, and builds on the large amount of research we’ve done on Automatic Mixing.

At 11:45, researchers from McGill will present work on Simultaneous Audio Capture at Multiple Sample Rates and Formats. This helps address one of the challenges in perceptual evaluation of high resolution audio (and see the open access journal paper on this), ensuring that the same audio is used for different versions of the stimuli, with only variation in formats.

At 1:30, renowned audio researcher John Vanderkooy will present research on how a loudspeaker can be used as the sensor for a high-performance infrasound microphone. In the same session at 2:30, researchers from Plextek will show how consumer headphones can be augmented to automatically perform hearing assessments. Should we expect a new audiometry product from them soon?

At 2 pm, our own Marco Martinez Ramirez will present Analysis and Prediction of the Audio Feature Space when Mixing Raw Recordings into Individual Stems, which applies machine learning to challenging music production problems. Immediately following this, Stephen Roessner discusses a Tempo Analysis of Billboard #1 Songs from 1955–2015, which builds partly on other work analysing hit songs to observe trends in music and production tastes.

At 3:45, there is a short talk on Evolving the Audio Equalizer. Audio equalization is a topic on which we’ve done quite a lot of research (see our review article, and a blog entry on the history of EQ). I’m not sure where the novelty is in the author’s approach though, since dynamic EQ has been around for a while, and there are plenty of harmonic processing tools.

At 4:15, there’s a presentation on Designing Sound and Creating Soundscapes for Still Images, an interesting and unusual bit of sound design.


Judging from the abstract, the short Tutorial on the Audibility of Loudspeaker Distortion at Bass Frequencies at 5:30 looks like it will be an excellent and easy to understand review, covering practice and theory, perception and metrics. In 15 minutes, I suppose it can only give a taster of what’s in the paper.

There’s a great session on perception from 1:30 to 4. At 2, perceptual evaluation expert Nick Zacharov gives a Comparison of Hedonic and Quality Rating Scales for Perceptual Evaluation. I think people often have a favorite evaluation method without knowing if it’s the best one for the test. We briefly looked at pairwise versus multistimuli tests in previous work, but it looks like Nick’s work is far more focused on comparing methodologies.

Immediately after that, researchers from the University of Surrey present Perceptual Evaluation of Source Separation for Remixing Music. Remixing audio via source separation is a hot topic, with lots of applications whenever the original unmixed sources are unavailable. This work will get to the heart of which approaches sound best.

The last talk in the session, at 3:30, is on The Bandwidth of Human Perception and its Implications for Pro Audio. Judging from the abstract, this is a big picture, almost philosophical discussion about what and how we hear, but with some definitive conclusions and proposals that could be useful for psychoacoustics researchers.


Grateful Dead fans will want to check out Bridging Fan Communities and Facilitating Access to Music Archives through Semantic Audio Applications in the 9 to 10:30 poster session, which is all about an application providing wonderful new experiences for interacting with the huge archives of live Grateful Dead performances.

At 11 o’clock, Alessia Milo, a researcher in our team with a background in architecture, will discuss Soundwalk Exploration with a Textile Sonic Map. We discussed her work in a recent blog entry on Aural Fabric.

In the 2 to 3:30 poster session, I really hope there will be a live demonstration accompanying the paper on Acoustic Levitation.

At 3 o’clock, Gopal Mathur will present an Active Acoustic Meta Material Loudspeaker System. Metamaterials are receiving a lot of deserved attention, and such advances in materials are expected to lead to innovative and superior headphones and loudspeakers in the near future.


The full program can be explored on the Convention Calendar or the Convention website. Come say hi to us if you’re there! Josh Reiss (author of this blog entry), Brecht De Man, Marco Martinez and Alessia Milo from the Audio Engineering research team within the Centre for Digital Music will all be there.


Aural fabric

This is a slightly modified version of a post that originally appeared on the Bela blog.

Alessia Milo is an architect currently researching education in acoustics for architecture while pursuing her PhD with the audio engineering team here, as well as with the Media and Arts Technology programme.

She will present Influences of a Key Map on Soundwalk Exploration with a Textile Sonic Map at the upcoming AES Convention.

Here, she introduces Aural Fabric, a captivating interactive sound installation consisting of a textile map which plays back field recordings when touched.

Aural Fabric is an interactive textile map that lets you listen to selected field recordings by touching the touch-sensitive areas of the map. It uses conductive thread, capacitive sensing and Bela to process sensor data and play back the field recordings. The first map that was made represents a selection of sounds from the area of Greenwich, London. The field recordings of the area were captured with binaural microphones during a group soundwalk that formed part of a study on sonic perception. For the installation I chose recordings of particular locations that have a unique sonic identity, which you can listen to here. The textile map was created as a way of presenting these recordings to the general public.

When I created this project I wanted people to be able to explore the fabric surface of the map and hear the field recordings of the specific locations on the map as they touched it. An interesting way to do this was with conductive thread that I could embroider into the layout of the map. To read the touches from the conductive areas of the map I decided to use the MPR121 capacitive touch sensing board along with a Bela board.

Designing the map


I first considered the scale of the map based on how big the conductive areas could be in order to be touched comfortably, and on the limits of the embroidery machine used (Brother Pr1000E). I finally settled on a 360 mm × 200 mm frame. The vector traces from the map of the area (retrieved from OpenStreetMap) were reduced to the minimum amount needed to make the map recognizable and easily manageable by the PE-Design 10 embroidery software, which I used to transform the shapes into filling patterns.

Linen was chosen as the best material for the fabric base due to its availability, resistance and plain-aesthetic qualities. I decided to represent the covered areas we entered during the soundwalk as coloured reliefs completely made of grey/gold conductive thread. The park areas were left olive-green if not interactive and green mixed with the conductive thread if interactive. This was to allow the map to be clearly understood in its different elements. Courtyards we crossed were embroidered as flat areas in white with parts in conductive thread, whilst landmarks were represented with a mixture of pale grey, with conductive thread only on the side where the walk took place.

The River Thames, also present in the recordings, was depicted as a pale blue wavy surface with some conductive parts close to the sides where the walk took place. Buildings belonging to the area but not covered in the soundwalk were represented in flat pale grey hatch.

The engineering process

The fabric was meticulously embroidered with coloured rayon and conductive threads thanks to the precision of the embroidery machine. I tested the conductive thread and the different stitch configurations on a small sample of fabric to determine how well the capacitive charges and discharges caused by touching the conductive parts could be read by the breakout board.

The whole map consists of a graphical layer, an insulation layer, an embroidered circuit layer, a second insulation layer, and a bottom layer in neoprene which works as a soft base. Below the capacitive areas of the top layer I cut some holes in the insulation layer to allow the top layer to communicate with the circuit layer. Some of these areas have been also manually stitched to the circuit layer to keep the two layers in place. The fabric can be easily rolled and moved separately from the Bela board.

Some of the embroidered underlying traces. The first two traces appear too close in one point: when the fabric is not fully stretched they risk being triggered together!

Stitching the breakout board

Particular care was taken when connecting the circuit traces in the inner embroidered circuit layer to the capacitive pins of the breakout board. As this connection needs to be extremely solid, I decided to solder some conductive wire to the board, pass it through the holes beforehand, and then stitch the wires one by one to the corresponding conductive thread traces, which were previously embroidered.

Some pointers came from the process of working with the conductive thread:

  • Two traces should never be too close to one another or they will trigger false readings by shorting together.
  • A multimeter comes in handy to verify the continuity of the circuit. To avoid wasting time and material, it’s better to check for continuity on some samples before embroidering the final one as the particular materials and threads in use can behave very differently.
  • Be patient and carefully design your circuit according to the intended position of the capacitive boards. For example, I decided to place the two of them (to allow for 24 separate readings) in the top corners of the fabric.

Connecting with Bela

The two breakout boards are connected through i2c to Bela, which receives the readings from each pin of the breakout boards. The leftmost board is connected through i2c to the other one, which in turn connects to Bela. This cable is the only connection between the Fabric and Bela. It is possible to set an independent threshold for each pin, which will trigger the index releasing the corresponding recording. The code used to read the capacitive touch breakout board comes with the board and can be found here: examples/06-Sensors/capacitive-touch/.
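To make the per-pin threshold idea concrete, here is a small illustrative sketch in Python. It is not the Bela example code mentioned above (which is written for the Bela platform); the function name, the readings and the threshold values are all hypothetical, and it simply assumes that a larger reading means a stronger touch.

```python
def touched_pins(readings, thresholds):
    """Return the indices of pins whose reading exceeds that pin's own threshold.

    readings:   one sensor value per pin (hypothetical units, larger = stronger touch).
    thresholds: one threshold per pin, so each area can be tuned independently.
    """
    if len(readings) != len(thresholds):
        raise ValueError("need exactly one threshold per pin")
    return [
        pin
        for pin, (value, threshold) in enumerate(zip(readings, thresholds))
        if value > threshold
    ]


# Hypothetical example with three pins; the real map provides 24 readings.
active = touched_pins([10, 50, 30], [40, 40, 20])
```

Each returned index can then be used to look up the recording assigned to that area of the map. Independent thresholds matter because embroidered areas of different sizes and stitch densities respond differently to touch.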

MPR121 capacitive touch sensing breakout board connected to the i2c terminals of Bela.

The code to handle the recordings was nicely tweaked by Christian Heinrichs to add a natural fade in and fade out for the recordings. This code is based on the multi sample streamer example already available in Bela’s IDE which can be found here: examples/04-Audio/sample-streamer-multi/. Each recording has a pointer that keeps track of where the recording paused, so that touching the corresponding area again will resume playing from that point and not from the beginning. Multiple areas can be played at the same time allowing you to create experimental mixes of different ambiances.
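The resume-and-fade behaviour described above can be sketched roughly as follows. This is a simplified Python illustration, not the code running on Bela: the class name, the linear fade, the fade step and the looping at the end of a recording are all assumptions made for the example.

```python
class ResumableTrack:
    """One field recording with a read pointer that survives pauses."""

    def __init__(self, samples, fade_step=0.25):
        self.samples = samples
        self.position = 0           # where playback will resume from
        self.gain = 0.0             # current fade gain, between 0 and 1
        self.fade_step = fade_step  # gain change per sample (assumed linear fade)

    def next_sample(self, touched):
        # Move the gain towards 1 while touched and towards 0 when released.
        target = 1.0 if touched else 0.0
        if self.gain < target:
            self.gain = min(target, self.gain + self.fade_step)
        elif self.gain > target:
            self.gain = max(target, self.gain - self.fade_step)
        if self.gain == 0.0:
            return 0.0  # fully faded out: the pointer stays where it is
        out = self.samples[self.position] * self.gain
        # Looping at the end is an assumption made for this sketch.
        self.position = (self.position + 1) % len(self.samples)
        return out


def mix(tracks, touched_flags):
    """Sum the output of every track so several areas can sound at once."""
    return sum(t.next_sample(f) for t, f in zip(tracks, touched_flags))
```

Because the pointer only advances while a track is audible, touching an area again picks up where the recording left off, and the short fade avoids clicks when a touch starts or ends.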

Exhibition setting

This piece is best experienced through headphones as the recordings were made using binaural microphones. Nevertheless it is also possible to use speakers, with some loss of the spatial sonic image fidelity. In either case the audio output is taken directly from the Bela board. As shown in the photograph below, I made a wooden and perspex case for the board to protect it while it was installed in a gallery, and powered the board with a USB 5V phone charger. Bela was set to run this project on start-up, making it simple for gallery assistants to turn the piece on and off. The Aural Fabric is used for my PhD research, focused on novel approaches to strengthening the relationship between architecture and acoustics. I’m engaging architecture students in sonic explorations and reflections on how architecture and its design contribute to defining our sonic environments.

Aural Fabric: Greenwich has been displayed among the installations at Sonic Environments in Brisbane and at Inter/sections 2016 in London. More information documenting the making process is available here.


How does this sound? Evaluating audio technologies

The audio engineering team here have done a lot of work on audio evaluation, both in collaboration with companies and as an essential part of our research. Some challenges come up time and time again, not just in terms of formal approaches, but also in terms of just establishing a methodology that works. I’m aware of cases where a company has put a lot of effort into evaluating the technologies that they create, only for it to make absolutely no difference in the product. So here are some ideas about how to do it, especially from an informal industry perspective.

– When you are tasked with evaluating a technology, you should always maintain a dialogue with the developer. More than anyone else, he or she knows what the tool is supposed to do, how it all works, what content might be best to use and has suggestions on how to evaluate it.

– Developers should always have some test audio content that they use during development. They work with this content all the time to check that the algorithm is modifying or analysing the audio correctly. We’ll come back to this.

– The first stage of evaluation is documentation. Each tool should have some form of user guide, tester guide and developer guide. The idea is that if the technology remains unused for a period of time and those who worked on it have moved on, a new person can read the guides and have a good idea how to use it and test it, and a new developer should be able to understand the algorithm and the source code. Documentation should also include test audio content, preferably both input and output files with information on how the tool should be used with this content.

– The next stage of evaluation is duplication. You should be able to run the tool as suggested in the guide and get the expected results with the test audio. If anything in the documentation is incorrect or incomplete, get in touch with the developers for more information.

– Then we have the collection stage. You need test content to evaluate the tool. The most important content is that which shows off exactly what the tool is intended to do. You should also gather content that tests challenging cases, or content where you need to ensure that the effect doesn’t make things worse.

– The preparation stage is next, though this may be performed in tandem with collection. You may need to edit the test content so that it’s ready to use in testing. You may also want to manually create target content, demonstrating ideal results, or at least of similar sound quality to expected results.

– Next is informal perceptual evaluation. This is lots of listening and playing around with the tool. The goal is to identify problems, find out when it works best, identify interesting cases, problematic or preferred parameter settings.


– Now on to semi-formal evaluation. Have focused questions that you need to find the answer to and procedures and methodologies to answer them. Be sure to document your findings, so that you can say what content causes what problem, how and why, etc. This needs to be done so that the problem can be exactly replicated by developers, and so that you can see if the problem still exists in the next iteration.

– Now come the all-important listening tests. Be sure that the technology is at a level such that the test will give meaningful results. You don’t want to ask a bunch of people to listen and evaluate if the tool still has major known bugs. You also want to make sure that the test is structured in such a way that it gives really useful information. This is very important, and often overlooked. Finding out that people preferred implementation A over implementation B is nice, but it’s much better to find out why, and how much, and if listeners would have preferred something else. You also want to do this test with lots of content. If, for instance, only one piece of content is used in a listening test, then you’ve only found out that people prefer A over B for one example. So, generally, listening tests should involve lots of questions, lots of content, and everything should be randomised to prevent bias. You may not have time to do everything, but it’s definitely worth putting significant time and effort into listening test design.

We’ve developed the Web Audio Evaluation Toolbox, designed to make listening test design and implementation straightforward and high quality.

– And there is the feedback stage. Evaluation counts for very little unless all the useful information gets back to developers (and possibly others), and influences further development. All this feedback needs to be prepared and stored, so that people can always refer back to it.

– Finally, there is revisiting and reiteration. If we identify a problem, or a place for improvement, we need to perform the same evaluation on the next iteration of the tool to ensure that the problem has indeed been fixed. Otherwise, issues perpetuate and we never actually know if the tool is improving and problems are resolved and closed.
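As a small illustration of the randomisation recommended for listening tests, here is a sketch of how a test plan might be generated so that both the order of the content and the order of the systems under test differ for each participant. This is not how any particular toolbox does it; the function name, the dictionary layout and the example labels are hypothetical.

```python
import random


def build_test_plan(contents, systems, seed=None):
    """Build a randomised listening test plan.

    contents: the audio excerpts to be evaluated.
    systems:  the implementations being compared (e.g. A and B).
    seed:     optional seed, e.g. one per participant, to make the plan reproducible.
    Returns a list of pages, each with one excerpt and a shuffled system order.
    """
    rng = random.Random(seed)
    pages = []
    for content in contents:
        order = list(systems)
        rng.shuffle(order)  # hide which system is which on every page
        pages.append({"content": content, "order": order})
    rng.shuffle(pages)      # present the excerpts in random order too
    return pages


# Hypothetical excerpts and systems, for illustration only.
plan = build_test_plan(["speech", "rock", "strings"], ["A", "B"], seed=1)
```

Storing the seed alongside each participant’s answers means the exact presentation order can be reconstructed later, which helps when the same test has to be repeated on the next iteration of the tool.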

By the way, I highly recommend the book Perceptual Audio Evaluation by Bech and Zacharov, which is the bible on this subject.

Ten Years of Automatic Mixing


Automatic microphone mixers have been around since 1975. These are devices that lower the levels of microphones that are not in use, thus reducing background noise and preventing acoustic feedback. They’re great for things like conference settings, where there may be many microphones but only a few speakers should be heard at any time.
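One simple way to get this behaviour is gain sharing, where each microphone receives a share of the total gain in proportion to its current level. The sketch below is a simplified illustration of that idea only, not a description of any specific product or of the original design; the function name and the example levels are hypothetical, and the levels are assumed to be non-negative, short-term measurements.

```python
def gain_sharing(levels, floor=1e-12):
    """Share a total gain of 1 between microphones in proportion to their level.

    levels: non-negative short-term level of each microphone (linear scale).
    floor:  below this total level, all microphones are treated as silent.
    """
    if not levels:
        return []
    total = sum(levels)
    if total <= floor:
        # Nobody is talking: split the gain equally rather than divide by zero.
        return [1.0 / len(levels)] * len(levels)
    return [level / total for level in levels]


# Hypothetical levels: one active talker and two idle microphones.
gains = gain_sharing([0.9, 0.05, 0.05])
```

The gains always sum to one, so the active microphone is passed almost unchanged while the idle ones are turned down, and the total system gain stays constant however many microphones are open.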

Over the next three decades, various designs appeared, but the field didn’t really grow much beyond Dan Dugan’s original concept.

Enter Enrique Perez Gonzalez, a PhD student researcher and experienced sound engineer. On September 11th, 2007, exactly ten years ago from the publication of this blog post, he presented a paper “Automatic Mixing: Live Downmixing Stereo Panner.” With this work, he showed that it may be possible to automate not just fader levels in speech applications, but other tasks and for other applications. Over the course of his PhD research, he proposed methods for autonomous operation of many aspects of the music mixing process: stereo positioning, equalisation, time alignment, polarity correction, feedback prevention, selective masking minimization, etc. He also laid out a framework for further automatic mixing systems.

Enrique established a new field of research, and it’s been growing ever since. People have used machine learning techniques for automatic mixing, applied auditory neuroscience to the problem, and explored where the boundaries lie between the creative and technical aspects of mixing. Commercial products have arisen based on the concept. And yet all this is still only scratching the surface.

I had the privilege to supervise Enrique and have many anecdotes from that time. I remember Enrique and I going to a talk that Dan Dugan gave at an AES convention panel session and one of us asked Dan about automating other aspects of the mix besides mic levels. He had a puzzled look and basically said that he’d never considered it. It was also interesting to see the hostile reactions from some (but certainly not all) practitioners, which brings up lots of interesting questions about disruptive innovations and the threat of automation.


Next week, Salford University will host the 3rd Workshop on Intelligent Music Production, which also builds on this early research. There, Brecht De Man will present the paper ‘Ten Years of Automatic Mixing’, describing the evolution of the field, the approaches taken, the gaps in our knowledge and what appears to be the most exciting new research directions. Enrique, who is now CTO of Solid State Logic, will also be a panellist at the Workshop.

Here’s a video of one of the early Automatic Mixing demonstrators.

And here’s a list of all the early Automatic Mixing papers.

  • E. Perez Gonzalez and J. D. Reiss, A real-time semi-autonomous audio panning system for music mixing, EURASIP Journal on Advances in Signal Processing, v2010, Article ID 436895, p. 1-10, 2010.
  • Perez-Gonzalez, E. and Reiss, J. D. (2011) Automatic Mixing, in DAFX: Digital Audio Effects, Second Edition (ed U. Zölzer), John Wiley & Sons, Ltd, Chichester, UK. doi: 10.1002/9781119991298.ch13, p. 523-550.
  • E. Perez Gonzalez and J. D. Reiss, “Automatic equalization of multi-channel audio using cross-adaptive methods”, Proceedings of the 127th AES Convention, New York, October 2009
  • E. Perez Gonzalez, J. D. Reiss “Automatic Gain and Fader Control For Live Mixing”, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, New York, October 18-21, 2009
  • E. Perez Gonzalez, J. D. Reiss “Determination and correction of individual channel time offsets for signals involved in an audio mixture”, 125th AES Convention, San Francisco, USA, October 2008
  • E. Perez Gonzalez, J. D. Reiss “An automatic maximum gain normalization technique with applications to audio mixing.”, 124th AES Convention, Amsterdam, Netherlands, May 2008
  • E. Perez Gonzalez, J. D. Reiss, “Improved control for selective minimization of masking using interchannel dependency effects”, 11th International Conference on Digital Audio Effects (DAFx), September 2008
  • E. Perez Gonzalez, J. D. Reiss, “Automatic Mixing: Live Downmixing Stereo Panner”, 10th International Conference on Digital Audio Effects (DAFx-07), Bordeaux, France, September 10-15, 2007

What the f*** are DFA faders?

I’ve been meaning to write this blog entry for a while, and I’ve finally gotten around to it. At the 142nd AES Convention, there were two papers that really stood out which weren’t discussed in our convention preview or convention wrap-up. One was about Acoustic Energy Harvesting, which we discussed a few weeks ago, and the other was titled ‘The DFA Fader: Exploring the Power of Suggestion in Loudness Judgments.’ When I mentioned this paper to others, their response was always the same, “What’s a DFA Fader?”. Well, the answer is hinted at in the title of this blog entry.

The basic idea is that musicians often give instructions to the sound engineer that he or she can’t or doesn’t want to follow. For instance, a vocalist might say “Turn me up” in a soundcheck, but the sound engineer knows that the vocals are at a nice level already and any more amplification might cause feedback. Sometimes, this sort of thing can be communicated back to the musician in a nice way. But there’s also the fallback option; a fader on the mixing console that “Does F*** All”, aka DFA. The engineer can slide the fader or twiddle an unconnected dial, smile back and say ‘Ok, does this sound a bit better?’.

A couple of companies have had fun with this idea. Funk Logic’s Palindrometer, shown below, is nothing more than a filler for empty rack space. It’s an interface that looks like it might do something, but at best, it just flashes some LEDs when one toggles switches and turns the knobs.


RANE have the PI 14 Pseudoacoustic Infector. It’s worth checking out the full description, complete with product review and data sheets. I especially like the schematic, copied below.


And in 2014, our own Brecht De Man released The Wire, a freely available VST and AudioUnit plug-in that emulates a gold-plated, balanced, 100% lossless audio connector.


Anyway, the authors of this paper had the bright idea of doing legitimate subjective evaluation of DFA faders. They didn’t make jokes in the paper, not even to explain the DFA acronym. They took 22 participants and divided them into an 11-person control group and an 11-person test group. In the control group, each subject participated in twenty trials where two identical musical excerpts were presented and the subject had to rate the difference in loudness of vocals between the two excerpts. Only ten excerpts were used, so each pair was used in two trials. In the test group, a sound engineer was present and he made scripted suggestions that he was adjusting the levels in each trial. He could be seen, but participants couldn’t see his hands moving on the console.

Not surprisingly, most trials showed a statistically significant difference between test and control groups, confirming the effectiveness of verbal suggestions associated with the DFA fader. And the authors picked up on an interesting point; results were far more significant for stimuli where vocals were masked by other instruments. This links the work to psychoacoustic studies. Not only is our perception of loudness and timbre influenced by the presence of a masker, but we have a more difficult time judging loudness and hence are more likely to accept the suggestion from an expert.
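To make the test-versus-control comparison concrete, here is a minimal sketch of how one trial might be analysed. It is not the authors' method, which isn't described here; I've swapped in a simple two-sided permutation test on the difference of group means. The ratings, the rating scale and the function name `permutation_test` are all made up for illustration.

```python
import random


def permutation_test(control, test, n_permutations=10000, seed=0):
    """Two-sided permutation test on the difference of group means.

    Returns the observed difference (test mean minus control mean)
    and an estimated p-value.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n_control = len(control)
    n_test = len(test)
    observed = sum(test) / n_test - sum(control) / n_control

    pooled = list(control) + list(test)
    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign ratings to the two groups.
        rng.shuffle(pooled)
        shuffled_control = pooled[:n_control]
        shuffled_test = pooled[n_control:]
        diff = sum(shuffled_test) / n_test - sum(shuffled_control) / n_control
        if abs(diff) >= abs(observed):
            extreme += 1

    # Add-one correction keeps the estimated p-value strictly above zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical loudness-difference ratings for one trial, 11 per group.
# These numbers are invented for illustration, not taken from the paper.
control_ratings = [0, 1, 0, -1, 0, 0, 1, 0, -1, 0, 0]
test_ratings = [2, 3, 1, 2, 4, 2, 3, 1, 2, 3, 2]

observed, p_value = permutation_test(control_ratings, test_ratings)
print(f"mean difference = {observed:.2f}, estimated p = {p_value:.4f}")
```

With only 11 listeners per group, a permutation test is attractive because it makes no assumption about how the ratings are distributed. Running it once per trial, as here, would also call for some correction for multiple comparisons across the twenty trials.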

The authors did an excellent job of critiquing their results. But unfortunately, the full data was not made available with the paper. So we are left with a lot of questions. What were these scripted suggestions? It could make a big difference if the engineer said “I’m going to turn the vocals way up” versus “Let me try something. Does it sound any different now?” And were some participants immune to the suggestions? And because participants couldn’t see a fader being adjusted (interviews with sound engineers had stressed the importance of verbal suggestions), we don’t know how that could influence results.

There is something else that’s very interesting about this. It’s a ‘false experiment’. The whole listening test is a trick since for all participants and in all trials, there were never any loudness differences between the two presented stimuli. So indirectly, it looks at an ‘auditory placebo effect’ that is more fundamental than DFA faders. What were the ratings for loudness differences that participants gave? For the control group especially, did they judge these differences to be small because they trusted their ears, or large because they knew that loudness judging is the nature of the test? Perhaps there is a natural uncertainty in loudness perception regardless of bias. How much weaker does a listener’s judgment become when repeatedly asked to make very subtle choices in a listening test? There’s been some prior work tackling some of these questions, but I think this DFA Faders paper opened up a lot of avenues of interesting research.

Female pioneers in audio engineering

The Heyser lecture is a distinguished talk given at each AES Convention by eminent individuals in audio engineering and related fields. At the 140th AES Convention, Rozenn Nicol was the Heyser lecturer. This was well-deserved, as she has made major contributions to the field of immersive audio. But what was shocking about this is that she is the first woman Heyser lecturer. It’s an indicator that women are under-represented and under-recognised in the field. With that in mind, I’d like to highlight some women who have made major contributions to the field, especially in research and innovation.

  • Birgitta Berglund led major research into the impact of noise on communities. Her influential research resulted in guidelines from the World Health Organisation, and greatly advanced our understanding of noise and its effects on society. She was the 2009 IOA Rayleigh medal recipient.
  • Marina Bosi is a past president of the AES. She has been instrumental in the development of standards and formats for audio coding and digital content management, including developing the AC-2, AC-3, and MPEG-2 Advanced Audio Coding technologies.
  • Anne-Marie Bruneau has been one of the most important researchers on electrodynamic loudspeaker design, exploring motion impedance and radiation patterns, as well as establishing some of the main analysis and measurement approaches used today. She co-founded the Laboratoire d’Acoustique de l’Université du Maine, now a leading acoustics research center.
  • Ilene J. Busch-Vishniac is responsible for major advances in the theory and understanding of electret microphones, as well as patenting several new designs. She received the ASA R. Bruce Lindsay Award in 1987, and the Silver Medal in Engineering Acoustics in 2001. She was President of the ASA in 2003-4.
  • Elizabeth (Betsy) Cohen was the first female president of the Audio Engineering Society. She was presented with the AES Fellowship Award in 1995 for contributions to understanding the acoustics and psychoacoustics of sound in rooms. In 2001, she was presented with the AES Citation Award for pioneering the technology enabling collaborative multichannel performance over the broadband internet.
  • Poppy Crum is head scientist at Dolby Laboratories. Her research involves computer research in music and acoustics. At Dolby, she is responsible for integrating neuroscience and knowledge of sensory perception into algorithm design, technological development, and technology strategy.
  • Delia Derbyshire (1937-2001) was an innovator in electronic music who pushed the boundaries of technology and composition. She is most well-known for her electronic arrangement of the theme for Doctor Who, an important example of Musique Concrète. Each note was individually crafted by cutting, splicing, and stretching or compressing segments of analogue tape which contained recordings of a plucked string, oscillators and white noise. Here’s a video detailing a lot of the effects she used, which have now become popular tools in digital music production.
  • Ann Dowling is the first female president of the Royal Academy of Engineering. Her research focuses on noise analysis and reduction, especially from engines, and she is a leading educator in acoustics. A quick glance at Google Scholar shows how influential her research has been.
  • Marion Downs was an audiometrist at Colorado Medical Center in Denver, who invented the tests used to measure hearing both in newborn babies and in fetuses.
  • Judy Dubno is Director of Hearing Research at the Medical University of South Carolina. Her research focuses on human auditory function, with emphasis on the processing of auditory information and the recognition of speech, and how these abilities change in adverse listening conditions, with age, and with hearing loss. She is a recipient of the James Jerger Career Award for Research in Audiology from the American Academy of Audiology, was Carhart Memorial Lecturer for the American Auditory Society, and was President of the ASA in 2014-15.
  • Rebecca Fiebrink researches Human Computer Interaction (HCI) and the application of machine learning to real-time, interactive, and creative domains. She is the creator of the popular Wekinator, which allows anyone to use machine learning to build new musical instruments, real-time music information retrieval and audio analysis systems, computer listening systems and more.
  • Katherine Safford Harris pioneered EMG studies of speech production and auditory perception. Her research was fundamental to speech recognition, speech synthesis, reading machines for the blind, and the motor theory of speech perception. She was elected Fellow of the ASA, the AAAS, the American Speech-Language-Hearing Association, and the New York Academy of Sciences. She was President of the ASA (2000-2001), awarded the Silver Medal in 2005 and Gold Medal in 2007.
  • Rhona Hellman was a Fellow of the ASA. She was a distinguished hearing scientist and preeminent expert in auditory perceptual phenomena. Her research spanned almost 50 years, beginning in 1960. She tackled almost every aspect of loudness, and the work resulted in major advances and developments of loudness standards.
  • Mara Helmuth developed software for composition and improvisation involving granular synthesis. Throughout the 1990s, she paved the way forward by exploring and implementing systems for collaborative performance over the Internet. From 2008-10 she was President of the International Computer Music Association.
  • Carleen Hutchins (1911-2009) was a leading researcher in the study of violin acoustics, with over a hundred publications in the field. She was founder and president of the Catgut Society, an organization devoted to the study and appreciation of stringed instruments.
  • Sophie Germain (1776-1831) was a French mathematician, scientist and philosopher. She won a major prize from the French Academy of Sciences for developing a theory to explain the vibration of plates due to sound. The history behind her contribution, and the reactions of leading French mathematicians to having a female of similar calibre in their midst, is fascinating. Joseph Fourier, whose work underpins much of audio signal processing, was a champion of her work.
  • Bronwyn Jones was a psychoacoustician at the CBS Technology Center during the 70s and 80s. In seminal work with co-author Emil Torrick, she developed one of the first loudness meters, incorporating both psychoacoustic principles and detailed listening tests. It paved the way for what became major initiatives for loudness measurement, and in some ways outperforms the modern ITU 1770 standard.
  • Bozena Kostek is editor of the Journal of the Audio Engineering Society. Her most significant contributions include the applications of neural networks, fuzzy logic and rough sets to musical acoustics, and the application of data processing and information retrieval to the psychophysiology of hearing. Her research has garnered dozens of prizes and awards.
  • Daphne Oram (1925-2003) was a pioneer of ‘musique concrete’ and a central figure in the evolution of electronic music. She devised the Oramics technique for creating electronic sounds, co-founded the BBC Radiophonic Workshop, and was possibly the first woman to direct an electronic music studio, to set up a personal electronic music studio and to design and construct an electronic musical instrument.
  • Carla Scaletti is an innovator in computer generated music. She designed the Kyma sound generation computer language in 1986 and co-founded Symbolic Sound Corporation in 1989. Kyma is one of the first graphical programming languages for real time digital audio signal processing, a precursor to MaxMSP and PureData, and is still popular today.
  • Bridget Shield was professor of acoustics at London Southbank University. Her research is most significant in our understanding of the effects of noise on children, and has influenced many government initiatives. From 2012-14, she was the first female President of the Institute of Acoustics.
  • Laurie Spiegel created one of the first computer-based music composition programs, Music Mouse: an Intelligent Instrument, which also has some early examples of algorithmic composition and intelligent automation, both of which are hot research topics today.
  • Mary Desiree Waller (1886-1959) wrote a definitive treatise on Chladni figures, which are the shapes and patterns made by surface vibrations due to sound (see Sophie Germain, above). It gave far deeper insight into the figures than any previous work.
  • Megan (or Margaret) Watts-Hughes is the inventor of the Eidophone, an early instrument for visualising the sounds made by your voice. She rediscovered this simple method of generating Chladni figures without knowledge of Sophie Germain or Ernst Chladni’s work. There is a great description of her experiments and analysis in her own words.

The Eidophone, demonstrated by Grace Digney.

Do you know some others who should be mentioned? We’d love to hear your thoughts.

Thanks to Theresa Leonard for information on past AES presidents. She was the third female president.  will be the fourth.

And check out Women in Audio: contributions and challenges in music technology and production for a detailed analysis of the current state of the field.