Digging the didgeridoo

The Ig Nobel prizes are tongue-in-cheek awards given every year to celebrate unusual or trivial achievements in science. Named as a play on the Nobel prize and the word ignoble, they are intended to “honor achievements that first make people laugh, and then make them think.” Previously, when discussing graphene-based headphones, I mentioned Andre Geim, the only scientist to have won both a Nobel and an Ig Nobel prize.

I only recently noticed that the 2017 Ig Nobel Peace Prize went to an international team that demonstrated that playing a didgeridoo is an effective treatment for obstructive sleep apnoea and snoring. Here’s a photo of one of the authors of the study playing the didge at the award ceremony.


My own nominees for Ig Nobel prizes, from audio-related research published this past year, would include ‘Influence of Audience Noises on the Classical Music Perception on the Example of Anti-cough Candies Unwrapping Noise’, which we discussed in our preview of the 143rd Audio Engineering Society Convention, and ‘The DFA Fader: Exploring the Power of Suggestion in Loudness Judgments’, for which we had the blog entry ‘What the f*** are DFA faders’.

But let’s return to didgeridoo research. The didgeridoo is a fascinating Aboriginal Australian instrument, with a rich history and interesting acoustics, and it produces an eerie, drone-like sound.

A search on Google Scholar, once patents and citations are excluded, turns up only 38 research papers with ‘didgeridoo’ in the title. That’s great news if you want to be an expert on research in the subject. The work of Neville H. Fletcher, over a roughly thirty-year period beginning in the early 1980s, is probably the main starting point.

The passive acoustics of the didgeridoo are well understood. It’s a long, truncated conical horn where the player’s lips at the smaller end form a pressure-controlled valve. Knowing the length and diameters involved, it’s not too difficult to determine the fundamental frequency (often around 50–100 Hz), the modes excited and their strengths, in much the same way as can be done for many woodwind instruments.
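
To get a rough feel for those numbers, here is a minimal sketch (my own illustration, not taken from Fletcher’s papers) that treats the bore as a simple cylinder closed at the lips and open at the far end. A real didgeridoo is a truncated cone, so the actual modes sit somewhere between this odd-harmonic series and a cone’s full harmonic series, but the fundamental lands in the right range.

```python
# Rough estimate of didgeridoo resonances, treating the bore as a cylinder
# closed at the player's lips and open at the far end. This ignores the
# conical flare, wall losses and the lip valve, so the numbers are only
# ballpark figures.

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C

def pipe_resonances(length_m, radius_m, n_modes=4):
    """Odd-harmonic resonances of a closed-open pipe, with a simple
    open-end correction of 0.6 * radius added to the acoustic length."""
    effective_length = length_m + 0.6 * radius_m
    return [(2 * n - 1) * SPEED_OF_SOUND / (4 * effective_length)
            for n in range(1, n_modes + 1)]

# A typical instrument: about 1.5 m long with a ~3 cm mean bore radius.
for i, f in enumerate(pipe_resonances(1.5, 0.03), start=1):
    print(f"mode {i}: {f:6.1f} Hz")
# The fundamental comes out near 56 Hz, consistent with the 50-100 Hz
# drones of most didgeridoos.
```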

But that’s just the passive acoustics. Fletcher pointed out that traditional, solo didgeridoo players don’t pay much attention to the resonant frequencies; these mainly matter when the instrument is played in Western music and needs to fit with the rest of an ensemble.

Things start getting really interesting when one considers the sounding mechanism. Players make heavy use of circular breathing (breathing in through the nose while simultaneously breathing out through the mouth), more extensively and more rhythmically than is typical when playing Western brass instruments like trumpets and tubas. Changes in lip motion and vocal tract shape are then used to control the formants, allowing the manipulation of very rich timbres.

It’s these aspects of didgeridoo playing that intrigued the authors of the sleep apnoea study. Like the DFA and cough drop wrapper studies mentioned above, this was a serious study on a seemingly not-so-serious subject. Circular breathing and training of the respiratory muscles may go a long way towards improving nighttime breathing, and hence reducing snoring and sleep disturbances. The study was controlled and randomised. But it’s incredibly difficult in these sorts of studies to eliminate or control for all the other variables, and very hard to identify which aspect of the didgeridoo playing was responsible for the better sleep. The authors quite rightly highlighted what I think is one of the biggest question marks in the study:

A limitation is that those in the control group were simply put on a waiting list because a sham intervention for didgeridoo playing would be difficult. A control intervention such as playing a recorder would have been an option, but we would not be able to exclude effects on the upper airways and compliance might be poor.

In that respect, drug trials are somewhat easier to interpret than practice-based interventions. But the effect here was abundantly clear and quite strong. One certainly should not dismiss the results because of limitations in the study; limitations give rise to question marks, but they are not mistakes.

 


Audio Research Year in Review – Part 2: The Headlines

Last week featured the first part of our ‘Audio research year in review.’ It focused on our own achievements. This week is the second, concluding part, with a few news stories related to the topics of this blog (music production, psychoacoustics, sound synthesis and everything in between) for each month of the year.

Browsing through the list, some interesting things pop up. Several news stories relate to speech intelligibility in broadcast TV, which has been a recurring story over the last few years. The effect of noise pollution on wildlife is also a theme in this year’s audio research headlines. And quite a few of the psychological studies are telling us what we already know. The fact that musicians (who are trained in a task that involves quick response to stimuli) have faster reaction times than non-musicians (who may not be trained in such a task) is not a surprise. Nor is the fact that if you hear the cork popping from a wine bottle, you may think the wine tastes better, although that’s a wonderful example of the placebo effect. But studies that end up confirming assumptions are still worth doing.

January

February

March

April

May


June

July

August

September

October

November

December

My favorite sessions from the 143rd AES Convention


Recently, several researchers from the audio engineering research team here attended the 143rd Audio Engineering Society Convention in New York. Before the Convention, I wrote a blog entry highlighting a lot of the more interesting or adventurous research that was being presented there. As is usually the case at these Conventions, I have so many meetings to attend that I miss out on a lot of highlights, even ones that I flag up beforehand as ‘must see’. Still, I managed to attend some real gems this time, and I’ll discuss a few of them here.

I’m glad that I attended ‘Audio Engineering with Hearing Loss—A Practical Symposium’. Hearing loss amongst musicians, audiophiles and audio engineers is an important topic that needs more attention. Overexposure, whether prolonged or simply too loud, is a major cause of hearing damage. Beyond the issues it causes for anybody, for those in the industry it affects their ability to work, or even to appreciate their passion. The session had lots of interesting advice.

The most interesting presentation in the session was from Richard Einhorn, a composer and music producer. In 2010, he lost much of his hearing due to a virus. He woke up one day to find that he had completely lost hearing in his right ear, a condition known as Idiopathic Sudden Sensorineural Hearing Loss. This then evolved into hyperacusis, with extreme distortion, excessive perceived volume and greatly reduced speech intelligibility. In many ways, complete deafness in the right ear would have been preferable. On top of that, his left ear suffered from otosclerosis, leaving everything at greatly reduced volume. And given that this was his only functioning ear, the risk of surgery to correct it was too great.

Richard has found some wonderful ways to still function, and even continue working in audio and music, with the limited hearing he still has. There’s a wonderful description of them in Hearing Loss Magazine, and they include the use of the ‘Companion Mic,’ which allowed him to hear from many different locations around a busy, noisy environment, like a crowded restaurant.

Thomas Lund presented ‘The Bandwidth of Human Perception and its Implications for Pro Audio’. I really wasn’t sure about this before the Convention. I had read the abstract, and thought it might be some meandering, somewhat philosophical talk about hearing perception, with plenty of speculation but lacking in substance. I was very glad to be proven wrong! It had aspects of all of that, but in a very positive sense. It was quite rigorous, essentially a systematic review of research in the field that had been published in medical journals. It looked at the question of auditory perceptual bandwidth, where bandwidth is meant in a general information-theoretic and cognitive sense, not specifically frequency range. The research revolves around the fact that, though we receive many megabits of sensory information every second, we seem to use only dozens of bits per second in our higher-level perception. This has lots of implications for listening test design, notably on how to deal with aspects like sample duration or the training of participants. This was probably the most fascinating technical talk I saw at the Convention.

There were two papers that I had flagged up as having the most interesting titles, ‘Influence of Audience Noises on the Classical Music Perception on the Example of Anti-cough Candies Unwrapping Noise’ and ‘Acoustic Levitation—Standing Wave Demonstration’. I had an interesting chat with an author of the first one, Adam Pilch. When walking around much later looking for the poster for the second one, I bumped into Adam again. It turns out he was a co-author on both of them! It looks like Adam Pilch and Bartlomiej Chojnacki (the shared authors on those papers) and their co-authors have an appreciation of the joy of doing research for fun and curiosity, and an appreciation for a good paper title.

Leslie Ann Jones was the Heyser lecturer. The Heyser lecture, named after Richard C. Heyser, is an evening talk given by an eminent individual in audio engineering or related fields. Leslie has had a fascinating career, and gave a talk that makes one realise just how much the industry is changing and growing, and how important are the individuals and opportunities that one encounters in a career.

The last session I attended was also one of the best. Chris Pike, who recently became leader of the audio research team at BBC R&D (he has big shoes to fill, but fits them well and is already racing ahead), presented ‘What’s This? Doctor Who with Spatial Audio!’. I knew this was going to be good because it involved two of my favorite things, but it was much better than that. The audience were all handed headphones so that they could listen to the binaural renderings used throughout the presentation. I love props at technical talks! I also expected the talk to focus almost completely on the binaural, 3D sound rendering for a recent episode, but it was so much more than that. There was quite detailed discussion of audio innovation throughout the more than 50 years of Doctor Who, some of which we have discussed when mentioning Daphne Oram and Delia Derbyshire in our blog entry on female pioneers in audio engineering.

There’s a nice short interview with Chris and his colleagues Darran Clement (sound mixer) and Catherine Robinson (audio supervisor) about the binaural sound in Doctor Who on BBC R&D’s blog, and here’s a YouTube video promoting the binaural sound in the recent episode:

 

Exciting research at the upcoming Audio Engineering Society Convention


About five months ago, we previewed the last European Audio Engineering Society Convention, which we followed with a wrap-up discussion. The next AES Convention is just around the corner, October 18 to 21 in New York. As before, the Audio Engineering research team here aim to be quite active at the Convention.

These conventions are quite big, with thousands of attendees, but not so large that you get lost or overwhelmed. Away from the main exhibition hall is the Technical Program, which includes plenty of tutorials and presentations on cutting edge research.

So here, we’ve gathered together some information about a lot of the events that we will be involved in, attending, or we just thought were worth mentioning. And I’ve gotta say, the Technical Program looks amazing.

Wednesday

One of the first events of the Convention is the Diversity Town Hall, which introduces the AES Diversity and Inclusion Committee. I’m a firm supporter of this, and wrote a recent blog entry about female pioneers in audio engineering. The AES aims to be fully inclusive, open and encouraging to all, but that’s not yet fully reflected in its activities and membership. So expect to see some exciting initiatives in this area coming soon.

In the 10:45 to 12:15 poster session, Steve Fenton will present Alternative Weighting Filters for Multi-Track Program Loudness Measurement. We’ve published a couple of papers (Loudness Measurement of Multitrack Audio Content Using Modifications of ITU-R BS.1770, and Partial loudness in multitrack mixing) showing that well-known loudness measures don’t correlate very well with perception when used on individual tracks within a multitrack mix, so it would be interesting to see what Steve and his co-author Hyunkook Lee found out. Perhaps all this research will lead to better loudness models and measures.
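
For readers who have not met BS.1770 before, the core of the measure is just K-weighting (a shelving filter plus a high-pass) followed by a mean-square energy term. Here is a minimal sketch of that core for a single channel at 48 kHz, using the filter coefficients published in the standard; the full measure adds per-channel weighting and block-based gating, and this is my own illustration rather than the code used in either paper.

```python
# Simplified, ungated BS.1770-style loudness for one 48 kHz channel:
# K-weighting (shelf + high-pass biquads) followed by mean-square energy.
import numpy as np
from scipy.signal import lfilter

# ITU-R BS.1770 pre-filter (high shelf) coefficients at 48 kHz.
SHELF_B = [1.53512485958697, -2.69169618940638, 1.19839281085285]
SHELF_A = [1.0, -1.69065929318241, 0.73248077421585]
# RLB weighting (high-pass) coefficients at 48 kHz.
HPF_B = [1.0, -2.0, 1.0]
HPF_A = [1.0, -1.99004745483398, 0.99007225036621]

def ungated_loudness(x):
    """Ungated loudness of a mono signal sampled at 48 kHz."""
    y = lfilter(SHELF_B, SHELF_A, x)
    y = lfilter(HPF_B, HPF_A, y)
    mean_square = np.mean(y ** 2)
    return -0.691 + 10.0 * np.log10(mean_square + 1e-12)

# Example: one second of a 1 kHz sine at amplitude 0.1.
fs = 48000
t = np.arange(fs) / fs
print(f"{ungated_loudness(0.1 * np.sin(2 * np.pi * 1000 * t)):.1f} LU")
```

The interesting research question is what happens when a measure like this, designed for full program material, is applied to a single stem within a mix.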

At 2 pm, Cleopatra Pike will present a discussion and analysis of Direct and Indirect Listening Test Methods. I’m often sceptical when someone draws strong conclusions from indirect methods like measuring EEGs and reaction times, so I’m curious what this study found and what recommendations they propose.

The 2:15 to 3:45 poster session will feature the work with probably the coolest name, Influence of Audience Noises on the Classical Music Perception on the Example of Anti-cough Candies Unwrapping Noise. And yes, it looks like a rigorous study, using an anechoic chamber to record the sounds of sweets being unwrapped, and the signal analysis is coupled with a survey to identify the most distracting sounds. It reminds me of the DFA faders paper from the last convention.

At 4:30, researchers from Fraunhofer and the Technical University of Ilmenau present Training on the Acoustical Identification of the Listening Position in a Virtual Environment. In a recent paper in the Journal of the AES, we found that training resulted in a huge difference between participant results in a discrimination task, yet listening tests often employ untrained listeners. This suggests that maybe we can hear a lot more than studies suggest; we just don’t know how to listen and what to listen for.

Thursday

If you were to spend only one day this year immersing yourself in frontier audio engineering research, this is the day to do it.

At 9 am, researchers from Harman will present part 1 of A Statistical Model that Predicts Listeners’ Preference Ratings of In-Ear Headphones. This was a massive study involving 30 headphone models and 71 listeners under carefully controlled conditions. Part 2, on Friday, focuses on development and validation of the model based on the listening tests. I’m looking forward to both, but puzzled as to why they weren’t put back-to-back in the schedule.

At 10 am, researchers from the Tokyo University of the Arts will present Frequency Bands Distribution for Virtual Source Widening in Binaural Synthesis, a technique which seems closely related to work we presented previously on Cross-adaptive Dynamic Spectral Panning.

From 10:45 to 12:15, our own Brecht De Man will be chairing and speaking in a Workshop on ‘New Developments in Listening Test Design.’ He’s quite a leader in this field, and has developed some great software that makes the set up, running and analysis of listening tests much simpler and still rigorous.

In the 11-12:30 poster session, Nick Jillings will present Automatic Masking Reduction in Balance Mixes Using Evolutionary Computing, which deals with a challenging problem in music production, and builds on the large amount of research we’ve done on Automatic Mixing.

At 11:45, researchers from McGill will present work on Simultaneous Audio Capture at Multiple Sample Rates and Formats. This helps address one of the challenges in perceptual evaluation of high resolution audio (and see the open access journal paper on this), ensuring that the same audio is used for different versions of the stimuli, with only variation in formats.

At 1:30, renowned audio researcher John Vanderkooy will present research on how a loudspeaker can be used as the sensor for a high-performance infrasound microphone. In the same session at 2:30, researchers from Plextek will show how consumer headphones can be augmented to automatically perform hearing assessments. Should we expect a new audiometry product from them soon?

At 2 pm, our own Marco Martinez Ramirez will present Analysis and Prediction of the Audio Feature Space when Mixing Raw Recordings into Individual Stems, which applies machine learning to challenging music production problems. Immediately following this, Stephen Roessner discusses a Tempo Analysis of Billboard #1 Songs from 1955–2015, which builds partly on other work analysing hit songs to observe trends in music and production tastes.

At 3:45, there is a short talk on Evolving the Audio Equalizer. Audio equalization is a topic on which we’ve done quite a lot of research (see our review article, and a blog entry on the history of EQ). I’m not sure where the novelty is in the author’s approach though, since dynamic EQ has been around for a while, and there are plenty of harmonic processing tools.

At 4:15, there’s a presentation on Designing Sound and Creating Soundscapes for Still Images, an interesting and unusual bit of sound design.

Friday

Judging from the abstract, the short Tutorial on the Audibility of Loudspeaker Distortion at Bass Frequencies at 5:30 looks like it will be an excellent and easy to understand review, covering practice and theory, perception and metrics. In 15 minutes, I suppose it can only give a taster of what’s in the paper.

There’s a great session on perception from 1:30 to 4. At 2, perceptual evaluation expert Nick Zacharov gives a Comparison of Hedonic and Quality Rating Scales for Perceptual Evaluation. I think people often have a favorite evaluation method without knowing if it’s the best one for the test. We briefly looked at pairwise versus multi-stimulus tests in previous work, but it looks like Nick’s work is far more focused on comparing methodologies.

Immediately after that, researchers from the University of Surrey present Perceptual Evaluation of Source Separation for Remixing Music. Remixing audio via source separation is a hot topic, with lots of applications whenever the original unmixed sources are unavailable. This work will get to the heart of which approaches sound best.

The last talk in the session, at 3:30, is on The Bandwidth of Human Perception and its Implications for Pro Audio. Judging from the abstract, this is a big picture, almost philosophical discussion about what and how we hear, but with some definitive conclusions and proposals that could be useful for psychoacoustics researchers.

Saturday

Grateful Dead fans will want to check out Bridging Fan Communities and Facilitating Access to Music Archives through Semantic Audio Applications in the 9 to 10:30 poster session, which is all about an application providing wonderful new experiences for interacting with the huge archives of live Grateful Dead performances.

At 11 o’clock, Alessia Milo, a researcher in our team with a background in architecture, will discuss Soundwalk Exploration with a Textile Sonic Map. We discussed her work in a recent blog entry on Aural Fabric.

In the 2 to 3:30 poster session, I really hope there will be a live demonstration accompanying the paper on Acoustic Levitation.

At 3 o’clock, Gopal Mathur will present an Active Acoustic Meta Material Loudspeaker System. Metamaterials are receiving a lot of deserved attention, and such advances in materials are expected to lead to innovative and superior headphones and loudspeakers in the near future.

 

The full program can be explored on the Convention Calendar or the Convention website. Come say hi to us if you’re there! Josh Reiss (author of this blog entry), Brecht De Man, Marco Martinez and Alessia Milo from the Audio Engineering research team within the Centre for Digital Music will all be there.
 

 

Ten Years of Automatic Mixing


Automatic microphone mixers have been around since 1975. These are devices that lower the levels of microphones that are not in use, thus reducing background noise and preventing acoustic feedback. They’re great for things like conference settings, where there may be many microphones but only a few speakers should be heard at any time.
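
The classic gain-sharing idea behind such devices is easy to state: each microphone is attenuated according to its share of the total short-term input energy, so active microphones pass through while idle ones are pulled down and the overall system gain stays roughly constant. Below is a minimal sketch of that principle; it is my own illustration, not Dugan’s actual algorithm or any product’s implementation.

```python
# Frame-based gain-sharing automixer sketch: each mic's gain is its share
# of the total short-term energy, so talkers pass through while idle mics
# are attenuated and the summed gain stays roughly constant.
import numpy as np

def gain_sharing_mix(mics, frame=1024, floor=1e-9):
    """mics: array of shape (n_mics, n_samples). Returns a mono mix."""
    n_mics, n_samples = mics.shape
    mix = np.zeros(n_samples)
    for start in range(0, n_samples, frame):
        block = mics[:, start:start + frame]
        energy = np.mean(block ** 2, axis=1) + floor  # per-mic energy
        gains = energy / energy.sum()                 # shares sum to one
        mix[start:start + frame] = (gains[:, None] * block).sum(axis=0)
    return mix

# Example: mic 0 carries a loud signal, mic 1 only low-level room noise,
# so mic 1 contributes almost nothing to the mix.
rng = np.random.default_rng(0)
mics = np.vstack([0.5 * rng.standard_normal(48000),
                  0.01 * rng.standard_normal(48000)])
mix = gain_sharing_mix(mics)
```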

Over the next three decades, various designs appeared, but the field didn’t really grow much beyond Dan Dugan’s original concept.

Enter Enrique Perez Gonzalez, a PhD student researcher and experienced sound engineer. On September 11th, 2007, exactly ten years before the publication of this blog post, he presented a paper, “Automatic Mixing: Live Downmixing Stereo Panner.” With this work, he showed that it may be possible to automate not just fader levels in speech applications, but other tasks and other applications too. Over the course of his PhD research, he proposed methods for autonomous operation of many aspects of the music mixing process: stereo positioning, equalisation, time alignment, polarity correction, feedback prevention, selective masking minimisation, and so on. He also laid out a framework for further automatic mixing systems.

Enrique established a new field of research, and it’s been growing ever since. People have used machine learning techniques for automatic mixing, applied auditory neuroscience to the problem, and explored where the boundaries lie between the creative and technical aspects of mixing. Commercial products have arisen based on the concept. And yet all this is still only scratching the surface.

I had the privilege to supervise Enrique and have many anecdotes from that time. I remember Enrique and I going to a talk that Dan Dugan gave at an AES convention panel session and one of us asked Dan about automating other aspects of the mix besides mic levels. He had a puzzled look and basically said that he’d never considered it. It was also interesting to see the hostile reactions from some (but certainly not all) practitioners, which brings up lots of interesting questions about disruptive innovations and the threat of automation.


Next week, Salford University will host the 3rd Workshop on Intelligent Music Production, which also builds on this early research. There, Brecht De Man will present the paper ‘Ten Years of Automatic Mixing’, describing the evolution of the field, the approaches taken, the gaps in our knowledge and what appears to be the most exciting new research directions. Enrique, who is now CTO of Solid State Logic, will also be a panellist at the Workshop.

Here’s a video of one of the early Automatic Mixing demonstrators.

And here’s a list of all the early Automatic Mixing papers.

  • E. Perez Gonzalez and J. D. Reiss, “A real-time semi-autonomous audio panning system for music mixing”, EURASIP Journal on Advances in Signal Processing, v. 2010, Article ID 436895, p. 1-10, 2010.
  • E. Perez Gonzalez and J. D. Reiss, “Automatic Mixing”, in DAFX: Digital Audio Effects, Second Edition (ed. U. Zölzer), John Wiley & Sons, Ltd, Chichester, UK, p. 523-550, 2011. doi: 10.1002/9781119991298.ch13.
  • E. Perez Gonzalez and J. D. Reiss, “Automatic equalization of multi-channel audio using cross-adaptive methods”, Proceedings of the 127th AES Convention, New York, October 2009.
  • E. Perez Gonzalez and J. D. Reiss, “Automatic Gain and Fader Control For Live Mixing”, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, New York, October 18-21, 2009.
  • E. Perez Gonzalez and J. D. Reiss, “Determination and correction of individual channel time offsets for signals involved in an audio mixture”, 125th AES Convention, San Francisco, USA, October 2008.
  • E. Perez Gonzalez and J. D. Reiss, “An automatic maximum gain normalization technique with applications to audio mixing”, 124th AES Convention, Amsterdam, Netherlands, May 2008.
  • E. Perez Gonzalez and J. D. Reiss, “Improved control for selective minimization of masking using interchannel dependency effects”, 11th International Conference on Digital Audio Effects (DAFx), September 2008.
  • E. Perez Gonzalez and J. D. Reiss, “Automatic Mixing: Live Downmixing Stereo Panner”, 10th International Conference on Digital Audio Effects (DAFx-07), Bordeaux, France, September 10-15, 2007.

The Mix Evaluation Dataset

Still at the upcoming International Conference on Digital Audio Effects in Edinburgh, 5-8 September, our group’s Brecht De Man will be presenting a paper on his Mix Evaluation Dataset (a pre-release of which can be read here).
It is a collection of mixes and evaluations of these mixes, amassed over the course of his PhD research, that has already been the subject of several studies on best practices and perception of mix engineering processes.
With over 180 mixes of 18 different songs, and evaluations from 150 subjects totalling close to 13k statements (like ‘snare drum too dry’ and ‘good vocal presence’), the dataset is certainly the largest and most diverse of its kind.

Unlike the bulk of previous research on this topic, the data collection methodology presented here maximally preserves ecological validity by allowing participating mix engineers to use representative, professional tools in their preferred environment. Mild constraints on software, such as the agreement to use the DAW’s native plug-ins, mean that mixes can be recreated completely and analysed in depth from the DAW session files, which are also shared.

The listening test experiments offered a unique opportunity for the participating mix engineers to receive anonymous feedback from peers, and helped create a large body of ratings and free-field text comments. Annotation and analysis of these comments further helped to establish the relative importance of various music production aspects, as well as to correlate perceptual constructs (such as reverberation amount) with objective features.

Proportional representation of processors in subjective comments

An interface to browse the songs, audition the mixes, and dissect the comments is provided at http://c4dm.eecs.qmul.ac.uk/multitrack/MixEvaluation/, from where the audio (insofar as the source is licensed under Creative Commons, or copyrighted but available online) and the perceptual evaluation data can be downloaded as well.

The Mix Evaluation Dataset browsing interface

Sound Effects Taxonomy

At the upcoming International Conference on Digital Audio Effects, Dave Moffat will be presenting recent work on creating a sound effects taxonomy using unsupervised learning. The paper can be found here.

A taxonomy of sound effects is useful for a range of reasons. Sound designers often spend considerable time searching for sound effects. Classically, sound effects are arranged by keyword tags and by what caused the sound to be created: bacon cooking, for example, might have the name “BaconCook”, the tags “Bacon Cook, Sizzle, Open Pan, Food”, and be placed in the category “cooking”. However, most sound designers know that the sound of frying bacon can sound very similar to the sound of rain (see this TED talk for more info), yet rain sits in an entirely different folder, in a different section of the SFX library.

The approach here is to analyse the raw content of the audio files in the sound effects library and let a computer determine which sounds are similar, based on the actual sonic content of each sample. As such, the sounds of rain and frying bacon will be placed much closer together, allowing a sound designer to quickly and easily find related sounds.
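
The paper should be consulted for the exact features and clustering method used, but the general idea can be sketched in a few lines: summarise each file with timbral features and cluster on feature similarity rather than on names or tags. In the sketch below, MFCCs and agglomerative clustering are stand-ins of my choosing, and the file paths are placeholders.

```python
# Content-based grouping of sound effects: summarise each file with mean
# MFCCs and cluster on feature similarity, so files that sound alike end up
# in the same branch regardless of how they were named or tagged.
# (MFCCs + agglomerative clustering are illustrative choices, not
# necessarily what the paper used.)
import numpy as np
import librosa
from scipy.cluster.hierarchy import linkage, fcluster

def mfcc_summary(path, n_mfcc=20):
    """Mean MFCC vector summarising one audio file."""
    y, sr = librosa.load(path, sr=22050, mono=True)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).mean(axis=1)

# Placeholder paths: any folder of sound effects will do.
paths = ["sfx/bacon_cook.wav", "sfx/rain_light.wav", "sfx/door_slam.wav"]
features = np.vstack([mfcc_summary(p) for p in paths])

# Hierarchical clustering on the feature vectors; cutting the tree at a
# chosen level yields the groups of the data-driven taxonomy.
tree = linkage(features, method="ward")
groups = fcluster(tree, t=2, criterion="maxclust")
for path, group in zip(paths, groups):
    print(group, path)
```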

Here’s a figure from the paper, comparing the generated taxonomy to the original sound effect library classification scheme.
