Exciting research at the upcoming Audio Engineering Society Convention


About five months ago, we previewed the last European Audio Engineering Society Convention, which we followed with a wrap-up discussion. The next AES convention is just around the corner, October 18 to 21 in New York. As before, the Audio Engineering research team here aims to be quite active at the convention.

These conventions are quite big, with thousands of attendees, but not so large that you get lost or overwhelmed. Away from the main exhibition hall is the Technical Program, which includes plenty of tutorials and presentations on cutting edge research.

So here, we’ve gathered together some information about a lot of the events that we will be involved in, attending, or we just thought were worth mentioning. And I’ve gotta say, the Technical Program looks amazing.


One of the first events of the Convention is the Diversity Town Hall, which introduces the AES Diversity and Inclusion Committee. I’m a firm supporter of this, and wrote a recent blog entry about female pioneers in audio engineering. The AES aims to be fully inclusive, open and encouraging to all, but that’s not yet fully reflected in its activities and membership. So expect to see some exciting initiatives in this area coming soon.

In the 10:45 to 12:15 poster session, Steve Fenton will present Alternative Weighting Filters for Multi-Track Program Loudness Measurement. We’ve published a couple of papers (Loudness Measurement of Multitrack Audio Content Using Modifications of ITU-R BS.1770, and Partial loudness in multitrack mixing) showing that well-known loudness measures don’t correlate very well with perception when used on individual tracks within a multitrack mix, so it would be interesting to see what Steve and his co-author Hyunkook Lee found out. Perhaps all this research will lead to better loudness models and measures.

At 2 pm, Cleopatra Pike will present a discussion and analysis of Direct and Indirect Listening Test Methods. I’m often sceptical when someone draws strong conclusions from indirect methods like measuring EEGs and reaction times, so I’m curious what this study found and what recommendations they propose.

The 2:15 to 3:45 poster session will feature the work with probably the coolest name, Influence of Audience Noises on the Classical Music Perception on the Example of Anti-cough Candies Unwrapping Noise. And yes, it looks like a rigorous study, using an anechoic chamber to record the sounds of sweets being unwrapped, and the signal analysis is coupled with a survey to identify the most distracting sounds. It reminds me of the DFA faders paper from the last convention.

At 4:30, researchers from Fraunhofer and the Technical University of Ilmenau present Training on the Acoustical Identification of the Listening Position in a Virtual Environment. In a recent paper in the Journal of the AES, we found that training resulted in a huge difference between participant results in a discrimination task, yet listening tests often employ untrained listeners. This suggests that maybe we can hear a lot more than studies suggest; we just don’t know how to listen and what to listen for.


If you were to spend only one day this year immersing yourself in frontier audio engineering research, this is the day to do it.

At 9 am, researchers from Harman will present part 1 of A Statistical Model that Predicts Listeners’ Preference Ratings of In-Ear Headphones. This was a massive study involving 30 headphone models and 71 listeners under carefully controlled conditions. Part 2, on Friday, focuses on development and validation of the model based on the listening tests. I’m looking forward to both, but puzzled as to why they weren’t put back-to-back in the schedule.

At 10 am, researchers from the Tokyo University of the Arts will present Frequency Bands Distribution for Virtual Source Widening in Binaural Synthesis, a technique which seems closely related to work we presented previously on Cross-adaptive Dynamic Spectral Panning.

From 10:45 to 12:15, our own Brecht De Man will be chairing and speaking in a Workshop on ‘New Developments in Listening Test Design.’ He’s quite a leader in this field, and has developed some great software that makes the set up, running and analysis of listening tests much simpler and still rigorous.

In the 11-12:30 poster session, Nick Jillings will present Automatic Masking Reduction in Balance Mixes Using Evolutionary Computing, which deals with a challenging problem in music production, and builds on the large amount of research we’ve done on Automatic Mixing.

At 11:45, researchers from McGill will present work on Simultaneous Audio Capture at Multiple Sample Rates and Formats. This helps address one of the challenges in perceptual evaluation of high resolution audio (and see the open access journal paper on this), ensuring that the same audio is used for different versions of the stimuli, with only variation in formats.

At 1:30, renowned audio researcher John Vanderkooy will present research on how a loudspeaker can be used as the sensor for a high-performance infrasound microphone. In the same session at 2:30, researchers from Plextek will show how consumer headphones can be augmented to automatically perform hearing assessments. Should we expect a new audiometry product from them soon?

At 2 pm, our own Marco Martinez Ramirez will present Analysis and Prediction of the Audio Feature Space when Mixing Raw Recordings into Individual Stems, which applies machine learning to challenging music production problems. Immediately following this, Stephen Roessner discusses a Tempo Analysis of Billboard #1 Songs from 1955–2015, which builds partly on other work analysing hit songs to observe trends in music and production tastes.

At 3:45, there is a short talk on Evolving the Audio Equalizer. Audio equalization is a topic on which we’ve done quite a lot of research (see our review article, and a blog entry on the history of EQ). I’m not sure where the novelty is in the author’s approach though, since dynamic EQ has been around for a while, and there are plenty of harmonic processing tools.

At 4:15, there’s a presentation on Designing Sound and Creating Soundscapes for Still Images, an interesting and unusual bit of sound design.


Judging from the abstract, the short Tutorial on the Audibility of Loudspeaker Distortion at Bass Frequencies at 5:30 looks like it will be an excellent and easy to understand review, covering practice and theory, perception and metrics. In 15 minutes, I suppose it can only give a taster of what’s in the paper.

There’s a great session on perception from 1:30 to 4. At 2, perceptual evaluation expert Nick Zacharov gives a Comparison of Hedonic and Quality Rating Scales for Perceptual Evaluation. I think people often have a favorite evaluation method without knowing if it’s the best one for the test. We briefly looked at pairwise versus multistimuli tests in previous work, but it looks like Nick’s work is far more focused on comparing methodologies.

Immediately after that, researchers from the University of Surrey present Perceptual Evaluation of Source Separation for Remixing Music. Remixing audio via source separation is a hot topic, with lots of applications whenever the original unmixed sources are unavailable. This work will get to the heart of which approaches sound best.

The last talk in the session, at 3:30, is on The Bandwidth of Human Perception and its Implications for Pro Audio. Judging from the abstract, this is a big picture, almost philosophical discussion about what and how we hear, but with some definitive conclusions and proposals that could be useful for psychoacoustics researchers.


Grateful Dead fans will want to check out Bridging Fan Communities and Facilitating Access to Music Archives through Semantic Audio Applications in the 9 to 10:30 poster session, which is all about an application providing wonderful new experiences for interacting with the huge archives of live Grateful Dead performances.

At 11 o’clock, Alessia Milo, a researcher in our team with a background in architecture, will discuss Soundwalk Exploration with a Textile Sonic Map. We discussed her work in a recent blog entry on Aural Fabric.

In the 2 to 3:30 poster session, I really hope there will be a live demonstration accompanying the paper on Acoustic Levitation.

At 3 o’clock, Gopal Mathur will present an Active Acoustic Meta Material Loudspeaker System. Metamaterials are receiving a lot of deserved attention, and such advances in materials are expected to lead to innovative and superior headphones and loudspeakers in the near future.


The full program can be explored on the Convention Calendar or the Convention website. Come say hi to us if you’re there! Josh Reiss (author of this blog entry), Brecht De Man, Marco Martinez and Alessia Milo from the Audio Engineering research team within the Centre for Digital Music will all be there.



Physically Derived Sound Synthesis Model of a Propeller

I recently presented my work on the real-time sound synthesis of a propeller at the 12th International Audio Mostly Conference in London. This sound effect is a continuation of my research into aeroacoustic sounds generated by physical models; an extension of my previous work on the Aeolian harp, sword sounds and Aeolian tones.

A demo video of the propeller model attached to an aircraft object in Unity is given here. I use the Unity Doppler effect, which I have since discovered is not the best and adds a high-pitched artefact, but you’ll get the idea! The propeller physical model was implemented in Pure Data and transferred to Unity using the Heavy compiler.

So, when I was looking for an indication of the different sound sources in a propeller sound I found an excellent paper by JE Marte and DW Kurtz. (A review of aerodynamic noise from propellers, rotors, and lift fans. Jet Propulsion Laboratory, California Institute of Technology, 1970) This paper provides a breakdown of the different sound sources, replicated for you here.

The sounds are split into periodic and broadband groups. In the periodic sounds, there are rotational sounds associated with the forces on the blade, and interaction and distortion effects. The first rotational sounds are the loading sounds. These are associated with the thrust and torque of each propeller blade.

To picture these forces, imagine you are sitting on an aircraft wing, looking down the span, travelling at a fixed speed with uniform air flowing over the aerofoil. From your point of view the wing will have a lift force associated with it and a drag force. Now change the aircraft wing to a propeller blade with a similar profile to an aerofoil, spinning at a set RPM. If you are sitting at a point on the blade, the thrust and torque will be constant at the point where you are sat.

Now, stepping off the propeller blade and examining the disk of rotation, the thrust and torque forces will appear as pulses at the blade passing frequency. For example, a propeller with 2 blades, rotating at 2400 RPM, will have a blade passing frequency of 80 Hz. A similar propeller with 4 blades, rotating at the same RPM, will have a blade passing frequency of 160 Hz.
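
As a quick check of those numbers, the blade passing frequency is the number of blades multiplied by the revolutions per second. The function name below is just an illustrative choice, not something taken from the Pure Data model.

```python
def blade_passing_frequency(num_blades: int, rpm: float) -> float:
    """Blade passing frequency in Hz: blades times revolutions per second."""
    revolutions_per_second = rpm / 60.0
    return num_blades * revolutions_per_second


# The two examples from the text above
print(blade_passing_frequency(2, 2400))  # 80.0
print(blade_passing_frequency(4, 2400))  # 160.0
```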

Thickness noise is the sound generated as the blade moves the air aside when passing. This sound is found to be small when blades are moving below the speed of sound, 343 m/s (known as a speed of Mach 1), and is not considered in our model.

Interaction and distortion effects are associated with helicopter rotors and lift fans. Because these have horizontally rotating blades, an effect called blade slap occurs, where the rotating blade passes through the vortices shed by the previous blade, causing a large slapping sound. Horizontal blades also have AM and FM modulated signals associated with them, as well as other effects. Since we are looking at propellers that spin mostly vertically, we have omitted these effects.

The broadband sounds of the propeller are closely related to the Aeolian tone models I have spoken about previously. The vortex sounds are from the vortex shedding, identical to our sword model. The difference in this case is that a propeller has a set shape which is more like an aerofoil than a cylinder.

In the Aeolian tone paper, published at AES, LA, 2016, it was found that for a cylinder the frequency can be determined by an equation defined by Strouhal. The diameter, frequency and airspeed are related by the Strouhal number, found for a cylinder to be approximately 0.2. In the paper D Brown and JB Ollerhead, Propeller noise at low tip speeds. Technical report, DTIC Document, 1971, a Strouhal number of 0.85 was found for propellers. This was used in our model, along with the chord length of the propeller instead of the diameter.
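
To make that relationship concrete, here is a minimal sketch using the standard definition of the Strouhal number, St = f × L / U, rearranged for the frequency f. The Strouhal values 0.2 and 0.85 are the ones quoted above. The airspeeds, cylinder diameter and chord length are made-up example values, and the function name is hypothetical rather than taken from our model.

```python
def vortex_shedding_frequency(strouhal: float, airspeed: float, length: float) -> float:
    """Vortex shedding frequency in Hz, from St = f * L / U.

    airspeed is in m/s. length is the characteristic length in metres:
    the diameter for a cylinder, the chord length for a propeller blade.
    """
    return strouhal * airspeed / length


# Cylinder (St of about 0.2): assumed 1 cm diameter in a 20 m/s airflow
cylinder_hz = vortex_shedding_frequency(0.2, airspeed=20.0, length=0.01)

# Propeller section (St = 0.85): assumed 10 cm chord moving at 100 m/s
propeller_hz = vortex_shedding_frequency(0.85, airspeed=100.0, length=0.1)

print(round(cylinder_hz), round(propeller_hz))
```

Since the local airspeed and chord length both change along the span of a blade, a calculation like this would be repeated for each blade section.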

We also include the wake sound in the Aeolian tone model which is similar to the turbulence sounds. These are only noticeable at high speeds.

The paper by Marte and Kurtz outlines a procedure by Hamilton Standard, a propeller manufacturer, for predicting the far field loading sounds. Along with the RPM, number of blades, distance and azimuth angle, we need the blade diameter and engine power. We first decided which aircraft we were going to model. This was determined by the fact that we wanted to carry out a perceptual test and had a limited number of clips of known aircraft.

We settled on a Hercules C130, Boeing B17 Flying Fortress, Tiger Moth, Yak-52, Cessna 340 and a P51 Mustang. We searched the internet for details like blade size, blade profile (to calculate chord lengths along the span of the blade), engine power, top speed and maximum RPM. This gave enough information for the models to be created in Pure Data and for the sound effect to be as realistic as possible.

This enables us to calculate the loading sounds and broadband vortex sounds, adding in a Doppler effect for realism. What was missing is an engine sound – the aeroacoustic sounds will not happen in isolation in our model. To rectify this a model from Andy Farnell’s Designing Sound was modified to act as our engine sound.

A copy of the Pure Data software can be downloaded from this site, https://code.soundsoftware.ac.uk/hg/propeller-model. We performed listening tests on all the models, comparing them with an alternative synthesis model (SMS) and the real recordings we had. The tests highlighted that the real sounds are still the most plausible, but our model performed as well as the alternative synthesis method. This is a great result considering the alternative method starts with a real recording of a propeller, analyses it and re-synthesizes it. Our model starts with real world physical parameters like the blade profile, engine power, distance and azimuth angles to produce the sound effect.

An example of the propeller sound effect is mixed into this famous scene from North by Northwest. As you can hear the effect still has some way to go to be as good as the original but this physical model is the first step in incorporating fluid dynamics of a propeller into the synthesis process.

From the editor: Check out all Rod’s videos at https://www.youtube.com/channel/UCIB4yxyZcndt06quMulIpsQ

A copy of the paper published at Audio Mostly 2017 can be found here >> Propeller_AuthorsVersion

12th International Audio Mostly Conference, London 2017

by Rod Selfridge & David Moffat. Photos by Beici Liang.

Audio Mostly – Augmented and Participatory Sound and Music Experiences, was held at Queen Mary University of London between 23 – 26 August. The conference brought together a wide variety of audio and music designers, technologists, practitioners and enthusiasts from all over the world.

The opening day of the conference ran in parallel with the Web Audio Conference, also being held at Queen Mary, with sessions open to all delegates. The day opened with a joint Keynote from the computer scientist and author of the highly influential sound effect book – Designing Sound, Andy Farnell. Andy covered a number of topics and invited audience participation which grew into a discussion regarding intellectual property – the pros and cons if it was done away with.

Andy Farnell

The paper session then opened with an interesting talk by Luca Turchet from Queen Mary’s Centre for Digital Music. Luca presented his paper on The Hyper Mandolin, an augmented music instrument allowing real-time control of digital effects and sound generators. The session concluded with the second talk I’ve seen in as many months by Charles Martin. This time Charles presented Deep Models for Ensemble Touch-Screen Improvisation, where an artificial neural network model has been used to implement a live performance and synthesise the touch gestures of three virtual players.

In the afternoon, I got to present my paper, co-authored by David Moffat and Josh Reiss, on a Physically Derived Sound Synthesis Model of a Propeller. Here I continue the theme of my PhD by applying equations obtained through fluid dynamics research to generate authentic sound synthesis models.

Rod Selfridge

The final session of the day saw Geraint Wiggins, our former Head of School at EECS, Queen Mary, present Callum Goddard’s work on designing Computationally Creative Musical Performance Systems, looking at questions like what makes performance virtuosic and how this can be implemented using the Creative Systems Framework.

The oral sessions continued throughout Thursday. One presentation that I found interesting was by Anna Xambo, titled Turn-Taking and Chatting in Collaborative Music Live Coding. In this research the authors explored collaborative music live coding using the live coding environment and pedagogical tool EarSketch, focusing on the benefits to both performance and education.

Thursday’s Keynote was by Rebecca Fiebrink of Goldsmiths, who was mentioned in a previous blog. She discussed how machine learning can be used to support human creative experiences, aiding human designers in the rapid prototyping and refinement of new interactions within sound and media.

Rebecca Fiebrink

The Gala Dinner and Boat Cruise was held on Thursday evening, where all the delegates were taken on a boat up and down the Thames, seeing the sights and enjoying food and drink. Prizes were awarded and appreciation expressed to the excellent volunteers, technical teams, committee members and chairpersons who brought together the event.

Tower Bridge

A session on Sports Augmentation and Health / Safety Monitoring was held on Friday morning, which included a number of excellent talks. The prize for presentation of the conference went to Tim Ryan, who presented his paper on 2K-Reality: An Acoustic Sports Entertainment Augmentation for Pickup Basketball Play Spaces. Tim re-contextualises sounds appropriated from a National Basketball Association (NBA) video game to create interactive sonic experiences for players and spectators. I was lucky enough to have a play around with this system during a coffee break and can easily see how it could give an amazing experience for basketball enthusiasts, young and old, as well as drawing in a crowd to share.

Workshops ran on Friday afternoon. I went to Andy Farnell’s Zero to Hero Pure Data Workshop, where participants managed to go from scratch to having working bass drum, snare and hi-hat synthesis models. Andy managed to illustrate how quickly these could be developed and included in a simple sequencer to give a basic drum machine.

Throughout the conference, a number of fixed media works and demos were available for delegates to view, as well as poster sessions where authors presented their work.

Alessia Milo

Live music events were held on both Wednesday and Friday. The Web Audio Mostly Concert on Wednesday was a joint event for delegates of Audio Mostly and the Web Audio Conference. This included an augmented reality musical performance, a human-playable robotic zither, the Hyper Mandolin and DJs.

The Audio Mostly Concert on the Friday included a Transmusicking performance from a laptop orchestra from around the world, where 14 different performers collaborated online. The performance was curated by Anna Xambo. Alan Chamberlain and David De Roure performed The Gift of the Algorithm, which was a computer music performance inspired by Ada Lovelace. The wood and the water was an immersive performance of interactivity and gestural control of both a harp and lighting for the performance, by Balandino Di Donato and Eleanor Turner. GrainField, by Benjamin Matuszewski and Norbert Schnell, was an interactive audio performance that needed the entire audience to take part for the performance to exist. This collective improvisational piece demonstrated how digital technology can really be used to augment the traditional musical experience. GrainField was awarded the prize for the best musical performance.

Adib Mehrabi

The final day of the conference was a full day’s workshop. I attended the one titled Designing Sounds in the Cloud. The morning was spent presenting two ongoing European Horizon 2020 projects, Audio Commons (www.audiocommons.org/) and Rapid-Mix. The Audio Commons initiative aims to promote the use of open audio content by providing a digital ecosystem that connects content providers and creative end users. The Rapid-Mix project focuses on multimodal and procedural interactions leveraging on rich sensing capabilities, machine learning and embodied ways to interact with sound.

Before lunch we took part in a sound walk around the Queen Mary Mile End Campus, with one of each group blindfolded, informing the other what they could hear. The afternoon session had teams of participants designing and prototyping new ways to use the APIs from each of the two Horizon 2020 projects – very much in the feel of a hackathon. We devised a system which captured expressive Italian hand gestures using the Leap Motion and classified them using machine learning techniques. Then in Pure Data each new classification triggered a sound effect taken from the Freesound website (part of the Audio Commons project). Had time allowed, the project would have been extended to have Pure Data link to the Audio Commons API and play sound effects straight from the web.

Overall, I found the conference informative, yet informal, enjoyable and inclusive. The social events were spectacular and ones that will be remembered by delegates for a long time.

AES Berlin 2017: Keynotes from the technical program


The 142nd AES Convention was held last month in the creative heart of Berlin. The four-day program drew more than 2000 attendees and covered workshops, tutorials, technical tours and special events, all related to the latest trends and developments in audio research. But as much as scale, it’s attention to detail that makes AES special. There’s as much emphasis on the research side of audio as on panels of experts discussing a range of provocative and practical topics.

It can be said that 3D Audio: Recording and Reproduction, Binaural Listening and Audio for VR were the most popular topics among workshops, tutorials, papers and engineering briefs. However, a significant portion of the program was also devoted to common audio topics such as digital filter design, live audio, loudspeaker design, recording, audio encoding, microphones, and music production techniques, just to name a few.

For this reason, here at the Audio Engineering research team within C4DM, we bring you what we believe were the highlights, the key talks or the most relevant topics that took place during the convention.

The future of mastering

What better way to start AES than with a mastering experts’ workshop discussing the future of the field? Jonathan Wyner (iZotope) introduced us to the current challenges that this discipline faces. These relate to the demographic, economic and target formatting issues that are constantly evolving and changing due to advances in the music technology industry and its consumers.

When discussing the future of mastering, the panel was reluctant to accept a fully automated future, but pointed out that the main challenge for assistive tools is to understand artistic intentions and genre-based decisions without the expert knowledge of the mastering engineer. They concluded that research efforts should go towards developing an intelligent assistant, able to function as a smart preset that gives mastering engineers a starting point.

Virtual analog modeling of dynamic range compression systems

This paper described a method to digitally model an analogue dynamic range compressor. Based on the analysis of processed and unprocessed audio waveforms, a generic model of dynamic range compression is proposed and its parameters are derived from iterative optimization techniques.

Audio samples were played, demonstrating the quality of the audio produced by the digital model. However, it should be noted that the parameters of the digital compressor cannot be changed. Addressing this could be an interesting path for future work, as could the inclusion of other audio effects such as equalizers or delay lines.

Evaluation of alternative audio mixing interfaces

In the paper ‘Formal Usability Evaluation of Audio Track Widget Graphical Representation for Two-Dimensional Stage Audio Mixing Interface’, an evaluation of different graphical track visualization styles is proposed. Multitrack visualizations included text only, different colour conventions for circles containing text or icons related to the type of instruments, circles with opacity assigned to audio features and also a traditional channel strip mixing interface.

Efficiency was tested and it was concluded that subjects preferred instrument icons as well as the traditional mixing interface. Taking into account the many works and proposals on alternative mixing interfaces (2D and 3D), there is still a lot of scope to explore how to build an intuitive, efficient and simple interface capable of replacing the well-known channel strip.

Perceptually motivated filter design with application to loudspeaker-room equalization

This tutorial was based on the engineering brief ‘Quantization Noise of Warped and Parallel Filters Using Floating Point Arithmetic’, where warped parallel filters are proposed, which aim to have the frequency resolution of the human ear.

Using Matlab, we explored various approaches for achieving this goal, including warped FIR and IIR, Kautz, and fixed-pole parallel filters. This provides a very useful tool that can be used for various applications such as room EQ, physical modelling synthesis and perhaps to improve existing intelligent music production systems.

Source Separation in Action: Demixing the Beatles at the Hollywood Bowl

Abbey Road’s James Clarke presented a great poster with the actual algorithm that was used for the remixed, remastered and expanded version of The Beatles’ album Live at the Hollywood Bowl. The method managed to isolate the crowd noise, making it possible to separate into clean tracks everything that Paul McCartney, John Lennon, Ringo Starr and George Harrison played live in 1964.

The results speak for themselves (audio comparison). Based on a Non-negative Matrix Factorization (NMF) algorithm, this work provides a great research tool for source separation and reverse engineering of mixes.
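
For readers unfamiliar with NMF, the sketch below shows the basic idea on a toy example: a non-negative matrix V (think of a magnitude spectrogram) is approximated as the product of spectral templates W and their activations over time H. It uses the textbook multiplicative update rules for the Euclidean cost and assumes NumPy is available. It is not the algorithm from the poster, and all names and values here are illustrative.

```python
import numpy as np


def nmf(V, rank, n_iter=500, eps=1e-9, seed=0):
    """Approximate a non-negative matrix V as W @ H with W, H >= 0."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = V.shape
    # Random non-negative starting point
    W = rng.random((n_rows, rank)) + eps
    H = rng.random((rank, n_cols)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Toy "spectrogram": two made-up spectral templates with made-up activations
templates = np.array([[1.0, 0.0], [0.5, 0.2], [0.0, 1.0], [0.1, 0.6]])
activations = np.array([[1.0, 0.0, 0.5, 1.0, 0.0],
                        [0.0, 1.0, 0.5, 0.2, 1.0]])
V = templates @ activations

W, H = nmf(V, rank=2)
error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {error:.4f}")
```

In a separation setting, each column of W paired with its row of H would be treated as one component of the mix, and groups of components would be assigned to sources such as the crowd or the band.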

Other keynotes worth mentioning:

Close Miking Empirical Practice Verification: A Source Separation Approach

Analysis of the Subgrouping Practices of Professional Mix Engineers

New Developments in Listening Test Design

Data-Driven Granular Synthesis

A Study on Audio Signal Processed by “Instant Mastering” Services

The rest of the paper proceedings are available in the AES E-library.

Applause, applause! (thank you, thank you. You’re too kind)

“You must be prepared to work always without applause.”
―  Ernest Hemingway, By-line

In a recent blog entry, we discussed research into the sound of screams. It’s one of those everyday sounds that we are particularly attuned to, but that there hasn’t been much research on. This got me thinking about what some other under-researched sounds are. Applause certainly fits. We all know it when we hear it, and a quick search of famous quotes reveals that there are many ways to describe the many types of applause: thunderous applause, tumultuous applause, a smattering of applause, sarcastic applause, and of course, the dreaded slow hand clap. But from an auditory perspective, what makes it special?

Applause is nothing more than the sound of many people gathered in one place clapping their hands. Clapping your hands together is one of the simplest ways in which we can approximate an impulse, or short broadband sound, without the need for any equipment. Impulsive sounds are used for rhythm, for tagging important moments on a timeline, or for estimating the acoustic properties of a room. Clappers and clapsticks are musical instruments, typically consisting of two pieces of wood that are clapped together to produce percussive sounds. In film and television, clapperboards have widespread use. The clapperboard produces a sharp clap noise that can be easily identified on the audio track, and the shutting of the clapstick at the top of the board can similarly be identified on the visual track. Thus, they are effectively used to synchronise sound and picture, as well as to designate the starts of scenes or takes during production. And in acoustic measurement, if one can produce an impulsive sound at a given location and record the result, one can get an idea of the reverberation that the room will apply to any sound produced from that location.
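
The reason an impulse is so useful for measurement is that the recording of an ideal impulse is the room's impulse response, and convolving any dry sound with that response gives the reverberant version. Here is a minimal sketch of the idea. The "room" is a made-up list of a direct sound followed by two decaying echoes, not a real measurement.

```python
def convolve(signal, impulse_response):
    """Direct-form convolution of two sequences of samples."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


# Hypothetical room: direct sound plus two decaying echoes
room = [1.0, 0.0, 0.5, 0.0, 0.25]

# An ideal impulse (a perfect "clap") simply reveals the room response
ideal_clap = [1.0]
recorded = convolve(ideal_clap, room)

# Any other dry sound picks up the same pattern of echoes
dry = [1.0, -1.0]
wet = convolve(dry, room)
print(recorded)
print(wet)
```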

But a hand clap is a crude approximation for an impulse. Hand claps do not have completely flat spectra, are not completely omnidirectional, have significant duration and are not very high energy. Seetharaman and colleagues investigated the effectiveness of hand claps as impulse sources. They found that, with a small amount of additional but automated signal processing, the claps can produce reliable acoustical measurements.

Hanahara, Tada and Muroi exploited the impulse-like nature of hand claps for devising a means of Human-Robot Communication. The hand claps and their timing are relatively easy for a robot to decode, and not that difficult for a human to encode. But why the authors completely dismissed Morse code and all other simple forms of binary encoding is beyond me. And as voice recognition and related technologies continue to advance, the need for hand clap-based communication diminishes.

So what does a single hand clap sound like? This whole field of applause and clapping studies originated with a well-cited 1987 study by Bruno Repp, “The sound of two hands clapping.” He distinguished 8 hand clap positions:

Hands parallel and flat
P1: palm-to-palm
P2: halfway between P1 and P3
P3: fingers-to-palm

Hands held at an angle
A1: palm-to-palm
A2: halfway between A1 and A3
A3: fingers-to-palm
A1+: A1 with hands very cupped
A1-: A1 with hands fully flat

The figure below shows photos of these eight configurations of hand claps, excerpted from Leevi Peltola’s 2004 MSc thesis.

clap positions.png

Repp’s acoustic analyses and perceptual experiments mainly involved 20 test subjects who were each asked to clap at their normal rate for 10 seconds in a quiet room. The spectra of individual claps varied widely, but there was no evidence of influence of sex or hand size on the clap spectrum. He also measured his own clapping with the eight modes above. If the palms struck each other (P1, A1) there was a narrow frequency peak below 1 kHz together with a notch around 2.5 kHz. If the fingers of one hand struck the palm of the other hand (P3, A3) there was a broad spectral peak near 2 kHz.

Repp then tried to determine whether the subjects were able to extract information about the clapper from listening to the signal. Subjects generally assumed that slow, loud and low-pitched hand claps were from male clappers, and fast, soft and high-pitched hand claps were from female clappers. But this was not the case. The speed, intensity and pitch were uncorrelated with sex, and thus test subjects could identify the sex of the clapper only slightly better than chance. Perceived differences were attributed mainly to hand configurations rather than hand size.

So much for individuals clapping, but what about applause. That’s when some interesting physics comes into play. Neda and colleagues recorded applause from several theatre and opera performances. They observed that the applause begins with incoherent random clapping, but then synchronization and periodic behaviour develops after a few seconds. This transition can be quite sudden and very strong, and is an unusual example of self-organization in a large coupled system. Neda gives quite a clear explanation of what is happening, and why.
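To get a feel for how a crowd can fall into step, here is a minimal sketch using a generic Kuramoto-style model of coupled oscillators, a standard way to illustrate this kind of self-organization. It is not taken from Neda’s papers. The clapping rate, the spread of rates, the coupling strengths and the function names are all assumptions chosen for illustration.

```python
import cmath
import math
import random


def order_parameter(phases):
    """Return r in [0, 1]: near 0 is incoherent clapping, 1 is perfect unison."""
    mean = sum(cmath.exp(1j * p) for p in phases) / len(phases)
    return abs(mean)


def simulate_applause(coupling, n_clappers=100, seconds=20.0, dt=0.01, seed=1):
    """Euler-integrate a mean-field Kuramoto model; return the final order parameter.

    Assumed values: clappers average 2 claps per second, with a spread
    (standard deviation) of 0.5 rad/s in their natural rates.
    """
    rng = random.Random(seed)
    rates = [rng.gauss(2.0 * math.pi * 2.0, 0.5) for _ in range(n_clappers)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_clappers)]
    for _ in range(int(seconds / dt)):
        mean = sum(cmath.exp(1j * p) for p in phases) / n_clappers
        r, psi = abs(mean), cmath.phase(mean)
        # Each clapper is pulled towards the average phase of the crowd.
        phases = [
            p + dt * (w + coupling * r * math.sin(psi - p))
            for p, w in zip(phases, rates)
        ]
    return order_parameter(phases)


if __name__ == "__main__":
    print("no coupling:    ", round(simulate_applause(0.0), 2))
    print("strong coupling:", round(simulate_applause(4.0), 2))
```

With no coupling the clappers stay incoherent, while strong coupling pulls them into near unison within a few seconds of simulated time.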

Here’s a nice video of the phenomenon.

The fact that sonic aspects of hand claps can differ so significantly, and can often be identified by listeners, suggests that it may be possible to tell a lot about the source by signal analysis. Such was the case in work by Jylhä and colleagues, who proposed methods to identify a person by their hand claps, or identify the configuration (à la Repp’s study) of the hand clap. Christian Uhle looked at the more general question of identifying applause in an audio stream.

Understanding of applause, beyond the synchronization phenomenon observed by Neda, is quite useful for encoding applause signals, which so often accompany musical recordings, especially those recordings that are considered worth redistributing! And the important spatial and temporal aspects of applause signals are known to make them particularly tricky signals to encode and decode. As noted in research by Adami and colleagues, the more standard perceptual features like pitch or loudness do not do a good job of characterising grainy sound textures like applause. They introduced a new feature, applause density, which is loosely related to the overall clapping rate, but derived from perceptual experiments. Just a month before this blog entry, Adami and co-authors published a follow-up paper which used density and other characteristics to investigate the realism of upmixed (mono to stereo) applause signals. In fact, talking with one of the co-authors was a motivation for me to write this entry.

Upmixing is an important problem in its own right. But the placement and processing of sounds for a stereo or multichannel environment can be considered part of the general problem of sound synthesis. Synthesis of clapping and applause sounds was covered in detail, and to great effect, by Peltola and co-authors. They presented physics-based analysis, synthesis, and control systems capable of producing individual hand claps or mimicking the applause of a group of clappers. The synthesis models were derived from experimental measurements and built both on the work of Repp and of Neda. Researchers here in the Centre for Digital Music’s Audio Engineering research team are trying to build on their work, creating a synthesis system that could incorporate cheering and other aspects of an appreciative crowd. More on that soon, hopefully.

“I think that’s just how the world will come to an end: to general applause from wits who believe it’s a joke.”
― Søren Kierkegaard, Either/Or, Part I

And for those who might be interested, here’s a short bibliography of applause and hand-clapping references:

1. Adami, A., Disch, S., Steba, G., & Herre, J. ‘Assessing Applause Density Perception Using Synthesized Layered Applause Signals,’ 19th International Conference on Digital Audio Effects (DAFx-16), Brno, Czech Republic, 2016
2. Adami, A.; Brand, L.; Herre, J., ‘Investigations Towards Plausible Blind Upmixing of Applause Signals,’ 142nd AES Convention, May 2017
3. W. Ahmad, AM Kondoz, Analysis and Synthesis of Hand Clapping Sounds Based on Adaptive Dictionary. ICMC, 2011
4. K. Hanahara, Y. Tada, and T. Muroi, “Human-robot communication by means of hand-clapping (preliminary experiment with hand-clapping language),” IEEE Int. Conf. on Systems, Man and Cybernetics (ISIC-2007), Oct 2007, pp. 2995–3000.
5. Farner, Snorre; Solvang, Audun; Sæbo, Asbjørn; Svensson, U. Peter ‘Ensemble Hand-Clapping Experiments under the Influence of Delay and Various Acoustic Environments’, Journal of the Audio Engineering Society, Volume 57 Issue 12 pp. 1028-1041; December 2009
6. A. Jylhä and C. Erkut, “Inferring the Hand Configuration from Hand Clapping Sounds,” 11th International Conference on Digital Audio Effects (DAFx-08), Espoo, Finland, 2008.
7. Jylhä, Antti; Erkut, Cumhur; Simsekli, Umut; Cemgil, A. Taylan ‘Sonic Handprints: Person Identification with Hand Clapping Sounds by a Model-Based Method’, AES 45th Conference, March 2012
8. Kawahara, Kazuhiko; Kamamoto, Yutaka; Omoto, Akira; Moriya, Takehiro ‘Evaluation of the Low-Delay Coding of Applause and Hand-Clapping Sounds Caused by Music Appreciation’ 138th AES Convention, May 2015.
9. Kawahara, Kazuhiko; Fujimori, Akiho; Kamamoto, Yutaka; Omoto, Akira; Moriya, Takehiro ‘Implementation and Demonstration of Applause and Hand-Clapping Feedback System for Live Viewing,’ 141st AES Convention, September 2016.
10. Laitinen, Mikko-Ville; Kuech, Fabian; Disch, Sascha; Pulkki, Ville ‘Reproducing Applause-Type Signals with Directional Audio Coding,’ Journal of the Audio Engineering Society, Volume 59 Issue 1/2 pp. 29-43; January 2011
11. Z. Néda, E. Ravasz, T. Vicsek, Y. Brechet, and A.-L. Barabási, “Physics of the rhythmic applause,” Phys. Rev. E, vol. 61, no. 6, pp. 6987–6992, 2000.
12. Z. Néda, E. Ravasz, Y. Brechet, T. Vicsek, and A.-L. Barabási, “The sound of many hands clapping: Tumultuous applause can transform itself into waves of synchronized clapping,” Nature, vol. 403, pp. 849–850, 2000.
13. Z. Néda, A. Nikitin, and T. Vicsek. ‘Synchronization of two-mode stochastic oscillators: a new model for rhythmic applause and much more,’ Physica A: Statistical Mechanics and its Applications, 321:238–247, 2003.
14. L. Peltola, C. Erkut, P. R. Cook, and V. Välimäki, “Synthesis of Hand Clapping Sounds,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 3, pp. 1021–1029, 2007.
15. B. H. Repp. ‘The sound of two hands clapping: an exploratory study,’ J. of the Acoustical Society of America, 81:1100–1109, April 1987.
16. P. Seetharaman, S. P. Tarzia, ‘The Hand Clap as an Impulse Source for Measuring Room Acoustics,’ 132nd AES Convention, April 2012.
17. Uhle, C. ‘Applause Sound Detection’, Journal of the Audio Engineering Society, Volume 59 Issue 4 pp. 213-224, April 2011

Cool stuff at the Audio Engineering Society Convention in Berlin

The next Audio Engineering Society convention is just around the corner, May 20-23 in Berlin. This is an event where we always have a big presence. After all, this blog is brought to you by the Audio Engineering research team within the Centre for Digital Music, so it’s a natural fit for a lot of what we do.

These conventions are quite big, with thousands of attendees, but not so big that you get lost or overwhelmed. The attendees fit loosely into five categories: the companies, the professionals and practitioners, students, enthusiasts, and the researchers. That last category is where we fit.

I thought I’d give you an idea of some of the highlights of the Convention. These are some of the events that we will be involved in or just attending, but of course, there’s plenty else going on.

On Saturday May 20th, 9:30-12:30, Dave Ronan from the team here will be presenting a poster on ‘Analysis of the Subgrouping Practices of Professional Mix Engineers.’ Subgrouping is a greatly understudied, but important part of the mixing process. Dave surveyed 10 award winning mix engineers to find out how and why they do subgrouping. He then subjected the results to detailed thematic analysis to uncover best practices and insights into the topic.

From 2:45 to 4:15 pm there is a workshop on ‘Perception of Temporal Response and Resolution in Time Domain.’ Last year we published an article in the Journal of the Audio Engineering Society on ‘A meta-analysis of high resolution audio perceptual evaluation.’ There’s a blog entry about it too. The research showed very strong evidence that people can hear a difference between high resolution audio and standard, CD quality audio. But this brings up the question of why. Many people have suggested that the fine temporal resolution of oversampled audio might be perceived. I expect that this Workshop will shed some light on this as yet unresolved question.

Overlapping that workshop, there are some interesting posters from 3 to 6 pm. ‘Mathematical Model of the Acoustic Signal Generated by the Combustion Engine’ is about synthesis of engine sounds, specifically for electric motorbikes. We are doing a lot of sound synthesis research here, and so are always on the lookout for new approaches and new models. ‘A Study on Audio Signal Processed by “Instant Mastering” Services’ investigates the effects applied to ten songs by various online, automatic mastering platforms. One of those platforms, LandR, was a high tech spin-out from our research a few years ago, so we’ll be very interested in what they found.

For those willing to get up bright and early Sunday morning, there’s a 9 am panel on ‘Audio Education—What Does the Future Hold,’ where I will be one of the panellists. It should have some pretty lively discussion.

Then there are some interesting posters from 9:30 to 12:30. We’ve done a lot of work on new interfaces for audio mixing, so will be quite interested in ‘The Mixing Glove and Leap Motion Controller: Exploratory Research and Development of Gesture Controllers for Audio Mixing.’ And returning to the subject of high resolution audio, there is ‘Discussion on Subjective Characteristics of High Resolution Audio,’ by Mitsunori Mizumachi. Mitsunori was kind enough to give me details about his data and experiments in hi-res audio, which I then used in the meta-analysis paper. He’ll also be looking at what factors affect high resolution audio perception.

From 10:45 to 12:15, our own Brecht De Man will be chairing and speaking in a Workshop on ‘New Developments in Listening Test Design.’ He’s quite a leader in this field, and has developed some great software that makes the set up, running and analysis of listening tests much simpler and still rigorous.

From 1 to 2 pm, there is the meeting of the Technical Committee on High Resolution Audio, of which I am co-chair along with Vicki Melchior. The Technical Committee aims for comprehensive understanding of high resolution audio technology in all its aspects. The meeting is open to all, so for those at the Convention, feel free to stop by.

Sunday evening at 6:30 is the Heyser lecture. This is quite prestigious, a big talk by one of the eminent people in the field. This one is given by Jörg Sennheiser of, well, Sennheiser Electronic.

Monday morning 10:45-12:15, there’s a tutorial on ‘Developing Novel Audio Algorithms and Plugins – Moving Quickly from Ideas to Real-time Prototypes,’ given by MathWorks, the company behind MATLAB. They have a great new toolbox for audio plugin development, which should make life a bit simpler for all those students and researchers who know MATLAB well and want to demo their work in an audio workstation.

Again in the mixing interface department, we look forward to hearing about ‘Formal Usability Evaluation of Audio Track Widget Graphical Representation for Two-Dimensional Stage Audio Mixing Interface’ on Tuesday, 11-11:30. The authors gave us a taste of this work at the Workshop on Intelligent Music Production which our group hosted last September.

In the same session (which is all about ‘Recording and Live Sound’, so very close to home) a new approach to acoustic feedback suppression is discussed in ‘Using a Speech Codec to Suppress Howling in Public Address Systems’, 12-12:30. With several past projects on gain optimization for live sound, we are curious to hear (or not hear) the results!

The full program can be explored on the AES Convention planner or the Convention website. Come say hi to us if you’re there!



Sound Synthesis of an Aeolian Harp


Synthesising the Aeolian harp is part of a project into synthesising sounds that fall into a class called aeroacoustics. The synthesis model operates in real-time and is based on the physics that generate the sounds in nature. 

The Aeolian harp is an instrument that is played by the wind. It is believed to date back to ancient Greece, and legend states that King David hung a harp in a tree to hear it being played by the wind. Aeolian harps became popular in Europe in the Romantic period, and they can be designed as garden ornaments, parts of sculptures or large-scale sound installations.

The sound created by an Aeolian harp has often been described as meditative and inspiring. A poem by Ralph Waldo Emerson describes it as follows:
Keep your lips or finger-tips
For flute or spinet’s dancing chips; 
I await a tenderer touch
I ask more or not so much:

Give me to the atmosphere.


The harp in the picture is taken from Professor Henry Gurr’s website. This has an excellent review of the principles behind design and operation of Aeolian harps. 
Basic Principles

As air flows past a cylinder, vortices are shed at a frequency that is proportional to the speed of the air and inversely proportional to the cylinder diameter. This has been discussed in the previous blog entry on Aeolian tones. We now think of the cylinder as a string, like that of a harp, guitar, violin, etc. When a string of one of these instruments is plucked it vibrates at its natural frequency. The natural frequency depends on the tension, length and mass of the string: it rises with tension and falls as the string gets longer or heavier.
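As a rough sketch of the two frequencies involved, the snippet below uses the standard Strouhal relation for vortex shedding and the textbook formula for an ideal string fixed at both ends. The Strouhal number of 0.2 is an assumed typical value for a circular cylinder, and the function names and example values are made up for illustration. They are not taken from our model.

```python
import math

STROUHAL_NUMBER = 0.2  # assumed typical value for a circular cylinder


def vortex_shedding_frequency(airspeed, diameter, strouhal=STROUHAL_NUMBER):
    """Shedding frequency in Hz, for airspeed in m/s and diameter in m."""
    return strouhal * airspeed / diameter


def string_harmonic_frequency(n, tension, length, mass):
    """Frequency in Hz of the n-th harmonic of an ideal string.

    tension is in newtons, length in metres, mass (of the whole string) in kg.
    """
    mass_per_length = mass / length
    return (n / (2.0 * length)) * math.sqrt(tension / mass_per_length)


if __name__ == "__main__":
    # Hypothetical example: a 1 mm string in a 5 m/s breeze.
    print(vortex_shedding_frequency(airspeed=5.0, diameter=0.001))
    # Hypothetical example: 1 m string, 1 gram, 100 N of tension.
    print(string_harmonic_frequency(1, tension=100.0, length=1.0, mass=0.001))
```

Doubling the airspeed doubles the shedding frequency, while doubling the string diameter halves it.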

Instead of a pluck or a bow exciting a string, in an Aeolian harp it is the vortex shedding that stimulates the strings. When the frequency of the vortex shedding is in the region of the natural vibration frequency of the string, or one of its harmonics, a phenomenon known as lock-in occurs. While in lock-in the string starts to vibrate at the relevant harmonic frequency. For a range of airspeeds the string vibration is the dominant factor that dictates the frequency of the vortex shedding; changing the air speed does not change the frequency of vortex shedding, hence the process is locked-in. 

While in lock-in an FM-type acoustic output is generated, giving the harp its unique sound, described by the poet Samuel Taylor Coleridge as a “soft floating witchery of sound”.
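The lock-in idea can be sketched as a simple check: find the string harmonic nearest to the shedding frequency, and if the two are close enough, let the shedding frequency snap to that harmonic. The 10% tolerance band, the harmonic limit and the function names are hypothetical placeholders. The real lock-in range depends on the physics and is not specified here.

```python
def lock_in_harmonic(shed_freq, fundamental, max_harmonic=20, tolerance=0.1):
    """Return the harmonic number the shedding locks to, or None if unlocked.

    tolerance is an assumed relative bandwidth around each harmonic.
    """
    # Nearest harmonic number, clamped to the range 1..max_harmonic.
    n = max(1, min(max_harmonic, round(shed_freq / fundamental)))
    harmonic = n * fundamental
    if abs(shed_freq - harmonic) <= tolerance * harmonic:
        return n
    return None


def effective_shedding_frequency(shed_freq, fundamental, **kwargs):
    """Shedding frequency after lock-in: snaps to the harmonic when locked."""
    n = lock_in_harmonic(shed_freq, fundamental, **kwargs)
    return shed_freq if n is None else n * fundamental


if __name__ == "__main__":
    for freq in (290.0, 310.0, 340.0):
        print(freq, "->", effective_shedding_frequency(freq, fundamental=100.0))
```

Within the band, small changes in airspeed leave the output frequency unchanged, which is the behaviour described as being locked in.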
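For readers unfamiliar with FM, here is a generic frequency modulation sketch: a carrier sine wave whose phase is wobbled by a slower modulating sine. The carrier, modulator and modulation index values are arbitrary assumptions. The frequencies our model actually uses during lock-in are not reproduced here.

```python
import math


def fm_samples(carrier_hz, modulator_hz, index, seconds=1.0, sample_rate=44100):
    """Return a list of samples of sin(2*pi*fc*t + index*sin(2*pi*fm*t))."""
    n_samples = round(seconds * sample_rate)
    samples = []
    for i in range(n_samples):
        t = i / sample_rate
        phase = 2.0 * math.pi * carrier_hz * t
        wobble = index * math.sin(2.0 * math.pi * modulator_hz * t)
        samples.append(math.sin(phase + wobble))
    return samples


if __name__ == "__main__":
    # Hypothetical values: 440 Hz carrier, 5 Hz modulator, index of 2.
    tone = fm_samples(440.0, 5.0, 2.0, seconds=0.5)
    print(len(tone), min(tone), max(tone))
```

Setting the index to zero removes the modulation and leaves a plain sine tone at the carrier frequency.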
Our Model 

As with the Aeolian tone model, we calculate the frequency of vortex shedding for given string dimensions and airspeed. We also calculate the fundamental natural vibrational frequency and harmonics of a string given its properties. 

There is a specific range of airspeeds over which the string vibration and vortex shedding lock in. This range is calculated, and the specific frequencies for the FM acoustic signal are generated. There is also a hysteresis effect on the vibration amplitude, depending on whether the airspeed is increasing or decreasing, and this is implemented too. 

A user interface is provided that allows a user to select up to 13 strings, adjusting their length, diameter, tension, mass and the amount of damping (which reduces the vibration effects as the harmonic number increases). This interface is shown below, and includes presets of a number of different string and wind configurations. 

A copy of the Pure Data patch can be downloaded here. The video below was made to give an overview of the principles, sounds generated and variety of Aeolian harp constructions.