The AES Semantic Audio Conference

Last week saw the 2017 International Conference on Semantic Audio by the Audio Engineering Society. Held at the Fraunhofer Institute for Integrated Circuits in Erlangen, Germany, the conference offered delegates a well-organised and high-quality programme, interleaved with social and networking events such as a jazz concert and a visit to Erlangen’s famous beer cellars. It was a combined effort of Fraunhofer IIS, Friedrich-Alexander Universität, and their joint venture Audio Labs.

As the topic is of great relevance to our team, Brecht De Man and Adán Benito attended and presented their work there. With 5 papers and a late-breaking demo, the Centre for Digital Music in general was the most strongly represented institution, surpassing even the hosting organisations.


Benito’s intelligent multitrack reverberation architecture

Adán Benito presented “Intelligent Multitrack Reverberation Based on Hinge-Loss Markov Random Fields”, a machine learning approach to automatic application of a reverb effect to musical audio.

Brecht De Man demoed the “Mix Evaluation Browser”, an online interface to access a dataset comprising several mixes of a number of songs, complete with corresponding DAW files, raw tracks, preference ratings, and annotated comments from subjective listening tests.


The Mix Evaluation Browser: an interface to visualise De Man’s dataset of raw tracks, mixes, and subjective evaluation results.

Also from the Centre for Digital Music, Delia Fano Yela delivered a beautifully hand-drawn and compelling presentation about source separation in general and how temporal context can be employed to considerably improve vocal extraction.

Rodrigo Schramm and Emmanouil Benetos won the Best Paper award for their paper “Automatic Transcription of a Cappella recordings from Multiple Singers”.

Emmanouil further presented another paper, “Polyphonic Note and Instrument Tracking Using Linear Dynamical Systems”, and coauthored “Assessing the Relevance of Onset Information for Note Tracking in Piano Music Transcription”.

 

Several other delegates were frequent collaborators or previously affiliated with Queen Mary. The opening keynote was delivered by Mark Plumbley, former director of the Centre for Digital Music, who gave an overview of the field of machine listening, specifically audio event detection and scene recognition. Nick Jillings, formerly research assistant and master project student at the Audio Engineering group, and currently a PhD student at Birmingham City University cosupervised by Josh Reiss, head of our Audio Engineering group, presented his paper “Investigating Music Production Using a Semantically Powered Digital Audio Workstation in the Browser” and demoed “Automatic channel routing using musical instrument linked data”.

Other keynotes were delivered by Udo Zölzer, best known for editing the collection “DAFX: Digital Audio Effects”, and Masataka Goto, a household name in the MIR community, who discussed his own web-based implementations of music discovery and visualisation.

Paper proceedings are already available in the AES E-library, free for AES members.


Applause, applause! (thank you, thank you. You’re too kind)

“You must be prepared to work always without applause.”
― Ernest Hemingway, By-line

In a recent blog entry, we discussed research into the sound of screams. It’s one of those everyday sounds that we are particularly attuned to, but that there hasn’t been much research on. This got me thinking about other under-researched sounds. Applause certainly fits. We all know it when we hear it, and a quick search of famous quotes reveals that there are many ways to describe the many types of applause: thunderous applause, tumultuous applause, a smattering of applause, sarcastic applause, and of course, the dreaded slow hand clap. But from an auditory perspective, what makes it special?

Applause is nothing more than the sound of many people gathered in one place clapping their hands. Clapping your hands together is one of the simplest ways in which we can approximate an impulse, or short broadband sound, without the need for any equipment. Impulsive sounds are used for rhythm, for tagging important moments on a timeline, or for estimating the acoustic properties of a room. Clappers and clapsticks are musical instruments, typically consisting of two pieces of wood that are clapped together to produce percussive sounds. In film and television, clapperboards are in widespread use. The clapperboard produces a sharp clap noise that can be easily identified on the audio track, and the shutting of the clapstick at the top of the board can similarly be identified on the visual track. Thus, they are an effective way to synchronise sound and picture, as well as to designate the starts of scenes or takes during production. And in acoustic measurement, if one can produce an impulsive sound at a given location and record the result, one can get an idea of the reverberation that the room will apply to any sound produced from that location.

Figure: clapsticks

But a hand clap is a crude approximation of an impulse. Hand claps do not have completely flat frequency spectra, are not completely omnidirectional, have significant duration and are not very high energy. Seetharaman and colleagues investigated the effectiveness of hand claps as impulse sources. They found that, with a small amount of additional but automated signal processing, the claps can produce reliable acoustical measurements.

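To make the measurement idea concrete, the sketch below shows one standard way to turn a recorded impulse-like sound into a reverberation time: Schroeder backward integration of the squared response, followed by a straight-line fit to the decay in decibels. It is only an illustration and is not the processing used by Seetharaman and colleagues. The function names, the -5 dB to -25 dB fitting range and the synthetic test signal are all assumptions made for this example.

```python
import math
import random


def synthetic_ir(rt60, sample_rate, duration_s, seed=0):
    """Hypothetical test signal: noise whose level falls by 60 dB in rt60 seconds."""
    rng = random.Random(seed)
    k = math.log(1000.0) / rt60  # amplitude decay rate giving -60 dB at t = rt60
    n = int(round(duration_s * sample_rate))
    return [rng.uniform(-1.0, 1.0) * math.exp(-k * i / sample_rate) for i in range(n)]


def schroeder_decay_db(ir):
    """Backward-integrated energy decay curve in dB, normalised to 0 dB at the start."""
    edc = [0.0] * len(ir)
    total = 0.0
    for i in range(len(ir) - 1, -1, -1):
        total += ir[i] * ir[i]  # energy remaining from sample i to the end
        edc[i] = total
    if not edc or edc[0] <= 0.0:
        raise ValueError("response is empty or silent")
    ref = edc[0]
    return [10.0 * math.log10(e / ref) if e > 0.0 else -math.inf for e in edc]


def estimate_rt60(ir, sample_rate, hi_db=-5.0, lo_db=-25.0):
    """Least-squares line through the decay between hi_db and lo_db, extrapolated to -60 dB."""
    edc = schroeder_decay_db(ir)
    pts = [(i / sample_rate, d) for i, d in enumerate(edc) if lo_db <= d <= hi_db]
    if len(pts) < 2:
        raise ValueError("not enough decay range to fit a line")
    mean_t = sum(t for t, _ in pts) / len(pts)
    mean_d = sum(d for _, d in pts) / len(pts)
    slope = (sum((t - mean_t) * (d - mean_d) for t, d in pts)
             / sum((t - mean_t) ** 2 for t, _ in pts))  # dB per second, negative
    return -60.0 / slope


if __name__ == "__main__":
    ir = synthetic_ir(rt60=0.5, sample_rate=8000, duration_s=1.0)
    print(f"estimated RT60: {estimate_rt60(ir, 8000):.2f} s")
```

With a real recording, `ir` would be the samples of a single clap and its decay, trimmed so that it starts at the clap. Because a clap has an uneven spectrum and limited energy, estimates made this way may be less trustworthy than those from a purpose-built impulse source.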
Hanahara, Tada and Muroi exploited the impulse-like nature of hand claps to devise a means of human-robot communication. The hand claps and their timing are relatively easy for a robot to decode, and not that difficult for a human to encode. But why the authors completely dismissed Morse code and all other simple forms of binary encoding is beyond me. And as voice recognition and related technologies continue to advance, the need for hand clap-based communication diminishes.

So what does a single hand clap sound like? This whole field of applause and clapping studies originated with a well-cited 1987 study by Bruno Repp, “The sound of two hands clapping.” He distinguished 8 hand clap positions:
Hands parallel and flat
P1: palm-to-palm
P2: halfway between P1 and P3
P3: fingers-to-palm

Hands held at an angle
A1: palm-to-palm
A2: halfway between A1 and A3
A3: fingers-to-palm
A1+: A1 with hands very cupped
A1-: A1 with hands fully flat

The figure below shows photos of these eight configurations of hand claps, excerpted from Leevi Peltola’s 2004 MSc thesis.

Figure: the eight hand clap positions

Repp’s acoustic analyses and perceptual experiments mainly involved 20 test subjects who were each asked to clap at their normal rate for 10 seconds in a quiet room. The spectra of individual claps varied widely, but there was no evidence of influence of sex or hand size on the clap spectrum. He also measured his own clapping with the eight modes above. If the palms struck each other (P1, A1) there was a narrow frequency peak below 1 kHz together with a notch around 2.5 kHz. If the fingers of one hand struck the palm of the other hand (P3, A3) there was a broad spectral peak near 2 kHz.

Repp then tried to determine whether the subjects were able to extract information about the clapper from listening to the signal. Subjects generally assumed that slow, loud and low-pitched hand claps were from male clappers, and fast, soft and high-pitched hand claps were from female clappers. But this was not the case. The speed, intensity and pitch were uncorrelated with sex, and so test subjects could identify the clapper’s sex only slightly better than chance. Perceived differences were attributed mainly to hand configurations rather than hand size.

So much for individuals clapping, but what about applause? That’s when some interesting physics comes into play. Néda and colleagues recorded applause from several theatre and opera performances. They observed that the applause begins with incoherent random clapping, but then synchronization and periodic behaviour develop after a few seconds. This transition can be quite sudden and very strong, and is an unusual example of self-organization in a large coupled system. Néda gives quite a clear explanation of what is happening, and why.

Here’s a nice video of the phenomenon.
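For readers who want to play with the idea, a textbook model of this kind of sudden self-organization is the Kuramoto model of coupled oscillators. The sketch below treats each audience member as an oscillator with their own preferred clapping rate, nudged towards the crowd’s average phase. It is a minimal illustration under stated assumptions, not a reproduction of Néda’s analysis: the function names, the coupling strength, the 3 Hz mean rate and the spread of rates are all made up for the example.

```python
import cmath
import math
import random


def order_parameter(phases):
    """Return r in [0, 1]: near 0 for incoherent clapping, near 1 for clapping in unison."""
    return abs(sum(cmath.exp(1j * p) for p in phases) / len(phases))


def simulate_clappers(n=200, coupling=4.0, mean_rate_hz=3.0, spread=0.5,
                      steps=2000, dt=0.01, seed=0):
    """Euler-integrate the mean-field Kuramoto model and return r at every step."""
    rng = random.Random(seed)
    # Preferred clapping frequencies in rad/s, scattered around the mean rate.
    freqs = [rng.gauss(2.0 * math.pi * mean_rate_hz, spread) for _ in range(n)]
    # Everyone starts at a random point in their own clapping cycle.
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    history = []
    for _ in range(steps):
        mean_field = sum(cmath.exp(1j * p) for p in phases) / n
        r, psi = abs(mean_field), cmath.phase(mean_field)
        history.append(r)
        # Each clapper is pulled towards the crowd's average phase psi,
        # more strongly the more coherent (larger r) the crowd already is.
        phases = [p + dt * (w + coupling * r * math.sin(psi - p))
                  for p, w in zip(phases, freqs)]
    return history


if __name__ == "__main__":
    for k in (0.0, 4.0):
        r = simulate_clappers(coupling=k)
        print(f"coupling={k}: r starts at {r[0]:.2f}, ends at {r[-1]:.2f}")
```

With no coupling the coherence `r` stays low, which corresponds to ordinary incoherent applause. With strong enough coupling relative to the spread of preferred rates, `r` climbs towards 1, which is the model’s version of the audience falling into step.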

The fact that sonic aspects of hand claps can differ so significantly, and can often be identified by listeners, suggests that it may be possible to tell a lot about the source by signal analysis. Such was the case in work by Jylhä and colleagues, who proposed methods to identify a person by their hand claps, or to identify the configuration (à la Repp’s study) of the hand clap. Christian Uhle looked at the more general question of identifying applause in an audio stream.

Understanding of applause, beyond the synchronization phenomenon observed by Néda, is quite useful for encoding applause signals, which so often accompany musical recordings, especially those recordings that are considered worth redistributing! And the important spatial and temporal aspects of applause signals are known to make them particularly tricky signals to encode and decode. As noted in research by Adami and colleagues, the more standard perceptual features like pitch or loudness do not do a good job of characterising grainy sound textures like applause. They introduced a new feature, applause density, which is loosely related to the overall clapping rate, but derived from perceptual experiments. Just a month before this blog entry, Adami and co-authors published a follow-up paper which used density and other characteristics to investigate the realism of upmixed (mono to stereo) applause signals. In fact, talking with one of the co-authors was a motivation for me to write this entry.

Upmixing is an important problem in its own right. But the placement and processing of sounds for a stereo or multichannel environment can be considered part of the general problem of sound synthesis. Synthesis of clapping and applause sounds was covered in detail, and to great effect, by Peltola and co-authors. They presented physics-based analysis, synthesis, and control systems capable of both producing individual hand claps and mimicking the applause of a group of clappers. The synthesis models were derived from experimental measurements and built on the work of both Repp and Néda. Researchers here in the Centre for Digital Music’s Audio Engineering research team are trying to build on their work, creating a synthesis system that could incorporate cheering and other aspects of an appreciative crowd. More on that soon, hopefully.
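As a rough feel for what such a synthesiser involves, here is a deliberately simple sketch: each clap is a short burst of decaying noise passed through a single two-pole resonance, and a crowd is just the sum of several clappers with slightly irregular clapping rates. This is not Peltola’s model. The function names and every parameter value (resonance range, bandwidth, decay time, clapping rates, jitter) are illustrative assumptions, chosen only loosely with the spectral peaks reported by Repp in mind.

```python
import math
import random


def synth_clap(sample_rate=16000, centre_hz=1000.0, bandwidth_hz=400.0,
               decay_s=0.005, length_s=0.05, rng=None):
    """One clap: exponentially decaying noise through a two-pole resonator, peak-normalised."""
    if rng is None:
        rng = random.Random(0)
    pole_radius = math.exp(-math.pi * bandwidth_hz / sample_rate)
    a1 = -2.0 * pole_radius * math.cos(2.0 * math.pi * centre_hz / sample_rate)
    a2 = pole_radius * pole_radius
    y1 = y2 = 0.0
    out = []
    for n in range(int(length_s * sample_rate)):
        excitation = rng.uniform(-1.0, 1.0) * math.exp(-n / (decay_s * sample_rate))
        y = excitation - a1 * y1 - a2 * y2  # resonator difference equation
        out.append(y)
        y2, y1 = y1, y
    peak = max(abs(v) for v in out)
    return [v / peak for v in out]


def synth_applause(n_clappers=20, duration_s=3.0, sample_rate=16000, seed=0):
    """Sum several clappers, each with their own resonance, rate and timing jitter."""
    rng = random.Random(seed)
    total = int(duration_s * sample_rate)
    mix = [0.0] * total
    for _ in range(n_clappers):
        rate_hz = rng.uniform(2.5, 4.5)  # assumed range of individual clapping rates
        clap = synth_clap(sample_rate, centre_hz=rng.uniform(800.0, 2500.0), rng=rng)
        t = rng.uniform(0.0, 1.0 / rate_hz)  # random starting offset
        while int(t * sample_rate) < total:
            start = int(t * sample_rate)
            for i, v in enumerate(clap[:total - start]):
                mix[start + i] += v
            t += rng.uniform(0.9, 1.1) / rate_hz  # about 10% timing jitter
    peak = max(abs(v) for v in mix)
    return [v / peak for v in mix]


if __name__ == "__main__":
    applause = synth_applause()
    print(f"{len(applause)} samples of synthetic applause")
```

The output is a plain list of floats in the range -1 to 1, which can be written to a WAV file with any audio library to hear how far this naive version is from the real thing.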

“I think that’s just how the world will come to an end: to general applause from wits who believe it’s a joke.”
― Søren Kierkegaard, Either/Or, Part I

And for those who might be interested, here’s a short bibliography of applause and hand-clapping references:

1. Adami, A., Disch, S., Steba, G., & Herre, J. ‘Assessing Applause Density Perception Using Synthesized Layered Applause Signals,’ 19th International Conference on Digital Audio Effects (DAFx-16), Brno, Czech Republic, 2016
2. Adami, A.; Brand, L.; Herre, J., ‘Investigations Towards Plausible Blind Upmixing of Applause Signals,’ 142nd AES Convention, May 2017
3. W. Ahmad, A. M. Kondoz, ‘Analysis and Synthesis of Hand Clapping Sounds Based on Adaptive Dictionary,’ ICMC, 2011
4. K. Hanahara, Y. Tada, and T. Muroi, “Human-robot communication by means of hand-clapping (preliminary experiment with hand-clapping language),” IEEE Int. Conf. on Systems, Man and Cybernetics (ISIC-2007), Oct 2007, pp. 2995–3000.
5. Farner, Snorre; Solvang, Audun; Sæbo, Asbjørn; Svensson, U. Peter ‘Ensemble Hand-Clapping Experiments under the Influence of Delay and Various Acoustic Environments’, Journal of the Audio Engineering Society, Volume 57 Issue 12 pp. 1028-1041; December 2009
6. A. Jylhä and C. Erkut, “Inferring the Hand Configuration from Hand Clapping Sounds,” 11th International Conference on Digital Audio Effects (DAFx-08), Espoo, Finland, 2008.
7. Jylhä, Antti; Erkut, Cumhur; Simsekli, Umut; Cemgil, A. Taylan ‘Sonic Handprints: Person Identification with Hand Clapping Sounds by a Model-Based Method’, AES 45th Conference, March 2012
8. Kawahara, Kazuhiko; Kamamoto, Yutaka; Omoto, Akira; Moriya, Takehiro ‘Evaluation of the Low-Delay Coding of Applause and Hand-Clapping Sounds Caused by Music Appreciation’ 138th AES Convention, May 2015.
9. Kawahara, Kazuhiko; Fujimori, Akiho; Kamamoto, Yutaka; Omoto, Akira; Moriya, Takehiro ‘Implementation and Demonstration of Applause and Hand-Clapping Feedback System for Live Viewing,’ 141st AES Convention, September 2016.
10. Laitinen, Mikko-Ville; Kuech, Fabian; Disch, Sascha; Pulkki, Ville ‘Reproducing Applause-Type Signals with Directional Audio Coding,’ Journal of the Audio Engineering Society, Volume 59 Issue 1/2 pp. 29-43; January 2011
11. Z. Néda, E. Ravasz, T. Vicsek, Y. Brechet, and A.-L. Barabási, “Physics of the rhythmic applause,” Phys. Rev. E, vol. 61, no. 6, pp. 6987–6992, 2000.
12. Z. Néda, E. Ravasz, Y. Brechet, T. Vicsek, and A.-L. Barabási, “The sound of many hands clapping: Tumultuous applause can transform itself into waves of synchronized clapping,” Nature, vol. 403, pp. 849–850, 2000.
13. Z. Néda, A. Nikitin, and T. Vicsek. ‘Synchronization of two-mode stochastic oscillators: a new model for rhythmic applause and much more,’ Physica A: Statistical Mechanics and its Applications, 321:238–247, 2003.
14. L. Peltola, C. Erkut, P. R. Cook, and V. Välimäki, “Synthesis of Hand Clapping Sounds,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 3, pp. 1021–1029, 2007.
15. B. H. Repp. ‘The sound of two hands clapping: an exploratory study,’ J. of the Acoustical Society of America, 81:1100–1109, April 1987.
16. P. Seetharaman, S. P. Tarzia, ‘The Hand Clap as an Impulse Source for Measuring Room Acoustics,’ 132nd AES Convention, April 2012.
17. Uhle, C. ‘Applause Sound Detection,’ Journal of the Audio Engineering Society, Volume 59 Issue 4 pp. 213-224, April 2011