The future of headphones


Headphones have been around for over a hundred years, but recently there has been a surge in new technologies, spurred on in part by the explosive popularity of Beats headphones. In this blog, we will look at three advances in headphones arising from high tech start-ups. I’ve been introduced to each of these companies recently, but don’t have any affiliation with them.

EAVE (formerly Eartex) are a London-based company who have developed headphones aimed at industrial workplaces: construction sites, the maritime industry and so on. Typical ear defenders do a good job of blocking out noise, but make communication extremely difficult. EAVE's headphones are designed to protect against excessive noise while still allowing effective communication with others. One of the founders, David Greenberg, has a background in auditory neuroscience, focusing on hearing disorders, and he brought that knowledge of hearing aids to the design of headphones that amplify speech while attenuating noise sources. They are designed for use in existing communication networks, and use beamforming microphones to focus pickup on the speaker's voice. They also have sensors to monitor noise levels, so that noise maps can be created and personal noise exposure data can be gathered.
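
For intuition, here is a minimal sketch of the delay-and-sum idea behind beamforming microphone arrays. It is an illustration only, not EAVE's implementation; the function, its parameters and the processing choices are all illustrative.

```python
import numpy as np

def delay_and_sum(mic_signals, mic_positions, look_direction, fs, c=343.0):
    """Steer a microphone array towards look_direction by delaying and summing.

    mic_signals:    (num_mics, num_samples) time-domain signals
    mic_positions:  (num_mics, 3) microphone coordinates in metres
    look_direction: unit vector pointing from the array towards the talker
    """
    num_mics, num_samples = mic_signals.shape
    # Relative arrival-time differences of a far-field plane wave at each microphone
    delays = mic_positions @ look_direction / c
    delays -= delays.min()                      # make all compensating delays non-negative
    freqs = np.fft.rfftfreq(num_samples, 1 / fs)
    out = np.zeros(num_samples)
    for signal, tau in zip(mic_signals, delays):
        # Fractional delay applied in the frequency domain (circular shift; fine for a sketch)
        spectrum = np.fft.rfft(signal) * np.exp(-2j * np.pi * freqs * tau)
        out += np.fft.irfft(spectrum, n=num_samples)
    return out / num_mics   # signals from look_direction add coherently, noise much less so
```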

This use of additional sensors in the headset opens up lots of opportunities. Ossic are a company that emerged from Abbey Road Red, the start-up incubator established by the legendary Abbey Road Studios. Their headphone is packed with sensors, measuring the shape of your ears, head and torso. This allows them to estimate your own head-related transfer function, or HRTF, which describes how sounds are filtered as they travel from a source to your ear canal. They can then apply this filtering to the headphone output, allowing sounds to be placed far more accurately around you. Without HRTF filtering, sources always appear to be coming from inside your head.

It's not as simple as that, of course. For instance, when you move your head, you can still identify the direction of arrival of different sound sources, so the Ossic headphones also incorporate head tracking. And while a well-measured HRTF is essential for accurate localization, calibration to the ear is never perfect, so their headphones also have eight drivers rather than the usual two, allowing more careful positioning of sounds over a wide range of frequencies.
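
At its core, binaural rendering with an HRTF amounts to convolving a mono source with the pair of head-related impulse responses (HRIRs) measured for the desired direction. Here is a minimal sketch of that step; the HRIR loading is hypothetical, and a real renderer would interpolate between measured directions and update them with head tracking.

```python
import numpy as np
from scipy.signal import fftconvolve

def render_binaural(mono, hrir_left, hrir_right):
    """Place a mono source at the direction associated with the given HRIR pair."""
    left = fftconvolve(mono, hrir_left)       # filter the source as the left ear would hear it
    right = fftconvolve(mono, hrir_right)     # and as the right ear would hear it
    n = max(len(left), len(right))
    stereo = np.zeros((n, 2))
    stereo[:len(left), 0] = left
    stereo[:len(right), 1] = right
    return stereo                             # play back over headphones

# Hypothetical usage: load_hrir() stands in for reading a measured HRIR set
# mono = ...                                  # dry source signal
# hrir_l, hrir_r = load_hrir(azimuth=30, elevation=0)
# out = render_binaural(mono, hrir_l, hrir_r)
```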

Ossic was funded by a Kickstarter campaign. Another headphone start-up, Ora, currently has a Kickstarter campaign of its own. Ora is a venture founded at TandemLaunch, who create companies that often arise from academic research, and who have previously invested in research from the audio engineering research team behind this blog.

Ora aim to release 'the world's first graphene headphones.' Graphene is a form of carbon arranged in a one-atom-thick lattice of hexagons. In 2004, Andre Geim and Konstantin Novoselov of the University of Manchester isolated the material, analysed its properties and showed how it could be easily fabricated, for which they won the Nobel Prize in Physics in 2010. Andre Geim, by the way, is a colourful character, and the only person to have won both the Nobel and Ig Nobel prizes, the latter awarded for experiments involving levitating frogs.

Graphene

Graphene has some amazing properties. It's around 200 times stronger than the strongest steel, efficiently conducts heat and electricity, and is nearly transparent. In 2013, Zhou and Zettl published early results on a graphene-based loudspeaker. In 2014, Dejan Todorovic and colleagues investigated the feasibility of graphene as a microphone membrane; their simulations suggested that it could have high sensitivity (the voltage generated in response to a pressure input) over a wide frequency range, far better than conventional microphones. Later that year, Peter Gaskell and others from McGill University performed physical and acoustical measurements of graphene oxide which confirmed Todorovic's simulation results. Interestingly, they seemed unaware of Todorovic's work.

Graphene loudspeaker, courtesy Zettl Research Group, Lawrence Berkeley National Laboratory and University of California at Berkeley

Ora's founders include some of the graphene microphone researchers from McGill University. Ora's headphone uses a graphene-based composite material optimized for use in acoustic transducers. One of the many benefits is the very wide frequency range, making it an appealing choice for high-resolution audio reproduction.

I should be clear. This blog is not meant as an endorsement of any of the mentioned companies. I haven’t tried their products. They are a sample of what is going on at the frontiers of headphone technology, but by no means cover the full range of exciting developments. Still, one thing is clear. High-end headphones in the near future will sound very different from the typical consumer headphones around today.

SMC Conference, Espoo, Finland

I have recently returned from the 14th Sound and Music Computing Conference, hosted by Aalto University in Espoo, Finland. All four days were full of variety and quality, ensuring there was something of interest for everyone. There were also live performances during an afternoon session and on two evenings, as well as the banquet on Hanasaari, a small island in Espoo. This provided a friendly framework for the delegates to interact, making new connections and renewing old ones.
The paper presentations were the main content of the programme, with presenters from all over the globe. Papers that stood out for me included Johnty Wang et al.'s Explorations with Digital Control of MIDI-enabled Pipe Organs, where I heard the movement of an unborn child control the audio output of a pipe organ. I also became aware of the Championship of Standstill, where participants are challenged to stand still while a number of musical pieces are played, in The Musical Influence on People's Micromotion when Standing Still in Groups.

 

Does Singing a Low-Pitch Tone Make You Look Angrier? Well, it looked like it in this interesting presentation! A social media music app was presented in Exploring Social Mobile Music with Tiny Touch-Screen Performances, where we can interact with others by layering five-second clips of sound to create a collaborative mix.
Analysis and synthesis were well represented, with a presentation on Virtual Analog Simulation and Extensions of Plate Reverberation by Silvan Willemsen et al. and The Effectiveness of Two Audiovisual Mappings to Control a Concatenative Synthesiser by Augoustinos Tsiros et al. The paper Virtual Analog Model of the Lockhart Wavefolder explained a method of modelling West Coast style analogue synthesisers.
Automatic mixing was also represented: Flavio Everard's paper Towards an Automated Multitrack Mixing Tool using Answer Set Programming cited at least eight papers from the Intelligent Audio Engineering group at C4DM.
In total, 65 papers were presented orally or in the poster sessions, with sessions on Music performance analysis and rendering, Music information retrieval, Spatial sound and sonification, Computer music languages and software, Analysis, synthesis and modification of sound, Social interaction, Computer-based music analysis and, lastly, Automatic systems and interactive performance. All papers are available at http://smc2017.aalto.fi/proceedings.html.


Having been treated to a wide variety of live music and technical papers, and having met colleagues from around the world, it was an added honour to be presented with one of the Best Paper Awards for our paper on a Real-Time Physical Model for Synthesis of Sword Sounds. The conference closed with a short presentation from the next host: SMC2018 in Cyprus!

Sónar Innovation Challenge 2017: the Enhanced DJ Assistant


The Audio Engineering team (C4DM) was present at this year's edition of Sónar+D in Barcelona. Sónar+D is an international conference, integrated into the Sónar festival, that focuses on the interdisciplinary space between creativity and technology.

The Sónar Innovation Challenge (SIC), co-organized by the MTG, 'is an online and on-site platform for the creative minds that want to be one step ahead and experiment with the future of technology. It brings together innovative tech companies and creators, collaborating to solve challenges that will lead to disruptive prototypes showcased in Sónar+D.'

In this year's challenge, Marco Martínez took part in the Enhanced DJ Assistant challenge, proposed by the Music Technology Group at Universitat Pompeu Fabra, which asked participants to create a user-friendly, visually appealing and musically motivated system that DJs can use to remix music collections in exciting new ways.


Thus, after nearly a month of online meetings, the challengers and mentors finally met at Sónar, and after four days of intensive brainstorming, programming and prototyping at more than 30°C, the team came up with ATOMIX:


Visualize, explore and manipulate atoms of sound from multitrack recordings, enhancing the creative possibilities for live artists and DJs.

Starting from multitrack recordings (stems), ATOMIX uses advanced algorithms and cutting-edge techniques in feature extraction, clustering, synthesis and visualisation. It segments a collection of stems into atoms of sound and groups them by timbre similarity. Then, through concatenative synthesis, it allows you to manipulate and exchange atoms of sound in real time with professional DAW-style controls, achieving a one-of-a-kind live music exploration.
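
To make the pipeline a little more concrete, here is a rough sketch of how stems might be segmented into atoms and grouped by timbre. It is not ATOMIX's actual code; it assumes librosa and scikit-learn are available, and the combination of onset segmentation, MFCC features and k-means is just one plausible choice.

```python
import numpy as np
import librosa
from sklearn.cluster import KMeans

def atoms_by_timbre(stem_path, n_clusters=8):
    """Segment a stem at onsets into 'atoms' and group the atoms by timbre similarity."""
    y, sr = librosa.load(stem_path, sr=None, mono=True)
    onsets = librosa.onset.onset_detect(y=y, sr=sr, units='samples', backtrack=True)
    bounds = np.concatenate([onsets, [len(y)]])
    atoms, features = [], []
    for start, end in zip(bounds[:-1], bounds[1:]):
        segment = y[start:end]
        if len(segment) < 1024:              # skip fragments too short to describe
            continue
        mfcc = librosa.feature.mfcc(y=segment, sr=sr, n_mfcc=13)
        atoms.append((start, end))
        features.append(mfcc.mean(axis=1))   # one timbre vector per atom
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(np.array(features))
    return atoms, labels   # atoms sharing a label can be swapped during concatenative resynthesis
```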

The project is still in a prototype stage and we hope to hear news of development very soon.

 

 

Acoustic Energy Harvesting

At the recent Audio Engineering Society Convention, one of the most interesting talks was in the E-Briefs sessions. These are usually short presentations dealing with late-breaking research results, work in progress or engineering reports. The work, presented by Charalampos Papadokos, was an e-brief titled 'Power Out of Thin Air: Harvesting of Acoustic Energy'.

Ambient energy sources are the sources all around us, like solar and kinetic energy. Energy harvesting is the capture and storage of ambient energy. It's not a new concept at all, and dates back to the windmill and the waterwheel. Ambient power has been collected from electromagnetic radiation since the invention of crystal radios by Sir Jagadish Chandra Bose, a true renaissance man who made important contributions to many fields. But nowadays, people are looking to harvest energy from many more possible sources, often for powering small devices like wearable electronics and wireless sensor networks. The big advantage, of course, is that energy harvesters do not consume resources like oil or coal, and energy harvesting might enable some devices to operate almost indefinitely.

But two of the main challenges are that many ambient energy sources are very low power, and that harvesting them can be difficult.

Typical power densities from energy harvesting can vary over orders of magnitude. Here are the power densities for various ambient sources, taken from the Open Access book chapter 'Electrostatic Conversion for Vibration Energy Harvesting' by S. Boisseau, G. Despesse and B. Ahmed Seddik.

Typical power densities of various ambient energy sources

You can see that vibration, which includes acoustic vibration, has about 1/100th the power density of solar, or even less. The numbers are arguable, but at first glance it looks like it will be exceedingly difficult to get any significant energy from acoustic sources unless one can harvest over a very large area.

That's where this e-brief paper comes in. Papadokos and his co-author, John Mourjopoulos, have a patented approach to harvesting the acoustic energy inside a loudspeaker enclosure. Others had considered harvesting the sound energy from loudspeakers before (see the work of Matsuda, for instance), but mainly just as a way of testing their harvesting approach, and not really exploiting the properties of loudspeakers. Papadokos and Mourjopoulos had the insight to realise that many loudspeakers are enclosed, and that the enclosure contains abundant acoustic energy that might be harvested without interfering with the external design or with the sound presented to the listener. In earlier work, Papadokos and Mourjopoulos found that the sound pressure level within a loudspeaker enclosure often exceeds 130 dB. Here, they simulated the effect of a piezoelectric plate inside the enclosure, converting acoustic energy to electrical energy. Results showed that it might be possible to generate 2.6 volts under regular operating conditions, thus proving the concept of harvesting acoustic energy from loudspeaker enclosures, at least in simulation.
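
To get a feel for the numbers, here is a back-of-the-envelope estimate of my own (not taken from the e-brief). It treats the enclosure sound field as if the free-field, plane-wave relation between pressure and intensity held, which is a crude assumption inside a sealed box, and the plate area is arbitrary.

```python
P_REF = 20e-6        # reference pressure for dB SPL, Pa
RHO_C = 415.0        # characteristic impedance of air, approx. rho * c, in kg/(m^2 s)

spl_db = 130.0                           # sound pressure level inside the enclosure
p_rms = P_REF * 10 ** (spl_db / 20)      # about 63 Pa
intensity = p_rms ** 2 / RHO_C           # about 9.6 W/m^2, plane-wave assumption
plate_area = 10e-4                       # hypothetical 10 cm^2 piezoelectric plate

print(f"p_rms ~ {p_rms:.0f} Pa, intensity ~ {intensity:.1f} W/m^2")
print(f"acoustic power incident on the plate ~ {intensity * plate_area * 1e3:.1f} mW")
# Piezoelectric conversion efficiency is low, so the harvested electrical power
# would only be a small fraction of this already modest figure.
```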

AES Berlin 2017: Keynotes from the technical program


The 142nd AES Convention was held last month in the creative heart of Berlin. The four-day program drew more than 2000 attendees and covered numerous workshops, tutorials, technical tours and special events, all related to the latest trends and developments in audio research. But as much as scale, it's attention to detail that makes AES special: there is as much emphasis on the research side of audio as on panels of experts discussing a range of provocative and practical topics.

It can be said that 3D Audio: Recording and Reproduction, Binaural Listening and Audio for VR were the most popular topics among the workshops, tutorials, papers and engineering briefs. However, a significant portion of the program was also devoted to core audio topics such as digital filter design, live audio, loudspeaker design, recording, audio encoding, microphones and music production techniques, to name just a few.

For this reason, here at the Audio Engineering research team within C4DM, we bring you what we believe were the highlights: the key talks and most relevant topics of the convention.

The future of mastering

What better way to start AES than with a workshop of mastering experts discussing the future of the field? Jonathan Wyner (iZotope) introduced us to the current challenges the discipline faces, relating to demographic, economic and target-format issues that are constantly evolving due to advances in the music technology industry and changes among its consumers.

When discussing the future of mastering, the panel was reluctant to embrace a fully automated future, but pointed out that the main challenge for assistive tools is to understand artistic intent and genre-based decisions without relying on the expert knowledge of the mastering engineer. They concluded that research efforts should go towards the development of an intelligent assistant, able to function as a smart preset that provides mastering engineers with a starting point.

Virtual analog modeling of dynamic range compression systems

This paper described a method to digitally model an analogue dynamic range compressor. Based on the analysis of processed and unprocessed audio waveforms, a generic model of dynamic range compression is proposed, and its parameters are derived using iterative optimization techniques.

Audio samples were played, and the quality of the audio produced by the digital model was convincingly demonstrated. However, it should be noted that the parameters of the digital compressor cannot be changed; this could be an interesting path for future work, as could the inclusion of other audio effects such as equalizers or delay lines.
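
The paper's own model isn't reproduced here, but the generic feed-forward compressor that such work typically starts from, a static gain curve defined by threshold and ratio and smoothed by attack and release time constants, looks roughly like the following sketch. The parameter values are placeholders.

```python
import numpy as np

def compress(x, fs, threshold_db=-20.0, ratio=4.0, attack_ms=5.0, release_ms=100.0):
    """Minimal feed-forward dynamic range compressor (static curve plus gain smoothing)."""
    eps = 1e-12
    level_db = 20 * np.log10(np.abs(x) + eps)
    # Static gain computer: attenuate anything above the threshold according to the ratio
    over_db = np.maximum(level_db - threshold_db, 0.0)
    gain_db = -over_db * (1.0 - 1.0 / ratio)
    # Smooth the gain with one-pole attack/release ballistics
    a_att = np.exp(-1.0 / (fs * attack_ms / 1000.0))
    a_rel = np.exp(-1.0 / (fs * release_ms / 1000.0))
    smoothed = np.zeros_like(gain_db)
    g = 0.0
    for n, target in enumerate(gain_db):
        coeff = a_att if target < g else a_rel   # gain falling -> attack, recovering -> release
        g = coeff * g + (1.0 - coeff) * target
        smoothed[n] = g
    return x * 10 ** (smoothed / 20.0)
```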

Evaluation of alternative audio mixing interfaces

In the paper 'Formal Usability Evaluation of Audio Track Widget Graphical Representation for Two-Dimensional Stage Audio Mixing Interface', an evaluation of different graphical track visualization styles is proposed. The multitrack visualizations included text only, different colour conventions for circles containing text or icons related to the type of instrument, circles with opacity mapped to audio features, and a traditional channel-strip mixing interface.

Efficiency was tested, and it was concluded that subjects preferred the instrument icons as well as the traditional mixing interface. Taking into account the many works and proposals on alternative mixing interfaces (2D and 3D), there is still a lot of scope to explore how to build an intuitive, efficient and simple interface capable of replacing the well-known channel strip.

Perceptually motivated filter design with application to loudspeaker-room equalization

This tutorial was based on the engineering brief 'Quantization Noise of Warped and Parallel Filters Using Floating Point Arithmetic', in which warped parallel filters, designed to match the frequency resolution of the human ear, are proposed.

Via Matlab, we explored various approaches for achieving this goal, including warped FIR and IIR, Kautz, and fixed-pole parallel filters. This provides a very useful toolset for applications such as room EQ, physical modelling synthesis and, perhaps, improving existing intelligent music production systems.
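
For those curious about the warping itself, the usual starting point is a first-order allpass substitution whose coefficient approximately maps the frequency axis onto the Bark scale. The sketch below uses the closed-form approximation commonly attributed to Smith and Abel; treat the constants as indicative rather than authoritative, and note that a full warped or Kautz filter design involves considerably more than this.

```python
import numpy as np

def bark_warping_coefficient(fs):
    """Allpass coefficient giving an approximately Bark-scaled frequency axis at sample rate fs."""
    return 1.0674 * np.sqrt((2.0 / np.pi) * np.arctan(0.06583 * fs / 1000.0)) - 0.1916

def warped_frequency(f_hz, fs):
    """Frequency mapping realised by substituting z^-1 with the allpass (z^-1 - lam)/(1 - lam*z^-1)."""
    lam = bark_warping_coefficient(fs)
    w = 2 * np.pi * f_hz / fs
    w_warped = w + 2 * np.arctan2(lam * np.sin(w), 1 - lam * np.cos(w))
    return w_warped * fs / (2 * np.pi)

print(bark_warping_coefficient(44100))     # roughly 0.756 at 44.1 kHz
print(warped_frequency(100.0, 44100))      # low frequencies get stretched apart on the warped axis
```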

Source Separation in Action: Demixing the Beatles at the Hollywood Bowl

Abbey Road's James Clarke presented a great poster on the algorithm that was used for the remixed, remastered and expanded version of The Beatles' album Live at the Hollywood Bowl. The method managed to isolate the crowd noise, separating everything that Paul McCartney, John Lennon, Ringo Starr and George Harrison played live in 1964 into clean tracks.

The results speak for themselves (audio comparison). Based on a Non-negative Matrix Factorization (NMF) algorithm, this work provides a great research tool for source separation and the reverse-engineering of mixes.
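
Clarke's exact algorithm isn't reproduced here, but the general NMF recipe for this kind of separation is to factorise a magnitude spectrogram into spectral templates and activations, assign templates to sources (crowd versus band), and resynthesise with soft masks. A minimal sketch, assuming librosa and scikit-learn:

```python
import numpy as np
import librosa
from sklearn.decomposition import NMF

def nmf_components(y, sr, n_components=16, n_fft=2048, hop=512):
    """Factorise |STFT| into templates W and activations H, then resynthesise each component."""
    S = librosa.stft(y, n_fft=n_fft, hop_length=hop)
    mag, phase = np.abs(S), np.angle(S)
    model = NMF(n_components=n_components, init='nndsvda',
                beta_loss='kullback-leibler', solver='mu', max_iter=400)
    W = model.fit_transform(mag)      # (frequency bins, components): spectral templates
    H = model.components_             # (components, frames): activations over time
    sources = []
    for k in range(n_components):
        # Wiener-like soft mask for component k, applied to the mixture and inverted
        mask = np.outer(W[:, k], H[k]) / (W @ H + 1e-9)
        sources.append(librosa.istft(mask * mag * np.exp(1j * phase), hop_length=hop))
    return sources   # group the components, by hand or by clustering, into 'crowd' and 'band'
```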

Other keynotes worth mentioning:

Close Miking Empirical Practice Verification: A Source Separation Approach

Analysis of the Subgrouping Practices of Professional Mix Engineers

New Developments in Listening Test Design

Data-Driven Granular Synthesis

A Study on Audio Signal Processed by “Instant Mastering” Services

The rest of the paper proceedings are available in the AES E-library.

The AES Semantic Audio Conference

Last week saw the 2017 AES International Conference on Semantic Audio, held at the Fraunhofer Institute for Integrated Circuits (IIS) in Erlangen, Germany. Delegates enjoyed a well-organised and high-quality programme, interleaved with social and networking events such as a jazz concert and a visit to Erlangen's famous beer cellars. The conference was a combined effort of Fraunhofer IIS, Friedrich-Alexander-Universität, and their joint venture, the AudioLabs.

As the topic is of great relevance to our team, Brecht De Man and Adán Benito attended and presented their work there. With 5 papers and a late-breaking demo, the Centre for Digital Music in general was the most strongly represented institution, surpassing even the hosting organisations.

Benito’s intelligent multitrack reverberation architecture

Adán Benito presented "Intelligent Multitrack Reverberation Based on Hinge-Loss Markov Random Fields", a machine learning approach to the automatic application of a reverb effect to musical audio.

Brecht De Man demoed the “Mix Evaluation Browser“, an online interface to access a dataset comprising several mixes of a number of songs, complete with corresponding DAW files, raw tracks, preference ratings, and annotated comments from subjective listening tests.

The Mix Evaluation Browser: an interface to visualise De Man’s dataset of raw tracks, mixes, and subjective evaluation results.

Also from the Centre for Digital Music, Delia Fano Yela delivered a beautifully hand-drawn and compelling presentation about source separation in general, and how temporal context can be employed to considerably improve vocal extraction.

Rodrigo Schramm and Emmanouil Benetos won the Best Paper award for their paper “Automatic Transcription of a Cappella recordings from Multiple Singers”.

Emmanouil further presented another paper, “Polyphonic Note and Instrument Tracking Using Linear Dynamical Systems”, and coauthored “Assessing the Relevance of Onset Information for Note Tracking in Piano Music Transcription”.

 

Several other delegates were frequent collaborators or previously affiliated with Queen Mary. The opening keynote was delivered by Mark Plumbley, former director of the Centre for Digital Music, who gave an overview of the field of machine listening, specifically audio event detection and scene recognition. Nick Jillings, formerly a research assistant and masters project student in the Audio Engineering group, and currently a PhD student at Birmingham City University co-supervised by Josh Reiss, head of our Audio Engineering group, presented his paper "Investigating Music Production Using a Semantically Powered Digital Audio Workstation in the Browser" and demoed "Automatic channel routing using musical instrument linked data".

Other keynotes were delivered by Udo Zölzer, best known for editing the collection "DAFX: Digital Audio Effects", and Masataka Goto, a household name in the MIR community, who discussed his own web-based implementations of music discovery and visualisation.

Paper proceedings are already available in the AES E-library, free for AES members.

Applause, applause! (thank you, thank you. You’re too kind)

“You must be prepared to work always without applause.”
―  Ernest Hemingway, By-line

In a recent blog entry, we discussed research into the sound of screams. It's one of those everyday sounds that we are particularly attuned to, but that hasn't received much research attention. This got me thinking about other under-researched sounds. Applause certainly fits. We all know it when we hear it, and a quick search of famous quotes reveals that there are many ways to describe the many types of applause: thunderous applause, tumultuous applause, a smattering of applause, sarcastic applause and, of course, the dreaded slow hand clap. But from an auditory perspective, what makes it special?

Applause is nothing more than the sound of many people gathered in one place clapping their hands. Clapping your hands together is one of the simplest ways to approximate an impulse, or short broadband sound, without the need for any equipment. Impulsive sounds are used for rhythm, for tagging important moments on a timeline, and for estimating the acoustic properties of a room. Clappers and clapsticks are musical instruments, typically consisting of two pieces of wood that are clapped together to produce percussive sounds. In film and television, clapperboards have widespread use: the clapperboard produces a sharp clap noise that can be easily identified on the audio track, and the shutting of the clapstick at the top of the board can similarly be identified on the visual track, so they are effectively used to synchronise sound and picture, as well as to mark the start of scenes or takes during production. And in acoustic measurement, if one can produce an impulsive sound at a given location and record the result, one can get an idea of the reverberation that the room will apply to any sound produced from that location.

Clapsticks
But a hand clap is a crude approximation of an impulse. Hand claps do not have completely flat spectra, are not completely omnidirectional, have significant duration and are not very high energy. Seetharaman and colleagues investigated the effectiveness of hand claps as impulse sources. They found that, with a small amount of additional but automated signal processing, claps can produce reliable acoustical measurements.
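
In the same spirit, though far simpler than their processing, here is a sketch of how a recorded clap can be turned into a reverberation-time estimate using Schroeder's backward integration. The clap recording is treated as if it were a true room impulse response, which is precisely the approximation under discussion.

```python
import numpy as np

def rt60_from_clap(clap, fs, fit_start_db=-5.0, fit_end_db=-25.0):
    """Estimate RT60 from a room recording of a hand clap, treated as an impulse response."""
    energy = clap.astype(float) ** 2
    # Schroeder backward integration gives the energy decay curve (EDC)
    edc = np.cumsum(energy[::-1])[::-1]
    edc_db = 10 * np.log10(edc / edc.max() + 1e-12)
    t = np.arange(len(clap)) / fs
    # Fit a line to the -5 dB .. -25 dB portion of the decay and extrapolate to -60 dB
    region = (edc_db <= fit_start_db) & (edc_db >= fit_end_db)
    slope_db_per_s, _ = np.polyfit(t[region], edc_db[region], 1)
    return -60.0 / slope_db_per_s

# Usage: record a clap in the room, trim the recording to start at the clap, then
# rt60 = rt60_from_clap(recording, fs)
```
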
Hanahara, Tada and Muroi exploited the impulse-like nature of hand claps for devising a means of Human-Robot Communication. The hand claps and their timing are relatively easy for a robot to decode, and not that difficult for a human to encode. But why the authors completely dismissed Morse code and all other simple forms of binary encoding is beyond me. And as voice recognition and related technologies continue to advance, the need for hand clap-based communication diminishes.
So what does a single hand clap sound like? This whole field of applause and clapping studies originated with a well-cited 1987 study by Bruno Repp, "The sound of two hands clapping." He distinguished eight hand clap positions:
Hands parallel and flat
P1: palm-to-palm
P2: halfway between P1 and P3
P3: fingers-to-palm

Hands held at an angle
A1: palm-to-palm
A2: halfway between A1 and A3
A3: fingers-to-palm
A1+: A1 with hands very cupped
A1-: A1 with hands fully flat

The figure below shows photos of these eight configurations of hand claps, excerpted from Leevi Peltola’s 2004 MSc thesis.


Repp’s acoustic analyses and perceptual experiments mainly involved 20 test subjects who were each asked to clap at their normal rate for 10 seconds in a quiet room. The spectra of individual claps varied widely, but there was no evidence of influence of sex or hand size on the clap spectrum. He also measured his own clapping with the eight modes above. If the palms struck each other (P1, A1) there was a narrow frequency peak below 1 kHz together with a notch around 2.5 kHz. If the fingers of one hand struck the palm of the other hand (P3, A3) there was a broad spectral peak near 2 kHz.

Repp then tried to determine whether the subjects were able to extract information about the clapper from listening to the signal. Subjects generally assumed that slow, loud and low-pitched hand claps came from male clappers, and that fast, soft and high-pitched hand claps came from female clappers. But this was not the case: speed, intensity and pitch were uncorrelated with sex, and test subjects could identify the clapper's sex only slightly better than chance. Perceived differences were attributed mainly to hand configuration rather than hand size.

So much for individuals clapping, but what about applause? That's when some interesting physics comes into play. Néda and colleagues recorded applause from several theatre and opera performances. They observed that the applause begins with incoherent random clapping, but that synchronization and periodic behaviour develop after a few seconds. This transition can be quite sudden and very strong, and is an unusual example of self-organization in a large coupled system. Néda gives quite a clear explanation of what is happening, and why.

Here’s a nice video of the phenomenon.
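
To see how synchronization can emerge from nothing but weak coupling, here is a toy, Kuramoto-style simulation of clappers who each have their own natural clapping rate but are nudged towards the crowd's average phase. It is only an illustration of the kind of self-organization Néda describes, not his actual model, and the rates and coupling strength are invented.

```python
import numpy as np

def simulate_clappers(n=200, coupling=2.0, steps=4000, dt=0.005, seed=0):
    """Toy Kuramoto model: n clappers, each with its own natural clapping rate."""
    rng = np.random.default_rng(seed)
    omega = 2 * np.pi * rng.normal(2.0, 0.1, n)   # natural rates around 2 claps per second
    theta = rng.uniform(0, 2 * np.pi, n)          # start out of phase: incoherent applause
    order = []
    for _ in range(steps):
        mean_field = np.mean(np.exp(1j * theta))
        r, psi = np.abs(mean_field), np.angle(mean_field)   # r = 0 incoherent, r = 1 in sync
        # Each clapper drifts at its own rate but is pulled towards the crowd phase
        theta += dt * (omega + coupling * r * np.sin(psi - theta))
        order.append(r)
    return np.array(order)

r = simulate_clappers()
print(f"order parameter: first second {r[:200].mean():.2f}, last second {r[-200:].mean():.2f}")
```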

The fact that the sonic aspects of hand claps can differ so significantly, and can often be identified by listeners, suggests that it may be possible to tell a lot about the source by signal analysis. Such was the case in work by Jylhä and colleagues, who proposed methods to identify a person by their hand claps, or to identify the configuration (à la Repp's study) of the hand clap. Christian Uhle looked at the more general question of identifying applause in an audio stream.

Understanding applause, beyond the synchronization phenomenon observed by Néda, is quite useful for encoding the applause signals that so often accompany musical recordings, especially those recordings considered worth redistributing! The important spatial and temporal aspects of applause signals are known to make them particularly tricky to encode and decode. As noted in research by Adami and colleagues, the more standard perceptual features like pitch or loudness do not do a good job of characterising grainy sound textures like applause. They introduced a new feature, applause density, which is loosely related to the overall clapping rate but derived from perceptual experiments. Just a month before this blog entry, Adami and co-authors published a follow-up paper which used density and other characteristics to investigate the realism of upmixed (mono to stereo) applause signals. In fact, talking with one of the co-authors was a motivation for me to write this entry.

Upmixing is an important problem in its own right. But the placement and processing of sounds for a stereo or multichannel environment can be considered part of the general problem of sound synthesis. Synthesis of clapping and applause sounds was covered in detail, and to great effect, by Peltola and co-authors. They presented physics-based analysis, synthesis, and control systems capable of both producing individual hand-claps, or mimicking the applause of a group of clappers. The synthesis models were derived from experimental measurements and built both on the work of Repp and of Neda. Researchers here in the Centre for Digital Music’s Audio Engineering research team are trying to build on their work, creating a synthesis system that could incorporate cheering and other aspects of an appreciative crowd. More on that soon, hopefully.

“I think that’s just how the world will come to an end: to general applause from wits who believe it’s a joke.”
― Søren Kierkegaard, Either/Or, Part I

And for those who might be interested, here's a short bibliography of applause and hand-clapping references:

1. A. Adami, S. Disch, G. Steba and J. Herre, 'Assessing Applause Density Perception Using Synthesized Layered Applause Signals,' 19th International Conference on Digital Audio Effects (DAFx-16), Brno, Czech Republic, 2016.
2. A. Adami, L. Brand and J. Herre, 'Investigations Towards Plausible Blind Upmixing of Applause Signals,' 142nd AES Convention, May 2017.
3. W. Ahmad and A. M. Kondoz, 'Analysis and Synthesis of Hand Clapping Sounds Based on Adaptive Dictionary,' ICMC, 2011.
4. K. Hanahara, Y. Tada and T. Muroi, 'Human-Robot Communication by Means of Hand-Clapping (Preliminary Experiment with Hand-Clapping Language),' IEEE Int. Conf. on Systems, Man and Cybernetics (ISIC-2007), pp. 2995-3000, October 2007.
5. S. Farner, A. Solvang, A. Sæbø and U. P. Svensson, 'Ensemble Hand-Clapping Experiments under the Influence of Delay and Various Acoustic Environments,' Journal of the Audio Engineering Society, vol. 57, no. 12, pp. 1028-1041, December 2009.
6. A. Jylhä and C. Erkut, 'Inferring the Hand Configuration from Hand Clapping Sounds,' 11th International Conference on Digital Audio Effects (DAFx-08), Espoo, Finland, 2008.
7. A. Jylhä, C. Erkut, U. Simsekli and A. T. Cemgil, 'Sonic Handprints: Person Identification with Hand Clapping Sounds by a Model-Based Method,' AES 45th Conference, March 2012.
8. K. Kawahara, Y. Kamamoto, A. Omoto and T. Moriya, 'Evaluation of the Low-Delay Coding of Applause and Hand-Clapping Sounds Caused by Music Appreciation,' 138th AES Convention, May 2015.
9. K. Kawahara, A. Fujimori, Y. Kamamoto, A. Omoto and T. Moriya, 'Implementation and Demonstration of Applause and Hand-Clapping Feedback System for Live Viewing,' 141st AES Convention, September 2016.
10. M.-V. Laitinen, F. Kuech, S. Disch and V. Pulkki, 'Reproducing Applause-Type Signals with Directional Audio Coding,' Journal of the Audio Engineering Society, vol. 59, no. 1/2, pp. 29-43, January 2011.
11. Z. Néda, E. Ravasz, T. Vicsek, Y. Brechet and A.-L. Barabási, 'Physics of the Rhythmic Applause,' Physical Review E, vol. 61, no. 6, pp. 6987-6992, 2000.
12. Z. Néda, E. Ravasz, Y. Brechet, T. Vicsek and A.-L. Barabási, 'The Sound of Many Hands Clapping: Tumultuous Applause Can Transform Itself into Waves of Synchronized Clapping,' Nature, vol. 403, pp. 849-850, 2000.
13. Z. Néda, A. Nikitin and T. Vicsek, 'Synchronization of Two-Mode Stochastic Oscillators: A New Model for Rhythmic Applause and Much More,' Physica A: Statistical Mechanics and its Applications, vol. 321, pp. 238-247, 2003.
14. L. Peltola, C. Erkut, P. R. Cook and V. Välimäki, 'Synthesis of Hand Clapping Sounds,' IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 3, pp. 1021-1029, 2007.
15. B. H. Repp, 'The Sound of Two Hands Clapping: An Exploratory Study,' Journal of the Acoustical Society of America, vol. 81, pp. 1100-1109, April 1987.
16. P. Seetharaman and S. P. Tarzia, 'The Hand Clap as an Impulse Source for Measuring Room Acoustics,' 132nd AES Convention, April 2012.
17. C. Uhle, 'Applause Sound Detection,' Journal of the Audio Engineering Society, vol. 59, no. 4, pp. 213-224, April 2011.

Why can you hear the difference between hot and cold water?

I recently found out about an interesting little experiment which showed that people can identify, from the sound alone, whether hot or cold water is being poured. This is a little surprising, since we don't usually think of temperature as having a sound.
Here are two sound samples;

Which one do you think was hot water and which was cold water? Scroll down for the answer..

.
.

.

..
.

.

.
Keep scrolling
.
.
.
.
.
.
.
.
Yes, the first sound sample was cold water being poured, and the second was hot water.
The work was first done by a London advertising agency, Condiment Junkie, who use sound design in branding and marketing, in collaboration with researchers from University of Oxford, and they published a research paper on this. The experiment is first described in Condiment Junkie’s blog, and was picked up by NPR and lots of others. There’s even a YouTube video about this phenomenon that has over 600,000 views.
However, there wasn't really a good explanation as to why we hear the difference. The academic paper did not really discuss this. The YouTube video simply states that 'change in the splashing of the water changes the sound that it makes because of various complex fluid dynamic reasons,' which really doesn't explain anything. According to one of the founders of Condiment Junkie, 'more bubbling in a liquid that's hot… you tend to get higher frequency sounds from it,' but further discussion on NPR noted 'Cold water is more viscous… That's what makes that high pitched ringing.' Are they both right? There is even a fair amount of discussion of this on physics forums.
But it's all speculation. Most of the arguments are half-formed and involve a fair amount of hand-waving. No one actually analysed the audio.

So I put the two samples above through some analysis using Sonic Visualiser. Spectrograms are very good for this sort of thing because they show you how the frequency content is changing over time. But you have to be careful because if you don’t choose how to visualise it carefully, you’ll easily overlook the interesting stuff.

Here are the spectrograms of the two files, cold water on top, hot water on bottom. Frequency is on a log scale (otherwise all the detail would be crammed at the bottom) and the peak frequencies are heavily emphasised (there's an awful lot of noise).

Spectrogram of the cold water pour

Spectrogram of the hot water pour

There's more analysis than shown here, but the most striking feature is that the same frequencies are present in both signals. There is a strong, dominant frequency that increases linearly from about 650 Hz to just over 1 kHz. And there is a second frequency that appears a little later, starting at around 720 Hz, falling all the way to 250 Hz, then climbing back up again.

These frequencies are pretty much the same in both hot and cold cases. The difference is mainly that cold water has a much stronger second frequency (the one that dips).
So all those people who speculated on why and how hot and cold water sound different seem to have gotten it wrong. If they had actually analysed the audio, they would have seen that the same frequencies are produced, but with different strengths.
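
If you want to repeat this sort of analysis without Sonic Visualiser, something along the following lines produces a comparable log-frequency view. The filenames are placeholders, and the FFT size, overlap and dynamic range are just reasonable defaults rather than the exact settings used above.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile
from scipy.signal import stft

def plot_pour(path, title):
    fs, y = wavfile.read(path)
    if y.ndim > 1:
        y = y.mean(axis=1)                        # mix to mono
    f, t, Z = stft(y, fs=fs, nperseg=4096, noverlap=3072)
    level_db = 20 * np.log10(np.abs(Z) + 1e-9)
    # Emphasise the peaks by clipping the colour range 60 dB below the maximum
    plt.pcolormesh(t, f, level_db, shading='auto', vmin=level_db.max() - 60)
    plt.yscale('symlog', linthresh=100)           # roughly logarithmic frequency axis
    plt.ylim(50, 8000)
    plt.xlabel('Time (s)'); plt.ylabel('Frequency (Hz)'); plt.title(title)

plt.figure(); plot_pour('cold_pour.wav', 'Cold water')   # placeholder filenames
plt.figure(); plot_pour('hot_pour.wav', 'Hot water')
plt.show()
```
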
My first guess was that the second frequency is due to the size of the water droplets depending on the rate of water flow. When more water is flowing, in the middle of the pour, the droplets are large and so produce lower frequencies. Hot water is less viscous (more runny) and so doesn't separate into these droplets as much.
I was less sure about the first frequency. Maybe this is due to a default droplet size, and only some water droplets have a larger size. But why would this first frequency be linearly increasing? Maybe after water hits the surface, it always separates into small droplets and so this is them splashing back down after initial impact. Perhaps, the more water on the floor, the smaller the droplets splashing back up, giving the increase in this frequency.
But Rod Selfridge, a researcher in the Audio Engineering team here, gave a better possible explanation, which I’ll repeat verbatim here.
The higher frequency line in the spectrogram which linearly increases could be related to the volume of air left in the vessel the liquid is being poured into. As the fluid is poured in the volume of air decreases and the resonant frequency of the remaining ‘chamber’ increases.
The lower line of frequencies could be related to the force of liquid being added. As the pouring speed increases, increasing the force, the falling liquid pushes further into the reservoir. This means a deeper column of air is trapped and becomes a bubble. The larger the bubble, the lower the resonant frequency. This is the theory of Minnaert, described in the attached paper.
My last thought was that for hot water, especially boiling, there will be steam in the vessel and surrounding the contact area of the pour. Perhaps the steam has an acoustic filtering effect and/or a physical effect on the initial pour or splashes.
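
For reference, Minnaert's formula and the rough numbers it gives are below. The standard textbook constants for air bubbles in water are assumed, and the connection to this particular recording is Rod's hypothesis rather than something I have verified.

```python
import numpy as np

def minnaert_frequency(radius_m, p0=101325.0, rho=1000.0, gamma=1.4):
    """Resonant frequency of an air bubble of the given radius in water (Minnaert, 1933)."""
    return np.sqrt(3 * gamma * p0 / rho) / (2 * np.pi * radius_m)

for r_mm in (1, 3, 10):
    print(f"bubble radius {r_mm:2d} mm -> {minnaert_frequency(r_mm / 1000):6.0f} Hz")
# About 3.3 kHz for a 1 mm bubble, falling to roughly 330 Hz for a 10 mm air pocket,
# which is at least in the right ballpark for the 250-720 Hz component described above.
```
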
Of course, a more definitive answer would involve a few experiments, pouring differing amounts of water into differing containers. But I think this already demonstrates the need to test theories of what sound will occur against analysis of the actual sounds produced.

Cool stuff at the Audio Engineering Society Convention in Berlin

The next Audio Engineering Society convention is just around the corner, May 20-23 in Berlin. This is an event where we always have a big presence. After all, this blog is brought to you by the Audio Engineering research team within the Centre for Digital Music, so it's a natural fit for a lot of what we do.

These conventions are quite big, with thousands of attendees, but not so big that you get lost or overwhelmed. The attendees fit loosely into five categories: the companies, the professionals and practitioners, students, enthusiasts, and the researchers. That last category is where we fit.

I thought I’d give you an idea of some of the highlights of the Convention. These are some of the events that we will be involved in or just attending, but of course, there’s plenty else going on.

On Saturday May 20th, 9:30-12:30, Dave Ronan from the team here will be presenting a poster on ‘Analysis of the Subgrouping Practices of Professional Mix Engineers.’ Subgrouping is a greatly understudied, but important part of the mixing process. Dave surveyed 10 award winning mix engineers to find out how and why they do subgrouping. He then subjected the results to detailed thematic analysis to uncover best practices and insights into the topic.

From 2:45 to 4:15 pm there is a workshop on 'Perception of Temporal Response and Resolution in Time Domain.' Last year we published an article in the Journal of the Audio Engineering Society on 'A meta-analysis of high resolution audio perceptual evaluation.' There's a blog entry about it too. The research showed very strong evidence that people can hear a difference between high resolution audio and standard, CD quality audio. But this brings up the question of why. Many people have suggested that the fine temporal resolution of oversampled audio might be perceived. I expect that this workshop will shed some light on this as yet unresolved question.

Overlapping that workshop, there are some interesting posters from 3 to 6 pm. ‘Mathematical Model of the Acoustic Signal Generated by the Combustion Engine‘ is about synthesis of engine sounds, specifically for electric motorbikes. We are doing a lot of sound synthesis research here, and so are always on the lookout for new approaches and new models. ‘A Study on Audio Signal Processed by “Instant Mastering” Services‘ investigates the effects applied to ten songs by various online, automatic mastering platforms. One of those platforms, LandR, was a high tech spin-out from our research a few years ago, so we’ll be very interested in what they found.

For those willing to get up bright and early Sunday morning, there’s a 9 am panel on ‘Audio Education—What Does the Future Hold,’ where I will be one of the panellists. It should have some pretty lively discussion.

Then there are some interesting posters from 9:30 to 12:30. We've done a lot of work on new interfaces for audio mixing, so will be quite interested in 'The Mixing Glove and Leap Motion Controller: Exploratory Research and Development of Gesture Controllers for Audio Mixing.' And returning to the subject of high resolution audio, there is 'Discussion on Subjective Characteristics of High Resolution Audio' by Mitsunori Mizumachi. Mitsunori was kind enough to give me details about his data and experiments in hi-res audio, which I then used in the meta-analysis paper. He'll also be looking at what factors affect high resolution audio perception.

From 10:45 to 12:15, our own Brecht De Man will be chairing and speaking in a Workshop on ‘New Developments in Listening Test Design.’ He’s quite a leader in this field, and has developed some great software that makes the set up, running and analysis of listening tests much simpler and still rigorous.

From 1 to 2 pm, there is the meeting of the Technical Committee on High Resolution Audio, of which I am co-chair along with Vicki Melchior. The Technical Committee aims for comprehensive understanding of high resolution audio technology in all its aspects. The meeting is open to all, so for those at the Convention, feel free to stop by.

Sunday evening at 6:30 is the Heyser lecture. This is quite prestigious, a big talk by one of the eminent people in the field. This one is given by Jorg Sennheiser of, well, Sennheiser Electronic.

Monday morning 10:45-12:15, there’s a tutorial on ‘Developing Novel Audio Algorithms and Plugins – Moving Quickly from Ideas to Real-time Prototypes,’ given by Mathworks, the company behind Matlab. They have a great new toolbox for audio plugin development, which should make life a bit simpler for all those students and researchers who know Matlab well and want to demo their work in an audio workstation.

Again in the mixing interface department, we look forward to hearing about ‘Formal Usability Evaluation of Audio Track Widget Graphical Representation for Two-Dimensional Stage Audio Mixing Interface‘ on Tuesday, 11-11:30. The authors gave us a taste of this work at the Workshop on Intelligent Music Production which our group hosted last September.

In the same session – which is all about ‘Recording and Live Sound‘ so very close to home – a new approach to acoustic feedback suppression is discussed in ‘Using a Speech Codec to Suppress Howling in Public Address Systems‘, 12-12:30. With several past projects on gain optimization for live sound, we are curious to hear (or not hear) the results!

The full program can be explored on the AES Convention planner or the Convention website. Come say hi to us if you’re there!

 

 

Scream!

Audio and informatics researchers are perhaps quite familiar with retrieval systems that try to analyse recordings to identify when an important word or phrase was spoken, or when a song was played. But I once did some collaboration with a company who did laughter and question detection, two audio informatics problems I hadn’t heard of before. I asked them about it. The company was developing audio analytics software to assist Call Centres. Call Centres wanted to keep track of the unusual or problematic calls, and in particular, any laughter when someone is calling tech support would be worth investigating. And I suppose all sorts of unusual sounds should indicate that something about the call is worth noting. Which brings me to the subject of this blog entry.


Screams!

Screams occupy an important evolutionary niche, since they are used as a warning and alert signal, and hence are intended to be a sound which we strongly and quickly focus on. A 2015 study by Arnal et al. showed that screams contain a strong modulation component, typically within the 30 to 150 Hz range. This sort of modulation is sometimes called roughness. Arnal showed that roughness occurs in both natural and artificial alarm sounds, and that adding roughness to a sound can make it be perceived as more alarming or fearful.
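
A crude way to quantify that roughness (my own sketch, not Arnal's method) is to measure how much of the amplitude envelope's modulation power falls in the 30-150 Hz band:

```python
import numpy as np
from scipy.signal import hilbert, welch

def roughness_ratio(x, fs):
    """Fraction of envelope modulation power in the 30-150 Hz 'roughness' band."""
    envelope = np.abs(hilbert(x))            # amplitude envelope of the signal
    envelope -= envelope.mean()
    f, pxx = welch(envelope, fs=fs, nperseg=8192)
    rough = (f >= 30) & (f <= 150)
    slow = (f > 0) & (f < 30)
    return pxx[rough].sum() / (pxx[rough].sum() + pxx[slow].sum() + 1e-12)

# On this kind of measure, a scream should score noticeably higher than ordinary speech.
```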

This new study suggests that a peculiar set of features may be appropriate for detecting screams. And like most fields of research, if you dig deep enough, you find that quite a few people have already scratched the surface.

I did a quick search of AES and IEEE papers and found ten that had 'scream' in the title, not counting those referring to systems or algorithms given the acronym SCREAM. This is actually very few, indicating that the field is underdeveloped. One of them is really about screams and growls in death metal music, which, though interesting in its own right, is quite different. Most of the rest seem to just apply a favourite machine learning technique to scream data, an issue with a lot of papers and deserving of a blog entry of its own in future.

But one of the most detailed analyses of screams was conducted by audio forensics researcher and consultant Durand Begault. In 2008 he published 'Forensic Analysis of the Audibility of Female Screams'. In it, he notes "the local frequency modulation ('warble' or 'vibrato')" that was later the focus of Arnal's paper.

Begault also gives some interesting discussion of an investigation of scream audibility for a court case. He was asked to determine whether a woman screaming in one location could be heard by potential witnesses in a nearby community. He tested this on site by playing back prerecorded screams at the site of the incident. The test screams were generated by asking female subjects 'to scream as loudly as possible, as if you had just been surprised by something very scary.' Thirty screams were recorded, ranging from 102 to 123 decibels. The end result was that these screams could easily be heard more than 100 meters away, even with background noise and obstructions.
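
That audibility result is easy to sanity-check with the inverse-square law, ignoring obstructions, wind and air absorption, and assuming for the sake of the arithmetic that the 123 dB figure is referenced to a distance of 1 m (Begault's paper specifies the measurement conditions more carefully than I do here).

```python
import math

def spl_at_distance(spl_ref_db, ref_distance_m, distance_m):
    """Free-field inverse-square spreading: the level drops 6 dB per doubling of distance."""
    return spl_ref_db - 20 * math.log10(distance_m / ref_distance_m)

scream_at_1m = 123.0            # the loudest scream Begault recorded
for d in (10, 50, 100, 200):
    print(f"{d:4d} m : {spl_at_distance(scream_at_1m, 1.0, d):5.1f} dB SPL")
# At 100 m the level is still around 83 dB SPL, comfortably above typical
# background noise, which is consistent with the screams remaining clearly audible.
```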

This is certainly not the only audio analysis and processing that has found its way into the courtroom. One high-profile case was from February 2012, when neighborhood watch coordinator George Zimmerman shot and killed black teenager Trayvon Martin in Sanford, Florida. In Zimmerman's trial for second-degree murder, experts offered analysis of a scream heard in the background of a 911 phone call that also captured the sound of the gunshot that killed Martin. If the screamer was Zimmerman, it would strengthen the case that he acted in self-defense, but if it was Martin, it would imply that Zimmerman was the aggressor. FBI audio analysis experts testified in the case about the difficulties of identifying the speaker, or even his age, from the screams, and news outlets also called on experts who noted the lack of robust 'screamer identification' technologies.

The issue of scream audibility begs the question: how loud is a scream? We know they can be attention-grabbing, ear-piercing shrieks. The loudest scream Begault recorded was 123 dB, and he noted that scream "frequency content seems almost tailored to frequencies of maximal sensitivity on an equal-loudness contour."

And apparently, one can get a lot louder with a scream than with a shout. According to the Guinness Book of World Records, the loudest shout was 121.7 dBA, by Annalisa Flanagan shouting the word 'Quiet!', while the loudest scream ever recorded was 129 dB (C-weighted), by Jill Drake. Not surprisingly, both Jill and Annalisa are teachers, who seem to have found a very effective way to deal with unruly classrooms.

Interestingly, one might get a false impression of the diversity of screaming sounds if one's understanding is based on films. The Wilhelm Scream is a sound sample that has been used in over 300 films. This overuse perhaps gives it a certain familiarity to the listener, and lessens the alarming nature of the sound.

For more on the Wilhelm scream, see the blog entry ‘Swinging microphones and slashing lightsabres’. But here’s a short video on the sound, which includes a few more examples of its use than were given in the previous blog entry.