The Audio Engineering team (C4DM) was present at this year’s edition of Sónar+D in Barcelona. Sónar+D is an international conference integrated into the Sónar festival that focuses on the interdisciplinary approach between creativity and technology.
The Sónar Innovation Challenge (SIC), co-organized by the MTG, “is an online and on site platform for the creative minds that want to be one step ahead and experiment with the future of technology. It brings together innovative tech companies and creators, collaborating to solve challenges that will lead to disruptive prototypes showcased in Sónar+D.”
In this year’s challenge, Marco Martínez took part in the Enhanced DJ Assistant challenge set by the Music Technology Group at Universitat Pompeu Fabra, which asked participants to create a user-friendly, visually appealing and musically motivated system that DJs can use to remix music collections in exciting new ways.
After nearly one month of online meetings, the challengers and mentors finally met at Sónar, and during 4 days of intensive brainstorming, programming and prototyping at more than 30°C the team came up with ATOMIX:
Visualize, explore and manipulate atoms of sound from multitrack recordings, enhancing the creative possibilities for live artists and DJs.
Working from multitrack recordings (stems), ATOMIX uses advanced algorithms and cutting-edge technologies in feature extraction, clustering, synthesis and visualisation. It segments a collection of stems into atoms of sound and groups them by timbre similarity. Then, through concatenative synthesis, ATOMIX allows you to manipulate and exchange atoms of sound in real time with professional DAW controls, achieving a one-of-a-kind live music exploration.
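To make the "segment, then group by timbre" idea concrete, here is a minimal sketch in plain Python. It is not ATOMIX's code: the fixed-length atoms, the two crude features (RMS level and zero-crossing rate) and the k-means clustering are all assumptions chosen for illustration, since the actual features and clustering method are not described here.

```python
import math

def segment(signal, atom_len):
    """Cut a mono signal into consecutive, non-overlapping atoms."""
    return [signal[i:i + atom_len]
            for i in range(0, len(signal) - atom_len + 1, atom_len)]

def features(atom):
    """Two crude timbre proxies (assumed for this sketch): RMS level and zero-crossing rate."""
    rms = math.sqrt(sum(x * x for x in atom) / len(atom))
    crossings = sum(1 for a, b in zip(atom, atom[1:]) if (a < 0) != (b < 0))
    return (rms, crossings / (len(atom) - 1))

def kmeans(points, k, iterations=20):
    """Plain k-means with deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid, then recompute the centroids.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels

# Toy "stem": half a second of a low tone followed by half a second of a quieter high tone.
sr = 8000
low = [math.sin(2 * math.pi * 110 * n / sr) for n in range(4000)]
high = [0.5 * math.sin(2 * math.pi * 1760 * n / sr) for n in range(4000)]
atoms = segment(low + high, 1000)
labels = kmeans([features(a) for a in atoms], k=2)
print(labels)
```

In a real system the atoms would more likely follow onsets than a fixed grid, and the features would be richer spectral descriptors, but the pipeline shape is the same.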
The project is still in a prototype stage and we hope to hear news of development very soon.
At the recent Audio Engineering Society Convention, one of the most interesting talks was in the E-Briefs sessions. These are usually short presentations, dealing with late-breaking research results, work in progress, or engineering reports. Charalampos Papadokos presented an e-brief titled ‘Power Out of Thin Air: Harvesting of Acoustic Energy’.
Ambient energy sources are those sources all around us, like solar and kinetic energy. Energy harvesting is the capture and storage of ambient energy. It’s not a new concept at all, and dates back to the windmill and the waterwheel. Ambient power has been collected from electromagnetic radiation since the invention of crystal radios by Sir Jagadish Chandra Bose, a true renaissance man who made important contributions to many fields. But nowadays, people are looking for energy harvesting from many more possible sources, often for powering small devices, like wearable electronics and wireless sensor networks. The big advantages, of course, are that energy harvesters do not consume resources like oil or coal, and energy harvesting might enable some devices to operate almost indefinitely.
But two of the main challenges are that many ambient energy sources are very low power, and the harvesting may be difficult.
Typical power densities from energy harvesting can vary over orders of magnitude. Here are the energy densities for various ambient sources, taken from the Open Access book chapter ‘Electrostatic Conversion for Vibration Energy Harvesting’ by S. Boisseau, G. Despesse and B. Ahmed Seddik.
You can see that vibration, which includes acoustic vibrations, has about 1/100th the energy density of solar power, or even less. The numbers are arguable, but at first glance it looks like it will be exceedingly difficult to get any significant energy from acoustic sources unless one can harvest over a very large area.
That’s where this e-brief paper comes in. Papadokos and his co-author, John Mourjopoulos, have a patented approach to harvesting the acoustic energy inside a loudspeaker enclosure. Others had considered harvesting the sound energy from loudspeakers before (see the work of Matsuda, for instance), but mainly just as a way of testing their harvesting approach, and not really exploiting the properties of loudspeakers. Papadokos and Mourjopoulos had the insight to realise that many loudspeakers are enclosed and the enclosure has abundant acoustic energy that might be harvested without interfering with the external design or with the sound presented to the listener. In earlier work, Papadokos and Mourjopoulos found that the sound pressure level within a loudspeaker enclosure often exceeds 130 dB. Here, they simulated the effect of a piezoelectric plate in the enclosure, to convert the acoustic energy to electrical energy. Results showed that it might be possible to generate 2.6 volts under regular operating conditions, thus proving the concept of harvesting acoustic energy from loudspeaker enclosures, at least in simulation.
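To get a feel for what 130 dB means physically, the level can be converted to an RMS pressure using the standard 20 µPa reference for dB SPL in air. The sketch below also prints the intensity that a free-field plane wave at that pressure would carry. That second number is only a scale reference under assumed air properties (density 1.2 kg/m³, speed of sound 343 m/s); the sound field inside a small enclosure is not a plane wave, and nothing here estimates the power the authors' harvester would deliver.

```python
P_REF = 20e-6  # reference RMS pressure for dB SPL in air, in pascals

def spl_to_pressure(spl_db):
    """Convert a sound pressure level in dB SPL to RMS pressure in pascals."""
    return P_REF * 10 ** (spl_db / 20)

def plane_wave_intensity(p_rms, rho=1.2, c=343.0):
    """Intensity in W/m^2 of a free-field plane wave (assumed air density and sound speed)."""
    return p_rms ** 2 / (rho * c)

p_130 = spl_to_pressure(130)
print(f"130 dB SPL is about {p_130:.1f} Pa RMS")
print(f"plane-wave intensity at that pressure: about {plane_wave_intensity(p_130):.1f} W/m^2")
```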
The 142nd AES Convention was held last month in the creative heart of Berlin. The four-day program, which drew more than 2000 attendees, covered workshops, tutorials, technical tours and special events, all related to the latest trends and developments in audio research. But as much as scale, it’s attention to detail that makes AES special. There is as much emphasis on the research side of audio as on panels of experts discussing a range of provocative and practical topics.
It can be said that 3D Audio: Recording and Reproduction, Binaural Listening and Audio for VR were the most popular topics among workshops, tutorials, papers and engineering briefs. However, a significant portion of the program was also devoted to common audio topics such as digital filter design, live audio, loudspeaker design, recording, audio encoding, microphones, and music production techniques, just to name a few.
For this reason, here at the Audio Engineering research team within C4DM, we bring you what we believe were the highlights, the key talks or the most relevant topics that took place during the convention.
The future of mastering
What better way to start AES than with a mastering experts’ workshop discussing the future of the field? Jonathan Wyner (iZotope) introduced us to the current challenges that this discipline faces. These relate to demographic, economic and target-format issues that are constantly evolving due to advances in the music technology industry and changes among its consumers.
When discussing the future of mastering, the panel was reluctant to embrace a fully automated future. They pointed out, however, that the main challenge for assistive tools is to understand artistic intentions and genre-based decisions without the expert knowledge of the mastering engineer. The panel concluded that research efforts should go towards the development of an intelligent assistant, able to function as a smart preset that provides mastering engineers with a starting point.
Virtual analog modeling of dynamic range compression systems
This paper described a method to digitally model an analogue dynamic range compressor. Based on the analysis of processed and unprocessed audio waveforms, a generic model of dynamic range compression is proposed and its parameters are derived using iterative optimization techniques.
Audio samples were played and the quality of the audio produced by the digital model was demonstrated. However, it should be noted that the parameters of the digital compressor cannot be changed. Making them adjustable could be an interesting path for future work, as could the inclusion of other audio effects such as equalizers or delay lines.
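The general idea of fitting a compressor model to measured input and output can be illustrated with a toy example. The sketch below is not the paper's method: it assumes a static hard-knee gain curve with only two parameters (threshold and ratio), ignores attack and release behaviour entirely, and replaces the iterative optimiser with a brute-force grid search over candidate values. The "measurements" are synthetic.

```python
def compress_db(level_db, threshold_db, ratio):
    """Static hard-knee compressor curve: output level in dB for a given input level."""
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio

def fit_compressor(in_db, out_db, thresholds, ratios):
    """Grid search for the (threshold, ratio) pair with the smallest squared error."""
    best = None
    for t in thresholds:
        for r in ratios:
            err = sum((compress_db(x, t, r) - y) ** 2 for x, y in zip(in_db, out_db))
            if best is None or err < best[0]:
                best = (err, t, r)
    return best[1], best[2]

# Synthetic "measurements" from a hypothetical unit with a -20 dB threshold and 4:1 ratio.
true_threshold, true_ratio = -20, 4.0
in_db = list(range(-60, 1, 3))
out_db = [compress_db(x, true_threshold, true_ratio) for x in in_db]

fitted = fit_compressor(in_db, out_db,
                        thresholds=range(-40, 0),
                        ratios=[1 + 0.5 * i for i in range(1, 39)])
print(fitted)
```

With real recordings the levels would be noisy and time-dependent, which is where a richer model and a proper optimiser become necessary.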
Evaluation of alternative audio mixing interfaces
The paper ‘Formal Usability Evaluation of Audio Track Widget Graphical Representation for Two-Dimensional Stage Audio Mixing Interface’ evaluates different graphical track visualization styles. Multitrack visualizations included text only, different colour conventions for circles containing text or icons related to the type of instruments, circles with opacity assigned to audio features and also a traditional channel strip mixing interface.
Efficiency was tested and it was concluded that subjects preferred instrument icons as well as the traditional mixing interface. Taking into account the many works and proposals on alternative mixing interfaces (2D and 3D), there is still a lot of scope to explore how to build an intuitive, efficient and simple interface capable of replacing the well-known channel strip.
Perceptually motivated filter design with application to loudspeaker-room equalization
This tutorial was based on the engineering brief ‘Quantization Noise of Warped and Parallel Filters Using Floating Point Arithmetic’, where warped and parallel filters are proposed, which aim to have the frequency resolution of the human ear.
Thus, via Matlab, we explored various approaches for achieving this goal, including warped FIR and IIR, Kautz, and fixed-pole parallel filters. The result is a very useful tool that can be used for various applications such as room EQ, physical modelling synthesis and perhaps to improve existing intelligent music production systems.
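As a rough illustration of what "warping" does, the sketch below maps frequencies through a first-order allpass section, which is the building block that replaces the unit delay in warped filters. The closed-form coefficient is the widely quoted Bark-scale approximation attributed to Smith and Abel; whether the tutorial used this exact formula is an assumption, and the function names are made up for this example.

```python
import math

def bark_warping_coefficient(fs_hz):
    """Approximate allpass coefficient that matches the Bark scale at sample rate fs_hz."""
    fs_khz = fs_hz / 1000.0
    return 1.0674 * math.sqrt(2.0 / math.pi * math.atan(0.06583 * fs_khz)) - 0.1916

def warp_frequency(f_hz, fs_hz, lam):
    """Frequency mapping of a first-order allpass with coefficient lam."""
    w = 2.0 * math.pi * f_hz / fs_hz
    warped = w + 2.0 * math.atan2(lam * math.sin(w), 1.0 - lam * math.cos(w))
    return warped * fs_hz / (2.0 * math.pi)

fs = 44100
lam = bark_warping_coefficient(fs)
print(f"lambda at {fs} Hz: {lam:.4f}")
for f in (100, 1000, 5000, 15000):
    print(f"{f:>6} Hz maps to {warp_frequency(f, fs, lam):8.0f} Hz on the warped axis")
```

With a positive coefficient, low frequencies are spread over a larger share of the warped axis, so a filter designed on that axis spends more of its resolution where the ear is most selective.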
Source Separation in Action: Demixing the Beatles at the Hollywood Bowl
Abbey Road’s James Clarke presented a great poster with the actual algorithm that was used for the remixed, remastered and expanded version of The Beatles’ album Live at the Hollywood Bowl. The method managed to isolate the crowd noise, making it possible to separate into clean tracks everything that Paul McCartney, John Lennon, Ringo Starr and George Harrison played live in 1964.
The results speak for themselves (audio comparison). Based on a Non-negative Matrix Factorization (NMF) algorithm, this work provides a great research tool for source separation and reverse engineering of mixes.
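For readers who have not met NMF before, here is a minimal sketch of the basic idea: a non-negative matrix V (think of a magnitude spectrogram) is approximated by the product of spectral templates W and time activations H. This is the textbook multiplicative-update scheme for the squared-error cost, written in plain Python on a toy matrix. It is not the algorithm from the poster, whose details are not given here.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def frobenius_error(v, w, h):
    """Squared reconstruction error between V and the product W H."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rw in zip(v, wh) for x, y in zip(rv, rw))

def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Multiplicative updates for V ~ W H with all entries kept non-negative."""
    rng = random.Random(seed)
    rows, cols = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(rows)]
    h = [[rng.random() + eps for _ in range(cols)] for _ in range(rank)]
    for _ in range(iterations):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(cols)]
             for i in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(rows)]
    return w, h

# Toy "spectrogram": two sources with different spectra, active at different times.
templates = [[1, 0], [0, 3], [2, 0], [0, 1]]       # 4 frequency bins x 2 sources
activations = [[1, 1, 0, 0, 2], [0, 1, 1, 2, 0]]   # 2 sources x 5 time frames
V = matmul(templates, activations)
W, H = nmf(V, rank=2)
print(frobenius_error(V, W, H))
```

In a separation setting, each column of W paired with its row of H gives one component, and components are then grouped into sources such as "band" and "crowd".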
Other keynotes worth mentioning:
The rest of the paper proceedings are available in the AES E-library.
Last week saw the 2017 International Conference on Semantic Audio by the Audio Engineering Society. Held at Fraunhofer Institute for Integrated Circuits in Erlangen, Germany, delegates enjoyed a well-organised and high-quality programme, interleaved with social and networking events such as a jazz concert and a visit to Erlangen’s famous beer cellars. The conference was a combined effort of Fraunhofer IIS, Friedrich-Alexander Universität, and their joint venture Audio Labs.
As the topic is of great relevance to our team, Brecht De Man and Adán Benito attended and presented their work there. With 5 papers and a late-breaking demo, the Centre for Digital Music in general was the most strongly represented institution, surpassing even the hosting organisations.
Adán Benito presented “Intelligent Multitrack Reverberation Based on Hinge-Loss Markov Random Fields”, a machine learning approach to automatic application of a reverb effect to musical audio.
Brecht De Man demoed the “Mix Evaluation Browser”, an online interface to access a dataset comprising several mixes of a number of songs, complete with corresponding DAW files, raw tracks, preference ratings, and annotated comments from subjective listening tests.
Still from the Centre for Digital Music, Delia Fano Yela delivered a beautifully hand-drawn and compelling presentation about source separation in general and how temporal context can be employed to considerably improve vocal extraction.
Rodrigo Schramm and Emmanouil Benetos won the Best Paper award for their paper “Automatic Transcription of a Cappella recordings from Multiple Singers”.
Emmanouil further presented another paper, “Polyphonic Note and Instrument Tracking Using Linear Dynamical Systems”, and coauthored “Assessing the Relevance of Onset Information for Note Tracking in Piano Music Transcription”.
Several other delegates were frequent collaborators or previously affiliated with Queen Mary. The opening keynote was delivered by Mark Plumbley, former director of the Centre for Digital Music, who gave an overview of the field of machine listening, specifically audio event detection and scene recognition. Nick Jillings, formerly research assistant and master project student at the Audio Engineering group, and currently a PhD student at Birmingham City University cosupervised by Josh Reiss, head of our Audio Engineering group, presented his paper “Investigating Music Production Using a Semantically Powered Digital Audio Workstation in the Browser” and demoed “Automatic channel routing using musical instrument linked data”.
Other keynotes were delivered by Udo Zölzer, best known for editing the collection “DAFX: Digital Audio Effects”, and Masataka Goto, a household name in the MIR community, who discussed his own web-based implementations of music discovery and visualisation.
Paper proceedings are already available in the AES E-library, free for AES members.
“You must be prepared to work always without applause.”
― Ernest Hemingway, By-line
In a recent blog entry, we discussed research into the sound of screams. It’s one of those everyday sounds that we are particularly attuned to, but that there hasn’t been much research on. This got me thinking about what some other under-researched sounds are. Applause certainly fits. We all know it when we hear it, and a quick search of famous quotes reveals that there are many ways to describe the many types of applause: thunderous applause, tumultuous applause, a smattering of applause, sarcastic applause, and of course, the dreaded slow hand clap. But from an auditory perspective, what makes it special?
Applause is nothing more than the sound of many people gathered in one place clapping their hands. Clapping your hands together is one of the simplest ways in which we can approximate an impulse, or short broadband sound, without the need for any equipment. Impulsive sounds are used for rhythm, for tagging important moments on a timeline, or for estimating the acoustic properties of a room. Clappers and clapsticks are musical instruments, typically consisting of two pieces of wood that are clapped together to produce percussive sounds. In film and television, clapperboards have widespread use. The clapperboard produces a sharp clap noise that can be easily identified on the audio track, and the shutting of the clapstick at the top of the board can similarly be identified on the visual track. Thus, they are used effectively to synchronise sound and picture, as well as to designate the starts of scenes or takes during production. And in acoustic measurement, if one can produce an impulsive sound at a given location and record the result, one can get an idea of the reverberation that the room will apply to any sound produced from that location.
But a hand clap is a crude approximation of an impulse. Hand claps do not have completely flat spectra, are not completely omnidirectional, have significant duration and are not very high energy. Seetharaman and colleagues investigated the effectiveness of hand claps as impulse sources. They found that, with a small amount of additional but automated signal processing, the claps can produce reliable acoustical measurements.
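To show how a recorded impulse turns into a reverberation figure, here is a sketch of one standard approach: Schroeder backward integration of the squared response, followed by a straight-line fit to part of the decay and extrapolation to -60 dB. This is not the processing used by Seetharaman and colleagues. The test signal is an idealised, synthetic exponential decay rather than a real clap, and the -5 to -25 dB fitting range is an assumption.

```python
import math

def energy_decay_curve_db(impulse_response):
    """Schroeder backward integration, normalised to 0 dB at time zero."""
    tail = 0.0
    edc = []
    for x in reversed(impulse_response):
        tail += x * x
        edc.append(tail)
    edc.reverse()
    return [10 * math.log10(e / edc[0]) for e in edc if e > 0]

def estimate_rt60(impulse_response, sample_rate, start_db=-5.0, end_db=-25.0):
    """Least-squares line through the decay between start_db and end_db, extrapolated to -60 dB."""
    edc = energy_decay_curve_db(impulse_response)
    pts = [(n / sample_rate, level) for n, level in enumerate(edc)
           if end_db <= level <= start_db]
    mean_t = sum(t for t, _ in pts) / len(pts)
    mean_l = sum(l for _, l in pts) / len(pts)
    slope = (sum((t - mean_t) * (l - mean_l) for t, l in pts)
             / sum((t - mean_t) ** 2 for t, _ in pts))  # decay rate in dB per second
    return -60.0 / slope

# Synthetic response whose amplitude falls by 60 dB in exactly 0.5 seconds.
sample_rate = 8000
true_rt60 = 0.5
decay = math.log(1000.0) / (true_rt60 * sample_rate)
ir = [(-1) ** n * math.exp(-decay * n) for n in range(2 * sample_rate)]
print(round(estimate_rt60(ir, sample_rate), 3))
```

A real clap recording would first need the kind of clean-up the authors describe, since the clap itself is neither flat in spectrum nor infinitely short.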
Hanahara, Tada and Muroi exploited the impulse-like nature of hand claps for devising a means of Human-Robot Communication. The hand claps and their timing are relatively easy for a robot to decode, and not that difficult for a human to encode. But why the authors completely dismissed Morse code and all other simple forms of binary encoding is beyond me. And as voice recognition and related technologies continue to advance, the need for hand clap-based communication diminishes.
So what does a single hand clap sound like? This whole field of applause and clapping studies originated with a well-cited 1987 study by Bruno Repp, “The sound of two hands clapping.” He distinguished 8 hand clap positions:
Hands parallel and flat
P1: palm-to-palm
P2: halfway between P1 and P3
P3: fingers-to-palm
Hands held at an angle
A1: palm-to-palm
A2: halfway between A1 and A3
A3: fingers-to-palm
A1+: A1 with hands very cupped
A1-: A1 with hands fully flat
The figure below shows photos of these eight configurations of hand claps, excerpted from Leevi Peltola’s 2004 MSc thesis.
Repp’s acoustic analyses and perceptual experiments mainly involved 20 test subjects who were each asked to clap at their normal rate for 10 seconds in a quiet room. The spectra of individual claps varied widely, but there was no evidence of influence of sex or hand size on the clap spectrum. He also measured his own clapping with the eight modes above. If the palms struck each other (P1, A1) there was a narrow frequency peak below 1 kHz together with a notch around 2.5 kHz. If the fingers of one hand struck the palm of the other hand (P3, A3) there was a broad spectral peak near 2 kHz.
Repp then tried to determine whether the subjects were able to extract information about the clapper from listening to the signal. Subjects generally assumed that slow, loud and low-pitched hand claps were from male clappers, and fast, soft and high-pitched hand claps were from female clappers. But this was not the case. The speed, intensity and pitch were uncorrelated with sex, and so test subjects could identify the clapper’s sex only slightly better than chance. Perceived differences were attributed mainly to hand configurations rather than hand size.
So much for individuals clapping, but what about applause? That’s when some interesting physics comes into play. Neda and colleagues recorded applause from several theatre and opera performances. They observed that the applause begins with incoherent random clapping, but then synchronization and periodic behaviour develops after a few seconds. This transition can be quite sudden and very strong, and is an unusual example of self-organization in a large coupled system. Neda gives quite a clear explanation of what is happening, and why.
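A quick way to see this kind of sudden self-organisation is to simulate a crowd of coupled oscillators. The sketch below uses the Kuramoto model, a standard textbook description of coupled oscillators, as a stand-in. It is not claimed to reproduce Neda's analysis, and every number in it (50 clappers, about two claps per second, the spread in natural rates, the coupling strengths) is an assumption chosen for illustration. The order parameter r is near 0 for incoherent clapping and near 1 when everyone claps together.

```python
import cmath
import math
import random

def simulate_applause(coupling, n_clappers=50, steps=3000, dt=0.01, seed=1):
    """Euler simulation of the mean-field Kuramoto model; returns the late-time average of r."""
    rng = random.Random(seed)
    # Each clapper has their own preferred rate (rad/s) and a random starting phase.
    omegas = [rng.gauss(2 * math.pi * 2.0, 0.5) for _ in range(n_clappers)]
    phases = [rng.uniform(0, 2 * math.pi) for _ in range(n_clappers)]
    history = []
    for _ in range(steps):
        mean_field = sum(cmath.exp(1j * p) for p in phases) / n_clappers
        r, psi = abs(mean_field), cmath.phase(mean_field)
        history.append(r)
        # Every clapper is pulled towards the phase of the crowd as a whole.
        phases = [p + dt * (w + coupling * r * math.sin(psi - p))
                  for p, w in zip(phases, omegas)]
    tail = history[-steps // 4:]
    return sum(tail) / len(tail)

print("no coupling:    ", round(simulate_applause(0.0), 2))
print("strong coupling:", round(simulate_applause(4.0), 2))
```

Sweeping the coupling between those two values shows the jump from disorder to synchrony happening over a fairly narrow range, which is the flavour of transition described above.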
Here’s a nice video of the phenomenon.
The fact that sonic aspects of hand claps can differ so significantly, and can often be identified by listeners, suggests that it may be possible to tell a lot about the source by signal analysis. Such was the case in work by Jylhä and colleagues, who proposed methods to identify a person by their hand claps, or identify the configuration (à la Repp’s study) of the hand clap. Christian Uhle looked at the more general question of identifying applause in an audio stream.
Understanding of applause, beyond the synchronization phenomenon observed by Neda, is quite useful for encoding applause signals, which so often accompany musical recordings, especially those recordings that are considered worth redistributing! And the important spatial and temporal aspects of applause signals are known to make them particularly tricky signals to encode and decode. As noted in research by Adami and colleagues, the more standard perceptual features like pitch or loudness do not do a good job of characterising grainy sound textures like applause. They introduced a new feature, applause density, which is loosely related to the overall clapping rate, but derived from perceptual experiments. Just a month before this blog entry, Adami and co-authors published a follow-up paper which used density and other characteristics to investigate the realism of upmixed (mono to stereo) applause signals. In fact, talking with one of the co-authors was a motivation for me to write this entry.
Upmixing is an important problem in its own right. But the placement and processing of sounds for a stereo or multichannel environment can be considered part of the general problem of sound synthesis. Synthesis of clapping and applause sounds was covered in detail, and to great effect, by Peltola and co-authors. They presented physics-based analysis, synthesis, and control systems capable of both producing individual hand claps and mimicking the applause of a group of clappers. The synthesis models were derived from experimental measurements and built both on the work of Repp and of Neda. Researchers here in the Centre for Digital Music’s Audio Engineering research team are trying to build on their work, creating a synthesis system that could incorporate cheering and other aspects of an appreciative crowd. More on that soon, hopefully.
“I think that’s just how the world will come to an end: to general applause from wits who believe it’s a joke.”
― Søren Kierkegaard, Either/Or, Part I
And for those who might be interested, here’s a short bibliography of applause and hand-clapping references:
1. Adami, A., Disch, S., Steba, G., & Herre, J. ‘Assessing Applause Density Perception Using Synthesized Layered Applause Signals,’ 19th International Conference on Digital Audio Effects (DAFx-16), Brno, Czech Republic, 2016
2. Adami, A.; Brand, L.; Herre, J., ‘Investigations Towards Plausible Blind Upmixing of Applause Signals,’ 142nd AES Convention, May 2017
3. W. Ahmad, AM Kondoz, Analysis and Synthesis of Hand Clapping Sounds Based on Adaptive Dictionary. ICMC, 2011
4. K. Hanahara, Y. Tada, and T. Muroi, “Human-robot communication by means of hand-clapping (preliminary experiment with hand-clapping language),” IEEE Int. Conf. on Systems, Man and Cybernetics (ISIC-2007), Oct 2007, pp. 2995–3000.
5. Farner, Snorre; Solvang, Audun; Sæbo, Asbjørn; Svensson, U. Peter ‘Ensemble Hand-Clapping Experiments under the Influence of Delay and Various Acoustic Environments’, Journal of the Audio Engineering Society, Volume 57 Issue 12 pp. 1028-1041; December 2009
6. A. Jylhä and C. Erkut, “Inferring the Hand Configuration from Hand Clapping Sounds,” 11th International Conference on Digital Audio Effects (DAFx-08), Espoo, Finland, 2008.
7. Jylhä, Antti; Erkut, Cumhur; Simsekli, Umut; Cemgil, A. Taylan ‘Sonic Handprints: Person Identification with Hand Clapping Sounds by a Model-Based Method’, AES 45th Conference, March 2012
8. Kawahara, Kazuhiko; Kamamoto, Yutaka; Omoto, Akira; Moriya, Takehiro ‘Evaluation of the Low-Delay Coding of Applause and Hand-Clapping Sounds Caused by Music Appreciation’ 138th AES Convention, May 2015.
9. Kawahara, Kazuhiko; Fujimori, Akiho; Kamamoto, Yutaka; Omoto, Akira; Moriya, Takehiro ‘Implementation and Demonstration of Applause and Hand-Clapping Feedback System for Live Viewing,’ 141st AES Convention, September 2016.
10. Laitinen, Mikko-Ville; Kuech, Fabian; Disch, Sascha; Pulkki, Ville ‘Reproducing Applause-Type Signals with Directional Audio Coding,’ Journal of the Audio Engineering Society, Volume 59 Issue 1/2 pp. 29-43; January 2011
11. Z. Néda, E. Ravasz, T. Vicsek, Y. Brechet, and A.-L. Barabási, “Physics of the rhythmic applause,” Phys. Rev. E, vol. 61, no. 6, pp. 6987–6992, 2000.
12. Z. Néda, E. Ravasz, Y. Brechet, T. Vicsek, and A.-L. Barabási, “The sound of many hands clapping: Tumultuous applause can transform itself into waves of synchronized clapping,” Nature, vol. 403, pp. 849–850, 2000.
13. Z. Néda, A. Nikitin, and T. Vicsek. ‘Synchronization of two-mode stochastic oscillators: a new model for rhythmic applause and much more,’ Physica A: Statistical Mechanics and its Applications, 321:238–247, 2003.
14. L. Peltola, C. Erkut, P. R. Cook, and V. Välimäki, “Synthesis of Hand Clapping Sounds,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 3, pp. 1021–1029, 2007.
15. B. H. Repp. ‘The sound of two hands clapping: an exploratory study,’ J. of the Acoustical Society of America, 81:1100–1109, April 1987.
16. P. Seetharaman, S. P. Tarzia, ‘The Hand Clap as an Impulse Source for Measuring Room Acoustics,’ 132nd AES Convention, April 2012.
17. Uhle, C. ‘Applause Sound Detection’, Journal of the Audio Engineering Society, Volume 59 Issue 4 pp. 213-224, April 2011
I recently found out about an interesting little experiment where it was shown that people could identify when hot or cold water was being poured from the sound alone. This is a little surprising since we don’t usually think of temperature as having a sound.
Here are two sound samples:
Which one do you think was hot water and which was cold water? Scroll down for the answer.
The work was first done by a London advertising agency, Condiment Junkie, who use sound design in branding and marketing, in collaboration with researchers from University of Oxford, and they published a research paper on this. The experiment is first described in Condiment Junkie’s blog, and was picked up by NPR and lots of others. There’s even a YouTube video about this phenomenon that has over 600,000 views.
But it’s all speculation. Most of the arguments are half-formed and involve a fair amount of handwaving. No one actually analysed the audio.
So I put the two samples above through some analysis using Sonic Visualiser. Spectrograms are very good for this sort of thing because they show you how the frequency content is changing over time. But you have to be careful because if you don’t choose how to visualise it carefully, you’ll easily overlook the interesting stuff.
Here are the spectrograms of the two files, cold water on top, hot water on bottom. Frequency is on a log scale (otherwise all the detail will be crammed at the bottom) and the peak frequencies are heavily emphasised (there’s an awful lot of noise).
There’s more analysis than shown, but the most striking feature is that the same frequencies are present in both signals! There is a strong, dominant frequency that linearly increases from about 650 Hz to just over 1 kilohertz. And there is a second frequency that appears a little later, starting at around 720 Hz, falling all the way to 250 Hz, then climbing back up again.
The higher frequency line in the spectrogram, which increases linearly, could be related to the volume of air left in the vessel the liquid is being poured into. As the fluid is poured in, the volume of air decreases and the resonant frequency of the remaining ‘chamber’ increases.
The lower line of frequencies could be related to the force of liquid being added. As the pouring speed increases, increasing the force, the falling liquid pushes further into the reservoir. This means a deeper column of air is trapped and becomes a bubble. The larger the bubble, the lower the resonant frequency. This is the theory of Minnaert and is described in the attached paper.
My last thought was that for hot water, especially boiling, there will be steam in the vessel and surrounding the contact area of the pour. Perhaps the steam has an acoustic filtering effect and/or a physical effect on the initial pour or splashes.
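Both of these explanations can be sanity-checked with back-of-envelope formulas. The sketch below uses the Minnaert resonance for an air bubble in water, and, as an assumption of mine rather than anything measured here, treats the air left in the vessel as a simple quarter-wavelength tube that is closed at the water surface and open at the top. It ignores end corrections, the vessel's shape and any temperature dependence, and the physical constants are typical room-temperature values, so the results only show what bubble sizes and air depths would be consistent with the frequencies seen in the spectrograms.

```python
import math

def minnaert_frequency(radius_m, pressure_pa=101325.0, density=998.0, gamma=1.4):
    """Resonant frequency in Hz of an air bubble of the given radius in water."""
    return math.sqrt(3 * gamma * pressure_pa / density) / (2 * math.pi * radius_m)

def bubble_radius_for(frequency_hz):
    """Bubble radius in metres whose Minnaert resonance is frequency_hz."""
    return minnaert_frequency(1.0) / frequency_hz

def quarter_wave_frequency(air_column_m, speed_of_sound=343.0):
    """Lowest resonance in Hz of a tube closed at one end (assumed model of the vessel)."""
    return speed_of_sound / (4 * air_column_m)

def air_column_for(frequency_hz, speed_of_sound=343.0):
    """Air column length in metres whose quarter-wave resonance is frequency_hz."""
    return speed_of_sound / (4 * frequency_hz)

for f in (720, 250):
    print(f"{f} Hz bubble resonance: radius of about {1000 * bubble_radius_for(f):.1f} mm")
for f in (650, 1000):
    print(f"{f} Hz air resonance: column of about {100 * air_column_for(f):.1f} cm")
```

Under these assumptions the numbers come out at millimetre-to-centimetre bubbles and an air column of roughly ten centimetres, which is at least plausible for pouring water into a cup or glass.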