What the f*** are DFA faders?

I’ve been meaning to write this blog entry for a while, and I’ve finally gotten around to it. At the 142nd AES Convention, there were two papers that really stood out which weren’t discussed in our convention preview or convention wrap-up. One was about Acoustic Energy Harvesting, which we discussed a few weeks ago, and the other was titled ‘The DFA Fader: Exploring the Power of Suggestion in Loudness Judgments.’ When I mentioned this paper to others, their response was always the same: “What’s a DFA Fader?” Well, the answer is hinted at in the title of this blog entry.

The basic idea is that musicians often give instructions to the sound engineer that he or she can’t or doesn’t want to follow. For instance, a vocalist might say “Turn me up” in a soundcheck, but the sound engineer knows that the vocals are already at a nice level and any more amplification might cause feedback. Sometimes this sort of thing can be communicated back to the musician in a nice way. But there’s also the fallback option: a fader on the mixing console that “Does F*** All”, aka DFA. The engineer can slide the fader or twiddle an unconnected dial, smile back and say ‘Ok, does this sound a bit better?’

A couple of companies have had fun with this idea. Funk Logic’s Palindrometer, shown below, is nothing more than a filler for empty rack space. It’s an interface that looks like it might do something, but at best, it just flashes some LEDs when one toggles the switches and turns the knobs.

[Image: Funk Logic’s Palindrometer]

RANE have the PI 14 Pseudoacoustic Infector. It’s worth checking out the full description, complete with product review and data sheets. I especially like the schematic, copied below.

[Image: PI 14 Pseudoacoustic Infector schematic]

And in 2014, our own Brecht De Man released The Wire, a freely available VST and AudioUnit plug-in that emulates a gold-plated, balanced, 100% lossless audio connector.

[Image: The Wire plug-in]

Anyway, the authors of this paper had the bright idea of doing legitimate subjective evaluation of DFA faders. They didn’t make jokes in the paper, not even to explain the DFA acronym. They took 22 participants and divided them into an 11 person control group and an 11 person test group. In the control group, each subject participated in twenty trials where two identical musical excerpts were presented and the subject had to rate the difference in loudness of vocals between the two excerpts. Only ten excerpts were used, so each pair was used in two trials. In the test group, a sound engineer was present and he made scripted suggestions that he was adjusting the levels in each trial. He could be seen, but participants couldn’t see his hands moving on the console.

Not surprisingly, most trials showed a statistically significant difference between test and control groups, confirming the effectiveness of verbal suggestions associated with the DFA fader. And the authors picked up on an interesting point: results were far more significant for stimuli where the vocals were masked by other instruments. This links the work to psychoacoustic studies. Not only is our perception of loudness and timbre influenced by the presence of a masker, but we also have a more difficult time judging loudness, and hence are more likely to accept the suggestion from an expert.
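For illustration only: the paper doesn’t publish its raw data or name its exact statistical test, but comparing the control group’s loudness-difference ratings with the suggestion group’s for a single trial might look something like the sketch below, with entirely invented numbers.

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Invented ratings (0-100 scale) of "how different was the vocal loudness?"
# for one trial: 11 control listeners vs. 11 listeners who heard the
# engineer's scripted suggestion. These numbers are NOT from the paper.
control = np.array([5, 0, 10, 5, 0, 5, 15, 0, 5, 10, 0])
suggested = np.array([20, 35, 10, 25, 40, 15, 30, 20, 25, 35, 30])

stat, p = mannwhitneyu(control, suggested, alternative='two-sided')
print(f'Mann-Whitney U = {stat:.1f}, p = {p:.4f}')
```

A nonparametric test is a natural choice for a sketch like this, since ratings on a bounded scale are rarely normally distributed.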

The authors did an excellent job of critiquing their results. But unfortunately, the full data was not made available with the paper, so we are left with a lot of questions. What were the scripted suggestions? It could make a big difference if the engineer said “I’m going to turn the vocals way up” versus “Let me try something. Does it sound any different now?” Were some participants immune to the suggestions? And since participants couldn’t see a fader being adjusted (interviews with sound engineers had stressed the importance of verbal suggestions), we don’t know how visual cues might have influenced the results.

There is something else that’s very interesting about this. It’s a ‘false experiment’. The whole listening test is a trick, since for all participants and in all trials, there were never any loudness differences between the two presented stimuli. So indirectly, it looks at an ‘auditory placebo effect’ that is more fundamental than DFA faders. What ratings for loudness differences did participants give? For the control group especially, did they judge these differences to be small because they trusted their ears, or large because they knew that judging loudness was the nature of the test? Perhaps there is a natural uncertainty in loudness perception regardless of bias. How much weaker does a listener’s judgment become when repeatedly asked to make very subtle choices in a listening test? There’s been some prior work tackling some of these questions, but I think this DFA Faders paper opened up a lot of avenues of interesting research.

The future of headphones

[Image: headphone meme]

Headphones have been around for over a hundred years, but recently there has been a surge in new technologies, spurred on in part by the explosive popularity of Beats headphones. In this blog, we will look at three advances in headphones arising from high tech start-ups. I’ve been introduced to each of these companies recently, but don’t have any affiliation with them.

EAVE (formerly Eartex) are a London-based company who have developed headphones aimed at the industrial workplace: construction sites, the maritime industry and so on. Typical ear defenders do a good job of blocking out noise, but make communication extremely difficult. EAVE’s headphones are designed to protect from excessive noise, yet still allow effective communication with others. One of the founders, David Greenberg, has a background in auditory neuroscience, focusing on hearing disorders, and he brought that knowledge of hearing aids to the company, designing headphones that amplify speech while attenuating noise sources. They are designed for use in existing communication networks, and use beamforming microphones to focus on the speaker’s voice. They also have sensors to monitor noise levels, so that noise maps can be created and personal noise exposure data can be gathered.

This use of additional sensors in the headset opens up lots of opportunities. Ossic are a company that emerged from Abbey Road Red, the start-up incubator established by the legendary Abbey Road Studios. Their headphone is packed with sensors, measuring the shape of your ears, head and torso. This allows them to estimate your own head-related transfer function, or HRTF, which describes how sounds are filtered as they travel from a source to your ear canal. They can then apply this filtering to the headphone output, allowing sounds to be far more accurately placed around you. Without HRTF filtering, sources always appear to be coming from inside your head.
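To make the idea concrete, here is a minimal sketch of binaural rendering with an HRTF: convolve a mono source with a pair of head-related impulse responses (HRIRs) measured for the desired direction. The file names are placeholders, and this is just the textbook approach, not Ossic’s own processing.

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import fftconvolve

# Placeholder files: a mono source and the left/right HRIRs for a source
# 30 degrees to the left, assumed to share the same sample rate and length.
fs, source = wavfile.read('mono_source.wav')
_, hrir_left = wavfile.read('hrir_30deg_left.wav')
_, hrir_right = wavfile.read('hrir_30deg_right.wav')

# Filtering the source with each ear's impulse response applies the
# direction-dependent colouration and interaural time/level differences.
left = fftconvolve(source.astype(float), hrir_left.astype(float))
right = fftconvolve(source.astype(float), hrir_right.astype(float))

binaural = np.stack([left, right], axis=1)
binaural /= np.max(np.abs(binaural))
wavfile.write('binaural_30deg.wav', fs, binaural.astype(np.float32))
```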

It’s not as simple as that, of course. For instance, when you move your head, you can still identify the direction of arrival of different sound sources, so the Ossic headphones also incorporate head tracking. And while a well-measured HRTF is essential for accurate localization, calibration to the ear is not perfect. So their headphones also have eight drivers rather than the usual two, allowing more careful positioning of sounds over a wide range of frequencies.

Ossic was funded by a Kickstarter campaign. Another headphone start-up, Ora, currently has a Kickstarter campaign of its own. Ora was founded at Tandem Launch, who create companies, often arising from academic research, and who have previously invested in research from the audio engineering team behind this blog.

Ora aim to release ‘the world’s first graphene headphones.’ Graphene is a form of carbon, shaped in a one-atom-thick lattice of hexagons. In 2004, Andre Geim and Konstantin Novoselov of the University of Manchester isolated the material, analysed its properties, and showed how it could be easily fabricated, for which they won the Nobel Prize in 2010. Andre Geim, by the way, is a colourful character, and the only person to have won both the Nobel and Ig Nobel prizes, the latter awarded for experiments involving levitating frogs.

[Image: Graphene]

Graphene has some amazing properties. It’s 200 times stronger than the strongest steel, efficiently conducts heat and electricity, and is nearly transparent. In 2013, Zhou and Zettl published early results on a graphene-based loudspeaker. In 2014, Dejan Todorovic and colleagues investigated the feasibility of graphene as a microphone membrane, and simulations suggested that it could have high sensitivity (the voltage generated in response to a pressure input) over a wide frequency range, far better than conventional microphones. Later that year, Peter Gaskell and others from McGill University performed physical and acoustical measurements of graphene oxide which confirmed Todorovic’s simulation results. Interestingly, they seemed unaware of Todorovic’s work.

Graphene loudspeaker, courtesy Zettl Research Group, Lawrence Berkeley National Laboratory and University of California at Berkeley

Ora’s founders include some of the graphene microphone researchers from McGill University. Ora’s headphone uses a graphene-based composite material optimized for use in acoustic transducers. One of the many benefits is the very wide frequency range, making it an appealing choice for high resolution audio reproduction.

I should be clear. This blog is not meant as an endorsement of any of the mentioned companies. I haven’t tried their products. They are a sample of what is going on at the frontiers of headphone technology, but by no means cover the full range of exciting developments. Still, one thing is clear. High-end headphones in the near future will sound very different from the typical consumer headphones around today.

Applause, applause! (thank you, thank you. You’re too kind)

“You must be prepared to work always without applause.”
―  Ernest Hemingway, By-line

In a recent blog entry, we discussed research into the sound of screams. It’s one of those everyday sounds that we are particularly attuned to, but that there hasn’t been much research on. This got me thinking about other under-researched sounds. Applause certainly fits. We all know it when we hear it, and a quick search of famous quotes reveals that there are many ways to describe the many types of applause: thunderous applause, tumultuous applause, a smattering of applause, sarcastic applause, and of course, the dreaded slow hand clap. But from an auditory perspective, what makes it special?

Applause is nothing more than the sound of many people gathered in one place clapping their hands. Clapping your hands together is one of the simplest ways in which we can approximate an impulse, or short broadband sound, without the need for any equipment. Impulsive sounds are used for rhythm, for tagging important moments on a timeline, or for estimating the acoustic properties of a room. Clappers and clapsticks are musical instruments, typically consisting of two pieces of wood that are clapped together to produce percussive sounds. In film and television, clapperboards have widespread use. The clapperboard produces a sharp clap noise that can be easily identified on the audio track, and the shutting of the clapstick at the top of the board can similarly be identified on the visual track. Thus, they are effective for synchronising sound and picture, as well as for designating the starts of scenes or takes during production. And in acoustic measurement, if one can produce an impulsive sound at a given location and record the result, one can get an idea of the reverberation that the room will apply to any sound produced from that location.

[Image: clapsticks]
But a hand clap is a crude approximation of an impulse. Hand claps do not have completely flat spectra, are not completely omnidirectional, have significant duration and are not very high energy. Seetharaman and colleagues investigated the effectiveness of hand claps as impulse sources. They found that, with a small amount of additional but automated signal processing, the claps can produce reliable acoustical measurements.
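To give a flavour of what such a measurement involves, here is the textbook Schroeder backward-integration estimate of reverberation time from a clap recorded in the room. This is not Seetharaman’s exact processing, and the file name is a placeholder.

```python
import numpy as np
from scipy.io import wavfile

# Placeholder file: a mono recording of a single hand clap made in the room.
fs, x = wavfile.read('clap_in_room.wav')
x = x.astype(float)
x /= np.max(np.abs(x))

# Schroeder backward integration of the squared signal gives the energy decay curve.
edc = np.cumsum((x ** 2)[::-1])[::-1]
edc_db = 10 * np.log10(edc / edc.max())

# Fit a line to the decay between -5 dB and -25 dB and extrapolate to -60 dB
# (a T20-style estimate of the reverberation time).
t = np.arange(len(x)) / fs
mask = (edc_db <= -5) & (edc_db >= -25)
slope, intercept = np.polyfit(t[mask], edc_db[mask], 1)
print(f'Estimated reverberation time: {-60.0 / slope:.2f} s')
```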
Hanahara, Tada and Muroi exploited the impulse-like nature of hand claps for devising a means of Human-Robot Communication. The hand claps and their timing are relatively easy for a robot to decode, and not that difficult for a human to encode. But why the authors completely dismissed Morse code and all other simple forms of binary encoding is beyond me. And as voice recognition and related technologies continue to advance, the need for hand clap-based communication diminishes.
So what does a single hand clap sound like? This whole field of applause and clapping studies originated with a well-cited 1987 study by Bruno Repp, ‘The sound of two hands clapping.’ He distinguished eight hand clap positions:
Hands parallel and flat
P1: palm-to-palm
P2: halfway between P1 and P3
P3: fingers-to-palm

Hands held at an angle
A1: palm-to-palm
A2: halfway between A1 and A3
A3: fingers-to-palm
A1+: A1 with hands very cupped
A1-: A1 with hands fully flat

The figure below shows photos of these eight configurations of hand claps, excerpted from Leevi Peltola’s 2004 MSc thesis.

[Image: the eight hand clap configurations]

Repp’s acoustic analyses and perceptual experiments mainly involved 20 test subjects who were each asked to clap at their normal rate for 10 seconds in a quiet room. The spectra of individual claps varied widely, but there was no evidence of influence of sex or hand size on the clap spectrum. He also measured his own clapping with the eight modes above. If the palms struck each other (P1, A1) there was a narrow frequency peak below 1 kHz together with a notch around 2.5 kHz. If the fingers of one hand struck the palm of the other hand (P3, A3) there was a broad spectral peak near 2 kHz.
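If you want to try this kind of analysis on your own claps, a rough equivalent in Python looks like the sketch below (the file name is a placeholder, and the exact peak locations will of course differ from Repp’s).

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import welch, find_peaks

# Placeholder file: a recording of a single hand clap.
fs, clap = wavfile.read('single_clap.wav')
clap = clap.astype(float) / np.max(np.abs(clap))

# Estimate the power spectrum and report the dominant peaks, e.g. a peak
# below 1 kHz for palm-to-palm claps (P1, A1) or a broad peak near 2 kHz
# for fingers-to-palm claps (P3, A3), as Repp reported.
freqs, psd = welch(clap, fs=fs, nperseg=1024)
psd_db = 10 * np.log10(psd / psd.max())
peaks, _ = find_peaks(psd_db, height=-20, distance=10)
for p in peaks:
    print(f'Spectral peak near {freqs[p]:.0f} Hz ({psd_db[p]:.1f} dB)')
```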

Repp then tried to determine whether the subjects were able to extract information about the clapper from listening to the signal. Subjects generally assumed that slow, loud and low-pitched hand claps were from male clappers, and fast, soft and high-pitched hand claps were from female clappers. But this was not the case. The speed, intensity and pitch were uncorrelated with sex, and so test subjects could identify the clapper’s sex only slightly better than chance. Perceived differences were attributed mainly to hand configurations rather than hand size.

So much for individuals clapping, but what about applause? That’s when some interesting physics comes into play. Néda and colleagues recorded applause from several theatre and opera performances. They observed that the applause begins as incoherent random clapping, but that synchronization and periodic behaviour develop after a few seconds. This transition can be quite sudden and very strong, and is an unusual example of self-organization in a large coupled system. Néda gives quite a clear explanation of what is happening, and why.
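Néda and colleagues describe the transition using coupled-oscillator models. The toy Kuramoto-style simulation below is not their exact model, but it shows the same qualitative behaviour: the order parameter r, which measures how aligned the clappers’ phases are, grows from near zero towards one as the crowd pulls itself into sync.

```python
import numpy as np

# Toy Kuramoto-style model of clappers synchronising (illustrative only).
rng = np.random.default_rng(0)
n = 200                                          # number of clappers
omega = rng.normal(2 * np.pi * 2.0, 0.5, n)      # natural clapping rates, ~2 claps/s
theta = rng.uniform(0, 2 * np.pi, n)             # initial clap phases
coupling, dt = 1.5, 0.01                         # coupling strength, time step (s)

for step in range(2001):
    mean_field = np.mean(np.exp(1j * theta))
    r, psi = np.abs(mean_field), np.angle(mean_field)
    theta += dt * (omega + coupling * r * np.sin(psi - theta))
    if step % 500 == 0:
        print(f't = {step * dt:4.1f} s, order parameter r = {r:.2f}')
```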

Here’s a nice video of the phenomenon.

The fact that the sonic aspects of hand claps can differ so significantly, and can often be identified by listeners, suggests that it may be possible to tell a lot about the source by signal analysis. Such was the case in work by Jylhä and colleagues, who proposed methods to identify a person by their hand claps, or to identify the configuration (à la Repp’s study) of the hand clap. Christian Uhle looked at the more general question of identifying applause in an audio stream.

Understanding of applause, beyond the synchronization phenomenon observed by Néda, is quite useful for encoding applause signals, which so often accompany musical recordings, especially those recordings that are considered worth redistributing! And the important spatial and temporal aspects of applause signals are known to make them particularly tricky signals to encode and decode. As noted in research by Adami and colleagues, standard perceptual features like pitch or loudness do not do a good job of characterising grainy sound textures like applause. They introduced a new feature, applause density, which is loosely related to the overall clapping rate, but derived from perceptual experiments. Just a month before this blog entry, Adami and co-authors published a follow-up paper which used density and other characteristics to investigate the realism of upmixed (mono to stereo) applause signals. In fact, talking with one of the co-authors was a motivation for me to write this entry.
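Adami et al.’s density feature is derived from perceptual experiments, so the sketch below is only a crude signal-level stand-in: count clap-like onsets per second in a (hypothetical) applause recording.

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import hilbert, find_peaks

# Placeholder file: an applause recording.
fs, y = wavfile.read('applause.wav')
y = y.astype(float)
if y.ndim > 1:
    y = y.mean(axis=1)          # mix down to mono
y /= np.max(np.abs(y))

# Count prominent peaks in the amplitude envelope as a rough onset rate;
# this is NOT the perceptually derived density of Adami et al.
envelope = np.abs(hilbert(y))
onsets, _ = find_peaks(envelope, height=0.3, distance=int(0.02 * fs))
print(f'Approximate clap onset rate: {len(onsets) / (len(y) / fs):.1f} per second')
```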

Upmixing is an important problem in its own right. But the placement and processing of sounds for a stereo or multichannel environment can be considered part of the general problem of sound synthesis. Synthesis of clapping and applause sounds was covered in detail, and to great effect, by Peltola and co-authors. They presented physics-based analysis, synthesis, and control systems capable of both producing individual hand-claps, or mimicking the applause of a group of clappers. The synthesis models were derived from experimental measurements and built both on the work of Repp and of Neda. Researchers here in the Centre for Digital Music’s Audio Engineering research team are trying to build on their work, creating a synthesis system that could incorporate cheering and other aspects of an appreciative crowd. More on that soon, hopefully.
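To get a feel for the synthesis problem, here is a toy applause synthesiser, far simpler than Peltola et al.’s physics-based models: each clap is a decaying burst of band-passed noise, and each simulated clapper claps at a slightly irregular rate. With no coupling between clappers, it only produces the unsynchronised kind of applause.

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import butter, lfilter

# Toy applause synthesiser (illustrative only, not Peltola et al.'s model).
fs, duration, n_clappers = 44100, 5.0, 30
rng = np.random.default_rng(1)
out = np.zeros(int(fs * duration))

def clap(centre_hz, length=0.06):
    """A single clap: exponentially decaying noise, band-passed around centre_hz."""
    n = int(fs * length)
    noise = rng.standard_normal(n) * np.exp(-np.arange(n) / (0.01 * fs))
    b, a = butter(2, [0.5 * centre_hz, 1.5 * centre_hz], btype='band', fs=fs)
    return lfilter(b, a, noise)

for _ in range(n_clappers):
    rate = rng.uniform(2.0, 4.0)                 # claps per second for this clapper
    t = rng.uniform(0, 1.0 / rate)               # random starting offset
    while t < duration - 0.1:
        burst = clap(centre_hz=rng.uniform(800, 2500))
        start = int(t * fs)
        out[start:start + len(burst)] += burst
        t += 1.0 / rate + rng.normal(0, 0.02)    # slightly irregular clapping

wavfile.write('toy_applause.wav', fs, (0.9 * out / np.max(np.abs(out))).astype(np.float32))
```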

“I think that’s just how the world will come to an end: to general applause from wits who believe it’s a joke.”
― Søren Kierkegaard, Either/Or, Part I

And for those who might be interested, here’s a short bibliography of applause and hand-clapping references:

1. Adami, A., Disch, S., Steba, G., & Herre, J. ‘Assessing Applause Density Perception Using Synthesized Layered Applause Signals,’ 19th International Conference on Digital Audio Effects (DAFx-16), Brno, Czech Republic, 2016
2. Adami, A.; Brand, L.; Herre, J., ‘Investigations Towards Plausible Blind Upmixing of Applause Signals,’ 142nd AES Convention, May 2017
3. W. Ahmad, AM Kondoz, Analysis and Synthesis of Hand Clapping Sounds Based on Adaptive Dictionary. ICMC, 2011
4. K. Hanahara, Y. Tada, and T. Muroi, “Human-robot communication by means of hand-clapping (preliminary experiment with hand-clapping language),” IEEE Int. Conf. on Systems, Man and Cybernetics (ISIC-2007), Oct 2007, pp. 2995–3000.
5. Farner, Snorre; Solvang, Audun; Sæbo, Asbjørn; Svensson, U. Peter ‘Ensemble Hand-Clapping Experiments under the Influence of Delay and Various Acoustic Environments’, Journal of the Audio Engineering Society, Volume 57 Issue 12 pp. 1028-1041; December 2009
6. A. Jylhä and C. Erkut, “Inferring the Hand Configuration from Hand Clapping Sounds,” 11th International Conference on Digital Audio Effects (DAFx-08), Espoo, Finland, 2008.
7. Jylhä, Antti; Erkut, Cumhur; Simsekli, Umut; Cemgil, A. Taylan ‘Sonic Handprints: Person Identification with Hand Clapping Sounds by a Model-Based Method’, AES 45th Conference, March 2012
8. Kawahara, Kazuhiko; Kamamoto, Yutaka; Omoto, Akira; Moriya, Takehiro ‘Evaluation of the Low-Delay Coding of Applause and Hand-Clapping Sounds Caused by Music Appreciation’ 138th AES Convention, May 2015.
9. Kawahara, Kazuhiko; Fujimori, Akiho; Kamamoto, Yutaka; Omoto, Akira; Moriya, Takehiro ‘Implementation and Demonstration of Applause and Hand-Clapping Feedback System for Live Viewing,’ 141st AES Convention, September 2016.
10. Laitinen, Mikko-Ville; Kuech, Fabian; Disch, Sascha; Pulkki, Ville ‘Reproducing Applause-Type Signals with Directional Audio Coding,’ Journal of the Audio Engineering Society, Volume 59 Issue 1/2 pp. 29-43; January 2011
11. Z. Néda, E. Ravasz, T. Vicsek, Y. Brechet, and A.-L. Barabási, “Physics of the rhythmic applause,” Phys. Rev. E, vol. 61, no. 6, pp. 6987–6992, 2000.
12. Z. Néda, E. Ravasz, Y. Brechet, T. Vicsek, and A.-L. Barabási, “The sound of many hands clapping: Tumultuous applause can transform itself into waves of synchronized clapping,” Nature, vol. 403, pp. 849–850, 2000.
13. Z. Néda, A. Nikitin, and T. Vicsek, ‘Synchronization of two-mode stochastic oscillators: a new model for rhythmic applause and much more,’ Physica A: Statistical Mechanics and its Applications, 321:238–247, 2003.
14. L. Peltola, C. Erkut, P. R. Cook, and V. Välimäki, “Synthesis of Hand Clapping Sounds,”, IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 3, pp. 1021– 1029, 2007.
15. B. H. Repp. ‘The sound of two hands clapping: an exploratory study,’ J. of the Acoustical Society of America, 81:1100–1109, April 1987.
16. P. Seetharaman, S. P. Tarzia, ‘The Hand Clap as an Impulse Source for Measuring Room Acoustics,’ 132nd AES Convention, April 2012.
17. Uhle, C. ‘Applause Sound Detection’ , Journal of the Audio Engineering Society, Volume 59 Issue 4 pp. 213-224, April 2011

Why can you hear the difference between hot and cold water?

I recently found out about an interesting little experiment where it was shown that people could identify when hot or cold water was being poured from the sound alone. This is a little surprising since we don’t usually think of temperature as having a sound.
Here are two sound samples:

Which one do you think was hot water and which was cold water? Scroll down for the answer..

.
.

.

..
..

.
Keep scrolling
.
.
.
.
.
.
.
.
Yes, the first sound sample was cold water being poured, and the second was hot water.
The work was first done by a London advertising agency, Condiment Junkie, who use sound design in branding and marketing, in collaboration with researchers from the University of Oxford, and they published a research paper on it. The experiment was first described in Condiment Junkie’s blog, and was picked up by NPR and lots of others. There’s even a YouTube video about this phenomenon that has over 600,000 views.
However, there hasn’t really been a good explanation of why we hear the difference. The academic paper did not really discuss this. The YouTube video simply states that the ‘change in the splashing of the water changes the sound that it makes because of various complex fluid dynamic reasons,’ which really doesn’t explain anything. According to one of the founders of Condiment Junkie, “more bubbling in a liquid that’s hot… you tend to get higher frequency sounds from it,” but further discussion on NPR noted “Cold water is more viscous… That’s what makes that high pitched ringing.” Are they both right? There is even a fair amount of discussion of this on physics forums.
But it’s all speculation. Most of the arguments are half-formed and involve a fair amount of handwaving. No one actually analysed the audio.

So I put the two samples above through some analysis using Sonic Visualiser. Spectrograms are very good for this sort of thing because they show you how the frequency content changes over time. But you have to choose how to visualise them carefully, or you’ll easily overlook the interesting stuff.
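For anyone who prefers code to a GUI, roughly the same view can be produced in Python; the file name below is a placeholder for either sample.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile
from scipy.signal import stft

# Placeholder file: one of the two pouring-water recordings.
fs, y = wavfile.read('pouring_water.wav')
y = y.astype(float)
if y.ndim > 1:
    y = y.mean(axis=1)

# Log-frequency spectrogram, similar to the Sonic Visualiser view.
f, t, Z = stft(y, fs=fs, nperseg=4096, noverlap=3072)
mag_db = 20 * np.log10(np.abs(Z) + 1e-12)
plt.pcolormesh(t, f[1:], mag_db[1:], shading='auto')   # drop the DC bin for the log axis
plt.yscale('log')
plt.ylim(100, 5000)
plt.xlabel('Time (s)')
plt.ylabel('Frequency (Hz)')

# Emphasise the dominant frequency in each frame, as a peak-frequency view does.
peak_bins = np.argmax(np.abs(Z), axis=0)
plt.plot(t, f[peak_bins], 'r.', markersize=2)
plt.show()
```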

Here are the spectrograms of the two files, cold water on top, hot water on bottom. Frequency is on a log scale (otherwise all the detail would be crammed at the bottom) and the peak frequencies are heavily emphasised (there’s an awful lot of noise).

[Spectrogram: cold water]

[Spectrogram: hot water]

There’s more analysis than shown here, but the most striking feature is that the same frequencies are present in both signals! There is a strong, dominant frequency that increases linearly from about 650 Hz to just over 1 kHz. And there is a second frequency that appears a little later, starting at around 720 Hz, falling all the way to 250 Hz, then climbing back up again.

These frequencies are pretty much the same in both hot and cold cases. The difference is mainly that cold water has a much stronger second frequency (the one that dips).
So all those people who speculated on why and how hot and cold water sound different seem to have gotten it wrong. If they had actually analysed the audio, they would have seen that the same frequencies are produced, but with different strengths.
My first guess was that the second frequency is due to the size of the water droplets being dependent on the rate of water flow. When more water is flowing, in the middle of the pour, the droplets are large and so produce lower frequencies. Hot water is less viscous (more runny) and so doesn’t separate into these droplets so much.
I was less sure about the first frequency. Maybe this is due to a default droplet size, and only some water droplets have a larger size. But why would this first frequency be linearly increasing? Maybe after water hits the surface, it always separates into small droplets and so this is them splashing back down after initial impact. Perhaps, the more water on the floor, the smaller the droplets splashing back up, giving the increase in this frequency.
But Rod Selfridge, a researcher in the Audio Engineering team here, gave a better possible explanation, which I’ll repeat verbatim here.
The higher frequency line in the spectrogram which linearly increases could be related to the volume of air left in the vessel the liquid is being poured into. As the fluid is poured in the volume of air decreases and the resonant frequency of the remaining ‘chamber’ increases.
The lower line of frequencies could be related to the force of the liquid being added. As the pouring speed increases, increasing the force, the falling liquid pushes further into the reservoir. This means a deeper column of air is trapped and becomes a bubble. The larger the bubble, the lower the resonant frequency. This is the theory of Minnaert and described in the attached paper.
My last thought was that for hot water, especially boiling, there will be steam in the vessel and surrounding the contact area of the pour. Perhaps the steam has an acoustic filtering effect and/or a physical effect on the initial pour or splashes.
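As a rough sanity check on that point, Minnaert’s formula gives the resonant frequency of an air bubble of radius r in water as f = (1/2πr)·√(3γp/ρ). Plugging in round-number constants, bubble radii from a few millimetres to around a centimetre give resonances in roughly the 300 Hz to 1 kHz region, the same ballpark as the frequencies in the spectrograms.

```python
import numpy as np

# Minnaert resonance: f = (1 / (2*pi*r)) * sqrt(3 * gamma * p / rho)
gamma = 1.4        # adiabatic index of air
p = 101325.0       # ambient pressure (Pa)
rho = 998.0        # density of water (kg/m^3)

for r_mm in (1.0, 3.0, 10.0):
    r = r_mm / 1000.0
    f = np.sqrt(3 * gamma * p / rho) / (2 * np.pi * r)
    print(f'bubble radius {r_mm:4.1f} mm -> resonance around {f:5.0f} Hz')
```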
Of course, a more definitive answer would involve a few experiments, pouring differing amounts of water into differing containers. But I think this already demonstrates the need to test theories of what sound will occur against analysis of the actual sounds produced.

Scream!

Audio and informatics researchers are perhaps quite familiar with retrieval systems that try to analyse recordings to identify when an important word or phrase was spoken, or when a song was played. But I once did some collaboration with a company who did laughter and question detection, two audio informatics problems I hadn’t heard of before. I asked them about it. The company was developing audio analytics software to assist Call Centres. Call Centres wanted to keep track of the unusual or problematic calls, and in particular, any laughter when someone is calling tech support would be worth investigating. And I suppose all sorts of unusual sounds should indicate that something about the call is worth noting. Which brings me to the subject of this blog entry.

[Image: scream]

Screams!

Screams occupy an important evolutionary niche, since they are used as a warning and alert signal, and hence are intended to be a sound which we strongly and quickly focus on. A 2015 study by Arnal et al. showed that screams contain a strong modulation component, typically within the 30 to 150 Hz range. This sort of modulation is sometimes called roughness. Arnal showed that roughness occurs in both natural and artificial alarm sounds, and that adding roughness to a sound can make it be perceived as more alarming or fearful.
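Arnal et al. quantified roughness via the modulation power spectrum. The sketch below is a much cruder stand-in: it simply measures how much of the amplitude envelope’s energy falls in the 30 to 150 Hz modulation band of a (hypothetical) scream recording.

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import hilbert, welch

# Placeholder file: a recording of a scream.
fs, y = wavfile.read('scream.wav')
y = y.astype(float)
if y.ndim > 1:
    y = y.mean(axis=1)
y /= np.max(np.abs(y))

# Amplitude envelope, then the spectrum of the envelope: energy between
# 30 and 150 Hz here corresponds to the "roughness" modulation range.
envelope = np.abs(hilbert(y))
f_mod, psd = welch(envelope - envelope.mean(), fs=fs, nperseg=8192)
band = (f_mod >= 30) & (f_mod <= 150)
print(f'Fraction of envelope modulation energy in 30-150 Hz: {psd[band].sum() / psd.sum():.2f}')
```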

This new study suggests that a peculiar set of features may be appropriate for detecting screams. And like most fields of research, if you dig deep enough, you find that quite a few people have already scratched the surface.

I did a quick search of AES and IEEE papers and found ten that had ‘scream’ in the title, not counting those referring to systems or algorithms given the acronym SCREAM. This is actually very few, indicating that the field is underdeveloped. One of them is really about screams and growls in death metal music, which, though interesting in its own right, is quite different. Most of the rest seem to just apply the authors’ favourite machine learning technique to scream data. This is an issue with a lot of papers, deserving of a blog entry in future.

But one of the most detailed analyses of screams was conducted by audio forensics researcher and consultant Durand Begault. In 2008 he published ‘Forensic Analysis of the Audibility of Female Screams’. In it, he notes “the local frequency modulation (‘warble’ or ‘vibrato’)” that was later focused on in Arnal’s paper.

Begault also has some interesting discussion of investigations of scream audibility for a court case. He was asked to determine whether a woman screaming in one location could be heard by potential witnesses in a nearby community. He tested this on site by playing back prerecorded screams at the site of the incident. The test screams were generated by asking female subjects ‘to scream as loudly as possible, as if you had just been surprised by something very scary.’ Thirty screams were recorded, ranging from 102 to 123 decibels. The end result was that these screams could easily be heard more than 100 meters away, even with background noise and obstructions.

This is certainly not the only audio analysis and processing that has found its way into the courtroom. One high profile case arose in February 2012, when neighborhood watch coordinator George Zimmerman shot and killed black teenager Trayvon Martin in Sanford, Florida. In Zimmerman’s trial for second degree murder, experts offered analysis of a scream heard in the background of a 911 phone call that also captured the sound of the gunshot that killed Martin. If the screamer was Zimmerman, it would strengthen the case that he acted in self-defense, but if it was Martin, it would imply that Zimmerman was the aggressor. But FBI audio analysis experts testified in the case about the difficulties in identifying the speaker, or even his age, from the screams, and news outlets also called on experts who noted the lack of robust ‘screamer identification’ technologies.

The issue of scream audibility thus begs the question: how loud is a scream? We know they can be attention-grabbing, ear-piercing shrieks. The loudest scream Begault recorded was 123 dB, and he stated that scream “frequency content seems almost tailored to frequencies of maximal sensitivity on an equal-loudness contour.”

And apparently, one can get a lot louder with a scream than a shout. According to the Guinness Book of World Records, the loudest shout was 121.7 dBA by Annalisa Flanagan, shouting the word ‘Quiet!’. And the loudest scream ever recorded is 129 dB (C-Weighted), by Jill Drake. Not surprisingly, both Jill and Annalisa are teachers, who seem to have found a very effective way to deal with unruly classrooms.

Interestingly, one might have a false conception of the diversity of screaming sounds if one’s understanding is based on films. Consider the Wilhelm Scream, a sound sample that has been used in over 300 films. This overuse perhaps gives a certain familiarity to the listener, and lessens the alarming nature of the sound.

For more on the Wilhelm scream, see the blog entry ‘Swinging microphones and slashing lightsabres’. But here’s a short video on the sound, which includes a few more examples of its use than were given in the previous blog entry.

The Benefits of Browser-Based Listening Tests

Listening tests, or subjective evaluation of audio, are an essential tool in almost any form of audio and music related research, from data compression codecs and loudspeaker design to the realism of sound effects. Sadly, because of the time and effort required to carefully design a test and convince a sufficient number of participants, it is also quite an expensive process.

The advent of web technologies like the Web Audio API, enabling elaborate audio applications within a web page, offers the opportunity to develop browser-based listening tests which mitigate some of the difficulties associated with perceptual evaluation of audio. Researchers at the Centre for Digital Music and Birmingham City University’s Digital Media Technology Lab have developed the Web Audio Evaluation Tool [1] to facilitate listening test design for any experimenter regardless of their programming experience, operating system, test paradigm, interface layout, and location of their test subjects.

APE interface - Web Audio Evaluation Tool

Web Audio Evaluation Tool: An example single-axis, multiple stimuli listening test with comment fields.

Here we cover some of the reasons why you would want to carry out a listening test in the browser, using the Web Audio Evaluation Tool as a case study.

Remote tests

Remote testing is the first and most obvious reason to use a browser-based listening test platform. If you want to conduct a perceptual evaluation study online, i.e. host a website where participants can take the test, then there are no two ways about it: you need a listening test that works within the browser, i.e. one that is based on HTML and JavaScript.

A downloadable application is rarely an elegant solution, and only the most determined participants will end up taking the test, if they can get it to work at all. A website, however, offers very low-threshold participation.

Pros

  • Low effort: A remote test means no booking and setting up of a listening room, showing the participant into the building, …
  • Scales easily: If you can conduct the test once, you can conduct it a virtually unlimited number of times, as long as you can find the participants. Amazon Mechanical Turk or similar services could help with this.
  • Different locations/cultures/languages within reach: For some types of research, it is necessary to include (a high number of) participants with certain geographical locations, cultural backgrounds and/or native languages. When these are scarce nearby, and you cannot find the time or funds to fly around the world, a remote listening test can be helpful.

Cons

  • Limited programming languages: For the implementation of the test, you are basically constrained to web technologies such as JavaScript. For someone used to using e.g. MATLAB or C++, this can be off-putting. This is one of the reasons we aim to offer a tool that doesn’t involve any coding for most use cases.
  • Loss of control: A truly remote test means that you are not present to talk to the participant, answer questions, or notice when they misunderstand the instructions. You also have little information on their playback system (make and model, how it is set up, …) and you often know less about their background.

Depending on the type of test and research, you may or may not want to go ‘remote’ or ‘local’.
However, it has been shown for certain tasks that there is no significant difference between results from local and remote tests [2,3].

Furthermore, a tool like the Web Audio Evaluation Tool has many safeguards to compensate for this loss of control. Examples of these features include:

  • Extensive metrics: Timestamps corresponding with playback and movement events can be automatically visualised to show when participants auditioned which samples and for how long; when they moved which slider from where to where; and so on.
  • Post-test checks: Upon submission, optional dialogs can remind the participant of certain instructions, e.g. to listen to all fragments; to move all sliders at least once; to rate at least one stimulus below 20% or at exactly 100%; …
  • Audiometric test and calibration of the playback system: An optional series of sliders shown at the start of a test, to be set by the participant so that sine waves an octave apart are all equally loud.
  • Survey questions: Most relevant information on the participant’s background and playback system can be captured by well-phrased survey questions, which can be incorporated at the start or end of the test.

Web Audio Evaluation Tool – an example test interface inspired by the popular MUSHRA standard, typical for the evaluation of audio codecs.


Cross-platform, no third-party software needed



Listening test interfaces can be hard to design, with many factors to take into account. On top of that it may not always be possible to use your personal machine for (all) your listening tests, even when all your tests are ‘local’.

When your interface requires a third-party, proprietary tool like MATLAB or Max to be set up, this can pose a problem, as it may not be available where the test is to take place. Furthermore, upgrades to newer versions of this third-party software have been known to ‘break’ listening test software, meaning many more hours of updating and patching.

This is a much bigger problem when the test is to take place at different locations, with different computers and potentially different versions of operating systems or other software.

This has been the single most important driving factor behind the development of the Web Audio Evaluation Tool, even for projects where all tests were controlled, i.e. not hosted on a web server, with ‘internet strangers’ as participants, but in a dedicated listening room with known, skilled participants. Because these listening rooms can have very different computers, operating systems, and geographical locations, using a standalone test or a third party application such as MATLAB is often very tedious or even impossible.

In contrast, a browser-based tool typically works on any machine and operating system that supports the browsers it was designed for. In the case of the Web Audio Evaluation Tool, this means Firefox, Chrome, Edge, Safari… essentially every browser that supports the Web Audio API.


Multiple machines with centralised results collection

[Image: central server. Source: jaxonraye.com]

Another benefit of a browser-based listening test, again regardless of whether your test takes place ‘locally’ or ‘remotely’, is the possibility of easy, centralised collection of results of these tests. Not only is this more elegant than fetching every test result with a USB drive (from any number of computers you are using), but it is also much safer to save the result to your own server straight away. If you are more paranoid (which is encouraged in the case of listening tests), you can then back up this server continually for redundancy.

In the case of the Web Audio Evaluation Tool, you just put the test on a (local or remote) web server, and the results will be stored to this server by default.
Others have put the test on a regular file server (not web server) and run the included Python server emulator script python/pythonServer.py from the test computer. The results are then stored to the file server, which can be your personal machine on the same network.
Intermediate versions of the results are stored as well, so that results are not lost in the event of a computer crash, a human error or a forgotten dentist appointment. The test can be resumed at any point.

Multiple participants using the Web Audio Evaluation Tool at the same time, at Queen Mary University of London


Leveraging other web technologies



Finally, any listening test which is essentially a website can be integrated within other sites or enhanced with any kind of web technology. We have already seen clever use of YouTube videos as instructions, or HTML index pages tracking progression through a series of tests.

The Web Audio Evaluation Tool seeks to facilitate this by providing the optional returnURL attribute, which specifies the page the participant is redirected to upon completion of the test. This page can be anything from a Doodle poll to schedule the next test session, to an Amazon voucher, a reward cat video, or a secret Eventbrite page for a test participant party.


Are there any other benefits to using a browser-based tool for your listening tests? Please let us know!

[1] N. Jillings, B. De Man, D. Moffat and J. D. Reiss, “Web Audio Evaluation Tool: A browser-based listening test environment,” 12th Sound and Music Computing Conference, 2015.

[2] M. Cartwright, B. Pardo, G. Mysore and M. Hoffman, “Fast and Easy Crowdsourced Perceptual Audio Evaluation,” IEEE International Conference on Acoustics, Speech and Signal Processing, 2016.

[3] M. Schoeffler, F.-R. Stöter, H. Bayerlein, B. Edler and J. Herre, “An Experiment about Estimating the Number of Instruments in Polyphonic Music: A Comparison Between Internet and Laboratory Results,” 14th International Society for Music Information Retrieval Conference, 2013.

This post originally appeared in modified form on the Web Audio Evaluation Tool Github wiki

High resolution audio: finally, rigorously put to the test. And the verdict is…

Yes, you can hear a difference! (but it is really hard to measure)

See http://www.aes.org/e-lib/browse.cfm?elib=18296 for the June 2016 Open Access article in the Journal of the Audio Engineering Society on “A meta-analysis of high resolution audio perceptual evaluation”

For years, I’ve been hearing people in the audio engineering community arguing over whether or not it makes any difference to record, mix and play back audio at better than CD quality (44.1 kHz, 16 bit) or better than production quality (48 kHz, 16 bit). Some people swear they can hear a difference, others have stories about someone they met who could always pick out the differences, and others say they’re all just fooling themselves. A few people could mention a study or two that supported their side, but the arguments never seemed to get resolved.

Then, a bit more than a year ago I was at a dinner party where a guy sitting across from me was about to complete his PhD in meta-analysis. Meta-analysis? I’d never heard of it. But the concept, analysing and synthesising the results of many studies to get a more definitive answer and gain more insights and knowledge, really intrigued me. So it was about time that someone tried this on the question of perception of hi-res audio.

Unfortunately, no one I asked was willing to get involved. A couple of experts thought there couldn’t be enough data out there to do the meta-analysis. A couple more thought that the type of studies (not your typical clinical trial with experimental and control groups) couldn’t be analysed using the established statistical approaches in meta-analysis. So, I had to do it myself. This also meant I had to be extra careful, and seek out as much advice as possible, since no one was looking over my shoulder to tell me when I was wrong or stupid.

The process was fascinating. The more I looked, the more studies of high resolution audio perception I uncovered. And my main approach for finding them (start with a few main papers, then look at everyone they cited and everyone who cited them, and repeat with any further interesting papers found) was not mentioned in the guidance on meta-analysis that I read. Then getting the data was interesting. Some researchers had it all prepared in handy, well-labelled spreadsheets, another found it in an old filing cabinet, and one had never kept it at all! And for some data, I had to write little programs to reverse engineer the raw data from t values for trials with finite outcomes.

Formal meta-analysis techniques could be applied, and I gained a strong appreciation for both the maths behind them and the general guidance that helps ensure rigour and avoid bias in the meta-study. But the results, in a few places, disagreed with what is typical. The potential biases in the studies seemed to occur more often in those that did not reject the null hypothesis, i.e., those that found no evidence for discriminating between high resolution and CD quality audio. Evidence of publication bias seemed to mostly go away if one put the studies into subgroups. And the use of binomial probabilities allowed the statistical approaches of meta-analysis to be applied to studies without a control group (‘no effect’ can be determined just from binomial probabilities).
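To illustrate that last point with made-up numbers: in a discrimination task where chance performance is 50% (an ABX trial, say), ‘no effect’ can be tested directly against the binomial distribution, with no control group needed.

```python
from scipy.stats import binomtest

# Invented counts, not from any study in the meta-analysis: a listener gets
# 38 of 60 ABX trials correct, where guessing would give 50% on average.
result = binomtest(38, 60, p=0.5, alternative='greater')
print(f'38/60 correct, one-sided p = {result.pvalue:.3f}')
```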

The end result was that people could, sometimes, perceive the difference between hi-res and CD quality audio. But they needed to be trained, and the test needed to be carefully designed. It was also nice to see that experiments and analysis are generally a little better today than in the past, so research is advancing. Still, most tests had some bias towards false negatives, so perhaps careful experiments, incorporating all the best approaches, may show this perception even more strongly.

Meta-analysis is truly fascinating, and audio engineering, psychoacoustics, music technology and related fields need more of it.