I recently found out about an interesting little experiment where it was shown that people could identify when hot or cold water was being poured from the sound alone. This is a little surprising since we don’t usually think of temperature as having a sound.
Here are two sound samples:
Which one do you think was hot water and which was cold water? Scroll down for the answer.
Yes, the first sound sample was cold water being poured, and the second was hot water.
The work was first done by a London advertising agency, Condiment Junkie, who use sound design in branding and marketing, in collaboration with researchers from the University of Oxford, and they published a research paper on this. The experiment was first described in Condiment Junkie’s blog, and was picked up by NPR and lots of others. There’s even a YouTube video about this phenomenon that has over 600,000 views.
However, there wasn’t really a good explanation as to why we hear the difference. The academic paper did not really discuss this. The YouTube video simply states ‘change in the splashing of the water changes the sound that it makes because of various complex fluid dynamic reasons,’ which really doesn’t explain anything. According to one of the founders of Condiment Junkie, “more bubbling in a liquid that’s hot… you tend to get higher frequency sounds from it,” but further discussion on NPR noted “Cold water is more viscous… That’s what makes that high pitched ringing.” Are they both right? There is even a fair amount of discussion of this on physics forums.
But it’s all speculation. Most of the arguments are half-formed and involve a fair amount of handwaving. No one actually analysed the audio.
So I put the two samples above through some analysis using Sonic Visualiser. Spectrograms are very good for this sort of thing because they show you how the frequency content is changing over time. But you have to be careful because if you don’t choose how to visualise it carefully, you’ll easily overlook the interesting stuff.
Here are the spectrograms of the two files, cold water on top, hot water on bottom. Frequency is on a log scale (otherwise all the detail will be crammed at the bottom) and the peak frequencies are heavily emphasised (there’s an awful lot of noise).
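For readers who want to try this kind of analysis without Sonic Visualiser, here is a minimal sketch in Python using NumPy. It is not the analysis behind the plots described here, just a plain short-time Fourier transform that tracks the strongest frequency in each frame. The function name, frame sizes and the synthetic test tone (a stand-in for a pour, rising from 650 Hz to 1 kHz) are all assumptions made for illustration.

```python
import numpy as np

def peak_frequency_track(samples, sample_rate, frame_length=1024, hop=512):
    """Short-time Fourier analysis: strongest frequency (Hz) in each frame."""
    window = np.hanning(frame_length)
    freqs = np.fft.rfftfreq(frame_length, d=1.0 / sample_rate)
    times, peaks = [], []
    for start in range(0, len(samples) - frame_length + 1, hop):
        frame = samples[start:start + frame_length] * window
        magnitude = np.abs(np.fft.rfft(frame))
        peaks.append(freqs[np.argmax(magnitude)])
        # Report the time at the centre of the frame.
        times.append((start + frame_length / 2) / sample_rate)
    return np.array(times), np.array(peaks)

# Synthetic stand-in for a recording: a tone rising linearly
# from 650 Hz to 1000 Hz over four seconds (hypothetical test data).
sample_rate = 8000
duration = 4.0
t = np.arange(int(sample_rate * duration)) / sample_rate
instantaneous_freq = 650.0 + (1000.0 - 650.0) * t / duration
phase = 2.0 * np.pi * np.cumsum(instantaneous_freq) / sample_rate
test_signal = np.sin(phase)

times, peaks = peak_frequency_track(test_signal, sample_rate)
print(f"first frame peak: {peaks[0]:.0f} Hz, last frame peak: {peaks[-1]:.0f} Hz")
```

On a real recording the single strongest bin will often be noise, which is why the choice of visualisation matters so much; this sketch only shows the basic idea of following frequency content over time.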
There’s more analysis than shown, but the most striking feature is that the same frequencies are present in both signals! There is a strong, dominant frequency that linearly increases from about 650 Hz to just over 1 kilohertz. And there is a second frequency that appears a little later, starting at around 720 Hz, falling all the way to 250 Hz, then climbing back up again.
These frequencies are pretty much the same in both hot and cold cases. The difference is mainly that cold water has a much stronger second frequency (the one that dips).
So all those people who speculated on why and how hot and cold water sound different seem to have gotten it wrong. If they had actually analysed the audio, they would have seen that the same frequencies are produced, but with different strengths.
My first guess was that the second frequency is due to the size of water droplets being dependent on the rate of water flow. When more water is flowing, in the middle of the pour, the droplets are large and so produce lower frequencies. Hot water is less viscous (more runny) and so doesn’t separate into these droplets so much.
I was less sure about the first frequency. Maybe this is due to a default droplet size, and only some water droplets have a larger size. But why would this first frequency be linearly increasing? Maybe after water hits the surface, it always separates into small droplets and so this is them splashing back down after initial impact. Perhaps, the more water on the floor, the smaller the droplets splashing back up, giving the increase in this frequency.
But Rod Selfridge, a researcher in the Audio Engineering team here, gave a better possible explanation, which I’ll repeat verbatim here.
The higher frequency line in the spectrogram which linearly increases could be related to the volume of air left in the vessel the liquid is being poured into. As the fluid is poured in the volume of air decreases and the resonant frequency of the remaining ‘chamber’ increases.
The lower line of frequencies could be related to the force of liquid being added. As the pouring speed increases, increasing the force, the falling liquid pushes further into the reservoir. This means a deeper column of air is trapped and becomes a bubble. The larger the bubble the lower the resonant frequency. This is the theory of Minnaert and described in the attached paper.
My last thought was that for hot water, especially boiling, there will be steam in the vessel and surrounding the contact area of the pour. Perhaps the steam has an acoustic filtering effect and/or a physical effect on the initial pour or splashes.
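The two resonance mechanisms suggested above can be put into rough numbers. The sketch below uses the standard Minnaert formula for an air bubble in water, and, as a simplifying assumption that is not part of Rod's explanation, treats the air left in the vessel as a tube closed at the bottom and open at the top (a quarter-wavelength resonator with no end correction). The constants are typical room-condition values, and the function names are made up for this example.

```python
import math

def minnaert_frequency(radius_m, pressure_pa=101325.0,
                       water_density=998.0, gamma=1.4):
    """Resonant frequency (Hz) of a spherical air bubble in water (Minnaert)."""
    return (math.sqrt(3.0 * gamma * pressure_pa / water_density)
            / (2.0 * math.pi * radius_m))

def air_column_frequency(air_height_m, speed_of_sound=343.0):
    """Lowest resonance (Hz) of an air column closed at one end.

    Assumption: the vessel behaves like a quarter-wavelength tube and the
    end correction at the open top is ignored.
    """
    return speed_of_sound / (4.0 * air_height_m)

# Larger bubbles ring at lower frequencies.
for radius_mm in (2.0, 5.0, 13.0):
    freq = minnaert_frequency(radius_mm / 1000.0)
    print(f"bubble radius {radius_mm:4.1f} mm -> {freq:6.0f} Hz")

# As the vessel fills, the remaining air column shortens and its resonance rises.
for air_height_cm in (13.0, 11.0, 9.0):
    freq = air_column_frequency(air_height_cm / 100.0)
    print(f"air column {air_height_cm:4.1f} cm -> {freq:6.0f} Hz")
```

Under these assumptions, bubbles of around a centimetre in radius land in the few-hundred-hertz region of the dipping line, and an air column of roughly 9 to 13 cm lands in the region of the rising line. That is only a plausibility check, not a confirmation of either mechanism.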
Of course, a more definitive answer would involve a few experiments, pouring differing amounts of water into differing containers. But I think this already demonstrates the need to test the theory of what sound will occur against analysis of the actual sounds produced.
These conventions are quite big, with thousands of attendees, but not so big that you get lost or overwhelmed. The attendees fit loosely into five categories: the companies, the professionals and practitioners, students, enthusiasts, and the researchers. That last category is where we fit.
I thought I’d give you an idea of some of the highlights of the Convention. These are some of the events that we will be involved in or just attending, but of course, there’s plenty else going on.
On Saturday May 20th, 9:30-12:30, Dave Ronan from the team here will be presenting a poster on ‘Analysis of the Subgrouping Practices of Professional Mix Engineers.’ Subgrouping is a greatly understudied, but important part of the mixing process. Dave surveyed 10 award winning mix engineers to find out how and why they do subgrouping. He then subjected the results to detailed thematic analysis to uncover best practices and insights into the topic.
For those willing to get up bright and early Sunday morning, there’s a 9 am panel on ‘Audio Education—What Does the Future Hold,’ where I will be one of the panellists. It should have some pretty lively discussion.
From 10:45 to 12:15, our own Brecht De Man will be chairing and speaking in a Workshop on ‘New Developments in Listening Test Design.’ He’s quite a leader in this field, and has developed some great software that makes the set up, running and analysis of listening tests much simpler and still rigorous.
From 1 to 2 pm, there is the meeting of the Technical Committee on High Resolution Audio, of which I am co-chair along with Vicki Melchior. The Technical Committee aims for comprehensive understanding of high resolution audio technology in all its aspects. The meeting is open to all, so for those at the Convention, feel free to stop by.
Sunday evening at 6:30 is the Heyser lecture. This is quite prestigious, a big talk by one of the eminent people in the field. This one is given by Jorg Sennheiser of, well, Sennheiser Electronic.
Audio and informatics researchers are perhaps quite familiar with retrieval systems that try to analyse recordings to identify when an important word or phrase was spoken, or when a song was played. But I once did some collaboration with a company who did laughter and question detection, two audio informatics problems I hadn’t heard of before. I asked them about it. The company was developing audio analytics software to assist Call Centres. Call Centres wanted to keep track of the unusual or problematic calls, and in particular, any laughter when someone is calling tech support would be worth investigating. And I suppose all sorts of unusual sounds should indicate that something about the call is worth noting. Which brings me to the subject of this blog entry.
Screams occupy an important evolutionary niche, since they are used as a warning and alert signal, and hence are intended to be a sound which we strongly and quickly focus on. A 2015 study by Arnal et al. showed that screams contain a strong modulation component, typically within the 30 to 150 Hz range. This sort of modulation is sometimes called roughness. Arnal showed that roughness occurs in both natural and artificial alarm sounds, and that adding roughness to a sound can make it be perceived as more alarming or fearful.
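To get a feel for what roughness does to a sound, here is a minimal sketch in Python using NumPy. It assumes the simplest possible model, a sinusoidal amplitude modulation at a rate inside the 30 to 150 Hz range, and is not the procedure used in the study. The function name, the 70 Hz rate and the modulation depth are illustrative choices.

```python
import numpy as np

def add_roughness(samples, sample_rate, mod_freq_hz=70.0, depth=0.8):
    """Apply sinusoidal amplitude modulation to a signal.

    mod_freq_hz and depth are illustrative defaults, not values from the study.
    """
    t = np.arange(len(samples)) / sample_rate
    modulator = 1.0 + depth * np.sin(2.0 * np.pi * mod_freq_hz * t)
    # Divide by the modulator's peak so the output never exceeds the input's peak.
    return samples * modulator / (1.0 + depth)

# One second of a plain 1 kHz tone, then the same tone with roughness added.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2.0 * np.pi * 1000.0 * t)
rough_tone = add_roughness(tone, sample_rate)
```

In the spectrum, the modulation shows up as sidebands either side of the original tone, here at 930 Hz and 1070 Hz, which is one way a detector could look for it.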
This new study suggests that a peculiar set of features may be appropriate for detecting screams. And like most fields of research, if you dig deep enough, you find that quite a few people have already scratched the surface.
I did a quick search of AES and IEEE papers and found ten that had ‘scream’ in the title, not counting those referring to systems or algorithms given the acronym SCREAM. This is actually very few, indicating that the field is underdeveloped. One of them is really about screams and growls in death metal music, which though interesting in its own right, is quite different. Most of the rest seem to be just ‘applying my favourite machine learning technique to scream data’. This is an issue with a lot of papers, deserving of a blog entry in future.
But one of the most detailed analyses of screams was conducted by audio forensics researcher and consultant Durand Begault. In 2008 he published ‘Forensic Analysis of the Audibility of Female Screams’. In it, he notes “the local frequency modulation (‘warble’ or ‘vibrato’)” that was later focused on in Arnal’s paper.
Begault also has some interesting discussion of investigations of scream audibility for a court case. He was asked to determine whether a woman screaming in one location could be heard by potential witnesses in a nearby community. He tested this on site by playing back prerecorded screams at the site of the incident. The test screams were generated by asking female subjects ‘to scream as loudly as possible, as if you had just been surprised by something very scary.’ Thirty screams were recorded, ranging from 123 to 102 decibels. The end result was that these screams could easily be heard more than 100 meters away, even with background noise and obstructions.
This is certainly not the only audio analysis and processing that has found its way into the courtroom. One high profile case was in February 2012. Neighborhood watch coordinator George Zimmerman shot and killed black teenager Trayvon Martin in Sanford, Florida. In Zimmerman’s trial for second degree murder, experts offered analysis of a scream heard in the background of a 911 phone call that also captured the sound of the gunshot that killed Martin. If the screamer was Zimmerman, it would strengthen the case that he acted in self-defense, but if it was Martin, it would imply that Zimmerman was the aggressor. But FBI audio analysis experts testified in the case about the difficulties in identifying the speaker, or even his age, from the screams, and news outlets also called on experts who noted the lack of robust ‘screamer identification’ technologies.
The issue of scream audibility thus raises the question, ‘how loud is a scream?’ We know they can be attention-grabbing, ear-piercing shrieks. The loudest scream Begault recorded was 123 dB, and he stated that scream “frequency content seems almost tailored to frequencies of maximal sensitivity on an equal-loudness contour.”
And apparently, one can get a lot louder with a scream than a shout. According to the Guinness Book of World Records, the loudest shout was 121.7 dBA by Annalisa Flanagan, shouting the word ‘Quiet!’. And the loudest scream ever recorded is 129 dB (C-Weighted), by Jill Drake. Not surprisingly, both Jill and Annalisa are teachers, who seem to have found a very effective way to deal with unruly classrooms.
Interestingly, one might have a false conception of the diversity of screaming sounds if one’s understanding is based on films. The Wilhelm Scream is a sound sample that has been used in over 300 films. This overuse perhaps gives a certain familiarity to the listener, and lessens the alarming nature of the sound.
I was recently pointed to a blog post about doing a PhD. It had lots of interesting advice, mainly along the lines of ‘if you are finding it difficult, don’t worry, that probably means you’re doing it right.’ True, and good advice to keep in mind for PhD researchers who might be feeling lost in the wilderness. But it reminded me that I’d recently given a talk about PhD research, based on experience I have either examining or supervising dozens of theses, and some of the main points that I made are worth sharing. And I think they are applicable to research-based PhDs across lots of different disciplines.
First off, let’s think of a few things that a PhD thesis is not supposed to be:
A thesis isn’t easy
See the blog I mentioned above. Easy research may still be publishable, but it’s not going to make a thesis. If you’re finding it easy, you’re probably missing the point.
A thesis is not only what you already know
I’ve known researchers unwilling to learn a bit of new maths, or learn what’s going on under the hood in the software they use. Expect the research to lead you out of your comfort zone.
A thesis isn’t just something you do to get a PhD
It’s not simply a box that needs to be checked off so that you can get ‘Doctor’ next to your name.
A thesis isn’t obvious
If you and most others can predict the outcome in advance based on common sense, then why do it?
A thesis isn’t just several years of hard work
It may take years of hard work to achieve, but that’s not the point. You don’t get a PhD just for time and effort.
A thesis isn’t about building a system
That’s challenging and technical, and may be a byproduct of the research, but it’s not the research result.
A thesis isn’t a lot of little achievements
I’ve seen theses that read a bit like ‘I did this little interesting thing, then this other one, then another one…’ That doesn’t look good. If no one contribution is strong enough to be a thesis, then just putting them all into one document still isn’t a strong contribution. Note that in some cases, you can do a ‘thesis by publication’, which is a collection of papers, usually with an introduction and some wrapper information. But in that case it should still tie together with an overall contribution.
So with that in mind, let’s now think about what a thesis is, with a few highlighted aspects that are often neglected.
A thesis advances knowledge
That’s the key. Some new understanding, new insights, backed up by evidence and critical thinking. But this also suggests that it needs to actually be an advance, so you really need to know the prior art. How much reading of the literature one should do is a different question, and depends on the topic, the field, and the researcher. But in my experience, researchers generally don’t explore the literature deeply enough. One thing is for sure though: if the researcher ever makes the claim that no one has done this before, they had better have the evidence to back that up.
A thesis is an argument
The word ‘thesis’ comes from Greek, and means an argument in the sense of putting forth a position. That means that there needs to be some element of controversy in the topic, and the thesis provides strong evidence supporting a particular position. That is, someone knowledgeable in the field could read the abstract and think, ‘no, I don’t believe it,’ but then change his or her mind after reading the whole thesis.
A thesis tells a story
People tend to forget that it’s a book. It’s meant to be read, and in some sense, enjoyed. So the researcher should think about the reader. I don’t mean it should be silly or salacious, but it should be engaging, and the researcher should always consider whether they (or at least some people in the field) would want to read what they’d written.
Sonic weapons frequently occur in science fiction and fantasy. I remember reading the Tintin book The Calculus Affair, where Professor Calculus invents ultrasonic devices which break glass objects around the house. But the bad guys from Borduria want to make them large scale and long range devices, capable of mass destruction.
As with many fantastic fiction ideas, sonic weapons have a firm basis in fact. But one of the first planned uses for sonic devices in war was as a defense system, not a weapon.
Between about 1916 and 1936, acoustic mirrors were built and tested around the coast of England. The idea was that they could reflect, and in some cases focus, the sound of incoming enemy aircraft. Microphones could be placed at the foci of the reflectors, giving listeners a means of early detection. The mirrors were usually parabolic or spherical in shape, and for the spherical designs, the microphone could be moved as a means of identifying the direction of arrival.
It was a good idea at first, but air speed of bombers and fighters improved so much over that time period that it would only give a few minutes extra warning. And then the technology became completely obsolete with the invention of radar, though that also meant that the effort into planning a network of detectors along the coast was not wasted.
The British weren’t the only ones attempting to use sound for aircraft detection between the world wars. The Japanese had mobile acoustic locators known as ‘war tubas,’ the Dutch had personal horns and personal parabolas, the Czechs used a four-horn acoustic locator to detect height as well as horizontal direction, and the French physicist Jean-Baptiste Perrin designed the télésitemètre, which in a field full of unusual designs, still managed to distinguish itself by having 36 small hexagonal horns. Perrin, though, is better known for his Nobel prize winning work on Brownian motion that finally confirmed the atomic theory of matter. Other well-known contributors to the field include the Austrian born ethnomusicologist Erich Moritz von Hornbostel and renowned psychologist Max Wertheimer. Together, they developed the sound directional locator known as the Wertbostel, which was believed to have been commercialised during the 30s.
There are wonderful photos of these devices, most of which can be found here, but I couldn’t resist including at least a couple:
a German acoustic & optical locating apparatus, and a Japanese war tuba.
But these acoustic mirrors and related systems were all intended for defense. During World War II, German scientists worked on sonic weapons under the supervision of Albert Speer. They developed an acoustic cannon that was intended to send a deafening, focused beam of sound, magnified by parabolic reflector dishes. Research was discontinued, however, since initial efforts were not successful and the weapon was unlikely to be effective in practical situations.
Devices capable of producing especially loud sounds, often focused in a given direction or over a particular frequency range, have found quite a few uses as weapons of some kind. A long-range acoustic device was used to deter pirates who attempted to attack a cruise ship, for instance, and sonic devices emitting high frequencies that might be heard by teenagers but are unlikely to be heard by adults have been deployed in city centres to prevent youth from congregating. Such stories make for interesting reading, but it’s hard to say how effective these devices actually are.
And there are even sonic weapons occurring in nature.
The snapping shrimp has a claw which shoots a jet of water, which in turn generates a cavitation bubble. The bubble bursts with a snap reaching around 190 decibels. It’s loud enough to kill or stun small sea creatures, which then become its prey.
Synthesising the Aeolian harp is part of a project into synthesising sounds that fall into a class called aeroacoustics. The synthesis model operates in real-time and is based on the physics that generate the sounds in nature.
The Aeolian harp is an instrument that is played by the wind. It is believed to date back to ancient Greece; legend states that King David hung a harp in the tree to hear it being played by the wind. Aeolian harps became popular in Europe in the Romantic period, and can be designed as garden ornaments, parts of sculptures or large scale sound installations.
The sound created by the Aeolian harp has often been described as meditative and inspiring. A poem by Ralph Waldo Emerson describes it as follows:
Keep your lips or finger-tips
For flute or spinet’s dancing chips;
I await a tenderer touch
I ask more or not so much:
Give me to the atmosphere.
The harp in the picture is taken from Professor Henry Gurr’s website. This has an excellent review of the principles behind design and operation of Aeolian harps.
As air flows past a cylinder, vortices are shed at a frequency that is proportional to the speed of the air and inversely proportional to the cylinder diameter. This has been discussed in the previous blog entry on Aeolian tones. We now think of the cylinder as a string, like that of a harp, guitar, violin, etc. When a string of one of these instruments is plucked it vibrates at its natural frequency. The natural frequency is determined by the tension, length and mass of the string.
Instead of a pluck or a bow exciting a string, in an Aeolian harp it is the vortex shedding that stimulates the strings. When the frequency of the vortex shedding is in the region of the natural vibration frequency of the string, or one of its harmonics, a phenomenon known as lock-in occurs. While in lock-in the string starts to vibrate at the relevant harmonic frequency. For a range of airspeeds the string vibration is the dominant factor that dictates the frequency of the vortex shedding; changing the air speed does not change the frequency of vortex shedding, hence the process is locked-in.
While in lock-in an FM-type acoustic output is generated, giving the harp its unique sound, described by the poet Samuel Taylor Coleridge as a “soft floating witchery of sound”.
As with the Aeolian tone model we calculate the frequency of vortex shedding for given string dimensions and airspeed. We also calculate the fundamental natural vibrational frequency and harmonics of a string given its properties.
There is a specific range of airspeeds that leads to string vibration and vortex shedding locking in. This range is calculated and the specific frequencies for the FM acoustic signal are generated. There is a hysteresis effect on the vibration amplitude based on the increase and decrease of the airspeed, which is also implemented.
A user interface is provided that allows a user to select up to 13 strings, adjusting their length, diameter, tension, mass and the amount of damping (which reduces the vibration effects as the harmonic number increases). The interface, shown below, includes presets of a number of different string and wind configurations.
A copy of the Pure Data patch can be downloaded here. The video below was made to give an overview of the principles, sounds generated and variety of Aeolian harp constructions.
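The calculations described above can be sketched in a few lines of Python. This is not the Pure Data implementation, only an illustration under stated assumptions: a Strouhal number of 0.2 (a typical textbook value for a circular cylinder), the ideal-string formula for the natural frequencies, and a made-up rule that lock-in happens when the shedding frequency is within a small fraction of a harmonic. The hysteresis and the FM synthesis are left out, and all names and parameter values are hypothetical.

```python
import math

STROUHAL_NUMBER = 0.2  # assumed typical value for a circular cylinder

def shedding_frequency(airspeed, diameter):
    """Vortex shedding frequency (Hz) for air moving at airspeed (m/s)
    past a cylinder of the given diameter (m)."""
    return STROUHAL_NUMBER * airspeed / diameter

def string_harmonics(tension, length, mass, count=10):
    """Natural frequencies (Hz) of an ideal string fixed at both ends.

    tension in newtons, length in metres, mass in kilograms (whole string).
    """
    fundamental = math.sqrt(tension * length / mass) / (2.0 * length)
    return [n * fundamental for n in range(1, count + 1)]

def locked_in_harmonic(airspeed, diameter, harmonics, tolerance=0.03):
    """Return the harmonic the shedding locks onto, or None.

    Assumption: lock-in occurs when the shedding frequency lies within
    `tolerance` (as a fraction) of the nearest harmonic. The real range
    is not specified here, so this rule is purely illustrative.
    """
    f_shed = shedding_frequency(airspeed, diameter)
    nearest = min(harmonics, key=lambda f: abs(f - f_shed))
    if abs(nearest - f_shed) <= tolerance * nearest:
        return nearest
    return None

# Hypothetical string: 1 m long, 1 mm diameter, 50 N tension, 5 g mass.
harmonics = string_harmonics(tension=50.0, length=1.0, mass=0.005)
for airspeed in (2.0, 2.05, 2.125):
    print(airspeed, shedding_frequency(airspeed, 0.001),
          locked_in_harmonic(airspeed, 0.001, harmonics))
```

During lock-in the output frequency is taken from the string harmonic rather than from the airspeed, which is the behaviour described above: small changes in wind speed leave the shedding frequency unchanged.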
Listening tests, or subjective evaluation of audio, are an essential tool in almost any form of audio and music related research, from data compression codecs and loudspeaker design to the realism of sound effects. Sadly, because of the time and effort required to carefully design a test and recruit a sufficient number of participants, it is also quite an expensive process.
The advent of web technologies like the Web Audio API, enabling elaborate audio applications within a web page, offers the opportunity to develop browser-based listening tests which mitigate some of the difficulties associated with perceptual evaluation of audio. Researchers at the Centre for Digital Music and Birmingham City University’s Digital Media Technology Lab have developed the Web Audio Evaluation Tool to facilitate listening test design for any experimenter regardless of their programming experience, operating system, test paradigm, interface layout, and location of their test subjects.
Here we cover some of the reasons why you would want to carry out a listening test in the browser, using the Web Audio Evaluation Tool as a case study.
A downloadable application is rarely an elegant solution, and only the most determined participants will end up taking the test if they can get it to work. A website, however, offers a very low threshold for participation.
Low effort: A remote test means no booking and setting up of a listening room, showing the participant into the building, …
Scales easily: If you can conduct the test once, you can conduct it a virtually unlimited number of times, as long as you find the participants. Amazon Mechanical Turk or similar services could be helpful with this.
Different locations/cultures/languages within reach: For some types of research, it is necessary to include (a high number of) participants with certain geographical locations, cultural backgrounds and/or native languages. When these are scarce nearby, and you cannot find the time or funds to fly around the world, a remote listening test can be helpful.
Loss of control: A truly remote test means that you are not present to talk to the participant and answer questions, or to notice that they have misunderstood the instructions. You also have little information on their playback system (make and model, how it is set up, …) and you often know less about their background.
Depending on the type of test and research, you may prefer to go ‘remote’ or to stay ‘local’.
However, it has been shown for certain tasks that there is no significant difference between results from local and remote tests [2,3].
Furthermore, a tool like the Web Audio Evaluation Tool has many safeguards to compensate for this loss of control. Examples of these features include:
Extensive metrics: Timestamps corresponding with playback and movement events can be automatically visualised to show when participants auditioned which samples and for how long; when they moved which slider from where to where; and so on.
Post-test checks: Upon submission, optional dialogs can remind the participant of certain instructions, e.g. to listen to all fragments; to move all sliders at least once; to rate at least one stimulus below 20% or at exactly 100%; …
Audiometric test and calibration of the playback system: An optional series of sliders can be shown at the start of a test, to be set by the participant so that sine waves an octave apart are all equally loud.
Survey questions: Most relevant information on the participant’s background and playback system can be captured by well-phrased survey questions, which can be incorporated at the start or end of the test.
Listening test interfaces can be hard to design, with many factors to take into account. On top of that it may not always be possible to use your personal machine for (all) your listening tests, even when all your tests are ‘local’.
When your interface requires a third party, proprietary tool like MATLAB or Max to be set up, this can pose a problem as this may not be available where the test is to take place. Furthermore, upgrades to newer versions of this third party software have been known to ‘break’ listening test software, meaning many more hours of updating and patching.
This is a much bigger problem when the test is to take place at different locations, with different computers and potentially different versions of operating systems or other software.
This has been the single most important driving factor behind the development of the Web Audio Evaluation Tool, even for projects where all tests were controlled, i.e. not hosted on a web server, with ‘internet strangers’ as participants, but in a dedicated listening room with known, skilled participants. Because these listening rooms can have very different computers, operating systems, and geographical locations, using a standalone test or a third party application such as MATLAB is often very tedious or even impossible.
In contrast, a browser-based tool typically works on any machine and operating system that supports the browsers it was designed for. In the case of the Web Audio Evaluation Tool, this means Firefox, Chrome, Edge, Safari, … essentially every browser which supports the Web Audio API.
Multiple machines with centralised results collection
Another benefit of a browser-based listening test, again regardless of whether your test takes place ‘locally’ or ‘remotely’, is the possibility of easy, centralised collection of results of these tests. Not only is this more elegant than fetching every test result with a USB drive (from any number of computers you are using), but it is also much safer to save the result to your own server straight away. If you are more paranoid (which is encouraged in the case of listening tests), you can then back up this server continually for redundancy.
In the case of the Web Audio Evaluation Tool, you just put the test on a (local or remote) web server, and the results will be stored to this server by default.
Others have put the test on a regular file server (not web server) and run the included Python server emulator script python/pythonServer.py from the test computer. The results are then stored to the file server, which can be your personal machine on the same network.
Intermediate versions of the results are stored as well, so that the results are not lost in the event of a computer crash, a human error or a forgotten dentist appointment. The test can be resumed at any point.
Finally, any listening test which is essentially a website can be integrated within other sites or enhanced with any kind of web technologies. We have already seen clever use of YouTube videos as instructions or HTML index pages tracking progression through a series of tests.
The Web Audio Evaluation Tool seeks to facilitate this by providing the optional returnURL attribute, which specifies the page the participant is redirected to upon completion of the test. This page can be anything from a Doodle to schedule the next test session, an Amazon voucher, a reward cat video, to a secret Eventbrite page for a test participant party.
Are there any other benefits to using a browser-based tool for your listening tests? Please let us know!