The post Behind the spectacular sound of ‘Dunkirk’ – with Richard King appeared first on A Sound Effect. It’s an interesting interview giving deep insights into sound design and soundscape creation for film. It caught my attention first because of the mention of Richard King. But it’s not Richard King, the Grammy award winning professor in sound recording at McGill University. It’s the other one, the Oscar award winning supervising sound editor at Warner Brothers Sound.
We collaborated with Prof. Richard King on a couple of papers. In the first (ISMIR 2014), we conducted an experiment where eight songs were each mixed by eight different engineers. We analysed audio features from the multitracks and mixes. This allowed us to test various assumed rules of mixing practice. In the follow-up (AES Convention, 2015), the mixes were all rated by experienced test subjects. We used the ratings to investigate relationships between perceived mix quality and sonic features of the mixes.
B. De Man, M. Boerum, B. Leonard, R. King, G. Massenburg and J. D. Reiss, ‘Perceptual Evaluation of Music Mixing Practices,’ 138th Audio Engineering Society (AES) Convention, May 2015.
B. De Man, B. Leonard, R. King and J. D. Reiss, ‘An analysis and evaluation of audio features for multitrack music mixtures,’ 15th Int. Society for Music Information Retrieval Conference (ISMIR-14), Taipei, Taiwan, Oct. 2014.
I recently found out about an interesting little experiment where it was shown that people could identify when hot or cold water was being poured from the sound alone. This is a little surprising since we don’t usually think of temperature as having a sound.
Here are two sound samples:
Which one do you think was hot water and which was cold water? Scroll down for the answer.
The work was first done by a London advertising agency, Condiment Junkie, who use sound design in branding and marketing, in collaboration with researchers from University of Oxford, and they published a research paper on this. The experiment is first described in Condiment Junkie’s blog, and was picked up by NPR and lots of others. There’s even a YouTube video about this phenomenon that has over 600,000 views.
But it’s all speculation. Most of the arguments are half-formed and involve a fair amount of handwaving. No one actually analysed the audio.
So I put the two samples above through some analysis using Sonic Visualiser. Spectrograms are very good for this sort of thing because they show you how the frequency content is changing over time. But you have to be careful because if you don’t choose how to visualise it carefully, you’ll easily overlook the interesting stuff.
Here are the spectrograms of the two files, cold water on top, hot water on bottom. Frequency is on a log scale (otherwise all the detail will be crammed at the bottom) and the peak frequencies are heavily emphasised (there’s an awful lot of noise).
There’s more analysis than shown, but the most striking feature is that the same frequencies are present in both signals! There is a strong, dominant frequency that linearly increases from about 650 Hz to just over 1 kilohertz. And there is a second frequency that appears a little later, starting at around 720 Hz, falling all the way to 250 Hz, then climbing back up again.
The higher frequency line in the spectrogram which linearly increases could be related to the volume of air left in the vessel the liquid is being poured into. As the fluid is poured in, the volume of air decreases and the resonant frequency of the remaining ‘chamber’ increases.
The lower line of frequencies could be related to the force of liquid being added. As the pouring speed increases, increasing the force, the falling liquid pushes further into the reservoir. This means a deeper column of air is trapped and becomes a bubble. The larger the bubble, the lower the resonant frequency. This is the theory of Minnaert and is described in the attached paper.
My last thought was that for hot water, especially boiling, there will be steam in the vessel and surrounding the contact area of the pour. Perhaps the steam has an acoustic filtering effect and/or a physical effect on the initial pour or splashes.
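To make the bubble argument above concrete, here is a minimal Python sketch of the standard Minnaert resonance formula for a spherical air bubble in water. The default pressure, density and heat-capacity ratio are textbook room-condition values chosen for illustration, not measurements from these recordings, and the function name is hypothetical.

```python
import math

def minnaert_frequency(radius_m, pressure_pa=101_325.0,
                       density_kg_m3=998.0, gamma=1.4):
    """Resonant frequency (Hz) of a spherical air bubble in water.

    Minnaert: f = sqrt(3 * gamma * p0 / rho) / (2 * pi * a).
    The defaults are assumed room-condition values.
    """
    return math.sqrt(3.0 * gamma * pressure_pa / density_kg_m3) / (2.0 * math.pi * radius_m)

# Larger bubbles resonate at lower frequencies.
for radius_mm in (2.0, 5.0, 10.0):
    frequency = minnaert_frequency(radius_mm / 1000.0)
    print(f"bubble radius {radius_mm:4.1f} mm -> about {frequency:.0f} Hz")
```

Under these assumptions, bubbles with radii of roughly 5 to 13 mm fall in the 250 to 700 Hz region where the lower spectrogram line sits, which is at least consistent with the bubble explanation.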
Audio and informatics researchers are perhaps quite familiar with retrieval systems that try to analyse recordings to identify when an important word or phrase was spoken, or when a song was played. But I once did some collaboration with a company who did laughter and question detection, two audio informatics problems I hadn’t heard of before. I asked them about it. The company was developing audio analytics software to assist Call Centres. Call Centres wanted to keep track of the unusual or problematic calls, and in particular, any laughter when someone is calling tech support would be worth investigating. And I suppose all sorts of unusual sounds should indicate that something about the call is worth noting. Which brings me to the subject of this blog entry.
Screams occupy an important evolutionary niche, since they are used as a warning and alert signal, and hence are intended to be a sound which we strongly and quickly focus on. A 2015 study by Arnal et al. showed that screams contain a strong modulation component, typically within the 30 to 150 Hz range. This sort of modulation is sometimes called roughness. Arnal showed that roughness occurs in both natural and artificial alarm sounds, and that adding roughness to a sound can make it seem more alarming or fearful.
This new study suggests that a peculiar set of features may be appropriate for detecting screams. And like most fields of research, if you dig deep enough, you find that quite a few people have already scratched the surface.
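As a rough illustration of what ‘adding roughness’ could look like in practice, the sketch below amplitude-modulates a plain tone at a rate inside the 30 to 150 Hz range. This is only one simple way of introducing such modulation and is not claimed to be the procedure used by Arnal et al.; the function name, the 70 Hz rate and the 1 kHz test tone are all illustrative assumptions.

```python
import math

def add_roughness(samples, sample_rate, mod_hz=70.0, depth=1.0):
    """Amplitude-modulate a signal at mod_hz (assumed rate); depth lies in [0, 1]."""
    out = []
    for n, x in enumerate(samples):
        envelope = 1.0 + depth * math.sin(2.0 * math.pi * mod_hz * n / sample_rate)
        # Divide by (1 + depth) so the output never exceeds the input peak.
        out.append(x * envelope / (1.0 + depth))
    return out

sample_rate = 16_000
# One second of a 1 kHz sine as a stand-in for a smooth, non-alarming sound.
tone = [math.sin(2.0 * math.pi * 1000.0 * n / sample_rate) for n in range(sample_rate)]
rough_tone = add_roughness(tone, sample_rate, mod_hz=70.0)
```

Writing `rough_tone` to a WAV file and comparing it by ear with `tone` gives a quick feel for the effect.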
I did a quick search of AES and IEEE papers and found ten that had ‘scream’ in the title, not counting those referring to systems or algorithms given the acronym SCREAM. This is actually very few, indicating that the field is underdeveloped. One of them is really about screams and growls in death metal music, which though interesting in its own right, is quite different. Most of the rest seem to be mostly just ‘applying my favourite machine learning technique to scream data’. This is an issue with a lot of papers, deserving of a blog entry in future.
But one of the most detailed analyses of screams was conducted by audio forensics researcher and consultant Durand Begault. In 2008 he published ‘Forensic Analysis of the Audibility of Female Screams’. In it, he notes “the local frequency modulation (‘warble’ or ‘vibrato’)” that was later focused on in Arnal’s paper.
Begault also has some interesting discussion of investigations of scream audibility for a court case. He was asked to determine whether a woman screaming in one location could be heard by potential witnesses in a nearby community. He tested this on site by playing back prerecorded screams at the site of the incident. The test screams were generated by asking female subjects ‘to scream as loudly as possible, as if you had just been surprised by something very scary.’ Thirty screams were recorded, ranging from 102 to 123 decibels. The end result was that these screams could easily be heard more than 100 meters away, even with background noise and obstructions.
This is certainly not the only audio analysis and processing that has found its way into the courtroom. One high profile case was in February 2012. Neighborhood watch coordinator George Zimmerman shot and killed black teenager Trayvon Martin in Sanford, Florida. In Zimmerman’s trial for second degree murder, experts offered analysis of a scream heard in the background of a 911 phone call that also captured the sound of the gunshot that killed Martin. If the screamer was Zimmerman, it would strengthen the case that he acted in self-defense, but if it was Martin, it would imply that Zimmerman was the aggressor. But FBI audio analysis experts testified in the case about the difficulties in identifying the speaker, or even his age, from the screams, and news outlets also called on experts who noted the lack of robust ‘screamer identification’ technologies.
The issue of scream audibility thus raises the question, ‘how loud is a scream?’ We know they can be attention-grabbing, ear-piercing shrieks. The loudest scream Begault recorded was 123 dB, and he stated that scream “frequency content seems almost tailored to frequencies of maximal sensitivity on an equal-loudness contour.”
And apparently, one can get a lot louder with a scream than a shout. According to the Guinness Book of World Records, the loudest shout was 121.7 dBA by Annalisa Flanagan, shouting the word ‘Quiet!’. And the loudest scream ever recorded is 129 dB (C-Weighted), by Jill Drake. Not surprisingly, both Jill and Annalisa are teachers, who seem to have found a very effective way to deal with unruly classrooms.
Interestingly, one might have a false conception of the diversity of screaming sounds if one’s understanding is based on films. The Wilhelm Scream is a sound sample that has been used in over 300 films. This overuse perhaps gives a certain familiarity to the listener, and lessens the alarming nature of the sound.
For more on the Wilhelm scream, see the blog entry ‘Swinging microphones and slashing lightsabres’. But here’s a short video on the sound, which includes a few more examples of its use than were given in the previous blog entry.
Synthesising the Aeolian harp is part of a project into synthesising sounds that fall into a class called aeroacoustics. The synthesis model operates in real-time and is based on the physics that generate the sounds in nature.
The Aeolian harp is an instrument that is played by the wind. It is believed to date back to ancient Greece; legend states that King David hung a harp in the tree to hear it being played by the wind. Aeolian harps became popular in Europe in the Romantic period, and they can be designed as garden ornaments, parts of sculptures or large scale sound installations.
As air flows past a cylinder, vortices are shed at a frequency that is proportional to the speed of the air and inversely proportional to the cylinder diameter. This has been discussed in the previous blog entry on Aeolian tones. We now think of the cylinder as a string, like that of a harp, guitar, violin, etc. When a string of one of these instruments is plucked, it vibrates at its natural frequency. The natural frequency is determined by the tension, length and mass of the string.
Instead of a pluck or a bow exciting a string, in an Aeolian harp it is the vortex shedding that stimulates the strings. When the frequency of the vortex shedding is in the region of the natural vibration frequency of the string, or one of its harmonics, a phenomenon known as lock-in occurs. While in lock-in the string starts to vibrate at the relevant harmonic frequency. For a range of airspeeds the string vibration is the dominant factor that dictates the frequency of the vortex shedding; changing the air speed does not change the frequency of vortex shedding, hence the process is locked-in.
As with the Aeolian tone model, we calculate the frequency of vortex shedding for given string dimensions and airspeed. We also calculate the fundamental natural vibrational frequency and harmonics of a string given its properties.
There is a specific range of airspeeds that leads to string vibration and vortex shedding locking in. This range is calculated and the specific frequencies for the FM acoustic signal are generated. There is a hysteresis effect on the vibration amplitude based on the increase and decrease of the airspeed, which is also implemented.
A user interface is provided that allows a user to select up to 13 strings, adjusting their length, diameter, tension, mass and the amount of damping (which reduces the vibration effects as the harmonic number increases). This interface is shown below and includes presets of a number of different string and wind configurations.
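The two calculations just described can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the project's implementation: the string is treated as ideal, the mass is given per unit length, the Strouhal number is fixed at the commonly quoted value of about 0.2, and all function names and example values are hypothetical.

```python
import math

def string_harmonics(length_m, tension_n, mass_per_length_kg_m, count=5):
    """Natural frequencies (Hz) of an ideal string: f_n = n / (2L) * sqrt(T / mu)."""
    fundamental = math.sqrt(tension_n / mass_per_length_kg_m) / (2.0 * length_m)
    return [n * fundamental for n in range(1, count + 1)]

def shedding_frequency(airspeed_m_s, diameter_m, strouhal=0.2):
    """Vortex shedding frequency (Hz): f = St * U / d, with St assumed constant."""
    return strouhal * airspeed_m_s / diameter_m

def nearest_harmonic(shedding_hz, harmonics):
    """Return (harmonic number, frequency) of the harmonic closest to shedding_hz."""
    index = min(range(len(harmonics)), key=lambda i: abs(harmonics[i] - shedding_hz))
    return index + 1, harmonics[index]

# Example values (assumed): a 1 m string, 1 mm thick, in a 2 m/s breeze.
harmonics = string_harmonics(length_m=1.0, tension_n=40.0, mass_per_length_kg_m=0.001)
shedding = shedding_frequency(airspeed_m_s=2.0, diameter_m=0.001)
number, frequency = nearest_harmonic(shedding, harmonics)
print(f"shedding at {shedding:.0f} Hz is closest to harmonic {number} ({frequency:.0f} Hz)")
```

The harmonic nearest to the shedding frequency is the candidate for lock-in; the width of the lock-in range and the hysteresis are not modelled here.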
Sound designers for film and games often use creative methods to generate the appropriate sound from existing sources, rather than through signal processing techniques designed to synthesise or process audio. One well-known technique for generating the Doppler effect is to swing a microphone back and forth in front of a sound source. This was used in the original Star Wars to generate the original lightsaber sound. As described by Ben Burtt, the sound designer:
“… once we had established this tone of the lightsaber of course you had to get the sense of the lightsaber moving because characters would carry it around, they would whip it through the air, they would thrust and slash at each other in fights, and to achieve this additional sense of movement I played the sound over a speaker in a room.
Just the humming sound, the humming and the buzzing combined as an endless sound, and then I took another microphone and waved it in the air next to that speaker so that it would come close to the speaker and go away and you could whip it by. And what happens when you do that by recording with a moving microphone is you get a Doppler’s shift, you get a pitch shift in the sound and therefore you can produce a very authentic facsimile of a moving sound. And therefore give the lightsaber a sense of movement…”
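The pitch shift Burtt describes follows the standard Doppler relation for a moving receiver and a stationary source. The sketch below is only an illustration of that relation: the 100 Hz hum and the 10 m/s microphone speed are made-up example values, not figures from the Star Wars recordings, and the function name is hypothetical.

```python
import math

def doppler_moving_mic(source_hz, mic_speed_m_s, sound_speed_m_s=343.0):
    """Frequency heard by a microphone moving at mic_speed_m_s towards a
    stationary source (negative speed means moving away)."""
    return source_hz * (sound_speed_m_s + mic_speed_m_s) / sound_speed_m_s

hum_hz = 100.0        # assumed pitch of the hum
swing_speed = 10.0    # assumed microphone speed in m/s

approaching = doppler_moving_mic(hum_hz, swing_speed)
receding = doppler_moving_mic(hum_hz, -swing_speed)
shift_semitones = 12.0 * math.log2(approaching / receding)
print(f"approaching: {approaching:.1f} Hz, receding: {receding:.1f} Hz, "
      f"swing: {shift_semitones:.2f} semitones")
```

Even a modest swing speed gives an audible glide as the microphone passes the speaker, which is the sense of movement described in the quote.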
Ben Burtt, by the way, is one of the most successful sound designers of all time. One of his trademarks is incorporating the ‘Wilhelm Scream’ into many of the films he works on. This scream is a stock sound clip that he found, which was originally recorded for the movie Distant Drums (1951). Here are some examples of where the clip has been used.
When we watch Game of Thrones or play the latest Assassin’s Creed, the sound effect added to a sword being swung adds realism, drama and overall excitement to our viewing experience.
There are a number of methods for producing sword sound effects, from filtering white noise with a bandpass filter to solving the fundamental equations for fluid dynamics using finite volume methods. One method investigated by the Audio Engineering research team at QMUL was to use semi-empirical equations from the aeroacoustics community as an alternative to solving the full Navier-Stokes equations. Running in real-time, these provide a computationally efficient way of achieving accurate results: we can model any sword, swung at any speed, and even adjust the model to replicate the sound of a baseball bat or golf club!
The starting point for these sound effect models is that of the Aeolian tone (see previous blog entry – https://intelligentsoundengineering.wordpress.com/2016/05/19/real-time-synthesis-of-an-aeolian-tone/). The Aeolian tone is the sound generated as air flows around an object, in the case of our model, a cylinder. In the previous blog entry we described the creation of a sound synthesis model for the Aeolian tone, including a link to a demo version of the model.
For a sword we take a number of the Aeolian tone models and place them at different positions on a virtual sword. This is shown below:
Each Aeolian tone model is called a compact source. It can be seen that more are placed at the tip of the sword than at the hilt. This is because the acoustic intensity is far higher for faster moving sources. There are 6 sources placed at the tip, spaced 7 × the sword diameter apart. This spacing is based on the distance over which the aerodynamic effects become de-correlated, although this is a simplification. One source is placed at the hilt and the final source is placed midway between the last tip source and the hilt.
The complete model is presented in a GUI as shown below:
Referring to both previous figures, it can be seen that the user is able to move the observer position within a 3D space. The thickness of the blade can be set at the tip and the hilt, as well as the length of the blade. The thickness is then linearly interpolated over the blade length so that each source diameter can be calculated.
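The source layout and the thickness interpolation can be sketched as follows. This is a hypothetical reading of the description, not the model's actual code: it assumes the tip sources are spaced by 7 × the tip diameter, that the blade pivots about the hilt so speed grows linearly with distance from it, and all names and dimensions are made up for illustration.

```python
def blade_diameter(position_m, length_m, hilt_diameter_m, tip_diameter_m):
    """Linearly interpolate thickness; position 0 is the hilt, length_m is the tip."""
    fraction = position_m / length_m
    return hilt_diameter_m + fraction * (tip_diameter_m - hilt_diameter_m)

def source_positions(length_m, tip_diameter_m, tip_sources=6, spacing_in_diameters=7.0):
    """Distances from the hilt (m): tip cluster, one midway source, one hilt source."""
    spacing = spacing_in_diameters * tip_diameter_m  # assumed to use the tip diameter
    tip_cluster = [length_m - k * spacing for k in range(tip_sources)]
    midway = tip_cluster[-1] / 2.0
    return tip_cluster + [midway, 0.0]

def source_speeds(positions, length_m, tip_speed_m_s):
    """Speed of each source, assuming rotation about the hilt."""
    return [tip_speed_m_s * p / length_m for p in positions]

# Example sword (assumed dimensions): 1 m blade, 20 mm at the hilt, 5 mm at the tip.
length, hilt_d, tip_d = 1.0, 0.02, 0.005
positions = source_positions(length, tip_d)
diameters = [blade_diameter(p, length, hilt_d, tip_d) for p in positions]
speeds = source_speeds(positions, length, tip_speed_m_s=30.0)
for p, d, u in zip(positions, diameters, speeds):
    print(f"source at {p:.3f} m: diameter {d * 1000:.1f} mm, speed {u:.1f} m/s")
```

Each (diameter, speed) pair would then drive one Aeolian tone source, with the fast tip sources contributing most of the sound.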
The azimuth and elevation of the sword pre and post swing can be set. The strike position is fixed to an azimuth of 180 degrees and this is the point where the sword reaches its maximum speed. The user sets the top speed of the tip from the GUI. The Prime button makes sure all the variables are pushed through into the correct places in equations and the Go button triggers the swing.
It can be seen that there are 4 presets. Model 1 is a thin fencing type sword and Model 2 is a thicker sword. To test versatility of the model we decided to try and model a golf club. The preset PGA will set the model to implement this. The golf club model involves making the diameter of the source at the tip much larger, to represent the striking face of a golf club. It was found that those unfamiliar with golf did not identify the sound immediately so a simple golf ball strike sound is synthesised as the club reaches top speed.
To test versatility further, we created a model to replicate the sound of a baseball bat; preset MLB. This is exactly the same model as the sword with the dimensions just adjusted to the length of a bat plus the tip and hilt thickness. A video with all the preset sounds is given below. This includes two sounds created by a model with reduced physics, LoQ1 & LoQ2. These were created to investigate if there is any difference in perception.
The demo model was connected to the animation of a knight character in the Unity game engine. The speed of the sword is directly mapped from the animation to the sound effect model and the model observer position set to the camera position. A video of the result is given below:
Aeroacoustic sounds are generated by the interaction of objects and the air, and form a unique group of sounds. Examples of these sounds are a sword swooshing through the air, jet engines and propellers, as well as the wind blowing through cracks, etc. The Aeolian tone is one of the fundamental sounds; the cavity tone and edge tone being others. When designing these sound effects we want to model these fundamental sounds. It then should be possible to make a wide range of sound effects based on these. We want the sounds to be true to the physics generating them and operate in real-time. Completed effects will be suitable for use in video games, TV, film and virtual or augmented reality.
The Aeolian tone is the sound generated when air moves past a string, cylinder or similar object. It’s the whistling noise we may hear coming from a fence in the wind or the swoosh of a sword. An Aeolian Harp is a wind instrument that has been harnessing the Aeolian tone for hundreds of years. In fact, the word Aeolian comes from the Greek god of wind Aeolus.
The physics behind this sound….
When air moves past a cylinder, spirals called vortices form behind it, moving away with the air flow. The vortices build up on both sides of the cylinder and detach in an alternating sequence. We call this vortex shedding, and the downstream trail of vortices a Von Karman Vortex Street. An illustration of this is given below:
As a vortex sheds from each side there is a change in the lift force from one side to the other. It’s the frequency of this oscillating force that is the fundamental tone frequency. The sound radiates in a direction perpendicular to the flow. There is also a smaller drag force associated with each vortex shed. It is much smaller than the lift force, twice the frequency and radiates parallel to the flow. Both the lift and drag tones have harmonics present.
Can we replicate this…?
In 1878 Vincent Strouhal realized there was a relationship between the diameter of a string, the speed it was travelling through the air and the frequency of the tone produced. This relationship is captured by a dimensionless quantity, the Strouhal number. We find the Strouhal number varies with the turbulence around the cylinder. Luckily, we have a parameter that represents the turbulence called the Reynolds number. It’s calculated from the viscosity, density and velocity of air, and the diameter of the string. From this we can calculate the Strouhal number and get the fundamental tone frequency.
This is the heart of our model and was its launching point. Acoustic sound sources can often be represented by compact sound sources. These are monopoles, dipoles and quadrupoles. For the Aeolian tone the compact sound source is a dipole.
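As a minimal sketch of these quantities, the Python below computes the Reynolds number and the resulting tone frequencies. It does not reproduce the model's relationship between Reynolds and Strouhal numbers; instead it assumes the commonly quoted approximation of St ≈ 0.2 for a cylinder, together with typical sea-level values for air density and viscosity. The function names and the example cylinder are illustrative.

```python
def reynolds_number(airspeed_m_s, diameter_m,
                    density_kg_m3=1.225, viscosity_pa_s=1.81e-5):
    """Re = rho * U * d / mu, with assumed typical values for air."""
    return density_kg_m3 * airspeed_m_s * diameter_m / viscosity_pa_s

def lift_tone_frequency(airspeed_m_s, diameter_m, strouhal=0.2):
    """Fundamental (lift dipole) frequency: f = St * U / d, St assumed constant."""
    return strouhal * airspeed_m_s / diameter_m

# Example (assumed): a 10 mm cylinder in a 20 m/s airflow.
airspeed, diameter = 20.0, 0.01
reynolds = reynolds_number(airspeed, diameter)
lift_hz = lift_tone_frequency(airspeed, diameter)
drag_hz = 2.0 * lift_hz  # the drag tone is at twice the lift frequency
print(f"Re = {reynolds:.0f}, lift tone = {lift_hz:.0f} Hz, drag tone = {drag_hz:.0f} Hz")
```

A fuller version would replace the constant `strouhal` with a function of `reynolds`, which is the step the model takes from published data.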
We have an equation for the acoustic intensity. This is proportional to airspeed to the power of 6. It also includes the relationship between the sound source and listener. The bandwidth around the fundamental tone peak is proportional to the Reynolds number. We calculate this from published experimental results.
The vortex wake acoustic intensity is also calculated. This is much lower than the tone dipole at low airspeed but is proportional to airspeed to the power of 8. There is little wake sound below the fundamental tone frequency and it decreases proportional to the frequency squared.
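The sixth and eighth power laws are easier to appreciate in decibels. The short sketch below only evaluates the proportionalities stated above (it is not the full intensity equation, which also depends on the source and listener geometry) to show how much each component grows when the airspeed doubles; the function name is hypothetical.

```python
import math

def level_change_db(speed_ratio, exponent):
    """Change in intensity level (dB) when airspeed is scaled by speed_ratio,
    for an intensity proportional to airspeed ** exponent."""
    return 10.0 * exponent * math.log10(speed_ratio)

tone_gain_db = level_change_db(2.0, 6)   # dipole tone, intensity ~ U^6
wake_gain_db = level_change_db(2.0, 8)   # wake noise, intensity ~ U^8
print(f"doubling airspeed: tone +{tone_gain_db:.1f} dB, wake +{wake_gain_db:.1f} dB")
```

Because the wake gains about 6 dB more than the tone for every doubling of airspeed, it becomes steadily more prominent as the object moves faster.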
We use the graphical programming language Pure Data to realise the equations and relationships. A white noise source and bandpass filters can generate the tone sounds and harmonics. The wake noise is a brown noise source shaped by high pass filtering. You can get the Pure Data patch of the model by clicking here.
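For readers who do not use Pure Data, the idea of shaping white noise with a bandpass filter can be sketched in Python. This is not a translation of the patch: it uses a standard biquad band-pass from the widely used Audio EQ Cookbook, and the 400 Hz centre frequency, the Q of 20 and all function names are arbitrary illustrative choices.

```python
import math
import random

def bandpass_coefficients(center_hz, q, sample_rate):
    """Normalised biquad band-pass coefficients (unity gain at center_hz)."""
    w0 = 2.0 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (-2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)
    return b, a

def biquad(samples, b, a):
    """y[n] = b0 x[n] + b1 x[n-1] + b2 x[n-2] - a1 y[n-1] - a2 y[n-2]."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))

sample_rate = 44_100
rng = random.Random(0)  # fixed seed so the sketch is repeatable
noise = [rng.uniform(-1.0, 1.0) for _ in range(sample_rate)]  # one second of white noise
b, a = bandpass_coefficients(center_hz=400.0, q=20.0, sample_rate=sample_rate)
tone = biquad(noise, b, a)
print(f"noise RMS {rms(noise):.3f} -> filtered RMS {rms(tone):.3f}")
```

A narrow filter (high Q) gives a clearly pitched whistle, while a wider one sounds breathier; harmonics could be added by summing further filtered copies at multiples of the centre frequency.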
Our sound effect operates in real-time and is interactive. A user or game engine can adjust:
- Diameter and length of the cylinder
- Distance between observer and source
- Azimuth and elevation between observer and source
- Panning and gain
We can now use the sound source to build up further models. For example, an airspeed model that replicates the wind can reproduce the sound of wind through a fence. The swoosh of a sword is modelled as sources lined up in a row, with the speed of each adjusted to the radius of its arc.
Not quite. We can calculate the bandwidth of the fundamental tone but have no data for the bandwidth of harmonics. In the current model we set them at the same value. The equation of the acoustic intensity of the wake is an approximation. The equation represents the physics but is not an exact value. We have to use best judgement when scaling it to the acoustic intensity of the fundamental tone.
A string or wire has a natural vibration frequency. There is an interaction between this and the vortex shedding frequency. This modifies the sound heard by a significant factor.