High resolution audio: finally, rigorously put to the test. And the verdict is…

Yes, you can hear a difference! (but it is really hard to measure)

See http://www.aes.org/e-lib/browse.cfm?elib=18296 for the June 2016 Open Access article in the Journal of the Audio Engineering Society, “A meta-analysis of high resolution audio perceptual evaluation”.

For years, I’ve been hearing people in the audio engineering community arguing over whether or not it makes any difference to record, mix and play back audio at better than CD quality (44.1 kHz, 16 bit) or better than production quality (48 kHz, 16 bit). Some people swear they can hear a difference, others have stories about someone they met who could always pick out the differences, and others say they’re all just fooling themselves. A few people could mention a study or two that supported their side, but the arguments never seemed to get resolved.

Then, a bit more than a year ago I was at a dinner party where a guy sitting across from me was about to complete his PhD in meta-analysis. Meta-analysis? I’d never heard of it. But the concept, analysing and synthesising the results of many studies to get a more definitive answer and gain more insights and knowledge, really intrigued me. So it was about time that someone tried this on the question of perception of hi-res audio.

Unfortunately, no one I asked was willing to get involved. A couple of experts thought there couldn’t be enough data out there to do the meta-analysis. A couple more thought that the type of studies (not your typical clinical trial with experimental and control groups) couldn’t be analysed using the established statistical approaches in meta-analysis. So, I had to do it myself. This also meant I had to be extra careful, and seek out as much advice as possible, since no one was looking over my shoulder to tell me when I was wrong or stupid.

The process was fascinating. The more I looked, the more I uncovered studies of high resolution audio perception. And my main approach for finding them (start with a few main papers, then look at everyone they cited and everyone who cited them, and repeat with any further interesting papers found) was not mentioned in the guidance to meta-analysis that I read. Then getting the data was interesting. Some researchers had it all prepared in handy, well-labelled spreadsheets, another found it in an old filing cabinet, and one had never kept it at all! And for some data, I had to write little programs to reverse engineer the raw data from t-values for trials with finite outcomes.

Formal meta-analysis techniques could be applied, and I gained a strong appreciation for both the maths behind them, and the general guidance that helps ensure rigour and helps avoid bias in the meta-study. But the results, in a few places, disagreed with what is typical. The potential biases in the studies seemed to occur more often with those that did not reject the null hypothesis, i.e., those that found no evidence for discriminating between high resolution and CD quality audio. Evidence of publication bias seemed to mostly go away if one put the studies into subgroups. And use of binomial probabilities allowed the statistical approaches in meta-analysis to be applied to studies where there was not a control group (‘no effect’ can be determined just from binomial probabilities).
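To make the binomial point concrete, here is a minimal Python sketch (an illustration only, not the code used for the paper). It computes how likely a listener would be to score at least a given number of correct answers purely by guessing, which is the ‘no effect’ baseline that replaces a control group. The function name and the example numbers (14 correct out of 20 trials) are invented for illustration, and the chance level of 0.5 assumes a two-alternative task.

```python
from math import comb


def binomial_p_value(correct: int, trials: int, chance: float = 0.5) -> float:
    """One-sided probability of at least `correct` successes in `trials`
    independent trials when every answer is a pure guess."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must lie between 0 and trials")
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )


# Hypothetical listener: 14 correct answers in 20 two-alternative trials.
p = binomial_p_value(14, 20)
print(f"Probability of doing this well by guessing: {p:.4f}")
```

A small probability suggests the listener was not simply guessing. Because the guessing rate is known in advance, each study can be compared against it directly.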

The end result was that people could, sometimes, perceive the difference between hi-res and CD audio. But they needed to be trained and the test needed to be carefully designed. And it was nice to see that the experiments and analysis were generally a little better today than in the past, so research is advancing. Still, most tests had some biases towards false negatives. So perhaps, careful experiments, incorporating all the best approaches, may show this perception even more strongly.

Meta-analysis is truly fascinating, and audio engineering, psychoacoustics, music technology and related fields need more of it.

The Swoosh of the Sword

When we watch Game of Thrones or play the latest Assassin’s Creed, the sound effect added to a sword being swung adds realism, drama and overall excitement to the experience.

There are a number of methods for producing sword sound effects, from filtering white noise with a bandpass filter to solving the fundamental equations of fluid dynamics using finite volume methods. One method investigated by the Audio Engineering research team at QMUL was to use semi-empirical equations from the aeroacoustics community as an alternative to solving the full Navier-Stokes equations. Running in real time, these provide a computationally efficient way of achieving accurate results. We can model any sword, swung at any speed, and even adjust the model to replicate the sound of a baseball bat or golf club!

The starting point for these sound effect models is the Aeolian tone (see previous blog entry: https://intelligentsoundengineering.wordpress.com/2016/05/19/real-time-synthesis-of-an-aeolian-tone/). The Aeolian tone is the sound generated as air flows around an object, which in the case of our model is a cylinder. In the previous blog entry we described the creation of a sound synthesis model for the Aeolian tone, including a link to a demo version of the model.
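As a rough illustration of why speed and thickness matter, the fundamental frequency of an Aeolian tone is commonly estimated with the Strouhal relation, f = St · u / d, where u is the air speed and d the cylinder diameter. The Python sketch below assumes a constant Strouhal number of 0.2, a typical textbook value for a cylinder. The function name and example values are hypothetical, and this is not the full synthesis model.

```python
STROUHAL_NUMBER = 0.2  # assumed typical value for flow past a cylinder


def aeolian_tone_frequency(air_speed: float, diameter: float,
                           strouhal: float = STROUHAL_NUMBER) -> float:
    """Estimate the Aeolian tone fundamental in Hz from the Strouhal
    relation f = St * u / d (air_speed in m/s, diameter in m)."""
    if diameter <= 0:
        raise ValueError("diameter must be positive")
    return strouhal * air_speed / diameter


# Hypothetical example: a 1 cm thick cylinder moving through air at 20 m/s.
print(aeolian_tone_frequency(air_speed=20.0, diameter=0.01))
```

With these example values the estimate comes out at roughly 400 Hz. Doubling the speed doubles the pitch, while doubling the thickness halves it.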

For a sword, we take a number of the Aeolian tone models and place them at different positions along a virtual sword. This is shown below:

[Figure: coordSwordSource]

Each Aeolian tone model is called a compact source. It can be seen that more are placed at the tip of the sword than at the hilt. This is because the acoustic intensity is far higher for faster moving sources. There are 6 sources placed at the tip, spaced 7 times the sword diameter apart. This spacing is based on the distance at which the aerodynamic effects become de-correlated, although this is a simplification. One source is placed at the hilt, and the final source is placed midway between the last tip source and the hilt.
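The layout described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the model's actual code. It reads the 7-diameter distance as the spacing between neighbouring tip sources, uses a single constant diameter for that spacing, and measures positions in metres from the hilt (0) to the tip. The function name and example dimensions are hypothetical.

```python
def source_positions(blade_length: float, diameter: float,
                     tip_sources: int = 6,
                     spacing_in_diameters: float = 7.0) -> list[float]:
    """Return the distance of each compact source from the hilt, in metres."""
    spacing = spacing_in_diameters * diameter
    # Tip sources start at the tip and step back towards the hilt.
    tip = [blade_length - i * spacing for i in range(tip_sources)]
    innermost_tip = min(tip)
    if innermost_tip <= 0:
        raise ValueError("blade is too short for this many tip sources")
    hilt = 0.0
    # The final source sits midway between the hilt and the innermost tip source.
    middle = (hilt + innermost_tip) / 2
    return sorted([hilt, middle] + tip)


# Hypothetical sword: 1 m long, 1 cm thick.
for position in source_positions(blade_length=1.0, diameter=0.01):
    print(f"{position:.3f} m from the hilt")
```

For the example sword this gives eight sources in total: one at the hilt, one part-way along the blade, and six clustered within the last 35 cm before the tip.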

The complete model is presented in a GUI as shown below:

[Figure: SwordDemoGUI]

Referring to both previous figures, it can be seen that the user is able to move the observer position within a 3D space. The thickness of the blade can be set at the tip and the hilt, as can the length of the blade. The thickness is then linearly interpolated over the blade length so that the diameter of each source can be calculated.
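The linear interpolation of blade thickness can be sketched as follows. This is a minimal Python illustration; the function name, the source positions and the dimensions are hypothetical values chosen for the example, not settings taken from the demo.

```python
def diameter_at(position: float, blade_length: float,
                hilt_diameter: float, tip_diameter: float) -> float:
    """Linearly interpolate blade thickness at `position` metres from the hilt."""
    if not 0.0 <= position <= blade_length:
        raise ValueError("position must lie on the blade")
    fraction = position / blade_length  # 0 at the hilt, 1 at the tip
    return hilt_diameter + fraction * (tip_diameter - hilt_diameter)


# Hypothetical blade: 1 m long, 2 cm thick at the hilt, 0.5 cm at the tip.
for position in (0.0, 0.5, 1.0):
    d = diameter_at(position, blade_length=1.0,
                    hilt_diameter=0.02, tip_diameter=0.005)
    print(f"{position:.2f} m from hilt: diameter {d * 1000:.2f} mm")
```

Each compact source then uses the diameter at its own position, so a tapered blade produces a different tone at each point along its length.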

The azimuth and elevation of the sword before and after the swing can be set. The strike position is fixed at an azimuth of 180 degrees, and this is the point where the sword reaches its maximum speed. The user sets the top speed of the tip from the GUI. The Prime button makes sure all the variables are pushed through into the correct places in the equations, and the Go button triggers the swing.

It can be seen that there are 4 presets. Model 1 is a thin fencing type sword and Model 2 is a thicker sword. To test the versatility of the model, we decided to try to model a golf club. The preset PGA sets the model up for this. The golf club model involves making the diameter of the source at the tip much larger, to represent the striking face of a golf club. It was found that those unfamiliar with golf did not identify the sound immediately, so a simple golf ball strike sound is synthesised as the club reaches top speed.

To test versatility further, we created a model to replicate the sound of a baseball bat: preset MLB. This is exactly the same model as the sword, with the dimensions adjusted to the length of a bat and to its thickness at the tip and hilt. A video with all the preset sounds is given below. This includes two sounds created by a model with reduced physics, LoQ1 & LoQ2. These were created to investigate whether there is any difference in perception.

The demo model was connected to the animation of a knight character in the Unity game engine. The speed of the sword is directly mapped from the animation to the sound effect model and the model observer position set to the camera position. A video of the result is given below:

Doppler, Leslie and Hammond

Donald Leslie (1913–2004) bought a Hammond organ in 1937, as a substitute for a pipe organ. But at home in a small room, it could not reproduce the grand sound of a pipe organ. Since each pipe in a pipe organ sits at a different location, he designed a moving loudspeaker.

The Leslie speaker uses an electric motor to move an acoustic horn in a circle around a loudspeaker. Thus we have a moving sound source and a stationary listener, which is a well-known situation that produces the Doppler effect.

It exploits the Doppler effect to produce frequency modulation. The classic Leslie speaker has a crossover that divides the low and high frequencies. It consists of a fixed treble unit with spinning horns, and a fixed woofer with a spinning rotor. Both the horns (actually, one horn and a dummy used as a counterbalance) and a bass sound baffle rotate, thus creating vibrato due to the changing velocity in the direction of the listener, and tremolo due to the changing distance. The rotating elements can move at varied speeds, or be stopped completely. Furthermore, the system is partially enclosed and it uses a rotating speaker port. So the listener hears multiple reflections at different Doppler shifts, which produces a chorus-like effect.
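The vibrato can be illustrated with the standard Doppler formula for a moving source and a stationary listener, f' = f · c / (c − v), where v is the source velocity towards the listener. The Python sketch below is a simplified illustration and not a full Leslie model. It assumes a distant listener, a single direct sound path with no cabinet reflections, and a speed of sound of 343 m/s. The horn radius and rotation rate are hypothetical example values.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 °C


def doppler_frequency(source_freq: float, horn_radius: float,
                      rotation_hz: float, angle_rad: float) -> float:
    """Frequency heard by a distant listener on the +x axis when the horn
    mouth is at `angle_rad` on its circle (anticlockwise rotation)."""
    tangential_speed = 2 * math.pi * horn_radius * rotation_hz
    # Component of the horn's velocity pointing towards the listener.
    speed_towards_listener = -tangential_speed * math.sin(angle_rad)
    return source_freq * SPEED_OF_SOUND / (SPEED_OF_SOUND - speed_towards_listener)


# Hypothetical example: 440 Hz tone, horn mouth 0.2 m from the axis, 7 rev/s.
for degrees in (0, 90, 180, 270):
    heard = doppler_frequency(440.0, 0.2, 7.0, math.radians(degrees))
    print(f"horn at {degrees:3d} degrees: {heard:.1f} Hz")
```

The heard pitch swings above and below the played note once per rotation, which is the vibrato. The cabinet reflections described above would add further copies with different shifts.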

The Leslie speaker has been widely used in popular music, especially when the Hammond B-3 organ was played out through a Leslie speaker. This combination can be heard on many classic and progressive rock songs, including hits by Boston, Santana, Steppenwolf, Deep Purple and The Doors. And the Leslie speaker has also found extensive use in modifying guitar and vocal sounds.

Ironically, Donald Leslie had originally tried to license his loudspeaker to the Hammond company, and even gave the Hammond company a special demonstration. But at the time, Laurens Hammond (founder of the Hammond organ company) did not like the concept at all.