Blogs, blogs, blogs

We’re collaborating on a really interesting project called ‘Cross-adaptive processing as musical intervention,’ led by Professor Øyvind Brandtsegg of the Norwegian University of Science and Technology. Essentially, this project involves cross-adaptive audio effects, where the processing applied to one audio signal is dependent on analysis of other signals. We’ve used this concept quite a lot to build intelligent music production systems. But in this project, Øyvind and his collaborators are exploring creative uses of cross-adaptive audio effects in live performance. The effects applied to one source may change depending on what and how another performer plays, so a performer may change what they play to overtly influence everyone else’s sound, thus taking the interplay in a jam session to a whole new level.

One of the neat things they’ve done to get this project off the ground is create a blog, http://crossadaptive.hf.ntnu.no/, which is a great way to get all the reports and reflections out there quickly and widely.

This got me thinking of a few other blogs that we should mention. First and foremost is Prof. Trevor Cox of the University of Salford’s wonderful blog, ‘The Sound Blog: Dispatches from Acoustic and Audio Engineering,’ available at https://acousticengineering.wordpress.com/ . This blog was one of the principal inspirations for our own blog here.

Another leading researcher’s interesting blog is https://marianajlopez.wordpress.com/ – Mariana is looking into aspects of sound design that I feel really don’t get enough attention from the academic community… yet. Hopefully, that will change soon.

There are plenty of blogs about music production. A couple of good ones are http://thestereobus.com/ and http://productionadvice.co.uk/blog/ . They are full of practical advice, insights and tutorials.

A lot of the researchers in the Audio Engineering team have their own personal blogs, which discuss their research, their projects and various other things related to their career or just cool technologies.

See:

http://brechtdeman.com/blog.html – Brecht De Man’s blog. He’s researching semantic and knowledge engineering approaches to music production systems (and a lot more).

https://auralcharacter.wordpress.com/ – Alessia Milo’s blog. She’s looking at (and listening to) soundscapes, and their importance in architecture.

http://davemoffat.com/wp/ – Dave Moffat is investigating evaluation of sound synthesis techniques, and how machine learning can be applied to synthesize a wide variety of sound effects.

https://rodselfridge.wordpress.com/ – Rod Selfridge is looking at real-time physical modelling techniques for procedural audio and sound synthesis.

More to come on all of them, I’m sure.

Let us know of any other blogs that we should mention, and we’ll update this entry or add new entries.

Real-Time Synthesis of an Aeolian tone

Aeroacoustic sounds are generated by the interaction between objects and the air, and they form a unique group of sounds. Examples include a sword swooshing through the air, jet engines, propellers, and the wind blowing through cracks. The Aeolian tone is one of the fundamental sounds, the cavity tone and edge tone being others. When designing these sound effects we want to model these fundamental sounds; it should then be possible to make a wide range of sound effects based on them. We want the sounds to be true to the physics generating them and to operate in real-time. Completed effects will be suitable for use in video games, TV, film and virtual or augmented reality.

The Aeolian tone is the sound generated when air moves past a string, cylinder or similar object. It’s the whistling noise we may hear coming from a fence in the wind or the swoosh of a sword. An Aeolian Harp is a wind instrument that has been harnessing the Aeolian tone for hundreds of years. In fact, the word Aeolian comes from Aeolus, the Greek god of the wind.

The physics behind this sound….

When air moves past a cylinder, spirals called vortices form behind it, moving away with the air flow. The vortices build up on both sides of the cylinder and detach in an alternating sequence. We call this vortex shedding, and the downstream trail of vortices a von Kármán vortex street. An illustration of this is given below:

[Figure: vortex shedding behind a cylinder, forming a von Kármán vortex street]

As a vortex sheds from each side, the lift force swings from one side to the other. The frequency of this oscillating force is the fundamental tone frequency, and the sound radiates in a direction perpendicular to the flow. There is also a drag force associated with each vortex shed; it is much smaller than the lift force, at twice the frequency, and it radiates parallel to the flow. Both the lift and drag tones have harmonics present.

Can we replicate this…?

In 1878 Vincent Strouhal realized there was a relationship between the diameter of a string, the speed at which it travels through the air, and the frequency of the tone produced. The Strouhal number varies with the turbulence around the cylinder. Luckily, we have a parameter that represents the turbulence: the Reynolds number. It is calculated from the viscosity, density and velocity of the air, and the diameter of the string. From the Reynolds number we can calculate the Strouhal number, and from that the fundamental tone frequency.
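
To make this concrete, here is a minimal Python sketch of that chain of reasoning: Reynolds number from the flow parameters, an empirical Strouhal fit, then the fundamental frequency f = St·U/d. The particular Strouhal fit and the air constants below are my own assumptions, not necessarily those used in the published model.

```python
import math

def aeolian_tone_frequency(airspeed, diameter,
                           air_density=1.225,       # kg/m^3, air at ~15 C
                           air_viscosity=1.81e-5):   # dynamic viscosity, Pa.s
    """Estimate the fundamental Aeolian tone frequency for a cylinder.

    Computes the Reynolds number, uses it to pick a Strouhal number via
    one commonly quoted empirical fit (St = 0.198 * (1 - 19.7/Re), for
    subcritical flow), then returns f = St * U / d. The published model
    may use a different fit across flow regimes.
    """
    reynolds = air_density * airspeed * diameter / air_viscosity
    strouhal = 0.198 * (1.0 - 19.7 / reynolds)
    return strouhal * airspeed / diameter

# Example: wind at 10 m/s past a 4 mm wire
print(aeolian_tone_frequency(10.0, 0.004))  # about 490 Hz
```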

This relationship is the heart of our model and was its launching point. Acoustic sound sources can often be represented by compact sound sources: monopoles, dipoles and quadrupoles. For the Aeolian tone the compact sound source is a dipole.

We have an equation for the acoustic intensity of the tone. It is proportional to the airspeed to the power of 6, and it also accounts for the relationship between the sound source and listener. The bandwidth around the fundamental tone peak is proportional to the Reynolds number; we calculate this from published experimental results.

The acoustic intensity of the vortex wake is also calculated. This is much lower than that of the tone dipole at low airspeed, but it is proportional to the airspeed to the power of 8. There is little wake sound below the fundamental tone frequency, and it decreases in proportion to the frequency squared.
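
As a rough illustration of those scaling laws (not the model’s actual equations), here is a hypothetical Python sketch combining the airspeed exponents, inverse-square spreading and a simple cos² dipole directivity; the constants are arbitrary placeholders.

```python
import math

def tone_intensity(airspeed, distance, angle_deg, k_tone=1.0):
    """Relative intensity of the lift-dipole tone: scales with airspeed^6,
    falls off as 1/distance^2, and follows a cos^2 dipole directivity,
    where angle_deg is measured from the dipole axis (perpendicular to
    the flow). k_tone is an arbitrary placeholder constant."""
    return (k_tone * airspeed**6 / distance**2
            * math.cos(math.radians(angle_deg))**2)

def wake_intensity(airspeed, distance, k_wake=1e-4):
    """Relative intensity of the turbulent wake noise: scales with
    airspeed^8, so it is quiet at low airspeed but grows faster than the
    tone. k_wake is an arbitrary placeholder, much smaller than k_tone."""
    return k_wake * airspeed**8 / distance**2

# With these placeholders the wake only overtakes the tone once
# airspeed^2 > k_tone / k_wake, i.e. above 100 m/s.
```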

We use the graphical programming language Pure Data to realise the equations and relationships. A white noise source and bandpass filters can generate the tone sounds and harmonics. The wake noise is a brown noise source shaped by high pass filtering. You can get the Pure Data patch of the model by clicking here.
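
For readers without Pure Data, here is a rough offline Python equivalent of that signal chain (assuming NumPy and SciPy): bandpass-filtered white noise for the tone and its harmonics, plus high-pass-filtered brown noise for the wake. The filter orders, Q and gains are illustrative guesses, not the values used in the patch.

```python
import numpy as np
from scipy.signal import butter, sosfilt

def aeolian_tone(f0, duration=2.0, sr=44100, q=40, n_harmonics=2):
    """Offline sketch of the synthesis: narrow bandpass filters on white
    noise for the lift tone and its harmonics, plus highpass-filtered
    'brown' noise for the wake."""
    n = int(duration * sr)
    white = np.random.randn(n)
    out = np.zeros(n)
    # Fundamental and harmonics from bandpassed white noise
    for k in range(1, n_harmonics + 2):
        fc = k * f0
        bw = fc / q
        sos = butter(2, [fc - bw / 2, fc + bw / 2],
                     btype='bandpass', fs=sr, output='sos')
        out += sosfilt(sos, white) / k        # crude 1/k harmonic roll-off
    # Wake noise: integrated white noise approximates brown noise,
    # then high-pass filter it so little energy falls below f0
    brown = np.cumsum(np.random.randn(n))
    brown /= np.max(np.abs(brown))
    sos_hp = butter(2, f0, btype='highpass', fs=sr, output='sos')
    out += 0.05 * sosfilt(sos_hp, brown)      # wake much quieter than the tone
    return out / np.max(np.abs(out))
```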

Our sound effect operates in real-time and is interactive. A user or game engine can adjust:

  • Airspeed
  • Diameter and length of the cylinder
  • Distance between observer and source
  • Azimuth and elevation between observer and source
  • Panning and gain

We can now use the sound source to build up further models. For example, an airspeed model that replicates the wind can reproduce the sound of wind through a fence. The swoosh of a sword can be made from several sources lined up in a row, with the speed of each adjusted according to its radius along the arc of the swing.
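
As a toy illustration (not the published sword model), one might line up a few such sources along the blade, reusing the aeolian_tone_frequency() sketch from above; all the numbers here are made up.

```python
# Hypothetical sword swoosh: several Aeolian sources along the blade,
# each with speed proportional to its distance from the hand.
tip_speed = 15.0        # m/s at the tip of the swing
blade_length = 0.8      # m
blade_diameter = 0.005  # m, treating the blade as a thin cylinder

for i in range(1, 6):                       # five sources along the blade
    radius = i / 5 * blade_length           # distance from the hand
    speed = tip_speed * radius / blade_length
    f0 = aeolian_tone_frequency(speed, blade_diameter)
    print(f"source {i}: speed {speed:4.1f} m/s, tone around {f0:5.0f} Hz")
```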

Model complete…?

Not quite. We can calculate the bandwidth of the fundamental tone, but we have no data for the bandwidth of the harmonics; in the current model we set them to the same value. The equation for the acoustic intensity of the wake is an approximation: it captures the physics but does not give an exact value, so we have to use our best judgement when scaling it relative to the acoustic intensity of the fundamental tone.

A string or wire also has a natural vibration frequency, and there is an interaction between this and the vortex shedding frequency which significantly modifies the sound heard.

Human echolocation, absolute pitch and Golden Ears

I’m always intrigued by stories of people with amazing abilities, and similar questions often come up. Is this for real, and is this a latent ability that we all might have?

 

A few years ago there were a lot of news stories about Daniel Kish; see “The blind man who taught himself to see” or “Human echolocation: Using tongue-clicks to navigate the world.” Daniel is a master of echolocation, the ability to sense the environment by listening to the echoes from actively produced sounds, though Daniel is also newsworthy for his humanitarian contributions helping other visually impaired people; see his World Access for the Blind charity. His ability is amazing, and the first question, “Is this for real?”, is easily answered in the affirmative. Quite a few studies have also shown that many (or most, or even all) people have some echolocation ability, and that the blind generally perform better. And Daniel has taught others to hone their skills.

 

You can find Daniel Kish’s TedX talk at https://www.youtube.com/watch?v=ob-P2a6Mrjs

 

And here’s a wonderful light piece about an eight-year-old learning echolocation skills.

This got me thinking about some other amazing auditory skills. I remember when I was a teen, at a friend’s house, and he told me the names of the white keys on a piano (my musical knowledge was nonexistent). He then asked me to play any of them and he’d tell me which one it was. I thought I’d trick him and so I played one of the black keys. He turned around surprised and said A sharp. So I tried hitting two keys, and he got that right. I soon established that he could correctly identify almost any two keys, and sometimes even three, hit together. I said, “Wow, you were born with perfect pitch.” And he looked at me and said “Not born with it. It’s because I’ve been playing piano since I was four!” I also remember that he was amazing at playing music by ear, which is no doubt related, but lousy at sight reading.

And I don’t know if he had absolute pitch in the true sense. Could he identify the note played on other instruments? Maybe his skill was just limited to what was played at home on his piano, or just generally to piano. Absolute pitch is a phenomenon where there is some debate about the extent to which we might all be able to do it. Some studies suggest that there could be a genetic trait, but there’s also a lot of evidence to suggest that it could be learned. So can anyone learn it, and can they learn it at any time? Certainly, relative pitch skills can be acquired late in life (there’s a lot of material on critical listening and ear training that helps someone learn this skill), and repeated exposure can provide someone with an external reference. With enough training, enough examples of different timbres with different fundamentals, perhaps almost anyone could identify the pitch of a wide variety of different sounds.

Extraordinary auditory skills have also come up in some recent research that we’ve been involved in; see J. D. Reiss, “A Meta-Analysis of High Resolution Audio Perceptual Evaluation,” Journal of the Audio Engineering Society, v. 64 (6), June 2016, http://www.aes.org/e-lib/browse.cfm?elib=18296 .

We were interested in whether people could perceive a difference between CD quality audio (16 bits, 44.1 kHz) and high resolution audio (loosely, anything beyond CD quality). Some anecdotes have mentioned individuals with ‘Golden Ears.’ That is, there might exist a few special people with an exceptional ability to hear this difference, even if the vast majority cannot distinguish the two formats. Our research involved a meta-analysis of all studies looking into the ability to discriminate between high resolution and standard format audio. In a lot of studies, participants were asked lots of binary questions (like ‘are these two samples the same or different?’ or ‘which of these two samples sounds closest to a high resolution reference sample?’). Thus, one could assign a p value to each participant, which corresponds to the probability of getting at least that many correct answers if the participant was just guessing. If everyone was always just guessing, then the p values should be uniformly distributed. If there is a Golden Ears phenomenon, then there should be a ‘bump’ in the low p values.

Well, here’s the histogram of p values from participants from a lot of the studies.

[Figure: histogram of p values across participants]

You can’t really tell if there’s a Golden Ears phenomenon or not. Why? Well, first, you need a lot of data to see structure in a histogram. But also, our p values are discrete and finite. If a participant was involved in only 4 trials, there are only 5 possible p values: 0.0625 (all 4 correct), 0.3125 (at least 3 correct), 0.6875 (at least 2 correct), 0.9375 (at least 1 correct), and 1. So there are a lot of bins in our histogram that this participant will never hit. The histogram doesn’t include any participants who only did 4 trials, but the problem is still there even for participants who did a lot of trials, as long as the number is finite.
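
For the curious, here is a minimal Python sketch of that p value calculation (an exact binomial tail under the guessing null); it reproduces the five values listed above for a four-trial participant.

```python
from math import comb

def p_value(correct, trials, chance=0.5):
    """Probability of getting at least `correct` answers right out of
    `trials` binary questions if the participant were purely guessing."""
    return sum(comb(trials, k) * chance**k * (1 - chance)**(trials - k)
               for k in range(correct, trials + 1))

# With only 4 trials there are just 5 possible p values:
print([round(p_value(k, 4), 4) for k in range(4, -1, -1)])
# [0.0625, 0.3125, 0.6875, 0.9375, 1.0]
```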

There are other issues of course. Maybe this Golden Ears phenomenon only occurs in one out of a thousand people, and those people just weren’t participants. Just one of many reasons why it’s hard to reject the alternative hypothesis in null hypothesis testing.

But what we did find is that, on average, participants were correct much more than 50% of the time, and that was statistically significant. More on that in an upcoming blog, and in the above mentioned paper ‘A meta-analysis of high resolution audio perceptual evaluation’ in the Journal of the Audio Engineering Society.

The creation of Auto-Tune


From 1976 through 1989, Dr. Andy Hildebrand worked for the oil industry, interpreting seismic data. By sending sound waves into the ground, he could detect the reflections and map potential drill sites. Dr. Hildebrand studied music composition at Rice University, and then developed audio processing tools based on his knowledge of seismic data analysis. He was a leading developer of a variety of plug-ins, including MDT (Multiband Dynamics Tool), JVP (Jupiter Voice Processor) and SST (Spectral Shaping Tool). At a dinner party, a guest challenged him to invent a tool that would help her sing in tune. Based on the phase vocoder, Hildebrand’s Antares Audio Technologies released Auto-Tune in late 1996.

Auto-Tune was intended to correct or disguise off-key vocals. It moves the pitch of a note to the nearest true semitone (the nearest musical interval in traditional, equal temperament Western tonal music), thus allowing the vocal parts to be tuned. The original Auto-Tune had a speed parameter which could be set between 0 and 400 milliseconds, and which determined how quickly the note moved to the target pitch. Engineers soon realised that by setting this ‘attack time’ very short, Auto-Tune could be used as an effect to distort vocals, making it sound as if the voice leaps from note to note in discrete steps. This gives the voice an artificial, synthesiser-like sound that can be appealing or irritating depending on taste. This unusual effect was the trademark sound of Cher’s 1998 hit song, ‘Believe.’
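
As a rough illustration of the idea (and emphatically not Antares’ actual algorithm), here is a toy Python sketch: a detected pitch track is snapped to the nearest equal-temperament semitone, with a speed parameter controlling how quickly each frame glides towards its target.

```python
import math

A4 = 440.0  # reference tuning, Hz

def nearest_semitone(freq_hz):
    """Snap a detected frequency to the nearest equal-temperament semitone."""
    midi = 69 + 12 * math.log2(freq_hz / A4)
    return A4 * 2 ** ((round(midi) - 69) / 12)

def retune(freq_track, frame_ms=1.0, speed_ms=20.0):
    """Glide each frame of a pitch track towards its target semitone.
    speed_ms plays the role of the speed/attack control: 0 snaps
    instantly (the 'Cher effect'), larger values correct more gently.
    This is a toy one-pole glide, not a real pitch-correction engine."""
    alpha = 1.0 if speed_ms <= 0 else min(1.0, frame_ms / speed_ms)
    out, current = [], freq_track[0]
    for f in freq_track:
        current += alpha * (nearest_semitone(f) - current)
        out.append(current)
    return out
```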

Like many audio effects, engineers and performers found a creative use, quite different from the intended use. As Hildebrand said, “I never figured anyone in their right mind would want to do that.” Yet Auto-Tune and competing pitch correction technologies are now widely applied (in amateur and professional recordings, and across many genres) for both intended and unusual, artistic uses.
