Aeroacoustic Sound Effects – Journal Article

I am delighted to announce that my article on Creating Real-Time Aeroacoustic Sound Effects Using Physically Informed Models is in this month's Journal of the Audio Engineering Society. It is an invited article, following the best paper award at the Audio Engineering Society 141st Convention in LA, and it is open access, so free for all to download!

The article extends the original paper by examining how the Aeolian tone synthesis models can be used to create a number of sound effects. The benefit of these models is that they produce plausible sound effects which operate in real time. Users are presented with a number of highly relevant parameters to control the effects, which can be mapped directly to 3D models within game engines.

The basics of the Aeolian tone were given in a previous blog post. To summarise, a tone is generated when air passes around an object and vortices are shed behind it. Fluid dynamic equations are available which allow a prediction of the tone frequency based on the physics of the interaction between the air and object. The Aeolian tone is modelled as a compact sound source.
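
As a rough illustration of the kind of relation involved (a sketch, not the article's Pure Data implementation), the fundamental Aeolian tone frequency for a cylinder can be estimated from the Strouhal relation f = St·u/d, with St ≈ 0.2 over a wide range of airspeeds; the function and names below are illustrative.

```python
def aeolian_tone_hz(airspeed_ms, diameter_m, strouhal=0.2):
    """Estimate the fundamental Aeolian tone frequency (Hz) for a cylinder.

    Uses the Strouhal relation f = St * u / d, where St is roughly 0.2 for a
    circular cylinder over a wide range of Reynolds numbers.
    """
    return strouhal * airspeed_ms / diameter_m

# A 10 mm rod moving at 20 m/s sheds vortices at roughly 400 Hz.
print(aeolian_tone_hz(20.0, 0.01))
```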

To model a sword or similar object, a number of these compact sound sources are placed in a row. A previous blog post describes this in more detail. The majority of the compact sound sources are placed towards the tip, as this is where the airspeed is highest and the most sound is generated.
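
A minimal sketch of that idea, assuming simple rotation about one end (not the published model): the airspeed at each compact source grows linearly with its distance from the pivot, so sources clustered towards the tip see the highest speeds.

```python
def source_airspeeds(length_m, angular_velocity_rad_s, positions_norm):
    """Airspeed (m/s) at each compact source on an object rotating about one end.

    positions_norm holds each source's position as a fraction of the object's
    length from the pivot (0..1); speed grows linearly towards the tip, so
    sources clustered near the tip radiate the most sound.
    """
    return [angular_velocity_rad_s * p * length_m for p in positions_norm]

# Most sources sit towards the tip of a 1 m sword swung at 10 rad/s.
positions = [0.25, 0.5, 0.7, 0.85, 0.93, 0.98, 1.0]
print(source_airspeeds(1.0, 10.0, positions))
```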

The behaviour of a sword when being swung has to be modelled, and this is then used to control some of the parameters in the equations. This behaviour can be controlled by a game engine, enabling fully integrated procedural audio models.

The sword model was extended to include objects like a baseball bat and golf club, as well as a broom handle. The compact sound source of a cavity tone was also added to replicate swords which have grooved profiles. Subjective evaluation gave excellent results, especially for thicker objects, which were perceived to be as plausible as pre-recorded samples.

The synthesis model could be extended to look at a range of sword cross sections, as well as any influence of the sword's material. It is envisaged that other sporting equipment which swings or flies through the air could be modelled using compact sound sources.

Propeller sounds are common in games and film, and are partially based on the sounds generated by the Aeolian tone and vortex shedding. As a blade passes through the air, vortices are shed at a specific frequency along its length. To model individual propeller blades, the profiles of a number of real blades were obtained, with specific span lengths (centre to tip) and chord lengths (leading edge to trailing edge).
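
As a rough sketch of how such a blade might be discretised (my own illustration, not the article's implementation), a compact source at radius r from the hub sees a rotational airspeed of 2πr·RPM/60, combined with the aircraft's forward speed, which can then feed the same Strouhal-style shedding estimate:

```python
import math

def blade_segment_airspeeds(span_m, rpm, n_segments=8, forward_speed_ms=0.0):
    """Approximate airspeed at compact sources spaced along a propeller blade.

    A segment at radius r sees a rotational speed of 2*pi*r*rpm/60, combined
    in quadrature with the aircraft's forward speed.
    """
    omega = 2.0 * math.pi * rpm / 60.0        # angular velocity in rad/s
    speeds = []
    for i in range(1, n_segments + 1):
        r = span_m * i / n_segments           # radius of this segment (centre to tip)
        speeds.append(math.hypot(omega * r, forward_speed_ms))
    return speeds

# A blade with 1 m span at 2400 RPM and 50 m/s forward airspeed.
print(blade_segment_airspeeds(1.0, 2400, forward_speed_ms=50.0))
```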

Another major sound source is the loading sound generated by the torque and thrust. A procedure for modelling these sounds is outlined in the article. Missing from the propeller model are distortion sounds; these are more associated with rotors which turn in the horizontal plane.

An important sound when hearing a propeller-powered aircraft is the engine sound. The one used in this model was based on an example from Andy Farnell's book Designing Sound. Once complete, a user is able to select an aircraft from a pre-programmed bank and set the flight path. If linked to a game engine, the physical dimensions and flight paths can all be controlled procedurally.

Listening tests indicate that the synthesis model was as plausible as an alternative method, but still not as plausible as pre-recorded samples. It is believed that results may have been more favourable had we modelled electric-powered drones and aircraft, which do not have the sound of a combustion engine.

The final model exploring the use of the Aeolian tone was that of an Aeolian Harp. This is a musical instrument that is activated by wind blowing around its strings. The vortices that are shed behind a string can activate a mechanical vibration if they are around the frequency of one of the string's natural harmonics. This produces a distinctive sound.

The digital model allows a user to synthesise a harp of up to 13 strings. Tension, mass density, length and diameter can all be adjusted to replicate a wide variety of string materials and harp sizes. Users can also control a wind model, modified from one presented in Andy Farnell's book Designing Sound, with control over the amount of gusting. Listening tests indicate that the sound is not as plausible as pre-recorded ones, but is as plausible as alternative synthesis methods.
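
To get a feel for how those string parameters interact (an illustrative sketch, not the Pure Data patch), the natural frequencies of an ideal string are f_n = (n/2L)·√(T/μ), and a harmonic is excited when the wind's vortex shedding frequency falls close to it:

```python
import math

def string_harmonics_hz(tension_n, mass_per_length_kg_m, length_m, n_harmonics=10):
    """Natural frequencies of an ideal string: f_n = (n / 2L) * sqrt(T / mu)."""
    fundamental = math.sqrt(tension_n / mass_per_length_kg_m) / (2.0 * length_m)
    return [n * fundamental for n in range(1, n_harmonics + 1)]

def excited_harmonics(shedding_hz, harmonics_hz, tolerance=0.05):
    """Return the harmonics lying within a relative tolerance of the shedding frequency."""
    return [f for f in harmonics_hz if abs(f - shedding_hz) / f < tolerance]

# A 1 m string under 60 N tension with 1 g/m mass density, in a 5 m/s wind.
harmonics = string_harmonics_hz(60.0, 1.0e-3, 1.0)
shedding = 0.2 * 5.0 / 0.001          # Strouhal estimate for a 1 mm diameter string
print(excited_harmonics(shedding, harmonics))   # the 8th harmonic (~980 Hz) is excited
```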

The article describes the design processes in more detail, as well as the fluid dynamic principles each was developed from. All models developed are open source and implemented in Pure Data. Links to these are in the paper, as well as in my previous publications. Demo videos can be found on YouTube.


Sound Synthesis – Are we there yet?

TL;DR. Yes

At the beginning of my PhD, I began to read the sound effect synthesis literature, and I quickly discovered that there was little to no standardisation or consistency in the evaluation of sound effect synthesis models – particularly in relation to the sounds they produce. Surely one of the most important aspects of a synthesis system is whether it can artificially produce a convincing replacement for what it is intended to synthesise. We could have the most tractable and relatable sound model in the world, but if it does not sound anything like it is intended to, then will any sound designers or end users ever use it?

There are many different methods for measuring how effective a sound synthesis model is. Jaffe proposed evaluating synthesis techniques for music based on ten criteria. However, only two of the ten criteria actually consider any sounds made by the synthesiser.

This is crazy! How can anyone know what synthesis method can produce a convincingly realistic sound?

So, we performed a formal evaluation study, where a range of different synthesis techniques were compared in a range of different situations. Some synthesis techniques are indistinguishable from a recorded sample, in a fixed medium environment. In short – Yes, we are there yet. There are sound synthesis methods that sound more realistic than high quality recorded samples. But there is clearly so much more work to be done…

For more information read this paper

My favorite sessions from the 143rd AES Convention


Recently, several researchers from the audio engineering research team here attended the 143rd Audio Engineering Society Convention in New York. Before the Convention, I wrote a blog entry highlighting a lot of the more interesting or adventurous research that was being presented there. As is usually the case at these Conventions, I have so many meetings to attend that I miss out on a lot of highlights, even ones that I flag up beforehand as ‘must see’. Still, I managed to attend some real gems this time, and I’ll discuss a few of them here.

I’m glad that I attended ‘Audio Engineering with Hearing Loss—A Practical Symposium’. Hearing loss amongst musicians, audiophiles and audio engineers is an important topic that needs more attention. Overexposure, whether prolonged or simply too loud, is a major cause of hearing damage. In addition to all the issues it causes for anybody, for those in the industry it affects their ability to work, or even to appreciate their passion. The session had lots of interesting advice.

The most interesting presentation in the session was from Richard Einhorn, a composer and music producer. In 2010, he lost much of his hearing due to a virus. He woke up one day to find that he had completely lost hearing in his right ear, a condition known as Idiopathic Sudden Sensorineural Hearing Loss. This then evolved into hyperacusis, with extreme distortion, excessive perceived loudness and poor speech intelligibility. In many ways, complete deafness in the right ear would have been preferable. On top of that, his left ear suffered otosclerosis, where everything was at greatly reduced volume. And given that this was his only functioning ear, the risk of surgery to correct it was too great.

Richard has found some wonderful ways to still function, and even continue working in audio and music, with the limited hearing he still has. There’s a wonderful description of them in Hearing Loss Magazine, and they include the use of the ‘Companion Mic,’ which allowed him to hear from many different locations around a busy, noisy environment, like a crowded restaurant.

Thomas Lund presented ‘The Bandwidth of Human Perception and its Implications for Pro Audio.’ I really wasn’t sure about this before the Convention. I had read the abstract, and thought it might be some meandering, somewhat philosophical talk about hearing perception, with plenty of speculation but lacking in substance. I was very glad to be proven wrong! It had aspects of all of that, but in a very positive sense. It was quite rigorous, essentially a systematic review of research in the field that had been published in medical journals. It looks at the question of auditory perceptual bandwidth, where bandwidth is in a general information theoretic and cognitive sense, not specifically frequency range. The research revolves around the fact that, though we receive many megabits of sensory information every second, it seems that we only use dozens of bits per second of information in our higher level perception. This has lots of implications for listening test design, notably on how to deal with aspects like sample duration or training of participants. This was probably the most fascinating technical talk I saw at the Convention.

There were two papers that I had flagged up as having the most interesting titles, ‘Influence of Audience Noises on the Classical Music Perception on the Example of Anti-cough Candies Unwrapping Noise’ and ‘Acoustic Levitation—Standing Wave Demonstration.’ I had an interesting chat with an author of the first one, Adam Pilch. When walking around much later looking for the poster for the second one, I bumped into Adam again. It turns out he was a co-author on both of them! It looks like Adam Pilch and Bartlomiej Chojnacki (the shared authors on those papers) and their co-authors have an appreciation of the joy of doing research for fun and curiosity, and an appreciation for a good paper title.

Leslie Ann Jones was the Heyser lecturer. The Heyser lecture, named after Richard C. Heyser, is an evening talk given by an eminent individual in audio engineering or related fields. Leslie has had a fascinating career, and gave a talk that makes one realise just how much the industry is changing and growing, and how important are the individuals and opportunities that one encounters in a career.

The last session I attended was also one of the best. Chris Pike, who recently became leader of the audio research team at BBC R&D (he has big shoes to fill, but fits them well and is already racing ahead), presented ‘What’s This? Doctor Who with Spatial Audio!’. I knew this was going to be good because it involved two of my favorite things, but it was much better than that. The audience were all handed headphones so that they could listen to binaural renderings used throughout the presentation. I love props at technical talks! I also expected the talk to focus almost completely on the binaural, 3D sound rendering for a recent episode, but it was so much more than that. There was quite detailed discussion of audio innovation throughout the more than 50 years of Doctor Who, some of which we have discussed when mentioning Daphne Oram and Delia Derbyshire in our blog entry on female pioneers in audio engineering.

There’s a nice short interview with Chris and colleagues Darran Clement (sound mixer) and Catherine Robinson (audio supervisor) about the binaural sound in Doctor Who on BBC R&D’s blog, and here’s a YouTube video promoting the binaural sound in the recent episode:


Applied Sciences Journal Article

We are delighted to announce the publication of our article, Sound Synthesis of Objects Swinging through Air Using Physical Models, in the Applied Sciences Special Issue on Sound and Music Computing.


The journal article is a revised and extended version of our paper, which won a best paper award at the 14th Sound and Music Computing Conference, held in Espoo, Finland in July 2017. The initial paper presented a physically derived synthesis model used to replicate the sound of sword swings using equations obtained from fluid dynamics, which we discussed in a previous blog entry. In the article we extend the listening tests to include sound effects of metal swords, wooden swords, golf clubs, baseball bats and broom handles, as well as adding a cavity tone synthesis model to replicate grooves in the sword profiles. Further tests were carried out to see if participants could identify which object our model was replicating by swinging a Wii Controller.
The properties exposed by the sound effects model could be automatically adjusted by a physics engine, giving a wide corpus of sounds from one simple model, all based on fundamental fluid dynamics principles. An example of the sword sound linked to the Unity game engine is shown in this video.

Abstract:
A real-time physically-derived sound synthesis model is presented that replicates the sounds generated as an object swings through the air. Equations obtained from fluid dynamics are used to determine the sounds generated while exposing practical parameters for a user or game engine to vary. Listening tests reveal that for the majority of objects modelled, participants rated the sounds from our model as plausible as actual recordings. The sword sound effect performed worse than others, and it is speculated that one cause may be linked to the difference between expectations of a sound and the actual sound for a given object.
The Applied Sciences journal is open access and a copy of our article can be downloaded here.

Our meta-analysis wins best JAES paper 2016!

Last year, we published an Open Access article in the Journal of the Audio Engineering Society (JAES) on “A meta-analysis of high resolution audio perceptual evaluation.”


I’m very pleased and proud to announce that this paper won the award for best JAES paper for the calendar year 2016.

We discussed the research a little bit while it was ongoing, and then in more detail soon after publication. The research addressed a contentious issue in the audio industry. For decades, professionals and enthusiasts have engaged in heated debate over whether high resolution audio (beyond CD quality) really makes a difference. So I undertook a meta-analysis to assess the ability to perceive a difference between high resolution and standard CD quality audio. Meta-analysis is a popular technique in medical research, but this may be the first time that it has been formally applied to audio engineering and psychoacoustics. Results showed a highly significant ability to discriminate high resolution content in trained subjects, an ability that had not previously been revealed. With over 400 participants in over 12,500 trials, it represented the most thorough investigation of high resolution audio so far.
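
For readers unfamiliar with the technique, here is a minimal sketch of the inverse-variance (fixed-effect) pooling that underlies most meta-analyses; the numbers are made up and this is not the paper's data or its exact statistical procedure.

```python
def fixed_effect_pool(effects, variances):
    """Inverse-variance (fixed-effect) pooling of per-study effect sizes.

    Each study is weighted by 1/variance, so more precise studies dominate.
    Returns the pooled effect and its standard error.
    """
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    standard_error = (1.0 / sum(weights)) ** 0.5
    return pooled, standard_error

# Illustrative (made-up) per-study effect sizes and variances.
effects = [0.10, 0.25, 0.05, 0.18]
variances = [0.02, 0.05, 0.01, 0.03]
print(fixed_effect_pool(effects, variances))
```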

Since publication, this paper was covered broadly across social media, popular press and trade journals. Thousands of comments were made on forums, with hundreds of thousands of reads.

Here’s one popular independent YouTube video discussing it,

and an interview with Scientific American about it,

and some discussion of it in this article for Forbes magazine (which is actually about the lack of a headphone jack in the iPhone 7).

But if you want to see just how angry this research made people, check out the discussion on hydrogenaudio. Wow, I’ve never been called an intellectually dishonest placebophile apologist before 😉 .

In fact, the discussion on social media was full of misinformation, so I’ll try and clear up a few things here:

When I first started looking into this subject, it became clear that potential issues in the studies were a problem. One option would have been to just give up, but then I’d be adding no rigour to a discussion that I felt wasn’t rigorous enough. It’s the same as not publishing because you don’t get a significant result, only now on a meta scale. And though I did not have a strong opinion either way as to whether differences could be perceived, I could easily be fooling myself. I wanted to avoid any of my own biases or judgement calls. So I set some ground rules.

  • I committed to publishing all results, regardless of outcome.
  • A strong motivation for doing the meta-analysis was to avoid cherry-picking studies, so I included all studies for which there was sufficient data for them to be used in meta-analysis. Even if I thought a study was poor, its conclusions seemed flawed, or it disagreed with my own preconceptions, I included it as long as I could get the minimal data needed for meta-analysis. I then discussed potential issues.
  • Any choices regarding analysis or transformation of data were made a priori, regardless of the result of that choice, in an attempt to minimize any of my own biases influencing the outcome.
  • I did further analysis to look at alternative methods of study selection and representation.

I found the whole process of doing a meta-analysis in this field to be fascinating. In audio engineering and psychoacoustics, there are a wealth of studies investigating big questions, and I hope others will use similar approaches to gain deeper insights and perhaps even resolve some issues.

How does this sound? Evaluating audio technologies

The audio engineering team here have done a lot of work on audio evaluation, both in collaboration with companies and as an essential part of our research. Some challenges come up time and time again, not just in terms of formal approaches, but also in terms of just establishing a methodology that works. I’m aware of cases where a company has put a lot of effort into evaluating the technologies that they create, only for it to make absolutely no difference in the product. So here are some ideas about how to do it, especially from an informal industry perspective.

– When you are tasked with evaluating a technology, you should always maintain a dialogue with the developer. More than anyone else, he or she knows what the tool is supposed to do, how it all works, what content might be best to use and has suggestions on how to evaluate it.


– Developers should always have some test audio content that they use during development. They work with this content all the time to check that the algorithm is modifying or analysing the audio correctly. We’ll come back to this.

– The first stage of evaluation is documentation. Each tool should have some form of user guide, tester guide and developer guide. The idea is that if the technology remains unused for a period of time and those who worked on it have moved on, a new person can read the guides and have a good idea how to use it and test it, and a new developer should be able to understand the algorithm and the source code. Documentation should also include test audio content, preferably both input and output files with information on how the tool should be used with this content.

– The next stage of evaluation is duplication. You should be able to run the tool as suggested in the guide and get the expected results with the test audio. If anything in the documentation is incorrect or incomplete, get in touch with the developers for more information.

– Then we have the collection stage. You need test content to evaluate the tool. The most important content is that which shows off exactly what the tool is intended to do. You should also gather content that tests challenging cases, or content where you need to ensure that the effect doesn’t make things worse.

– The preparation stage is next, though this may be performed in tandem with collection. You may need to edit the test content so that it’s ready to use in testing. You may also want to manually create target content, demonstrating ideal results, or at least content of similar sound quality to the expected results.

– Next is informal perceptual evaluation. This is lots of listening and playing around with the tool. The goal is to identify problems, find out when it works best, identify interesting cases, and find problematic or preferred parameter settings.


– Now on to semi-formal evaluation. Have focused questions that you need to answer, and procedures and methodologies to answer them. Be sure to document your findings, so that you can say what content causes what problem, how and why, etc. This needs to be done so that the problem can be exactly replicated by developers, and so that you can see if the problem still exists in the next iteration.

– Now come the all-important listening tests. Be sure that the technology is at a level such that the test will give meaningful results. You don’t want to ask a bunch of people to listen and evaluate if the tool still has major known bugs. You also want to make sure that the test is structured in such a way that it gives really useful information. This is very important, and often overlooked. Finding out that people preferred implementation A over implementation B is nice, but it’s much better to find out why, and how much, and whether listeners would have preferred something else. You also want to do this test with lots of content. If, for instance, only one piece of content is used in a listening test, then you’ve only found out that people prefer A over B for one example. So, generally, listening tests should involve lots of questions, lots of content, and everything should be randomised to prevent bias. You may not have time to do everything, but it’s definitely worth putting significant time and effort into listening test design.


We’ve developed the Web Audio Evaluation Toolbox, designed to make listening test design and implementation straightforward and high quality.
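
As an informal illustration of the randomisation point above (my own sketch, not part of the Web Audio Evaluation Toolbox), shuffling both the order of trials and the order of stimuli within each trial helps prevent presentation-order bias:

```python
import random

def build_listening_test_plan(songs, conditions, seed=None):
    """Create a randomised listening-test plan.

    Trials (one per song) are shuffled, and the conditions (e.g. implementation
    A, implementation B, a hidden reference) are shuffled independently within
    each trial, so neither song order nor stimulus order is predictable.
    """
    rng = random.Random(seed)
    plan = []
    for song in rng.sample(songs, len(songs)):
        plan.append((song, rng.sample(conditions, len(conditions))))
    return plan

# Three songs, each rated across three anonymised conditions.
print(build_listening_test_plan(["song1", "song2", "song3"], ["A", "B", "reference"], seed=1))
```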

– And there is the feedback stage. Evaluation counts for very little unless all the useful information gets back to developers (and possibly others), and influences further development. All this feedback needs to be prepared and stored, so that people can always refer back to it.

– Finally, there is revisiting and reiteration. If we identify a problem, or a place for improvement, we need to perform the same evaluation on the next iteration of the tool to ensure that the problem has indeed been fixed. Otherwise, issues perpetuate and we never actually know if the tool is improving and problems are resolved and closed.

By the way, I highly recommend the book Perceptual Audio Evaluation by Bech and Zacharov, which is the bible on this subject.

The Mix Evaluation Dataset

Also at the upcoming International Conference on Digital Audio Effects in Edinburgh, 5-8 September, our group’s Brecht De Man will be presenting a paper on his Mix Evaluation Dataset (a pre-release of which can be read here).
It is a collection of mixes and evaluations of these mixes, amassed over the course of his PhD research, that has already been the subject of several studies on best practices and perception of mix engineering processes.
With over 180 mixes of 18 different songs, and evaluations from 150 subjects totalling close to 13k statements (like ‘snare drum too dry’ and ‘good vocal presence’), the dataset is certainly the largest and most diverse of its kind.

Unlike the bulk of previous research on this topic, the data collection methodology presented here has maximally preserved ecological validity by allowing participating mix engineers to use representative, professional tools in their preferred environment. Mild constraints on software, such as the agreement to use the DAW’s native plug-ins, mean that mixes can be recreated completely and analysed in depth from the DAW session files, which are also shared.

The listening test experiments offered a unique opportunity for the participating mix engineers to receive anonymous feedback from peers, and helped create a large body of ratings and free-field text comments. Annotation and analysis of these comments further helped understand the relative importance of various music production aspects, as well as correlate perceptual constructs (such as reverberation amount) with objective features.

Proportional representation of processors in subjective comments

An interface to browse the songs, audition the mixes, and dissect the comments is provided at http://c4dm.eecs.qmul.ac.uk/multitrack/MixEvaluation/, from where the audio (insofar as the source is licensed under Creative Commons, or copyrighted but available online) and perceptual evaluation data can be downloaded as well.

The Mix Evaluation Dataset browsing interface