Audiology and audio production PhD studentship available for UK residents

BBC R&D and Queen Mary University of London’s School of Electronic Engineering and Computer Science have an ICASE PhD studentship available for a talented researcher. It will involve researching the idea of intelligent mixing of broadcast audio content for hearing impaired audiences.

Perceptual Aspects of Broadcast Audio Mixing for Hearing Impaired Audiences

Project Description

This project will explore new approaches to audio production to address hearing loss, a growing concern with an aging population. The overall goal is to investigate, implement and validate original strategies for mixing broadcast content such that it can be delivered with improved perceptual quality for hearing impaired people.

Soundtracks for television and radio content typically have dialogue, sound effects and music mixed together with normal-hearing listeners in mind. But a hearing impairment may result in this final mix sounding muddy and cluttered. First, hearing aid strategies will be investigated, to establish their limitations and opportunities for improving upon them with object-based audio content. Then different mixing strategies will be implemented to counteract the hearing impairment. These strategies will be compared against each other in extensive listening tests, to establish preferred approaches to mixing broadcast audio content.

Requirements and details

This is a fully funded, four-year studentship which includes tuition fees, a travel and consumables allowance, and a stipend covering living expenses.

Skills in signal processing, audio production and auditory models are preferred, though we encourage any interested and talented researchers to apply. A successful candidate will have an academic background in engineering, science or maths.

The student will be based in London. Time will be spent between QMUL’s Audio Engineering team (the people behind this blog) in the Centre for Digital Music and BBC R&D South Lab, with a minimum of six months at each.

The preferred start date is January 2nd, 2019.
All potential candidates must meet UK residency requirements, e.g. normally EU citizen with long-term residence in the UK. Please check the regulations if you’re unsure.

If interested, please contact Prof. Josh Reiss at joshua.reiss@qmul.ac.uk.


Do you hear what I hear? The science of everyday sounds.

I became a professor last year, which is quite a big deal here. On April 17th, I gave my Inaugural lecture, which is a talk on my subject area to the general public. I tried to make it as interesting as possible, with sound effects, videos, a live experiment and even a bit of physical comedy. Here’s the video, and below I have a (sort of) transcript.

The Start

 

What did you just hear, what’s the weather like outside? Did that sound like a powerful, wet storm with rain, wind and thunder, or did it sound fake, was something not quite right? All you had was nearly identical, simple signals from each speaker, and you only received two simple, nearly identical signals, one to each ear.  Yet somehow you were able to interpret all the rich details, know what it was and assess the quality.

Over the next hour or so, we’ll investigate the research that links deep understanding of sound and sound perception to wonderful new audio technologies. We’ll look at how market needs in the commercial world are addressed by basic scientific advances. We will explore fundamental challenges about how we interact with the auditory world around us, and see how this leads to new creative artworks and disruptive innovations.

Sound effect synthesis

But first, let’s get back to the storm sounds you heard. It’s an example of a sound effect, like what might be used in a film. Very few of the sounds that you hear in film or TV, and more and more frequently, in music too, are recorded live on set or on stage.

Such sounds are sometimes created by what is known as Foley, named after Jack Foley, a sound designer working in film and radio from the late 1920s all the way to the early 1960s. In its simplest form, Foley is basically banging pots and pans together and sticking a microphone next to it. It also involves building mechanical contraptions to create all sorts of sounds. Foley sound designers are true artists, but it’s not easy; it’s expensive and time-consuming. And the Foley studio today looks almost exactly the same as it did 60 years ago. The biggest difference is that the photos of the Foley studios are now in colour.

[Images: Foley in the past; Foley today]

But most sound effects come from sample libraries. These consist of tens or hundreds of thousands of high quality recordings. But they are still someone else’s vision of the sounds you might need. They’re never quite right. So sound designers either ‘make do’ with what’s there, or expend effort trying to shape them towards some desired sound. The designer doesn’t have the opportunity to do creative sound design. Reliance on pre-recorded sounds has dictated the workflow. The industry hasn’t evolved, we’re simply adapting old ways to new problems.

In contrast, digital video effects have reached a stunning level of realism, and they don’t rely on hundreds of thousands of stock photos, like the sound designers do with sample libraries. And animation is frequently created by specifying the scene and action to some rendering engine, without designers having to manipulate every little detail.

There might be opportunities for better and more creative sound design. Instead of a sound effect as a chunk of bits played out in sequence, conceptualise the sound generating mechanism, a procedure or recipe that when implemented, produces the desired sound. One can change the procedure slightly, shaping the sound. This is the idea behind sound synthesis. No samples need be stored. Instead, realistic and desired sounds can be generated from algorithms.

This has a lot of advantages. Synthesis can produce a whole range of sounds, like walking and running at any speed on any surface, whereas a sound effect library has only a finite number of predetermined samples. Synthesized sounds can play for any amount of time, but samples are fixed duration. Synthesis can have intuitive controls, like the enthusiasm of an applauding audience. And synthesis can create unreal or imaginary sounds that never existed in nature, a roaring dragon for instance, or Jedi knights fighting with light sabres.

Give this to sound designers, and they can take control, shape sounds to what they want. Working with samples is like buying microwave meals, cheap and easy, but they taste awful and there’s no satisfaction. Synthesis on the other hand, is like a home-cooked meal, you choose the ingredients and cook it the way you wish. Maybe you aren’t a fine chef, but there’s definitely satisfaction in knowing you made it.

This represents a disruptive innovation, changing the marketplace and changing how we do things. And it matters; not just to professional sound designers, but to amateurs and to the consumers, when they’re watching a film and especially, since we’re talking about sound, when they are listening to music, which we’ll come to later in the talk.

That’s the industry need, but there is some deep research required to address it. How do you synthesise sounds? They’re complex, with lots of nuances that we don’t fully understand. A few are easy, like these-

I just played that last one to get rid of the troublemakers in the audience.

But many of those are artificial or simple mechanical sounds. And the rest?

Almost no research is done in isolation, and there’s a community of researchers devising sound synthesis methods. Many approaches are intended for electronic music, going back to the work of Daphne Oram and Delia Derbyshire at the BBC Radiophonic Workshop, or the French musique concrète movement. But they don’t need a high level of realism. Speech synthesis is very advanced, but tailored for speech of course, and doesn’t apply to things like the sound of a slamming door. Other methods concentrate on simulating a particular sound with incredible accuracy. They construct a physical model of the whole system that creates the sound, and the sound is an almost incidental output of simulating the system. But this is computationally expensive and inflexible.

And this is where we are today. The researchers are doing fantastic work on new methods to create sounds, but it’s not addressing the needs of sound designers.

Well, that’s not entirely true.

The games community has been interested in procedural audio for quite some time. Procedural audio embodies the idea of sound as a procedure, and involves looking at lightweight interactive sound synthesis models for use in a game. Start with some basic ingredients; noise, pulses, simple tones. Stir them together with the right amount of each, bake them with filters that bring out various pitches, add some spice and you start to get something that sounds like wind, or an engine or a hand clap. That’s the procedural audio approach.
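As a rough illustration of that recipe, here is a minimal Python sketch of a wind-like sound: white noise as the ingredient, a simple one-pole low-pass filter to "bake" it, and a slow gust control as the spice. The function name `synth_wind`, its parameters and the cutoff range are assumptions made up for this example; it is not one of the models demonstrated in the talk.

```python
import math
import random

def synth_wind(duration_s=1.0, sample_rate=8000, gust_rate_hz=0.5, seed=0):
    """Return a list of samples in [-1, 1] that crudely resembles wind.

    Hypothetical illustration only: noise -> time-varying low-pass -> gain.
    """
    rng = random.Random(seed)
    n = int(duration_s * sample_rate)
    out = []
    state = 0.0  # low-pass filter memory
    for i in range(n):
        t = i / sample_rate
        # Slow "gust" control that swings between 0 and 1.
        gust = 0.5 * (1.0 + math.sin(2.0 * math.pi * gust_rate_hz * t))
        # Basic ingredient: white noise.
        noise = rng.uniform(-1.0, 1.0)
        # One-pole low-pass whose cutoff rises with the gust (assumed range).
        cutoff_hz = 200.0 + 1800.0 * gust
        alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
        state += alpha * (noise - state)
        # Stronger gusts are also louder.
        out.append(state * (0.3 + 0.7 * gust))
    return out

samples = synth_wind(duration_s=0.5)
```

Changing `gust_rate_hz` or the cutoff range reshapes the sound without storing any recording, which is the point of treating a sound as a procedure.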

A few tools have seen commercial use, but they’re specialised and integration of new technology in a game engine is extremely difficult. Such niche tools will supplement but not replace the sample libraries.

A few years ago, my research team demonstrated a sound synthesis model for engine and motor sounds. We showed that this simple software tool could be used by a sound designer to create a diverse range of sounds, and it could match those in the BBC sound effect library, everything from a handheld electric drill to a large boat motor.

 

This is the key. Designed right, one synthesis model can create a huge, diverse range of sounds. And this approach can be extended to simulate an entire effects library using only a small number of versatile models.

That’s what you’ve been hearing. Every sound sample you’ve heard in this talk was synthesised. Artificial sounds created and shaped in real-time. And they can be controlled and rendered in the same way that computer animation is performed. Watch this example, where the synthesized propeller sounds are driven by the scene in just the same way as the animation was.

It still needs work of course. You could hear lots of little mistakes, and the models missed details. And what we’ve achieved so far doesn’t scale. We can create hundreds of sounds that one might want, but not yet thousands or tens of thousands.

But we know the way forward. We have a precious resource, the sound effect libraries themselves. Vast quantities of high quality recordings, tried and tested over decades. We can feed these into machine learning systems to uncover the features associated with every type of sound effect, and then train our models to find settings that match recorded samples.
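The idea of finding model settings that match a recorded sample can be sketched in a very reduced form. The example below is a stand-in, far simpler than any machine learning system: the "model" is a single sine tone, the "feature" is the zero-crossing rate, and the search is a plain grid over candidate frequencies. All names (`synth_tone`, `zero_crossing_rate`, `fit_frequency`) are hypothetical.

```python
import math

def synth_tone(freq_hz, n=400, sample_rate=8000):
    """Toy synthesis model with one setting: the tone frequency."""
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(n)]

def zero_crossing_rate(signal):
    """Toy feature: fraction of adjacent sample pairs that change sign."""
    crossings = sum(1 for a, b in zip(signal, signal[1:]) if (a < 0) != (b < 0))
    return crossings / (len(signal) - 1)

def fit_frequency(target, candidates):
    """Pick the candidate setting whose output best matches the target feature."""
    target_feature = zero_crossing_rate(target)
    return min(
        candidates,
        key=lambda f: abs(zero_crossing_rate(synth_tone(f, len(target))) - target_feature),
    )

# Pretend this is a library recording; here it is just a 440 Hz tone.
recording = synth_tone(440)
best = fit_frequency(recording, range(100, 1001, 20))
```

A realistic system would use many features and many parameters per model, but the loop is the same: extract features from the recording, then search for settings that reproduce them.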

We can go further, and use this approach to learn about sound itself. What makes a rain storm sound different from a shower? Is there something in common with all sounds that startle us, or all sounds that calm us? The same approach that hands creativity back to sound designers, resulting in wonderful new sonic experiences, can also tell us so much about sound perception.

Hot versus cold

I pause, say “I’m thirsty”. I have an empty jug and pretend to pour

Pretend to throw it at the audience.

Just kidding. That’s another synthesised sound. It’s a good example of this hidden richness in sounds. You knew it was pouring because the gesture helped, and there is an interesting interplay between our visual and auditory senses. You also heard bubbles, splashes, the ring of the container that it’s poured into. But do you hear more?

I’m going to run a little experiment. I have two sound samples, hot water being poured and cold water being poured. I want you to guess which is which.

Listen and try it yourself at our previous blog entry on the sound of hot and cold water.

I think it’s fascinating that we can hear temperature. There must be some physical phenomenon affecting the sound, which we’ve learned to associate with heat. But what’s really interesting is what I found when I looked online. Lots of people have discussed this. One argument goes ‘Cold water is more viscous or sticky, and so it gives high pitched sticky splashes.’ That makes sense. But another argument states ‘There are more bubbles in a hot liquid, and they produce high frequency sounds.’

Wait, they can’t both be right. So we analysed recordings of hot and cold water being poured, and it turns out they’re both wrong! The same tones are there in both recordings, so essentially the same pitch. But the strengths of the tones are subtly different. Some sonic aspect is always present, but its loudness is a function of temperature. We’re currently doing analysis to find out why.

And no one noticed! In all the discussion, no one bothered to do a little critical analysis or an experiment. It’s an example of a faulty assumption, that because you can come up with a solution that makes sense, it should be the right one. And it demonstrates the scientific method; nothing is known until it is tested and confirmed, repeatedly.

Intelligent Music Production

It’s amazing what such subtle changes can do, how they can indicate elements that one never associates with hearing. Audio production thrives on such subtle changes and there is a rich tradition of manipulating them to great effect. Music is created not just by the composer and performers. The sound engineer mixes and edits it towards some artistic vision. But phrasing the work of a mixing engineer as an art form is a double-edged sword; we aren’t doing justice to the technical challenges. The sound engineer is, after all, an engineer.

In audio production, whether for broadcast, live sound, games, film or music, one typically has many sources. They each need to be heard simultaneously, but can all be created in different ways, in different environments and with different attributes. Some may mask each other, some may be too loud or too quiet. The final mix should have all sources sound distinct yet contribute to a nice clean blend of the sounds. To achieve this is very labour intensive and requires a professional engineer. Modern audio production systems help, but they’re incredibly complex and all require manual manipulation. As technology has grown, it has become more functional but not simpler for the user.

In contrast, image and video processing has become automated. The modern digital camera comes with a wide range of intelligent features to assist the user; face, scene and motion detection, autofocus and red eye removal. Yet an audio recording or editing device has none of this. It is essentially deaf; it doesn’t listen to the incoming audio and has no knowledge of the sound scene or of its intended use. There is no autofocus for audio!

Instead, the user is forced to accept poor sound quality or do a significant amount of manual editing.

But perhaps intelligent systems could analyse all the incoming signals and determine how they should be modified and combined. This has the potential to revolutionise music production, in effect putting a robot sound engineer inside every recording device, mixing console or audio workstation. Could this be achieved? This question gets to the heart of what is art and what is science, what is the role of the music producer and why we prefer one mix over another.

But unlike replacing sound effect libraries, this is not a big data problem. Ideally, we would get lots of raw recordings and the produced content that results. Then extract features from each track and the final mix in order to establish rules for how audio should be mixed. But we don’t have the data. It’s not difficult to access produced content. But the initial multitrack recordings are some of the most highly guarded copyright material. This is the content that recording companies can use over and over, to create remixes and remastered versions. Even if we had the data, we don’t know the features to use and we don’t know how to manipulate those features to create a good mix. And mixing is a skilled craft. Machine learning systems are still flawed if they don’t use expert knowledge.

There’s a myth that as long as we get enough data, we can solve almost any problem. But lots of problems can’t be tackled this way. I thought weather prediction was done by taking all today’s measurements of temperature, humidity, wind speed, pressure… Then tomorrow’s weather could be guessed by seeing what happened the day after there were similar conditions in the past. But a meteorologist told me that’s not how it works. Even with all the data we have, it’s not enough. So instead we have a weather model, based on how clouds interact, how pressure fronts collide, why hurricanes form, and so on. We’re always running this physical model, and just tweaking parameters and refining the model as new data comes in. This is far more accurate than relying on mining big data.

You might think this would involve traditional signal processing, established techniques to remove noise or interference in recordings. It’s true that some of what the sound engineer does is to correct artifacts due to issues in the recording process. And there are techniques like echo cancellation, source separation and noise reduction that can address this. But this is only a niche part of what the sound engineer does, and even then the techniques have rarely been optimised for real world applications.

There’s also multichannel signal processing, where one usually attempts to extract information regarding signals that were mixed together, like acquiring a GPS signal buried in noise. But in our case, we’re concerned with how to mix the sources together in the first place. This opens up a new field which involves creating ways to manipulate signals to achieve a desired output. We need to identify multitrack audio features, related to the relationships between musical signals, and develop audio effects where the processing on any sound is dependent on the other sounds in the mix.
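To make "processing on any sound is dependent on the other sounds in the mix" concrete, here is a minimal sketch of a cross-adaptive gain stage. Each track's gain is derived from the levels of all tracks, so that every track ends up at the same RMS level. The equal-loudness target and the function names are assumptions chosen for illustration; real mixing rules are far more nuanced, and this is not the algorithm used by the author's team.

```python
import math

def rms(track):
    """Root-mean-square level of one track (a list of samples)."""
    return math.sqrt(sum(s * s for s in track) / len(track))

def cross_adaptive_gains(tracks):
    """Gain per track, computed from the levels of ALL tracks.

    Assumed rule: bring every track to the average RMS level of the set.
    """
    levels = [rms(t) for t in tracks]
    target = sum(levels) / len(levels)
    # Leave silent tracks untouched to avoid dividing by zero.
    return [target / lv if lv > 0 else 1.0 for lv in levels]

def mix(tracks):
    """Apply the cross-adaptive gains and sum the tracks sample by sample."""
    gains = cross_adaptive_gains(tracks)
    n = min(len(t) for t in tracks)
    return [sum(g * t[i] for g, t in zip(gains, tracks)) / len(tracks) for i in range(n)]

loud = [0.8, -0.8] * 50
quiet = [0.1, -0.1] * 50
gains = cross_adaptive_gains([loud, quiet])
mixed = mix([loud, quiet])
```

Adding or removing a track changes the gains of every other track, which is what distinguishes this from applying a fixed effect to each channel independently.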

And there is little understanding of how we perceive audio mixes. Almost all studies have been restricted to lab conditions; like measuring the perceived level of a tone in the presence of background noise. This tells us very little about real world cases. It doesn’t say how well one can hear lead vocals when there are guitar, bass and drums.

Finally, best practices are not understood. We don’t know what makes a good mix and why one production will sound dull while another makes you laugh and cry, even though both are on the same piece of music, performed by competent sound engineers. So we need to establish what is good production, how to translate it into rules and exploit it within algorithms. We need to step back and explore more fundamental questions, filling gaps in our understanding of production and perception. We don’t know where the rules will be found, so multiple approaches need to be taken.

The first approach is one of the earliest machine learning methods, knowledge engineering. It’s so old school that it’s gone out of fashion. It assumes experts have already figured things out; they are experts after all. So let’s look at the sound engineering literature and work with experts to formalise their approach. Capture best practices as a set of rules and processes. But this is no easy task. Most sound engineers don’t know what they did. Ask a famous producer what he or she did on a hit song and you often get an answer like ‘I turned the knob up to 11 to make it sound phat.’ How do you turn that into a mathematical equation? Or worse, they say it was magic and can’t be put into words.

To give you an idea, we had a technique to prevent acoustic feedback, that high pitched squeal you sometimes hear when a singer first approaches a microphone. We thought we had captured techniques that sound engineers often use, and turned it into an algorithm. To verify this, I was talking to an experienced live sound engineer and asked when was the last time he had feedback at one of the gigs where he ran the sound. ‘Oh, that never happens for me,’ he said. That seemed strange. I knew it was a common problem. ‘Really, never ever?’ ‘No, I know what I’m doing. It doesn’t happen.’ ‘Not even once?’ ‘Hmm, maybe once but it’s extremely rare.’ ‘Tell me about it.’ ‘Well, it was at the show I did last night…’! See, it’s a tricky situation. The sound engineer does have invaluable knowledge, but also has to protect their reputation as being one of a select few that know the secrets of the trade.

So we’re working with domain experts, generating hypotheses and formulating theories. We’ve been systematically testing all the assumptions about best practices and supplementing them with lots of listening tests. These studies help us understand how people perceive complex sound mixtures and identify attributes necessary for a good sounding mix. And we know the data will help. So we’re also curating multitrack audio, with detailed information about how it was recorded, often with multiple mixes and evaluations of those mixes.

By combining these approaches, my team have developed intelligent systems that automate much of the audio and music production process. Prototypes analyse all incoming sounds and manipulate them in much the same way a professional operates the controls at a mixing desk.

I didn’t realise at first the importance of this research. But I remember giving a talk once at a convention in a room that had panel windows all around. The academic talks are usually half full. But this time it was packed, and I could see faces outside all pressed up against the windows. They all wanted to find out about this idea of automatic mixing. It’s a unique opportunity for academic research to have transformational impact on an entire industry. It addresses the fact that music production technologies are often not fit for purpose. Intelligent mixing systems automate the technical and mundane, allowing sound engineers to work more productively and creatively, opening up new opportunities. Audio quality could be improved, amateur musicians can create high quality mixes of their content, small venues can put on live events without needing a professional engineer, time and preparation for soundchecks could be drastically reduced, and large venues and broadcasters could significantly cut manpower costs.

It’s controversial. We once entered an automatic mix in a student recording competition as a sort of Turing Test. Technically, we were cheating, because all the mixes were supposed to be made by students, but in our case it was made by an ‘artificial intelligence’ created by a student. We didn’t win of course, but afterwards I asked the judges what they thought of the mix, and then told them how it was done. The first two were surprised and curious. The third judge offered useful comments when he thought it was a student mix, but when I told him that it was an ‘automatic mix’, he suddenly switched and said it was rubbish and he could tell all along.

Mixing is a creative process where stylistic decisions are made. Is this taking away creativity, is it taking away jobs? Will it result in music sounding more the same? Such questions come up time and time again with new technologies, going back to 19th century protests by the Luddites, textile workers who feared that time spent on their skills and craft would be wasted as machines could replace their role in industry.

These are valid concerns, but it’s important to see other perspectives. A tremendous amount of audio production work is technical, and audio quality would be improved by addressing these problems. As the graffiti artist Banksy said:

“All artists are willing to suffer for their work. But why are so few prepared to learn to draw?” – Banksy


Creativity still requires technical skills. To achieve something wonderful when mixing music, you first have to achieve something pretty good and address issues with masking, microphone placement, level balancing and so on.

The real benefit is not replacing sound engineers. It’s dealing with all those situations when a talented engineer is not available: the band practicing in the garage, the small pub or restaurant venue that does not provide any support, or game audio, where dozens of incoming sounds need to be mixed and there is no miniature sound guy living inside the games console.

High resolution audio

The history of audio production is one of continual innovation. New technologies arise to make the work easier, but artists also figure out how to use that technology in new creative ways. And the artistry is not the only element music producers care about. They’re interested, some would say obsessed, with fidelity. They want the music consumed at home to be as close as possible to the experience of hearing it live. But we consume digital audio. Sound waves are transformed into bits and then transformed back to sound when we listen. We sample sound many times a second and render each sample with so many bits. Luckily, there is a very established theory on how to do the sampling.

We only hear frequencies up to about 20 kHz. That’s a wave which repeats 20,000 times a second. There’s a famous theorem by Claude Shannon and Harry Nyquist which states that you need twice that number of samples a second to fully represent a signal up to 20 kHz, so sample at 40,000 samples a second, or 40 kHz. So the standard music format, 16 bit samples and 44.1 kHz sampling rate, should be good enough.
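The arithmetic behind this can be written out in a few lines. The sketch below computes the minimum rate for a given highest frequency and shows what happens to a tone that is too high for the chosen rate: it "folds back" and appears at a lower frequency (aliasing). The function names are made up for this example, and strictly speaking the sampling rate must exceed twice the highest frequency, which is one reason practical formats leave some headroom above 40 kHz.

```python
def min_sample_rate_hz(max_freq_hz):
    """Lower bound on the sampling rate for content up to max_freq_hz."""
    return 2 * max_freq_hz

def apparent_frequency_hz(tone_hz, sample_rate_hz):
    """Frequency at which a sampled pure tone appears after folding (aliasing)."""
    nearest_multiple = round(tone_hz / sample_rate_hz) * sample_rate_hz
    return abs(tone_hz - nearest_multiple)

needed = min_sample_rate_hz(20_000)               # 40,000 samples a second
in_band = apparent_frequency_hz(20_000, 44_100)   # below half the rate: unchanged
folded = apparent_frequency_hz(25_000, 44_100)    # above half the rate: folds down
```

This folding is why content above half the sampling rate has to be filtered out before conversion to digital, a point the talk returns to later.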


But most music producers want to work with higher quality formats and audio companies make equipment for recording and playing back audio in these high resolution formats. Some people swear they hear a difference, others say it’s a myth and people are fooling themselves. What’s going on? Is the sampling theorem, which underpins all signal processing, fundamentally wrong? Have we underestimated the ability of our own ears, in which case the whole field of audiology is flawed? Or could it be that the music producers and audiophiles, many of whom are renowned for their knowledge and artistry, are deluded?

Around the time I was wondering about this, I went to a dinner party and was sat across from a PhD student. His PhD was in meta-analysis, and he explained that it was when you gather all the data from previous studies on a question and do formal statistical analysis to come up with more definitive results than the original studies. It’s a major research method in evidence-based medicine, and every few weeks a meta-analysis makes headlines because it shows the effectiveness or lack of effectiveness of treatments.

So I set out to do a meta-analysis. I tried to find every study that ever looked at perception of high resolution audio, and get their data. I scoured every place they could have been published and asked everyone in the field, all around the world. One author literally found his old data tucked away in the back of a filing cabinet. Another couldn’t get permission to provide the raw data, but told me enough about it for me to write a little program that ran through all possible results until it found the details that would reproduce the summary data as well. In the end, I found 18 relevant studies and could get data from all of them except one. That was strange, since it was the most famous study. But the authors had ‘lost’ the data, and got angry with me when I asked them for details about the experiment.
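The "little program" is not shown in the talk, so the following is only a guess at the general technique: enumerate every possible set of per-participant scores and keep those whose summary statistics match what was reported. The summary format assumed here (a mean and a population standard deviation, both rounded) and all the names are hypothetical.

```python
from itertools import combinations_with_replacement
from statistics import mean, pstdev

def candidate_scores(n_participants, n_trials, reported_mean, reported_sd, decimals=2):
    """Brute-force search for score sets consistent with rounded summary data.

    Each participant's score is the number of correct answers out of n_trials.
    Order does not matter, so sorted combinations are enough.
    """
    matches = []
    for scores in combinations_with_replacement(range(n_trials + 1), n_participants):
        if (round(mean(scores), decimals) == reported_mean
                and round(pstdev(scores), decimals) == reported_sd):
            matches.append(scores)
    return matches

# Toy case: 4 participants, 10 trials each, reported mean 6.5 and SD 1.8.
possible = candidate_scores(4, 10, 6.5, 1.8)
```

More than one set of scores can fit the same summary, and the search space grows quickly with the number of participants, so this kind of exhaustive search is only practical for small studies or when extra details narrow the options.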

The results of the meta-analysis were fascinating, and not at all what I expected. There were researchers who thought their data had or hadn’t shown an effect, but when you apply formal analysis, it’s the opposite. And a few experiments had major flaws. For instance, in one experiment many of the high resolution recordings were actually standard quality, which means there never was a difference to be perceived. In another, test subjects were given many versions of the same audio, including a direct live feed, and asked which sounds closer to live. People actually ranked the live feed as sounding least close to live, indicating they just didn’t know what to listen for.

As for the one study where the authors lost their data? Well, they had published some of it, but it basically went like this. 55 participants listened to many recordings many times and could not discriminate between high resolution and standard formats. But men discriminated more than women, older far more than younger listeners, audiophiles far more than nonexperts. Yet only 3 people ever guessed right more than 6 times out of 10. The chance of all this happening by luck if there really was no difference is less likely than winning the lottery. It’s extremely unlikely even if there was a difference to be heard. Conclusion: they faked their data.
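One ingredient of this kind of plausibility check can be sketched with the binomial distribution. The example assumes, purely for illustration, that each of the 55 listeners made exactly 10 independent guesses with a 50% chance of being right; the real experiment had more trials and the author's actual analysis is not reproduced here. Under that assumption it asks how likely it is that so few listeners would score above 6 out of 10 by pure guessing.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def prob_more_than(k, n, p):
    """P(X > k) for a binomial variable."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1, n + 1))

def prob_at_most(k, n, p):
    """P(X <= k) for a binomial variable."""
    return sum(binom_pmf(i, n, p) for i in range(0, k + 1))

# Chance that one pure guesser gets more than 6 right out of 10 (assumed trial count).
p_high_scorer = prob_more_than(6, 10, 0.5)

# Chance that at most 3 of 55 pure guessers manage that.
p_so_few = prob_at_most(3, 55, p_high_scorer)
```

With these assumptions roughly nine of the 55 listeners would be expected to beat 6 out of 10 by luck alone, so seeing only three is itself unusual. Results that cluster too tightly around chance can be as suspicious as results that stray too far from it.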

And this was the study which gave the most evidence that people couldn’t hear anything extra in high resolution recordings. In fact the studies with the most flaws were those that didn’t show an effect. Those that found an effect were generally more rigorous and took extra care in their design, set-up and analysis. This was counterintuitive. People are always looking for a new cure or a new effect. But in this case, there was a bias towards not finding a result. It seems researchers wanted to show that the claims of hearing a difference are false.

The biggest factor was training. Studies where subjects, even those experienced working with audio, just came in and were asked to state when two versions of a song were the same, rarely performed better than chance. But if they were told what to listen for, given examples, were told when they got it right or wrong, and then came back and did it under blind controlled conditions, they performed far better. All studies where participants were given training gave higher results than all studies where there was no training. So it seems we can hear a difference between standard and high resolution formats, we just don’t know what to listen for. We listen to music every day, but we do it passively and rarely focus on recording quality. We don’t sit around listening for subtle differences in formats, but they are there and they can be perceived. To audiophiles, that’s a big deal.

In 2016 I published this meta-analysis in the Journal of the Audio Engineering Society, and it created a big splash. I had a lot of interviews in the press, and it was discussed on social media and internet forums. And that’s when I found out, people on the internet are crazy! I was accused of being a liar, a fraud, paid by the audio industry, writing press releases, working the system and pushing an agenda. These criticisms came from all sides, since differences were found which some didn’t think existed, but they also weren’t as strong as others wanted them to be. I was also accused of cherry-picking the studies, even though one of the goals of the paper was to avoid exactly that, which is why I included every study I could find.

But my favorite comment was when someone called me an ‘intellectually dishonest placebophile apologist’. Whoever wrote that clearly spent time and effort coming up with a convoluted insult.

It wasn’t just people online who were crazy. At an Audio Engineering Society convention, two people were discussing the paper. One was a multi-Grammy-award-winning mixing engineer and inventor, the other had a distinguished career as chief scientist at a major audio company.

What started as discussion escalated to heated argument, then shouting, then pushing and shoving. It was finally broken up when a famous mastering engineer intervened. I guess I should be proud of this.

I learned what most people already know, how very hard it is to change people’s minds once an opinion has been formed. And people rarely look at the source. Instead, they rely on biased opinions discussing that source. But for those interested in the issue whose minds were not already made up, I think the paper was useful.

I’m trying to figure out why we hear this difference. It’s not due to problems with the high resolution audio equipment; that was checked in every study that found a difference. There’s no evidence that people have super hearing or that the sampling theorem is violated. But we need to remove all the high frequencies in a signal before we convert it to digital, even if we don’t hear them. That brings up another famous theorem, the uncertainty principle. In quantum mechanics, it tells us that we can’t resolve a particle’s position and momentum at the same time. In signal processing, it tells us that restricting a signal’s frequency content will make us less certain about its temporal aspects. When we remove those inaudible high frequencies, we smear out the signal. It’s a small effect, but this slight spreading of the sound in time may be audible.
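The smearing itself is easy to see numerically. The sketch below is only an illustration under my own assumptions: a 192 kHz sample rate, cutoffs of 40 kHz and 20 kHz, an idealised brick-wall filter applied in the frequency domain, and a crude width measure (samples above half the peak). Real converters use different filters, and nothing here says whether the effect is audible.

```python
import numpy as np

def band_limit(signal, sample_rate, cutoff_hz):
    """Zero every frequency component above cutoff_hz (an idealised brick-wall filter)."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    spectrum[freqs > cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))

def width_in_samples(signal):
    """Count the samples whose magnitude is at least half the peak magnitude."""
    magnitude = np.abs(signal)
    return int(np.sum(magnitude >= 0.5 * magnitude.max()))

sample_rate = 192_000      # assumed high resolution sample rate, in Hz
click = np.zeros(4096)
click[2048] = 1.0          # a single-sample click: perfectly localised in time

for cutoff_hz in (96_000, 40_000, 20_000):
    limited = band_limit(click, sample_rate, cutoff_hz)
    microseconds = 1e6 * width_in_samples(limited) / sample_rate
    print(f"cutoff {cutoff_hz} Hz: main lobe about {microseconds:.1f} microseconds wide")
```

The lower the cutoff, the wider the click becomes: restricting the frequency content costs precision in time.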

The End

The sounds around us shape our perception of the world. We saw that in films, games, music and virtual reality, we recreate those sounds or create unreal sounds to evoke emotions and capture the imagination. But there is a world of fascinating phenomena related to sound and perception that is not yet understood. Can we create an auditory reality without relying on recorded samples? Could a robot replace the sound engineer, should it? Investigating such questions has led to a deeper understanding of auditory perception, and has the potential to revolutionise sound design and music production.

What are the limits of human hearing? Do we make far greater use of auditory information than simple models can account for? And if so, can we feed this back for better audio production and sound design?

To answer these questions, we need to look at the human auditory system. Sound waves are transferred to the inner ear, which contains one of the most amazing organs in the human body, the cochlea. 3,500 inner hair cells line the cochlea, and resonate in response to frequencies across the audible range. These hair cells connect to a nerve string containing 30,000 neurons which can fire 600 pulses a second. So the brainstem receives up to 18 million pulses per second. Hence the cochlea is a very high resolution frequency analyser with digital outputs. Audio engineers would pay good money for that sort of thing, and we have two of them, free, inside our heads!
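The 18 million figure is just the two numbers above multiplied together, treated as an upper bound where every neuron fires at its maximum rate at once. As a quick check (the variable names are mine):

```python
neurons = 30_000                 # neurons in the auditory nerve, as quoted above
max_pulses_per_second = 600      # maximum firing rate per neuron, as quoted above

total_pulses_per_second = neurons * max_pulses_per_second
print(f"{total_pulses_per_second:,} pulses per second")  # 18,000,000 pulses per second
```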

The pulses carry frequency and temporal information about sounds. This is sent to the brain’s auditory cortex, where hearing sensations are stored as aural activity images. They’re compared with previous aural activity images, other sensory images and overall context to get an aural scene representing the meaning of hearing sensations. This scene is made available to other processes in the brain, including thought processes such as audio assessment. It’s all part of 100 billion brain cells with 500 trillion connections, a massively powerful machine to manage body functions, memory and thinking.

These connections can be rewired based on experiences and stimuli. We have the power to learn new ways to process sounds. The perception is up to us. Like we saw with hot and cold water sounds, with perception of sound effects and music production, with high resolution audio, we have the power to train ourselves to perceive the subtlest aspects. Nothing is stopping us from shaping and appreciating a better auditory world.

Credits

All synthesised sounds created using FXive.

Sound design by Dave Moffat.

Synthesised sounds by Thomas Vassallo, Parham Bahadoran, Adan Benito and Jake Lee

Videos by Enrique Perez Gonzalez (automatic mixing) and Rod Selfridge (animation).

Special thanks to all my current and former students and researchers, collaborators and colleagues. See the video for the full list.

And thanks to my lovely wife Sabrina and daughter Eliza.

Your PhD examination – the best defense is a good offense

Previously, I’ve written a few blog entries giving research advice, like ‘So you want to write a research paper‘ and ‘What a PhD thesis is really about… really!‘ I thought I’d come up with a good title for this blog entry, but then I saw this.

[Image: thesis_defense]

The PhD examination is certainly one of the most important moments in a researcher’s career. Its structure differs from country to country, institution to institution, and subject to subject. In some places, the PhD examination is open to the public, and failure is very rare. The student wouldn’t get to that stage unless the committee was confident that only minor issues remained. It might even be a bit of an event, with the committee wearing gowns and some of the student’s family attending.

But in most countries and most subjects it’s a bit more adversarial and passing is not guaranteed. It usually has a small committee. A public talk might be given, but the question and answer sessions are just the student and the committee.

There is lots and lots of guidance online about how to prepare for a PhD exam, and I’m not going to try to summarise it. Instead, I’ll give you some insights from my own experience of being examined, preparing others for a PhD examination, and doing the examination myself, with students who ranged from near flawless to, unfortunately, almost hopeless.

First off, congratulations for getting to this stage. That is already a major achievement. And keep in mind that ultimately, it’s the document itself that is most important. If your thesis is strong, and you can explain it and discuss it well, then you’re already in a good position for the defense.

I’ve noticed that there are questions which seem relevant for me to ask in most PhD examinations, and other examiners tend to ask similar ones. So you can certainly prepare for them. The first are the sort of general PhD study questions; what’s it all about? Here’s a few typical ones.

  • Can you summarise your key findings?
  • What is your main contribution?
  • What is novel/significant/new?
  • What is the impact? What is your contribution to the field?
  • What are the weakest parts of your thesis?
  • Knowing what you know now, what would you change?

If there were aspects of your PhD study that were unusual, they also might ask you just to clarify things. For instance, I once examined a PhD whose research had taken a very long time. I wanted to know if there was research that hadn’t made it into the thesis, or whether there were technical issues that made the research more challenging. So I asked a question something like, ‘When did you start your PhD research? Were there technical reasons it took so long?’ As it turned out, it was due to a perfectly understandable change of supervisor.

And the examiners will want to know what you know about your subject area and the state of the art.

  • Who else is doing research in this subject?
  • What are the most significant results in the last few years?
  • How does your approach differ from others?
  • Please characterise and summarise other approaches to your topic.

Then there will be some questions specific to your field. These questions might touch on the examiners’ knowledge, or on specific aspects of the literature that may or may not have been mentioned in the thesis.

  • Explain, in your own words, the following concepts -.
  • Compare the – and -. What are the fundamental differences?
  • Is all of your work relevant to other — challenges?
  • Why use —? Are there other approaches?
  • How does your work connect to — and — research?

And many examiners will want to know about the impact of the research so far, e.g. publications or demonstrators. If you do have any demonstrations (audio samples, videos, software, interfaces), it’s a good idea to present them, or at least be ready to present them.

  • Are the community aware of your work? Are people using your software?
  • Do you have any publications?
  • Which (other) results could you publish, and where?
  • Have you attended or presented at any conferences? What did you learn from them?

Then typically, the examiners start diving into the fine details of the thesis. So you should know where to find anything in your own document. It’s also a good idea to reread your whole document a couple of days before the examination, so that it’s all fresh in your mind. It could have been a long time since you wrote it!

And best of luck to you!

The Audiovisual bounce-inducing effect (Bounce, bounce, bounce… Part II)

Last week we talked about bouncing sounds. It’s very much a physical phenomenon, but a lot has been made of a perceptual effect sometimes referred to as the ‘Audiovisual bounce-inducing effect.’ The idea is that if someone is presented with two identical objects moving on a screen in opposing directions and crossing paths, they appear to do just that: cross paths. But if a short sound is played at the moment they first intersect, they appear to bounce off each other.

I’ve read a couple of papers on this, and browsed a few more, and I’ve yet to see anything interesting here.

Consider the figures below. On the left are the two paths taken by the two objects, one with short dashes in blue, one with long dashes in red. Since they are identical (usually just circles on a computer screen), it could just as easily be the paths shown on the right.

[Figure: the two possible interpretations of the objects’ paths. Left: the paths cross. Right: the paths bounce.]

So which one is perceived? Well, two common occurrences are:

– Two objects, and one of them passes behind the other. This usually doesn’t produce a sound.
– Two objects, and they bounce off each other, producing the sound of a bounce.

If you show the objects without a sound, it perfectly matches the first scenario. It would be highly unlikely to perceive this as a bounce since then we would expect to hear something. On the other hand, if you play a short sound at the moment the two objects interact, even if it doesn’t exactly match a ‘bounce sound’, it is still a noise at the moment of visual contact. And so this is much more likely to be perceived as a bounce (which clearly produces a sound) than as passing by (which doesn’t). Further studies showed that the more ‘bounce-like’ the sound is, the more likely it is to be perceived as a bounce, and it’s less likely to be perceived as a bounce if similar sounds are also played when the objects do not intersect.

The literature gives all sorts of fanciful explanations for the basic phenomenon. And maybe someone can enlighten me as to why this is interesting. I suppose, if one begins with the assumption that auditory cues (even silence) do not play a role in perception of motion, then this may be surprising. But to me, this just seems to match everyday experience of sight and sound, and is intuitively obvious.

I should also note that in one of the papers on the ‘Audiovisual bounce-inducing effect’ (Watanabe and Shimojo 2001), the authors committed the cardinal sin of including one of the authors as a test subject and performing standard statistical analysis on the results. There are situations when this sort of thing may be acceptable or even appropriate*, but in that case one should be very careful to take it into account in any analysis and interpretation of results.

* In the following two papers, participants rated multitrack audio mixes, where one of the mixes had been created by the participant. But this was intentional, to see whether the participant would rate their own mix highly.

And here’s just a few references on the audiovisual bounce inducing effect.

Grassi M, Casco C. Audiovisual bounce-inducing effect: When sound congruence affects grouping in vision. Attention, Perception, & Psychophysics. 2010 Feb 1;72(2):378-86.

Remijn GB, Ito H, Nakajima Y. Audiovisual integration: An investigation of the ‘streaming-bouncing’ phenomenon. Journal of Physiological Anthropology and Applied Human Science. 2004;23(6):243-7.

Watanabe K, Shimojo S. When sound affects vision: effects of auditory grouping on visual motion perception. Psychological Science. 2001 Mar;12(2):109-16.

Zeljko M, Grove PM. Sensitivity and Bias in the Resolution of Stream-Bounce Stimuli. Perception. 2017 Feb;46(2):178-204.

How does this sound? Evaluating audio technologies

The audio engineering team here have done a lot of work on audio evaluation, both in collaboration with companies and as an essential part of our research. Some challenges come up time and time again, not just in terms of formal approaches, but also in terms of just establishing a methodology that works. I’m aware of cases where a company has put a lot of effort into evaluating the technologies that they create, only for it to make absolutely no difference in the product. So here are some ideas about how to do it, especially from an informal industry perspective.

– When you are tasked with evaluating a technology, you should always maintain a dialogue with the developer. More than anyone else, he or she knows what the tool is supposed to do, how it all works, what content might be best to use and has suggestions on how to evaluate it.

– Developers should always have some test audio content that they use during development. They work with this content all the time to check that the algorithm is modifying or analysing the audio correctly. We’ll come back to this.

– The first stage of evaluation is documentation. Each tool should have some form of user guide, tester guide and developer guide. The idea is that if the technology remains unused for a period of time and those who worked on it have moved on, a new person can read the guides and have a good idea how to use it and test it, and a new developer should be able to understand the algorithm and the source code. Documentation should also include test audio content, preferably both input and output files with information on how the tool should be used with this content.

– The next stage of evaluation is duplication. You should be able to run the tool as suggested in the guide and get the expected results with the test audio. If anything in the documentation is incorrect or incomplete, get in touch with the developers for more information.

– Then we have the collection stage. You need test content to evaluate the tool. The most important content is that which shows off exactly what the tool is intended to do. You should also gather content that tests challenging cases, or content where you need to ensure that the effect doesn’t make things worse.

– The preparation stage is next, though this may be performed in tandem with collection. You may need to edit the test content so that it’s ready to use in testing. You may also want to manually create target content, demonstrating ideal results, or at least of similar sound quality to expected results.

– Next is informal perceptual evaluation. This is lots of listening and playing around with the tool. The goal is to identify problems, find out when it works best, identify interesting cases, problematic or preferred parameter settings.

– Now on to semi-formal evaluation. Have focused questions that you need to find the answer to and procedures and methodologies to answer them. Be sure to document your findings, so that you can say what content causes what problem, how and why, etc. This needs to be done so that the problem can be exactly replicated by developers, and so that you can see if the problem still exists in the next iteration.

– Now come the all-important listening tests. Be sure that the technology is at a level such that the test will give meaningful results. You don’t want to ask a bunch of people to listen and evaluate if the tool still has major known bugs. You also want to make sure that the test is structured in such a way that it gives really useful information. This is very important, and often overlooked. Finding out that people preferred implementation A over implementation B is nice, but it’s much better to find out why, and how much, and if listeners would have preferred something else. You also want to do this test with lots of content. If, for instance, only one piece of content is used in a listening test, then you’ve only found out that people prefer A over B for one example. So, generally, listening tests should involve lots of questions, lots of content, and everything should be randomised to prevent bias. You may not have time to do everything, but it’s definitely worth putting significant time and effort into listening test design.

We’ve developed the Web Audio Evaluation Toolbox, designed to make listening test design and implementation straightforward and high quality.

– And there is the feedback stage. Evaluation counts for very little unless all the useful information gets back to developers (and possibly others), and influences further development. All this feedback needs to be prepared and stored, so that people can always refer back to it.

– Finally, there is revisiting and reiteration. If we identify a problem, or a place for improvement, we need to perform the same evaluation on the next iteration of the tool to ensure that the problem has indeed been fixed. Otherwise, issues perpetuate and we never actually know if the tool is improving and problems are resolved and closed.
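To give a flavour of the randomisation recommended for listening tests above, here is a minimal sketch of building one participant’s session. It is not how the Web Audio Evaluation Toolbox works; the function name, the content names and the system labels are all hypothetical. Both the order of the content and the order in which the systems are presented within each trial are shuffled, and the shuffle is seeded per participant so that a session can be reproduced later.

```python
import random

def build_session(content_items, systems, participant_id):
    """Return a randomised list of trials for one participant."""
    rng = random.Random(participant_id)   # seeded so the same session can be rebuilt
    trials = []
    for item in content_items:
        presentation_order = list(systems)
        rng.shuffle(presentation_order)   # randomise which system is heard first
        trials.append({"content": item, "presentation_order": presentation_order})
    rng.shuffle(trials)                   # randomise the order of the content itself
    return trials

# Hypothetical test material and implementations under comparison.
content = ["speech_clip", "rock_mix", "orchestral_excerpt"]
session = build_session(content, ["A", "B"], participant_id=1)
for trial in session:
    print(trial["content"], trial["presentation_order"])
```

Storing the participant identifier alongside the answers means that any odd result can be traced back to the exact order in which that person heard things.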

By the way, I highly recommend the book Perceptual Audio Evaluation by Bech and Zacharov, which is the bible on this subject.

So you want to write a research paper

The Audio Engineering research team here submit a lot of conference papers. In our internal reviewing and when we review submissions by others, certain things come up again and again. I’ve compiled all this together as some general advice for putting together a research paper for an academic conference, especially in engineering or computer science. Of course, there are always exceptions, and the advice below doesn’t always apply. But it’s worth thinking of this as a checklist to catch errors and issues in an early draft.

Abstract
Make sure the abstract is self-contained. Don’t assume the person reading the abstract will read the paper, or vice-versa. Avoid acronyms. Be sure to actually say what the results were and what you found out, rather than just saying you applied the techniques and analysed the data that came out.
The abstract is part summary of the paper, and part an advertisement for why someone should read the paper. And keep in mind that far more people read the abstract than read the paper itself.
Introduction
Make clear what the problem is and why it is important. Why is this paper needed, and what is going to distinguish this paper from the others?
In the last paragraph, outline the structure of the rest of the paper. But make sure that it is specific to the structure of the paper.

Background/state of the art/prior work – this could be a subsection of introduction, text within the introduction, or its own section right after the introduction. What have others done, what is the most closely related work? Don’t just list a lot of references. Have something to say about each reference, and relate them to the paper. If a lot of people have approached the same or similar problems, consider putting the methods into a table, where for each method, you have columns for short description, the reference(s), their various properties and their assumptions. If you think no one has dealt with your topic before, you probably just haven’t looked deeply enough 😉. Regardless, you should still explain what is the closest work, perhaps highlighting how they’ve overlooked your specific problem.

Problem formulation – after describing state of the art, this could be a subsection of introduction, text within the introduction, or its own section. Give a clear and unambiguous statement of the problem, as you define it and as it is addressed herein. The aim here is to be rigorous, and remove any doubt about what you are doing. It also allows other work to be framed in the same way. When appropriate, this is described mathematically, e.g., we define these terms, assume this and that, and we attempt to find an optimal solution to the following equation.

Method/Analysis/Results
The structure of this, the core of the paper, is highly dependent on the specific work. One good approach is to have quite a lot of figures and tables. Then most of the writing is just explaining and discussing these figures and tables, and the ordering of these should be mostly clear.
A typical ordering is:
– Describe the method, giving block diagrams where appropriate.
– Give any plots that analyse and illustrate the method, but aren’t using the method to produce results that address the problem.
– Present the results of using your method to address the problem. Keep the interpretation of the results here short, unless detailed explanation of a result is needed to justify the next result that is presented. If there is lengthy discussion or interpretation, then leave that to a discussion section.

Equations and notation
For most papers in signal processing and related fields, at least a few equations are expected. The aim with equations is always to make the paper more understandable and less ambiguous. So avoid including equations just for the sake of it. Avoid equations if they are just an obvious intermediate step, or if they aren’t really used in any way (e.g. ‘we use the Fourier transform, which by the way, can be given in this equation. Moving on…’). Do use equations if they clear up any confusion when a technical concept is explained just with text.
Make sure every equation can be fully understood. All terms and notation should be defined, right before or right after they are used in the text. The logic or process of going from one equation to the next should be made clear.
Tables and figures
Where possible, these should be somewhat self-contained. So one should be able to look at a figure and understand it without reading the paper. If that isn’t possible, then it should be understood just by looking at the figure and figure caption. If not, then by just looking at the figure, caption and a small amount of text where the figure is described.
Figure captions typically go immediately below figures, but table captions typically above tables.
Label axes in figures wherever possible, and give units. If units are not appropriate, make clear that an axis is unitless. For any text within a figure, make sure that the font size used is close to the font size of the main text in the paper. Often, if you import a figure from software intended for viewing on a screen (like MATLAB), then the font can appear minuscule when the figure is imported into a paper.
Make sure all figures and tables are numbered and are all referenced, by their number, in the main text. Position them close to where they are first mentioned in the text. Don’t use phrasing that refers to their location, like ‘the figure below’ or ‘the table on the next page’, partly because their location may change in the final version.
Make sure all figures are high quality. Print out the paper before submitting and check that it all looks good, is high resolution, and nicely formatted.

Discussion/Future work/conclusion
Discussion and future work may be separate sections or part of the conclusion. Discussion is useful if the results need to be interpreted, but is often kept very brief in short papers where the results may speak for themselves.
Future work is not about what the author plans to do next. It’s about research questions that arose or were not addressed, and research directions that are worth pursuing. The answers to these research questions may be pursued by the author or others. Here, you are encouraging others to build on the work in this paper, and suggesting to them the most promising directions and approaches. Future work is usually just a couple of sentences or a couple of paragraphs at the end of the conclusion, unless there is something particularly special about it.
The conclusion should not simply repeat the abstract or summarise the paper, though there may be an element of that. It’s about getting across the main things that the reader should take away and remember. What was found out? What was surprising? What are the main insights that arose? If the research question is straightforward and directly addressed, what was the answer?

References
The most important criterion for references is to cite wherever it justifies a claim, clarifies a point, identifies where an idea is coming from someone else, or helps the reader find pertinent additional material. If you’re dealing with a very niche or underexplored topic, you may wish to give a full review of all existing literature on the subject.
Aim for references to come from high impact, recent peer reviewed journal articles, or as close to that as possible. So for instance, choose a journal over a conference article if you can, but maybe a highly cited conference paper over an obscure journal paper.
Avoid using web site references. If the reference is essentially just a URL, then put that directly in the text or as a footnote, but not as a citation. And no one cares when you accessed the website so no need to say ‘accessed on [date]’. If it’s a temporary record that may have only been there for a short period of time before the paper submission date, it’s probably not a reliable reference, won’t help the reader and you should probably find an alternative citation.
Check your reference formatting, especially if you use someone else’s reference library or some automatically generated citations. For instance, some citations will have a publisher and a conference name, so it reads as ‘the X Society Conference, published by the X Society’.
Be consistent. So for instance, have all references use author initials, or none of them. Always use journal abbreviations, or never use them. Always include the city of a conference, or never do it. And so on.

What a PhD thesis is really about… really!

I was recently pointed to a blog post about doing a PhD. It had lots of interesting advice, mainly along the lines of ‘if you are finding it difficult, don’t worry, that probably means you’re doing it right.’ True, and good advice to keep in mind for PhD researchers who might be feeling lost in the wilderness. But it reminded me that I’d recently given a talk about PhD research, based on experience I have either examining or supervising dozens of theses, and some of the main points that I made are worth sharing. And I think they are applicable to research-based PhDs across lots of different disciplines.

First off, let’s think of a few things that a PhD thesis is not supposed to be:

  • A thesis isn’t easy

See the blog I mentioned above. Easy research may still be publishable, but it’s not going to make a thesis. If you’re finding it easy, you’re probably missing the point.

  • A thesis is not only what you already know

I’ve known researchers unwilling to learn a bit of new maths, or learn what’s going on under the hood in the software they use. Expect the research to lead you out of your comfort zone.

  • A thesis isn’t just something you do to get a PhD

It’s not simply a box that needs to be checked off so that you can get ‘Doctor’ next to your name.

  • A thesis isn’t obvious

If you and most others can predict the outcome in advance based on common sense, then why do it?

  • A thesis isn’t just several years of hard work

It may take years of hard work to achieve, but that’s not the point. You don’t get a PhD just for time and effort.

  • A thesis isn’t about building a system

That’s challenging and technical, and may be a byproduct of the research, but it’s not the research result.

  • A thesis isn’t a lot of little achievements

I’ve seen theses that read a bit like ‘I did this little interesting thing, then this other one, then another one…’ That doesn’t look good. If no one contribution is strong enough to be a thesis, then just putting them all into one document still isn’t a strong contribution. Note that in some cases, you can do a ‘thesis by publication’, which is a collection of papers, usually with an introduction and some wrapper information. But in that case it should still tie together with an overall contribution.

So with that in mind, let’s now think about what a thesis is, with a few highlighted aspects that are often neglected.

  • A thesis advances knowledge

That’s the key. Some new understanding, new insights, backed up by evidence and critical thinking. But this also suggests that it needs to actually be an advance, so you really need to know the prior art. How much reading of the literature one should do is a different question, and depends on the topic, the field, and the researcher. But in my experience, researchers generally don’t explore the literature deeply enough. One thing is for sure though: if the researcher ever makes the claim that no one has done this before, they’d better have the evidence to back that up.

  • A thesis is an argument

The word ‘thesis’ comes from Greek, and means an argument in the sense of putting forth a position. That means that there needs to be some element of controversy in the topic, and the thesis provides strong evidence supporting a particular position. That is, someone knowledgeable in the field could read the abstract and think, ‘no, I don’t believe it,’ but then change his or her mind after reading the whole thesis.

  • A thesis tells a story

People tend to forget that it’s a book. It’s meant to be read, and in some sense, enjoyed. So the researcher should think about the reader. I don’t mean it should be silly or salacious, but it should be engaging, and the researcher should always consider whether they (or at least some people in the field) would want to read what they’d written.