Do you hear what I hear? The science of everyday sounds.

I became a professor last year, which is quite a big deal here. On April 17th, I gave my inaugural lecture, which is a talk on my subject area to the general public. I tried to make it as interesting as possible, with sound effects, videos, a live experiment and even a bit of physical comedy. Here’s the video, and below I have a (sort of) transcript.

The Start

 

What did you just hear? What’s the weather like outside? Did that sound like a powerful, wet storm with rain, wind and thunder, or did it sound fake, as if something was not quite right? All that came out of the speakers were simple, nearly identical signals, and you only received two simple, nearly identical signals, one to each ear. Yet somehow you were able to interpret all the rich details, know what it was and assess the quality.

Over the next hour or so, we’ll investigate the research that links deep understanding of sound and sound perception to wonderful new audio technologies. We’ll look at how market needs in the commercial world are addressed by basic scientific advances. We will explore fundamental challenges about how we interact with the auditory world around us, and see how this leads to new creative artworks and disruptive innovations.

Sound effect synthesis

But first, let’s get back to the storm sounds you heard. It’s an example of a sound effect, like what might be used in a film. Very few of the sounds that you hear in film or TV, and more and more frequently, in music too, are recorded live on set or on stage.

Such sounds are sometimes created by what is known as Foley, named after Jack Foley, a sound designer working in film and radio from the late 1920s all the way to the early 1960s. In its simplest form, Foley is basically banging pots and pans together and sticking a microphone next to it. It also involves building mechanical contraptions to create all sorts of sounds. Foley sound designers are true artists, but it’s not easy, it’s expensive and time consuming. And the Foley studio today looks almost exactly the same as it did 60 years ago. The biggest difference is that the photos of the Foley studios are now in colour.

[Images: Foley in the past; Foley today]

But most sound effects come from sample libraries. These consist of tens or hundreds of thousands of high quality recordings. But they are still someone else’s vision of the sounds you might need. They’re never quite right. So sound designers either ‘make do’ with what’s there, or expend effort trying to shape them towards some desired sound. The designer doesn’t have the opportunity to do creative sound design. Reliance on pre-recorded sounds has dictated the workflow. The industry hasn’t evolved, we’re simply adapting old ways to new problems.

In contrast, digital video effects have reached a stunning level of realism, and they don’t rely on hundreds of thousands of stock photos, like the sound designers do with sample libraries. And animation is frequently created by specifying the scene and action to some rendering engine, without designers having to manipulate every little detail.

There might be opportunities for better and more creative sound design. Instead of a sound effect as a chunk of bits played out in sequence, conceptualise the sound generating mechanism, a procedure or recipe that when implemented, produces the desired sound. One can change the procedure slightly, shaping the sound. This is the idea behind sound synthesis. No samples need be stored. Instead, realistic and desired sounds can be generated from algorithms.

This has a lot of advantages. Synthesis can produce a whole range of sounds, like walking and running at any speed on any surface, whereas a sound effect library has only a finite number of predetermined samples. Synthesized sounds can play for any amount of time, but samples are fixed duration. Synthesis can have intuitive controls, like the enthusiasm of an applauding audience. And synthesis can create unreal or imaginary sounds that never existed in nature, a roaring dragon for instance, or Jedi knights fighting with light sabres.

Give this to sound designers, and they can take control, shape sounds to what they want. Working with samples is like buying microwave meals, cheap and easy, but they taste awful and there’s no satisfaction. Synthesis on the other hand, is like a home-cooked meal, you choose the ingredients and cook it the way you wish. Maybe you aren’t a fine chef, but there’s definitely satisfaction in knowing you made it.

This represents a disruptive innovation, changing the marketplace and changing how we do things. And it matters; not just to professional sound designers, but to amateurs and to the consumers, when they’re watching a film and especially, since we’re talking about sound, when they are listening to music, which we’ll come to later in the talk.

That’s the industry need, but there is some deep research required to address it. How do you synthesise sounds? They’re complex, with lots of nuances that we don’t fully understand. A few are easy, like these-

I just played that last one to get rid of the troublemakers in the audience.

But many of those are artificial or simple mechanical sounds. And the rest?

Almost no research is done in isolation, and there’s a community of researchers devising sound synthesis methods. Many approaches are intended for electronic music, going back to the work of Daphne Oram and Delia Derbyshire at the BBC Radiophonic Workshop, or the French musique concrète movement. But they don’t need a high level of realism. Speech synthesis is very advanced, but tailored for speech of course, and doesn’t apply to things like the sound of a slamming door. Other methods concentrate on simulating a particular sound with incredible accuracy. They construct a physical model of the whole system that creates the sound, and the sound is an almost incidental output of simulating the system. But this is computationally expensive and inflexible.

And this is where we are today. The researchers are doing fantastic work on new methods to create sounds, but it’s not addressing the needs of sound designers.

Well, that’s not entirely true.

The games community has been interested in procedural audio for quite some time. Procedural audio embodies the idea of sound as a procedure, and involves looking at lightweight interactive sound synthesis models for use in a game. Start with some basic ingredients; noise, pulses, simple tones. Stir them together with the right amount of each, bake them with filters that bring out various pitches, add some spice and you start to get something that sounds like wind, or an engine or a hand clap. That’s the procedural audio approach.
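To make the recipe concrete, here is a minimal sketch of a wind-like sound built from noise and a filter. It is an illustration only, not the model used in the talk: the function name `wind`, the sample rate, the cutoff range and the gust rate are all assumptions chosen for this example, and a simple one-pole low-pass filter stands in for the pitch-shaping filters described above.

```python
import math
import random

SAMPLE_RATE = 22050  # Hz; an assumed rate for this toy example


def wind(duration_s=2.0, gust_rate_hz=0.5, seed=0):
    """Return a list of samples in [-1, 1] that loosely resemble wind.

    Ingredient: white noise. Cooking: a one-pole low-pass filter whose
    cutoff and output gain rise and fall slowly, like gusts.
    """
    rng = random.Random(seed)
    n_samples = int(duration_s * SAMPLE_RATE)
    state = 0.0  # filter memory
    out = []
    for n in range(n_samples):
        t = n / SAMPLE_RATE
        # Slow control signal between 0 and 1 (hypothetical "gust" control).
        gust = 0.5 * (1.0 + math.sin(2.0 * math.pi * gust_rate_hz * t))
        cutoff_hz = 200.0 + 1800.0 * gust  # assumed range: 200 Hz to 2 kHz
        # Coefficient of a one-pole low-pass filter for this cutoff.
        alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / SAMPLE_RATE)
        noise = rng.uniform(-1.0, 1.0)
        state += alpha * (noise - state)
        # Louder when the gust is stronger.
        out.append(state * (0.3 + 0.7 * gust))
    return out


samples = wind(duration_s=1.0)
```

Changing `gust_rate_hz` or the cutoff range reshapes the sound without storing a single recording, which is the point of the procedural approach.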

A few tools have seen commercial use, but they’re specialised and integration of new technology in a game engine is extremely difficult. Such niche tools will supplement but not replace the sample libraries.

A few years ago, my research team demonstrated a sound synthesis model for engine and motor sounds. We showed that this simple software tool could be used by a sound designer to create a diverse range of sounds, and it could match those in the BBC sound effect library, everything from a handheld electric drill to a large boat motor.

 

This is the key. Designed right, one synthesis model can create a huge, diverse range of sounds. And this approach can be extended to simulate an entire effects library using only a small number of versatile models.

That’s what you’ve been hearing. Every sound sample you’ve heard in this talk was synthesised. Artificial sounds created and shaped in real-time. And they can be controlled and rendered in the same way that computer animation is performed. Watch this example, where the synthesized propeller sounds are driven by the scene in just the same way as the animation was.

It still needs work of course. You could hear lots of little mistakes, and the models missed details. And what we’ve achieved so far doesn’t scale. We can create hundreds of sounds that one might want, but not yet thousands or tens of thousands.

But we know the way forward. We have a precious resource, the sound effect libraries themselves. Vast quantities of high quality recordings, tried and tested over decades. We can feed these into machine learning systems to uncover the features associated with every type of sound effect, and then train our models to find settings that match recorded samples.
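As a toy illustration of finding settings that match a recorded sample, the sketch below searches over one parameter of a trivially simple synthesiser until a crude feature of its output matches the same feature of a target sound. Everything here is hypothetical: the one-parameter sine "model", the zero-crossing feature and the grid search are stand-ins for the far richer models, features and learning methods the real problem would need.

```python
import math

SAMPLE_RATE = 8000  # Hz; assumed for this toy example


def synth_tone(freq_hz, n_samples=SAMPLE_RATE):
    """Hypothetical one-parameter synthesis model: a plain sine tone."""
    return [math.sin(2.0 * math.pi * freq_hz * n / SAMPLE_RATE)
            for n in range(n_samples)]


def zero_crossings(signal):
    """Crude audio feature: how many times the waveform changes sign."""
    return sum(1 for a, b in zip(signal, signal[1:]) if (a < 0) != (b < 0))


def fit_frequency(target, candidates):
    """Pick the candidate setting whose output feature is closest to the target's."""
    target_feature = zero_crossings(target)
    return min(
        candidates,
        key=lambda f: abs(zero_crossings(synth_tone(f, len(target))) - target_feature),
    )


# Stand-in for a library recording: a 440 Hz tone from the same model.
recording = synth_tone(440.0)
best = fit_frequency(recording, range(100, 1001, 20))
```

The same loop structure, with better features and a smarter search, is one plausible way to tune a versatile model against each entry in an effects library.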

We can go further, and use this approach to learn about sound itself. What makes a rain storm sound different from a shower? Is there something in common with all sounds that startle us, or all sounds that calm us? The same approach that hands creativity back to sound designers, resulting in wonderful new sonic experiences, can also tell us so much about sound perception.

Hot versus cold

I pause, say “I’m thirsty”. I have an empty jug and pretend to pour

Pretend to throw it at the audience.

Just kidding. That’s another synthesised sound. It’s a good example of this hidden richness in sounds. You knew it was pouring because the gesture helped, and there is an interesting interplay between our visual and auditory senses. You also heard bubbles, splashes, the ring of the container that it’s poured into. But do you hear more?

I’m going to run a little experiment. I have two sound samples, hot water being poured and cold water being poured. I want you to guess which is which.

Listen and try it yourself at our previous blog entry on the sound of hot and cold water.

I think it’s fascinating that we can hear temperature. There must be some physical phenomenon affecting the sound, which we’ve learned to associate with heat. But what’s really interesting is what I found when I looked online. Lots of people have discussed this. One argument goes ‘Cold water is more viscous or sticky, and so it gives high pitched sticky splashes.’ That makes sense. But another argument states ‘There are more bubbles in a hot liquid, and they produce high frequency sounds.’

Wait, they can’t both be right. So we analysed recordings of hot and cold water being poured, and it turns out they’re both wrong! The same tones are there in both recordings, so essentially the same pitch. But the strengths of the tones are subtly different. Some sonic aspect is always present, but its loudness is a function of temperature. We’re currently doing analysis to find out why.

And no one noticed! In all the discussion, no one bothered to do a little critical analysis or an experiment. It’s an example of a faulty assumption, that because you can come up with a solution that makes sense, it should be the right one. And it demonstrates the scientific method; nothing is known until it is tested and confirmed, repeatedly.

Intelligent Music Production

It’s amazing what such subtle changes can do, how they can indicate elements that one never associates with hearing. Audio production thrives on such subtle changes and there is a rich tradition of manipulating them to great effect. Music is created not just by the composer and performers. The sound engineer mixes and edits it towards some artistic vision. But phrasing the work of a mixing engineer as an art form is a double-edged sword, because we aren’t doing justice to the technical challenges. The sound engineer is, after all, an engineer.

In audio production, whether for broadcast, live sound, games, film or music, one typically has many sources. They each need to be heard simultaneously, but can all be created in different ways, in different environments and with different attributes. Some may mask each other, some may be too loud or too quiet. The final mix should have all sources sound distinct yet contribute to a nice clean blend of the sounds. To achieve this is very labour intensive and requires a professional engineer. Modern audio production systems help, but they’re incredibly complex and all require manual manipulation. As technology has grown, it has become more functional but not simpler for the user.

In contrast, image and video processing has become automated. The modern digital camera comes with a wide range of intelligent features to assist the user; face, scene and motion detection, autofocus and red eye removal. Yet an audio recording or editing device has none of this. It is essentially deaf; it doesn’t listen to the incoming audio and has no knowledge of the sound scene or of its intended use. There is no autofocus for audio!

Instead, the user is forced to accept poor sound quality or do a significant amount of manual editing.

But perhaps intelligent systems could analyse all the incoming signals and determine how they should be modified and combined. This has the potential to revolutionise music production, in effect putting a robot sound engineer inside every recording device, mixing console or audio workstation. Could this be achieved? This question gets to the heart of what is art and what is science, what is the role of the music producer and why we prefer one mix over another.

But unlike replacing sound effect libraries, this is not a big data problem. Ideally, we would get lots of raw recordings and the produced content that results. Then extract features from each track and the final mix in order to establish rules for how audio should be mixed. But we don’t have the data. It’s not difficult to access produced content. But the initial multitrack recordings are some of the most highly guarded copyright material. This is the content that recording companies can use over and over, to create remixes and remastered versions. Even if we had the data, we don’t know the features to use and we don’t know how to manipulate those features to create a good mix. And mixing is a skilled craft. Machine learning systems are still flawed if they don’t use expert knowledge.

There’s a myth that as long as we get enough data, we can solve almost any problem. But lots of problems can’t be tackled this way. I thought weather prediction was done by taking all today’s measurements of temperature, humidity, wind speed, pressure… Then tomorrow’s weather could be guessed by seeing what happened the day after there were similar conditions in the past. But a meteorologist told me that’s not how it works. Even with all the data we have, it’s not enough. So instead we have a weather model, based on how clouds interact, how pressure fronts collide, why hurricanes form, and so on. We’re always running this physical model, and just tweaking parameters and refining the model as new data comes in. This is far more accurate than relying on mining big data.

You might think this would involve traditional signal processing, established techniques to remove noise or interference in recordings. It’s true that some of what the sound engineer does is correct artifacts due to issues in the recording process. And there are techniques like echo cancellation, source separation and noise reduction that can address this. But this is only a niche part of what the sound engineer does, and even then the techniques have rarely been optimised for real world applications.

There’s also multichannel signal processing, where one usually attempts to extract information regarding signals that were mixed together, like acquiring a GPS signal buried in noise. But in our case, we’re concerned with how to mix the sources together in the first place. This opens up a new field which involves creating ways to manipulate signals to achieve a desired output. We need to identify multitrack audio features, related to the relationships between musical signals, and develop audio effects where the processing on any sound is dependent on the other sounds in the mix.

And there is little understanding of how we perceive audio mixes. Almost all studies have been restricted to lab conditions; like measuring the perceived level of a tone in the presence of background noise. This tells us very little about real world cases. It doesn’t say how well one can hear lead vocals when there are guitar, bass and drums.

Finally, best practices are not understood. We don’t know what makes a good mix and why one production will sound dull while another makes you laugh and cry, even though both are on the same piece of music, performed by competent sound engineers. So we need to establish what is good production, how to translate it into rules and exploit it within algorithms. We need to step back and explore more fundamental questions, filling gaps in our understanding of production and perception. We don’t know where the rules will be found, so multiple approaches need to be taken.

The first approach is one of the earliest machine learning methods, knowledge engineering. It’s so old school that it’s gone out of fashion. It assumes experts have already figured things out, they are experts after all. So let’s look at the sound engineering literature and work with experts to formalise their approach. Capture best practices as a set of rules and processes. But this is no easy task. Most sound engineers don’t know what they did. Ask a famous producer what he or she did on a hit song and you often get an answer like ‘I turned the knob up to 11 to make it sound phat.’ How do you turn that into a mathematical equation? Or worse, they say it was magic and can’t be put into words.

To give you an idea, we had a technique to prevent acoustic feedback, that high pitched squeal you sometimes hear when a singer first approaches a microphone. We thought we had captured techniques that sound engineers often use, and turned it into an algorithm. To verify this, I was talking to an experienced live sound engineer and asked when was the last time he had feedback at one of the gigs where he ran the sound. ‘Oh, that never happens for me,’ he said. That seemed strange. I knew it was a common problem. ‘Really, never ever?’ ‘No, I know what I’m doing. It doesn’t happen.’ ‘Not even once?’ ‘Hmm, maybe once but it’s extremely rare.’ ‘Tell me about it.’ ‘Well, it was at the show I did last night…’! See, it’s a tricky situation. The sound engineer does have invaluable knowledge, but also has to protect their reputation as being one of a select few that know the secrets of the trade.

So we’re working with domain experts, generating hypotheses and formulating theories. We’ve been systematically testing all the assumptions about best practices and supplementing them with lots of listening tests. These studies help us understand how people perceive complex sound mixtures and identify attributes necessary for a good sounding mix. And we know the data will help. So we’re also curating multitrack audio, with detailed information about how it was recorded, often with multiple mixes and evaluations of those mixes.

By combining these approaches, my team have developed intelligent systems that automate much of the audio and music production process. Prototypes analyse all incoming sounds and manipulate them in much the same way a professional operates the controls at a mixing desk.

I didn’t realise at first the importance of this research. But I remember giving a talk once at a convention in a room that had panel windows all around. The academic talks are usually half full. But this time it was packed, and I could see faces outside all pressed up against the windows. They all wanted to find out about this idea of automatic mixing. It’s a unique opportunity for academic research to have transformational impact on an entire industry. It addresses the fact that music production technologies are often not fit for purpose. Intelligent mixing systems automate the technical and mundane, allowing sound engineers to work more productively and creatively, opening up new opportunities. Audio quality could be improved, amateur musicians can create high quality mixes of their content, small venues can put on live events without needing a professional engineer, time and preparation for soundchecks could be drastically reduced, and large venues and broadcasters could significantly cut manpower costs.

It’s controversial. We once entered an automatic mix in a student recording competition as a sort of Turing Test. Technically, we were cheating, because all the mixes were supposed to be made by students, but in our case it was made by an ‘artificial intelligence’ created by a student. We didn’t win of course, but afterwards I asked the judges what they thought of the mix, and then told them how it was done. The first two were surprised and curious. The third judge had offered useful comments when he thought it was a student mix, but when I told him that it was an ‘automatic mix’, he suddenly switched and said it was rubbish and he could tell all along.

Mixing is a creative process where stylistic decisions are made. Is this taking away creativity, is it taking away jobs? Will it result in music sounding more the same? Such questions come up time and time again with new technologies, going back to 19th century protests by the Luddites, textile workers who feared that time spent on their skills and craft would be wasted as machines could replace their role in industry.

These are valid concerns, but it’s important to see other perspectives. A tremendous amount of audio production work is technical, and audio quality would be improved by addressing these problems. As the graffiti artist Banksy said:

“All artists are willing to suffer for their work. But why are so few prepared to learn to draw?” – Banksy

[Image: Girl with a Balloon, by Banksy]

Creativity still requires technical skills. To achieve something wonderful when mixing music, you first have to achieve something pretty good and address issues with masking, microphone placement, level balancing and so on.

The real benefit is not replacing sound engineers. It’s dealing with all those situations when a talented engineer is not available; the band practicing in the garage, the small pub or restaurant venue that does not provide any support, or game audio, where dozens of incoming sounds need to be mixed and there is no miniature sound guy living inside the games console.

High resolution audio

The history of audio production is one of continual innovation. New technologies arise to make the work easier, but artists also figure out how to use that technology in new creative ways. And the artistry is not the only element music producers care about. They’re interested, some would say obsessed, with fidelity. They want the music consumed at home to be as close as possible to the experience of hearing it live. But we consume digital audio. Sound waves are transformed into bits and then transformed back to sound when we listen. We sample sound many times a second and render each sample with so many bits. Luckily, there is a very established theory on how to do the sampling.

We only hear frequencies up to about 20 kHz. That’s a wave which repeats 20,000 times a second. There’s a famous theorem by Claude Shannon and Harry Nyquist which states that you need at least twice that number of samples a second to fully represent a signal up to 20 kHz, so sample at 40,000 samples a second, or 40 kHz. So the standard music format, 16 bit samples and 44.1 kHz sampling rate, should be good enough.
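A short sketch shows what the theorem protects against. At a 44.1 kHz sampling rate, the highest frequency that can be represented is half of that, 22.05 kHz. A tone above this limit produces exactly the same samples as a lower tone, so the two cannot be told apart once sampled. The 25 kHz test tone and the helper name `sample_cosine` are assumptions chosen for this illustration.

```python
import math

SAMPLE_RATE = 44100  # Hz, the standard CD sampling rate


def sample_cosine(freq_hz, n_samples):
    """Sample a cosine of the given frequency at SAMPLE_RATE."""
    return [math.cos(2.0 * math.pi * freq_hz * n / SAMPLE_RATE)
            for n in range(n_samples)]


nyquist_hz = SAMPLE_RATE / 2  # highest representable frequency: 22050.0 Hz

# A 25 kHz tone lies above the limit...
above = sample_cosine(25000, 64)
# ...and its samples coincide with those of a 19.1 kHz tone (44100 - 25000).
alias = sample_cosine(SAMPLE_RATE - 25000, 64)

# Largest sample-by-sample difference; any nonzero value is floating point rounding.
max_difference = max(abs(a - b) for a, b in zip(above, alias))
```

This is why everything above half the sampling rate has to be filtered out before conversion, a point that becomes important later in the talk.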


But most music producers want to work with higher quality formats and audio companies make equipment for recording and playing back audio in these high resolution formats. Some people swear they hear a difference, others say it’s a myth and people are fooling themselves. What’s going on? Is the sampling theorem, which underpins all signal processing, fundamentally wrong? Have we underestimated the ability of our own ears, in which case the whole field of audiology is flawed? Or could it be that the music producers and audiophiles, many of whom are renowned for their knowledge and artistry, are deluded?

Around the time I was wondering about this, I went to a dinner party and was sat across from a PhD student. His PhD was in meta-analysis, and he explained that it was when you gather all the data from previous studies on a question and do formal statistical analysis to come up with more definitive results than the original studies. It’s a major research method in evidence-based medicine, and every few weeks a meta-analysis makes headlines because it shows the effectiveness or lack of effectiveness of treatments.

So I set out to do a meta-analysis. I tried to find every study that ever looked at perception of high resolution audio, and get their data. I scoured every place they could have been published and asked everyone in the field, all around the world. One author literally found his old data tucked away in the back of a filing cabinet. Another couldn’t get permission to provide the raw data, but told me enough about it for me to write a little program that ran through all possible results until it found the details that would reproduce the summary data as well. In the end, I found 18 relevant studies and could get data from all of them except one. That was strange, since it was the most famous study. But the authors had ‘lost’ the data, and got angry with me when I asked them for details about the experiment.
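The kind of little program described above can be sketched as a brute-force search. This is not the original program, and the numbers are invented: it assumes, purely for illustration, that the summary data were a mean and standard deviation of per-participant correct counts, and it enumerates every possible set of scores until it finds those that reproduce the summary. The function name `consistent_scores` and its parameters are hypothetical.

```python
from itertools import combinations_with_replacement
from statistics import mean, pstdev


def consistent_scores(n_participants, n_trials, target_mean, target_sd, tol=0.005):
    """Yield every multiset of per-participant correct counts whose mean and
    (population) standard deviation match the reported summary within tol."""
    possible = range(n_trials + 1)  # each participant scores 0..n_trials
    for scores in combinations_with_replacement(possible, n_participants):
        if (abs(mean(scores) - target_mean) <= tol
                and abs(pstdev(scores) - target_sd) <= tol):
            yield scores


# Invented example: 4 participants, 10 trials each,
# reported mean 6.5 correct and standard deviation 1.118.
matches = list(consistent_scores(4, 10, 6.5, 1.118))
# -> [(5, 6, 7, 8)]
```

When only one set of scores fits, as here, the raw data are effectively recovered. The search space grows quickly with the number of participants, so a real case may need extra constraints from what the authors can share.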

The results of the meta-analysis were fascinating, and not at all what I expected. There were researchers who thought their data had or hadn’t shown an effect, but when you apply formal analysis, it’s the opposite. And a few experiments had major flaws. For instance, in one experiment many of the high resolution recordings were actually standard quality, which means there never was a difference to be perceived. In another, test subjects were given many versions of the same audio, including a direct live feed, and asked which sounds closer to live. People actually ranked the live feed as sounding least close to live, indicating they just didn’t know what to listen for.

As for the one study where the authors lost their data? Well, they had published some of it, but it basically went like this. 55 participants listened to many recordings many times and could not discriminate between high resolution and standard formats. But men discriminated more than women, older far more than younger listeners, audiophiles far more than nonexperts. Yet only 3 people ever guessed right more than 6 times out of 10. The chance of all this happening by luck if there really was no difference is less likely than winning the lottery. It’s extremely unlikely even if there was a difference to be heard. Conclusion: they faked their data.

And this was the study which gave the most evidence that people couldn’t hear anything extra in high resolution recordings. In fact the studies with the most flaws were those that didn’t show an effect. Those that found an effect were generally more rigorous and took extra care in their design, set-up and analysis. This was counterintuitive. People are always looking for a new cure or a new effect. But in this case, there was a bias towards not finding a result. It seems researchers wanted to show that the claims of hearing a difference are false.

The biggest factor was training. Studies where subjects, even those experienced working with audio, just came in and were asked to state when two versions of a song were the same, rarely performed better than chance. But if they were told what to listen for, given examples, were told when they got it right or wrong, and then came back and did it under blind controlled conditions, they performed far better. All studies where participants were given training gave higher results than all studies where there was no training. So it seems we can hear a difference between standard and high resolution formats, we just don’t know what to listen for. We listen to music every day, but we do it passively and rarely focus on recording quality. We don’t sit around listening for subtle differences in formats, but they are there and they can be perceived. To audiophiles, that’s a big deal.

In 2016 I published this meta-analysis in the Journal of the Audio Engineering Society, and it created a big splash. I had a lot of interviews in the press, and it was discussed on social media and internet forums. And that’s when I found out, people on the internet are crazy! I was accused of being a liar, a fraud, paid by the audio industry, writing press releases, working the system and pushing an agenda. These criticisms came from all sides, since differences were found which some didn’t think existed, but they also weren’t as strong as others wanted them to be. I was also accused of cherry-picking the studies, even though one of the goals of the paper was to avoid exactly that, which is why I included every study I could find.

But my favorite comment was when someone called me an ‘intellectually dishonest placebophile apologist’. Whoever wrote that clearly spent time and effort coming up with a convoluted insult.

It wasn’t just people online who were crazy. At an Audio Engineering Society convention, two people were discussing the paper. One was a multi-Grammy award winning mixing engineer and inventor, the other had a distinguished career as chief scientist at a major audio company.

What started as discussion escalated to heated argument, then shouting, then pushing and shoving. It was finally broken up when a famous mastering engineer intervened. I guess I should be proud of this.

I learned what most people already know, how very hard it is to change people’s minds once an opinion has been formed. And people rarely look at the source. Instead, they rely on biased opinions discussing that source. But for those interested in the issue whose minds were not already made up, I think the paper was useful.

I’m trying to figure out why we hear this difference. It’s not due to problems with the high resolution audio equipment, that was checked in every study that found a difference. There’s no evidence that people have super hearing or that the sampling theorem is violated. But we need to remove all the high frequencies in a signal before we convert it to digital, even if we don’t hear them. That brings up another famous theorem, the uncertainty principle. In quantum mechanics, it tells us that we can’t resolve a particle’s position and momentum at the same time. In signal processing, it tells us that restricting a signal’s frequency content will make us less certain about its temporal aspects. When we remove those inaudible high frequencies, we smear out the signal. It’s a small effect, but this spreading the sound a tiny bit may be audible.
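For those who like to see it written down, the signal processing form of the uncertainty principle can be stated compactly. This is the standard textbook version (often called the Gabor limit), not a result from the studies discussed here. The symbols are introduced only for this illustration: \sigma_t is a signal's spread in time, \sigma_f its spread in frequency in Hz, and f_c the cutoff of an idealised brick-wall low-pass filter, which real converters only approximate.

```latex
% Time-frequency uncertainty: a signal cannot be arbitrarily
% narrow in both time and frequency.
\sigma_t \, \sigma_f \;\ge\; \frac{1}{4\pi}

% Impulse response of an ideal low-pass filter with cutoff f_c:
h(t) = \frac{\sin(2\pi f_c t)}{\pi t},
\qquad h(t) = 0 \ \text{at} \ t = \frac{k}{2 f_c}, \quad k = \pm 1, \pm 2, \dots

% Width of the main lobe (between the first zeros on either side):
\Delta t = \frac{1}{f_c}
\;\Rightarrow\;
\Delta t \approx 45.4\ \mu\text{s at } f_c = 22.05\ \text{kHz},
\qquad
\Delta t \approx 20.8\ \mu\text{s at } f_c = 48\ \text{kHz}
```

So a click passed through a filter suited to 44.1 kHz sampling is spread over roughly twice the time of one suited to 96 kHz sampling. Whether a spread of tens of microseconds is audible is exactly the open question.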

The End

The sounds around us shape our perception of the world. We saw that in films, games, music and virtual reality, we recreate those sounds or create unreal sounds to evoke emotions and capture the imagination. But there is a world of fascinating phenomena related to sound and perception that is not yet understood. Can we create an auditory reality without relying on recorded samples? Could a robot replace the sound engineer, should it? Investigating such questions has led to a deeper understanding of auditory perception, and has the potential to revolutionise sound design and music production.

What are the limits of human hearing? Do we make far greater use of auditory information than simple models can account for? And if so, can we feed this back for better audio production and sound design?


To answer these questions, we need to look at the human auditory system. Sound waves are transferred to the inner ear, which contains one of the most amazing organs in the human body, the cochlea. 3,500 inner hair cells line the cochlea, and resonate in response to frequencies across the audible range. These hair cells connect to a nerve string containing 30,000 neurons which can fire 600 pulses a second. So the brainstem receives up to 18 million pulses per second. Hence the cochlea is a very high resolution frequency analyser with digital outputs. Audio engineers would pay good money for that sort of thing, and we have two of them, free, inside our heads!

The pulses carry frequency and temporal information about sounds. This is sent to the brain’s auditory cortex, where hearing sensations are stored as aural activity images. They’re compared with previous aural activity images, other sensory images and overall context to get an aural scene representing the meaning of hearing sensations. This scene is made available to other processes in the brain, including thought processes such as audio assessment. It’s all part of 100 billion brain cells with 500 trillion connections, a massively powerful machine to manage body functions, memory and thinking.

These connections can be rewired based on experiences and stimuli. We have the power to learn new ways to process sounds. The perception is up to us. Like we saw with hot and cold water sounds, with perception of sound effects and music production, with high resolution audio, we have the power to train ourselves to perceive the subtlest aspects. Nothing is stopping us from shaping and appreciating a better auditory world.

Credits

All synthesised sounds created using FXive.

Sound design by Dave Moffat.

Synthesised sounds by Thomas Vassallo, Parham Bahadoran, Adan Benito and Jake Lee.

Videos by Enrique Perez Gonzalez (automatic mixing) and Rod Selfridge (animation).

Special thanks to all my current and former students and researchers, collaborators and colleagues. See the video for the full list.

And thanks to my lovely wife Sabrina and daughter Eliza.


Sampling the sampling theorem: a little knowledge is a dangerous thing

In 2016, I published a paper on perception of differences between standard resolution audio (typically 16 bit, 44.1 kHz) and high resolution audio formats (like 24 bit, 96 kHz). It was a meta-analysis, looking at all previous studies, and showed strong evidence that this difference can be perceived. It also did not find evidence that this difference was due to high bit depth, distortions in the equipment, or golden ears of some participants.

The paper generated a lot of discussion, some good and some bad. One argument presented many times as to why its overall conclusion must be wrong (it’s implied here, here and here, for instance) basically goes like this:

We can’t hear above 20 kHz. The sampling theorem says that we need to sample at twice the bandwidth to fully recover the signal. So a bit beyond 40 kHz should be fully sufficient to render audio with no perceptible difference from the original signal.

But one should be very careful when making claims regarding the sampling theorem. It states that all information in a bandlimited signal is completely represented by sampling at twice the bandwidth (the Nyquist rate). It further implies that the continuous time bandlimited signal can be perfectly reconstructed from this sampled signal.

For that to mean that there is no audible difference between 44.1 kHz (or 48 kHz) sampling and much higher sample rate formats (leaving aside reproduction equipment), there are a few important assumptions:

  1. Perfect brickwall filter to bandlimit the signal
  2. Perfect reconstruction filter to recover the bandlimited signal
  3. No audible difference whatsoever between the original full bandwidth signal and the bandlimited 48 kHz signal.

The first two are generally not true in practice, especially with lower sample rates. We can get very good performance by oversampling in the analog to digital and digital to analog converters, but they are not perfect. There may still be some minute pass-band ripple or some very low amplitude signal outside the pass-band, resulting in aliasing. But many modern high quality A/D and D/A converters and some sample rate converters are high performance, so their impact may be small.

But the third assumption is an open question and could make a big difference. The problem arises from another very important theorem, the uncertainty principle. Though first derived by Heisenberg for quantum mechanics, Gabor showed that it exists as a purely mathematical concept. The more localised a signal is in frequency, the less localised it is in time. For instance, a pure impulse (localised in time) has content over all frequencies. Bandlimiting this impulse spreads the signal in time.

For instance, consider filtering an impulse to retain only frequency content below 20 kHz. We will use the MATLAB function IFIR (Interpolated FIR filter), which is a high performance design. We aim for low passband ripple (<0.01 dB) up to 20 kHz and 120 dB stopband attenuation starting at 22.05, 24, or 48 kHz, corresponding to 44.1 kHz, 48 kHz or 96 kHz sample rates. You can see excellent behaviour in the magnitude response below.

Figure: magnitude response of the three filters

The impulse response also looks good, but now the original impulse has become smeared in time. This is an inevitable consequence of the uncertainty principle.

Figure: impulse response

Still, on the surface this may not be so problematic. But we perceive loudness on a logarithmic scale. So have a look at this impulse response on a decibel scale.

Figure: impulse response on a decibel scale

The 44.1 and 48 kHz filters spread energy over 1 ms or more, but the 96 kHz filter keeps most energy within 100 microseconds. And this is a particularly good filter, without considering quantization effects or the additional reconstruction (anti-imaging) filter required for analog output. Note also that all of this frequency content has already been bandlimited, so it’s almost entirely below 20 kHz.
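For readers without MATLAB, the experiment can be approximated in Python. This is only a sketch under stated assumptions: it swaps the IFIR design for a Kaiser-window FIR from SciPy, designs all three filters at an assumed 192 kHz working rate, and measures the spread as the time span over which the impulse response stays within 60 dB of its peak. Those choices are made purely for illustration, so the numbers will not match the plots exactly, though the ordering of the three filters should be the same.

```python
import numpy as np
from scipy import signal

FS_WORK = 192_000  # assumed working sample rate (Hz) at which the filters are designed

def bandlimit_impulse_response(f_stop, f_pass=20_000.0, atten_db=120.0, fs=FS_WORK):
    """Linear-phase low-pass FIR with passband up to f_pass and stopband from f_stop.

    Filtering a unit impulse with an FIR filter simply returns its taps,
    so the taps are the bandlimited impulse.
    """
    width = (f_stop - f_pass) / (fs / 2)   # transition width as a fraction of Nyquist
    numtaps, beta = signal.kaiserord(atten_db, width)
    cutoff = 0.5 * (f_pass + f_stop)       # centre of the transition band
    return signal.firwin(numtaps, cutoff, window=("kaiser", beta), fs=fs)

def spread_seconds(h, floor_db=-60.0, fs=FS_WORK):
    """Time between the first and last tap lying above floor_db relative to the peak."""
    magnitude = np.maximum(np.abs(h), 1e-300)   # avoid log of zero
    level_db = 20 * np.log10(magnitude / magnitude.max())
    above = np.flatnonzero(level_db > floor_db)
    return (above[-1] - above[0]) / fs

# Stopband edges corresponding to 44.1, 48 and 96 kHz sample rates
spreads = {
    rate: spread_seconds(bandlimit_impulse_response(f_stop))
    for rate, f_stop in [(44_100, 22_050.0), (48_000, 24_000.0), (96_000, 48_000.0)]
}
for rate, seconds in spreads.items():
    print(f"{rate / 1000:g} kHz: spread of about {seconds * 1e6:.0f} microseconds")
```

The narrower the transition band, the longer the filter has to be, and the further the impulse is smeared in time.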

One millisecond still isn’t very much. However, this lack of high frequency content has affected the temporal fine structure of the signal, and we know a lot less about how we perceive temporal information than how we perceive frequency content. This is where psychoacoustic studies in the field of auditory neuroscience come into play. They’ve approached temporal resolution from very different perspectives. Abel found that we can distinguish temporal gaps in sound of only 0.4 ms, and Wiegrebe’s study suggested a resolution of 0.72 ms. Studies by Wiegrebe (same paper), Lotze and Aiba all suggested that we can distinguish between a single click and a closely spaced pair of clicks when the gap between the pair of clicks is below one millisecond. And a study by Henning suggested that we can distinguish the ordering of a high amplitude and low amplitude click when the spacing between them is only about one fifth of a millisecond.

All of these studies should be taken with a grain of salt. Some are quite old, and it’s possible there may have been issues with the audio set-up. Furthermore, they aren’t directly testing the audibility of anti-alias filters. But they clearly indicate that the time domain spread of energy in transient sounds due to filtering might be audible.

Big questions still remain. In the ideal scenario, the only thing missing after bandlimiting a signal is the high frequency content, which we shouldn’t be able to hear. So what really is going on?

By the way, I recommend reading Shannon’s original papers on the sampling theorem and other subjects. They’re very good and a joy to read. Shannon was a fascinating character. I read his Collected Papers, and off the top of my head, it included inventing the rocket powered Frisbee, the gasoline powered pogo stick, a calculator that worked using Roman numerals (wonderfully named THROBAC, for Thrifty Roman numerical BACkward looking computer), and discovering the fundamental equation of juggling. He also built a robot mouse to compete against real mice, inspired by classic psychology experiments where a mouse was made to find its way out of a maze.

Nyquist’s papers aren’t so easy though, and feel a bit dated.

  • S. M. Abel, “Discrimination of temporal gaps,” Journal of the Acoustical Society of America, vol. 52, 1972.
  • E. Aiba, M. Tsuzaki, S. Tanaka, and M. Unoki, “Judgment of perceptual synchrony between two pulses and verification of its relation to cochlear delay by an auditory model,” Japanese Psychological Research, vol. 50, 2008.
  • D. Gabor, “Theory of communication,” Journal of the Institute of Electrical Engineering, vol. 93, pp. 429–457, 1946.
  • G. B. Henning and H. Gaskell, “Monaural phase sensitivity with Ronken’s paradigm,” Journal of the Acoustical Society of America, vol. 70, 1981.
  • M. Lotze, M. Wittmann, N. von Steinbüchel, E. Pöppel, and T. Roenneberg, “Daily rhythm of temporal resolution in the auditory system,” Cortex, vol. 35, 1999.
  • H. Nyquist, “Certain topics in telegraph transmission theory,” Trans. AIEE, vol. 47, pp. 617–644, April 1928.
  • J. D. Reiss, “A meta-analysis of high resolution audio perceptual evaluation,” Journal of the Audio Engineering Society, vol. 64 (6), June 2016.
  • C. E. Shannon, “Communication in the presence of noise,” Proceedings of the Institute of Radio Engineers, vol. 37 (1), pp. 10–21, January 1949.
  • L. Wiegrebe and K. Krumbholz, “Temporal resolution and temporal masking properties of transient stimuli: Data and an auditory model,” J. Acoust. Soc. Am., vol. 105, pp. 2746-2756, 1999.

The future of microphone technology

We recently had a blog entry about the Future of Headphones. Today, we’ll look at another ubiquitous piece of audio equipment, the microphone, and what technological revolutions are on the horizon.

It’s not a new technology, but the Eigenmike is deserving of attention. First released around 2010 by mh acoustics (their website and other searches don’t reveal much historical information), the Eigenmike is a microphone array composed of 32 high quality microphones positioned on the surface of a rigid sphere. Outputs of the individual microphones are combined to capture the soundfield. By beamforming, the soundfield can be steered and aimed in a desired direction.

The Eigenmike
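As a rough illustration of what beamforming means, here is a minimal Python sketch of the simplest variant, a narrowband delay-and-sum beamformer. It is not the processing used in the Eigenmike, which works with a spherical array; the uniform line array, the spacing, the frequency and the function names are all assumptions chosen to keep the example short.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def steering_vector(angle_deg, freq_hz, num_mics=8, spacing_m=0.04):
    """Relative phase of a plane wave at each microphone of a uniform line array."""
    positions = np.arange(num_mics) * spacing_m
    delays = positions * np.sin(np.deg2rad(angle_deg)) / SPEED_OF_SOUND
    return np.exp(-2j * np.pi * freq_hz * delays)

def beam_gain(look_deg, source_deg, freq_hz=2_000.0):
    """Gain of a delay-and-sum beam aimed at look_deg for a source at source_deg."""
    look = steering_vector(look_deg, freq_hz)
    weights = look / len(look)  # undo the look-direction delays, then average
    # np.vdot conjugates its first argument, which applies the compensating phases
    return np.abs(np.vdot(weights, steering_vector(source_deg, freq_hz)))

for source in (30, 0, -40):
    print(f"beam aimed at 30 degrees, source at {source} degrees: gain {beam_gain(30, source):.2f}")
```

A source in the look direction adds up coherently across the microphones, while sources from other directions partially cancel. Changing `look_deg` steers the beam without moving the array.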

This and related technologies (Core Sound’s TetraMic, Soundfield’s MKV, Sennheiser’s Ambeo …) are revolutionising high-end soundfield recording. Enda Bates has a nice blog entry about them, and they were formally evaluated in two AES papers, Comparing Ambisonic Microphones Part 1 and Part 2.

Soundskrit is TandemLaunch’s youngest incubated venture, based on research by Ron Miles and colleagues from Binghamton University. TandemLaunch, by the way, create companies often arising from academic research, and previously invested in research arising from the audio engineering research team behind this blog.

Jian Zhou and Ron Miles were inspired by the manner in which insects ‘hear’ with their hairs. They devised a method to record audio by sensing changes in airflow velocity rather than pressure. Spider silk is thin enough that it moves with the air when hit by sound waves, even for infrasound frequencies. To translate this movement into an electronic signal, they coated the spider silk with gold and put it in a magnetic field. Almost any fiber that is thin enough could be used in the same way, and different approaches could be applied for transduction. This new approach is intrinsically directional and may have a frequency response far superior to competing directional solutions.

MEMS (Micro-Electro-Mechanical Systems) microphones usually involve a pressure-sensitive diaphragm etched directly into a silicon wafer. The Soundskrit team is currently focused on developing a MEMS-compatible design so that it could be used in a wide variety of devices and applications where directional recording is needed.

Another start-up aiming to revolutionise MEMS technology is Vesper. Vesper MEMS was developed by founders Bobby Littrell and Karl Grosh at the University of Michigan. It uses piezoelectric materials, which produce a voltage when subjected to pressure. This approach can achieve a superior signal-to-noise ratio over the capacitive MEMS microphones that currently dominate the market.

A few years ago, graphene-based microphones were receiving a lot of attention. In 2014, Dejan Todorovic and colleagues investigated the feasibility of graphene as a microphone membrane, and simulations suggested that it could have high sensitivity (the voltage generated in response to a pressure input) over a wide frequency range, far better than conventional microphones. Later that year, Peter Gaskell and others from McGill University performed physical and acoustical measurements of graphene oxide which confirmed Todorovic’s simulation results. But they seemed unaware of Todorovic’s work, despite both groups publishing at AES Conventions.

Gaskell and colleagues went on to commercialise graphene-based loudspeakers, as we discussed previously. But the Todorovic team continued research on graphene microphones, apparently to great success.

But I haven’t yet found out about any further developments from this group. However, researchers from Kyungpook National University in Korea just recently reported a high sensitivity hearing aid microphone that uses a graphene-based diaphragm.

 

For a bit of fun, check out Catchbox, which bills itself as ‘the World’s First Soft Throwable Microphone.’ It’s not exactly a technological revolution, though their patent pending Automute relates a bit to the field of Automatic Mixing. But I can think of a few meetings that would have been livened up by having this around.

As previously when I’ve discussed commercial technologies, a disclaimer is needed. This blog is not meant as an endorsement of any of the mentioned companies. I haven’t tried their products. They are a sample of what is going on at the frontiers of microphone technology, but by no means cover the full range of exciting developments. In fact, since many of the technological advances are concerned with microphone array processing (source separation, localisation, beam forming and so on) as in some of our own contributions, this blog entry is really only giving you a taste of one exciting direction of research. But these technologies will surely change the way we capture sound in the near future.

Some of our own contributions to microphone technology, mainly on the signal processing and evaluation side of things, are listed below:

  1. L. Wang, J. D. Reiss and A. Cavallaro, ‘Over-Determined Source Separation and Localization Using Distributed Microphones,’ IEEE/ACM Transactions on Audio, Speech and Language Processing, vol. 24 (9), 2016.
  2. L. Wang, T. K. Hon, J. D. Reiss and A. Cavallaro, ‘An Iterative Approach to Source Counting and Localization Using Two Distant Microphones,’ IEEE/ACM Transactions on Audio, Speech and Language Processing, 24 (6), June 2016.
  3. L. Wang, T. K. Hon, J. D. Reiss and A. Cavallaro, ‘Self-Localization of Ad-hoc Arrays Using Time Difference of Arrivals,’ IEEE Transactions on Signal Processing, 64 (4), Feb., 2016.
  4. T. K. Hon, L. Wang, J. D. Reiss and A. Cavallaro, ‘Audio Fingerprinting for Multi-Device Self-Localisation,’ IEEE Transactions on Audio, Speech and Language Processing, 23 (10), p. 1623-1636, 2015.
  5. E. K. Kokkinis, J. D. Reiss and J. Mourjopoulos, “A Wiener Filter Approach to Microphone Leakage Reduction in Close-Microphone Applications,” IEEE Transactions on Audio, Speech, and Language Processing, V.20 (3), p.767-79, 2012.
  6. T-K. Hon, L. Wang, J. D. Reiss and A. Cavallaro, ‘Fine landmark-based synchronization of ad-hoc microphone arrays,’ 23rd European Signal Processing Conference (EUSIPCO), p. 1341-1345, Nice, France, 2015.
  7. B. De Man and J. D. Reiss, “A Pairwise and Multiple Stimuli Approach to Perceptual Evaluation of Microphone Types,” 134th AES Convention, Rome, May, 2013.
  8. A. Clifford and J. D. Reiss, “Proximity effect detection for directional microphones,” 131st AES Convention, New York, p. 1-7, Oct. 20-23, 2011.
  9. A. Clifford and J. D. Reiss, “Microphone Interference Reduction in Live Sound,” Proc. of the 14th Int. Conference on Digital Audio Effects (DAFx-11), Paris, p. 2-9, Sept 19-23, 2011.
  10. E. Kokkinis, J. D. Reiss and J. Mourjopoulos, “Detection of ‘solo intervals’ in multiple microphone multiple source audio applications,” AES 130th Convention, May 2011.
  11. C. Uhle and J. D. Reiss, “Determined Source Separation for Microphone Recordings Using IIR Filters,” 129th AES Convention, San Francisco, Nov. 4-7, 2010.

 

Our meta-analysis wins best JAES paper 2016!

Last year, we published an Open Access article in the Journal of the Audio Engineering Society (JAES) on “A meta-analysis of high resolution audio perceptual evaluation.”


I’m very pleased and proud to announce that this paper won the award for best JAES paper for the calendar year 2016.

We discussed the research a little bit while it was ongoing, and then in more detail soon after publication. The research addressed a contentious issue in the audio industry. For decades, professionals and enthusiasts have engaged in heated debate over whether high resolution audio (beyond CD quality) really makes a difference. So I undertook a meta-analysis to assess the ability to perceive a difference between high resolution and standard CD quality audio. Meta-analysis is a popular technique in medical research, but this may be the first time that it’s been formally applied to audio engineering and psychoacoustics. Results showed a highly significant ability to discriminate high resolution content in trained subjects that had not previously been revealed. With over 400 participants in over 12,500 trials, it represented the most thorough investigation of high resolution audio so far.

Since publication, this paper was covered broadly across social media, popular press and trade journals. Thousands of comments were made on forums, with hundreds of thousands of reads.

Here’s one popular independent YouTube video discussing it,

and an interview with Scientific American about it,

and some discussion of it in this article for Forbes magazine (which is actually about the lack of a headphone jack in the iPhone 7).

But if you want to see just how angry this research made people, check out the discussion on hydrogenaudio. Wow, I’ve never been called an intellectually dishonest placebophile apologist before 😉 .

In fact, the discussion on social media was full of misinformation, so I’ll try and clear up a few things here.

When I first started looking into this subject, it became clear that potential issues in the studies were a problem. One option would have been to just give up, but then I’d be adding no rigour to a discussion because I felt it wasn’t rigorous enough. It’s the same as not publishing because you don’t get a significant result, only now on a meta scale. And though I did not have a strong opinion either way as to whether differences could be perceived, I could easily be fooling myself. I wanted to avoid any of my own biases or judgement calls. So I set some ground rules.

  • I committed to publishing all results, regardless of outcome.
  • A strong motivation for doing the meta-analysis was to avoid cherry-picking studies. So I included all studies for which there was sufficient data for them to be used in meta-analysis. Even if I thought a study was poor, its conclusions seemed flawed or it disagreed with my own conceptions, if I could get the minimal data to do meta-analysis, I included it. I then discussed potential issues.
  • Any choices regarding analysis or transformation of data were made a priori, regardless of the result of that choice, in an attempt to minimize any of my own biases influencing the outcome.
  • I did further analysis to look at alternative methods of study selection and representation.

I found the whole process of doing a meta-analysis in this field to be fascinating. In audio engineering and psychoacoustics, there is a wealth of studies investigating big questions, and I hope others will use similar approaches to gain deeper insights and perhaps even resolve some issues.

Exciting research at the upcoming Audio Engineering Society Convention


About five months ago, we previewed the last European Audio Engineering Society Convention, which we followed with a wrap-up discussion. The next AES convention is just around the corner, October 18th to 21st in New York. As before, the Audio Engineering research team here aim to be quite active at the convention.

These conventions are quite big, with thousands of attendees, but not so large that you get lost or overwhelmed. Away from the main exhibition hall is the Technical Program, which includes plenty of tutorials and presentations on cutting edge research.

So here, we’ve gathered together some information about a lot of the events that we will be involved in, will be attending, or just thought were worth mentioning. And I’ve gotta say, the Technical Program looks amazing.

Wednesday

One of the first events of the Convention is the Diversity Town Hall, which introduces the AES Diversity and Inclusion Committee. I’m a firm supporter of this, and wrote a recent blog entry about female pioneers in audio engineering. The AES aims to be fully inclusive, open and encouraging to all, but that’s not yet fully reflected in its activities and membership. So expect to see some exciting initiatives in this area coming soon.

In the 10:45 to 12:15 poster session, Steve Fenton will present Alternative Weighting Filters for Multi-Track Program Loudness Measurement. We’ve published a couple of papers (Loudness Measurement of Multitrack Audio Content Using Modifications of ITU-R BS.1770, and Partial loudness in multitrack mixing) showing that well-known loudness measures don’t correlate very well with perception when used on individual tracks within a multitrack mix, so it would be interesting to see what Steve and his co-author Hyunkook Lee found out. Perhaps all this research will lead to better loudness models and measures.

At 2 pm, Cleopatra Pike will present a discussion and analysis of Direct and Indirect Listening Test Methods. I’m often sceptical when someone draws strong conclusions from indirect methods like measuring EEGs and reaction times, so I’m curious what this study found and what recommendations they propose.

The 2:15 to 3:45 poster session will feature the work with probably the coolest name, Influence of Audience Noises on the Classical Music Perception on the Example of Anti-cough Candies Unwrapping Noise. And yes, it looks like a rigorous study, using an anechoic chamber to record the sounds of sweets being unwrapped, and the signal analysis is coupled with a survey to identify the most distracting sounds. It reminds me of the DFA faders paper from the last convention.

At 4:30, researchers from Fraunhofer and the Technical University of Ilmenau present Training on the Acoustical Identification of the Listening Position in a Virtual Environment. In a recent paper in the Journal of the AES, we found that training resulted in a huge difference between participant results in a discrimination task, yet listening tests often employ untrained listeners. This suggests that maybe we can hear a lot more than what studies suggest; we just don’t know how to listen and what to listen for.

Thursday

If you were to spend only one day this year immersing yourself in frontier audio engineering research, this is the day to do it.

At 9 am, researchers from Harman will present part 1 of A Statistical Model that Predicts Listeners’ Preference Ratings of In-Ear Headphones. This was a massive study involving 30 headphone models and 71 listeners under carefully controlled conditions. Part 2, on Friday, focuses on development and validation of the model based on the listening tests. I’m looking forward to both, but puzzled as to why they weren’t put back-to-back in the schedule.

At 10 am, researchers from the Tokyo University of the Arts will present Frequency Bands Distribution for Virtual Source Widening in Binaural Synthesis, a technique which seems closely related to work we presented previously on Cross-adaptive Dynamic Spectral Panning.

From 10:45 to 12:15, our own Brecht De Man will be chairing and speaking in a Workshop on ‘New Developments in Listening Test Design.’ He’s quite a leader in this field, and has developed some great software that makes the set up, running and analysis of listening tests much simpler and still rigorous.

In the 11-12:30 poster session, Nick Jillings will present Automatic Masking Reduction in Balance Mixes Using Evolutionary Computing, which deals with a challenging problem in music production, and builds on the large amount of research we’ve done on Automatic Mixing.

At 11:45, researchers from McGill will present work on Simultaneous Audio Capture at Multiple Sample Rates and Formats. This helps address one of the challenges in perceptual evaluation of high resolution audio (and see the open access journal paper on this), ensuring that the same audio is used for different versions of the stimuli, with only variation in formats.

At 1:30, renowned audio researcher John Vanderkooy will present research on how a loudspeaker can be used as the sensor for a high-performance infrasound microphone. In the same session at 2:30, researchers from Plextek will show how consumer headphones can be augmented to automatically perform hearing assessments. Should we expect a new audiometry product from them soon?

At 2 pm, our own Marco Martinez Ramirez will present Analysis and Prediction of the Audio Feature Space when Mixing Raw Recordings into Individual Stems, which applies machine learning to challenging music production problems. Immediately following this, Stephen Roessner discusses a Tempo Analysis of Billboard #1 Songs from 1955–2015, which builds partly on other work analysing hit songs to observe trends in music and production tastes.

At 3:45, there is a short talk on Evolving the Audio Equalizer. Audio equalization is a topic on which we’ve done quite a lot of research (see our review article, and a blog entry on the history of EQ). I’m not sure where the novelty is in the author’s approach though, since dynamic EQ has been around for a while, and there are plenty of harmonic processing tools.

At 4:15, there’s a presentation on Designing Sound and Creating Soundscapes for Still Images, an interesting and unusual bit of sound design.

Friday

Judging from the abstract, the short Tutorial on the Audibility of Loudspeaker Distortion at Bass Frequencies at 5:30 looks like it will be an excellent and easy to understand review, covering practice and theory, perception and metrics. In 15 minutes, I suppose it can only give a taster of what’s in the paper.

There’s a great session on perception from 1:30 to 4. At 2, perceptual evaluation expert Nick Zacharov gives a Comparison of Hedonic and Quality Rating Scales for Perceptual Evaluation. I think people often have a favorite evaluation method without knowing if it’s the best one for the test. We briefly looked at pairwise versus multistimuli tests in previous work, but it looks like Nick’s work is far more focused on comparing methodologies.

Immediately after that, researchers from the University of Surrey present Perceptual Evaluation of Source Separation for Remixing Music. Remixing audio via source separation is a hot topic, with lots of applications whenever the original unmixed sources are unavailable. This work will get to the heart of which approaches sound best.

The last talk in the session, at 3:30, is on The Bandwidth of Human Perception and its Implications for Pro Audio. Judging from the abstract, this is a big picture, almost philosophical discussion about what and how we hear, but with some definitive conclusions and proposals that could be useful for psychoacoustics researchers.

Saturday

Grateful Dead fans will want to check out Bridging Fan Communities and Facilitating Access to Music Archives through Semantic Audio Applications in the 9 to 10:30 poster session, which is all about an application providing wonderful new experiences for interacting with the huge archives of live Grateful Dead performances.

At 11 o’clock, Alessia Milo, a researcher in our team with a background in architecture, will discuss Soundwalk Exploration with a Textile Sonic Map. We discussed her work in a recent blog entry on Aural Fabric.

In the 2 to 3:30 poster session, I really hope there will be a live demonstration accompanying the paper on Acoustic Levitation.

At 3 o’clock, Gopal Mathur will present an Active Acoustic Meta Material Loudspeaker System. Metamaterials are receiving a lot of deserved attention, and such advances in materials are expected to lead to innovative and superior headphones and loudspeakers in the near future.

 

The full program can be explored on the Convention Calendar or the Convention website. Come say hi to us if you’re there! Josh Reiss (author of this blog entry), Brecht De Man, Marco Martinez and Alessia Milo from the Audio Engineering research team within the Centre for Digital Music will all be there.
 

 

The future of headphones


Headphones have been around for over a hundred years, but recently there has been a surge in new technologies, spurred on in part by the explosive popularity of Beats headphones. In this blog, we will look at three advances in headphones arising from high tech start-ups. I’ve been introduced to each of these companies recently, but don’t have any affiliation with them.

EAVE (formerly Eartex) are a London-based company, who have developed headphones aimed at the industrial workplace; construction sites, the maritime industry… Typical ear defenders do a good job of blocking out noise, but make communication extremely difficult. EAVE’s headphones are designed to protect from excessive noise, yet still allow effective communication with others. One of the founders, David Greenberg, has a background in auditory neuroscience, focusing on hearing disorders. He used his knowledge of hearing aids to design headphones that amplify speech while attenuating noise sources. They are designed for use in existing communication networks, and use beam forming microphones to focus the microphone on the speaker’s voice. They also have sensors to monitor noise levels so that noise maps can be created and personal noise exposure data can be gathered.

This use of additional sensors in the headset opens up lots of opportunities. Ossic are a company that emerged from Abbey Road Red, the start-up incubator established by the legendary Abbey Road Studios. Their headphone is packed with sensors, measuring the shape of your ears, head and torso. This allows them to estimate your own head-related transfer function, or HRTF, which describes how sounds are filtered as they travel from the source to your ear canal. They can then apply this filtering to the headphone output, allowing sounds to be far more accurately placed around you. Without HRTF filtering, sources always appear to be coming from inside your head.

It’s not as simple as that of course. For instance, when you move your head, you can still identify the direction of arrival of different sound sources. So the Ossic headphones also incorporate head tracking. And a well-measured HRTF is essential for accurate localization, but calibration to the ear is not perfect. So their headphones also have eight drivers rather than the usual two, allowing more careful positioning of sounds over a wide range of frequencies.

Ossic was funded by a Kickstarter campaign. Another headphone start-up, Ora, currently has a Kickstarter campaign. Ora is a venture that was founded at Tandem Launch, who create companies often arising from academic research, and have previously invested in research arising from the audio engineering research team behind this blog.

Ora aim to release ‘the world’s first graphene headphones.’ Graphene is a form of carbon, shaped in a one atom thick lattice of hexagons. In 2004, Andre Geim and Konstantin Novoselov of the University of Manchester isolated the material, analysed its properties, and showed how it could be easily fabricated, for which they won the Nobel prize in 2010. Andre Geim, by the way, is a colourful character, and the only person to have won both the Nobel and Ig Nobel prizes, the latter awarded for experiments involving levitating frogs.

Graphene has some amazing properties. It’s 200 times stronger than the strongest steel, efficiently conducts heat and electricity and is nearly transparent. In 2013, Zhou and Zettl published early results on a graphene-based loudspeaker. In 2014, Dejan Todorovic and colleagues investigated the feasibility of graphene as a microphone membrane, and simulations suggested that it could have high sensitivity (the voltage generated in response to a pressure input) over a wide frequency range, far better than conventional microphones. Later that year, Peter Gaskell and others from McGill University performed physical and acoustical measurements of graphene oxide which confirmed Todorovic’s simulation results. Interestingly, they seemed unaware of Todorovic’s work.

Graphene loudspeaker, courtesy Zettl Research Group, Lawrence Berkeley National Laboratory and University of California at Berkeley

Ora’s founders include some of the graphene microphone researchers from McGill University. Ora’s headphone uses a graphene-based composite material optimized for use in acoustic transducers. One of the many benefits is the very wide frequency range, making it an appealing choice for high resolution audio reproduction.

I should be clear. This blog is not meant as an endorsement of any of the mentioned companies. I haven’t tried their products. They are a sample of what is going on at the frontiers of headphone technology, but by no means cover the full range of exciting developments. Still, one thing is clear. High-end headphones in the near future will sound very different from the typical consumer headphones around today.

Cool stuff at the Audio Engineering Society Convention in Berlin

The next Audio Engineering Society convention is just around the corner, May 20-23 in Berlin. This is an event where we always have a big presence. After all, this blog is brought to you by the Audio Engineering research team within the Centre for Digital Music, so it’s a natural fit for a lot of what we do.

These conventions are quite big, with thousands of attendees, but not so big that you get lost or overwhelmed. The attendees fit loosely into five categories: the companies, the professionals and practitioners, students, enthusiasts, and the researchers. That last category is where we fit.

I thought I’d give you an idea of some of the highlights of the Convention. These are some of the events that we will be involved in or just attending, but of course, there’s plenty else going on.

On Saturday May 20th, 9:30-12:30, Dave Ronan from the team here will be presenting a poster on ‘Analysis of the Subgrouping Practices of Professional Mix Engineers.’ Subgrouping is a greatly understudied but important part of the mixing process. Dave surveyed 10 award-winning mix engineers to find out how and why they do subgrouping. He then subjected the results to detailed thematic analysis to uncover best practices and insights into the topic.

From 2:45 to 4:15 pm, there is a workshop on ‘Perception of Temporal Response and Resolution in Time Domain.’ Last year we published an article in the Journal of the Audio Engineering Society on ‘A meta-analysis of high resolution audio perceptual evaluation.’ There’s a blog entry about it too. The research showed very strong evidence that people can hear a difference between high resolution audio and standard, CD quality audio. But this brings up the question of why. Many people have suggested that the fine temporal resolution of oversampled audio might be perceived. I expect that this Workshop will shed some light on this as yet unresolved question.

Overlapping that workshop, there are some interesting posters from 3 to 6 pm. ‘Mathematical Model of the Acoustic Signal Generated by the Combustion Engine’ is about synthesis of engine sounds, specifically for electric motorbikes. We are doing a lot of sound synthesis research here, and so are always on the lookout for new approaches and new models. ‘A Study on Audio Signal Processed by “Instant Mastering” Services’ investigates the effects applied to ten songs by various online, automatic mastering platforms. One of those platforms, LandR, was a high tech spin-out from our research a few years ago, so we’ll be very interested in what they found.

For those willing to get up bright and early Sunday morning, there’s a 9 am panel on ‘Audio Education—What Does the Future Hold,’ where I will be one of the panellists. It should have some pretty lively discussion.

Then there are some interesting posters from 9:30 to 12:30. We’ve done a lot of work on new interfaces for audio mixing, so will be quite interested in ‘The Mixing Glove and Leap Motion Controller: Exploratory Research and Development of Gesture Controllers for Audio Mixing.’ And returning to the subject of high resolution audio, there is ‘Discussion on Subjective Characteristics of High Resolution Audio,’ by Mitsunori Mizumachi. Mitsunori was kind enough to give me details about his data and experiments in hi-res audio, which I then used in the meta-analysis paper. He’ll also be looking at what factors affect high resolution audio perception.

From 10:45 to 12:15, our own Brecht De Man will be chairing and speaking in a Workshop on ‘New Developments in Listening Test Design.’ He’s quite a leader in this field, and has developed some great software that makes the setup, running and analysis of listening tests much simpler while keeping them rigorous.

From 1 to 2 pm, there is the meeting of the Technical Committee on High Resolution Audio, of which I am co-chair along with Vicki Melchior. The Technical Committee aims for comprehensive understanding of high resolution audio technology in all its aspects. The meeting is open to all, so for those at the Convention, feel free to stop by.

Sunday evening at 6:30 is the Heyser lecture. This is quite prestigious, a big talk by one of the eminent people in the field. This one is given by Jörg Sennheiser of, well, Sennheiser Electronic.

Monday morning 10:45-12:15, there’s a tutorial on ‘Developing Novel Audio Algorithms and Plugins – Moving Quickly from Ideas to Real-time Prototypes,’ given by MathWorks, the company behind MATLAB. They have a great new toolbox for audio plugin development, which should make life a bit simpler for all those students and researchers who know MATLAB well and want to demo their work in an audio workstation.

Again in the mixing interface department, we look forward to hearing about ‘Formal Usability Evaluation of Audio Track Widget Graphical Representation for Two-Dimensional Stage Audio Mixing Interface’ on Tuesday, 11-11:30. The authors gave us a taste of this work at the Workshop on Intelligent Music Production which our group hosted last September.

In the same session – which is all about ‘Recording and Live Sound’ so very close to home – a new approach to acoustic feedback suppression is discussed in ‘Using a Speech Codec to Suppress Howling in Public Address Systems’, 12-12:30. With several past projects on gain optimization for live sound, we are curious to hear (or not hear) the results!

The full program can be explored on the AES Convention planner or the Convention website. Come say hi to us if you’re there!