The future of headphones


Headphones have been around for over a hundred years, but recently there has been a surge in new technologies, spurred on in part by the explosive popularity of Beats headphones. In this blog entry, we will look at three advances in headphones arising from high-tech start-ups. I’ve been introduced to each of these companies recently, but don’t have any affiliation with them.

EAVE (formerly Eartex) are a London-based company who have developed headphones aimed at the industrial workplace: construction sites, the maritime industry and so on. Typical ear defenders do a good job of blocking out noise, but make communication extremely difficult. EAVE’s headphones are designed to protect from excessive noise, yet still allow effective communication with others. One of the founders, David Greenberg, has a background in auditory neuroscience, focusing on hearing disorders, and he used his knowledge of hearing aids to design headphones that amplify speech while attenuating noise sources. They are designed for use in existing communication networks, and use beamforming microphones to focus on the speaker’s voice. They also have sensors to monitor noise levels, so that noise maps can be created and personal noise exposure data can be gathered.

This use of additional sensors in the headset opens up lots of opportunities. Ossic are a company that emerged from Abbey Road Red, the start-up incubator established by the legendary Abbey Road Studios. Their headphone is packed with sensors, measuring the shape of your ears, head and torso. This allows them to estimate your own head-related transfer function, or HRTF, which describes how sounds are filtered as they travel from a source to your ear canal. They can then apply this filtering to the headphone output, allowing sounds to be far more accurately placed around you. Without HRTF filtering, sources always appear to be coming from inside your head.
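
To make that concrete, here is a minimal sketch of HRTF-based rendering (not Ossic’s actual processing chain, and with placeholder impulse responses standing in for real measurements): a mono source is convolved with the left- and right-ear head-related impulse responses (HRIRs) measured for the desired direction.

```python
import numpy as np
from scipy.signal import fftconvolve

def binaural_render(mono, hrir_left, hrir_right):
    """Place a mono source at the direction the HRIR pair was measured for,
    by convolving it with the left- and right-ear impulse responses."""
    left = fftconvolve(mono, hrir_left)
    right = fftconvolve(mono, hrir_right)
    return np.stack([left, right])  # 2 x N stereo signal

# Placeholder data: a noise burst and made-up 256-tap 'HRIRs'.
# Real HRIRs would come from a measured or estimated HRTF dataset.
fs = 44100
mono = np.random.randn(fs)            # 1 second of noise
hrir_l = np.random.randn(256) * 0.01  # stand-in for a real measurement
hrir_r = np.random.randn(256) * 0.01
stereo = binaural_render(mono, hrir_l, hrir_r)
```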

It’s not as simple as that, of course. For instance, when you move your head, you can still identify the direction of arrival of different sound sources. So the Ossic headphones also incorporate head tracking. And while a well-measured HRTF is essential for accurate localization, calibration to the ear is not perfect. So their headphones also have eight drivers rather than the usual two, allowing more careful positioning of sounds over a wide range of frequencies.

Ossic was funded by a Kickstarter campaign. Another headphone start-up, Ora, currently has a Kickstarter campaign of its own. Ora is a venture founded at Tandem Launch, who create companies, often from academic research, and who have previously invested in research by the audio engineering research team behind this blog.

Ora aim to release ‘the world’s first graphene headphones.’ Graphene is a form of carbon: a lattice of hexagons just one atom thick. In 2004, Andre Geim and Konstantin Novoselov of the University of Manchester isolated the material, analysed its properties, and showed how it could be easily fabricated, for which they won the Nobel Prize in 2010. Andre Geim, by the way, is a colourful character, and the only person to have won both the Nobel and Ig Nobel prizes, the latter awarded for experiments involving levitating frogs.

Graphene

Graphene has some amazing properties. It’s 200 times stronger than the strongest steel, efficiently conducts heat and electricity, and is nearly transparent. In 2013, Zhou and Zettl published early results on a graphene-based loudspeaker. In 2014, Dejan Todorovic and colleagues investigated the feasibility of graphene as a microphone membrane; their simulations suggested that it could have high sensitivity (the voltage generated in response to a pressure input) over a wide frequency range, far better than conventional microphones. Later that year, Peter Gaskell and others from McGill University performed physical and acoustical measurements of graphene oxide which confirmed Todorovic’s simulation results. Interestingly, they seemed unaware of Todorovic’s work.

Graphene loudspeaker, courtesy Zettl Research Group, Lawrence Berkeley National Laboratory and University of California at Berkeley

Ora’s founders include some of the graphene microphone researchers from McGill University. Ora’s headphone uses a graphene-based composite material optimized for use in acoustic transducers. One of the many benefits is the very wide frequency range, making it an appealing choice for high resolution audio reproduction.

I should be clear. This blog is not meant as an endorsement of any of the mentioned companies. I haven’t tried their products. They are a sample of what is going on at the frontiers of headphone technology, but by no means cover the full range of exciting developments. Still, one thing is clear. High-end headphones in the near future will sound very different from the typical consumer headphones around today.

Cool stuff at the Audio Engineering Society Convention in Berlin

The next Audio Engineering Society convention is just around the corner, May 20-23 in Berlin. This is an event where we always have a big presence. After all, this blog is brought to you by the Audio Engineering research team within the Centre for Digital Music, so it’s a natural fit for a lot of what we do.

These conventions are quite big, with thousands of attendees, but not so big that you get lost or overwhelmed. Attendees fit loosely into five categories: companies, professionals and practitioners, students, enthusiasts, and researchers. That last category is where we fit.

I thought I’d give you an idea of some of the highlights of the Convention. These are some of the events that we will be involved in or just attending, but of course, there’s plenty else going on.

On Saturday May 20th, 9:30-12:30, Dave Ronan from the team here will be presenting a poster on ‘Analysis of the Subgrouping Practices of Professional Mix Engineers.’ Subgrouping is a greatly understudied but important part of the mixing process. Dave surveyed 10 award-winning mix engineers to find out how and why they do subgrouping. He then subjected the results to detailed thematic analysis to uncover best practices and insights into the topic.

From 2:45 to 4:15 pm there is a workshop on ‘Perception of Temporal Response and Resolution in Time Domain.’ Last year we published an article in the Journal of the Audio Engineering Society on ‘A meta-analysis of high resolution audio perceptual evaluation.’ There’s a blog entry about it too. The research showed very strong evidence that people can hear a difference between high resolution audio and standard, CD quality audio. But this brings up the question: why? Many people have suggested that the fine temporal resolution of oversampled audio might be perceived. I expect that this workshop will shed some light on this as yet unresolved question.

Overlapping that workshop, there are some interesting posters from 3 to 6 pm. ‘Mathematical Model of the Acoustic Signal Generated by the Combustion Engine’ is about synthesis of engine sounds, specifically for electric motorbikes. We are doing a lot of sound synthesis research here, and so are always on the lookout for new approaches and new models. ‘A Study on Audio Signal Processed by “Instant Mastering” Services’ investigates the effects applied to ten songs by various online, automatic mastering platforms. One of those platforms, LandR, was a high-tech spin-out from our research a few years ago, so we’ll be very interested in what they found.

For those willing to get up bright and early Sunday morning, there’s a 9 am panel on ‘Audio Education—What Does the Future Hold,’ where I will be one of the panellists. It should have some pretty lively discussion.

Then there are some interesting posters from 9:30 to 12:30. We’ve done a lot of work on new interfaces for audio mixing, so we will be quite interested in ‘The Mixing Glove and Leap Motion Controller: Exploratory Research and Development of Gesture Controllers for Audio Mixing.’ And returning to the subject of high resolution audio, there is ‘Discussion on Subjective Characteristics of High Resolution Audio’ by Mitsunori Mizumachi. Mitsunori was kind enough to give me details about his data and experiments in hi-res audio, which I then used in the meta-analysis paper. He’ll also be looking at what factors affect high resolution audio perception.

From 10:45 to 12:15, our own Brecht De Man will be chairing and speaking in a workshop on ‘New Developments in Listening Test Design.’ He’s quite a leader in this field, and has developed some great software that makes the setup, running and analysis of listening tests much simpler while remaining rigorous.

From 1 to 2 pm, there is the meeting of the Technical Committee on High Resolution Audio, of which I am co-chair along with Vicki Melchior. The Technical Committee aims for comprehensive understanding of high resolution audio technology in all its aspects. The meeting is open to all, so for those at the Convention, feel free to stop by.

Sunday evening at 6:30 is the Heyser lecture, a prestigious talk given by one of the eminent people in the field. This one is given by Jörg Sennheiser of, well, Sennheiser Electronic.

Monday morning 10:45-12:15, there’s a tutorial on ‘Developing Novel Audio Algorithms and Plugins – Moving Quickly from Ideas to Real-time Prototypes,’ given by Mathworks, the company behind Matlab. They have a great new toolbox for audio plugin development, which should make life a bit simpler for all those students and researchers who know Matlab well and want to demo their work in an audio workstation.

Again in the mixing interface department, we look forward to hearing about ‘Formal Usability Evaluation of Audio Track Widget Graphical Representation for Two-Dimensional Stage Audio Mixing Interface’ on Tuesday, 11-11:30. The authors gave us a taste of this work at the Workshop on Intelligent Music Production which our group hosted last September.

In the same session – which is all about ‘Recording and Live Sound’, so very close to home – a new approach to acoustic feedback suppression is discussed in ‘Using a Speech Codec to Suppress Howling in Public Address Systems’, 12-12:30. With several past projects on gain optimization for live sound, we are curious to hear (or not hear) the results!

The full program can be explored on the AES Convention planner or the Convention website. Come say hi to us if you’re there!

High resolution audio: finally, rigorously put to the test. And the verdict is…

Yes, you can hear a difference! (but it is really hard to measure)

See http://www.aes.org/e-lib/browse.cfm?elib=18296 for the June 2016 Open Access article in the Journal of the Audio Engineering Society on “A meta-analysis of high resolution audio perceptual evaluation.”

For years, I’ve been hearing people in the audio engineering community arguing over whether or not it makes any difference to record, mix and play back audio at better than CD quality (44.1 kHz, 16 bit) or better than production quality (48 kHz, 16 bit). Some people swear they can hear a difference, others have stories about someone they met who could always pick out the differences, and others say they’re all just fooling themselves. A few people could mention a study or two that supported their side, but the arguments never seemed to get resolved.

Then, a bit more than a year ago, I was at a dinner party where a guy sitting across from me was about to complete his PhD in meta-analysis. Meta-analysis? I’d never heard of it. But the concept, analysing and synthesising the results of many studies to get a more definitive answer and gain more insights and knowledge, really intrigued me. So it was about time that someone tried this on the question of perception of hi-res audio.

Unfortunately, no one I asked was willing to get involved. A couple of experts thought there couldn’t be enough data out there to do the meta-analysis. A couple more thought that the type of studies (not your typical clinical trial with experimental and control groups) couldn’t be analysed using the established statistical approaches in meta-analysis. So, I had to do it myself. This also meant I had to be extra careful, and seek out as much advice as possible, since no one was looking over my shoulder to tell me when I was wrong or stupid.

The process was fascinating. The more I looked, the more studies of high resolution audio perception I uncovered. And my main approach for finding them (start with a few main papers, then look at everyone they cited and everyone who cited them, and repeat with any further interesting papers found) was not mentioned in the guidance to meta-analysis that I read. Then getting the data was interesting. Some researchers had it all prepared in handy, well-labelled spreadsheets; another found it in an old filing cabinet; one had never kept it at all! And for some data, I had to write little programs to reverse engineer the raw data from t values for trials with finite outcomes.
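
As a flavour of that reverse engineering (the exact statistics varied from paper to paper, so this is a hypothetical sketch rather than the code actually used): if a study reports a z score from a normal-approximation binomial test against chance, only a few integer counts of correct answers are consistent with the rounded, reported value.

```python
import math

def recover_count(n_trials, reported_z, tol=0.005):
    """Find the integer number of correct answers k consistent with a
    reported z score from a normal-approximation binomial test against
    chance: z = (k - n/2) / sqrt(n/4)."""
    def z_of(k):
        return (k - n_trials / 2) / math.sqrt(n_trials / 4)

    k_guess = round(n_trials / 2 + reported_z * math.sqrt(n_trials) / 2)
    candidates = range(max(0, k_guess - 1), min(n_trials, k_guess + 1) + 1)
    # Keep candidates whose recomputed z matches the report when rounded.
    return [k for k in candidates if abs(round(z_of(k), 2) - reported_z) <= tol]

print(recover_count(100, 2.0))  # -> [60], i.e. 60 of 100 trials correct
```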

Formal meta-analysis techniques could be applied, and I gained a strong appreciation both for the maths behind them and for the general guidance that helps ensure rigour and avoid bias in a meta-study. But the results, in a few places, disagreed with what is typical. The potential biases in the studies seemed to occur more often in those that did not reject the null hypothesis, i.e., those that found no evidence for discriminating between high resolution and CD quality audio. Evidence of publication bias seemed to mostly go away if one put the studies into subgroups. And the use of binomial probabilities allowed the statistical approaches of meta-analysis to be applied to studies without a control group (‘no effect’ can be determined just from binomial probabilities).

The end result was that people could, sometimes, perceive the difference between hi-res and CD audio. But they needed to be trained and the test needed to be carefully designed. And it was nice to see that the experiments and analysis were generally a little better today than in the past, so research is advancing. Still, most tests had some biases towards false negatives. So perhaps, careful experiments, incorporating all the best approaches, may show this perception even more strongly.

Meta-analysis is truly fascinating, and audio engineering, psychoacoustics, music technology and related fields need more of it.

Human echolocation, absolute pitch and Golden Ears

I’m always intrigued by stories of people with amazing abilities, and similar questions often come up. Is this for real, and is this a latent ability that we all might have?

A few years ago there were a lot of news stories about Daniel Kish; see “The blind man who taught himself to see” or “Human echolocation: Using tongue-clicks to navigate the world.” Daniel is a master of echolocation, the ability to sense the environment by listening to the echoes from actively produced sounds, though he is also newsworthy for his humanitarian contributions helping other visually impaired people; see his charity, World Access for the Blind. His ability is amazing, and the first question, “Is this for real?”, is easily answered in the affirmative. Quite a few studies have also shown that many (or most, or even all) people have some echolocation ability, and that the blind generally perform better. And Daniel has taught others to hone their skills.

You can find Daniel Kish’s TedX talk at https://www.youtube.com/watch?v=ob-P2a6Mrjs

And here’s a wonderful light piece about an eight-year-old learning echolocation skills.

This got me thinking about some other amazing auditory skills. I remember when I was a teen, at a friend’s house, and he told me the names of the white keys on a piano (my musical knowledge was nonexistent). He then asked me to play any of them and he’d tell me which one it was. I thought I’d trick him, so I played one of the black keys. He turned around, surprised, and said ‘A sharp.’ So I tried hitting two keys, and he got that right. I soon established that he could correctly identify almost any two keys, and sometimes even three hit together. I said, “Wow, you were born with perfect pitch.” And he looked at me and said, “Not born with it. It’s because I’ve been playing piano since I was four!” I also remember that he was amazing at playing music by ear, which is no doubt related, but lousy at sight reading.

And I don’t know if he had absolute pitch in the true sense. Could he identify the note played on other instruments? Maybe his skill was limited to what was played at home on his piano, or to pianos generally. Absolute pitch is a phenomenon where there is some debate about the extent to which we might all be able to acquire it. Some studies suggest that there could be a genetic trait, but there’s also a lot of evidence to suggest that it can be learned. So can anyone learn it, and can they learn it at any time? Certainly, relative pitch skills can be acquired late in life (there’s a lot of material on critical listening and ear training that helps someone learn this skill), and repeated exposure can provide someone with an external reference. With enough training, and enough examples of different timbres with different fundamentals, perhaps almost anyone could learn to identify the pitch of a wide variety of sounds.

Extraordinary auditory skills have also come up in some recent research that we’ve been involved in: J. D. Reiss, “A Meta-Analysis of High Resolution Audio Perceptual Evaluation,” Journal of the Audio Engineering Society, v. 64 (6), June 2016, http://www.aes.org/e-lib/browse.cfm?elib=18296.

We were interested in whether people could perceive a difference between CD quality audio (16 bits, 44.1 kHz) and high resolution audio (loosely, anything beyond CD quality). Some anecdotes have mentioned individuals with ‘Golden Ears.’ That is, there might exist a few special people with an exceptional ability to hear this difference, even if the vast majority cannot distinguish the two formats. Our research involved a meta-analysis of all studies looking into the ability to discriminate between high resolution and standard format audio. In a lot of studies, participants were asked many binary questions (like ‘are these two samples the same or different?’ or ‘which of these two samples sounds closest to a high resolution reference sample?’). Thus, one could assign a p value to each participant, corresponding to the probability of getting at least that many correct answers if the participant was just guessing. If everyone was always just guessing, then the p values should be uniformly distributed. If there is a Golden Ears phenomenon, then there should be a ‘bump’ in the low p values.
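
For example (with made-up numbers), the p value for a participant who got 14 of 20 trials correct is just a binomial tail probability, which a few lines of Python using scipy can compute:

```python
from scipy.stats import binomtest

# p value for a participant who got k = 14 of n = 20 binary trials correct,
# under the null hypothesis that they were just guessing (p = 0.5).
# alternative='greater' gives P(at least 14 correct | guessing).
result = binomtest(k=14, n=20, p=0.5, alternative='greater')
print(result.pvalue)  # ~0.058
```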

Well, here’s the histogram of p values from participants across many of the studies.

[Histogram of participant p values]

You can’t really tell if there’s a Golden Ears phenomenon or not. Why? Well, first, you need a lot of data to see structure in a histogram. But also, our p values are discrete and finite. If a participant was involved in only 4 trials, there are only 5 possible p values: 0.0625 (all 4 correct), 0.3125 (at least 3 correct), 0.6875 (at least 2 correct), 0.9375 (at least 1 correct), and 1. So there are a lot of bins in our histogram that this participant will never hit. The histogram isn’t showing any participants who did only 4 trials, but the problem is still there even for participants who did a large, but still finite, number of trials.
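
A quick sketch reproducing those five values (binom.sf(k - 1, n, p) gives the probability of at least k correct):

```python
from scipy.stats import binom

n = 4  # a participant who did only four binary trials
# p value for each possible score k = 0..4: the chance of getting
# at least k correct when guessing with probability 0.5
p_values = [binom.sf(k - 1, n, 0.5) for k in range(n + 1)]
print(p_values)  # -> [1.0, 0.9375, 0.6875, 0.3125, 0.0625]
```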

There are other issues of course. Maybe this Golden Ears phenomenon only occurs in one out of a thousand people, and those people just weren’t participants. That’s just one of many reasons why it’s hard to rule out the alternative hypothesis in null hypothesis testing.

But what we did find is that, on average, participants were correct much more than 50% of the time, and that was statistically significant. More on that in an upcoming blog entry, and in the above-mentioned paper ‘A meta-analysis of high resolution audio perceptual evaluation’ in the Journal of the Audio Engineering Society.

Why 44.1 kHz?

Why is 44.1 kHz the standard sample rate in consumer audio?

44.1 kHz, or 44,100 samples per second, is perhaps the most popular sample rate used in digital audio, especially for music content. The short answer as to why it is so popular is simple: it was the sample rate chosen for the Compact Disc, and thus it is the sample rate of much audio taken from CDs, and the default sample rate of much audio workstation software.

As to why it was chosen as the sample rate for the Compact Disc, the answer is a bit more interesting. In the 1970s, when digital recording was still in its infancy, many different sample rates were used, including 37 kHz and 50 kHz in Soundstream’s recordings. In the late 70s, Philips and Sony collaborated on the Compact Disc, and there was much debate between the two companies regarding sample rate. In the end, 44.1 kHz was chosen for a number of reasons.

According to the Nyquist theorem, a 44.1 kHz sample rate allows reproduction of all frequency content below 22.05 kHz. This covers all frequencies heard by a normal listener; though there is still debate about perception of high frequency content, it is generally agreed that few people can hear tones above 20 kHz.

44.1 kHz also allowed the creators of the CD format to fit well over an hour of music (more than on a vinyl LP record) on a 120 millimetre disc, which was considered a strong selling point.

But 44,100 is also a rather special number: 44,100 = 2×2×3×3×5×5×7×7, and hence 44.1 kHz is an easy number to work with for many calculations.
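
For instance (a quick sketch; the full history of the choice involves more than arithmetic), the factorization is easy to verify, and 44,100 divides evenly by both the 60 and 50 fields-per-second video rates, which mattered when early digital audio masters were stored on video tape:

```python
# 44,100 factors into small primes: 2^2 * 3^2 * 5^2 * 7^2
assert 2 ** 2 * 3 ** 2 * 5 ** 2 * 7 ** 2 == 44100

# It also divides evenly by both historical video field rates (60 Hz NTSC,
# 50 Hz PAL/SECAM), which mattered when early digital audio masters were
# stored on video tape via PCM adaptors.
print(44100 % 60, 44100 % 50)    # -> 0 0
print(44100 // 60, 44100 // 50)  # -> 735 and 882 samples per video field
```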