Sneak preview of the research to be unveiled at the 145th Audio Engineering Society Convention


We’ve made it a tradition on this blog to preview the technical program at the Audio Engineering Society Conventions, as we did with the 142nd, 143rd, and 144th AES Conventions. The 145th AES Convention is just around the corner, October 17 to 20 in New York. As before, the Audio Engineering research team behind this blog will be quite active at the convention.

These conventions have thousands of attendees, but aren’t so large that you get lost or overwhelmed. Away from the main exhibition hall is the Technical Program, which includes plenty of tutorials and presentations on cutting edge research.

So we’ve gathered together some information about a lot of the events that caught our eye as being unusual, exceptionally high quality, ones we’re involved in or attending, or just worth mentioning. And this Convention will certainly live up to the hype. Plus, it’s a special one, the 70th anniversary of the founding of the AES.

By the way, I don’t think I mention a single loudspeaker paper below, but the Technical Program is full of them this time. You could have a full conference just on loudspeakers from them. If you want to become an expert on loudspeaker research, this is the place to be.

Anyway, let’s dive right in.

Wednesday, October 17th

We know different cultures listen to music differently, but do they listen to audio coding artifacts differently? Find out at 9:30 when Sascha Disch and co-authors present On the Influence of Cultural Differences on the Perception of Audio Coding Artifacts in Music.

ABX, AB, MUSHRA… so many choices for subjective evaluation and listening tests, so little time. Which one to use, which one gives the strongest results? Let’s put them all to the test while looking at the same question. This is what was done in Investigation into the Effects of Subjective Test Interface Choice on the Validity of Results, presented at 11:30. The results are strong, and surprising. Authors include former members of the team behind this blog, Nick Jillings and Brecht de Man, myself and frequent collaborator Ryan Stables.

From 10-11:30, Steve Fenton will be presenting the poster Automatic Mixing of Multitrack Material Using Modified Loudness Models. Automatic mixing is a really hot research area, one where we’ve made quite a few contributions. And a lot of it has involved loudness models for level balancing or fader settings. Someone really should do a review of all the papers focused on that, or better yet, a meta-analysis. Dr. Fenton and co-authors also have another poster in the same session, about a Real-Time System for the Measurement of Perceived Punch. Fenton’s PhD was about perception and modelling of punchiness in audio, and I suggested to him that the thesis should have just been titled ‘Punch!’

The researchers from Harman continue their analysis of headphone preference and quality with A Survey and Analysis of Consumer and Professional Headphones Based on Their Objective and Subjective Performances at 3:30. Harman obviously have a strong interest in this, but it’s rigorous, high quality research, not promotion.

In the 3:00 to 4:30 poster session, Daniel Johnston presents a wonderful spatial audio application, SoundFields: A Mixed Reality Spatial Audio Game for Children with Autism Spectrum Disorder. I’m pretty sure this isn’t the quirky lo-fi singer/songwriter Daniel Johnston.

Thursday, October 18th

There’s something bizarre about the EBU R128 / ITU-R BS.1770 specification for loudness measurements. It doesn’t give the filter coefficients as a function of sample rate. So, for this and other reasons, even though the actual specification is just a few lines of code, you have to reverse engineer it if you’re doing it yourself, as was done here. At 10 am, Brecht de Man presents Evaluation of Implementations of the EBU R128 Loudness Measurement, which looks carefully at different implementations and provides full implementations in several programming languages.
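As a rough illustration of what that reverse engineering yields, here is a sketch that recomputes the two K-weighting biquads for an arbitrary sample rate. The centre frequencies, gain and Q values are not in the specification text. They are constants that circulate in open-source loudness meters and are assumed here, as is the function name `k_weighting_coefficients`. At 48 kHz the result should land close to the coefficients tabulated in ITU-R BS.1770.

```python
import math

def k_weighting_coefficients(fs):
    """Return ((b, a) shelf, (b, a) high-pass) biquads for sample rate fs in Hz.

    The constants are assumed values taken from common open-source meters,
    chosen so that fs = 48000 reproduces the tabulated BS.1770 filters.
    """
    # Stage 1: high-shelf boost of roughly +4 dB above about 1.7 kHz.
    f0, gain_db, q = 1681.974450955533, 3.999843853973347, 0.7071752369554196
    k = math.tan(math.pi * f0 / fs)
    vh = 10.0 ** (gain_db / 20.0)
    vb = vh ** 0.4996667741545416
    a0 = 1.0 + k / q + k * k
    shelf_b = [(vh + vb * k / q + k * k) / a0,
               2.0 * (k * k - vh) / a0,
               (vh - vb * k / q + k * k) / a0]
    shelf_a = [1.0, 2.0 * (k * k - 1.0) / a0, (1.0 - k / q + k * k) / a0]

    # Stage 2: second-order high-pass at roughly 38 Hz.
    f0, q = 38.13547087602444, 0.5003270373238773
    k = math.tan(math.pi * f0 / fs)
    a0 = 1.0 + k / q + k * k
    hp_b = [1.0, -2.0, 1.0]
    hp_a = [1.0, 2.0 * (k * k - 1.0) / a0, (1.0 - k / q + k * k) / a0]
    return (shelf_b, shelf_a), (hp_b, hp_a)

(shelf_b, shelf_a), (hp_b, hp_a) = k_weighting_coefficients(48000)
print("shelf:", shelf_b, shelf_a)
print("high-pass:", hp_b, hp_a)
```

Calling the same function with 44100 or 96000 gives coefficients for those rates, which is the part the specification leaves to the implementer.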

Roughly one in six people in developed countries suffer some hearing impairment. If you think that seems too high, think how many wear glasses or contact lenses or had eye surgery. And given the sound exposure, I’d expect the average to be higher with music producers. But we need good data on this. Thus, Laura Sinnott’s 3 pm presentation on Risk of Sound-Induced Hearing Disorders for Audio Post Production Engineers: A Preliminary Study is particularly relevant.

Some interesting posters in the 2:45 to 4:15 session. Maree Sheehan’s Audio Portraiture – The Sound of Identity, an Indigenous Artistic Enquiry uses 3D immersive and binaural sound to create audio portraits of Maori women. It’s a wonderful use of state of the art audio technologies for cultural and artistic study. Researchers from the University of Alcala in Madrid present an improved method to detect anger in speech in Precision Maximization in Anger Detection in Interactive Voice Response Systems.

Friday, October 19th

There are plenty of interesting papers this day, but only one I’m highlighting. By coincidence, it’s my own presentation of work with He Peng, on Why Can You Hear a Difference between Pouring Hot and Cold Water? An Investigation of Temperature Dependence in Psychoacoustics. This was inspired by the curious phenomenon and initial investigations described in a previous blog entry.

Saturday, October 20th

Get there early on Saturday to find out about audio branding from a designer’s perspective in the 9 am Creative Approach to Audio in Corporate Brand Experiences.

Object-based audio allows broadcasters to deliver separate channels for sound effects, music and dialog, which can then be remixed on the client-side. This has high potential for delivering better sound for the hearing-impaired, as described in Lauren Ward’s Accessible Object-Based Audio Using Hierarchical Narrative Importance Metadata at 9:45. I’ve heard this demonstrated by the way, and it sounds amazing.

A big challenge with spatial audio systems is the rendering of sounds that are close to the listener. Descriptions of such systems almost always begin with ‘assume the sound source is in the far field.’ In the 10:30 to 12:00 poster session, researchers from the Chinese Academy of Sciences present a real advance in this subject with Near-Field Compensated Higher-Order Ambisonics Using a Virtual Source Panning Method.

Rob Maher is one of the world’s leading audio forensics experts. At 1:30 in Audio Forensic Gunshot Analysis and Multilateration, he looks at how to answer the question ‘Who shot first?’ from audio recordings. As is often the case in audio forensics, I suspect this paper was motivated by real court cases.

When visual cues disagree with auditory cues, which ones do you believe? Or conversely, does low quality audio seem more realistic if strengthened by visual cues? These sorts of questions are investigated at 2 pm in the large international collaboration Influence of Visual Content on the Perceived Audio Quality in Virtual Reality. Audio Engineering Society Conventions are full of original research, but survey and review papers are certainly welcomed, especially ones like the thorough and insightful HRTF Individualization: A Survey, presented at 2:30.

Standard devices for measuring auditory brainstem response are typically designed to work only with clicks or tone bursts. A team of researchers from Gdansk developed A Device for Measuring Auditory Brainstem Responses to Audio, presented in the 2:30 to 4 pm poster session.

 

Hopefully, I can also give a wrap-up after the Convention, as we did here and here.


Aeroacoustic Sound Effects – Journal Article

I am delighted to be able to announce that my article on Creating Real-Time Aeroacoustic Sound Effects Using Physically Informed Models is in this month’s Journal of the Audio Engineering Society. This is an invited article that follows on from winning the best paper award at the Audio Engineering Society 141st Convention in LA. It is an open access article so free for all to download!

The article extends the original paper by examining how the Aeolian tone synthesis models can be used to create a number of sound effects. The benefits of these models are that they produce plausible sound effects which operate in real-time. Users are presented with a number of highly relevant parameters to control the effects which can be mapped directly to 3D models within game engines.

The basics of the Aeolian tone were given in a previous blog post. To summarise, a tone is generated when air passes around an object and vortices are shed behind it. Fluid dynamic equations are available which allow a prediction of the tone frequency based on the physics of the interaction between the air and object. The Aeolian tone is modelled as a compact sound source.
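To make the frequency prediction concrete, here is a minimal sketch using the textbook Strouhal relation for flow past a cylinder, f = St · u / d. The constant Strouhal number of 0.2 and the function name are assumptions for illustration; the article's models may well use a more detailed, speed-dependent value.

```python
def aeolian_tone_frequency(airspeed, diameter, strouhal=0.2):
    """Vortex shedding frequency in Hz.

    airspeed: speed of the air relative to the object, in m/s
    diameter: diameter of the object, in m
    strouhal: assumed constant Strouhal number (about 0.2 for a cylinder)
    """
    return strouhal * airspeed / diameter

# Air moving at 20 m/s past a 1 cm rod.
print(aeolian_tone_frequency(20.0, 0.01))
```

With these numbers the tone sits at roughly 400 Hz. Doubling the airspeed doubles the pitch, and doubling the thickness halves it.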

To model a sword or similar object a number of these compact sound sources are placed in a row. A previous blog post describes this in more detail. The majority of compact sound sources are placed at the tip as this is where the airspeed is greatest and the greatest sound is generated.
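The idea can be sketched as follows, under some loud assumptions: the blade rotates about the hilt at a constant angular speed, so the airspeed at each source is angular speed times radius; each source follows the simple Strouhal relation with a constant of 0.2; and the square-root spacing used to crowd sources towards the tip is a hypothetical layout, not the one from the article.

```python
import math

def blade_source_frequencies(length, diameter, angular_speed,
                             num_sources=8, strouhal=0.2):
    """Return (radius, airspeed, frequency) for sources along a swung blade.

    length: blade length from pivot to tip, in m
    diameter: blade thickness, in m
    angular_speed: swing speed, in rad/s
    """
    sources = []
    for i in range(num_sources):
        # Square-root spacing (an assumed layout) packs sources near the tip.
        radius = length * math.sqrt((i + 1) / num_sources)
        airspeed = angular_speed * radius          # fastest at the tip
        frequency = strouhal * airspeed / diameter
        sources.append((radius, airspeed, frequency))
    return sources

for radius, airspeed, frequency in blade_source_frequencies(1.0, 0.01, 10.0):
    print(f"r={radius:.2f} m  u={airspeed:.1f} m/s  f={frequency:.0f} Hz")
```

Each source ends up with its own frequency, rising towards the tip.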

The behaviour of a sword when being swung has to be modelled, which is then used to control some of the parameters in the equations. This behaviour can be controlled by a game engine, making fully integrated procedural audio models.

The sword model was extended to include objects like a baseball bat and golf club, as well as a broom handle. The compact sound source of a cavity tone was also added in to replicate swords which have grooved profiles. Subjective evaluation gave excellent results, especially for thicker objects, which were perceived to be as plausible as pre-recorded samples.

The synthesis model could be extended to look at a range of sword cross sections as well as any influence of the material of the sword. It is envisaged that other sporting equipment which swings or flies through the air could be modelled using compact sound sources.

A propeller sound is one which is common in games and film and partially based on the sounds generated from the Aeolian tone and vortex shedding. As a blade passes through the air, vortices are shed at a specific frequency along the length. To model individual propeller blades, the profiles of a number of blades were obtained with specific span length (centre to tip) and chord lengths (leading edge to trailing edge).

Another major sound source is the loading sound generated by the torque and thrust. A procedure for modelling these sounds is outlined in the article. Missing from the propeller model are distortion sounds. These are more associated with rotors which turn in the horizontal plane.

An important sound when hearing a propeller powered aircraft is the engine sound. The one taken for this model was based on one of Andy Farnell’s from his book Designing Sound. Once complete a user is able to select an aircraft from a pre-programmed bank and set the flight path. If linked to a game engine the physical dimensions and flight paths can all be controlled procedurally.

Listening tests indicate that the synthesis model was as plausible as an alternative method but still not as plausible as pre-recorded samples. It is believed that results may have been more favourable if modelling electric-powered drones and aircraft which do not have the sound of a combustion engine.

The final model exploring the use of the Aeolian tone was that of an Aeolian Harp. This is a musical instrument that is activated by wind blowing around the strings. The vortices that are shed behind the string can activate a mechanical vibration if they are around the frequency of one of the string’s natural harmonics. This produces a distinctive sound.

The digital model allows a user to synthesise a harp of up to 13 strings. Tension, mass density, length and diameter can all be adjusted to replicate a wide variety of string material and harp size. Users can also control a wind model modified from one presented in Andy Farnell’s book Designing Sound, with control over the amount of gusts. Listening tests indicate that the sound is not as plausible as pre-recorded ones but is as plausible as alternative synthesis methods.
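A small sketch shows how those string parameters interact with the wind. It assumes an ideal string, whose harmonics are f_n = (n / 2L) · sqrt(T / μ) with μ the mass per unit length, and a constant Strouhal number of 0.2 for the vortex shedding. The function names and example values are hypothetical, and the article's model is likely more detailed.

```python
import math

def string_harmonics(tension, linear_density, length, count=5):
    """Natural frequencies in Hz of an ideal string.

    tension in N, linear_density in kg/m, length in m.
    """
    fundamental = math.sqrt(tension / linear_density) / (2.0 * length)
    return [n * fundamental for n in range(1, count + 1)]

def wind_speed_for_harmonic(frequency, diameter, strouhal=0.2):
    """Wind speed in m/s at which vortex shedding matches `frequency`."""
    return frequency * diameter / strouhal

harmonics = string_harmonics(tension=50.0, linear_density=0.002, length=1.0)
for n, f in enumerate(harmonics, start=1):
    speed = wind_speed_for_harmonic(f, diameter=0.001)
    print(f"harmonic {n}: {f:.1f} Hz, excited near {speed:.2f} m/s")
```

As the wind speed rises, the shedding frequency sweeps up through successive harmonics, so a gust can make the same string jump between partials.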

The article describes the design processes in more detail as well as the fluid dynamic principles each was developed from. All models developed are open source and implemented in Pure Data. Links to these are in the paper as well as my previous publications. Demo videos can be found on YouTube.

Weird and wonderful research to be unveiled at the 144th Audio Engineering Society Convention


Last year, we previewed the 142nd and 143rd AES Conventions, which we followed with wrap-up discussions here and here. The next AES Convention is just around the corner, May 23 to 26 in Milan. As before, the Audio Engineering research team here aim to be quite active at the convention.

These conventions have thousands of attendees, but aren’t so large that you get lost or overwhelmed. Away from the main exhibition hall is the Technical Program, which includes plenty of tutorials and presentations on cutting edge research.

So we’ve gathered together some information about a lot of the events that caught our eye as being unusual, exceptionally high quality, ones we’re involved in or attending, or just worth mentioning. And this Convention will certainly live up to the hype.

Wednesday May 23rd

From 11:15 to 12:45 that day, there’s an interesting poster by a team of researchers from the University of Limerick titled Can Visual Priming Affect the Perceived Sound Quality of a Voice Signal in Voice over Internet Protocol (VoIP) Applications? This builds on work we discussed in a previous blog entry, where they did a perceptual study of DFA Faders, looking at how people’s perception of mixing changes when the sound engineer only pretends to make an adjustment.

As expected given the location, there’s lots of great work being presented by Italian researchers. The first one that caught my eye is the 2:30-4 poster on Active noise control for snoring reduction. Whether you’re a loud snorer, sleep next to someone who is a loud snorer or just interested in unusual applications of audio signal processing, this one is worth checking out.

Do you get annoyed sometimes when driving and the road surface changes to something really noisy? Surely someone should do a study and find out which roads are noisiest so that then we can put a bit of effort into better road design and better in-vehicle equalisation and noise reduction? Well, now it’s finally happened with this paper in the same session on Deep Neural Networks for Road Surface Roughness Classification from Acoustic Signals.

Thursday, May 24

If you were to spend only one day this year immersing yourself in frontier audio engineering research, this is the day to do it.

How do people mix music differently in different countries? And do people perceive the mixes differently based on their different cultural backgrounds? These are the sorts of questions our research team here have been asking. Find out more in this 9:30 presentation by Amandine Pras. She led this Case Study of Cultural Influences on Mixing Practices, in collaboration with Brecht De Man (now with Birmingham City University) and myself.

Rod Selfridge has been blazing new trails in sound synthesis and procedural audio. He won the Best Student Paper Award at AES 141st Convention and the Best Paper Award at Sound and Music Computing. He’ll give another great presentation at noon on Physically Derived Synthesis Model of an Edge Tone which was also discussed in a recent blog entry.

I love the title of this next paper, Miniaturized Noise Generation System—A Simulation of a Simulation, which will be presented at 2:30pm by researchers from Intel Technology in Gdansk, Poland. This idea of a meta-simulation is not as uncommon as you might think; we do digital emulation of old analogue synthesizers, and I’ve seen papers on numerical models of Foley rain sound generators.

A highlight for our team here is our 2:45 pm presentation, FXive: A Web Platform for Procedural Sound Synthesis. We’ll be unveiling a disruptive innovation for sound design, FXive.com, aimed at replacing reliance on sound effect libraries. Please come check it out, and get in touch with the presenters or any members of the team to find out more.

Immediately following this is a presentation which asks Can Algorithms Replace a Sound Engineer? This is a question the research team here have also investigated a lot; you could even say it was the main focus of our research for several years. The team behind this presentation are asking it in relation to Auto-EQ. I’m sure it will be interesting, and I hope they reference a few of our papers on the subject.

From 9-10:30, I will chair a Workshop on The State of the Art in Sound Synthesis and Procedural Audio, featuring the world’s experts on the subject. Outside of speech and possibly music, sound synthesis is still in its infancy, but it’s destined to change the world of sound design in the near future. Find out why.

12:15 — 13:45 is a workshop related to machine learning in audio (a subject that is sometimes called Machine Listening), Deep Learning for Audio Applications. Deep learning can be quite a technical subject, and there’s a lot of hype around it. So a Workshop on the subject is a good way to get a feel for it. See below for another machine listening related workshop on Friday.

The Heyser Lecture, named after Richard Heyser (we discussed some of his work in a previous entry), is a prestigious evening talk given by one of the eminent individuals in the field. This one will be presented by Malcolm Hawksford, a man who has had major impact on research in audio engineering for decades.

Friday

The 9:30 — 11 poster session features some unusual but very interesting research. A talented team of researchers from Ancona will present A Preliminary Study of Sounds Emitted by Honey Bees in a Beehive.

Intense solar activity in March 2012 caused some amazing solar storms here on Earth. Researchers in Finland recorded them, and some very unusual results will be presented in the same session with the poster titled Analysis of Reports and Crackling Sounds with Associated Magnetic Field Disturbances Recorded during a Geomagnetic Storm on March 7, 2012 in Southern Finland.

You’ve been living in a cave if you haven’t noticed the recent proliferation of smart devices, especially in the audio field. But what makes them tick, is there a common framework and how are they tested? Find out more at 10:45 when researchers from Audio Precision will present The Anatomy, Physiology, and Diagnostics of Smart Audio Devices.

From 3 to 4:30, there’s a Workshop on Artificial Intelligence in Your Audio. It follows on from a highly successful workshop we did on the subject at the last Convention.

Saturday

A couple of weeks ago, John Flynn wrote an excellent blog entry describing his paper on Improving the Frequency Response Magnitude and Phase of Analogue-Matched Digital Filters. His work is a true advance on the state of the art, providing digital filters with closer matches to their analogue counterparts than any previous approaches. The full details will be unveiled in his presentation at 10:30.

If you haven’t seen Mariana Lopez presenting research, you’re missing out. Her enthusiasm for the subject is infectious, and she has a wonderful ability to convey the technical details, their deeper meanings and their importance to any audience. See her one hour tutorial on Hearing the Past: Using Acoustic Measurement Techniques and Computer Models to Study Heritage Sites, starting at 9:15.

The full program can be explored on the Convention Calendar or the Convention website. Come say hi to us if you’re there! Josh Reiss (author of this blog entry), John Flynn, Parham Bahadoran and Adan Benito from the Audio Engineering research team within the Centre for Digital Music, along with two recent graduates Brecht De Man and Rod Selfridge, will all be there.

Analogue matched digital EQ: How far can you go linearly?

(Background post for the paper “Improving the frequency response magnitude and phase of
analogue-matched digital filters” by John Flynn & Josh Reiss for AES Milan 2018)

Professional audio mastering is a field that is still dominated by analogue hardware. Many mastering engineers still favour their go-to outboard compressors and equalisers over digital emulations. As a practising mastering engineer myself, I empathise. Quality analogue gear has a proven track record in terms of sonic quality spanning about a century. Even though digital approximations of analogue tools have gotten better, particularly over the past decade, I too have tended to reach for analogue hardware. However, through my research at Queen Mary with Professor Josh Reiss, that is changing.

When modelling an analogue EQ, a lot of focus has been on modelling distortions and other non-linearities; we chose to look at the linear component. Have we reached a ceiling in terms of modelling an analogue prototype filter in the digital domain? Can we do better? We found that yes, there was room for improvement, and yes, we can do better.

The milestone of research in this area is Orfanidis’ 1997 paper “Digital parametric equalizer design with prescribed Nyquist-frequency gain”, the first major improvement over the bilinear transform, which has a renowned ‘cramped’ sound in the high frequencies. Basically, the bilinear transform is what all first generation digital equalisers are based on. Its high-frequency response towards 20 kHz drops sharply, giving a ‘closed/cramped’ sound. Orfanidis and later improvements by Massberg [9] & Gunness/Chauhan [10] give a much better approximation of an analogue prototype.
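The cramping can be seen numerically with a short sketch. It assumes a standard second-order analogue peaking prototype and a bilinear transform pre-warped so that the centre frequency lines up. The function names and the example settings (10 kHz, +12 dB, Q of 0.7, 44.1 kHz sample rate) are illustrative choices, not taken from the paper.

```python
import math

def analogue_peak_db(f, f0, gain_db, q):
    """Magnitude in dB at f (Hz) of an analogue peaking-EQ prototype."""
    a = 10.0 ** (gain_db / 40.0)
    s = 1j * f / f0  # frequency normalised to the centre frequency
    h = (s * s + s * a / q + 1.0) / (s * s + s / (a * q) + 1.0)
    return 20.0 * math.log10(abs(h))

def bilinear_peak_db(f, f0, gain_db, q, fs):
    """Magnitude in dB of the same filter after a pre-warped bilinear transform."""
    # The transform maps the digital frequency axis onto the analogue one
    # through a tangent, so the Nyquist frequency lands at infinity.
    warped = f0 * math.tan(math.pi * f / fs) / math.tan(math.pi * f0 / fs)
    return analogue_peak_db(warped, f0, gain_db, q)

fs, f0, gain_db, q = 44100.0, 10000.0, 12.0, 0.7
for f in (10000.0, 15000.0, 20000.0):
    print(f"{f:7.0f} Hz  analogue {analogue_peak_db(f, f0, gain_db, q):5.2f} dB"
          f"  bilinear {bilinear_peak_db(f, f0, gain_db, q, fs):5.2f} dB")
```

Both responses agree at the centre frequency, but the bilinear version is squeezed back towards 0 dB as it approaches Nyquist, while the analogue curve still has several dB of boost at 20 kHz.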

blt

However, while [9],[10] improve magnitude, they don’t capture analogue phase. Bizarrely, the bilinear transform performs reasonably well on phase. So we knew it was possible.

So the problem is: how do you get a more accurate magnitude match to analogue than [9],[10], while also getting a good match to phase? Many attempts, including complicated iterative Parks-McClellan filter design approaches, fell flat. It turned out that Occam was right; in this case a simple answer was the better answer.

By combining a matched-z transform, frequency sampling filter design and a little bit of clever coefficient manipulation, we achieved excellent results: a match to the analogue prototype to an arbitrary degree. At low filter lengths you get a filter that performs as well as [9],[10] in magnitude but also matches analogue phase. By using longer filter lengths the match to analogue is extremely precise, in both magnitude and phase (lower error is more accurate).
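For readers unfamiliar with the first ingredient, here is a sketch of the textbook matched-z step on its own: each analogue pole at s is mapped to a digital pole at z = exp(s / fs). This is not the full method from the paper. The frequency sampling stage and the coefficient manipulation are not shown, and the function name and example values are hypothetical.

```python
import cmath
import math

def matched_z_denominator(f0, q, fs):
    """Map the pole pair of s^2 + (w0/Q) s + w0^2 into the z-plane.

    Returns the digital poles and the denominator [1, a1, a2] of the
    resulting second-order section.
    """
    w0 = 2.0 * math.pi * f0
    disc = cmath.sqrt((w0 / q) ** 2 - 4.0 * w0 * w0)
    poles_s = [(-w0 / q + disc) / 2.0, (-w0 / q - disc) / 2.0]
    # Matched-z mapping: z = exp(s * T), with T = 1 / fs.
    poles_z = [cmath.exp(p / fs) for p in poles_s]
    # Expand (1 - p1 z^-1)(1 - p2 z^-1) into real coefficients.
    a1 = -(poles_z[0] + poles_z[1]).real
    a2 = (poles_z[0] * poles_z[1]).real
    return poles_z, [1.0, a1, a2]

poles, denominator = matched_z_denominator(f0=10000.0, q=0.7, fs=44100.0)
print([abs(p) for p in poles], denominator)
```

Because stable analogue poles have negative real parts, the mapped poles always fall inside the unit circle, so the digital section stays stable at any sample rate.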

error-vs

 

Since submitting the post I have released the algorithm in a plugin with my mastering company and have been getting informal feedback from other mastering engineers about how this sounds in use.

balance-mastering-analog-magpha-eq-plugin-small-new

Overall the word back has been overwhelmingly positive, with one engineer claiming it to be “the best sounding plugin EQ on the market to date”. It’s nice to know that those long hours staring at decibel error charts have not been in vain.

Are you heading to AES Milan next month? Come up and say hello!

 

Audio Engineering Society E-library

I try to avoid too much promotion in this blog, but in this case I think it’s justified. I’m involved in advancing a resource from a non-profit professional organisation, the Audio Engineering Society. They do lots and lots of different things, promoting the science, education and practice of all things audio engineering related. Among others, they’ve been publishing research in the area for almost 70 years, and institutions can get full access to all the content in a searchable library. In recent posts, I’ve written about some of the greatest papers ever published there, Part 1 and Part 2, and about one of my own contributions.

In an ideal world, this would all be Open Access. But publishing still costs money, so the AES supports both gold Open Access (free to all, but authors pay Article Processing Charges) and the traditional model, where it’s free to publish but individuals or institutions subscribe or articles can be purchased individually. AES members get free access. I could write many blog articles just about Open Access (should I?); it’s never as straightforward as it seems. At its best it is freely disseminating information for the benefit of all, but at its worst it’s like Pay to Play, a highly criticised practice for the music industry, and gives publishers an incentive to lower acceptance standards. But for now I’ll just point out that the AES does its absolute best to keep the costs down, regardless of publishing model, and the costs are generally much less than similar publishers.

Anyway, the AES realised that one of the most cost effective ways to get our content out to large communities is through institutional licenses or subscriptions. And we’re missing an opportunity here since we haven’t really promoted this option. And everybody benefits from it: wider dissemination of knowledge and research, more awareness of the AES, better access, etc. With this in mind, the AES issued the following press release, which I have copied verbatim. You can also find it as a tweet, blog entry or Facebook post.


AES E-Library Subscriptions Benefit Institutions and Organizations

— The Audio Engineering Society E-Library is the world’s largest collection of audio industry resources, and subscriptions provide access to extensive content for research, product development and education — 

New York, NY, March 22, 2018 — Does your research staff, faculty or students deserve access to the world’s most comprehensive collection of audio information? The continuously growing Audio Engineering Society (AES) E-Library contains over 16,000 fully searchable PDF files documenting the progression of audio research from 1953 to the present day. It includes every AES paper published from every AES convention and conference, as well as those published in the Journal of the Audio Engineering Society. From the phonograph to MP3s, from early concepts of digital audio through its fulfillment as the mainstay of audio production, distribution and reproduction, to leading-edge realization of spatial audio and audio for augmented and virtual reality, the E-Library provides a gateway to both the historical and the forward-looking foundational knowledge that sustains an entire industry.  

The AES E-Library has become the go-to online resource for anyone looking to gain instant access to the vast amount of information gathered by the Audio Engineering Society through research, presentations, interviews, conventions, section meetings and more. “Our academic and research staff, and PhD and undergraduate Tonmeister students, use the AES E-Library a lot,” says Dr. Tim Brookes, Senior Lecturer in Audio & Director of Research Institute of Sound Recording (IoSR) University of Surrey. “It’s an invaluable resource for our teaching, for independent student study and, of course, for our research.” 

“Researchers, academics and students benefit from E-Library access daily,” says Joshua Reiss, Chair of the AES Publications Policy Committee, “while many relevant institutions – academic, governmental or corporate – do not have an institutional license of the AES E-library, which means their staff or students are missing out on all the wonderful content there. We encourage all involved in audio research and investigation to inquire if their libraries have an E-Library subscription and, if not, suggest the library subscribe.” 

E-Library subscriptions can be obtained directly from the AES or through journal bundling services. A subscription allows a library’s users to download any document in the E-Library at no additional cost. 

“As an international audio company with over 25,000 employees world-wide, the AES E-library has been an incredibly valuable resource used by Harman audio researchers, engineers, patent lawyers and others,” says Dr. Sean Olive, Acoustic Research Fellow, Harman International. “It has paid for itself many times over.” 

The fee for an institutional online E-Library subscription is $1800 per year, which is significantly less than equivalent publisher licenses. 

To search the E-library, go to http://www.aes.org/e-lib/

To arrange for an institutional license, contact Lori Jackson directly at lori.jackson@aes.org, or go to http://www.aes.org/e-lib/subscribe/.

 

About the Audio Engineering Society
The Audio Engineering Society, celebrating its 70th anniversary in 2018, now counts over 12,000 members throughout the U.S., Latin America, Europe, Japan and the Far East. The organization serves as the pivotal force in the exchange and dissemination of technical information for the industry. Currently, its members are affiliated with 90 AES professional sections and more than 120 AES student sections around the world. Section activities include guest speakers, technical tours, demonstrations and social functions. Through local AES section events, members experience valuable opportunities for professional networking and personal growth. For additional information visit http://www.aes.org.

Join the conversation and keep up with the latest AES News and Events:
Twitter: #AESorg (AES Official) 
Facebook: http://facebook.com/AES.org

Greatest JAES papers of all time, Part 2

Last week I revealed Part 1 of the greatest ever papers published in the Journal of the Audio Engineering Society (JAES). JAES is the premier peer-reviewed journal devoted exclusively to audio technology, and the flagship publication of the AES. This week, it’s time for Part 2. There’s little rhyme or reason to how I divided up and selected the papers, other than I started by looking at the most highly cited ones according to Google Scholar. But all the papers listed here have had major impact on the science, education and practice of audio engineering and related fields.

All of the papers below are available from the Audio Engineering Society (AES) E-library, the world’s most comprehensive collection of audio information. It contains over 16,000 fully searchable PDF files documenting the progression of audio research from 1953 to the present day. It includes every AES paper published at a convention, conference or in the Journal. Members of the AES get free access to the E-library. To arrange for an institutional license, giving full access to all members of an institution, contact Lori Jackson directly, or go to http://www.aes.org/e-lib/subscribe/.

And without further ado, here are the rest of the Selected greatest JAES papers

More than any other work, this 1992 paper by Stanley Lipshitz and co-authors has resulted in the correct application of dither in music production. It’s one possible reason that digital recording quality improved after the early years of the Compact Disc (though the loudness wars reversed that trend). As renowned mastering engineer Bob Katz put it, “if you want to get your digital audio done just right, then you should learn about dither,” and there is no better resource than this paper.

According to Wikipedia, this 1993 paper coined the term Auralization as an analogy to visualization for rendering audible (imaginary) sound fields. This general research area of understanding and rendering the sound field of acoustic spaces has resulted in several other highly influential papers. Berkhout’s 1988 A holographic approach to acoustic control (575 citations) described the appealingly named acoustic holography method for rendering sound fields. In 1999, the groundbreaking Creating interactive virtual acoustic environments (427 citations) took this further, laying out the theory and challenges of virtual acoustics rendering, and paving the way for highly realistic audio in today’s Virtual Reality systems.

The Schroeder reverberator was first described here, way back in 1962. It has become the basis for almost all algorithmic reverberation approaches. Manfred Schroeder was another great innovator in the audio engineering field. A long transcript of a fascinating interview is available here, and a short video interview below.
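A common textbook form of the Schroeder reverberator is a bank of parallel feedback comb filters followed by allpass filters in series. The sketch below follows that form in plain Python. The delay lengths and gains are illustrative assumptions, not values from the 1962 paper, and the function names are made up for this example.

```python
def comb(x, delay, g):
    """Feedback comb filter: y[n] = x[n] + g * y[n - delay]."""
    y = [0.0] * len(x)
    for n in range(len(x)):
        y[n] = x[n] + (g * y[n - delay] if n >= delay else 0.0)
    return y

def allpass(x, delay, g):
    """Schroeder allpass: y[n] = -g * x[n] + x[n - delay] + g * y[n - delay]."""
    y = [0.0] * len(x)
    for n in range(len(x)):
        xd = x[n - delay] if n >= delay else 0.0
        yd = y[n - delay] if n >= delay else 0.0
        y[n] = -g * x[n] + xd + g * yd
    return y

def schroeder_reverb(x,
                     comb_delays=(1051, 1123, 1217, 1301),  # samples, illustrative
                     comb_gain=0.8,                          # illustrative
                     allpass_delays=(241, 83),               # samples, illustrative
                     allpass_gain=0.7):                      # illustrative
    """Parallel combs (averaged) followed by allpass filters in series."""
    wet = [0.0] * len(x)
    for d in comb_delays:
        for n, v in enumerate(comb(x, d, comb_gain)):
            wet[n] += v / len(comb_delays)
    for d in allpass_delays:
        wet = allpass(wet, d, allpass_gain)
    return wet

# Impulse response of the whole structure over 4000 samples.
impulse = [1.0] + [0.0] * 3999
response = schroeder_reverb(impulse)
```

The combs supply the decaying echoes, and the allpass stages increase the echo density without colouring the long-term spectrum.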

These two famous papers are the basis for the Thiele-Small parameters. Thiele rigorously analysed and simulated the performance of loudspeakers in the first paper from 1971, and Small greatly extended the work in the second paper in 1972. Both had initially published the work in small Australian journals, but it didn’t get widely recognised until the JAES publications. These equations form the basis for much of loudspeaker design.

Check out the dozens of YouTube videos about choosing and designing loudspeakers which make use of these parameters.
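As one small example of the Thiele-Small parameters at work, the standard sealed-box relations give the system resonance and total Q from the driver’s free-air resonance (fs), total Q (Qts) and equivalent compliance volume (Vas). The Python sketch below applies them. The function name and the driver values are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import math

def closed_box(fs, qts, vas, vb):
    """Return (fc, qtc) for a driver in a sealed box.

    fs  : driver free-air resonance in Hz
    qts : driver total Q
    vas : equivalent compliance volume (same units as vb)
    vb  : net box volume
    """
    ratio = math.sqrt(1.0 + vas / vb)
    return fs * ratio, qts * ratio

# Hypothetical driver: fs = 40 Hz, Qts = 0.35, Vas = 30 litres, in a 10 litre box.
fc, qtc = closed_box(40.0, 0.35, 30.0, 10.0)
print(fc, qtc)
```

A smaller box raises both the resonance and the Q, which is why box volume is usually picked to hit a target Q.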

This is the first English language publication to describe the Haas effect, named after the author. Also called the precedence effect, it is the phenomenon that when the same signal is sent to two loudspeakers, a small delay between the speakers results in the sound appearing to come from just one speaker, the one that plays first. It’s now widely used in sound reinforcement systems, and in audio production to give a sense of depth or more realistic panning (the Haas trick).
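In production, the Haas trick amounts to feeding the same mono signal to both channels and delaying one of them by a few milliseconds. Here is a minimal Python sketch of that idea. The function name, the choice of delaying the right channel and the default sample rate are assumptions for illustration only.

```python
def haas_pair(mono, delay_ms, sample_rate=44100):
    """Return (left, right) where right is a delayed copy of left.

    Both channels are zero-padded to the same length.
    """
    d = round(delay_ms * sample_rate / 1000.0)  # delay in whole samples
    left = list(mono) + [0.0] * d
    right = [0.0] * d + list(mono)
    return left, right

# 1 ms delay at a 1 kHz sample rate is exactly one sample.
left, right = haas_pair([1.0, 2.0, 3.0], delay_ms=1, sample_rate=1000)
print(left, right)
```

With a short delay, listeners tend to hear a single source located at the leading (here, left) speaker.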


This is the first ever research paper published in JAES. Published in August 1949, it set a high standard for rigour, while at the same time emphasising that many publications will have strong relevance not just to researchers, but to audiophiles and practitioners as well.

It described a new instrument for frequency response measurement and display. People just love impulse response and transfer function measurements, and some of the most highly cited JAES papers are on this topic: 1983’s An efficient algorithm for measuring the impulse response using pseudorandom noise (308 citations), Transfer-function measurement with maximum-length sequences (771 citations), the 2001 paper from a Brazil-based team, Transfer-function measurement with sweeps (722 citations), and finally Comparison of different impulse response measurement techniques (276 citations) in 2002. With a direct link between theory and new applications, these papers on maximum length sequence approaches and sine sweeps were major advances over the alternatives, and changed the way such measurements are made.

And the winner is… Ville Pulkki’s Vector Base Amplitude Panning (VBAP) paper! This is the most highly cited paper in JAES. Besides deriving the stereo panning law from basic geometry, it unveiled VBAP, an intuitive and now widely used spatial audio technique. Ten years later, Pulkki unveiled another groundbreaking spatial audio format, DirAC, in Spatial sound reproduction with directional audio coding (386 citations).
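The geometry behind VBAP is compact enough to sketch. In the two-loudspeaker case, the source direction is written as a weighted sum of the two loudspeaker unit vectors, and the weights, normalised to constant power, become the gains. The Python below is a minimal illustration of that idea, not Pulkki’s code. The function name is made up, and the angle convention (degrees, measured from straight ahead) is an assumption.

```python
import math

def vbap_2d(source_deg, spk1_deg, spk2_deg):
    """Return constant-power gains (g1, g2) for a loudspeaker pair."""
    def unit(deg):
        a = math.radians(deg)
        return math.cos(a), math.sin(a)

    px, py = unit(source_deg)
    l1x, l1y = unit(spk1_deg)
    l2x, l2y = unit(spk2_deg)

    # Solve g1 * l1 + g2 * l2 = p with Cramer's rule.
    det = l1x * l2y - l1y * l2x
    g1 = (px * l2y - py * l2x) / det
    g2 = (l1x * py - l1y * px) / det

    # Normalise so that g1^2 + g2^2 = 1.
    norm = math.hypot(g1, g2)
    return g1 / norm, g2 / norm

# Standard stereo pair at +30 and -30 degrees.
print(vbap_2d(0, 30, -30))    # source straight ahead
print(vbap_2d(30, 30, -30))   # source at the first loudspeaker
```

A source midway between the speakers gets equal gains, and a source at one speaker’s position is played by that speaker alone. In three dimensions the same idea is applied to loudspeaker triplets.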

Greatest JAES papers of all time, Part 1

The Journal of the Audio Engineering Society (JAES) is the premier publication of the AES, and is the only peer-reviewed journal devoted exclusively to audio technology. The first issue was published in 1949, though volume 1 began in 1953. For the past 70 years, it has had major impact on the science, education and practice of audio engineering and related fields.

I was curious which were the most important JAES papers, so I had a look at Google Scholar to see which had the most citations. This has lots of issues, not just because Scholar won’t find everything, but because a lot of the impact is in products and practice, which doesn’t usually lead to citing the papers. Nevertheless, I looked over the list, picked out some of the most interesting ones and, following no rules except my own biases, selected the Greatest Papers of All Time Published in the Journal of the Audio Engineering Society. Not surprisingly, the list is much longer than a single blog entry, so this is just part 1.

All of the papers below are available from the Audio Engineering Society (AES) E-library, the world’s most comprehensive collection of audio information. It contains over 16,000 fully searchable PDF files documenting the progression of audio research from 1953 to the present day. It includes every AES paper published at a convention, conference or in the Journal. Members of the AES get free access to the E-library. To arrange for an institutional license, giving full access to all members of an institution, contact Lori Jackson directly, or go to http://www.aes.org/e-lib/subscribe/.

Selected greatest JAES papers

This is the main ambisonics paper by one* of its originators, Michael Gerzon, and perhaps the first place the theory was described in detail (and very clearly too). Ambisonics is incredibly flexible and elegant. It is now used in a lot of games and has become the preferred audio format for virtual reality. Two other JAES ambisonics papers are also very highly cited. In 1985, Michael Gerzon’s Ambisonics in multichannel broadcasting and video (368 citations) described the high potential of ambisonics for broadcast audio, which is now being realised due to the emergence of object-based audio production. And 2005 saw Mark Poletti’s Three-dimensional surround sound systems based on spherical harmonics (348 citations), which rigorously laid out and generalised all the mathematical theory of ambisonics.

*See the comment on this entry. Jerry Bauck correctly pointed out that Duane H. Cooper was the first to describe ambisonics in some form, and Michael Gerzon credited him for it too. Cooper’s work was also published in JAES. Thanks Jerry.
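Part of the elegance of ambisonics is how little is needed to place a source. The Python sketch below encodes a mono sample into the four channels of traditional first-order B-format (W, X, Y, Z). It is an illustration under stated assumptions rather than anything taken from the papers above: the function name is made up, angles are in degrees with azimuth measured anticlockwise from the front, and W carries the traditional 1/√2 weighting.

```python
import math

def encode_b_format(sample, azimuth_deg, elevation_deg=0.0):
    """Encode a mono sample into first-order B-format (W, X, Y, Z).

    X points to the front, Y to the left, Z up.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample / math.sqrt(2.0)                # omnidirectional component
    x = sample * math.cos(az) * math.cos(el)   # front-back
    y = sample * math.sin(az) * math.cos(el)   # left-right
    z = sample * math.sin(el)                  # up-down
    return w, x, y, z

print(encode_b_format(1.0, 0))    # source straight ahead
print(encode_b_format(1.0, 90))   # source hard left
```

The same four channels can then be decoded to whatever loudspeaker layout, or to headphones, is available at playback.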


This isn’t one of the most highly cited papers, but it still had huge impact, and James Moorer is a legend in the field of audio engineering (see his prescient ‘Audio in the New Millennium’). The paper popularised the phase vocoder, now one of the most important building blocks of modern audio effects. Auto-tune, anyone?

Richard Heyser’s Time Delay Spectrometry technique allowed one to make high quality anechoic spectral measurements in the presence of a reverberant environment. It was ahead of its time: despite its efficiency and elegance, the computing power of the day was not up to employing the method. But by the 1980s, it was possible to perform complex on-site measurements of systems and spaces using Time Delay Spectrometry. The AES now organises Heyser Memorial Lectures in his honor.


Together, these two papers by Henrik Møller et al. completely transformed the world of binaural audio. The first paper described the first major dataset of detailed HRTFs, and how they vary from subject to subject. The second studied localization performance when subjects listened to a soundfield, the same soundfield using binaural recordings with their own HRTFs, and those soundfields using the HRTFs of others. It nailed down the state of the art and the challenges for future research.

The early MPEG audio standards. MPEG-1 unveiled the MP3, followed by the improved MPEG-2 AAC. They not only changed the face of audio encoding, but completely revolutionised music consumption and the music industry.

John Chowning was a pioneer and visionary in computer music. This seminal work described FM synthesis, where the timbre of a simple waveform is changed by frequency modulating it with another frequency also in the audio range, resulting in a surprisingly rich control of audio spectra and their evolution in time. In 1971, Chowning also published The simulation of moving sound sources (278 citations), perhaps the first system (and using digital technology) for synthesising an evolving sound scene.
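The basic FM recipe fits in a few lines. In the Python sketch below, a sine carrier has its phase modulated by a second sine at an audio-rate frequency, and the modulation index controls how rich the spectrum becomes. The function name, parameter names and the example frequencies are assumptions for illustration, not taken from Chowning’s paper.

```python
import math

def fm_tone(carrier_hz, modulator_hz, index, duration_s, sample_rate=44100):
    """Return samples of sin(2*pi*fc*t + index * sin(2*pi*fm*t))."""
    n_samples = int(duration_s * sample_rate)
    out = []
    for n in range(n_samples):
        t = n / sample_rate
        phase = 2.0 * math.pi * carrier_hz * t
        phase += index * math.sin(2.0 * math.pi * modulator_hz * t)
        out.append(math.sin(phase))
    return out

# With index 0 this is a plain sine; raising the index adds sidebands
# spaced at multiples of the modulator frequency around the carrier.
pure = fm_tone(440.0, 110.0, 0.0, 0.01)
bright = fm_tone(440.0, 110.0, 5.0, 0.01)
```

Sweeping the index over the course of a note is what produces the evolving timbres that made the technique famous.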

The famous Glasberg and Moore loudness model is perhaps the most widely used auditory model for loudness and masking estimation. Other aspects of it have appeared in other papers (including A model of loudness applicable to time-varying sounds, 487 citations, 2002).

More greatest papers in the next blog entry.