Sampling the sampling theorem: a little knowledge is a dangerous thing

In 2016, I published a paper on perception of differences between standard resolution audio (typically 16 bit, 44.1 kHz) and high resolution audio formats (like 24 bit, 96 kHz). It was a meta-analysis, looking at all previous studies, and it showed strong evidence that this difference can be perceived. It also did not find evidence that the difference was due to high bit depth, distortions in the equipment, or the golden ears of some participants.

The paper generated a lot of discussion, some good and some bad. One argument presented many times as to why its overall conclusion must be wrong (it's implied here, here and here, for instance) basically goes like this:

We can’t hear above 20 kHz. The sampling theorem says that we need to sample at twice the bandwidth to fully recover the signal. So a bit beyond 40 kHz should be fully sufficient to render audio with no perceptible difference from the original signal.

But one should be very careful when making claims regarding the sampling theorem. It states that all the information in a bandlimited signal is completely represented by sampling at twice the bandwidth (the Nyquist rate). It further implies that the continuous-time bandlimited signal can be perfectly reconstructed from this sampled signal.
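To make that statement concrete, the reconstruction the theorem guarantees can be written out explicitly. With sampling rate $f_s$, sample period $T = 1/f_s$ and samples $x[n]$, the continuous-time signal is recovered by sinc interpolation:

$$x(t) = \sum_{n=-\infty}^{\infty} x[n]\,\mathrm{sinc}\!\left(\frac{t - nT}{T}\right), \qquad \mathrm{sinc}(u) = \frac{\sin(\pi u)}{\pi u}$$

This holds exactly only if $x(t)$ truly contains no energy at or above $f_s/2$.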

For that to mean that there is no audible difference between 44.1 kHz (or 48 kHz) sampling and much higher sample rate formats (leaving aside reproduction equipment), there are a few important assumptions:

  1. Perfect brickwall filter to bandlimit the signal
  2. Perfect reconstruction filter to recover the bandlimited signal
  3. No audible difference whatsoever between the original full bandwidth signal and the bandlimited 48 kHz signal.

The first two are generally not true in practice, especially at lower sample rates. We can get very good performance by oversampling in the analog-to-digital and digital-to-analog converters, but they are not perfect. There may still be some minute passband ripple, or some very low amplitude signal outside the passband that results in aliasing. But many modern A/D and D/A converters and some sample rate converters are high performance, so their impact may be small.

But the third assumption is an open question and could make a big difference. The problem arises from another very important theorem, the uncertainty principle. Though Heisenberg first derived it for quantum mechanics, Gabor showed that it exists as a purely mathematical concept: the more localised a signal is in frequency, the less localised it is in time. For instance, a pure impulse (localised in time) has content over all frequencies. Bandlimiting this impulse spreads the signal in time.

For instance, consider filtering an impulse to retain only frequency content below 20 kHz. We will use the MATLAB function IFIR (Interpolated FIR filter), which is a high performance design. We aim for low passband ripple (<0.01 dB) up to 20 kHz and 120 dB stopband attenuation starting at 22.05, 24 or 48 kHz, corresponding to 44.1 kHz, 48 kHz or 96 kHz sample rates. You can see excellent behaviour in the magnitude response below.

[Figure: magnitude responses of the anti-alias filters]

The impulse response also looks good, but now the original impulse has become smeared in time. This is an inevitable consequence of the uncertainty principle.

[Figure: impulse responses of the filters]

Still, on the surface this may not be so problematic. But we perceive loudness on a logarithmic scale. So have a look at this impulse response on a decibel scale.

[Figure: impulse responses on a decibel scale]

The 44.1 and 48 kHz filters spread energy over 1 msec or more, but the 96 kHz filter keeps most energy within 100 microseconds. And this is a particularly good filter, without considering quantization effects or the additional reconstruction (anti-imaging) filter required for analog output. Note also that all of this frequency content has already been bandlimited, so it's almost entirely below 20 kHz.
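If you want to reproduce the gist of this yourself, here is a minimal sketch in Python. It uses SciPy's Kaiser-window FIR design rather than MATLAB's IFIR, and a 60 dB window of my own choosing to quantify the spread, so the exact numbers will differ from the plots above, but the trend across sample rates is the same in character.

```python
import numpy as np
from scipy.signal import firwin, kaiserord

def antialias_spread_ms(fs, f_pass=20e3, atten_db=120.0, span_db=60.0):
    """Design a linear-phase lowpass anti-alias FIR for sample rate fs, and
    return the duration (in ms) over which its impulse response stays within
    span_db of its peak. Filtering a unit impulse simply returns the filter
    coefficients, so h below is the filtered impulse."""
    f_stop = fs / 2.0                                 # stopband starts at Nyquist
    width = (f_stop - f_pass) / (fs / 2.0)            # normalised transition width
    numtaps, beta = kaiserord(atten_db, width)        # order needed for ~120 dB attenuation
    h = firwin(numtaps, (f_pass + f_stop) / 2.0,
               window=('kaiser', beta), fs=fs)
    h_db = 20 * np.log10(np.abs(h) + 1e-12)           # impulse response on a dB scale
    kept = np.where(h_db > h_db.max() - span_db)[0]   # samples within span_db of the peak
    return 1e3 * (kept[-1] - kept[0]) / fs

for fs in (44100.0, 48000.0, 96000.0):
    print(f"{fs/1e3:g} kHz: energy within 60 dB of the peak spans ~{antialias_spread_ms(fs):.2f} ms")
```

The 60 dB span is an arbitrary but convenient way to quantify 'spread'; you could equally integrate the energy envelope or pick a different threshold.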

One millisecond still isn’t very much. However, this lack of high frequency content has affected the temporal fine structure of the signal, and we know a lot less about how we perceive temporal information than how we perceive frequency content. This is where psychoacoustic studies in the field of auditory neuroscience come into play. They’ve approached temporal resolution from very different perspectives. Abel found that we can distinguish temporal gaps in sound of only 0.4 ms, and Wiegrebe’s study suggested a resolution of 0.72 ms. Studies by Wiegrebe (same paper), Lotze and Aiba all suggested that we can distinguish between a single click and a closely spaced pair of clicks when the gap between the pair of clicks is below one millisecond. And a study by Henning suggested that we can distinguish the ordering of a high amplitude and low amplitude click when the spacing between them is only about one fifth of a millisecond.

All of these studies should be taken with a grain of salt. Some are quite old, and it's possible there may have been issues with the audio set-up. Furthermore, they aren't directly testing the audibility of anti-alias filters. But they clearly indicate that the time-domain spread of energy in transient sounds due to filtering might be audible.

Big questions still remain. In the ideal scenario, the only thing missing after bandlimiting a signal is the high frequency content, which we shouldn’t be able to hear. So what really is going on?

By the way, I recommend reading Shannon's original papers on the sampling theorem and other subjects. They're very good and a joy to read. Shannon was a fascinating character. I read his Collected Papers, and off the top of my head, they included inventing the rocket-powered Frisbee, the gasoline-powered pogo stick, a calculator that worked using Roman numerals (wonderfully named THROBAC, the THrifty ROman-numeral BAckward-looking Computer), and discovering the fundamental equation of juggling. He also built a robot mouse to compete against real mice, inspired by classic psychology experiments where a mouse was made to find its way out of a maze.

Nyquist’s papers aren’t so easy though, and feel a bit dated.

  • S. M. Abel, 'Discrimination of temporal gaps,' Journal of the Acoustical Society of America, vol. 52, 1972.
  • E. Aiba, M. Tsuzaki, S. Tanaka and M. Unoki, 'Judgment of perceptual synchrony between two pulses and verification of its relation to cochlear delay by an auditory model,' Japanese Psychological Research, vol. 50, 2008.
  • D. Gabor, 'Theory of communication,' Journal of the Institution of Electrical Engineers, vol. 93, pp. 429-457, 1946.
  • G. B. Henning and H. Gaskell, 'Monaural phase sensitivity with Ronken's paradigm,' Journal of the Acoustical Society of America, vol. 70, 1981.
  • M. Lotze, M. Wittmann, N. von Steinbüchel, E. Pöppel and T. Roenneberg, 'Daily rhythm of temporal resolution in the auditory system,' Cortex, vol. 35, 1999.
  • H. Nyquist, 'Certain topics in telegraph transmission theory,' Transactions of the AIEE, vol. 47, pp. 617-644, April 1928.
  • J. D. Reiss, 'A meta-analysis of high resolution audio perceptual evaluation,' Journal of the Audio Engineering Society, vol. 64 (6), June 2016.
  • C. E. Shannon, 'Communication in the presence of noise,' Proceedings of the Institute of Radio Engineers, vol. 37 (1), pp. 10-21, January 1949.
  • L. Wiegrebe and K. Krumbholz, 'Temporal resolution and temporal masking properties of transient stimuli: Data and an auditory model,' Journal of the Acoustical Society of America, vol. 105, pp. 2746-2756, 1999.

Digging the didgeridoo

The Ig Nobel prizes are tongue-in-cheek awards given every year to celebrate unusual or trivial achievements in science. Named as a play on the Nobel prize and the word ignoble, they are intended to 'honor achievements that first make people laugh, and then make them think.' Previously, when discussing graphene-based headphones, I mentioned Andre Geim, the only scientist to have won both a Nobel and an Ig Nobel prize.

I only recently noticed that the 2017 Ig Nobel Peace Prize went to an international team that demonstrated that playing a didgeridoo is an effective treatment for obstructive sleep apnoea and snoring. Here’s a photo of one of the authors of the study playing the didge at the award ceremony.

[Photo: a study author playing the didgeridoo at the award ceremony]

My own nominees for Ig Nobel prizes, from audio-related research published this past year, would include 'Influence of Audience Noises on the Classical Music Perception on the Example of Anti-cough Candies Unwrapping Noise', which we discussed in our preview of the 143rd Audio Engineering Society Convention, and 'The DFA Fader: Exploring the Power of Suggestion in Loudness Judgments', for which we had the blog entry 'What the f*** are DFA faders'.

But let's return to didgeridoo research. It's a fascinating Aboriginal Australian instrument, with a rich history and interesting acoustics, and it produces an eerie, drone-like sound.

A search on Google Scholar, after removing patents and citations, shows only 38 research papers with didgeridoo in the title. That's great news if you want to be an expert on research in the subject. The work of Neville H. Fletcher, over about a thirty-year period beginning in the early 1980s, is probably the main starting point.

The passive acoustics of the didgeridoo are well understood. It's a long truncated conical horn where the player's lips at the smaller end form a pressure-controlled valve. Knowing the length and diameters involved, it's not too difficult to determine the fundamental frequencies (often around 50-100 Hz) and the modes excited, and their strengths, in much the same way as can be done for many woodwind instruments.
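As a rough back-of-the-envelope sketch of that calculation, one can treat the bore as a simple closed-open pipe (the lips effectively close the narrow end) and ignore the taper, flare and end corrections of a real instrument; the length below is just an assumed typical value, so the numbers are only indicative.

```python
c = 343.0   # speed of sound in air at room temperature, m/s
L = 1.5     # assumed typical didgeridoo length, m

# a closed-open pipe resonates at odd multiples of c / (4 * L)
modes = [(2 * n - 1) * c / (4 * L) for n in range(1, 5)]
print(", ".join(f"{f:.0f} Hz" for f in modes))
# roughly 57, 172, 286, 400 Hz: a fundamental in the 50-100 Hz range quoted above
```

A proper treatment of the truncated cone shifts and stretches these modes, which is part of what gives each instrument its character.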

But that's just the passive acoustics. Fletcher pointed out that traditional, solo didgeridoo players don't pay much attention to the resonant frequencies; these mainly matter when the instrument is played in Western music and needs to fit with the rest of an ensemble.

Things start getting really interesting when one considers the sounding mechanism. Players make heavy use of circular breathing, breathing in through the nose while breathing out through the mouth, even more so, and more rhythmically, than is typical in performing Western brass instruments like trumpets and tubas. Changes in lip motion and vocal tract shape are then used to control the formants, allowing the manipulation of very rich timbres.

It's these aspects of didgeridoo playing that intrigued the authors of the sleep apnoea study. Like the DFA and cough drop wrapper studies mentioned above, these were serious studies on a seemingly not-so-serious subject. Circular breathing and training of respiratory muscles may go a long way towards improving nighttime breathing, and hence reducing snoring and sleep disturbances. The study was controlled and randomised. But it's incredibly difficult in these sorts of studies to eliminate or control for all the other variables, and very hard to identify which aspect of the didgeridoo playing was responsible for the better sleep. The authors quite rightly highlighted what I think is one of the biggest question marks in the study:

A limitation is that those in the control group were simply put on a waiting list because a sham intervention for didgeridoo playing would be difficult. A control intervention such as playing a recorder would have been an option, but we would not be able to exclude effects on the upper airways and compliance might be poor.

In that respect, drug trials are somewhat easier to interpret than practice-based interventions. But the effect here was abundantly clear and quite strong. One certainly should not dismiss the results because of the study's limitations (the limitations give rise to question marks, but they're not mistakes).

 

The future of microphone technology

We recently had a blog entry about the Future of Headphones. Today, we’ll look at another ubiquitous piece of audio equipment, the microphone, and what technological revolutions are on the horizon.

It's not a new technology, but the Eigenmike is deserving of attention. First released around 2010 by mh acoustics (their website and other searches don't reveal much historical information), the Eigenmike is a microphone array composed of 32 high quality microphones positioned on the surface of a rigid sphere. The outputs of the individual microphones are combined to capture the soundfield, and by beamforming, the soundfield can be steered and aimed in a desired direction.

[Figure: The Eigenmike]
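To give a flavour of what 'combining outputs to steer the soundfield' means, here is the simplest possible sketch: frequency-domain delay-and-sum beamforming for an arbitrary array. This is not mh acoustics' actual processing (the Eigenmike works in the spherical harmonic domain); the function and its arguments are hypothetical placeholders for illustration.

```python
import numpy as np

def delay_and_sum(signals, mic_positions, look_direction, fs, c=343.0):
    """Steer an array towards the unit vector look_direction by aligning each
    channel with fractional delays applied in the frequency domain.
    signals: (num_mics, num_samples); mic_positions: (num_mics, 3) in metres."""
    num_mics, n = signals.shape
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    # a plane wave arriving from look_direction reaches mic m earlier by
    # (p_m . d) / c seconds, so delaying each channel by that amount lines
    # the wavefronts up before averaging
    delays = mic_positions @ look_direction / c             # shape (num_mics,)
    spectra = np.fft.rfft(signals, axis=1)
    steering = np.exp(-2j * np.pi * freqs[None, :] * delays[:, None])
    return np.fft.irfft((spectra * steering).mean(axis=0), n=n)
```

After alignment, sound from the look direction adds coherently while sound from other directions partially cancels, which is the essence of aiming a beam.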

This and related technologies (Core Sound’s TetraMic, Soundfield’s MKV, Sennheiser’s Ambeo …) are revolutionising high-end soundfield recording. Enda Bates has a nice blog entry about them, and they were formally evaluated in two AES papers, Comparing Ambisonic Microphones Part 1 and Part 2.

Soundskrit is TandemLaunch's youngest incubated venture, based on research by Ron Miles and colleagues at Binghamton University. TandemLaunch, by the way, creates companies that often arise from academic research, and previously invested in research arising from the audio engineering research team behind this blog.

Jian Zhou and Ron Miles were inspired by the manner in which insects ‘hear’ with their hairs. They devised a method to record audio by sensing changes in airflow velocity rather than pressure. Spider silk is thin enough that it moves with the air when hit by sound waves, even for infrasound frequencies. To translate this movement into an electronic signal, they coated the spider silk with gold and put it in a magnetic field. Almost any fiber that is thin enough could be used in the same way, and different approaches could be applied for transduction. This new approach is intrinsically directional and may have a frequency response far superior to competing directional solutions.
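In the simplest idealisation, the transduction is just electromagnetic induction: a conductor of length $\ell$ moving with velocity $v$ across a magnetic field of strength $B$ develops a voltage

$$V = B\,\ell\,v$$

so the electrical output follows the air particle velocity (a directional quantity, with a figure-of-eight pickup along the fibre) rather than the pressure at a point. Treat this as a first-order picture rather than a description of the actual Soundskrit design.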

MEMS (Micro-Electro-Mechanical Systems) microphones usually involve a pressure-sensitive diaphragm etched directly into a silicon wafer. The Soundskrit team is currently focused on developing a MEMS-compatible design so that it could be used in a wide variety of devices and applications where directional recording is needed.

Another start-up aiming to revolutionise MEMS technology is Vesper. Vesper's MEMS technology was developed by founders Bobby Littrell and Karl Grosh at the University of Michigan. It uses piezoelectric materials, which produce a voltage when subjected to pressure. This approach can achieve a superior signal-to-noise ratio over the capacitive MEMS microphones that currently dominate the market.

A few years ago, graphene-based microphones were receiving a lot of attention. In 2014, Dejan Todorovic and colleagues investigated the feasibility of graphene as a microphone membrane, and simulations suggested that it could have high sensitivity (the voltage generated in response to a pressure input) over a wide frequency range, far better than conventional microphones. Later that year, Peter Gaskell and others from McGill University performed physical and acoustical measurements of graphene oxide which confirmed Todorovic's simulation results. But they seemed unaware of Todorovic's work, despite both groups publishing at AES Conventions.

Gaskell and colleagues went on to commercialise graphene-based loudspeakers, as we discussed previously. But the Todorovic team continued research on graphene microphones, apparently with great success.

I haven't yet found out about any further developments from this group, however. And researchers from Kyungpook National University in Korea recently reported a high sensitivity hearing aid microphone that uses a graphene-based diaphragm.

 

For a bit of fun, check out Catchbox, which bills itself as 'the World's First Soft Throwable Microphone.' It's not exactly a technological revolution, though their patent-pending Automute relates a bit to the field of Automatic Mixing. But I can think of a few meetings that would have been livened up by having this around.

As on previous occasions when I've discussed commercial technologies, a disclaimer is needed. This blog is not meant as an endorsement of any of the mentioned companies. I haven't tried their products. They are a sample of what is going on at the frontiers of microphone technology, but by no means cover the full range of exciting developments. In fact, since many of the technological advances are concerned with microphone array processing (source separation, localisation, beamforming and so on), as in some of our own contributions, this blog entry is really only giving you a taste of one exciting direction of research. But these technologies will surely change the way we capture sound in the near future.

Some of our own contributions to microphone technology, mainly on the signal processing and evaluation side of things, are listed below:

  1. L. Wang, J. D. Reiss and A. Cavallaro, ‘Over-Determined Source Separation and Localization Using Distributed Microphones,’ IEEE/ACM Transactions on Audio, Speech and Language Processing, vol. 24 (9), 2016.
  2. L. Wang, T. K. Hon, J. D. Reiss and A. Cavallaro, ‘An Iterative Approach to Source Counting and Localization Using Two Distant Microphones,’ IEEE/ACM Transactions on Audio, Speech and Language Processing, 24 (6), June 2016.
  3. L. Wang, T. K. Hon, J. D. Reiss and A. Cavallaro, ‘Self-Localization of Ad-hoc Arrays Using Time Difference of Arrivals,’ IEEE Transactions on Signal Processing, 64 (4), Feb., 2016.
  4. T. K. Hon, L. Wang, J. D. Reiss and A. Cavallaro, ‘Audio Fingerprinting for Multi-Device Self-Localisation,’ IEEE Transactions on Audio, Speech and Language Processing, 23 (10), p. 1623-1636, 2015.
  5. E. K. Kokkinis, J. D. Reiss and J. Mourjopoulos, “A Wiener Filter Approach to Microphone Leakage Reduction in Close-Microphone Applications,” IEEE Transactions on Audio, Speech, and Language Processing, V.20 (3), p.767-79, 2012.
  6. T-K. Hon, L. Wang, J. D. Reiss and A. Cavallaro, ‘Fine landmark-based synchronization of ad-hoc microphone arrays,’ 23rd European Signal Processing Conference (EUSIPCO), p. 1341-1345, Nice, France, 2015.
  7. B. De Man and J. D. Reiss, “A Pairwise and Multiple Stimuli Approach to Perceptual Evaluation of Microphone Types,” 134th AES Convention, Rome, May, 2013.
  8. A. Clifford and J. D. Reiss, 'Proximity effect detection for directional microphones,' 131st AES Convention, New York, p. 1-7, Oct. 20-23, 2011.
  9. A. Clifford and J. D. Reiss, 'Microphone Interference Reduction in Live Sound,' Proc. of the 14th Int. Conference on Digital Audio Effects (DAFx-11), Paris, p. 2-9, Sept. 19-23, 2011.
  10. E. Kokkinis, J. D. Reiss and J. Mourjopoulos, 'Detection of "solo intervals" in multiple microphone multiple source audio applications,' AES 130th Convention, May 2011.
  11. C. Uhle and J. D. Reiss, “Determined Source Separation for Microphone Recordings Using IIR Filters,” 129th AES Convention, San Francisco, Nov. 4-7, 2010.

 

Audio Research Year in Review- Part 2, the Headlines

Last week featured the first part of our ‘Audio research year in review.’ It focused on our own achievements. This week is the second, concluding part, with a few news stories related to the topics of this blog (music production, psychoacoustics, sound synthesis and everything in between) for each month of the year.

Browsing through the list, some interesting things pop up. There were several news stories related to speech intelligibility in broadcast TV, which has been a recurring story over the last few years. The effects of noise pollution on wildlife are also a theme in this year's audio research headlines. And quite a few of the psychological studies are telling us what we already know. The fact that musicians (who are trained in a task that involves quick response to stimuli) have faster reaction times than non-musicians (who may not be trained in such a task) is not a surprise. Nor is the fact that if you hear the cork popping from a wine bottle, you may think the wine tastes better, although that's a wonderful example of the placebo effect. But studies that end up confirming assumptions are still worth doing.

January

February

March

April

May


June

July

August

September

October

November

December

Sound Talking at the Science Museum featured assorted speakers on sonic semantics

sound-talking-logo-large

On Friday 3 November, Dr Brecht De Man (Centre for Digital Music, Queen Mary University of London) and Dr Melissa Dickson (Diseases of Modern Life, University of Oxford) organised a one-day workshop at the London Science Museum on the topic of language describing sound, and sound emulating language. We discussed it in a previous blog entry, but now we can wrap up and discuss what happened.

Titled 'Sound Talking', it brought together a diverse lineup of speakers around the common theme of sonic semantics. And by diverse we truly mean that: the programme featured a neuroscientist, a historian, an acoustician, and a Grammy-winning sound engineer, among others.

The event was born from a friendship between two academics who had for a while assumed their work could not be more different, with music technology and the history of Victorian literature as their respective fields. On learning that their topics were both about sound-related language, they set out to find more researchers from maximally different disciplines and make it a day of engaging talks.

Having had Dr Dickson as a resident researcher earlier this year, the Science Museum generously hosted the event, providing a very appropriate and 'neutral' central London venue. The event was further supported by the Diseases of Modern Life project, funded by the European Research Council, and by the Centre for Digital Music at Queen Mary University of London.

The programme featured (in order of appearance):

  • Maria Chait, Professor of auditory cognitive neuroscience at UCL, on the auditory system as the brain’s early warning system
  • Jonathan Andrews, Reader in the history of psychiatry at Newcastle University, on the soundscape of the Bethlehem Hospital for Lunatics (‘Bedlam’)
  • Melissa Dickson, postdoctoral researcher in Victorian literature at University of Oxford, on the invention of the stethoscope and the development of an associated vocabulary
  • Mariana Lopez, Lecturer in sound production and post production at University of York, on making film accessible for visually impaired audiences through sound design
  • David M. Howard, Professor of Electronic Engineering at Royal Holloway University of London, on the sound of voice and the voice of sound
  • Brecht De Man, postdoctoral researcher in audio engineering at Queen Mary University of London, on defining the language of music production
  • Mandy Parnell, mastering engineer at Black Saloon Studios, on the various languages of artistic direction
  • Trevor Cox, Professor of acoustic engineering at University of Salford, on categorisation of everyday sounds

In addition to this stellar speaker lineup, Aleks Kolkowski (Recording Angels) exhibited an array of historic sound-making objects, including tuning forks, listening tubes, a monochord, and a live wax cylinder recording. The workshop took place in a museum, after all, where Dr Kolkowski has held a research associateship, so the display was very fitting.

The full programme can be found on the event's web page. Video proceedings of the event are forthcoming.

My favorite sessions from the 143rd AES Convention

AES_NY

Recently, several researchers from the audio engineering research team here attended the 143rd Audio Engineering Society Convention in New York. Before the Convention, I wrote a blog entry highlighting a lot of the more interesting or adventurous research that was being presented there. As is usually the case at these Conventions, I have so many meetings to attend that I miss out on a lot of highlights, even ones that I flag up beforehand as ‘must see’. Still, I managed to attend some real gems this time, and I’ll discuss a few of them here.

I'm glad that I attended 'Audio Engineering with Hearing Loss—A Practical Symposium'. Hearing loss amongst musicians, audiophiles and audio engineers is an important topic that needs more attention. Overexposure, whether prolonged or simply too loud, is a major cause of hearing damage. Beyond the problems it causes for anybody, for those in the industry it affects their ability to work, or even to appreciate their passion. The session had lots of interesting advice.

The most interesting presentation in the session was from Richard Einhorn, a composer and music producer. In 2010, he lost much of his hearing due to a virus. He woke up one day to find that he had completely lost hearing in his right ear, a condition known as Idiopathic Sudden Sensorineural Hearing Loss. This then evolved into hyperacusis, with extreme distortion, excessive perceived volume and poor speech intelligibility. In many ways, deafness in the right ear would have been preferable. On top of that, his left ear suffered otosclerosis, where everything was at greatly reduced volume. And given that this was his only functioning ear, the risk of surgery to correct it was too great.

Richard has found some wonderful ways to still function, and even continue working in audio and music, with the limited hearing he still has. There’s a wonderful description of them in Hearing Loss Magazine, and they include the use of the ‘Companion Mic,’ which allowed him to hear from many different locations around a busy, noisy environment, like a crowded restaurant.

Thomas Lund presented ‘The Bandwidth of Human Perception and its Implications for Pro Audio.’ I really wasn’t sure about this before the Convention. I had read the abstract, and thought it might be some meandering, somewhat philosophical talk about hearing perception, with plenty of speculation but lacking in substance. I was very glad to be proven wrong! It had aspects of all of that, but in a very positive sense. It was quite rigorous, essentially a systematic review of research in the field that had been published in medical journals. It looks at the question of auditory perceptual bandwidth, where bandwidth is in a general information theoretic and cognitive sense, not specifically frequency range. The research revolves around the fact that, though we receive many megabits of sensory information every second, it seems that we only use dozens of bits per second of information in our higher level perception. This has lots of implications for listening test design, notably on how to deal with aspects like sample duration or training of participants. This was probably the most fascinating technical talk I saw at the Convention.

There were two papers that I had flagged up as having the most interesting titles: 'Influence of Audience Noises on the Classical Music Perception on the Example of Anti-cough Candies Unwrapping Noise', and 'Acoustic Levitation—Standing Wave Demonstration.' I had an interesting chat with an author of the first one, Adam Pilch. When walking around much later looking for the poster for the second one, I bumped into Adam again. It turns out he was a co-author on both of them! It looks like Adam Pilch and Bartlomiej Chojnacki (the shared authors on those papers) and their co-authors have an appreciation of the joy of doing research for fun and curiosity, and an eye for a good paper title.

Leslie Ann Jones was the Heyser lecturer. The Heyser lecture, named after Richard C. Heyser, is an evening talk given by an eminent individual in audio engineering or related fields. Leslie has had a fascinating career, and gave a talk that makes one realise just how much the industry is changing and growing, and how important the individuals and opportunities one encounters in a career can be.

The last session I attended was also one of the best. Chris Pike, who recently became leader of the audio research team at BBC R&D (he has big shoes to fill, but fits them well and is already racing ahead), presented 'What's This? Doctor Who with Spatial Audio!'. I knew this was going to be good because it involved two of my favorite things, but it was much better than that. The audience were all handed headphones so that they could listen to binaural renderings used throughout the presentation. I love props at technical talks! I also expected the talk to focus almost completely on the binaural, 3D sound rendering for a recent episode, but it was so much more than that. There was quite detailed discussion of audio innovation throughout the more than 50 years of Doctor Who, some of which we have covered when mentioning Daphne Oram and Delia Derbyshire in our blog entry on female pioneers in audio engineering.

There's a nice short interview with Chris and colleagues Darran Clement (sound mixer) and Catherine Robinson (audio supervisor) about the binaural sound in Doctor Who on BBC R&D's blog, and here's a YouTube video promoting the binaural sound in the recent episode:

 

The Audiovisual bounce-inducing effect (Bounce, bounce, bounce… Part II)

Last week we talked about bouncing sounds. That's very much a physical phenomenon, but a lot has been made of a perceptual effect sometimes referred to as the 'Audiovisual bounce-inducing effect.' The idea is that if someone is presented with two identical objects moving on a screen in opposite directions and crossing paths, they appear to do just that: cross paths. But if a short sound is played at the moment they first intersect, they appear to bounce off each other.

I’ve read a couple of papers on this, and browsed a few more, and I’ve yet to see anything interesting here.

Consider the figures below. On the left are the two paths taken by the two objects, one with short dashes in blue, one with long dashes in red. Since they are identical (usually just circles on a computer screen), it could just as easily be the paths shown on the right.

[Figure: the two possible interpretations of the crossing paths]

So which one is perceived? Well, two common occurrences are:

– Two objects, and one of them passes behind the other. This usually doesn’t produce a sound.
– Two objects, and they bounce off each other, producing the sound of a bounce.

If you show the objects without a sound, it perfectly matches the first scenario. It would be highly unlikely to perceive this as a bounce, since then we would expect to hear something. On the other hand, if you play a short sound at the moment the two objects interact, even if it doesn't exactly match a 'bounce sound', it is still a noise at the moment of visual contact. And so this is much more likely to be perceived as a bounce (which clearly produces a sound) than as passing by (which doesn't). Further studies showed that the more 'bounce-like' the sound is, the more likely it is to be perceived as a bounce, and it's less likely to be perceived as a bounce if similar sounds are also played when the objects do not intersect.

The literature gives all sorts of fanciful explanations for the basic phenomenon. And maybe someone can enlighten me as to why this is interesting. I suppose, if one begins with the assumption that auditory cues (even silence) do not play a role in perception of motion, then this may be surprising. But to me, this just seems to match everyday experience of sight and sound, and is intuitively obvious.

I should also note that in one of the papers on the 'Audiovisual bounce-inducing effect' (Watanabe 2001), the authors committed the cardinal sin of including one of the authors as a test subject and performing standard statistical analysis on the results. There are situations when this sort of thing may be acceptable or even appropriate*, but then one should be very careful to take that into account in any analysis and interpretation of results.

* In the following two papers, participants rated multitrack audio mixes, where one of the mixes had been created by the participant. But this was intentional, to see whether the participant would rate their own mix highly.

And here are just a few references on the audiovisual bounce-inducing effect.

Grassi M, Casco C. Audiovisual bounce-inducing effect: When sound congruence affects grouping in vision. Attention, Perception, & Psychophysics. 2010 Feb 1;72(2):378-86.

Remijn GB, Ito H, Nakajima Y. Audiovisual integration: An investigation of the 'streaming-bouncing' phenomenon. Journal of Physiological Anthropology and Applied Human Science. 2004;23(6):243-7.

Watanabe K, Shimojo S. When sound affects vision: effects of auditory grouping on visual motion perception. Psychological Science. 2001 Mar;12(2):109-16.

Zeljko M, Grove PM. Sensitivity and Bias in the Resolution of Stream-Bounce Stimuli. Perception. 2017 Feb;46(2):178-204.