Digging the didgeridoo

The Ig Nobel prizes are tongue-in-cheek awards given every year to celebrate unusual or trivial achievements in science. Named as a play on the Nobel prize and the word ignoble, they are intended to “honor achievements that first make people laugh, and then make them think.” Previously, when discussing graphene-based headphones, I mentioned Andre Geim, the only scientist to have won both a Nobel and an Ig Nobel prize.

I only recently noticed that the 2017 Ig Nobel Peace Prize went to an international team that demonstrated that playing a didgeridoo is an effective treatment for obstructive sleep apnoea and snoring. Here’s a photo of one of the authors of the study playing the didge at the award ceremony.


My own nominees for Ig Nobel prizes, from audio-related research published this past year, would include ‘Influence of Audience Noises on the Classical Music Perception on the Example of Anti-cough Candies Unwrapping Noise’, which we discussed in our preview of the 143rd Audio Engineering Society Convention, and ‘The DFA Fader: Exploring the Power of Suggestion in Loudness Judgments’, for which we had the blog entry ‘What the f*** are DFA faders’.

But let’s return to didgeridoo research. The didgeridoo is a fascinating Aboriginal Australian instrument, with a rich history and interesting acoustics, and it produces an eerie, drone-like sound.

A Google Scholar search, after excluding patents and citations, shows only 38 research papers with ‘didgeridoo’ in the title. That’s great news if you want to be an expert on research in the subject. The work of Neville H. Fletcher, over roughly thirty years beginning in the early 1980s, is probably the main starting point.

The passive acoustics of the didgeridoo are well understood. It’s a long truncated conical horn where the player’s lips at the smaller end form a pressure-controlled valve. Knowing the length and diameters involved, it’s not too difficult to determine the fundamental frequencies (often around 50–100 Hz) and modes excited, and their strengths, in much the same way as can be done for many woodwind instruments.
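To make that concrete, here’s a minimal Python sketch that estimates the resonances by treating the bore as a simple pipe, closed at the player’s lips and open at the bell. A real didgeridoo is a flaring truncated cone, so the cylindrical approximation and the chosen dimensions here are illustrative assumptions only.

```python
# Minimal sketch (assumptions: cylindrical bore, simple end correction).
# A real didgeridoo flares like a truncated cone, so this is a first pass.
SPEED_OF_SOUND = 343.0  # m/s in air at ~20 C

def pipe_modes(length_m, bore_radius_m, n_modes=4):
    """Odd-harmonic resonances of a pipe closed at the lips, open at the bell."""
    effective_length = length_m + 0.6 * bore_radius_m  # unflanged end correction
    return [(2 * n - 1) * SPEED_OF_SOUND / (4 * effective_length)
            for n in range(1, n_modes + 1)]

# A 1.5 m instrument with a 5 cm bore gives a fundamental near 57 Hz,
# consistent with the 50-100 Hz range mentioned above.
print(pipe_modes(1.5, 0.025))
```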

But that’s just the passive acoustics. Fletcher pointed out that traditional, solo didgeridoo players don’t pay much attention to the resonant frequencies; these mainly matter when the instrument is played in Western music and needs to fit with the rest of an ensemble.

Things start getting really interesting when one considers the sounding mechanism. Players make heavy use of circular breathing (breathing in through the nose while breathing out through the mouth), even more so, and more rhythmically, than is typical of Western brass instruments like trumpets and tubas. Changes in lip motion and vocal tract shape are then used to control the formants, allowing the manipulation of very rich timbres.

It’s these aspects of didgeridoo playing that intrigued the authors of the sleep apnoea study. Like the DFA and cough drop wrapper studies mentioned above, these were serious studies on a seemingly not-so-serious subject. Circular breathing and training of the respiratory muscles may go a long way towards improving nighttime breathing, and hence reducing snoring and sleep disturbances. The study was controlled and randomised. But it’s incredibly difficult in these sorts of studies to eliminate or control for all the other variables, and very hard to identify which aspect of the didgeridoo playing was responsible for the better sleep. The authors quite rightly highlighted what I think is one of the biggest question marks in the study:

A limitation is that those in the control group were simply put on a waiting list because a sham intervention for didgeridoo playing would be difficult. A control intervention such as playing a recorder would have been an option, but we would not be able to exclude effects on the upper airways and compliance might be poor.

In that respect, drug trials are somewhat easier to interpret than practice-based interventions. But the effect here was abundantly clear and quite strong. One certainly should not dismiss the results because of the study’s limitations; the limitations give rise to question marks, but they’re not mistakes.


The cavity tone

In September 2017, I attended the 20th International Conference on Digital Audio Effects in Edinburgh. At this conference, I presented my work on a real-time physically derived model of a cavity tone. The cavity tone is one of the fundamental aeroacoustic sounds, similar to the previously described Aeolian tone. The cavity tone commonly occurs in aircraft, produced by open bomb bay doors or by the cavities exposed when the landing gear is extended. Another example of the cavity tone can be heard when swinging a sword with a grooved profile.

The physics of operation can be a little complicated. To keep it simple: air flows over the cavity and comes into contact with air at a different velocity within the cavity. The movement of air at one speed over air at another causes what’s known as a shear layer between the two. The shear layer is unstable and flaps against the trailing edge of the cavity, causing a pressure pulse. The pressure pulse travels back upstream to the leading edge and reinforces the instability. This creates a feedback loop which occurs at set frequencies. Away from the cavity the pressure pulse is heard as an acoustic tone – the cavity tone!

A diagram of this is shown below:

Like the previously described Aeolian tone, there are equations to derive the frequency of the cavity tone, based on the length of the cavity and the airspeed. There are a number of modes of operation, usually the first four. The acoustic intensity has also been defined, based on the airspeed, the position of the listener and the geometry of the cavity.
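To give a flavour of the type of equation involved, here’s a small Python sketch of Rossiter’s well-known semi-empirical formula for cavity mode frequencies. The constants alpha and kappa are typical textbook values, not taken from the paper, which may use a different formulation.

```python
# Hedged sketch: Rossiter's semi-empirical equation for cavity mode
# frequencies. alpha and kappa are typical textbook values.
SPEED_OF_SOUND = 343.0  # m/s

def rossiter_modes(airspeed, cavity_length, n_modes=4, alpha=0.25, kappa=0.57):
    """f_m = (U / L) * (m - alpha) / (M + 1/kappa) for modes m = 1..n_modes."""
    mach = airspeed / SPEED_OF_SOUND
    return [(airspeed / cavity_length) * (m - alpha) / (mach + 1.0 / kappa)
            for m in range(1, n_modes + 1)]

# 40 m/s flow over a 5 cm cavity puts the first four modes at roughly
# 320, 750, 1180 and 1600 Hz.
print(rossiter_modes(40.0, 0.05))
```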

The implementation of an individual mode of the cavity tone is shown in the figure below. The Reynolds number is a dimensionless measure of the ratio between the inertial and viscous forces in the flow, and Q relates to the bandwidth of the passband of the bandpass filter.
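As a rough sketch of that idea (and not the actual pure data patch), a single mode can be approximated in Python as a white noise source shaped by a resonant bandpass filter centred on the mode frequency, with Q setting the bandwidth:

```python
# Minimal illustration: one cavity tone mode as white noise passed
# through a resonant bandpass filter, with Q controlling the bandwidth.
import numpy as np
from scipy.signal import iirpeak, lfilter

def cavity_mode(frequency_hz, q, duration_s=1.0, sample_rate=44100):
    """Synthesise a single mode as bandpass-filtered noise."""
    b, a = iirpeak(frequency_hz, q, fs=sample_rate)
    noise = np.random.randn(int(duration_s * sample_rate))
    return lfilter(b, a, noise)

# First mode from the Rossiter sketch above, with a fairly narrow band.
mode1 = cavity_mode(320.0, q=30.0)
```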

Comparing our model’s average frequency prediction to published results, we found it was 0.3% lower than theoretical frequencies, 2.0% lower than computed frequencies and 6.4% lower than measured frequencies. A copy of the pure data synthesis model can be downloaded here.


The final whistle blows

Previously, we discussed screams, applause, bouncing and pouring water. Continuing our examination of everyday sounds, we bring you… the whistle.

This one is a little challenging though. To name just a few, there are pea whistles, tin whistles, steam whistles, dog whistles and, of course, human whistling. Covering all of these would take a lot more than a single blog entry. So let’s stick to the standard pea whistle or pellet whistle (also called an ‘escargot’ or barrel whistle because of its snail-like shape), which is the basis for a lot of the whistles you’ve heard.

Typical metal pea whistle, featuring a mouthpiece, a bevelled edge and sound hole where air can escape, a barrel-shaped air chamber and a pellet inside.

Whistles are the oldest known type of flute. They have a stopped lower end and a flue that directs the player’s breath from the mouth hole at the upper end against the edge of a hole cut in the whistle wall, causing the enclosed air to vibrate. Most whistle instruments have no finger holes and sound only one pitch.

A whistle produces sound from a stream of gas, most commonly air, typically powered by steam or by someone blowing. The conversion of energy to sound comes from an interaction between the air stream and a solid material.

In a pea whistle, the air stream enters through the mouthpiece. It hits the bevel (the sloped edge of the opening) and splits, outwards into the air and inwards to fill the air chamber. It continues to swirl around and fill the chamber until the air pressure inside is so great that it pops out of the sound hole (a small opening next to the bevel), making room for the process to start over again. The dominant pitch of the whistle is determined by the rate at which air packs and unpacks the air chamber. The movement of air forces the pea or pellet inside the chamber to move around and around. This sometimes interrupts the flow of air and creates a warble in the whistle sound.

The size of the whistle cavity determines the volume of air contained in the whistle and the pitch of the sound produced. The air fills and empties from the chamber a certain number of times per second, and this rate gives the fundamental frequency of the sound.
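That fill-and-empty cycle is essentially a Helmholtz resonance, so the pitch can be estimated from the chamber and sound hole dimensions. In the Python sketch below the formula is standard, but the dimensions are rough guesses for a small pea whistle, not measurements:

```python
# Hedged sketch: the chamber as a Helmholtz resonator. The hole and
# chamber dimensions below are rough guesses, not measurements.
import math

SPEED_OF_SOUND = 343.0  # m/s

def helmholtz_frequency(hole_area_m2, chamber_volume_m3, neck_length_m):
    """f = (c / 2*pi) * sqrt(A / (V * L))."""
    return (SPEED_OF_SOUND / (2 * math.pi)) * math.sqrt(
        hole_area_m2 / (chamber_volume_m3 * neck_length_m))

# A ~6 x 7 mm sound hole, ~2 cm^3 chamber and ~5 mm effective neck give
# roughly 3.5 kHz, squarely in the range typical of referee's whistles.
print(helmholtz_frequency(42e-6, 2e-6, 5e-3))
```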

The whistle construction and the design of the mouthpiece also have a dramatic effect on the sound. A whistle made from thick metal will produce a brighter sound compared to the more resonant, mellow sound of thinner metal. Modern whistles are produced using different types of plastic, which expands the range of tones and sounds available. The design of the mouthpiece can also dramatically alter the sound. Even a few thousandths of an inch difference in the airway, the angle of the blade, or the size or width of the entry hole can make a drastic difference in volume, tone, and chiff (the breathiness or solidness of the sound). And according to the whistle Wiki page, which might be changed by the time you read this, ‘One characteristic of a whistle is that it creates a pure, or nearly pure, tone.’

Well, is all of that correct? When we looked at the sounds of pouring hot and cold water, we found that the simple explanations were not correct. In explaining the whistle, can we go further than a bit of handwaving about the pea causing a warble? Do different whistles differ a lot in sound?

Let’s start with some whistle sounds. Here’s a great video where you get to hear a dozen referees’ whistles.

Looking at the spectrogram below, you can see that all the whistles produce dominant frequencies somewhere between 2200 and 4400 Hz. Some other features are also apparent. There seems to be some second and even third harmonic content. And it doesn’t seem to be just one frequency and its overtones. Rather, there are two or three closely spaced frequencies whenever the whistle is blown.

Referee Whistles
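If you want to reproduce this kind of analysis yourself, a few lines of Python will do it. The sketch below assumes a recording in a hypothetical file called whistle.wav:

```python
# Spectrogram of a whistle recording. 'whistle.wav' is a placeholder
# filename, not a file provided with this blog.
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile
from scipy.signal import spectrogram

sample_rate, audio = wavfile.read('whistle.wav')
if audio.ndim > 1:
    audio = audio.mean(axis=1)  # fold stereo down to mono

freqs, times, power = spectrogram(audio, fs=sample_rate, nperseg=2048)
plt.pcolormesh(times, freqs, 10 * np.log10(power + 1e-12))
plt.ylim(0, 8000)  # the whistle energy sits well below 8 kHz
plt.xlabel('Time (s)')
plt.ylabel('Frequency (Hz)')
plt.show()
```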

But this sound sample is all fairly short whistle blows, which could be why the pitches are not constant. And one should never rely on just one sample or one audio file (as the authors did here). So let’s look at just one long whistle sound.

Spectrogram and waveform of a single long whistle blow.

You can see that it remains fairly constant, and the harmonics are clearly present, though I can’t say if they are partly due to dynamic range compression or any other processing. However, there are semi-periodic dips or disruptions in the fundamental pitch. You can see this more clearly in the waveform, and this is almost certainly due to the pea temporarily blocking the sound hole and weakening the sound.

The same general behaviour appears with other whistles, though with some variation in the dips and their rate of occurrence, and in the frequencies and their strengths.

Once I started writing this blog entry, it was pointed out to me that Perry Cook had already discussed synthesizing whistle sounds in his wonderful book Real Sound Synthesis for Interactive Applications. In building up part of a model of a police/referee whistle, he wrote:

 ‘Experiments and spectrograms using real police/referee whistles showed that when the pea is in the immediate region of the jet oscillator, there is a decrease in pitch (about 7%), an increase in amplitude (about 6 dB), and a small increase in the noise component (about 2 dB)… The oscillator exhibits three significant harmonics: f, 2f and 3f at 0 dB, -10 dB and -25 dB, respectively…’

With the exception of the increase in amplitude due to the pea (was that a typo?), my results are all in rough agreement with his. So depending on whether I’m a glass half empty / glass half full kind of person, I could either be disappointed that I’m just repeating what he did, or glad that my results are independently confirmed.
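Those published numbers are enough for a crude additive caricature in Python: three harmonics at 0, −10 and −25 dB, with a gate that drops the pitch about 7%, lifts the level about 6 dB and adds about 2 dB of noise while the pea sits near the jet. The regular pea timing is my own invention; this is a toy based on the quoted figures, not Cook’s actual model:

```python
# Toy caricature of the quoted whistle figures. The pea timing is a
# made-up regular pattern, not measured or taken from Cook's model.
import numpy as np

def toy_whistle(f0=3500.0, duration_s=2.0, sr=44100, pea_rate_hz=15.0):
    t = np.arange(int(duration_s * sr)) / sr
    pea = (np.sin(2 * np.pi * pea_rate_hz * t) > 0.6).astype(float)  # 1 = pea near jet
    freq = f0 * (1.0 - 0.07 * pea)              # ~7% pitch drop
    phase = 2 * np.pi * np.cumsum(freq) / sr    # integrate the varying frequency
    gains = [10 ** (g / 20) for g in (0.0, -10.0, -25.0)]
    tone = sum(g * np.sin(k * phase) for k, g in enumerate(gains, start=1))
    noise = 0.01 * 10 ** (2.0 * pea / 20) * np.random.randn(len(t))  # ~2 dB noise lift
    level = 10 ** (6.0 * pea / 20)              # ~6 dB louder while the pea is there
    return 0.2 * level * tone + noise

signal = toy_whistle()
```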

This information from a few whistle recordings should be good enough to characterise the behaviour and come up with a simple, controllable synthesis. Jiawei Liu took a different approach. In his Master’s thesis, he simulated whistles using computational fluid dynamics and acoustic finite element simulation. It was very interesting work, as was a related approach by Shi, but they’re both a bit like using a sledgehammer to kill a fly: massive effort and lots of computation, when a model that probably sounds just as good could have been derived using semi-empirical equations that model aeroacoustic sounds directly, as discussed in our previous blog entries on sound synthesis of an Aeolian harp, a propeller, sword sounds, swinging objects and Aeolian tones.

There’s been some research into automatic identification of referee whistle sounds, for instance, the initial work of Shirley and Oldfield in 2011 and then a more advanced algorithm a few years later. But these are either standard machine learning techniques, or based on the most basic aspects of the whistle sound, like its fundamental frequency. In either case, they don’t use much understanding of the nature of the sound. But I suppose that’s fine. They work, they enable intelligent production techniques for sports broadcasts, and they don’t need to delve into the physical or perceptual aspects.

I said I’d stick to pellet whistles, but I can’t resist mentioning a truly fascinating and unusual synthesis of another whistle sound. Steam locomotives were equipped with train whistles for warning and signalling. To generate the sound, the train driver pulls a cord in the driver’s cabin, thereby opening a valve, so that steam shoots out of a gap and against the sharp edge of a bell. This makes the bell vibrate rapidly, which creates a whistling sound. In 1972, Herbert Chaudiere created an incredibly detailed sound system for model trains. This analogue electronic system generated all the memorable sounds of the steam locomotive: the bark of exhausting steam, the rhythmic toll of the bell, and the wail of the chime whistle, and reproduced these sounds from a loudspeaker carried in the model locomotive.

The preparation of this blog entry also illustrates some of the problems with crowd-sourced metadata and user-generated tagging. When trying to find some good sound examples, I searched the world’s most popular sound effects archive, freesound, for ‘pea whistle’. It came up with only one hit: a recording of steam and liquid escaping from a pot of boiling black-eyed peas!

References:

  • Chaudiere, H. T. (1972). Model Railroad Sound System. Journal of the Audio Engineering Society, 20(8), 650-655.
  • Liu, J. (2012). Simulation of whistle noise using computational fluid dynamics and acoustic finite element simulation. MSc Thesis, University of Kentucky.
  • Shi, Y., da Silva, A., & Scavone, G. (2014). Numerical Simulation of Whistles Using Lattice Boltzmann Methods. ISMA, Le Mans, France.
  • Cook, P. R. (2002). Real Sound Synthesis for Interactive Applications. CRC Press.
  • Oldfield, R. G., & Shirley, B. G. (2011, May). Automatic mixing and tracking of on-pitch football action for television broadcasts. Audio Engineering Society Convention 130.
  • Oldfield, R., Shirley, B., & Satongar, D. (2015, October). Application of object-based audio for automated mixing of live football broadcast. Audio Engineering Society Convention 139.

The future of microphone technology

We recently had a blog entry about the Future of Headphones. Today, we’ll look at another ubiquitous piece of audio equipment, the microphone, and what technological revolutions are on the horizon.

It’s not a new technology, but the Eigenmike is deserving of attention. First released around 2010 by mh acoustics (their website and other searches don’t reveal much historical information), the Eigenmike is a microphone array composed of 32 high quality microphones positioned on the surface of a rigid sphere. The outputs of the individual microphones are combined to capture the soundfield, and by beamforming, the soundfield can be steered and aimed in a desired direction.
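The Eigenmike’s own processing is based on a spherical harmonic (‘eigenbeam’) decomposition, but the basic idea of steering an array can be illustrated with the simplest beamformer of all: far-field delay-and-sum. The Python sketch below works for any array geometry; the microphone positions and signals are placeholders, not Eigenmike data.

```python
# Far-field delay-and-sum beamformer sketch. Positions and signals are
# placeholders; the Eigenmike itself uses spherical harmonic processing.
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def delay_and_sum(signals, mic_positions, look_direction, sr=48000):
    """signals: (n_mics, n_samples). mic_positions: (n_mics, 3) in metres.
    look_direction: vector from the array towards the source."""
    d = look_direction / np.linalg.norm(look_direction)
    delays = mic_positions @ d / SPEED_OF_SOUND  # arrival-time offsets (s)
    n = signals.shape[1]
    freqs = np.fft.rfftfreq(n, 1.0 / sr)
    spectra = np.fft.rfft(signals, axis=1)
    # Delay each channel by its geometric offset so that sound from the
    # look direction adds coherently, then average across microphones.
    aligned = spectra * np.exp(-2j * np.pi * freqs * delays[:, None])
    return np.fft.irfft(aligned.mean(axis=0), n)
```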

The Eigenmike

This and related technologies (Core Sound’s TetraMic, Soundfield’s MKV, Sennheiser’s Ambeo …) are revolutionising high-end soundfield recording. Enda Bates has a nice blog entry about them, and they were formally evaluated in two AES papers, Comparing Ambisonic Microphones Part 1 and Part 2.

Soundskrit is TandemLaunch’s youngest incubated venture, based on research by Ron Miles and colleagues from Binghamton University. TandemLaunch, by the way, creates companies, often arising from academic research, and previously invested in research arising from the audio engineering research team behind this blog.

Jian Zhou and Ron Miles were inspired by the manner in which insects ‘hear’ with their hairs. They devised a method to record audio by sensing changes in airflow velocity rather than pressure. Spider silk is thin enough that it moves with the air when hit by sound waves, even for infrasound frequencies. To translate this movement into an electronic signal, they coated the spider silk with gold and put it in a magnetic field. Almost any fiber that is thin enough could be used in the same way, and different approaches could be applied for transduction. This new approach is intrinsically directional and may have a frequency response far superior to competing directional solutions.

MEMS (microelectromechanical system) microphones usually involve a pressure-sensitive diaphragm etched directly into a silicon wafer. The Soundskrit team is currently focused on developing a MEMS-compatible design so that it could be used in a wide variety of devices and applications where directional recording is needed.

Another start-up aiming to revolutionise MEMS technology is Vesper. Vesper’s MEMS microphone was developed by founders Bobby Littrell and Karl Grosh at the University of Michigan. It uses piezoelectric materials, which produce a voltage when subjected to pressure. This approach can achieve a superior signal-to-noise ratio over the capacitive MEMS microphones that currently dominate the market.

A few years ago, graphene-based microphones were receiving a lot of attention. In 2014, Dejan Todorovic and colleagues investigated the feasibility of graphene as a microphone membrane, and simulations suggested that it could have high sensitivity (the voltage generated in response to a pressure input) over a wide frequency range, far better than conventional microphones. Later that year, Peter Gaskell and others from McGill University performed physical and acoustical measurements of graphene oxide which confirmed Todorovic’s simulation results. But they seemed unaware of Todorovic’s work, despite both groups publishing at AES Conventions.

Gaskell and colleagues went on to commercialise graphene-based loudspeakers, as we discussed previously. The Todorovic team continued research on graphene microphones, apparently with great success, though I haven’t yet found any further developments from that group. Meanwhile, researchers from Kyungpook National University in Korea recently reported a high-sensitivity hearing aid microphone that uses a graphene-based diaphragm.


For a bit of fun, check out Catchbox, which bills itself as the ‘World’s First Soft Throwable Microphone’. It’s not exactly a technological revolution, though their patent-pending Automute relates a bit to the field of automatic mixing. But I can think of a few meetings that would have been livened up by having this around.

As always when I discuss commercial technologies, a disclaimer is needed. This blog is not meant as an endorsement of any of the mentioned companies. I haven’t tried their products. They are a sample of what is going on at the frontiers of microphone technology, but by no means cover the full range of exciting developments. In fact, since many of the technological advances are concerned with microphone array processing (source separation, localisation, beamforming and so on), as in some of our own contributions, this blog entry is really only giving you a taste of one exciting direction of research. But these technologies will surely change the way we capture sound in the near future.

Some of our own contributions to microphone technology, mainly on the signal processing and evaluation side of things, are listed below:

  1. L. Wang, J. D. Reiss and A. Cavallaro, ‘Over-Determined Source Separation and Localization Using Distributed Microphones,’ IEEE/ACM Transactions on Audio, Speech and Language Processing, vol. 24 (9), 2016.
  2. L. Wang, T. K. Hon, J. D. Reiss and A. Cavallaro, ‘An Iterative Approach to Source Counting and Localization Using Two Distant Microphones,’ IEEE/ACM Transactions on Audio, Speech and Language Processing, 24 (6), June 2016.
  3. L. Wang, T. K. Hon, J. D. Reiss and A. Cavallaro, ‘Self-Localization of Ad-hoc Arrays Using Time Difference of Arrivals,’ IEEE Transactions on Signal Processing, 64 (4), Feb., 2016.
  4. T. K. Hon, L. Wang, J. D. Reiss and A. Cavallaro, ‘Audio Fingerprinting for Multi-Device Self-Localisation,’ IEEE Transactions on Audio, Speech and Language Processing, 23 (10), p. 1623-1636, 2015.
  5. E. K. Kokkinis, J. D. Reiss and J. Mourjopoulos, “A Wiener Filter Approach to Microphone Leakage Reduction in Close-Microphone Applications,” IEEE Transactions on Audio, Speech, and Language Processing, V.20 (3), p.767-79, 2012.
  6. T-K. Hon, L. Wang, J. D. Reiss and A. Cavallaro, ‘Fine landmark-based synchronization of ad-hoc microphone arrays,’ 23rd European Signal Processing Conference (EUSIPCO), p. 1341-1345, Nice, France, 2015.
  7. B. De Man and J. D. Reiss, “A Pairwise and Multiple Stimuli Approach to Perceptual Evaluation of Microphone Types,” 134th AES Convention, Rome, May, 2013.
  8. A. Clifford and J. D. Reiss, ‘Proximity effect detection for directional microphones’, 131st AES Convention, New York, p. 1-7, Oct. 20-23, 2011.
  9. A. Clifford and J. D. Reiss, Microphone Interference Reduction in Live Sound, Proc. of the 14th Int. Conference on Digital Audio Effects (DAFx-11), Paris, p. 2-9, Sept 19-23, 2011
  10. E. Kokkinis, J. D. Reiss and J. Mourjopoulos, Detection of ‘solo intervals’ in multiple microphone multiple source audio applications, AES 130th Convention, May 2011.
  11. C. Uhle and J. D. Reiss, “Determined Source Separation for Microphone Recordings Using IIR Filters,” 129th AES Convention, San Francisco, Nov. 4-7, 2010.