Sampling the sampling theorem: a little knowledge is a dangerous thing

In 2016, I published a paper on perception of differences between standard resolution audio (typically 16 bit, 44.1 kHz) and high resolution audio formats (like 24 bit, 96 kHz). It was a meta-analysis, looking at all previous studies, and showed strong evidence that this difference can be perceived. It also did not find evidence that this difference was due to high bit depth, distortions in the equipment, or golden ears of some participants.

The paper generated a lot of discussion, some good and some bad. One argument presented many times as to why its overall conclusion must be wrong (it's implied here, here and here, for instance) basically goes like this:

We can’t hear above 20 kHz. The sampling theorem says that we need to sample at twice the bandwidth to fully recover the signal. So a bit beyond 40 kHz should be fully sufficient to render audio with no perceptible difference from the original signal.

But one should be very careful when making claims regarding the sampling theorem. It states that all information in a bandlimited signal is completely represented by sampling at twice the bandwidth (the Nyquist rate). It further implies that the continuous time bandlimited signal can be perfectly reconstructed by this sampled signal.
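For intuition, here is a minimal NumPy sketch of the reconstruction half of the theorem: a bandlimited sine, sampled above the Nyquist rate, is rebuilt off the sample grid by Whittaker-Shannon (sinc) interpolation. The signal, rate and tolerance are arbitrary choices for illustration, and a finite sum can only approximate the ideal infinite one, which is why the error is checked away from the ends.

```python
import numpy as np

def sinc_reconstruct(samples, fs, t):
    # Whittaker-Shannon interpolation: each sample contributes a
    # shifted sinc kernel (np.sinc is the normalised sinc)
    n = np.arange(len(samples))
    return np.array([np.sum(samples * np.sinc(fs * ti - n)) for ti in t])

fs = 8.0                 # sample rate, comfortably above twice the bandwidth
f0 = 1.0                 # a 1 Hz sine is well inside the Nyquist limit
n = np.arange(64)
samples = np.sin(2 * np.pi * f0 * n / fs)

# Evaluate off the sample grid, away from the ends (the sum is finite,
# so truncation errors grow near the edges)
t = np.linspace(2.0, 6.0, 200)
rec = sinc_reconstruct(samples, fs, t)
err = np.max(np.abs(rec - np.sin(2 * np.pi * f0 * t)))
print(err)   # small: the continuous sine is recovered between the samples
```

With infinitely many samples the error would be exactly zero; here it is merely tiny, which is the theorem doing its job.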

For that to mean that there is no audible difference between 44.1 kHz (or 48 kHz) sampling and much higher sample rate formats (leaving aside reproduction equipment), a few important assumptions must hold:

  1. Perfect brickwall filter to bandlimit the signal
  2. Perfect reconstruction filter to recover the bandlimited signal
  3. No audible difference whatsoever between the original full bandwidth signal and the bandlimited 48 kHz signal.

The first two are generally not true in practice, especially at lower sample rates. We can get very good performance by oversampling in the analog-to-digital and digital-to-analog converters, but they are not perfect. There may still be some minute pass-band ripple or some very low amplitude signal outside the pass-band, resulting in aliasing. Still, many modern high quality A/D and D/A converters and some sample rate converters are high performance, so their impact may be small.

But the third assumption is an open question and could make a big difference. The problem arises from another very important theorem, the uncertainty principle. Though it was first derived by Heisenberg for quantum mechanics, Gabor showed that it holds as a purely mathematical property of signals. The more localised a signal is in frequency, the less localised it is in time. For instance, a pure impulse (localised in time) has content over all frequencies. Bandlimiting this impulse spreads the signal in time.

For instance, consider filtering an impulse to retain only frequency content below 20 kHz. We will use the Matlab function ifir (interpolated FIR), which is a high performance design. We aim for low passband ripple (<0.01 dB) up to 20 kHz and 120 dB stopband attenuation starting at 22.05, 24 or 48 kHz, corresponding to 44.1 kHz, 48 kHz and 96 kHz sample rates respectively. You can see excellent behaviour in the magnitude response below.

mag response

The impulse response also looks good, but now the original impulse has become smeared in time. This is an inevitable consequence of the uncertainty principle.

impulse response

Still, on the surface this may not be so problematic. But we perceive loudness on a logarithmic scale. So have a look at this impulse response on a decibel scale.

impulse response db

The 44.1 and 48 kHz filters spread energy over 1 ms or more, but the 96 kHz filter keeps most energy within 100 microseconds. And this is a particularly good filter, without considering quantization effects or the additional reconstruction (anti-imaging) filter required for analog output. Note also that all of this frequency content has already been bandlimited, so it's almost entirely below 20 kHz.
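The same effect can be reproduced without Matlab's IFIR design. The sketch below uses a plain Blackman-windowed sinc (a much cruder filter than in the figures, so the exact numbers differ) to compare how long an anti-alias filter 'rings' when the transition band must be squeezed between 20 and 22.05 kHz, versus the roomy 20 to 48 kHz available at a 96 kHz sample rate. The tap counts and the -60 dB ringing threshold are my own choices.

```python
import numpy as np

def lowpass_fir(cutoff_hz, fs, numtaps):
    # Blackman-windowed sinc: a simple linear-phase lowpass, standing in
    # for the far more efficient IFIR design discussed in the text
    n = np.arange(numtaps) - (numtaps - 1) / 2
    h = (2 * cutoff_hz / fs) * np.sinc(2 * cutoff_hz / fs * n)
    return h * np.blackman(numtaps)

def ring_ms(h, fs, floor_db=-60):
    # Time span (ms) over which the impulse response stays above floor_db
    mag = np.abs(h) / np.max(np.abs(h))
    idx = np.where(mag > 10 ** (floor_db / 20))[0]
    return (idx[-1] - idx[0]) / fs * 1e3

# Blackman transition width is roughly 5.5 * fs / numtaps, so the narrow
# 20 -> 22.05 kHz transition at 44.1 kHz needs many more taps than the
# wide 20 -> 48 kHz transition available at 96 kHz.
h44 = lowpass_fir(21025, 44100, 119)   # ~2 kHz transition band
h96 = lowpass_fir(34000, 96000, 19)    # ~28 kHz transition band
print(ring_ms(h44, 44100), ring_ms(h96, 96000))
```

Even with this crude design, the 44.1 kHz filter rings for well over a millisecond while the 96 kHz filter stays within a fraction of one, echoing the figures above.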

One millisecond still isn’t very much. However, this lack of high frequency content has affected the temporal fine structure of the signal, and we know a lot less about how we perceive temporal information than how we perceive frequency content. This is where psychoacoustic studies in the field of auditory neuroscience come into play. They’ve approached temporal resolution from very different perspectives. Abel found that we can distinguish temporal gaps in sound of only 0.4 ms, and Wiegrebe’s study suggested a resolution of 0.72 ms. Studies by Wiegrebe (same paper), Lotze and Aiba all suggested that we can distinguish between a single click and a closely spaced pair of clicks when the gap between the pair of clicks is below one millisecond. And a study by Henning suggested that we can distinguish the ordering of a high amplitude and low amplitude click when the spacing between them is only about one fifth of a millisecond.

All of these studies should be taken with a grain of salt. Some are quite old, and it's possible there may have been issues with the audio set-up. Furthermore, they aren't directly testing the audibility of anti-alias filters. But it's clear that they indicate that the time domain spread of energy in transient sounds due to filtering might be audible.

Big questions still remain. In the ideal scenario, the only thing missing after bandlimiting a signal is the high frequency content, which we shouldn’t be able to hear. So what really is going on?

By the way, I recommend reading Shannon's original papers on the sampling theorem and other subjects. They're very good and a joy to read. Shannon was a fascinating character. I read his Collected Papers, and off the top of my head, it included inventing the rocket-powered Frisbee, the gasoline-powered pogo stick, a calculator that worked using Roman numerals (wonderfully named THROBAC, for Thrifty Roman numerical BACkward looking computer), and discovering the fundamental equation of juggling. He also built a robot mouse to compete against real mice, inspired by classic psychology experiments where a mouse was made to find its way out of a maze.

Nyquist’s papers aren’t so easy though, and feel a bit dated.

  • S. M. Abel, 'Discrimination of temporal gaps,' Journal of the Acoustical Society of America, vol. 52, 1972.
  • E. Aiba, M. Tsuzaki, S. Tanaka and M. Unoki, 'Judgment of perceptual synchrony between two pulses and verification of its relation to cochlear delay by an auditory model,' Japanese Psychological Research, vol. 50, 2008.
  • D. Gabor, 'Theory of communication,' Journal of the Institution of Electrical Engineers, vol. 93, p. 429-457, 1946.
  • G. B. Henning and H. Gaskell, 'Monaural phase sensitivity with Ronken's paradigm,' Journal of the Acoustical Society of America, vol. 70, 1981.
  • M. Lotze, M. Wittmann, N. von Steinbüchel, E. Pöppel and T. Roenneberg, 'Daily rhythm of temporal resolution in the auditory system,' Cortex, vol. 35, 1999.
  • H. Nyquist, 'Certain topics in telegraph transmission theory,' Transactions of the AIEE, vol. 47, p. 617-644, April 1928.
  • J. D. Reiss, 'A meta-analysis of high resolution audio perceptual evaluation,' Journal of the Audio Engineering Society, vol. 64 (6), June 2016.
  • C. E. Shannon, 'Communication in the presence of noise,' Proceedings of the Institute of Radio Engineers, vol. 37 (1), p. 10-21, January 1949.
  • L. Wiegrebe and K. Krumbholz, 'Temporal resolution and temporal masking properties of transient stimuli: Data and an auditory model,' Journal of the Acoustical Society of America, vol. 105, p. 2746-2756, 1999.

Digging the didgeridoo

The Ig Nobel prizes are tongue-in-cheek awards given every year to celebrate unusual or trivial achievements in science. Named as a play on the Nobel prize and the word ignoble, they are intended to 'honor achievements that first make people laugh, and then make them think.' Previously, when discussing graphene-based headphones, I mentioned Andre Geim, the only scientist to have won both a Nobel and an Ig Nobel prize.

I only recently noticed that the 2017 Ig Nobel Peace Prize went to an international team that demonstrated that playing a didgeridoo is an effective treatment for obstructive sleep apnoea and snoring. Here’s a photo of one of the authors of the study playing the didge at the award ceremony.


My own nominees for Ig Nobel prizes, from audio-related research published this past year, would include 'Influence of Audience Noises on the Classical Music Perception on the Example of Anti-cough Candies Unwrapping Noise', which we discussed in our preview of the 143rd Audio Engineering Society Convention, and 'The DFA Fader: Exploring the Power of Suggestion in Loudness Judgments', for which we had the blog entry 'What the f*** are DFA faders'.

But let's return to didgeridoo research. It's a fascinating aboriginal Australian instrument, with a rich history and interesting acoustics, and it produces an eerie drone-like sound.

A search on Google Scholar, after removing patents and citations, shows only 38 research papers with didgeridoo in the title. That's great news if you want to be an expert on research in the subject. The work of Neville H. Fletcher over a roughly thirty year period beginning in the early 1980s is probably the main starting point.

The passive acoustics of the didgeridoo are well understood. It's a long truncated conical horn where the player's lips at the smaller end form a pressure-controlled valve. Knowing the length and diameters involved, it's not too difficult to determine the fundamental frequency (often around 50-100 Hz) and the modes excited, and their strengths, in much the same way as can be done for many woodwind instruments.
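As a rough sanity check on those numbers, one can treat the instrument as an ideal cylindrical pipe closed at the player's lips. This ignores the conical flare and end corrections, so it is only a ballpark estimate, and the lengths below are typical values rather than measurements:

```python
C = 343.0  # speed of sound in air at ~20 C (m/s)

def closed_pipe_modes(length_m, n_modes=3):
    # An ideal pipe closed at one end resonates at odd multiples of c/4L
    # (end corrections and the conical bore are ignored)
    return [(2 * k + 1) * C / (4 * length_m) for k in range(n_modes)]

# Typical didgeridoo lengths of roughly 1.2 to 1.5 m
for length in (1.2, 1.5):
    print(length, [round(f, 1) for f in closed_pipe_modes(length)])
```

Those lengths give fundamentals of roughly 57 to 72 Hz, squarely inside the 50-100 Hz range quoted above.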

But that's just the passive acoustics. Fletcher pointed out that traditional, solo didgeridoo players don't pay much attention to the resonant frequencies; they're mainly important when it's played in Western music and needs to fit with the rest of an ensemble.

Things start getting really interesting when one considers the sounding mechanism. Players make heavy use of circular breathing, breathing in through the nose while breathing out through the mouth, even more so, and more rhythmically, than is typical in performing Western brass instruments like trumpets and tubas. Changes in lip motion and vocal tract shape are then used to control the formants, allowing the manipulation of very rich timbres.

It's these aspects of didgeridoo playing that intrigued the authors of the sleep apnoea study. Like the DFA and cough drop wrapper studies mentioned above, this was a serious study on a seemingly not so serious subject. Circular breathing and training of the respiratory muscles may go a long way towards improving nighttime breathing, and hence reducing snoring and sleep disturbances. The study was controlled and randomised. But it's incredibly difficult in these sorts of studies to eliminate or control for all the other variables, and very hard to identify which aspect of the didgeridoo playing was responsible for the better sleep. The authors quite rightly highlighted what I think is one of the biggest question marks in the study:

A limitation is that those in the control group were simply put on a waiting list because a sham intervention for didgeridoo playing would be difficult. A control intervention such as playing a recorder would have been an option, but we would not be able to exclude effects on the upper airways and compliance might be poor.

In that respect, drug trials are somewhat easier to interpret than practice-based interventions. But the effect was abundantly clear and quite strong. One certainly should not dismiss the results because of limitations (the limitations give rise to question marks, but they're not mistakes) in the study.

 

The final whistle blows

Previously, we discussed screams, applause, bouncing and pouring water. Continuing our examination of every day sounds, we bring you… the whistle.

This one is a little challenging though. To name just a few, there are pea whistles, tin whistles, steam whistles, dog whistles and, of course, human whistling. Covering all of this is a lot more than a single blog entry. So let's stick to the standard pea whistle or pellet whistle (also called the 'escargot' or barrel whistle because of its snail-like shape), which is the basis for a lot of the whistles that you've heard.

metal pea whistle

 

Typical metal pea whistle, featuring mouthpiece, bevelled edge and sound hole where air can escape, barrel-shaped air chamber, and a pellet inside.

 

Whistles are the oldest known type of flute. They have a stopped lower end and a flue that directs the player’s breath from the mouth hole at the upper end against the edge of a hole cut in the whistle wall, causing the enclosed air to vibrate. Most whistle instruments have no finger holes and sound only one pitch.

A whistle produces sound from a stream of gas, most commonly air, typically powered by steam or by someone blowing. The conversion of energy to sound comes from an interaction between the air stream and a solid material.

In a pea whistle, the air stream enters through the mouthpiece. It hits the bevel (the sloped edge of the opening) and splits, outwards into the air and inwards, filling the air chamber. It continues to swirl around and fill the chamber until the air pressure inside is so great that it pops out of the sound hole (a small opening next to the bevel), making room for the process to start over again. The dominant pitch of the whistle is determined by the rate at which air packs and unpacks the air chamber. The movement of air forces the pea or pellet inside the chamber to move around and around. This sometimes interrupts the flow of air and creates a warble in the whistle sound.

The size of the whistle cavity determines the volume of air contained in the whistle and the pitch of the sound produced. The number of times per second that the air fills and empties the chamber gives the fundamental frequency of the sound.
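One way to put numbers on this fill-and-empty rate is to treat the barrel as a Helmholtz resonator. That is my modelling assumption, not something stated above, and the dimensions below are rough guesses for a typical pea whistle rather than measurements, but the resulting pitch lands in the right region:

```python
import math

C = 343.0  # speed of sound in air (m/s)

def helmholtz_hz(hole_area_m2, chamber_vol_m3, neck_len_m):
    # Helmholtz resonance: f = (c / 2*pi) * sqrt(A / (V * L_eff)),
    # with a standard end correction added to the neck length
    r = math.sqrt(hole_area_m2 / math.pi)
    l_eff = neck_len_m + 1.7 * r  # flanged-opening end correction
    return C / (2 * math.pi) * math.sqrt(hole_area_m2 / (chamber_vol_m3 * l_eff))

# Guessed dimensions: ~0.5 cm^2 sound hole, ~4 cm^3 barrel, ~2 mm wall
f = helmholtz_hz(0.5e-4, 4e-6, 2e-3)
print(round(f))  # lands around 2 kHz, near the low end of referee whistles
```

Shrinking the chamber volume or enlarging the hole raises the pitch, consistent with the cavity-size argument above.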

The whistle construction and the design of the mouthpiece also have a dramatic effect on sound. A whistle made from thick metal will produce a brighter sound, compared to the more resonant, mellow sound if thinner metal is used. Modern whistles are produced using different types of plastic, which broadens the range of tones and sounds now available. The design of the mouthpiece can also dramatically alter the sound. Even a few thousandths of an inch difference in the airway, the angle of the blade, or the size or width of the entry hole can make a drastic difference as far as volume, tone, and chiff (breathiness or solidness of the sound) are concerned. And according to the whistle Wiki page, which might be changed by the time you read this, 'One characteristic of a whistle is that it creates a pure, or nearly pure, tone.'

Well, is all of that correct? When we looked at the sounds of pouring hot and cold water we found that the simple explanations were not correct. In explaining the whistle, can we go a bit further than a bit of handwaving about the pea causing a warble? Do the different whistles differ a lot in sound?

Let's start with some whistle sounds. Here's a great video where you get to hear a dozen referee whistles.

Looking at the spectrogram below, you can see that all the whistles produce dominant frequencies somewhere between 2200 and 4400 Hz. Some other features are also apparent. There seems to be some second and even third harmonic content. And it doesn’t seem to be just one frequency and its overtones. Rather, there are two or three closely spaced frequencies whenever the whistle is blown.

Referee Whistles

But this sound sample is all fairly short whistle blows, which could be why the pitches are not constant. And one should never rely on just one sample or one audio file (as the authors did here). So let's look at just one long whistle sound.

joe whistle spec

joe whistle

You can see that it remains fairly constant, and the harmonics are clearly present, though I can’t say if they are partly due to dynamic range compression or any other processing. However, there are semi-periodic dips or disruptions in the fundamental pitch. You can see this more clearly in the waveform, and this is almost certainly due to the pea temporarily blocking the sound hole and weakening the sound.

The same general behaviour appears with other whistles, though with some variation in the dips and their rate of occurrence, and in the frequencies and their strengths.

Once I started writing this blog, I was pointed to the fact that Perry Cook had already discussed synthesizing whistle sounds in his wonderful book Real Sound Synthesis for Interactive Applications. In building up part of a model of a police/referee whistle, he wrote

 ‘Experiments and spectrograms using real police/referee whistles showed that when the pea is in the immediate region of the jet oscillator, there is a decrease in pitch (about 7%), an increase in amplitude (about 6 dB), and a small increase in the noise component (about 2 dB)… The oscillator exhibits three significant harmonics: f, 2f and 3f at 0 dB, -10 dB and -25 dB, respectively…’

With the exception of the increase in amplitude due to the pea (was that a typo?), my results are all in rough agreement with his. So depending on whether I’m a glass half empty / glass half full kind of person, I could either be disappointed that I’m just repeating what he did, or glad that my results are independently confirmed.
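Cook's figures are enough to sketch a crude additive synthesis of the whistle. The harmonic levels below follow his 0, -10 and -25 dB values; the warble rate, the sinusoidal shape of the pitch dips and the noise level are my own guesses, not from his book:

```python
import numpy as np

fs = 44100
t = np.arange(fs) / fs   # one second of samples

# Harmonics roughly as Cook reports: f, 2f, 3f at 0, -10, -25 dB
f0 = 2800.0
levels_db = [0.0, -10.0, -25.0]

# Pea 'warble': periodic dips of about 7% in pitch. The 20 Hz dip rate
# and sinusoidal dip shape are assumptions for illustration.
pitch = f0 * (1 - 0.035 * (1 + np.cos(2 * np.pi * 20 * t)))
phase = 2 * np.pi * np.cumsum(pitch) / fs

y = sum(10 ** (db / 20) * np.sin((k + 1) * phase)
        for k, db in enumerate(levels_db))
y += 10 ** (-30 / 20) * np.random.randn(len(t))  # small breath-noise floor
y /= np.max(np.abs(y))                           # normalise to +/-1
```

Writing `y` to a wav file gives a recognisably whistle-like tone; a raised amplitude during the dips could be added to follow Cook's amplitude observation as well.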

This information from a few whistle recordings should be good enough to characterise the behaviour and come up with a simple, controllable synthesis. Jiawei Liu took a different approach. In his Master's thesis, he simulated whistles using computational fluid dynamics and acoustic finite element simulation. It was very interesting work, as was a related approach by Shi, but they're both a bit like using a sledgehammer to kill a fly: massive effort and lots of computation, when a model that probably sounds just as good could have been derived using semi-empirical equations that model aeroacoustic sounds directly, as discussed in our previous blog entries on sound synthesis of an Aeolian harp, a propeller, sword sounds, swinging objects and Aeolian tones.

There's been some research into automatic identification of referee whistle sounds, for instance, the initial work of Shirley and Oldfield in 2011 and then a more advanced algorithm a few years later. But these are either standard machine learning techniques, or based on the most basic aspects of the whistle sound, like its fundamental frequency. In either case, they don't use much understanding of the nature of the sound. But I suppose that's fine. They work, they enable intelligent production techniques for sports broadcasts, and they don't need to delve into the physical or perceptual aspects.

I said I'd stick to pellet whistles, but I can't resist mentioning a truly fascinating and unusual synthesis of another whistle sound. Steam locomotives were equipped with train whistles for warning and signalling. To generate the sound, the train driver pulls a cord in the driver's cabin, thereby opening a valve, so that steam shoots out of a gap and against the sharp edge of a bell. This makes the bell vibrate rapidly, which creates a whistling sound. In 1972, Herbert Chaudiere created an incredibly detailed sound system for model trains. This analogue electronic system generated all the memorable sounds of the steam locomotive (the bark of exhausting steam, the rhythmic toll of the bell, and the wail of the chime whistle) and reproduced these sounds from a loudspeaker carried in the model locomotive.

The preparation of this blog entry also illustrates some of the problems with crowd sourced metadata and user generated tagging. When trying to find some good sound examples, I searched the world's most popular sound effects archive, Freesound, for 'pea whistle'. It came up with only one hit, a recording of steam and liquid escaping from a pot of boiling black-eyed peas!

References:

  • H. T. Chaudiere, 'Model railroad sound system,' Journal of the Audio Engineering Society, vol. 20 (8), p. 650-655, 1972.
  • J. Liu, 'Simulation of whistle noise using computational fluid dynamics and acoustic finite element simulation,' MSc thesis, University of Kentucky, 2012.
  • Y. Shi, A. da Silva and G. Scavone, 'Numerical simulation of whistles using lattice Boltzmann methods,' ISMA, Le Mans, France, 2014.
  • P. R. Cook, Real Sound Synthesis for Interactive Applications, CRC Press, 2002.
  • R. G. Oldfield and B. G. Shirley, 'Automatic mixing and tracking of on-pitch football action for television broadcasts,' 130th AES Convention, May 2011.
  • R. Oldfield, B. Shirley and D. Satongar, 'Application of object-based audio for automated mixing of live football broadcast,' 139th AES Convention, October 2015.

The future of microphone technology

We recently had a blog entry about the Future of Headphones. Today, we’ll look at another ubiquitous piece of audio equipment, the microphone, and what technological revolutions are on the horizon.

It's not a new technology, but the Eigenmike is deserving of attention. First released around 2010 by mh acoustics (their website and other searches don't reveal much historical information), the Eigenmike is a microphone array composed of 32 high quality microphones positioned on the surface of a rigid sphere. Outputs of the individual microphones are combined to capture the soundfield. By beamforming, the soundfield can be steered and aimed in a desired direction.
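The beamforming idea is easy to sketch: delay each microphone's signal so that a wavefront from the chosen direction adds coherently, then average. This toy delay-and-sum example is a simplification (the Eigenmike uses spherical harmonic processing, and the four-mic linear array and tone here are made up for illustration), but it shows the array favouring the look direction:

```python
import numpy as np

C = 343.0   # speed of sound (m/s)
fs = 48000

def delay_and_sum(signals, mic_pos, look_dir, fs):
    # Time-align each channel for a plane wave from unit vector look_dir,
    # applying fractional delays in the frequency domain, then average
    n = signals.shape[1]
    freqs = np.fft.rfftfreq(n, 1 / fs)
    out = np.zeros(n)
    for sig, pos in zip(signals, mic_pos):
        delay = np.dot(pos, look_dir) / C   # seconds
        spec = np.fft.rfft(sig) * np.exp(2j * np.pi * freqs * delay)
        out += np.fft.irfft(spec, n)
    return out / len(signals)

# Toy scene: 4 mics spaced 5 cm on a line, 1 kHz plane wave along x
mic_pos = np.array([[i * 0.05, 0.0, 0.0] for i in range(4)])
src_dir = np.array([1.0, 0.0, 0.0])
t = np.arange(1024) / fs
signals = np.array([np.sin(2 * np.pi * 1000 * (t - np.dot(p, src_dir) / C))
                    for p in mic_pos])

on_target = delay_and_sum(signals, mic_pos, src_dir, fs)
off_target = delay_and_sum(signals, mic_pos, np.array([0.0, 1.0, 0.0]), fs)
print(np.std(on_target), np.std(off_target))
```

Steering at the source yields nearly full amplitude, while steering broadside leaves the channels out of phase and partially cancelling: a crude but real directional gain.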

The Eigenmike

This and related technologies (Core Sound’s TetraMic, Soundfield’s MKV, Sennheiser’s Ambeo …) are revolutionising high-end soundfield recording. Enda Bates has a nice blog entry about them, and they were formally evaluated in two AES papers, Comparing Ambisonic Microphones Part 1 and Part 2.

Soundskrit is TandemLaunch's youngest incubated venture, based on research by Ron Miles and colleagues from Binghamton University. TandemLaunch, by the way, creates companies often arising from academic research, and previously invested in research arising from the audio engineering research team behind this blog.

Jian Zhou and Ron Miles were inspired by the manner in which insects ‘hear’ with their hairs. They devised a method to record audio by sensing changes in airflow velocity rather than pressure. Spider silk is thin enough that it moves with the air when hit by sound waves, even for infrasound frequencies. To translate this movement into an electronic signal, they coated the spider silk with gold and put it in a magnetic field. Almost any fiber that is thin enough could be used in the same way, and different approaches could be applied for transduction. This new approach is intrinsically directional and may have a frequency response far superior to competing directional solutions.

MEMS (microelectromechanical system) microphones usually involve a pressure-sensitive diaphragm etched directly into a silicon wafer. The Soundskrit team is currently focused on developing a MEMS-compatible design so that it can be used in a wide variety of devices and applications where directional recording is needed.

Another start-up aiming to revolutionise MEMS technology is Vesper. Vesper's MEMS was developed by founders Bobby Littrell and Karl Grosh at the University of Michigan. It uses piezoelectric materials, which produce a voltage when subjected to pressure. This approach can achieve a superior signal-to-noise ratio over the capacitive MEMS microphones that currently dominate the market.

A few years ago, graphene-based microphones were receiving a lot of attention. In 2014, Dejan Todorovic and colleagues investigated the feasibility of graphene as a microphone membrane, and simulations suggested that it could have high sensitivity (the voltage generated in response to a pressure input) over a wide frequency range, far better than conventional microphones. Later that year, Peter Gaskell and others from McGill University performed physical and acoustical measurements of graphene oxide which confirmed Todorovic's simulation results. But they seemed unaware of Todorovic's work, despite both groups publishing at AES Conventions.

Gaskell and colleagues went on to commercialise graphene-based loudspeakers, as we discussed previously. But the Todorovic team continued research on graphene microphones, apparently to great success.

I haven't yet found out about any further developments from this group, however. But researchers from Kyungpook National University in Korea recently reported a high sensitivity hearing aid microphone that uses a graphene-based diaphragm.

 

For a bit of fun, check out Catchbox, which bills itself as 'the World's First Soft Throwable Microphone.' It's not exactly a technological revolution, though their patent-pending Automute relates a bit to the field of automatic mixing. But I can think of a few meetings that would have been livened up by having this around.

As previously when I've discussed commercial technologies, a disclaimer is needed. This blog is not meant as an endorsement of any of the mentioned companies. I haven't tried their products. They are a sample of what is going on at the frontiers of microphone technology, but by no means cover the full range of exciting developments. In fact, since many of the technological advances are concerned with microphone array processing (source separation, localisation, beamforming and so on), as in some of our own contributions, this blog entry is really only giving you a taste of one exciting direction of research. But these technologies will surely change the way we capture sound in the near future.

Some of our own contributions to microphone technology, mainly on the signal processing and evaluation side of things, are listed below:

  1. L. Wang, J. D. Reiss and A. Cavallaro, ‘Over-Determined Source Separation and Localization Using Distributed Microphones,’ IEEE/ACM Transactions on Audio, Speech and Language Processing, vol. 24 (9), 2016.
  2. L. Wang, T. K. Hon, J. D. Reiss and A. Cavallaro, ‘An Iterative Approach to Source Counting and Localization Using Two Distant Microphones,’ IEEE/ACM Transactions on Audio, Speech and Language Processing, 24 (6), June 2016.
  3. L. Wang, T. K. Hon, J. D. Reiss and A. Cavallaro, ‘Self-Localization of Ad-hoc Arrays Using Time Difference of Arrivals,’ IEEE Transactions on Signal Processing, 64 (4), Feb., 2016.
  4. T. K. Hon, L. Wang, J. D. Reiss and A. Cavallaro, ‘Audio Fingerprinting for Multi-Device Self-Localisation,’ IEEE Transactions on Audio, Speech and Language Processing, 23 (10), p. 1623-1636, 2015.
  5. E. K. Kokkinis, J. D. Reiss and J. Mourjopoulos, “A Wiener Filter Approach to Microphone Leakage Reduction in Close-Microphone Applications,” IEEE Transactions on Audio, Speech, and Language Processing, V.20 (3), p.767-79, 2012.
  6. T-K. Hon, L. Wang, J. D. Reiss and A. Cavallaro, ‘Fine landmark-based synchronization of ad-hoc microphone arrays,’ 23rd European Signal Processing Conference (EUSIPCO), p. 1341-1345, Nice, France, 2015.
  7. B. De Man and J. D. Reiss, “A Pairwise and Multiple Stimuli Approach to Perceptual Evaluation of Microphone Types,” 134th AES Convention, Rome, May, 2013.
  8. A. Clifford and J. D. Reiss, Proximity effect detection for directional microphones , 131st AES Convention, New York, p. 1-7, Oct. 20-23, 2011
  9. A. Clifford and J. D. Reiss, Microphone Interference Reduction in Live Sound, Proc. of the 14th Int. Conference on Digital Audio Effects (DAFx-11), Paris, p. 2-9, Sept 19-23, 2011
  10. E. Kokkinis, J. D. Reiss and J. Mourjopoulos, Detection of ‘solo intervals’ in multiple microphone multiple source audio applications, AES 130th Convention, May 2011.
  11. C. Uhle and J. D. Reiss, “Determined Source Separation for Microphone Recordings Using IIR Filters,” 129th AES Convention, San Francisco, Nov. 4-7, 2010.

 

Audio Research Year in Review- Part 2, the Headlines

Last week featured the first part of our ‘Audio research year in review.’ It focused on our own achievements. This week is the second, concluding part, with a few news stories related to the topics of this blog (music production, psychoacoustics, sound synthesis and everything in between) for each month of the year.

Browsing through the list, some interesting things pop up. Several news stories related to speech intelligibility in broadcast TV, which has been a recurring story the last few years. Effects of noise pollution on wildlife is also a theme in this year’s audio research headlines. And quite a few of the psychological studies are telling us what we already know. The fact that musicians (who are trained in a task that involves quick response to stimuli) have faster reaction times than non-musicians (who may not be trained in such a task) is not a surprise. Nor is the fact that if you hear the cork popping from a wine bottle, you may think it tastes better, although that’s a wonderful example of the placebo effect. But studies that end up confirming assumptions are still worth doing.

January

February

March

April

May

string wine glass

June

July

August

September

October

November

December

Audio Research Year in Review- Part 1, It’s all about us

Enjoy the holiday!

So as 2017 is coming to an end, everyone is rushing to get their ‘Year in Review’ articles out. And we’re no different in that regard. Only we’re going to do it in two parts, first what we have been doing this year, and then a second blog entry reviewing all the great breakthroughs and interesting research results in audio engineering, psychoacoustics, sound synthesis and related fields.

But first, let's talk about us. 🙂

I think we’ve all done some wonderful research this year, and the Audio Engineering team here can be proud of the results and progress.

Social Media:

First off, we’ve increased our social media presence tremendously,

• This blog, intelligentsoundengineering.wordpress.com/ has 7,363 views, with 1,128 followers, mostly through other social media.

• We started a twitter account, twitter.com/IntelSoundEng and now have 615 followers. Not huge, but doing well for the first few months of a research-focused feed.

• Our YouTube channel, www.youtube.com/user/IntelligentSoundEng has 16,778 views and 178 subscribers.

Here’s a sample video from our YouTube channel;

So people are reading and watching, which gives us even more incentive to put stuff out there that’s worth it for you to check out.

Awards:

We won three best paper or presentation awards:

Adan Benito (left) and Thomas Vassallo (right) for best presentation at the Web Audio Conference

benito vassallo award

Rod Selfridge (right), Dave Moffat and I, for best paper at Sound and Music Computing

selfridge award

I (right) won the best Journal of the Audio Engineering Society paper award, 2016 (announced in 2017 of course)

reiss award2

 

People:

Brecht De Man got his PhD and Yonghao Wang submitted his. Dave Ronan, Alessia Milo, Josh Mycroft and Rod Selfridge have all entered the write-up stage of their PhDs.

Brecht started a post-doc position and became Vice-Chair of the AES Education Committee, and I (Josh Reiss) was promoted to Professor of Audio Engineering. Dave Ronan started a position at AI Music.

We also welcomed a large number of visitors throughout the year, notably Dr. Amandine Pras and Saurjya Sarkar, now with Qualcomm.

Grants and projects:

We started the Cross-adaptive processing for musical intervention project (supporting Brecht, and Saurjya’s visit) and the Autonomous Systems for Sound Integration and GeneratioN (ASSIGN) InnovateUK project (supporting RTSFX researchers). We completed Brecht’s Yamaha postdoc, with another expected, and completed the QMI Proof of Concept: Sound Effect Synthesis project. We’ve been working closely with industry on a variety of projects, especially with RPPtv, who are funding Emmanouil Chourdakis’s PhD and collaborating on InnovateUK projects. We have other exciting grants in progress.

Events:

We’ve been involved in a few workshops. Will Wilkinson and Dave Moffat were on the organising committee for Audio Mostly. Alessia Milo gave an invited talk at the 8th International Symposium on Temporal Design, and organised a soundwalk at Audible Old Kent Road. Brecht and I were on the organising committee of the 3rd Workshop on Intelligent Music Production. Brecht organised Sound Talking at the Science Museum, and panel sessions on listening test design at the 142nd and 143rd AES Conventions. Dave Moffat organised a couple of Procedural Audio Now meet-ups.

Publications:

We had a fantastic year for publications: five journal papers (plus one more accepted) and twenty conference papers. I’ve listed them all below.

Journal articles

  1. D. Moffat and J. D. Reiss, ‘Perceptual Evaluation of Synthesized Sound Effects,’ accepted for ACM Transactions on Applied Perception
  2. R. Selfridge, D. Moffat and J. D. Reiss, ‘Sound Synthesis of Objects Swinging through Air Using Physical Models,’ Applied Sciences, v. 7 (11), Nov. 2017, Online version doi:10.3390/app7111177
  3. A. Zacharakis, M. Terrell, A. Simpson, K. Pastiadis and J. Reiss ‘Rearrangement of timbre space due to background noise: behavioural evidence and acoustic correlates,’ Acta Acustica united with Acustica, 103 (2), 288-298, 2017. Definitive publisher-authenticated version at http://www.ingentaconnect.com/content/dav/aaua
  4. P. Pestana and J. Reiss, ‘User Preference on Artificial Reverberation and Delay Time Parameters,’ J. Audio Eng. Soc., Vol. 65, No. 1/2, January/February 2017.
  5. B. De Man, K. McNally and J. Reiss, ‘Perceptual evaluation and analysis of reverberation in multitrack music production,’ J. Audio Eng. Soc., Vol. 65, No. 1/2, January/February 2017.
  6. E. Chourdakis and J. Reiss, ‘A machine learning approach to design and evaluation of intelligent artificial reverberation,’ J. Audio Eng. Soc., Vol. 65, No. 1/2, January/February 2017.

Book chapters

  • Accepted: A. Milo, N. Bryan-Kinns, and J. D. Reiss. Graphical Research Tools for Acoustic Design Training: Capturing Perception in Architectural Settings. In Perception-Driven Approaches to Urban Assessment and Design, F. Aletta and X. Jieling (Eds.). IGI Global.
  • J. D. Reiss, ‘An Intelligent Systems Approach to Mixing Multitrack Music‘, Perspectives On Music Production: Mixing Music, Routledge, 2017

Conference papers

  1. M. A. Martinez Ramirez and J. D. Reiss, ‘Stem Audio Mixing as a Content-Based Transformation of Audio Features,’ IEEE 19th International Workshop on Multimedia Signal Processing, Luton, UK, Oct. 16-18, 2017.
  2. M. A. Martinez Ramirez and J. D. Reiss, ‘Analysis and Prediction of the Audio Feature Space when Mixing Raw Recordings into Individual Stems,’ 143rd AES Convention, New York, Oct. 18-21, 2017.
  3. A. Milo, N. Bryan-Kinns and J. D. Reiss, ‘Influences of a Key Map on Soundwalk Exploration with a Textile Sonic Map,’ 143rd AES Convention, New York, Oct. 18-21, 2017.
  4. A. Milo and J. D. Reiss, ‘Aural Fabric: an interactive textile sonic map,’ Audio Mostly, London, 2017
  5. R. Selfridge, D. Moffat and J. D. Reiss, ‘Physically Derived Sound Synthesis Model of a Propeller,’ Audio Mostly, London, 2017
  6. N. Jillings, R. Stables and J. D. Reiss, ‘Zero-Delay Large Signal Convolution Using Multiple Processor Architectures,’ WASPAA, New York, 2017
  7. E. T. Chourdakis and J. D. Reiss, ‘Constructing narrative using a generative model and continuous action policies,’ CC-NLG, 2017
  8. M. A. Martinez Ramirez and J. D. Reiss, ‘Deep Learning and Intelligent Audio Mixing,’ 3rd Workshop on Intelligent Music Production, Salford, UK, 15 September 2017.
  9. B. De Man, J. D. Reiss and R. Stables, ‘Ten years of automatic mixing,’ 3rd Workshop on Intelligent Music Production, Salford, UK, 15 September 2017.
  10. W. Wilkinson, J. D. Reiss and D. Stowell, ‘Latent Force Models for Sound: Learning Modal Synthesis Parameters and Excitation Functions from Audio Recordings,’ 20th International Conference on Digital Audio Effects (DAFx-17), Edinburgh, UK, September 5–9, 2017
  11. S. Sarkar, J. Reiss and O. Brandtsegg, ‘Investigation of a Drum Controlled Cross-adaptive Audio Effect for Live Performance,’ 20th International Conference on Digital Audio Effects (DAFx-17), Edinburgh, UK, September 5–9, 2017
  12. B. De Man and J. D. Reiss, ‘The mix evaluation dataset,’ 20th International Conference on Digital Audio Effects (DAFx-17), Edinburgh, UK, September 5–9, 2017
  13. D. Moffat, D. Ronan and J. D. Reiss, ‘Unsupervised taxonomy of sound effects,’ 20th International Conference on Digital Audio Effects (DAFx-17), Edinburgh, UK, September 5–9, 2017
  14. R. Selfridge, D. Moffat and J. D. Reiss, ‘Physically Derived Synthesis Model of a Cavity Tone,’ Digital Audio Effects (DAFx) Conf., Edinburgh, September 5–9, 2017
  15. N. Jillings, Y. Wang, R. Stables and J. D. Reiss, ‘Intelligent audio plugin framework for the Web Audio API,’ Web Audio Conference, London, 2017
  16. R. Selfridge, D. J. Moffat and J. D. Reiss, ‘Real-time physical model for synthesis of sword swing sounds,’ Best paper award, Sound and Music Computing (SMC), Helsinki, July 5-8, 2017.
  17. R. Selfridge, D. J. Moffat, E. Avital, and J. D. Reiss, ‘Real-time physical model of an Aeolian harp,’ 24th International Congress on Sound and Vibration (ICSV), London, July 23-27, 2017.
  18. A. Benito and J. D. Reiss, ‘Intelligent Multitrack Reverberation Based on Hinge-Loss Markov Random Fields,’ AES Semantic Audio, Erlangen Germany, June 2017
  19. D. Ronan, H. Gunes and J. D. Reiss, “Analysis of the Subgrouping Practices of Professional Mix Engineers“, AES 142nd Convention, Berlin, May 20-23, 2017
  20. Y. Song, Y. Wang, P. Bull and J. D. Reiss, ‘Performance Evaluation of a New Flexible Time Division Multiplexing Protocol on Mixed Traffic Types,’ 31st IEEE International Conference on Advanced Information Networking and Applications (AINA), Taipei, Taiwan, March 27-29, 2017.


Your PhD examination – the best defense is a good offense

Previously, I’ve written a few blog entries giving research advice, like ‘So you want to write a research paper‘ and ‘What a PhD thesis is really about… really!‘ I thought I’d come up with a good title for this blog entry, but then I saw this.

[‘Thesis defense’ comic]

The PhD examination is certainly one of the most important moments in a researcher’s career. Its structure differs from country to country, institution to institution, and subject to subject. In some places, the PhD examination is open to the public, and failure is very rare. The student wouldn’t get to that stage unless the committee was confident that only minor issues remained. It might even be a bit of an event, with the committee wearing gowns and some of the student’s family attending.

But in most countries and most subjects it’s a bit more adversarial, and passing is not guaranteed. There is usually a small committee. A public talk might be given, but the question and answer session is just the student and the committee.

There is lots and lots of guidance online about how to prepare for a PhD exam, and I’m not going to try to summarise it. Instead, I’ll give you some insights from my own experience: being examined, preparing others for a PhD examination, and doing the examining myself. And I’ve had experience with students who ranged from near flawless to, unfortunately, almost hopeless.

First off, congratulations for getting to this stage. That is already a major achievement. And keep in mind that ultimately, it’s the document itself that is most important. If your thesis is strong, and you can explain it and discuss it well, then you’re already in a good position for the defense.


I’ve noticed that there are questions which seem relevant for me to ask in most PhD examinations, and other examiners tend to ask similar ones. So you can certainly prepare for them. First are the sort of general PhD study questions: what’s it all about? Here are a few typical ones.

  • Can you summarise your key findings?
  • What is your main contribution?
  • What is novel/significant/new?
  • What is the impact? What is your contribution to the field?
  • What are the weakest parts of your thesis?
  • Knowing what you know now, what would you change?

If there were aspects of your PhD study that were unusual, they also might ask you just to clarify things. For instance, I once examined a PhD whose research had taken a very long time. I wanted to know if there was research that hadn’t made it into the thesis, or whether there were technical issues that made the research more challenging. So I asked something like, ‘When did you start your PhD research? Were there technical reasons it took so long?’ As it turned out, it was due to a perfectly understandable change of supervisor.

And the examiners will want to know what you know about your subject area and the state of the art.

  • Who else is doing research in this subject?
  • What are the most significant results in the last few years?
  • How does your approach differ from others?
  • Please characterise and summarise other approaches to your topic.

Then there will be some questions specific to your field. These questions might touch on the examiners’ knowledge, or on specific aspects of the literature that may or may not have been mentioned in the thesis.

  • Explain, in your own words, the following concepts -.
  • Compare the – and -. What are the fundamental differences?
  • Is all of your work relevant to other — challenges?
  • Why use —? Are there other approaches?
  • How does your work connect to — and — research?

And many examiners will want to know about the impact of the research so far, e.g. publications or demonstrators. If you do have any demonstrations (audio samples, videos, software, interfaces), it’s a good idea to present them, or at least be ready to present them.

  • Is the community aware of your work? Are people using your software?
  • Do you have any publications?
  • Which (other) results could you publish, and where?
  • Have you attended or presented at any conferences? What did you learn from them?


Then typically, the examiners start diving into the fine details of the thesis. So you should know where to find anything in your own document. It’s also a good idea to reread your whole document a couple of days before the examination, so that it’s all fresh in your mind. It could have been a long time since you wrote it!


And best of luck to you!