Venturous Views on Virtual Vienna – a preview of AES 148


We try to write a preview of the technical track for almost all recent Audio Engineering Society (AES) Conventions; see our entries on the 142nd, 143rd, 144th, 145th and 147th Conventions. But this 148th Convention is very different.

It is, of course, an online event. The Convention planning committee have put huge effort into moving it all online and making it a really engaging and exciting experience (while massively reducing costs). There will be a mix of live streams, breakout sessions, interactive chat rooms and so on. But the technical papers will mostly be on-demand viewing, with Q&A and online dialogue with the authors. This is great in the sense that you can view them and interact with the authors at any time, but it means that it’s easy to overlook really interesting work.

So we’ve gathered together some information about a lot of the presented research that caught our eye as being unusual, exceptionally high quality, or just worth mentioning. And every paper mentioned here will appear soon in the AES E-Library, by the way. Currently though, you can browse all the abstracts by searching the full papers and engineering briefs on the Convention website.

Deep learning and neural networks are all the rage in machine learning nowadays. A few contributions to the field will be presented by Eugenio Donati with ‘Prediction of hearing loss through application of Deep Neural Network’, Simon Plain with ‘Pruning of an Audio Enhancing Deep Generative Neural Network’, Giovanni Pepe’s presentation of ‘Generative Adversarial Networks for Audio Equalization: an evaluation study’, Yiwen Wang presenting ‘Direction of arrival estimation based on transfer function learning using autoencoder network’, and the author of this post, Josh Reiss, will present work done mainly by sound designer/researcher Guillermo Peters, ‘A deep learning approach to sound classification for film audio post-production’. Related to this, check out the Workshop on ‘Deep Learning for Audio Applications – Engineering Best Practices for Data’, run by Gabriele Bunkheila of MathWorks (Matlab), which will be live-streamed on Friday.

There’s enough work being presented on spatial audio that there could be a whole conference on the subject within the convention. A lot of that is in Keynotes, Workshops, Tutorials, and the Heyser Memorial Lecture by Francis Rumsey. But a few papers in the area really stood out for me. Toru Kamekawa investigated a big question with ‘Are full-range loudspeakers necessary for the top layer of 3D audio?’ Marcel Nophut’s ‘Multichannel Acoustic Echo Cancellation for Ambisonics-based Immersive Distributed Performances’ has me intrigued because I know a bit about echo cancellation and a bit about ambisonics, but have no idea how to do the former for the latter.

And I’m intrigued by ‘Creating virtual height loudspeakers using VHAP’, presented by Kacper Borzym. I’ve never heard of VHAP, but the original VBAP paper is the most highly cited paper in the Journal of the AES (1367 citations at the time of writing this).
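
For anyone unfamiliar with VBAP, the core idea fits in a few lines: express the desired source direction as a weighted sum of the directions of the two nearest loudspeakers, then normalise the weights for constant power. Here’s a minimal 2D sketch (my own illustration of the classic VBAP technique, not of VHAP):

```python
import math

def vbap_2d(source_deg, spk_a_deg, spk_b_deg):
    """Gains for a phantom source between two loudspeakers (2D VBAP).

    Solves p = g_a * l_a + g_b * l_b, where p, l_a, l_b are unit
    vectors toward the source and the two speakers, then
    power-normalises the gains.
    """
    def unit(deg):
        a = math.radians(deg)
        return math.cos(a), math.sin(a)

    px, py = unit(source_deg)
    ax, ay = unit(spk_a_deg)
    bx, by = unit(spk_b_deg)
    det = ax * by - bx * ay                 # invert the 2x2 speaker matrix
    g_a = (px * by - bx * py) / det
    g_b = (ax * py - px * ay) / det
    norm = math.hypot(g_a, g_b)
    return g_a / norm, g_b / norm           # constant overall power

# A source halfway between speakers at -30 and +30 degrees gets equal gains
g_a, g_b = vbap_2d(0, -30, 30)
```

Panning the source toward one speaker smoothly shifts gain toward that speaker, which is all the classic stereo pair generalises to.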

How good are you at understanding speech from native speakers? How about when there’s a lot of noise in the background? Do you think you’re as good as a computer? Gain some insight into related research when viewing the presentation by Eugenio Donati on ‘Comparing speech identification under degraded acoustic conditions between native and non-native English speakers’.

There are a few papers exploring creative works, all of which look interesting and have great titles. David Poirier-Quinot will present ‘Emily’s World: behind the scenes of a binaural synthesis production’. Music technology has a fascinating history. Michael J. Murphy will explore the beginning of a revolution with ‘Reimagining Robb: The Sound of the World’s First Sample-based Electronic Musical Instrument circa 1927’. And if you’re into Scandinavian instrumental rock music (and who isn’t?), Zachary Bresler’s presentation of ‘Music and Space: A case of live immersive music performance with the Norwegian post-rock band Spurv’ is a must.


Frank Morse Robb, inventor of the first sample-based electronic musical instrument.

But sound creation comes first, and new technologies are emerging to do it. Damian T. Dziwis will present ‘Body-controlled sound field manipulation as a performance practice’. And particularly relevant given the worldwide isolation going on is ‘Quality of Musicians’ Experience in Network Music Performance: A Subjective Evaluation,’ presented by Konstantinos Tsioutas.

Portraiture looks at how to represent or capture the essence and rich details of a person. Maree Sheehan explores how this is achieved sonically, focusing on Maori women, in an intriguing presentation on ‘Audio portraiture sound design- the development and creation of audio portraiture within immersive and binaural audio environments.’

We talked about exciting research on metamaterials for headphones and loudspeakers when giving previews of previous AES Conventions, and there’s another development in this area presented by Sebastien Degraeve in ‘Metamaterial Absorber for Loudspeaker Enclosures’.

Paul Ferguson and colleagues look set to break some speed records, but any such feats require careful testing first, as in ‘Trans-Europe Express Audio: testing 1000 mile low-latency uncompressed audio between Edinburgh and Berlin using GPS-derived word clock’.

Our own research has focused a lot on intelligent music production, and especially automatic mixing. A novel contribution to the field, and a fresh perspective, is given in Nyssim Lefford’s presentation of ‘Mixing with Intelligent Mixing Systems: Evolving Practices and Lessons from Computer Assisted Design’.

Subjective evaluation, usually in the form of listening tests, is the primary form of testing audio engineering theory and technology. As Feynman said, ‘if it disagrees with experiment, it’s wrong!’

And thus, there are quite a few top-notch research presentations focused on experiments with listeners. Minh Voong looks at an interesting aspect of bone conduction with ‘Influence of individual HRTF preference on localization accuracy – a comparison between regular and bone conducting headphones’. Realistic reverb in games is incredibly challenging because characters are always moving, so Zoran Cvetkovic tackles this with ‘Perceptual Evaluation of Artificial Reverberation Methods for Computer Games.’ The abstract for Lawrence Pardoe’s ‘Investigating user interface preferences for controlling background-foreground balance on connected TVs’ suggests that there’s more than one answer to that preference question. That highlights the need to look deep into any data, not just the mean and standard deviation; considering only aggregate statistics can lead straight into Simpson’s Paradox. And finally, Peter Critchell will present ‘A new approach to predicting listener’s preference based on acoustical parameters,’ which addresses the need to accurately simulate and understand listening test results.
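
For anyone unfamiliar with Simpson’s Paradox, here’s a minimal sketch with entirely made-up listening-test counts (nothing to do with the papers above), showing a preference that holds within every listener group yet reverses when the groups are pooled:

```python
# Made-up listening-test results: (preferred count, trials) per mix,
# split by listener group. Mix A wins inside *each* group...
data = {
    "expert": {"A": (9, 10),  "B": (70, 90)},
    "novice": {"A": (30, 90), "B": (2, 10)},
}

for group, mixes in data.items():
    a_wins, a_n = mixes["A"]
    b_wins, b_n = mixes["B"]
    assert a_wins / a_n > b_wins / b_n   # A preferred within every group

# ...yet pooling the groups reverses the conclusion, because the two
# mixes were tested on very different numbers of each listener type.
totals = {m: tuple(map(sum, zip(data["expert"][m], data["novice"][m])))
          for m in ("A", "B")}
overall = {m: wins / trials for m, (wins, trials) in totals.items()}
# overall preference rate: A = 39/100 = 0.39, B = 72/100 = 0.72
```

This is exactly why looking only at pooled means and standard deviations can be so misleading.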

There are some talks about really rigorous signal processing approaches. Jens Ahrens will present ‘Tutorial on Scaling of the Discrete Fourier Transform and the Implied Physical Units of the Spectra of Time-Discrete Signals.’ I’m excited about this because it may shed some light on a possible explanation for why we hear a difference between CD quality and very high sample rate audio formats.

The Constant-Q Transform represents a signal in the frequency domain, but with logarithmically spaced bins, making it potentially very useful for audio. The last decade has seen a couple of breakthroughs that may make it far more practical. I was sitting next to Gino Velasco when he won the “best student paper” award for Velasco et al.’s “Constructing an invertible constant-Q transform with nonstationary Gabor frames.” Schörkhuber and Klapuri also made excellent contributions, mainly around implementing a fast version of the transform, culminating in a JAES paper, and the teams collaborated on a popular Matlab toolbox. Now there’s another advance with Felix Holzmüller presenting ‘Computational efficient real-time capable constant-Q spectrum analyzer’.
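
For intuition, the constant-Q bin layout itself is simple to compute; here’s a sketch using the standard textbook formulas (not the method of any of the papers above):

```python
import math

def cqt_bins(f_min=55.0, f_max=11025.0, bins_per_octave=12, fs=44100):
    """Centre frequencies and analysis-window lengths for a constant-Q
    spectrum. Bins are geometrically spaced, f_k = f_min * 2**(k/B), so
    every bin has the same Q (centre frequency over bandwidth) -- hence
    long analysis windows at low frequencies and short ones up high.
    """
    Q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    n_bins = math.ceil(bins_per_octave * math.log2(f_max / f_min))
    freqs = [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]
    win_lens = [round(Q * fs / f) for f in freqs]   # samples per bin window
    return freqs, win_lens

freqs, wins = cqt_bins()
```

With 12 bins per octave the bins line up with semitones, which is why the transform is so attractive for music analysis; the varying window lengths are also why fast, invertible implementations took real work.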

The abstract for Dan Turner’s ‘Content matching for sound generating objects within a visual scene using a computer vision approach’ suggests that it has implications for selection of sound effect samples in immersive sound design. But I’m a big fan of procedural audio, and think this could have even higher potential for sound synthesis and generative audio systems.

And finally, there’s some really interesting talks about innovative ways to conduct audio research based on practical challenges. Nils Meyer-Kahlen presents ‘DIY Modifications for Acoustically Transparent Headphones’. The abstract for Valerian Drack’s ‘A personal, 3D printable compact spherical loudspeaker array’, also mentions its use in a DIY approach. Joan La Roda’s own experience of festival shows led to his presentation of ‘Barrier Effect at Open-air Concerts, Part 1’. Another presentation with deep insights derived from personal experience is Fabio Kaiser’s ‘Working with room acoustics as a sound engineer using active acoustics.’ And the lecturers amongst us will be very interested in Sebastian Duran’s ‘Impact of room acoustics on perceived vocal fatigue of staff-members in Higher-education environments: a pilot study.’

Remember to check the AES E-Library which will soon have all the full papers for all the presentations mentioned here, including listing all authors not just presenters. And feel free to get in touch with us. Josh Reiss (author of this blog entry), J. T. Colonel, and Angeliki Mourgela from the Audio Engineering research team within the Centre for Digital Music, will all be (virtually) there.

Awesome student projects in sound design and audio effects

I teach classes in Sound Design and Digital Audio Effects. In both classes, the final assignment involves creating an original work that involves audio programming and uses concepts taught in class. But the students also have a lot of free rein to experiment and explore their own ideas. The results are always great. Lots of really cool ideas, many of which could lead to a publication, or would be great to listen to regardless of the fact that it was an assignment.

The last couple of years, I posted about it here and here.  Here’s a few of the projects this year.

From the Sound Design class:

  • A procedural audio model of a waterfall. The code was small, involving some filtered noise sources with random gain changes, but the result was great.
  • An interactive animation of a girl writing at a desk during a storm. There were some really neat tricks to get a realistic thunder sound.
  • A procedurally generated sound scene for a walk through the countryside. The student found lots of clever ways to generate the sounds of birds, bees, a river and the whoosh of a passing car.
  • New sound design replacing the audio track in a film scene. Check it out.
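
To give a flavour of how little code a procedural model like that waterfall can take, here’s a rough sketch of the general idea (my own guess at such an approach, using one-pole low-passed white noise with a drifting gain; it is not the student’s code):

```python
import random

def waterfall(duration=2.0, fs=44100, seed=0):
    """Minimal procedural waterfall: white noise through a one-pole
    low-pass, with a gain that drifts to a new random value every 50 ms."""
    rng = random.Random(seed)
    step = int(0.05 * fs)          # re-randomise the gain every 50 ms
    out, y, gain = [], 0.0, 0.5
    a = 0.05                       # low-pass coefficient: smaller = darker
    for n in range(int(duration * fs)):
        if n % step == 0:
            gain = 0.3 + 0.4 * rng.random()
        x = rng.uniform(-1.0, 1.0)          # white noise source
        y += a * (x - y)                    # one-pole low-pass
        out.append(gain * y)
    return out

samples = waterfall(duration=0.2)
```

Layering a few of these with different cutoffs and gain ranges already gets surprisingly close to moving water.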

And from the Digital Audio Effects class:

  • I don’t need to mention anything about the next one. Just read the student’s tweet.


  • Rainmaker, a VST plugin that takes an incoming signal and transforms it into a rain-like sound, starting above the listener and then floating down below.

  • A plugin implementation of the Karplus-Strong algorithm, except an audio sample is used to excite the string instead of a noise burst. It gives really interesting timbral qualities.

  • Stormify, an audio plugin that enables users to add varying levels of rain and wind to the background of their audio, making it appear that the recording took place in inclement weather.
  • An all-in-one plugin for synthesising and sculpting drum-like sounds.
  • The Binaural Phase Vocoder, a VST/AU plugin whereby users can position a virtual sound source in a 3D space and process the sound through an overlap-add phase vocoder.
  • A multiband multi-effect consisting of three frequency bands and three effects on each band: delay, distortion, and tremolo. Despite the seeming complexity, the interface was straightforward and easy to use.
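
For readers unfamiliar with Karplus-Strong, here’s a minimal sketch of the algorithm, written so the delay line can be seeded either with the classic noise burst or with an audio sample as in that plugin (the function and its parameters are my own illustration, not the student’s implementation):

```python
import random
from collections import deque

def karplus_strong(excitation, freq=110.0, fs=44100, n_samples=22050,
                   decay=0.996):
    """Karplus-Strong plucked string: a delay line whose feedback path is
    a two-point average (a gentle low-pass) times a decay factor.
    Classically the line is seeded with a noise burst; seeding it with an
    audio sample instead colours the resulting string timbre."""
    period = max(2, int(fs / freq))
    # Seed the delay line with the excitation, zero-padded or truncated
    line = deque(excitation[i] if i < len(excitation) else 0.0
                 for i in range(period))
    out = []
    for _ in range(n_samples):
        x = line.popleft()
        out.append(x)
        line.append(decay * 0.5 * (x + line[0]))  # feedback low-pass + decay
    return out

# The classic version, for comparison: excite with a white noise burst
rng = random.Random(1)
tone = karplus_strong([rng.uniform(-1.0, 1.0) for _ in range(200)])
```

Swapping the noise burst for, say, a drum hit or a vocal snippet keeps the pitched, decaying string behaviour but inherits the spectral colour of the sample.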


There were many other interesting assignments, including several sonifications of images. But this selection really shows both the talent of the students and the possibilities to create new and interesting sounds.

Funded PhD studentships available in Data-informed Audience-centric Media Engineering

So it’s been a while since I’ve written a blog post. Life, work and, of course, the Covid crisis have made my time limited. But hopefully I’ll write more frequently in future.

The good news is that there are fully funded PhD studentships which you or others you know might be interested in. They are all around the concept of Data-informed Audience-centric Media Engineering (DAME). See for details.

Three studentships are available. They are all fully-funded, for four years of study, based at Queen Mary University of London, and starting January 2021. Two of the proposed topics, ‘Media engineering for hearing-impaired audiences’ and ‘Intelligent systems for radio drama production’, are supported by BBC and build on prior and ongoing work by my research team.

  • Media engineering for hearing-impaired audiences: This research proposes the exploration of ways in which media content can be automatically processed to deliver the content optimally for audiences with hearing loss. It builds on prior work by our group and the collaborator, BBC, in development of effective audio mixing techniques for broadcast audio enhancement [1,2,3]. It will form a deeper understanding of the effects of hearing loss on media content perception and enjoyment, as well as utilize this knowledge towards the development of intelligent audio production techniques and applications that could improve audio quality by providing efficient and customisable compensation. It aims to advance beyond current research [4], which does not yet fully take into account the artistic intent of the material, and requires an ‘ideal mix’ for normal hearing listeners. So a new approach that both removes constraints and is more focused on the meaning of the content is required. This approach will be derived from natural language processing and audio informatics, to prioritise sources and establish requirements for the preferred mix.
  • Intelligent systems for radio drama production: This research topic proposes methods for assisting a human creator in producing radio dramas. Radio drama consists of both literary aspects, such as plot, story characters, or environments, as well as production aspects, such as speech, music, and sound effects. This project builds on recent, high impact collaboration with BBC [3, 5], to greatly advance the understanding of radio drama production, with the goal of devising and assessing intelligent technologies to aid in its creation. The project will first be concerned with investigating rules-based systems for generating production scripts from story outlines, and producing draft content from such scripts. It will consider existing workflows for content production and where such approaches rely on heavy manual labour. Evaluation will be with expert content producers, with the goal of creating new technologies that streamline workflows and facilitate the creative process.

If you or anyone you know is interested, please look at . Consider applying and feel free to ask me any questions.

[1] A. Mourgela, T. Agus and J. D. Reiss, “Perceptually Motivated Hearing Loss Simulation for Audio Mixing Reference,” 147th AES Convention, 2019.

[2] L. Ward et al., “Casualty Accessible and Enhanced (A&E) Audio: Trialling Object-Based Accessible TV Audio,” 147th AES Convention, 2019.

[3] E. T. Chourdakis, L. Ward, M. Paradis and J. D. Reiss, “Modelling Experts’ Decisions on Assigning Narrative Importances of Objects in a Radio Drama Mix,” Digital Audio Effects Conference (DAFx), 2019.

[4] L. Ward and B. Shirley, “Personalization in object-based audio for accessibility: a review of advancements for hearing impaired listeners,” Journal of the Audio Engineering Society, 67(7/8), 584-597, 2019.

[5] E. T. Chourdakis and J. D. Reiss, “From my pen to your ears: automatic production of radio plays from unstructured story text,” 15th Sound and Music Computing Conference (SMC), Limassol, Cyprus, 4-7 July 2018.

Intelligent Music Production book is published


Ryan Stables is an occasional collaborator and all-around brilliant person. He started the annual Workshop on Intelligent Music Production (WIMP) in 2015. It’s been going strong ever since, with the 5th WIMP co-located with DAFx this past September. The workshop series focuses on the application of intelligent systems (including expert systems, machine learning and AI) to music recording, mixing, mastering and related aspects of audio production and sound engineering.

Ryan had the idea for a book about the subject, and I (Josh Reiss) and Brecht De Man (another all-around brilliant person) were recruited as co-authors. What resulted was a massive amount of writing, editing, refining, re-editing and so on. We all contributed big chunks of content, but Brecht pulled it all together and turned it into something really high quality, giving a comprehensive overview of the field suitable for a wide range of audiences.

And the book is finally published today, October 31st! It’s part of the AES Presents series by Focal Press, a division of Routledge. You can get it from the publisher, from Amazon, or any of the other usual places.

And here’s the official blurb:

Intelligent Music Production presents the state of the art in approaches, methodologies and systems from the emerging field of automation in music mixing and mastering. This book collects the relevant works in the domain of innovation in music production, and orders them in a way that outlines the way forward: first, covering our knowledge of the music production processes; then by reviewing the methodologies in classification, data collection and perceptual evaluation; and finally by presenting recent advances on introducing intelligence in audio effects, sound engineering processes and music production interfaces.

Intelligent Music Production is a comprehensive guide, providing an introductory read for beginners, as well as a crucial reference point for experienced researchers, producers, engineers and developers.


Fellow of the Audio Engineering Society

The Audio Engineering Society’s Fellowship Award is given to ‘a member who has rendered conspicuous service or is recognized to have made a valuable contribution to the advancement in or dissemination of knowledge of audio engineering or in the promotion of its application in practice’.

Today at the 147th AES Convention, I was given the Fellowship Award for valuable contributions to, and for encouraging and guiding the next generation of researchers in, the development of audio and musical signal processing.

This is quite an honour, of which I’m very proud. And it puts me in some excellent company. A lot of greats have become Fellows of the AES (Manfred Schroeder, Vesa Valimaki, Poppy Crum, Bob Moog, Richard Heyser, Leslie Ann Jones, Gunther Thiele and Richard Small…), which also means I have a lot to live up to.

And thanks to the AES,

Josh Reiss

Radical and rigorous research at the upcoming Audio Engineering Society Convention


We previewed the 142nd, 143rd, 144th and 145th Audio Engineering Society (AES) Conventions, which we also followed with wrap-up discussions. Then we took a break, but now we’re back to preview the 147th AES Convention, October 16 to 19 in New York. As before, the Audio Engineering research team here aim to be quite active at the convention.

We’ve gathered together some information about a lot of the research-oriented events that caught our eye as being unusual, exceptionally high quality, events we’re involved in or attending, or just worth mentioning. And this Convention will certainly live up to the hype.

Wednesday October 16th

When I first read the title of the paper ‘Evaluation of Multichannel Audio in Automobiles versus Mobile Phones‘, presented at 10:30, I thought it was a comparison of multichannel automotive audio versus the tinny, quiet mono or barely stereo sound from a phone. But it’s actually comparing results of a listening test for stereo vs multichannel in a car with results of a listening test for stereo vs multichannel for the same audio, but played from a phone and rendered over headphones. And the results look quite interesting.

Deep neural networks are all the rage. We’ve been using DNNs to profile a wide variety of audio effects. Scott Hawley will be presenting some impressive related work at 9:30, ‘Profiling Audio Compressors with Deep Neural Networks.’

We previously presented work on digital filters that closely match their analog equivalents. We pointed out that such filters can have cut-off frequencies beyond Nyquist, but did not explore that aspect. ‘Digital Parametric Filters Beyond Nyquist Frequency‘, at 10 am, investigates this idea in depth.

I like a bit of high quality mathematical theory, and that’s what you get in Tamara Smyth’s 11:30 paper ‘On the Similarity between Feedback/Loopback Amplitude and Frequency Modulation‘, which shows a rather surprising (at least at first glance) equivalence between two types of feedback modulation.

There’s an interesting paper at 2pm, ‘What’s Old Is New Again: Using a Physical Scale Model Echo Chamber as a Real-Time Reverberator‘, where reverb is simulated not with impulse response recordings, or classic algorithms, but using scaled models of echo chambers.

At 4 o’clock, ‘A Comparison of Test Methodologies to Personalize Headphone Sound Quality‘ promises to offer great insights not just for headphones, but into subjective evaluation of audio in general.

There are so many deep learning papers, but the 3-4:30 poster ‘Modal Representations for Audio Deep Learning‘ stands out from the pack. Deep learning for audio most often works with raw spectrogram data. But this work proposes learning modal filterbank coefficients directly, and the authors find this gives strong results for classification and generative tasks. Also in that session, ‘Analysis of the Sound Emitted by Honey Bees in a Beehive‘ promises to be an interesting and unusual piece of work. We talked about their preliminary results in a previous entry, but now they’ve used some rigorous audio analysis to make deep and meaningful conclusions about bee behaviour.

Immerse yourself in the world of virtual and augmented reality audio technology today, with some amazing workshops, like Music Production in VR and AR, Interactive AR Audio Using Spark, Music Production in Immersive Formats, ISSP: Immersive Sound System Panning, and Real-time Mixing and Monitoring Best Practices for Virtual, Mixed, and Augmented Reality. See the Calendar for full details.

Thursday, October 17th

‘An Automated Approach to the Application of Reverberation‘, at 9:30, is the first of several papers from our team, and essentially does for algorithmic reverb what ‘Parameter Automation in a Dynamic Range Compressor’ did for a dynamic range compressor.

Why do public address (PA) systems for large venues sound so terrible? There are actually regulations for speech intelligibility, but compliance is only measured in empty stadiums. At 11 am, ‘The Effects of Spectators on the Speech Intelligibility Performance of Sound Systems in Stadia and Other Large Venues‘ looks at the real-world challenges when the venue is occupied.

Two highlights stood out in the 9-10:30 poster session. ‘Analyzing Loudness Aspects of 4.2 Million Musical Albums in Search of an Optimal Loudness Target for Music Streaming‘ is interesting, not just for the results, applications and research questions, but also for the fact that it involved 4.2 million albums. Wow! And there’s a lot more to audio engineering research than one might think. How about using acoustic sensors to enhance autonomous driving systems? That’s a core application of ‘Audio Data Augmentation for Road Objects Classification‘.

Audio forensics is a fascinating world, where audio engineering is often applied in unusual but crucial situations. One such situation is explored at 2:15 in ‘Forensic Comparison of Simultaneous Recordings of Gunshots at a Crime Scene‘, which involves looking at several high-profile, real-world examples.

Friday, October 18th

There are two papers looking at new interfaces for virtual reality and immersive audio mixing, ‘Physical Controllers vs. Hand-and-Gesture Tracking: Control Scheme Evaluation for VR Audio Mixing‘ at 10:30, and ‘Exploratory Research into the Suitability of Various 3D Input Devices for an Immersive Mixing Task‘ at 3:15.

At 9:15, J. T. Colonel from our group looks into the features that relate, or don’t relate, to preference for multitrack mixes in ‘Exploring Preference for Multitrack Mixes Using Statistical Analysis of MIR and Textual Features‘, with some interesting results that invalidate some previous research. But don’t let negative results discourage ambitious approaches to intelligent mixing systems, like Dave Moffat’s (also from here) ‘Machine Learning Multitrack Gain Mixing of Drums‘, which follows at 9:30.

Continuing this theme of mixing analysis and automation is the poster ‘A Case Study of Cultural Influences on Mixing Preference—Targeting Japanese Acoustic Major Students‘, shown from 3:30-5, which does a bit of meta-analysis by merging their data with that of other studies.

Just below, I mention the need for multitrack audio data sets. Closely related, and also much needed, is this work on ‘A Dataset of High-Quality Object-Based Productions‘, also in the 3:30-5 poster session.

Saturday, October 19th

We’re approaching a world where almost every surface can be a visual display. Imagine if every surface could be a loudspeaker too. Such is the potential of metamaterials, discussed in ‘Acoustic Metamaterial in Loudspeaker Systems Design‘ at 10:45.

Another session, 9 to 11:30, has lots of interesting presentations about music production best practices. At 9, Amandine Pras presents ‘Production Processes of Pop Music Arrangers in Bamako, Mali‘. I doubt there will be many people at the convention who’ve thought about how production is done there, but I’m sure there will be lots of fascinating insights. This is followed at 9:30 by ‘Towards a Pedagogy of Multitrack Audio Resources for Sound Recording Education‘. We’ve published a few papers on multitrack audio collections, sorely needed by researchers and educators, so it’s good to see more advances.

I always appreciate filling the gaps in my knowledge. And though I know a lot about sound enhancement, I’ve never dived into how it’s done, and how effective it is, in soundbars, now widely used in home entertainment. So I’m looking forward to the poster ‘A Qualitative Investigation of Soundbar Theory‘, shown 10:30-12. From the title and abstract though, this feels like it might work better as an oral presentation. Also in that session, the poster ‘Sound Design and Reproduction Techniques for Co-Located Narrative VR Experiences‘ deserves special mention, since it won the Convention’s Best Peer-Reviewed Paper Award, and promises to be an important contribution to the growing field of immersive audio.

It’s wonderful to see research make it into ‘product’, and ‘Casualty Accessible and Enhanced (A&E) Audio: Trialling Object-Based Accessible TV Audio‘, presented at 3:45, is a great example. Here, new technology to enhance broadcast audio for those with hearing loss was trialled for a popular BBC drama, Casualty. This is of extra interest to me since one of the researchers here, Angeliki Mourgela, does related research, also in collaboration with the BBC. And one of my neighbours is an actress who appears on that TV show.

I encourage the project students working with me to aim for publishable research. Jorge Zuniga’s ‘Realistic Procedural Sound Synthesis of Bird Song Using Particle Swarm Optimization‘, presented at 2:30, is a stellar example. He created a machine learning system that uses bird sound recordings to find settings for a procedural audio model. It’s a great improvement over other methods, and opens up a whole field of machine learning applied to sound synthesis.

At 3 o’clock in the same session is another paper from our team, Angeliki Mourgela presenting ‘Perceptually Motivated Hearing Loss Simulation for Audio Mixing Reference‘. Roughly 1 in 6 people suffer from some form of hearing loss, yet amazingly, sound engineers don’t know what the content will sound like to them. Wouldn’t it be great if the engineer could quickly audition any content as it would sound to hearing impaired listeners? That’s the aim of this research.

About three years ago, I published a meta-analysis on perception of high resolution audio, which received considerable attention. But almost all prior studies dealt with music content, and there are good reasons to consider more controlled stimuli too (noise, tones, etc.). The poster ‘Discrimination of High-Resolution Audio without Music‘ does just that. Similarly, perceptual aspects of dynamic range compression are an oft-debated topic, for which we have performed listening tests, and they are rigorously investigated in ‘Just Noticeable Difference for Dynamic Range Compression via “Limiting” of a Stereophonic Mix‘. Both posters are in the 3-4:30 session.

The full program can be explored on the Convention Calendar or the Convention website. Come say hi to us if you’re there! Josh Reiss (author of this blog entry), J. T. Colonel, Angeliki Mourgela and Dave Moffat from the Audio Engineering research team within the Centre for Digital Music, will all be there.

Congratulations, Dr. Will Wilkinson

This afternoon one of our PhD student researchers, Will Wilkinson, successfully defended his PhD. The form of these exams, or vivas, varies from country to country, and even institution to institution, which we discussed previously. Here, it’s pretty gruelling: behind closed doors, with two expert examiners probing every aspect of the PhD.
Will’s PhD was on ‘Gaussian Process Modelling for Audio Signals.’

Audio signals are characterised and perceived based on how their spectral make-up changes with time. Latent force modelling assumes these characteristics come about as a result of a common input function passing through some input-output process. Uncovering the behaviour of these hidden spectral components is at the heart of many applications involving sound, but is an extremely difficult task given the infinite number of ways any signal can be decomposed.

Will’s thesis studies the application of Gaussian processes to audio, which offer a way to specify probabilities for these functions whilst encoding prior knowledge about sound, the way it behaves, and the way it is perceived. Will advanced the theory considerably, and tested his approach for applications in sound synthesis, denoising and source separation tasks, among others. He also demonstrated some of his research applied to sound synthesis, including a real-time demonstration of his Masters work on sound synthesis for mammalian vocalisations.
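
To make the Gaussian process idea a little more concrete, here’s a minimal sketch of drawing one random smooth function from a GP prior with a squared-exponential kernel; this is textbook GP machinery for illustration, not Will’s models:

```python
import math
import random

def rbf(x1, x2, ell=1.0):
    """Squared-exponential covariance: nearby inputs -> similar outputs."""
    return math.exp(-0.5 * ((x1 - x2) / ell) ** 2)

def cholesky(K):
    """Lower-triangular L with L @ L.T == K (plain-Python Cholesky)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(K[i][i] - s)
            else:
                L[i][j] = (K[i][j] - s) / L[j][j]
    return L

def sample_gp_prior(xs, ell=1.0, seed=0):
    """Draw one random function from the GP prior at the points xs."""
    n = len(xs)
    K = [[rbf(xs[i], xs[j], ell) + (1e-9 if i == j else 0.0)
          for j in range(n)] for i in range(n)]   # jitter for stability
    L = cholesky(K)
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(n)]

xs = [0.1 * i for i in range(50)]
f = sample_gp_prior(xs, ell=1.0)   # smooth, since ell >> point spacing
```

The kernel is where the prior knowledge about sound goes: choosing it (and combining kernels) controls how smooth, periodic or noisy the sampled functions are.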

Here’s a list of all Will’s papers while a member of the Intelligent Sound Engineering team and the Machine Listening Lab.

Back end developer needed for sound synthesis start-up

logo black on white

FXive is a real-time sound effect synthesis framework in the browser, spun out from research developed at Queen Mary University of London by the team behind this blog. It is currently front-end only. We’d like to subcontract a backend developer to implement:

  • Sign-up, log-in and subscription system
  • Payment system for subscription, which offers unlimited sound downloads, and purchasing sounds individually

Additional functionalities can be discussed.

If you’re interested or know a web developer that might be interested, please get in touch with us from .

You can check out some of the sound effect synthesis models used in FXive in previous blog entries.


@c4dm @QMUL #backend #webdeveloper #nodejs


Cool sound design and audio effects projects

Every year, I teach two classes (modules), Sound Design and Digital Audio Effects. In both classes, the final assignment involves creating an original work that involves audio programming and using concepts taught in class. But the students also have a lot of free rein to experiment and explore their own ideas. Last year, I wrote a well-received blog entry about the projects.

The results are always great. Lots of really cool ideas, many of which could lead to a publication, or would be great to listen to regardless of the fact that it was an assignment. Here’s a few of the projects this year.

From the Sound Design class:

  • A truly novel abstract sound synthesiser (amplitude and frequency modulation) whose parameters are controlled by pitch-recognition and face-recognition machine learning models, using the microphone and the webcam. Users could use their voice and move their face around to affect the sound.
  • An impressive one had six sound models: rain, bouncing ball, sea waves, fire, wind and explosions. It also had a website where each synthesised sound could be compared against real recordings. We couldn’t always tell which was real and which was synthesised!


  • An auditory model of a London Underground train, from the perspective of a passenger on a train, or waiting at a platform. It had a great animation.


  • Two projects involved creating interactive soundscapes auralising an image. One involved a famous photo by the photographer Gregory Crewdson, encapsulating a dark side of suburban America through surreal, cinematic imagery. The other was of an estate where no people are visible, giving the impression of an eerie atmosphere in which background noises and small sounds are given prominence.

And from the Digital Audio Effects class:

  • A create-your-own distortion effect, where the user can interactively modify the wave shaping curve.
  • Input-dependent modulation signal based on a physical mass/spring system
  • A Swedish death metal guitar effect combining lots of effects for a very distinctive sound
  • A very creative all-in-one audio toy, 'Ring Delay'. This augmented ping-pong delay effect gives control over the panning of the delays, the equalization of the audio input and delays, and the output gain. Delays can be played backwards, and the output can be set out-of-phase. Finally, a ring modulator can modulate the audio input to create new sounds to be delayed.
  • Chordify, which transforms an incoming signal, ideally individual notes, into a chord of three different pitches.


  • An audio effects chain inspired by interning at a local radio station. The student helped the owner produce tracks using effects chain presets, but the producer's understanding of compressors, EQ, distortion effects and so on was fairly limited. So the student recreated one of the effects chains as a plugin with just two adjustable parameters, each controlling multiple parameters inside.
  • Old Styler, a plug-in that applies a 'vintage' effect so that the audio sounds as if it were coming from an old radio or an old, black-and-white movie.
  • There were some advanced reverbs, including a VST implementation of a state-of-the-art reverberation algorithm known as a Scattering Delay Network (SDN), and a Church reverb incorporating some additional effects to get that ‘church sound’ just right.
  • A pretty amazing cave simulator, with both reverb and random water droplet sounds as part of the VST plug-in.


  • A bit crusher, which also had noise, downsampling and filtering to allow lots of ways to degrade the signal.
  • A VST implementation of the Euclidean algorithm for world rhythms as described by Godfried Toussaint in his paper 'The Euclidean Algorithm Generates Traditional Musical Rhythms'.
  • A mid/side processor, with excellent analysis to verify that the student got the implementation just right.
  • Multi-functional distortion pedal. Guitarists often compose music in their bedroom and would benefit from having an effect to facilitate filling the song with a range of sounds, traditionally belonging to other instruments. That’s what this plug-in did, using a lot of clever tricks to widen the soundstage of the guitar.
  • Related to the multi-functional distortion, two students created multiband distortion effects.
  • A Python project that separates a track into harmonic, percussive, and residual components which can be adjusted individually.
  • An effect that attempts to resynthesise any audio input with sine wave oscillators that take their frequencies from the well-tempered scale. This goes far beyond auto-tune, yet can be quite subtle.
  • A source separator plug-in based on Dan Barry's ADRess algorithm. Along with Mikel Gainza, Dan Barry co-founded the company Sonic Ladder, which released the successful software Riffstation, based on their research.
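
The Euclidean rhythm idea from Toussaint's paper is simple enough to sketch. Here's a minimal, hypothetical implementation (the function name is my own) using the "bucket" formulation: distribute k pulses as evenly as possible across n steps.

```javascript
// Euclidean rhythm sketch: spread `pulses` onsets as evenly as
// possible over `steps` steps, returning an array of 1s (onsets)
// and 0s (rests).
function euclideanRhythm(pulses, steps) {
  const pattern = [];
  let bucket = 0;
  for (let i = 0; i < steps; i++) {
    bucket += pulses;            // accumulate pulses per step
    if (bucket >= steps) {       // bucket overflows → place an onset
      bucket -= steps;
      pattern.push(1);
    } else {
      pattern.push(0);
    }
  }
  return pattern;
}

// euclideanRhythm(3, 8) → [0,0,1,0,0,1,0,1], a rotation of the
// Cuban tresillo pattern discussed in Toussaint's paper.
```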

There were many other interesting assignments, including several variations on tape emulation. But this selection really shows both the talent of the students and the possibilities to create new and interesting sounds.
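
To make one of the simpler ideas above concrete, mid/side processing can be sketched in a few lines: encode left/right into mid = (L+R)/2 and side = (L−R)/2, scale the side channel to narrow or widen the stereo image, then decode back. This is a hypothetical sketch (the function name and width parameter are my own), not the student's implementation.

```javascript
// Mid/side width control: width = 1 leaves the signal unchanged,
// width = 0 collapses to mono, width > 1 widens the image.
function midSideWidth(left, right, width) {
  const outL = new Float32Array(left.length);
  const outR = new Float32Array(right.length);
  for (let i = 0; i < left.length; i++) {
    const mid = 0.5 * (left[i] + right[i]);           // sum channel
    const side = 0.5 * (left[i] - right[i]) * width;  // scaled difference
    outL[i] = mid + side;                             // decode left
    outR[i] = mid - side;                             // decode right
  }
  return [outL, outR];
}
```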

What’s up with the Web Audio API?

Recently, we've been doing a lot of audio development for applications running in the browser, like the procedural audio and sound synthesis system FXive, or the Web Audio Evaluation Tool (WAET). The Web Audio API is part of HTML5; it's a high-level Application Programming Interface with a lot of built-in functions for processing and generating sound. The idea is that it provides everything you need to have any audio application (audio effects, virtual instruments, editing and analysis tools…) running as JavaScript in a web browser.

It uses a dataflow model, like LabVIEW and media-focused languages such as Max/MSP, Pure Data and Reaktor. So you create oscillators, connect them to filters, combine them, and then connect the result to the output to play the sound. But unlike the others, it's not graphical: you write it as JavaScript, like most code that runs client-side in a web browser.
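
A minimal sketch of that dataflow style (the function name is my own; this assumes a browser AudioContext):

```javascript
// Build a tiny Web Audio graph: oscillator → low-pass filter → output.
function buildGraph(ctx) {
  const osc = ctx.createOscillator();      // source node
  osc.type = 'sawtooth';
  osc.frequency.value = 220;               // Hz

  const filter = ctx.createBiquadFilter(); // processing node
  filter.type = 'lowpass';
  filter.frequency.value = 1000;           // cutoff in Hz

  osc.connect(filter);                     // oscillator → filter
  filter.connect(ctx.destination);         // filter → speakers
  osc.start();
  return { osc, filter };
}

// In a browser: buildGraph(new AudioContext());
```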

Sounds great, right? And it is. But a lot of strange choices went into the API. They don't make it unusable or anything like that, but they do sometimes leave you kicking yourself in frustration, thinking the coding would be so much easier if only… Here are a few of them.

  • There's no built-in noise signal generator. You can create sine waves, sawtooth waves, square waves… but not noise. Generating audio-rate random numbers is built into pretty much every other audio development environment, and in almost every web audio application I've seen, the developers have reimplemented it themselves, with ScriptProcessors, AudioWorklets, or buffered-noise classes and methods.
  • The low-pass, high-pass, low-shelving and high-shelving filters in the Web Audio API are not the standard first-order designs, as taught in signal processing and described in [1, 2] and many references within. The low-pass and high-pass are resonant second-order filters, and the shelving filters are less common alternatives to the first-order designs. This is fine for many cases where you are developing a new application with a bit of filtering, but it's a major pain if you're writing a web version of something written in MATLAB, Pure Data, or any of the many other environments where the basic low- and high-pass filters are standard first-order designs.
  • The oscillators come with a detune property that represents detuning of the oscillation in hundredths of a semitone, or cents. I suppose it's a nice feature if you are using cents on the interface and dealing with musical intervals, but it's equivalent to changing the frequency parameter and doesn't save a single line of code. Meanwhile, other useful parameters can't be changed at all, like the phase, or the duty cycle of a square wave; alternative implementations exist that address this.
  • The square, sawtooth and triangle waves are not what you think they are. Instead of the triangle wave being a periodic ramp up and ramp down, it is the sum of a few terms of the Fourier series that approximate this. That is nice if you want to avoid aliasing, but wrong for every other use. It took me a long time to figure this out when I tried modulating a signal by a square wave to turn it on and off. Again, alternative implementations provide the actual waveforms.
  • The API comes with an IIR filter node that allows you to create almost arbitrary infinite impulse response filters. But you can't change the coefficients once it's created, so it's of little use for most web audio applications, which involve some control or interaction.
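
As the first point above notes, most developers end up rolling their own noise source. A minimal sketch of the common buffered-noise workaround (function names are my own; the second function assumes a browser AudioContext):

```javascript
// Fill a sample array with uniform white noise in [-1, 1).
function fillWithWhiteNoise(channelData) {
  for (let i = 0; i < channelData.length; i++) {
    channelData[i] = Math.random() * 2 - 1;
  }
  return channelData;
}

// Wrap the noise in a looping AudioBufferSourceNode so it plays
// continuously, like a built-in noise oscillator would.
function createNoiseSource(ctx, seconds = 2) {
  const buffer = ctx.createBuffer(1, ctx.sampleRate * seconds, ctx.sampleRate);
  fillWithWhiteNoise(buffer.getChannelData(0));
  const src = ctx.createBufferSource();
  src.buffer = buffer;
  src.loop = true; // loop the buffer for an endless noise source
  return src;
}

// In a browser: createNoiseSource(new AudioContext()).connect(ctx.destination)
```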

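On the detune point: converting cents to a frequency scaling is a one-liner, which is why the property saves no code. A sketch (the function name is my own):

```javascript
// Detuning by `cents` is just exponential frequency scaling:
// f' = f * 2^(cents / 1200), since 1200 cents = one octave.
function applyDetune(frequency, cents) {
  return frequency * Math.pow(2, cents / 1200);
}

// applyDetune(440, 1200) → 880 (one octave up)
```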
Despite all that, it's pretty amazing. And you can get around all these issues, since you can always write your own audio worklets for any audio processing and generation. But you shouldn't have to.

We’ve published a few papers on the Web Audio API and what you can do with it, so please check them out if you are doing some R&D involving it.


[1] J. D. Reiss and A. P. McPherson, "Audio Effects: Theory, Implementation and Application," CRC Press, 2014.

[2] V. Valimaki and J. D. Reiss, "All About Audio Equalization: Solutions and Frontiers," Applied Sciences, special issue on Audio Signal Processing, 6 (5), May 2016.

[3] P. Bahadoran, A. Benito, T. Vassallo and J. D. Reiss, "FXive: A Web Platform for Procedural Sound Synthesis," Audio Engineering Society Convention 144, May 2018.

[4] N. Jillings, Y. Wang, R. Stables and J. D. Reiss, "Intelligent audio plugin framework for the Web Audio API," Web Audio Conference, London, 2017.

[5] N. Jillings, Y. Wang, J. D. Reiss and R. Stables, "JSAP: A Plugin Standard for the Web Audio API with Intelligent Functionality," 141st Audio Engineering Society Convention, Los Angeles, USA, 2016.

[6] N. Jillings, D. Moffat, B. De Man, J. D. Reiss and R. Stables, "Web Audio Evaluation Tool: A framework for subjective assessment of audio," 2nd Web Audio Conference, Atlanta, 2016.

[7] N. Jillings, B. De Man, D. Moffat and J. D. Reiss, "Web Audio Evaluation Tool: A Browser-Based Listening Test Environment," Sound and Music Computing (SMC), July 26 – Aug. 1, 2015.