Intelligent Music Production book is published

9781138055193

Ryan Stables is an occasional collaborator and all-around brilliant person. He started the annual Workshop on Intelligent Music Production (WIMP) in 2015. It’s been going strong ever since, with the 5th WIMP co-located with DAFx this past September. The workshop series focuses on the application of intelligent systems (including expert systems, machine learning, AI) to music recording, mixing, mastering and related aspects of audio production or sound engineering.

Ryan had the idea for a book about the subject, and Brecht De Man (another all-around brilliant person) and I (Josh Reiss) were recruited as co-authors. What resulted was a massive amount of writing, editing, refining, re-editing and so on. We all contributed big chunks of content, but Brecht pulled it all together and turned it into something really high quality, giving a comprehensive overview of the field, suitable for a wide range of audiences.

And the book is finally published today, October 31st! It’s part of the AES Presents series by Focal Press, a division of Routledge. You can get it from the publisher, from Amazon or any of the other usual places.

And here’s the official blurb:

Intelligent Music Production presents the state of the art in approaches, methodologies and systems from the emerging field of automation in music mixing and mastering. This book collects the relevant works in the domain of innovation in music production, and orders them in a way that outlines the way forward: first, covering our knowledge of the music production processes; then by reviewing the methodologies in classification, data collection and perceptual evaluation; and finally by presenting recent advances on introducing intelligence in audio effects, sound engineering processes and music production interfaces.

Intelligent Music Production is a comprehensive guide, providing an introductory read for beginners, as well as a crucial reference point for experienced researchers, producers, engineers and developers.

 

Cool sound design and audio effects projects

Every year, I teach two classes (modules), Sound Design and Digital Audio Effects. In both classes, the final assignment involves creating an original work that involves audio programming and using concepts taught in class. But the students also have a lot of free rein to experiment and explore their own ideas. Last year, I had a well-received blog entry about the projects.

The results are always great. Lots of really cool ideas, many of which could lead to a publication, or would be great to listen to regardless of the fact that it was an assignment. Here are a few of the projects this year.

From the Sound Design class:

  • A truly novel abstract sound synthesis (amplitude and frequency modulation) where parameters are controlled by pitch recognition and face recognition machine learning models, using the microphone and the webcam. Users could use their voice and move their face around to affect the sound.
  • An impressive one had six sound models: rain, bouncing ball, sea waves, fire, wind and explosions. It also had a website where each synthesised sound could be compared against real recordings. We couldn’t always tell which was real and which was synthesised!


  • An auditory model of a London Underground train, from the perspective of a passenger on a train, or waiting at a platform. It had a great animation.

train

  • Two projects involved creating interactive soundscapes auralising an image. One was based on a famous photo by the photographer Gregory Crewdson, encapsulating a dark side of suburban America through surreal, cinematic imagery. The other was based on an image of an estate area where nobody is visible, giving the impression of an eerie atmosphere where background noises and small sounds are given prominence.

And from the Digital Audio Effects class:

  • A create-your-own distortion effect, where the user can interactively modify the wave shaping curve.
  • An input-dependent modulation signal based on a physical mass/spring system.
  • A Swedish death metal guitar effect combining lots of effects for a very distinctive sound.
  • A very creative all-in-one audio toy, ‘Ring delay’. This augmented ping-pong delay effect gives controls over the panning of the delays, the equalization of the audio input and delays, and the output gain. Delays can be played backwards, and the output can be set out-of-phase. Finally, a ring modulator can modulate the audio input to create new sounds to be delayed.
  • Chordify, which transforms an incoming signal, ideally individual notes, into a chord of three different pitches.

chordify

  • An audio effects chain inspired by interning at a local radio station. The student helped the owner produce tracks using effects chain presets. But this producer’s understanding of compressors, EQ, distortion effects… was fairly limited. So the student recreated one of the effects chains as a plugin that has only two adjustable parameters, which control multiple parameters inside.
  • Old Styler, a plug-in that applies a sort of ‘vintage’ effect so that the audio sounds like it comes from an old radio or an old, black and white movie. Here’s how it sounds.
  • There were some advanced reverbs, including a VST implementation of a state-of-the-art reverberation algorithm known as a Scattering Delay Network (SDN), and a Church reverb incorporating some additional effects to get that ‘church sound’ just right.
  • A pretty amazing cave simulator, with both reverb and random water droplet sounds as part of the VST plug-in.

CaveSimulator

  • A bit crusher, which also had noise, downsampling and filtering to allow lots of ways to degrade the signal.
  • A VST implementation of the Euclidean algorithm for world rhythms as described by Godfried Toussaint in his paper The Euclidean Algorithm Generates Traditional Musical Rhythms.
  • A mid/side processor, with excellent analysis to verify that the student got the implementation just right.
  • Multi-functional distortion pedal. Guitarists often compose music in their bedroom and would benefit from having an effect to facilitate filling the song with a range of sounds, traditionally belonging to other instruments. That’s what this plug-in did, using a lot of clever tricks to widen the soundstage of the guitar.
  • Related to the multi-functional distortion, two students created multiband distortion effects.
  • A Python project that separates a track into harmonic, percussive, and residual components which can be adjusted individually.
  • An effect that attempts to resynthesise any audio input with sine wave oscillators that take their frequencies from the well-tempered scale. This goes far beyond auto-tune, yet can be quite subtle.
  • A source separator plug-in based on Dan Barry’s ADRESS algorithm, described here and here. Along with Mikel Gainza, Dan Barry cofounded the company Sonic Ladder, which released the successful software Riffstation, based on their research.
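
The Euclidean rhythm idea from the list above can be illustrated in a few lines. This is a minimal sketch in Python, not the student’s VST code and not the Bjorklund procedure that Toussaint’s paper describes: it uses a simpler modular-arithmetic shortcut that spreads the hits as evenly as possible over the available steps, which gives the same patterns up to rotation. The function names `euclidean_rhythm` and `as_text` are my own.

```python
def euclidean_rhythm(onsets, steps):
    """Spread `onsets` hits as evenly as possible over `steps` slots.

    Hypothetical helper for illustration. Returns a list of 1s (hit) and
    0s (rest). The result is a rotation of the pattern the Bjorklund
    algorithm would give, not necessarily with the same starting point.
    """
    if steps <= 0 or not 0 <= onsets <= steps:
        raise ValueError("need 0 <= onsets <= steps and steps > 0")
    # Step i gets a hit whenever the running total i * onsets has just
    # passed a multiple of `steps`, which happens `onsets` times per cycle.
    return [1 if (i * onsets) % steps < onsets else 0 for i in range(steps)]


def as_text(pattern):
    """Render a pattern in the usual 'x' (hit) / '.' (rest) notation."""
    return "".join("x" if hit else "." for hit in pattern)


print(as_text(euclidean_rhythm(3, 8)))  # x..x..x.
```

Three hits over eight steps gives the 3+3+2 grouping that the paper uses as one of its first examples. Other combinations may come out rotated relative to the traditional rhythm, so a plug-in would probably want a rotation control as well.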

There were many other interesting assignments, including several variations on tape emulation. But this selection really shows both the talent of the students and the possibilities to create new and interesting sounds.

What’s up with the Web Audio API?

Recently, we’ve been doing a lot of audio development for applications running in the browser, like the procedural audio and sound synthesis system FXive, or the Web Audio Evaluation Tool (WAET). The Web Audio API is part of HTML5, and it’s a high-level Application Programming Interface with a lot of built-in functions for processing and generating sound. The idea is that it’s what you need to have any audio application (audio effects, virtual instruments, editing and analysis tools…) running as JavaScript in a web browser.

It uses a dataflow model, like LabVIEW and media-focused languages like Max/MSP, Pure Data and Reaktor. So you create oscillators, connect them to filters, combine them and then connect that to the output to play out the sound. But unlike the others, it’s not graphical, since you write it as JavaScript like most code that runs client-side in a web browser.

Sounds great, right? And it is. But there were a lot of strange choices that went into the API. They don’t make it unusable or anything like that, but they do sometimes leave you kicking in frustration and thinking the coding would be so much easier if only… Here are a few of them.

  • There’s no built-in noise signal generator. You can create sine waves, sawtooth waves, square waves… but not noise. Generating audio rate random numbers is built in to pretty much every other audio development environment, and in almost every web audio application I’ve seen, the developers have redone it themselves, with ScriptProcessors, AudioWorklets, or buffered noise classes or methods.
  • The low pass, high pass, low shelving and high shelving filters in the Web Audio API are not the standard first order designs, as taught in signal processing and described in [1, 2] and lots of references within. The low pass and high pass are resonant second order filters, and the shelving filters are the less common alternatives to the first order designs. This is ok for a lot of cases where you are developing a new application with a bit of filtering, but it’s a major pain if you’re writing a web version of something written in MATLAB, Pure Data or lots and lots of other environments where the basic low pass and high pass filters are standard first order designs.
  • The oscillators come with a detune property that represents detuning of oscillation in hundredths of a semitone, or cents. I suppose it’s a nice feature if you are using cents on the interface and dealing with musical intervals. But it’s the same as changing the frequency parameter and doesn’t save a single line of code. There are other useful parameters that you can’t change, like phase, or the duty cycle of a square wave. https://github.com/Flarp/better-oscillator is an alternative implementation that addresses this.
  • The square, sawtooth and triangle waves are not what you think they are. Instead of, say, the triangle wave being a periodic ramp up and ramp down, each waveform is the sum of a few terms of the Fourier series that approximates it. This is nice if you want to avoid aliasing, but wrong for every other use. It took me a long time to figure this out when I tried modulating a signal by a square wave to turn it on and off. Again, https://github.com/Flarp/better-oscillator gives an alternative implementation with the actual waveforms.
  • The API comes with an IIR filter node that allows you to create almost arbitrary infinite impulse response filters. But you can’t change the coefficients once it’s created. So it’s useless for most web audio applications, which involve some control or interaction.
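
To make a few of the points above concrete, here is a minimal sketch in plain Python rather than JavaScript, so that it runs without a browser. It shows white noise from a random number generator, the arithmetic behind a detune in cents, and the difference between an ideal square wave and a truncated Fourier series. The function names and the number of harmonics are my own choices for illustration; this is not how any browser implements its oscillators.

```python
import math
import random


def white_noise(num_samples, seed=None):
    """Uniform white noise in [-1, 1], one random number per sample."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(num_samples)]


def detuned_frequency(frequency_hz, cents):
    """Frequency after detuning by `cents` (1200 cents = one octave)."""
    return frequency_hz * 2.0 ** (cents / 1200.0)


def ideal_square(phase):
    """Ideal square wave: +1 for the first half of the cycle, -1 after.

    `phase` is measured in cycles, so phase = 0.25 is a quarter period.
    """
    return 1.0 if (phase % 1.0) < 0.5 else -1.0


def bandlimited_square(phase, num_harmonics):
    """Truncated Fourier series of a square wave (odd harmonics only)."""
    total = 0.0
    for k in range(1, 2 * num_harmonics, 2):  # k = 1, 3, 5, ...
        total += math.sin(2.0 * math.pi * k * phase) / k
    return 4.0 / math.pi * total


# Detuning by +100 cents is one equal-tempered semitone up.
print(round(detuned_frequency(440.0, 100), 2))  # 466.16

# The truncated series wobbles around +1 rather than sitting exactly on it,
# which matters if the wave is used as an on/off gate.
print(ideal_square(0.25), bandlimited_square(0.25, 5))
```

The second printout is the reason a band-limited square wave makes a poor gate: the ideal wave is exactly 1 in the first half of the cycle, while the truncated series overshoots and ripples around that value.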

Despite all that, it’s pretty amazing. And you can get around all these issues since you can always write your own audio worklets for any audio processing and generation. But you shouldn’t have to.
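
As an example of the kind of per-sample code you end up writing yourself, here is a sketch of a standard first order low pass filter, designed with the bilinear transform. It is written in Python for brevity. The class name, parameter names and example values are my own, and the same few lines of arithmetic would need to be ported to JavaScript inside an audio worklet’s processing callback.

```python
import math


class FirstOrderLowPass:
    """First order low pass: bilinear transform of H(s) = wc / (s + wc)."""

    def __init__(self, cutoff_hz, sample_rate_hz):
        # Pre-warped cutoff, so the -3 dB point lands exactly on cutoff_hz.
        k = math.tan(math.pi * cutoff_hz / sample_rate_hz)
        self.b0 = k / (k + 1.0)
        self.b1 = k / (k + 1.0)
        self.a1 = (k - 1.0) / (k + 1.0)
        self.x_prev = 0.0  # previous input sample
        self.y_prev = 0.0  # previous output sample

    def process(self, x):
        """Filter one sample: y[n] = b0 x[n] + b1 x[n-1] - a1 y[n-1]."""
        y = self.b0 * x + self.b1 * self.x_prev - self.a1 * self.y_prev
        self.x_prev = x
        self.y_prev = y
        return y


lowpass = FirstOrderLowPass(cutoff_hz=1000.0, sample_rate_hz=48000.0)
step_response = [lowpass.process(1.0) for _ in range(2000)]
print(round(step_response[-1], 3))  # settles at 1.0, the gain at DC
```

Because the coefficients are ordinary variables, they can be recomputed whenever the cutoff changes, which is exactly the kind of control you lose with a fixed-coefficient node.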

We’ve published a few papers on the Web Audio API and what you can do with it, so please check them out if you are doing some R&D involving it.

 

[1] J. D. Reiss and A. P. McPherson, “Audio Effects: Theory, Implementation and Application,” CRC Press, 2014.

[2] V. Välimäki and J. D. Reiss, “All About Audio Equalization: Solutions and Frontiers,” Applied Sciences, special issue on Audio Signal Processing, 6 (5), May 2016.

[3] P. Bahadoran, A. Benito, T. Vassallo and J. D. Reiss, “FXive: A Web Platform for Procedural Sound Synthesis,” 144th Audio Engineering Society Convention, May 2018.

[4] N. Jillings, Y. Wang, R. Stables and J. D. Reiss, “Intelligent audio plugin framework for the Web Audio API,” Web Audio Conference, London, 2017.

[5] N. Jillings, Y. Wang, J. D. Reiss and R. Stables, “JSAP: A Plugin Standard for the Web Audio API with Intelligent Functionality,” 141st Audio Engineering Society Convention, Los Angeles, USA, 2016.

[6] N. Jillings, D. Moffat, B. De Man, J. D. Reiss and R. Stables, “Web Audio Evaluation Tool: A framework for subjective assessment of audio,” 2nd Web Audio Conference, Atlanta, 2016.

[7] N. Jillings, B. De Man, D. Moffat and J. D. Reiss, “Web Audio Evaluation Tool: A Browser-Based Listening Test Environment,” Sound and Music Computing (SMC), July 26 – Aug. 1, 2015.

Cross-adaptive audio effects: automatic mixing, live performance and everything in between

Our paper on Applications of cross-adaptive audio effects: automatic mixing, live performance and everything in between has just been published in Frontiers in Digital Humanities. It is a systematic review of cross-adaptive audio effects and their applications.

Cross-adaptive effects extend the boundaries of traditional audio effects by having many inputs and outputs, and deriving their behavior based on analysis of the signals and their interaction. This allows the audio effects to adapt to different material, seemingly being aware of what they do and listening to the signals. Here’s a block diagram showing how a cross-adaptive audio effect modifies a signal.

cross-adaptive architecture
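
As a toy illustration of the idea, the sketch below computes a gain for each track from an analysis of all the tracks together, so that every track ends up at the same RMS level. It is a deliberately simplified example of my own, not a system from the paper: the function names, the use of RMS as the analysed feature and the ‘match the average level’ rule are all assumptions made for illustration.

```python
import math


def rms(samples):
    """Root-mean-square level of a block of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def cross_adaptive_gains(tracks, floor=1e-9):
    """Return one gain per track, each derived from the levels of ALL tracks.

    Hypothetical rule: the target level is the average RMS across tracks,
    and each track's gain moves it to that target. `floor` avoids dividing
    by zero for a silent track.
    """
    levels = [rms(track) for track in tracks]
    target = sum(levels) / len(levels)
    return [target / max(level, floor) for level in levels]


def apply_gains(tracks, gains):
    """Scale every sample of each track by that track's gain."""
    return [[g * s for s in track] for track, g in zip(tracks, gains)]


loud = [0.8, -0.8, 0.8, -0.8]
quiet = [0.2, -0.2, 0.2, -0.2]
gains = cross_adaptive_gains([loud, quiet])
print([round(g, 3) for g in gains])  # [0.625, 2.5]
```

The point is that the gain applied to one track depends on the other track as well: change the quiet track’s level and the gain on the loud track changes too. A real system would analyse perceptual features over time rather than the RMS of a single block.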

Last year, we published a paper reviewing the history of automatic mixing, almost exactly ten years to the day from when automatic mixing was first extended beyond simple gain changes for speech applications. These automatic mixing applications rely on cross-adaptive effects, but the effects can do so much more.

Here’s an example automatic mixing system from our YouTube channel, IntelligentSoundEng.

When a musician uses the signals of other performers directly to inform the timbral character of her own instrument, it enables a radical expansion of interaction during music making. Exploring this was the goal of the Cross-adaptive processing for musical intervention project, led by Oeyvind Brandtsegg, which we discussed in an earlier blog entry. Using cross-adaptive audio effects, musicians can exert control over the instruments and performance of other musicians, leading both to new competitive aspects and to new synergies.

Here’s a short video demonstrating this.

Despite various projects, research and applications involving cross-adaptive audio effects, there is still a fair amount of confusion surrounding the topic. There are multiple definitions, sometimes even by the same authors. So this paper gives a brief history of applications as well as a classification of effect types, and clarifies issues that have come up in earlier literature. It further defines the field, lays a formal framework, explores technical aspects and applications, and considers the future from artistic, perceptual, scientific and engineering perspectives.

Check it out!

Analogue matched digital EQ: How far can you go linearly?

(Background post for the paper “Improving the frequency response magnitude and phase of analogue-matched digital filters” by John Flynn & Josh Reiss for AES Milan 2018)

Professional audio mastering is a field that is still dominated by analogue hardware. Many mastering engineers still favour their go-to outboard compressors and equalisers over digital emulations. As a practising mastering engineer myself, I empathise. Quality analogue gear has a proven track record in terms of sonic quality spanning about a century. Even though digital approximations of analogue tools have gotten better, particularly over the past decade, I too have tended to reach for analogue hardware. However, through my research at Queen Mary with Professor Josh Reiss, that is changing.

When modelling an analogue EQ, a lot of focus has been on modelling distortions and other non-linearities. We chose to look at the linear component instead. Have we reached a ceiling in terms of modelling an analogue prototype filter in the digital domain? Can we do better? We found that yes, there was room for improvement, and yes, we can do better.

The milestone of research in this area is Orfanidis’ 1997 paper “Digital parametric equalizer design with prescribed Nyquist-frequency gain”, the first major improvement over the bilinear transform, which has a well-known ‘cramped’ sound in the high frequencies. Basically, the bilinear transform is what all first generation digital equalisers are based on. Its response drops sharply at high frequencies towards 20kHz, giving a ‘closed/cramped’ sound. Orfanidis and later improvements by Massberg [9] & Gunness/Chauhan [10] give a much better approximation of an analogue prototype.

blt
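
The ‘cramping’ can be seen with a little arithmetic. The sketch below compares the magnitude response of a simple analogue filter with that of its bilinear transform counterpart. For brevity it uses a first order low pass as a stand-in for an EQ section, and the cutoff and sample rate are arbitrary values I picked. It illustrates the frequency warping only and is not code from the paper.

```python
import math


def analogue_magnitude(f_hz, cutoff_hz):
    """|H(j 2 pi f)| for the analogue low pass H(s) = wc / (s + wc)."""
    return 1.0 / math.sqrt(1.0 + (f_hz / cutoff_hz) ** 2)


def bilinear_magnitude(f_hz, cutoff_hz, sample_rate_hz):
    """Magnitude of the same filter after a pre-warped bilinear transform.

    The bilinear transform maps the whole analogue frequency axis onto
    0..Nyquist through a tangent, so the digital response at f equals the
    analogue response at a warped, higher frequency.
    """
    warped_ratio = (math.tan(math.pi * f_hz / sample_rate_hz)
                    / math.tan(math.pi * cutoff_hz / sample_rate_hz))
    return 1.0 / math.sqrt(1.0 + warped_ratio ** 2)


def to_db(magnitude):
    """Convert a linear magnitude to decibels."""
    return 20.0 * math.log10(magnitude)


fs = 44100.0  # assumed sample rate
fc = 10000.0  # assumed cutoff frequency
for f in (1000.0, 10000.0, 20000.0):
    print(f, round(to_db(analogue_magnitude(f, fc)), 1),
          round(to_db(bilinear_magnitude(f, fc, fs)), 1))
```

With these values the two responses agree at low frequencies and at the cutoff, but the digital one falls well below the analogue one as the frequency approaches Nyquist. An EQ bell placed near the top of the spectrum is squeezed in the same way.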

However, while [9],[10] improve magnitude, they don’t capture analogue phase. Bizarrely, the bilinear transform performs reasonably well on phase. So we knew it was possible.

So the problem is: how do you get a more accurate magnitude match to analogue than [9],[10], while also getting a good match to phase? Many attempts, including complicated iterative Parks-McClellan filter design approaches, fell flat. It turned out that Occam was right: in this case, a simple answer was the better answer.

By combining a matched-z transform, frequency sampling filter design and a little bit of clever coefficient manipulation, we achieved excellent results: a match to the analogue prototype to an arbitrary degree. At low filter lengths you get a filter that performs as well as [9],[10] in magnitude but also matches analogue phase. By using longer filter lengths, the match to analogue is extremely precise in both magnitude and phase (lower error is more accurate).

error-vs
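
One ingredient named above, the matched-z transform, is simple enough to sketch: each analogue pole or zero s is mapped to z = exp(s / fs). The example below shows only that one step, applied to an arbitrary resonant pole that I made up. It is not the algorithm from the paper, which also involves frequency sampling design and coefficient manipulation that are not reproduced here.

```python
import cmath
import math


def matched_z(s_root, sample_rate_hz):
    """Map an analogue pole or zero s to the z-plane: z = exp(s * T)."""
    return cmath.exp(s_root / sample_rate_hz)


def analogue_pole(centre_hz, q):
    """Upper complex pole of s^2 + (w0 / Q) s + w0^2 (assumes Q > 0.5)."""
    w0 = 2.0 * math.pi * centre_hz
    real = -w0 / (2.0 * q)
    imag = w0 * math.sqrt(1.0 - 1.0 / (4.0 * q * q))
    return complex(real, imag)


fs = 48000.0  # assumed sample rate
s_pole = analogue_pole(centre_hz=10000.0, q=2.0)  # made-up example pole
z_pole = matched_z(s_pole, fs)

# A stable analogue pole (negative real part) lands inside the unit circle,
# and its angle is the analogue pole frequency scaled by 1 / fs, with no
# tangent warping of the kind the bilinear transform introduces.
print(abs(z_pole) < 1.0)
print(math.isclose(cmath.phase(z_pole), s_pole.imag / fs))
```

Both checks print True for this pole. The mapping on its own keeps poles and zeros at the right frequencies, but it does not fix the overall response near Nyquist.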

 

Since submitting the post I have released the algorithm in a plugin with my mastering company and have been getting informal feedback from other mastering engineers about how this sounds in use.


Overall the word back has been overwhelmingly positive, with one engineer claiming it to be “the best sounding plugin EQ on the market to date”. It’s nice to know that those long hours staring at decibel error charts have not been in vain.

Are you heading to AES Milan next month? Come up and say hello!

 

Creative projects in sound design and audio effects

This past semester I taught two classes (modules), Sound Design and Digital Audio Effects. In both classes, the final assignment involves creating an original work that involves audio programming and using concepts taught in class. But the students also have a lot of free rein to experiment and explore their own ideas.

The results are always great. Lots of really cool ideas, many of which could lead to a publication, or would be great to listen to regardless of the fact that it was an assignment. Here are a few examples.

From the Sound Design class:

  • Synthesizing THX’s audio trademark, Deep Note. This is a complex sound, ‘a distinctive synthesized crescendo that glissandos from a low rumble to a high pitch’. It was created by the legendary James Moorer, who is responsible for some of the greatest papers ever published in the Journal of the Audio Engineering Society.
  • Recreating the sound of a Space Shuttle launch, with separate components for ‘Air Burning/Lapping’ and ‘Flame Eruption/Flame Exposing’ by generating the sounds of the Combustion chain and the Exhaust chain.
  • A student created a soundscape inspired by the 1968 Romanian play ‘Jonah (A four scenes tragedy)’, written by Marin Sorescu and published when Romania was ruled by the communist regime. By carefully modulating the volume of filtered noise, she was able to achieve some great synthesis of waves crashing on a shore.
  • One student made a great drum and bass track, manipulating samples and mixing in some of his own recorded sounds. These included a nice ‘thud’ made by filtering the sound of a tightened towel, percussive sounds made by shaking rice in a plastic container, and the sizzling sound of frying bacon for tape hiss.
  • Synthesizing the sound of a motorbike, including engine startup, gears and driving sound, gear lever click and indicator.
  • A short audio piece to accompany a ghost story, using synthesised and recorded sounds. What I really like is that the student storyboarded it.

storyboard

  • A train on a stormy day, which had the neat trick of converting a footstep synthesis model into the chugging of a train.
  • The sounds of the London Underground: doors sliding and beeping, bumps and brakes… all fully synthesized.

And from the Digital Audio Effects class:

  • An auto-tune specifically for bass guitar. We discussed auto-tune and its unusual history previously.
  • Sound wave propagation causes temperature variation, but speed of sound is a function of temperature. Notably, the positive half cycle of a wave (compression) causes an increase in temperature and velocity, while the negative half (rarefaction) causes a decrease in temperature and velocity, turning a sine wave into something like a sawtooth. This effect is only significant in high pressure sound waves. It’s also frequency dependent; high frequency components travel faster than low frequency components.
    Mark Daunt created a MIDI instrument as a VST Plug-in that generates sounds based on this shock-wave formation formula. Sliders allow the user to adjust parameters in the formula and use a MIDI keyboard to play tones that express characteristics of the calculated waveforms.

  • Synthesizing applause, a subject which we have discussed here before. The student has been working in this area for another project, but made significant improvements for the assignment, including adding presets for various conditions.
  • A student devised a distortion effect based on waveshaping in the form of a weighted sum of Legendre polynomials. These are interesting functions and her resulting sounds are surprising and pleasing. It’s the type of work that could be taken a lot further.
  • One student had a bug in an implementation of a filter. Noticing that it created some interesting sounds, he managed to turn it into a cool original distortion effect.
  • There’s an Octagon-shaped room with strange acoustics here on campus. Using a database of impulse response measurements from the room, one student created a VST plug-in that allows the user to hear how audio sounds for any source and microphone positions. In earlier blog entries, we discussed related topics, acoustic reverberators and anechoic chambers.


  • Another excellent sounding audio effect was a spectral delay using the phase vocoder, with delays applied differently depending on frequency bin. This created a sound like ‘stars falling from the sky’. Here’s a sine sweep before and after the effect is applied.

https://soundcloud.com/justjosh71/sine-sweep-original
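
The Legendre polynomial waveshaper from the list above can be sketched in a few lines. This is my own minimal version, not the student’s implementation: the function names and the example weights are made up, and the polynomials are evaluated with the standard three-term recurrence.

```python
def legendre(n, x):
    """Legendre polynomial P_n(x), via Bonnet's recurrence:
    (k + 1) P_{k+1}(x) = (2k + 1) x P_k(x) - k P_{k-1}(x)."""
    if n == 0:
        return 1.0
    previous, current = 1.0, x  # P_0 and P_1
    for k in range(1, n):
        previous, current = current, ((2 * k + 1) * x * current
                                      - k * previous) / (k + 1)
    return current


def waveshape(sample, weights):
    """Distort one sample with the curve sum_n weights[n] * P_n(sample).

    The input is expected to lie in [-1, 1], where the Legendre
    polynomials are orthogonal and bounded by 1 in magnitude.
    """
    return sum(w * legendre(n, sample) for n, w in enumerate(weights))


# Hypothetical weights: mostly the clean signal (P_1), plus some P_3.
weights = [0.0, 0.8, 0.0, 0.2]
print(waveshape(0.5, weights))
```

Setting the weights to `[0, 1]` gives back the input unchanged, since P_1(x) = x, so the higher-order weights act as ‘amount of distortion’ controls that could be exposed as sliders.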

There were many other interesting assignments (plucked string effect for piano synthesizer, enhanced chorus effects, inharmonic resonator, an all-in-one plug-in to recreate 80s rock/pop guitar effects…). But this selection really shows both the talent of the students and the possibilities to create new and interesting sounds.

Ten Years of Automatic Mixing

tenyears

Automatic microphone mixers have been around since 1975. These are devices that lower the levels of microphones that are not in use, thus reducing background noise and preventing acoustic feedback. They’re great for things like conference settings, where there may be many microphones but only a few speakers should be heard at any time.
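
One way to get this behaviour is gain sharing, where each microphone’s gain is its share of the total input level, so that the gains always add up to one. The sketch below is a bare-bones illustration under that assumption, with made-up names and levels. A real automatic mixer works on smoothed level estimates and includes details that are not shown here.

```python
def gain_sharing(levels, floor=1e-12):
    """Return one gain per microphone, proportional to its input level.

    `levels` are non-negative level estimates (for example, short-term
    power) for each microphone. `floor` avoids division by zero when
    every microphone is silent.
    """
    total = max(sum(levels), floor)
    return [level / total for level in levels]


# One active talker and two microphones picking up only background noise.
levels = [0.50, 0.01, 0.01]
gains = gain_sharing(levels)
print([round(g, 3) for g in gains])  # [0.962, 0.019, 0.019]
```

The active microphone is passed almost unchanged while the idle ones are turned well down, and no explicit on/off threshold is needed.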

Over the next three decades, various designs appeared, but the field didn’t really grow much beyond Dan Dugan’s original concept.

Enter Enrique Perez Gonzalez, a PhD student researcher and experienced sound engineer. On September 11th, 2007, exactly ten years before the publication of this blog post, he presented a paper “Automatic Mixing: Live Downmixing Stereo Panner.” With this work, he showed that it may be possible to automate not just fader levels in speech applications, but other tasks and for other applications. Over the course of his PhD research, he proposed methods for autonomous operation of many aspects of the music mixing process: stereo positioning, equalisation, time alignment, polarity correction, feedback prevention, selective masking minimization, etc. He also laid out a framework for further automatic mixing systems.

Enrique established a new field of research, and it’s been growing ever since. People have used machine learning techniques for automatic mixing, applied auditory neuroscience to the problem, and explored where the boundaries lie between the creative and technical aspects of mixing. Commercial products have arisen based on the concept. And yet all this is still only scratching the surface.

I had the privilege to supervise Enrique and have many anecdotes from that time. I remember Enrique and I going to a talk that Dan Dugan gave at an AES convention panel session and one of us asked Dan about automating other aspects of the mix besides mic levels. He had a puzzled look and basically said that he’d never considered it. It was also interesting to see the hostile reactions from some (but certainly not all) practitioners, which brings up lots of interesting questions about disruptive innovations and the threat of automation.

wimp3

Next week, Salford University will host the 3rd Workshop on Intelligent Music Production, which also builds on this early research. There, Brecht De Man will present the paper ‘Ten Years of Automatic Mixing’, describing the evolution of the field, the approaches taken, the gaps in our knowledge and what appears to be the most exciting new research directions. Enrique, who is now CTO of Solid State Logic, will also be a panellist at the Workshop.

Here’s a video of one of the early Automatic Mixing demonstrators.

And here’s a list of all the early Automatic Mixing papers.

  • E. Perez Gonzalez and J. D. Reiss, “A real-time semi-autonomous audio panning system for music mixing,” EURASIP Journal on Advances in Signal Processing, v2010, Article ID 436895, p. 1-10, 2010.
  • E. Perez Gonzalez and J. D. Reiss, “Automatic Mixing,” in DAFX: Digital Audio Effects, Second Edition (ed. U. Zölzer), John Wiley & Sons, Ltd, Chichester, UK, 2011, ch. 13, p. 523-550. doi: 10.1002/9781119991298.ch13
  • E. Perez Gonzalez and J. D. Reiss, “Automatic equalization of multi-channel audio using cross-adaptive methods,” 127th AES Convention, New York, October 2009.
  • E. Perez Gonzalez and J. D. Reiss, “Automatic Gain and Fader Control For Live Mixing,” IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, New York, October 18-21, 2009.
  • E. Perez Gonzalez and J. D. Reiss, “Determination and correction of individual channel time offsets for signals involved in an audio mixture,” 125th AES Convention, San Francisco, USA, October 2008.
  • E. Perez Gonzalez and J. D. Reiss, “An automatic maximum gain normalization technique with applications to audio mixing,” 124th AES Convention, Amsterdam, Netherlands, May 2008.
  • E. Perez Gonzalez and J. D. Reiss, “Improved control for selective minimization of masking using interchannel dependency effects,” 11th International Conference on Digital Audio Effects (DAFx), September 2008.
  • E. Perez Gonzalez and J. D. Reiss, “Automatic Mixing: Live Downmixing Stereo Panner,” 10th International Conference on Digital Audio Effects (DAFx-07), Bordeaux, France, September 10-15, 2007.