Ten Years of Automatic Mixing


Automatic microphone mixers have been around since 1975. These are devices that lower the levels of microphones that are not in use, thus reducing background noise and preventing acoustic feedback. They’re great for things like conference settings, where there may be many microphones but only a few speakers should be heard at any time.
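To make the idea concrete, here is a minimal sketch of the gain-sharing principle behind many automatic mic mixers: each microphone receives a share of the total gain in proportion to its level, so active talkers dominate the mix while idle mics are pulled down. The block size, RMS detection and smoothing below are arbitrary illustrative choices, not any particular product's algorithm.

```python
# Illustrative gain-sharing automixer: each channel's gain is proportional
# to its share of the total level, so active mics dominate and idle mics
# are attenuated. A simplification for illustration, not Dan Dugan's design.
import numpy as np

def automix(channels, sample_rate, block_ms=20, smooth=0.9, eps=1e-12):
    """channels: 2-D array (num_mics, num_samples). Returns the mono mix."""
    num_mics, num_samples = channels.shape
    block = int(sample_rate * block_ms / 1000)
    gains = np.full(num_mics, 1.0 / num_mics)        # start with equal shares
    out = np.zeros(num_samples)
    for start in range(0, num_samples, block):
        seg = channels[:, start:start + block]
        rms = np.sqrt(np.mean(seg ** 2, axis=1)) + eps
        target = rms / rms.sum()                     # gain shares sum to 1
        gains = smooth * gains + (1 - smooth) * target   # avoid abrupt jumps
        out[start:start + block] = (gains[:, None] * seg).sum(axis=0)
    return out
```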

Over the next three decades, various designs appeared, but the field didn’t really grow much beyond Dan Dugan’s original concept.

Enter Enrique Perez Gonzalez, a PhD student researcher and experienced sound engineer. On September 11th, 2007, exactly ten years before this blog post was published, he presented the paper ‘Automatic Mixing: Live Downmixing Stereo Panner’. With this work, he showed that it may be possible to automate not just fader levels in speech applications, but other tasks and other applications too. Over the course of his PhD research, he proposed methods for autonomous operation of many aspects of the music mixing process: stereo positioning, equalisation, time alignment, polarity correction, feedback prevention, selective masking minimization, and so on. He also laid out a framework for further automatic mixing systems.

Enrique established a new field of research, and it’s been growing ever since. People have used machine learning techniques for automatic mixing, applied auditory neuroscience to the problem, and explored where the boundaries lie between the creative and technical aspects of mixing. Commercial products have arisen based on the concept. And yet all this is still only scratching the surface.

I had the privilege to supervise Enrique and have many anecdotes from that time. I remember Enrique and me going to a talk that Dan Dugan gave at an AES convention panel session, and one of us asked Dan about automating other aspects of the mix besides mic levels. He had a puzzled look and basically said that he’d never considered it. It was also interesting to see the hostile reactions from some (but certainly not all) practitioners, which brings up lots of interesting questions about disruptive innovations and the threat of automation.


Next week, Salford University will host the 3rd Workshop on Intelligent Music Production, which also builds on this early research. There, Brecht De Man will present the paper ‘Ten Years of Automatic Mixing’, describing the evolution of the field, the approaches taken, the gaps in our knowledge and what appears to be the most exciting new research directions. Enrique, who is now CTO of Solid State Logic, will also be a panellist at the Workshop.

Here’s a video of one of the early Automatic Mixing demonstrators.

And here’s a list of all the early Automatic Mixing papers.

  • E. Perez Gonzalez and J. D. Reiss, “A real-time semi-autonomous audio panning system for music mixing,” EURASIP Journal on Advances in Signal Processing, vol. 2010, Article ID 436895, pp. 1-10, 2010.
  • E. Perez Gonzalez and J. D. Reiss, “Automatic Mixing,” in DAFX: Digital Audio Effects, Second Edition (ed. U. Zölzer), ch. 13, pp. 523-550, John Wiley & Sons, Chichester, UK, 2011. doi: 10.1002/9781119991298.ch13
  • E. Perez Gonzalez and J. D. Reiss, “Automatic equalization of multi-channel audio using cross-adaptive methods,” 127th AES Convention, New York, October 2009.
  • E. Perez Gonzalez and J. D. Reiss, “Automatic Gain and Fader Control For Live Mixing,” IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, New York, October 18-21, 2009.
  • E. Perez Gonzalez and J. D. Reiss, “Determination and correction of individual channel time offsets for signals involved in an audio mixture,” 125th AES Convention, San Francisco, USA, October 2008.
  • E. Perez Gonzalez and J. D. Reiss, “An automatic maximum gain normalization technique with applications to audio mixing,” 124th AES Convention, Amsterdam, Netherlands, May 2008.
  • E. Perez Gonzalez and J. D. Reiss, “Improved control for selective minimization of masking using interchannel dependency effects,” 11th International Conference on Digital Audio Effects (DAFx), September 2008.
  • E. Perez Gonzalez and J. D. Reiss, “Automatic Mixing: Live Downmixing Stereo Panner,” 10th International Conference on Digital Audio Effects (DAFx-07), Bordeaux, France, September 10-15, 2007.

What the f*** are DFA faders?

I’ve been meaning to write this blog entry for a while, and I’ve finally gotten around to it. At the 142nd AES Convention, there were two papers that really stood out which weren’t discussed in our convention preview or convention wrap-up. One was about Acoustic Energy Harvesting, which we discussed a few weeks ago, and the other was titled ‘The DFA Fader: Exploring the Power of Suggestion in Loudness Judgments’. When I mentioned this paper to others, their response was always the same: “What’s a DFA Fader?” Well, the answer is hinted at in the title of this blog entry.

The basic idea is that musicians often give instructions to the sound engineer that he or she can’t or doesn’t want to follow. For instance, a vocalist might say “Turn me up” in a soundcheck, but the sound engineer knows that the vocals are at a nice level already and any more amplification might cause feedback. Sometimes, this sort of thing can be communicated back to the musician in a nice way. But there’s also the fallback option: a fader on the mixing console that “Does F*** All”, aka DFA. The engineer can slide the fader or twiddle an unconnected dial, smile back and say ‘Ok, does this sound a bit better?’.

A couple of companies have had fun with this idea. Funk Logic’s Palindrometer, shown below, is nothing more than a filler for empty rack space. It’s an interface that looks like it might do something, but at best, it just flashes some LEDs when one toggles switches and turns the knobs.


RANE have the PI 14 Pseudoacoustic Infector. It’s worth checking out the full description, complete with product review and data sheets. I especially like the schematic, copied below.


And in 2014, our own Brecht De Man released The Wire, a freely available VST and AudioUnit plug-in that emulates a gold-plated, balanced, 100% lossless audio connector.


Anyway, the authors of this paper had the bright idea of doing legitimate subjective evaluation of DFA faders. They didn’t make jokes in the paper, not even to explain the DFA acronym. They took 22 participants and divided them into an 11-person control group and an 11-person test group. In the control group, each subject participated in twenty trials where two identical musical excerpts were presented and the subject had to rate the difference in loudness of vocals between the two excerpts. Only ten excerpts were used, so each pair was used in two trials. In the test group, a sound engineer was present and he made scripted suggestions that he was adjusting the levels in each trial. He could be seen, but participants couldn’t see his hands moving on the console.

Not surprisingly, most trials showed a statistically significant difference between test and control groups, confirming the effectiveness of verbal suggestions associated with the DFA fader. And the authors picked up on an interesting point; results were far more significant for stimuli where vocals were masked by other instruments. This links the work to psychoacoustic studies. Not only is our perception of loudness and timbre influenced by the presence of a masker, but we have a more difficult time judging loudness and hence are more likely to accept the suggestion from an expert.

The authors did an excellent job of critiquing their results. But unfortunately, the full data was not made available with the paper, so we are left with a lot of questions. What were these scripted suggestions? It could make a big difference if the engineer said “I’m going to turn the vocals way up” versus “Let me try something. Does it sound any different now?” Were some participants immune to the suggestions? And because participants couldn’t see a fader actually being adjusted (interviews with sound engineers had stressed the importance of verbal suggestions), we don’t know how seeing the adjustment might have influenced the results.

There is something else that’s very interesting about this. It’s a ‘false experiment’. The whole listening test is a trick, since for all participants and in all trials, there were never any loudness differences between the two presented stimuli. So indirectly, it looks at an ‘auditory placebo effect’ that is more fundamental than DFA faders. What ratings for loudness differences did participants give? For the control group especially, did they judge these differences to be small because they trusted their ears, or large because they knew that judging loudness was the nature of the test? Perhaps there is a natural uncertainty in loudness perception regardless of bias. How much weaker does a listener’s judgment become when repeatedly asked to make very subtle choices in a listening test? There’s been some prior work tackling some of these questions, but I think this DFA Faders paper opened up a lot of avenues of interesting research.

Sónar Innovation Challenge 2017: the enhanced DJ assistant


The Audio Engineering team (C4DM) was present at this year’s edition of Sónar+D in Barcelona. Sónar+D is an international conference, integrated into the Sónar festival, that focuses on the interdisciplinary space between creativity and technology.

The Sónar Innovation Challenge (SIC), co-organized by the MTG, “is an online and on-site platform for the creative minds that want to be one step ahead and experiment with the future of technology. It brings together innovative tech companies and creators, collaborating to solve challenges that will lead to disruptive prototypes showcased in Sónar+D.”

In this year’s challenge, Marco Martínez took part in the enhanced DJ assistant challenge, proposed by the Music Technology Group at Universitat Pompeu Fabra, which asked participants to create a user-friendly, visually appealing and musically motivated system that DJs can use to remix music collections in exciting new ways.


Thus, after nearly one month of online meetings, the challengers and mentors finally met at Sónar, and after four days of intensive brainstorming, programming and prototyping at more than 30°C, the team came up with ATOMIX:


Visualize, explore and manipulate atoms of sound from multitrack recordings, enhancing the creative possibilities for live artists and DJs.

Starting from multitrack recordings (stems), ATOMIX uses algorithms for feature extraction, clustering, synthesis and visualisation. It segments a collection of stems into atoms of sound and groups them by timbre similarity. Then, through concatenative synthesis, ATOMIX lets you manipulate and exchange atoms of sound in real time with professional DAW-style controls, achieving a one-of-a-kind live music exploration.
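For a sense of how such a pipeline might fit together, here is a rough sketch of the ‘atoms of sound’ recipe: segment a stem at onsets, describe each atom by an average MFCC vector as a crude timbre summary, and cluster atoms by timbre similarity. It assumes librosa and scikit-learn, and is only an illustration of the general approach, not the actual ATOMIX implementation.

```python
# Sketch of the "atoms of sound" recipe: onset segmentation, MFCC-based
# timbre summary per atom, and clustering by timbre similarity.
import librosa
import numpy as np
from sklearn.cluster import KMeans

def atomize(path, n_clusters=8):
    y, sr = librosa.load(path, sr=None, mono=True)
    # Onset-based segmentation into "atoms"
    onsets = librosa.onset.onset_detect(y=y, sr=sr, units='samples')
    bounds = np.concatenate(([0], onsets, [len(y)]))
    atoms = [y[a:b] for a, b in zip(bounds[:-1], bounds[1:]) if b - a > 1024]
    # Timbre feature per atom: mean MFCC vector
    feats = np.array([librosa.feature.mfcc(y=a, sr=sr, n_mfcc=13).mean(axis=1)
                      for a in atoms])
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(feats)
    return atoms, labels   # atoms sharing a label can be swapped or re-sequenced
```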

The project is still in a prototype stage and we hope to hear news of development very soon.

The beginning of stereo

Alan and Doreen Blumlein wedding photo

The sound reproduction systems for the early ‘talkie’ movies often had only a single loudspeaker. Because of this, the actors all sounded like they were in the same place, regardless of their position on screen.

In 1931, the electronics and sound engineer Alan Blumlein and his wife Doreen went to see a movie where this monaural sound reproduction occurred. According to Doreen, as they were leaving the cinema, Alan said to her, ‘Do you realise the sound only comes from one person?’ And she replied, ‘Oh does it?’ ‘Yes,’ he said, ‘And I’ve got a way to make it follow the person’.

The genesis of these ideas is uncertain (though it might have been while watching the movie), but he described them to Isaac Shoenberg, managing director at EMI and Alan’s mentor, in the late summer of 1931. Blumlein detailed his stereo technology in the British patent “Improvements in and relating to Sound-transmission, Sound-recording and Sound-reproducing systems,” which was accepted June 14, 1933.

 

The serendipitous invention of the wah-wah pedal


In a previous post, we discussed some creative uses of the wah-wah effect.

The first wah-wah pedal is attributed to Brad Plunkett in 1966, who worked at Warwick Electronics Inc., which owned Thomas Organ Company. Warwick Electronics acquired the Vox name due to the brand name’s popularity and association with the Beatles. Their subsidiary, Thomas Organ Company, needed a modified design for the Vox amplifier, which had a midrange boost, so that it would be less expensive to manufacture.

In a 2005 interview (M. Vdovin, “Artist Interview: Brad Plunkett,” Universal Audio WebZine, vol. 3, October 2005), Brad Plunkett said, “I came up with a circuit that would allow me to move this midrange boost … As it turned out, it sounded absolutely marvelous while you were moving it. It was okay when it was standing still, but the real effect was when you were moving it and getting a continuous change in harmonic content. We turned that on in the lab and played the guitar through it… I turned the potentiometer and he played a couple licks on the guitar, and we went crazy.

A couple of years later… somebody said to me one time, ‘You know Brad, I think that thing you invented changed music.’”
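The effect Plunkett describes, a midrange boost whose centre frequency is swept while the guitarist plays, can be approximated digitally with a time-varying resonant band-pass filter. The sketch below (using scipy) is a simplified illustration with made-up sweep rates and bandwidths, not a model of the original Vox circuit.

```python
# Rough digital approximation of a wah: a resonant band-pass filter whose
# centre frequency is swept up and down by an LFO while the signal plays.
import numpy as np
from scipy.signal import butter, lfilter

def wah(x, sr, f_lo=400.0, f_hi=2000.0, rate_hz=1.5, block=256):
    out = np.zeros_like(x)
    zi = np.zeros(4)   # filter state carried across blocks (order-2 band-pass)
    for start in range(0, len(x), block):
        t = start / sr
        # LFO moves the centre frequency between f_lo and f_hi
        fc = f_lo + (f_hi - f_lo) * 0.5 * (1 + np.sin(2 * np.pi * rate_hz * t))
        bw = 0.3 * fc   # fairly narrow band for a resonant, vowel-like sweep
        b, a = butter(2, [(fc - bw / 2) / (sr / 2), (fc + bw / 2) / (sr / 2)],
                      btype='bandpass')
        seg = x[start:start + block]
        out[start:start + block], zi = lfilter(b, a, seg, zi=zi)
    return out
```

Carrying the filter state across blocks while the coefficients change is itself an approximation, but it keeps the sweep reasonably click-free for this kind of demonstration.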

Acoustic reverberators

Today, reverb is most often added to a recording using artificial reverberators, such as software plug-ins or digital reverb hardware units. But there are a lot of other approaches.

Many recording studios have used special rooms known as reverberation chambers to add reverb to a performance. Elevator shafts and stairwells (as in New York City’s Avatar Recording Studio) work well as highly reverberant rooms. The reverb can also be controlled by adding absorptive materials like curtains and rugs.

Spring reverbs are found in many guitar amplifiers and have been used in Hammond organs. The audio signal is coupled to one end of the spring by a transducer that creates waves traveling through the spring. At the far end of the spring, another transducer converts the motion of the spring back into an electrical signal, which is then added to the original sound. When a wave arrives at an end of the spring, part of the wave’s energy is reflected. However, these reflections have different delays and attenuations from what would be found in a natural acoustic environment, and there may be some interaction between the waves in the spring, so the result is a slightly unusual (though not unpleasant) reverb sound.

Often several springs with different lengths and tensions are enclosed in a metal box, known as the reverb pan, and used together. This avoids uniform behavior and creates a more realistic, pseudorandom series of echoes. In most reverb units though, the spring lengths and tensions are fixed in the design process, and not left to the user to control.
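As a rough digital caricature of this behaviour, each spring can be treated as a feedback comb filter (a delayed, attenuated copy of the signal fed back on itself), with several such filters summed to imitate the multi-spring pan. The delay times, feedback and wet/dry values below are invented for illustration only.

```python
# Crude multi-spring caricature: each "spring" is a feedback comb filter,
# and their outputs are summed and mixed with the dry signal.
import numpy as np

def comb(x, delay_samples, feedback):
    y = np.zeros(len(x))
    buf = np.zeros(delay_samples)
    idx = 0
    for n, sample in enumerate(x):
        delayed = buf[idx]
        y[n] = sample + delayed
        buf[idx] = sample + feedback * delayed   # recirculate with loss
        idx = (idx + 1) % delay_samples
    return y

def spring_reverb(x, sr, delays_ms=(31.0, 37.0, 41.0), feedback=0.65, wet=0.4):
    springs = [comb(x, int(sr * d / 1000), feedback) for d in delays_ms]
    return (1 - wet) * x + wet * sum(springs) / len(springs)
```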

The plate reverb is similar to a spring reverb, but instead of springs, the transducers are attached at several locations on a metal plate. These transducers send vibrations through the plate, and reflections are produced whenever a wave reaches the plate’s edge. The location of the transducers and the damping of the plate can be adjusted to control the reverb. However, plate reverbs are expensive and bulky, and hence not widely used.

Water tank reverberators have also been used. Here, the audio signal is modulated with an ultrasonic signal and transmitted through a tank of water. The output is then demodulated, resulting in the reverberant output sound. Other reverberators include pipes with microphones placed at various points.

These acoustic and analogue reverberators can be interesting to create and use, but they lack the simplicity and ease of use of digital reverberators. Ultimately, the choice of implementation is a matter of taste.

Doppler, Leslie and Hammond

Donald Leslie (1913–2004) bought a Hammond organ in 1937, as a substitute for a pipe organ. But at home in a small room, it could not reproduce the grand sound of a pipe organ. Since a pipe organ has a different location for each pipe, he designed a moving loudspeaker.

The Leslie speaker uses an electric motor to move an acoustic horn in a circle around a loudspeaker. Thus we have a moving sound source and a stationary listener, which is a well-known situation that produces the Doppler effect.

The Leslie speaker exploits the Doppler effect to produce frequency modulation. The classic Leslie speaker has a crossover that divides the low and high frequencies. It consists of a fixed treble unit with spinning horns, and a fixed woofer with a spinning rotor. Both the horns (actually, one horn and a dummy used as a counterbalance) and a bass sound baffle rotate, creating vibrato due to the changing velocity in the direction of the listener, and tremolo due to the changing distance. The rotating elements can move at varied speeds, or be stopped completely. Furthermore, the system is partially enclosed and uses a rotating speaker port, so the listener hears multiple reflections at different Doppler shifts, producing a chorus-like effect.
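A crude way to imitate this digitally is a delay line whose length follows the horn-to-listener distance as the horn rotates: the changing delay produces the vibrato (Doppler shift) and the changing distance produces the tremolo. The geometry, rotor speed and attenuation in the sketch below are illustrative guesses, and real Leslie emulations are considerably more involved.

```python
# Simplified Doppler/Leslie caricature: a rotating source modelled as a
# time-varying delay (distance / speed of sound) plus distance attenuation.
import numpy as np

def leslie(x, sr, rotor_hz=6.0, radius_m=0.15, listener_m=1.0, c=343.0):
    x = np.asarray(x, dtype=float)
    n = np.arange(len(x))
    t = n / sr
    # Distance from the spinning horn to a listener in front of the cabinet
    dist = np.sqrt(listener_m**2 + radius_m**2
                   - 2 * listener_m * radius_m * np.cos(2 * np.pi * rotor_hz * t))
    delay = dist / c * sr                        # time-varying delay in samples
    read_pos = n - delay
    # Fractional-delay read via linear interpolation
    i0 = np.clip(np.floor(read_pos).astype(int), 0, len(x) - 1)
    i1 = np.clip(i0 + 1, 0, len(x) - 1)
    frac = read_pos - np.floor(read_pos)
    y = (1 - frac) * x[i0] + frac * x[i1]
    return y * (listener_m / dist)               # simple distance attenuation
```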

The Leslie speaker has been widely used in popular music, especially when the Hammond B-3 organ was played out through a Leslie speaker. This combination can be heard on many classic and progressive rock songs, including hits by Boston, Santana, Steppenwolf, Deep Purple and The Doors. And the Leslie speaker has also found extensive use in modifying guitar and vocal sounds.

Ironically, Donald Leslie had originally tried to license his loudspeaker to the Hammond company, and even gave the Hammond company a special demonstration. But at the time, Laurens Hammond (founder of the Hammond organ company) did not like the concept at all.