AI-powered audio mixing start-up RoEx closes investment round

We’ve been involved in the founding of several start-ups based in part on our research: LandR, Waveshaper AI, and Nemisindo. More recently, we co-founded RoEx, led by alumnus Dave Ronan. RoEx is a music tech start-up that offers AI-powered audio mixing services, providing smart audio production solutions to content creators and companies of all sizes.

As a musician, content creator or bedroom producer, you may already have discovered that mixing is hard: it takes time and money, and it distracts you from the creative process. So RoEx decided to help people like you by creating intelligent audio production tools that assist in the music creation process. Not only do they want to save you time, they also want to ensure their tools get you as close as possible to a professional-sounding result.

You can try out some of these tools directly on their website.

We’re thrilled to announce that RoEx has closed its seed investment round and is ready to take the next step in growing the business and expanding the team.

RoEx’s goal has always been to bring cutting-edge audio technology to the music industry, and this funding will allow them to accelerate their progress and bring their innovative solutions to a wider audience.

RoEx is grateful to its investors Haatch and Queen Mary University of London for their confidence in the team and the vision, and is excited to continue growing and making a positive impact on the music world.

As RoEx moves forward, they will focus on expanding the team and building partnerships to take their technology to the next level. They can’t wait to share updates on their progress, and they appreciate your support on this journey.

#musictech #startup #ai #innovation #musicindustry #mixingandmastering #teamgrowth


Based on the original post by Dave Ronan.

Engine sounds for game engines

Engines and motors feature heavily in video games, but they can be very problematic. As game car audio expert Greg Hill said, “For engine sounds most people think we just record a car and throw the sound into a folder and it magically just works in the game. Game sounds are interactive – unlike passive sound like movies where the sound is baked onto an audio track… sound has to be recorded a certain way so it can be decomposed and reconstructed into a ‘sonic model’ and linked to the physics so the player has full control over every parameter.”

And therein lies the issue: recorded sounds are fixed, but game sounds need to be adaptable and controllable. One can get around this, but only with huge effort.

So Nemisindo (our start-up company, named after the Zulu word for sound effects) have brought their advanced engine sound generator to Unity as an audio plugin.

The Nemisindo Engine implements a flexible, controllable and realistic engine for all your game vehicles. It is a Unity native audio plugin and can generate sound effects in real time, completely procedurally. The plugin offers an interactive interface with 14 parameters and 16 realistic presets (Formula One, monster truck, motorbike…). It also offers various functions to change the parameters of the sound effect at runtime, so the parameters can be linked to any in-game events, which can lead to very organic and interesting results.

Here’s a demonstration video.

The engine is available as a native Unity audio plugin at https://assetstore.unity.com/packages/tools/audio/nemisindo-engine-procedural-sound-effects-222246

You can also try out the engine model on Nemisindo’s online sound design platform at

https://nemisindo.com/models/engine-advanced.html

Submit your research paper to the 152nd AES Convention

The next Audio Engineering Society Convention will be in May, in The Hague, the Netherlands. It’s expected to be the first major AES event with an in-person presence (though it has an online component too) since the whole Covid situation began. It will cover the whole field of audio engineering, with workshops, panel discussions, tutorials, keynotes, recording competitions and more. And attendees cover the full range of students, educators, researchers, audiophiles, professional engineers and industry representatives.

I’m always focused on the Technical Program for these events, where lots of new research is published and presented, and I expect this one to be great. Just based on some expected submissions that I know of, there are sure to be great papers on sound synthesis, game audio, immersive and spatial audio, higher quality and deeper understanding of audio effects, plenty of machine learning and neural networks, novel mixing and mastering tools, and lots of new psychoacoustics research.

And that’s just the ones I’ve heard about!

It’s definitely not too late to submit your own work; see the Call for Submissions. The deadline for full paper submissions (Category 1) or abstract + précis submissions (Category 2) is February 15th, and the deadline for abstract-only submissions (Category 3) is March 1st. In all cases, you submit a full paper for the final version if accepted (though for Category 3 this is optional). So the main difference between the three categories is the depth of reviewing, from full peer review for initial paper submissions to ‘light touch’ reviewing for an initial abstract submission.

For those who aren’t familiar with it, great research has been, and continues to be, presented at AES conventions. The very first music composition on a digital computer was presented at the 9th AES Convention in 1957. Schroeder’s reverberator first appeared there, the invention of the parametric equalizer was announced and explained there in 1972, and Farina’s work on the swept sine technique for room response estimation was unveiled there and has received over 1365 citations. Other famous firsts from the Technical Program include the introduction of feedback delay networks, Gardner’s famous paper on zero-delay convolution (now used in almost all fast convolution algorithms), the unveiling of spatial audio object coding, and the Gerzon-Craven noise shaping theorem, which is at the heart of many A-to-D and D-to-A converters.

So please consider submitting your research there, and I hope to see you there too, whether virtually or in person.

Nemisindo, our new spin-out, launches online sound design service

We haven’t done a lot of blogging recently, but for a good reason: there’s an inverse relationship between how often we post blog entries and how busy we are trying to do something interesting. Now we’ve done it, we can talk about it, and today, we can launch it!

Procedural audio is a big area of research for us, which we have discussed in previous blog entries about aeroacoustics, whistles, swinging swords, propellers and thunder. This is sound synthesis, but with some additional requirements. It’s usually intended for use in interactive content (games), so it needs to generate sound in real time and adapt to changing inputs.

There are some existing efforts to offer procedural audio. However, they usually focus on a few specific sounds, which means sound designers still need sound effect libraries for most sound effects. And some efforts still involve manipulating sound samples, which means they aren’t truly procedural. But if you can create any sound effect, then you can do away with the sample libraries (almost) entirely, and procedurally generate entire auditory worlds.

And we’ve created a company that aims to do just that. Nemisindo, named after the Zulu for “sounds/noise”, offer sound design services based on their innovative procedural audio technology. They are launching a new online service, https://nemisindo.com, that allows users to create sound effects for games, film and VR without the need for vast libraries of sounds.

The following video gives a taste of the technology and the range of services they offer.

Nemisindo’s new platform provides a browser-based service with tools to create sounds from over 70 classes (engines, footsteps, explosions…) and over 700 preselected settings (diesel generator engine, motorbike, Jetsons jet…). It can be used to create almost any sound effect from scratch, and in real-time, based on intuitive controls guided by the user.

If someone wants a ‘whoosh’ sound for their game, or footsteps, gunshots, a raging fire or a gentle summer shower, they just tell the system what they’re looking for and adjust the sound while it’s being created. And unlike other technologies that simply use pre-recorded sounds, Nemisindo’s platform generates sounds that have never been recorded: a dragon roaring, for instance, light sabres swinging or space cannons firing. These sound effects can also be shaped and crafted at the point of creation by the user, breaking through the limitations of sampled sounds.
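To give a flavour of what ‘procedural’ means in practice, here is a minimal sketch of a whoosh-style sound in plain Web Audio JavaScript: band-pass filtered noise with a swept centre frequency and a gain envelope. This is only a toy illustration of the general idea, not Nemisindo’s actual model, and all parameter values are made up.

// Toy procedural 'whoosh': filtered noise with a swept band-pass filter and
// a gain envelope. Illustrative only; parameter values are arbitrary.
const ctx = new AudioContext();

function playWhoosh({ duration = 1.5, startFreq = 200, endFreq = 2000 } = {}) {
  // The Web Audio API has no noise source, so fill a buffer with white noise.
  const buffer = ctx.createBuffer(1, Math.round(duration * ctx.sampleRate), ctx.sampleRate);
  const data = buffer.getChannelData(0);
  for (let i = 0; i < data.length; i++) data[i] = 2 * Math.random() - 1;

  const noise = ctx.createBufferSource();
  noise.buffer = buffer;

  // Sweep a resonant band-pass filter upwards over the duration.
  const filter = ctx.createBiquadFilter();
  filter.type = 'bandpass';
  filter.Q.value = 5;
  filter.frequency.setValueAtTime(startFreq, ctx.currentTime);
  filter.frequency.exponentialRampToValueAtTime(endFreq, ctx.currentTime + duration);

  // Simple attack/decay gain envelope.
  const gain = ctx.createGain();
  gain.gain.setValueAtTime(0.001, ctx.currentTime);
  gain.gain.exponentialRampToValueAtTime(0.8, ctx.currentTime + 0.2 * duration);
  gain.gain.exponentialRampToValueAtTime(0.001, ctx.currentTime + duration);

  noise.connect(filter).connect(gain).connect(ctx.destination);
  noise.start();
  noise.stop(ctx.currentTime + duration);
}

Because everything is generated and shaped at play time, the same few lines can produce a short sharp swish or a long slow sweep just by changing the parameters, which is exactly what a sample library cannot do.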

Nemisindo has already caught the attention of Epic Games, with the spinout receiving an Epic MegaGrant to develop procedural audio for the Unreal game engine. 

The new service from Nemisindo launches today (18 August 2021) and can be accessed at nemisindo.com. For the first month, Nemisindo is offering a free trial period allowing registered users to download sounds for free. After the trial period ends, the system is still free to use, but sounds can be downloaded at a low individual cost or with a paid monthly subscription.

We encourage you to register and check it out.

The Nemisindo team can be reached at info@nemisindo.com .

Congratulations, Dr. Marco Martinez Ramirez

Today one of our PhD student researchers, Marco Martinez Ramirez, successfully defended his PhD. The form of these exams, or vivas, varies from country to country, and even institution to institution, which we discussed previously. Here, it’s pretty gruelling: behind closed doors, with two expert examiners probing every aspect of the PhD. And it was made even more challenging since it was all online due to the virus situation.
Marco’s PhD was on ‘Deep learning for audio effects modeling.’

Audio effects modeling is the process of emulating an audio effect unit; it seeks to recreate the sound, behaviour and main perceptual features of an analog reference device. Both digital and analog audio effect units transform characteristics of the sound source. These transformations can be linear or nonlinear, time-invariant or time-varying, and with short-term or long-term memory. The most typical audio effect transformations are based on dynamics, such as compression; tone, such as distortion; frequency, such as equalization; and time, such as artificial reverberation or modulation-based audio effects.

Simulation of audio processors is normally done by designing mathematical models of these systems. It’s very difficult because it seeks to accurately model all components within the effect unit, which usually contains mechanical elements together with nonlinear and time-varying analog electronics. Most audio effect models are either simplified or optimized for a specific circuit or effect and cannot be efficiently translated to other effects.

Marco’s thesis explored deep learning architectures for audio processing in the context of audio effects modelling. He investigated deep neural networks as black-box modelling strategies to solve this task, i.e. using only input-output measurements. He proposed several different DSP-informed deep learning models to emulate each type of audio effect transformation.
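To make the black-box idea concrete, here is a minimal sketch in TensorFlow.js (chosen to match the JavaScript used elsewhere on this blog): a tiny 1-D convolutional network trained on time-aligned frames of the dry input and the effect’s output. This is only an illustration of the input-output approach, not one of Marco’s actual DSP-informed architectures; the frame length, layer sizes and training settings are all arbitrary.

// Toy black-box effect model: learn a mapping from dry input frames to
// processed output frames, using only paired recordings of the effect.
import * as tf from '@tensorflow/tfjs';

const frameLength = 4096; // samples per training frame (arbitrary)

const model = tf.sequential();
model.add(tf.layers.conv1d({
  inputShape: [frameLength, 1], filters: 16, kernelSize: 64,
  padding: 'same', activation: 'tanh'
}));
model.add(tf.layers.conv1d({ filters: 1, kernelSize: 64, padding: 'same' }));
model.compile({ optimizer: 'adam', loss: 'meanSquaredError' });

// dryFrames and wetFrames: tensors of shape [numFrames, frameLength, 1],
// cut from time-aligned recordings of the effect's input and output.
async function train(dryFrames, wetFrames) {
  await model.fit(dryFrames, wetFrames, { epochs: 50, batchSize: 8 });
}

Once trained, running new audio through the network emulates the effect without any knowledge of its circuitry, which is precisely what makes the approach ‘black box’.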

Marco then explored the performance of these models when modeling various analog audio effects, and analyzed how the given tasks are accomplished and what the models are actually learning. He investigated virtual analog models of nonlinear effects, such as a tube preamplifier; nonlinear effects with memory, such as a transistor-based limiter; and electromechanical nonlinear time-varying effects, such as a Leslie speaker cabinet and plate and spring reverberators.

Marco showed that the proposed deep learning architectures represent an improvement on the state of the art in black-box modeling of audio effects, and he outlined directions for future work.

His research also led to a new start-up company, TONZ, which builds on his machine learning techniques to provide new audio processing interactions for the next generation of musicians and music makers.

Here’s a list of some of Marco’s papers related to his PhD research, written while he was a member of the Intelligent Sound Engineering team.

Congratulations again, Marco!

Awesome student projects in sound design and audio effects

I teach classes in Sound Design and Digital Audio Effects. In both classes, the final assignment involves creating an original work that involves audio programming and using concepts taught in class. But the students also have a lot of free rein to experiment and explore their own ideas. The results are always great: lots of really cool ideas, many of which could lead to a publication, or would be great to listen to regardless of the fact that they were assignments.

The last couple of years, I posted about it here and here. Here are a few of the projects this year.

From the Sound Design class:

  • A procedural audio model of a waterfall. The code was small, involving some filtered noise sources with random gain changes, but the result was great.
  • An interactive animation of a girl writing at a desk during a storm. There were some really neat tricks to get a realistic thunder sound.
  • A procedurally generated sound scene for a walk through the countryside. The student found lots of clever ways to generate the sounds of birds, bees, a river and the whoosh of a passing car.
  • New sound design replacing the audio track in a film scene. Check it out.

And from the Digital Audio Effects class:

  • I don’t need to mention anything about the next one. Just read the student’s tweet.

 

  • Rainmaker, a VST plugin that takes an incoming signal and transforms it into a ‘rain’ like sound, starting above the listener and then floating down below.

  • A plugin implementation of the Karplus-Strong algorithm, except an audio sample is used to excite the string instead of a noise burst. It gives really interesting timbral qualities (a minimal sketch of the basic idea appears below).

  • Stormify, an audio plugin that enables users to add varying levels of rain and wind to the background of their audio, making it appear that the recording took place in inclement weather.
  • An all-in-one plugin for synthesising and sculpting drum-like sounds.
  • The Binaural Phase Vocoder, a VST/AU plugin whereby users can position a virtual sound source in a 3D space and process the sound through an overlap-add phase vocoder.
  • A multiband multi-effect consisting of three frequency bands and three effects on each band: delay, distortion, and tremolo. Despite the seeming complexity, the interface was straightforward and easy to use.

multi-interface
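For anyone unfamiliar with the algorithm behind that Karplus-Strong project, here is a minimal sketch in plain JavaScript. The only twist relative to the textbook version is that the delay line is seeded from an arbitrary excitation signal rather than a noise burst. This is just the basic idea, not the student’s plugin, and the decay constant is arbitrary.

// Minimal Karplus-Strong: a delay line of length N (which sets the pitch) is
// seeded with an excitation, then recirculated through an averaging low-pass
// filter with a slight decay.
function karplusStrong(excitation, sampleRate, frequency, numSamples) {
  const N = Math.round(sampleRate / frequency);
  const delay = new Float32Array(N);
  for (let i = 0; i < N; i++) delay[i] = excitation[i] || 0; // sample excitation, not noise

  const out = new Float32Array(numSamples);
  let pos = 0;
  for (let n = 0; n < numSamples; n++) {
    const next = (pos + 1) % N;
    out[n] = delay[pos];
    // Average adjacent samples (a one-zero low pass) and feed back into the line.
    delay[pos] = 0.996 * 0.5 * (delay[pos] + delay[next]);
    pos = next;
  }
  return out;
}

Because the excitation now carries the timbre of the original sample, the plucked-string decay takes on some of that character, which is where the interesting timbral qualities come from.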

There were many other interesting assignments, including several sonifications of images. But this selection really shows both the talent of the students and the possibilities to create new and interesting sounds.

Intelligent Music Production book is published

9781138055193

Ryan Stables is an occasional collaborator and all-around brilliant person. He started the annual Workshop on Intelligent Music Production (WIMP) in 2015. It’s been going strong ever since, with the 5th WIMP co-located with DAFx this past September. The workshop series focuses on the application of intelligent systems (including expert systems, machine learning and AI) to music recording, mixing, mastering and related aspects of audio production and sound engineering.

Ryan had the idea for a book about the subject, and I (Josh Reiss) and Brecht De Man (another all-around brilliant person) were recruited as co-authors. What resulted was a massive amount of writing, editing, refining, re-editing and so on. We all contributed big chunks of content, but Brecht pulled it all together and turned it into something of really high quality, giving a comprehensive overview of the field, suitable for a wide range of audiences.

And the book is finally published today, October 31st! It’s part of the AES Presents series by Focal Press, a division of Routledge. You can get it from the publisher, from Amazon, or any of the other usual places.

And here’s the official blurb:

Intelligent Music Production presents the state of the art in approaches, methodologies and systems from the emerging field of automation in music mixing and mastering. This book collects the relevant works in the domain of innovation in music production, and orders them in a way that outlines the way forward: first, covering our knowledge of the music production processes; then by reviewing the methodologies in classification, data collection and perceptual evaluation; and finally by presenting recent advances on introducing intelligence in audio effects, sound engineering processes and music production interfaces.

Intelligent Music Production is a comprehensive guide, providing an introductory read for beginners, as well as a crucial reference point for experienced researchers, producers, engineers and developers.

 

Cool sound design and audio effects projects

Every year, I teach two classes (modules), Sound Design and Digital Audio Effects. In both classes, the final assignment involves creating an original work that involves audio programming and using concepts taught in class. But the students also have a lot of free rein to experiment and explore their own ideas. Last year, I had a well-received blog entry about the projects.

The results are always great: lots of really cool ideas, many of which could lead to a publication, or would be great to listen to regardless of the fact that they were assignments. Here are a few of the projects this year.

From the Sound Design class:

  • A truly novel abstract sound synthesis project (amplitude and frequency modulation), where parameters are controlled by pitch recognition and face recognition machine learning models using the microphone and the webcam. Users could use their voice and move their face around to affect the sound.
  • An impressive one had six sound models: rain, bouncing ball, sea waves, fire, wind and explosions. It also had a website where each synthesised sound could be compared against real recordings. We couldn’t always tell which was real and which was synthesised!

SoundSelect.png

  • An auditory model of a London Underground train, from the perspective of a passenger on a train, or waiting at a platform. It had a great animation.

train

  • Two projects involved creating interactive soundscapes auralising an image. One involved a famous photo by the photographer Gregory Crewdson, encapsulating a dark side of suburban America through surreal, cinematic imagery. The other was an estate area with nobody visible, giving the impression of an eerie atmosphere where background noises and small sounds are given prominence.

And from the Digital Audio Effects class:

  • A create-your-own distortion effect, where the user can interactively modify the wave shaping curve.
  • An input-dependent modulation signal based on a physical mass-spring system
  • A Swedish death metal guitar effect combining lots of effects for a very distinctive sound
  • A very creative all-in-one audio toy, ‘Ring delay’. This augmented ping-pong delay effect gives control over the panning of the delays, the equalization of the audio input and delays, and the output gain. Delays can be played backwards, and the output can be set out of phase. Finally, a ring modulator can modulate the audio input to create new sounds to be delayed.
  • Chordify, which transforms an incoming signal, ideally individual notes, into a chord of three different pitches.

chordify

  • An audio effects chain inspired by interning at a local radio station. The student helped the owner produce tracks using effects chain presets, but this producer’s understanding of compressors, EQ, distortion effects… was fairly limited. So the student recreated one of the effects chains as a plugin with only two adjustable parameters, which control multiple parameters inside.
  • Old Styler, a plug-in that applies a sort of ‘vintage’ effect so that the audio sounds as if it came from an old radio or an old, black-and-white movie. Here’s how it sounds.

  • There were some advanced reverbs, including a VST implementation of a state-of-the-art reverberation algorithm known as a Scattering Delay Network (SDN), and a Church reverb incorporating some additional effects to get that ‘church sound’ just right.
  • A pretty amazing cave simulator, with both reverb and random water droplet sounds as part of the VST plug-in.

CaveSimulator

  • A bit crusher, which also had noise, downsampling and filtering to allow lots of ways to degrade the signal.
  • A VST implementation of the Euclidean algorithm for world rhythms, as described by Godfried Toussaint in his paper The Euclidean Algorithm Generates Traditional Musical Rhythms (a minimal sketch of the idea appears after this list).
  • A mid/side processor, with excellent analysis to verify that the student got the implementation just right.
  • A multi-functional distortion pedal. Guitarists often compose music in their bedroom and would benefit from an effect that helps fill a song with a range of sounds traditionally belonging to other instruments. That’s what this plug-in did, using a lot of clever tricks to widen the soundstage of the guitar.
  • Related to the multi-functional distortion, two students created multiband distortion effects.
  • A Python project that separates a track into harmonic, percussive, and residual components which can be adjusted individually.
  • An effect that attempts to resynthesise any audio input with sine wave oscillators that take their frequencies from the well-tempered scale. This goes far beyond auto-tune, yet can be quite subtle.
  • A source separator plug-in based on Dan Barry’s ADRESS algorithm, described here and here. Along with Mikel Gainza, Dan Barry cofounded the company Sonic Ladder, which released the successful software Riffstation, based on their research.
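As promised above, here is a minimal sketch of the Euclidean rhythm idea in plain JavaScript: spread a number of onsets as evenly as possible across a number of steps. This simple formulation produces the same patterns as Bjorklund’s algorithm up to rotation; the student’s VST plugin was, of course, considerably more elaborate.

// Euclidean rhythm: distribute 'pulses' onsets as evenly as possible over
// 'steps' steps. Equivalent to Bjorklund's algorithm up to rotation.
function euclideanRhythm(pulses, steps) {
  const pattern = [];
  for (let i = 0; i < steps; i++) {
    pattern.push((i * pulses) % steps < pulses ? 'x' : '.');
  }
  return pattern.join('');
}

console.log(euclideanRhythm(3, 8)); // "x..x..x." - E(3,8), the Cuban tresillo
console.log(euclideanRhythm(5, 8)); // "x.x.xx.x" - E(5,8), a rotation of the cinquillo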

There were many other interesting assignments, including several variations on tape emulation. But this selection really shows both the talent of the students and the possibilities to create new and interesting sounds.

What’s up with the Web Audio API?

Recently, we’ve been doing a lot of audio development for applications running in the browser, like with the procedural audio and sound synthesis system FXive, or the Web Audio Evaluation Tool (WAET). The Web Audio API is part of HTML5, and it’s a high-level Application Programming Interface with a lot of built-in functions for processing and generating sound. The idea is that it’s what you need to have any audio application (audio effects, virtual instruments, editing and analysis tools…) running as JavaScript in a web browser.

It uses a dataflow model like LabVIEW and media-focused languages like Max/MSP, Pure Data and Reaktor. So you create oscillators, connect them to filters, combine them, and then connect that to the output to play out the sound. But unlike the others, it’s not graphical, since you write it as JavaScript, like most code that runs client-side in a web browser.

Sounds great, right? And it is. But there were a lot of strange choices that went into the API. They don’t make it unusable or anything like that, but it does sometimes leave you kicking in frustration and thinking the coding would be so much easier if only… Here are a few of them.

  • There’s no built-in noise signal generator. You can create sine waves, sawtooth waves, square waves… but not noise. Generating audio-rate random numbers is built in to pretty much every other audio development environment, and in almost every web audio application I’ve seen, the developers have redone it themselves, with ScriptProcessors, AudioWorklets, or buffered noise classes or methods (a sketch of the usual workaround appears after this list).
  • The low-pass, high-pass, low-shelving and high-shelving filters in the Web Audio API are not the standard first-order designs, as taught in signal processing and described in [1, 2] and lots of references within. The low pass and high pass are resonant second-order filters, and the shelving filters are the less common alternatives to the first-order designs. This is OK for a lot of cases where you are developing a new application with a bit of filtering, but it’s a major pain if you’re writing a web version of something written in MATLAB, Pure Data or lots and lots of other environments where the basic low and high filters are standard first-order designs (a first-order low pass built with createIIRFilter is also sketched after this list).
  • The oscillators come with a detune property that represents detuning of oscillation in hundredths of a semitone, or cents. I suppose it’s a nice feature if you are using cents on the interface and dealing with musical intervals. But it’s the same as changing the frequency parameter and doesn’t save a single line of code. There are other useful parameters which they didn’t give the ability to change, like phase, or the duty cycle of a square wave. https://github.com/Flarp/better-oscillator is an alternative implementation that addresses this.
  • The square, sawtooth and triangle waves are not what you think they are. Instead of, say, the triangle wave being a periodic ramp up and ramp down, these waveforms are the sum of a few terms of the Fourier series that approximates them. This is nice if you want to avoid aliasing, but wrong for every other use. It took me a long time to figure this out when I tried modulating a signal by a square wave to turn it on and off. Again, https://github.com/Flarp/better-oscillator gives an alternative implementation with the actual waveforms.
  • The API comes with an IIR filter node that allows you to create almost arbitrary infinite impulse response filters. But you can’t change the coefficients once it’s created. So it’s useless for most web audio applications, which involve some control or interaction.
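As mentioned in the first two items, here is a minimal sketch of the usual workarounds: a looped buffer of random samples for white noise, and a standard first-order low pass built with createIIRFilter using the textbook bilinear-transform coefficients (as in [1, 2]). The cutoff value and buffer length are arbitrary.

// Workarounds for the missing noise source and first-order filters.
const ctx = new AudioContext();

// 1. White noise: loop a buffer filled with random samples.
function createNoiseSource() {
  const buffer = ctx.createBuffer(1, 2 * ctx.sampleRate, ctx.sampleRate);
  const data = buffer.getChannelData(0);
  for (let i = 0; i < data.length; i++) data[i] = 2 * Math.random() - 1;
  const source = ctx.createBufferSource();
  source.buffer = buffer;
  source.loop = true;
  return source;
}

// 2. A standard first-order low pass, with coefficients from the bilinear
//    transform of H(s) = wc / (s + wc).
function createFirstOrderLowpass(cutoffHz) {
  const K = Math.tan(Math.PI * cutoffHz / ctx.sampleRate);
  const norm = 1 / (K + 1);
  const feedforward = [K * norm, K * norm];
  const feedback = [1, (K - 1) * norm];
  return ctx.createIIRFilter(feedforward, feedback);
}

const noise = createNoiseSource();
noise.connect(createFirstOrderLowpass(1000)).connect(ctx.destination);
noise.start();

Of course, as the last item points out, the IIR filter’s coefficients are then frozen, so this only helps when the cutoff does not need to change at runtime.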

Despite all that, it’s pretty amazing. And you can get around all these issues, since you can always write your own audio worklets for any audio processing and generation. But you shouldn’t have to.
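For completeness, here is roughly what that escape hatch looks like: a minimal AudioWorklet that generates white noise, one of the ways developers have re-implemented the missing noise source. The processor name and file name are just placeholders.

// noise-processor.js - a custom AudioWorklet processor that outputs white noise.
class NoiseProcessor extends AudioWorkletProcessor {
  process(inputs, outputs) {
    for (const channel of outputs[0]) {
      for (let i = 0; i < channel.length; i++) channel[i] = 2 * Math.random() - 1;
    }
    return true; // keep the processor alive
  }
}
registerProcessor('noise-processor', NoiseProcessor);

// In the main script:
// await ctx.audioWorklet.addModule('noise-processor.js');
// const noiseNode = new AudioWorkletNode(ctx, 'noise-processor');
// noiseNode.connect(ctx.destination);

It works, but it is a lot of ceremony for something that is a one-liner in most other audio environments.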

We’ve published a few papers on the Web Audio API and what you can do with it, so please check them out if you are doing some R&D involving it.

 

[1] J. D. Reiss and A. P. McPherson, “Audio Effects: Theory, Implementation and Application“, CRC Press, 2014.

[2] V. Valimaki and J. D. Reiss, ‘All About Audio Equalization: Solutions and Frontiers,’ Applied Sciences, special issue on Audio Signal Processing, 6 (5), May 2016.

[3] P. Bahadoran, A. Benito, T. Vassallo, J. D. Reiss, FXive: A Web Platform for Procedural Sound Synthesis, Audio Engineering Society Convention 144, May 2018

[4] N. Jillings, Y. Wang, R. Stables and J. D. Reiss, ‘Intelligent audio plugin framework for the Web Audio API,’ Web Audio Conference, London, 2017

[5] N. Jillings, Y. Wang, J. D. Reiss and R. Stables, “JSAP: A Plugin Standard for the Web Audio API with Intelligent Functionality,” 141st Audio Engineering Society Convention, Los Angeles, USA, 2016.

[6] N. Jillings, D. Moffat, B. De Man, J. D. Reiss, R. Stables, ‘Web Audio Evaluation Tool: A framework for subjective assessment of audio,’ 2nd Web Audio Conf., Atlanta, 2016

[7] N. Jillings, B. De Man, D. Moffat and J. D. Reiss, ‘Web Audio Evaluation Tool: A Browser-Based Listening Test Environment,’ Sound and Music Computing (SMC), July 26 – Aug. 1, 2015

Cross-adaptive audio effects: automatic mixing, live performance and everything in between

Our paper on Applications of cross-adaptive audio effects: automatic mixing, live performance and everything in between has just been published in Frontiers in Digital Humanities. It is a systematic review of cross-adaptive audio effects and their applications.

Cross-adaptive effects extend the boundaries of traditional audio effects by having many inputs and outputs, and by deriving their behavior from analysis of the signals and their interaction. This allows the audio effects to adapt to different material, as if they were aware of what they do and listening to the signals. Here’s a block diagram showing how a cross-adaptive audio effect modifies a signal.

cross-adaptive architecture
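The simplest cross-adaptive example is probably ducking: the gain applied to one track is derived from analysis of another track. Here is a minimal sketch in Web Audio JavaScript, where the RMS level of a speech signal drives the gain of a music signal. This is only a toy illustration; the threshold, gain values and smoothing time are arbitrary, and real cross-adaptive systems analyse many signals and control many parameters at once.

// Toy cross-adaptive effect: duck the music when speech is present.
const ctx = new AudioContext();

function duckMusicUnderSpeech(speechNode, musicNode) {
  const analyser = ctx.createAnalyser();
  analyser.fftSize = 2048;
  speechNode.connect(analyser); // analyse the speech signal

  const musicGain = ctx.createGain();
  musicNode.connect(musicGain).connect(ctx.destination);
  speechNode.connect(ctx.destination);

  const frame = new Float32Array(analyser.fftSize);
  function update() {
    analyser.getFloatTimeDomainData(frame);
    let sum = 0;
    for (let i = 0; i < frame.length; i++) sum += frame[i] * frame[i];
    const rms = Math.sqrt(sum / frame.length);
    // One signal's analysis controls another signal's processing:
    const target = rms > 0.05 ? 0.3 : 1.0; // duck by roughly 10 dB when speech is present
    musicGain.gain.setTargetAtTime(target, ctx.currentTime, 0.1);
    requestAnimationFrame(update);
  }
  update();
}

Everything beyond this toy, from multitrack automatic mixing to the performance systems described below, follows the same pattern: analyse a set of signals, then map features of those signals onto the parameters of effects applied to other signals.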

Last year, we published a paper reviewing the history of automatic mixing, almost exactly ten years to the day from when automatic mixing was first extended beyond simple gain changes for speech applications. These automatic mixing applications rely on cross-adaptive effects, but the effects can do so much more.

Here’s an example automatic mixing system from our YouTube channel, IntelligentSoundEng.

When a musician uses the signals of other performers directly to inform the timbral character of her own instrument, it enables a radical expansion of interaction during music making. Exploring this was the goal of the Cross-adaptive processing for musical intervention project, led by Oeyvind Brandtsegg, which we discussed in an earlier blog entry. Using cross-adaptive audio effects, musicians can exert control over the instruments and performance of other musicians, leading both to new competitive aspects and to new synergies.

Here’s a short video demonstrating this.

Despite various projects, research and applications involving cross-adaptive audio effects, there is still a fair amount of confusion surrounding the topic. There are multiple definitions, sometimes even by the same authors. So this paper gives a brief history of applications as well as a classification of effect types, and clarifies issues that have come up in earlier literature. It further defines the field, lays out a formal framework, explores technical aspects and applications, and considers the future from artistic, perceptual, scientific and engineering perspectives.

Check it out!