Education
March 2019 - Ongoing
PhD - Queen Mary University of London
Researching within the Centre for Digital Music at QMUL. Exploring the use of machine learning and artificial intelligence for music production and mixing.
September 2017 - December 2018
MEng - Cooper Union
Received a master's degree in electrical engineering at the Cooper Union in NYC. My thesis, "Autoencoding Neural Networks as Musical Audio Synthesizers," outlined a methodology for using standard autoencoders for timbre synthesis and novel sound effects.
September 2011 - May 2015
BEng - Cooper Union
Received a bachelor's degree in electrical engineering with a focus on signal processing, along with a minor in mathematics.
Professional Experience
May 2022 - August 2022
iZotope - Research Intern
Worked within iZotope's research team to develop novel machine learning techniques for intelligent audio production. The position was held remotely in NYC while the primary team worked in Boston.
January 2022 - April 2022
Yamaha - Research Intern
Worked within the Vocaloid group to develop novel neural networks for controllable singing voice synthesis. The position was held remotely in London while the primary team worked in Hamamatsu, Japan.
July 2015 - June 2017
Citibank - Enterprise Operations and Technology Infrastructure Analyst
Completed two rotations within Citibank's back-office functions.
Publications
October 2022
Reverse Engineering Memoryless Distortion Effects with Differentiable Waveshapers
Authors: Joseph T Colonel, Marco Comunità, Joshua Reiss
Accepted as a full manuscript to AES NYC 2022
Abstract: We present a lightweight method of reverse engineering distortion effects using Wiener-Hammerstein models implemented in a differentiable framework. The Wiener-Hammerstein models are formulated using graphic equalizer pre-emphasis and de-emphasis filters and a parameterized waveshaping function. Several parameterized waveshaping functions are proposed and evaluated. The performance of each method is measured both objectively and subjectively on a dataset of guitar distortion emulation software plugins and guitar audio samples.
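The core idea lends itself to a compact sketch. Below is a minimal PyTorch toy, not the paper's implementation: the graphic equalizer pre-emphasis and de-emphasis filters are collapsed to scalar gains, and a single tanh stands in for the proposed waveshaping families, but it shows how every stage stays differentiable so the whole chain can be fit to a dry/wet pair by gradient descent.

```python
import torch

class TinyWienerHammerstein(torch.nn.Module):
    """Pre-emphasis -> parameterized waveshaper -> de-emphasis.
    Scalar gains stand in for the paper's graphic EQ filters, and tanh
    stands in for its parameterized waveshaping functions."""
    def __init__(self):
        super().__init__()
        self.pre = torch.nn.Parameter(torch.tensor(1.0))    # pre-emphasis gain
        self.drive = torch.nn.Parameter(torch.tensor(1.0))  # waveshaper drive
        self.post = torch.nn.Parameter(torch.tensor(1.0))   # de-emphasis gain

    def forward(self, x):
        return self.post * torch.tanh(self.drive * self.pre * x)

# Fit the model to a dry/wet pair by gradient descent.
dry = torch.randn(4096)
wet = 0.8 * torch.tanh(3.0 * dry)  # pretend output of a distortion plugin
model = TinyWienerHammerstein()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for step in range(500):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(model(dry), wet)
    loss.backward()
    opt.step()
```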
October 2022
Approximating Ballistics in a Differentiable Dynamic Range Compressor
Authors: Joseph T Colonel, Joshua Reiss
Accepted as an abstract+precis to AES NYC 2022
Abstract: We present a dynamic range compressor with ballistics implemented in a differentiable framework that can be used for differentiable digital signal processing tasks. This compressor can update the values of its threshold, compression ratio, knee width, makeup gain, attack time, and release time using stochastic gradient descent and backpropagation techniques. The performance of this technique is evaluated on a reverse engineering of audio effects task, in which the parameter settings of a dynamic range compressor are inferred from a dry and wet pair of audio samples. Techniques for initializing the parameter estimates in this reverse engineering task are outlined and discussed.
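A minimal PyTorch sketch of the approach, assuming a hard knee and a simple one-pole ballistics smoother (the paper additionally models knee width): because the sample-by-sample recursion is built from differentiable ops, gradients reach the attack and release times as well as the static parameters.

```python
import torch

def diff_compressor(x, threshold_db, ratio, attack_ms, release_ms, makeup_db, sr=44100):
    # Static gain computer (hard knee; the paper also models knee width).
    level_db = 20.0 * torch.log10(torch.abs(x) + 1e-6)
    over = torch.clamp(level_db - threshold_db, min=0.0)
    gain_db = -over * (1.0 - 1.0 / ratio)
    # One-pole ballistics: attack when reduction deepens, release otherwise.
    a_att = torch.exp(-1.0 / (attack_ms * 1e-3 * sr))
    a_rel = torch.exp(-1.0 / (release_ms * 1e-3 * sr))
    smoothed, g = [], torch.zeros(())
    for g_target in gain_db:
        a = torch.where(g_target < g, a_att, a_rel)
        g = a * g + (1.0 - a) * g_target
        smoothed.append(g)
    return x * 10.0 ** ((torch.stack(smoothed) + makeup_db) / 20.0)

# Reverse engineering: recover the settings behind a dry/wet pair.
dry = torch.randn(512)
wet = diff_compressor(dry, torch.tensor(-30.0), torch.tensor(8.0),
                      torch.tensor(5.0), torch.tensor(50.0), torch.tensor(3.0))
params = {k: torch.tensor(v, requires_grad=True)
          for k, v in [("threshold_db", -20.0), ("ratio", 4.0),
                       ("attack_ms", 10.0), ("release_ms", 100.0),
                       ("makeup_db", 0.0)]}
opt = torch.optim.Adam(params.values(), lr=1e-2)
for step in range(50):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(diff_compressor(dry, **params), wet)
    loss.backward()
    opt.step()
```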
May 2022
Direct Design of Biquad Filter Cascades with Deep Learning by Sampling Random Polynomials
Authors: Joseph T Colonel, Christian J Steinmetz, Marcus Michelen, Joshua Reiss
Accepted to ICASSP 2022
Abstract: Designing infinite impulse response filters to match an arbitrary magnitude response requires specialized techniques. Methods like modified Yule-Walker are relatively efficient, but may not be sufficiently accurate in matching high order responses. On the other hand, iterative optimization techniques often enable superior performance, but come at the cost of longer run-times and are sensitive to initial conditions, requiring manual tuning. In this work, we address some of these limitations by learning a direct mapping from the target magnitude response to the filter coefficient space with a neural network trained on millions of random filters. We demonstrate our approach enables both fast and accurate estimation of filter coefficients given a desired response. We investigate training with different families of random filters, and find training with a variety of filter families enables better generalization when estimating real-world filters, using head-related transfer functions and guitar cabinets as case studies. We compare our method against existing methods including modified Yule-Walker and gradient descent and show our approach is, on average, both faster and more accurate.
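As a sketch of the training-data side, one "family" of random filters can be drawn by sampling conjugate pole/zero pairs inside the unit circle; the (magnitude response, coefficients) pairs then supervise a network that maps responses directly to a biquad cascade. The sampling scheme below is illustrative, not the paper's exact procedure.

```python
import numpy as np
from scipy.signal import sosfreqz

def random_biquad_cascade(n_sections, rng):
    """Draw a random stable cascade: conjugate pole/zero pairs inside the
    unit circle, one of many possible random-filter families."""
    sos = []
    for _ in range(n_sections):
        rz, rp = rng.uniform(0.3, 0.99, size=2)   # radii < 1 => stable
        wz, wp = rng.uniform(0.0, np.pi, size=2)  # pole/zero angles
        b = np.poly([rz * np.exp(1j * wz), rz * np.exp(-1j * wz)]).real
        a = np.poly([rp * np.exp(1j * wp), rp * np.exp(-1j * wp)]).real
        sos.append(np.concatenate([b, a]))
    return np.asarray(sos)

rng = np.random.default_rng(0)
sos = random_biquad_cascade(n_sections=4, rng=rng)
w, h = sosfreqz(sos, worN=512)
target_db = 20.0 * np.log10(np.abs(h) + 1e-8)
# Millions of (target_db, sos) pairs like this supervise a network that
# maps a magnitude response directly to cascade coefficients.
```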
July 2021
Reverse Engineering of a Recording Mix with Differentiable Digital Signal Processing
Authors: Joseph T Colonel, Joshua Reiss
Published in the Journal of the Acoustical Society of America's special issue on Machine Learning for Acoustics
Abstract: A method to retrieve the parameters used to create a multitrack mix using only raw tracks and the stereo mixdown is presented. This method is able to model linear time-invariant effects such as gain, pan, equalisation, delay, and reverb. Nonlinear effects, such as distortion and compression, are not considered in this work. The optimization procedure used is stochastic gradient descent, aided by differentiable digital signal processing modules. This method allows for a fully interpretable representation of the mixing signal chain by explicitly modelling the audio effects rather than using differentiable black-box modules. Two reverb module architectures are proposed, a “stereo reverb” model and an “individual reverb” model, and each is discussed. Objective feature measures are taken of the outputs of the two architectures when tasked with estimating a target mix and compared against a stereo gain mix baseline. A listening study is performed to measure how closely the two architectures can perceptually match a reference mix when compared to a stereo gain mix. Results show that the stereo reverb model performs best on objective measures and there is no statistically significant difference between the participants' perception of the stereo reverb model and reference mixes.
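A stripped-down sketch of the idea, covering only the gain and pan stages of the signal chain (the paper also models equalisation, delay, and reverb): each mixing parameter is a plain tensor, so the estimated console settings remain directly interpretable after optimization.

```python
import torch

n_tracks, n = 4, 44100
tracks = torch.randn(n_tracks, n)   # raw tracks (stand-in audio)
target = torch.randn(2, n)          # in practice, the real stereo mixdown

gain_db = torch.zeros(n_tracks, requires_grad=True)  # per-track gain
pan = torch.zeros(n_tracks, requires_grad=True)      # per-track pan

opt = torch.optim.Adam([gain_db, pan], lr=1e-2)
for step in range(200):
    opt.zero_grad()
    g = 10.0 ** (gain_db / 20.0)
    theta = (torch.tanh(pan) + 1.0) * torch.pi / 4.0  # constant-power pan law
    left = ((g * torch.cos(theta)).unsqueeze(1) * tracks).sum(0)
    right = ((g * torch.sin(theta)).unsqueeze(1) * tracks).sum(0)
    loss = torch.nn.functional.mse_loss(torch.stack([left, right]), target)
    loss.backward()
    opt.step()
# gain_db and pan now hold directly interpretable mixing parameters.
```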
October 2020
Low Latency Timbre Interpolation and Warping Using Autoencoding Neural Networks
Authors: Joseph T Colonel, Sam Keene
Accepted as a full manuscript to AES NYC 2020
Abstract: A lightweight algorithm for low latency timbre interpolation of two input audio streams using an autoencoding neural network is presented. Short-time Fourier transform magnitude frames of each audio stream are encoded, and a new interpolated representation is created within the autoencoder's latent space. This new representation is passed to the decoder, which outputs a spectrogram. An initial phase estimation for the new spectrogram is calculated using the original phase of the two audio streams. Inversion to the time domain is done using a Griffin-Lim iteration. A method for avoiding pops between processed batches is discussed. An open source implementation in Python is made available.
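In outline, the algorithm reduces to a few steps. The sketch below uses placeholder encode/decode functions where the trained autoencoder would sit, and substitutes librosa's Griffin-Lim for brevity (the paper instead seeds the phase estimate from the two input streams before iterating).

```python
import numpy as np
import librosa

def encode(mag):   # placeholder for the trained encoder network
    return mag

def decode(z):     # placeholder for the trained decoder network
    return z

x1 = np.random.randn(22050)  # stand-ins for the two input audio streams
x2 = np.random.randn(22050)
S1 = np.abs(librosa.stft(x1, n_fft=1024))
S2 = np.abs(librosa.stft(x2, n_fft=1024))

alpha = 0.5                                          # interpolation knob
z = alpha * encode(S1) + (1.0 - alpha) * encode(S2)  # mix in latent space
S_new = decode(z)

# Invert the interpolated magnitude spectrogram back to audio.
y = librosa.griffinlim(S_new, n_iter=32, n_fft=1024)
```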
July 2020
Conditioning Autoencoder Latent Spaces for Real-Time Timbre Synthesis
Authors: Joseph T Colonel, Sam Keene
Accepted to IJCNN 2020
Abstract: We compare the performance of standard autoencoder topologies for timbre generation. We demonstrate how different activation functions used in the autoencoder's bottleneck distribute a training corpus's embedding. We show that the choice of sigmoid activation in the bottleneck produces a more bounded and uniformly distributed embedding than a leaky rectified linear unit activation. We propose a one-hot encoded chroma feature vector for use in both input augmentation and latent space conditioning. We measure the performance of these networks, and characterize the latent embeddings that arise from the use of this chroma conditioning vector. An open source, real-time timbre synthesis algorithm in Python is outlined and shared.
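A minimal PyTorch sketch of the conditioning scheme, with illustrative layer sizes rather than the paper's: the 12-dimensional one-hot chroma vector is appended both to the input frame (input augmentation) and to the bottleneck code (latent space conditioning), and the bottleneck uses the sigmoid activation the paper favors.

```python
import torch

class ChromaConditionedAE(torch.nn.Module):
    def __init__(self, n_bins=513, n_latent=8):
        super().__init__()
        self.encoder = torch.nn.Sequential(
            torch.nn.Linear(n_bins + 12, 256), torch.nn.LeakyReLU(),
            torch.nn.Linear(256, n_latent), torch.nn.Sigmoid())  # bounded bottleneck
        self.decoder = torch.nn.Sequential(
            torch.nn.Linear(n_latent + 12, 256), torch.nn.LeakyReLU(),
            torch.nn.Linear(256, n_bins), torch.nn.ReLU())  # nonnegative magnitudes

    def forward(self, mag, chroma):
        z = self.encoder(torch.cat([mag, chroma], dim=-1))   # input augmentation
        return self.decoder(torch.cat([z, chroma], dim=-1))  # latent conditioning

model = ChromaConditionedAE()
mag = torch.rand(16, 513)  # batch of STFT magnitude frames
chroma = torch.nn.functional.one_hot(torch.randint(0, 12, (16,)), 12).float()
recon = model(mag, chroma)
```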
October 2019
Authors: Joseph T Colonel, Joshua Reiss
Accepted as an E-Brief to AES NYC 2019
Abstract: We investigate listener preference in multitrack music production using the Mix Evaluation Dataset, comprising 184 mixes across 19 songs. Features are extracted from verses and choruses of stereo mixdowns. Each observation is associated with an average listener preference rating and standard deviation of preference ratings. Principal component analysis is performed to analyze how mixes vary within the feature space. We demonstrate that virtually no correlation is found between the embedded features and either average preference or standard deviation of preference. We instead propose using principal component projections as a semantic embedding space by associating each observation with listener comments from the Mix Evaluation Dataset. Initial results disagree with simple descriptions such as “width” or “loudness” for principal component axes.
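The analysis pipeline is straightforward to sketch: with stand-in data in place of the extracted features and ratings, it amounts to a PCA projection followed by a correlation test per component.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
features = rng.normal(size=(184, 20))    # stand-in for the extracted mix features
mean_pref = rng.uniform(0.0, 1.0, 184)   # stand-in for average preference ratings

proj = PCA(n_components=2).fit_transform(features)
for i in range(proj.shape[1]):
    r, p = pearsonr(proj[:, i], mean_pref)
    print(f"PC{i + 1} vs. mean preference: r={r:.3f}, p={p:.3f}")
```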
September 2018
Autoencoding Neural Networks as Musical Audio Synthesizers
Authors: Joseph T Colonel, Christopher Curro, Sam Keene
Accepted to DAFx 2018
Abstract: A method for musical audio synthesis using autoencoding neural networks is proposed. The autoencoder is trained to compress and reconstruct magnitude short-time Fourier transform frames. The autoencoder produces a spectrogram by activating its smallest hidden layer, and a phase response is calculated using real-time phase gradient heap integration. Taking an inverse short-time Fourier transform produces the audio signal. Our algorithm is lightweight when compared to current state-of-the-art audio-producing machine learning algorithms. We outline our design process, produce metrics, and detail an open-source Python implementation of our model.
October 2017
Authors: Joseph T Colonel, Christopher Curro, Sam Keene
Accepted as abstract+precis to AES NYC 2017
Abstract: We present a novel architecture for a synthesizer based on an autoencoder that compresses and reconstructs magnitude short-time Fourier transform frames. This architecture outperforms previous topologies by using improved regularization, employing several activation functions, creating a focused training corpus, and implementing the Adam learning method. By multiplying gains to the hidden layer, users can alter the autoencoder’s output, which opens up a palette of sounds unavailable to additive/subtractive synthesizers. Furthermore, our architecture can be quickly re-trained on any sound domain, making it flexible for music synthesis applications. Samples of the autoencoder’s outputs can be found at http://soundcloud.com/ann_synth, and the code used to generate and train the autoencoder is open source, hosted at http://github.com/JTColonel/ann_synth.
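The "multiplying gains to the hidden layer" interaction is simple to illustrate; in this sketch a placeholder module stands in for the trained autoencoder's decoder half.

```python
import torch

# Placeholder decoder standing in for the trained autoencoder's decoder half.
decoder = torch.nn.Sequential(torch.nn.Linear(8, 513), torch.nn.ReLU())

z = torch.rand(1, 8)             # latent code for one encoded input frame
gains = torch.ones(1, 8)
gains[0, 3] = 2.5                # user boosts a single hidden-layer unit
mag_warped = decoder(z * gains)  # altered spectrogram frame, ready for inversion
```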