Education

March 2019 - Ongoing

PhD - Queen Mary University of London

Researching within the Centre for Digital Music at QMUL, exploring the use of machine learning and artificial intelligence for music production and mixing.

September 2017 - December 2018

MEng - Cooper Union

Received a master's degree in electrical engineering from the Cooper Union in NYC. My thesis, Autoencoding Neural Networks as Musical Audio Synthesizers, outlined a methodology for using standard autoencoders for timbre synthesis and novel sound effects.

September 2011 - May 2015

BEng - Cooper Union

Received a bachelor's degree in electrical engineering with a focus on signal processing, along with a minor in mathematics.

Professional Experience

May 2022 - August 2022

iZotope - Research Intern

Worked within iZotope's research team to develop novel machine learning techniques for intelligent audio production. The position was held remotely in NYC while the primary team worked in Boston.

January 2022 - April 2022

Yamaha - Research Intern

Worked within the Vocaloid group to develop novel neural networks for controllable singing voice synthesis. The position was held remotely in London while the primary team worked in Hamamatsu, Japan.

July 2015 - June 2017

Citibank - Enterprise Operations and Technology Infrastructure Analyst

Held two rotations within Citibank's back-office functions.


Publications

October 2022

Reverse Engineering Memoryless Distortion Effects with Differentiable Waveshapers

Authors: Joseph T Colonel, Marco Comunità, Joshua Reiss

Accepted as a full manuscript to AES NYC 2022

Abstract: We present a lightweight method of reverse engineering distortion effects using Wiener-Hammerstein models implemented in a differentiable framework. The Wiener-Hammerstein models are formulated using graphic equalizer pre-emphasis and de-emphasis filters and a parameterized waveshaping function. Several parameterized waveshaping functions are proposed and evaluated. The performance of each method is measured both objectively and subjectively on a dataset of guitar distortion emulation software plugins and guitar audio samples.
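
The Wiener-Hammerstein structure described above (filter, static nonlinearity, filter) can be sketched as follows. This is a minimal sketch under stated assumptions: a first-order pre-emphasis filter and its exact inverse stand in for the graphic equalizer filters, and a `tanh` with a hypothetical `drive` parameter stands in for the proposed parameterized waveshapers. All function names and the parameter `a` are hypothetical, and nothing here is differentiable or trained.

```python
import math
from typing import List


def pre_emphasis(x: List[float], a: float) -> List[float]:
    """First-order FIR pre-emphasis: y[n] = x[n] - a * x[n-1]."""
    y, prev = [], 0.0
    for sample in x:
        y.append(sample - a * prev)
        prev = sample
    return y


def de_emphasis(x: List[float], a: float) -> List[float]:
    """First-order IIR de-emphasis, the exact inverse of pre_emphasis: y[n] = x[n] + a * y[n-1]."""
    y, prev = [], 0.0
    for sample in x:
        prev = sample + a * prev
        y.append(prev)
    return y


def tanh_waveshaper(x: List[float], drive: float) -> List[float]:
    """Memoryless nonlinearity; `drive` sets how hard the signal is pushed into saturation."""
    return [math.tanh(drive * sample) for sample in x]


def wiener_hammerstein(x: List[float], a: float = 0.5, drive: float = 5.0) -> List[float]:
    """Pre-emphasis filter, then static waveshaper, then the inverse (de-emphasis) filter."""
    return de_emphasis(tanh_waveshaper(pre_emphasis(x, a), drive), a)
```

Because the de-emphasis filter undoes the pre-emphasis, a very quiet input with `drive=1.0` passes through almost unchanged, while louder inputs saturate more strongly in the high frequencies that the pre-emphasis boosts. In the paper, the filter and waveshaper parameters are fit within a differentiable framework rather than fixed by hand.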

October 2022

Approximating Ballistics in a Differentiable Dynamic Range Compressor

Authors: Joseph T Colonel, Joshua Reiss

Accepted as an abstract+precis to AES NYC 2022

Abstract: We present a dynamic range compressor with ballistics implemented in a differentiable framework that can be used for differentiable digital signal processing tasks. This compressor can update the values of its threshold, compression ratio, knee width, makeup gain, attack time, and release time using stochastic gradient descent and backpropagation techniques. The performance of this technique is evaluated on a reverse engineering of audio effects task, in which the parameter settings of a dynamic range compressor are inferred from a dry and wet pair of audio samples. Techniques for initializing the parameter estimates in this reverse engineering task are outlined and discussed. 
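
A forward pass of a feed-forward compressor using the parameters listed above (threshold, ratio, knee width, makeup gain, attack time, release time) can be sketched as follows. This assumes a common textbook design: a soft-knee gain computer in the decibel domain followed by one-pole attack/release smoothing of the gain. The paper's differentiable implementation may differ, and the function names and default values here are hypothetical.

```python
import math
from typing import List


def gain_computer(level_db: float, threshold_db: float, ratio: float, knee_db: float) -> float:
    """Static soft-knee compression curve: input level (dB) to output level (dB)."""
    overshoot = level_db - threshold_db
    if 2.0 * overshoot < -knee_db:
        return level_db  # below the knee: no compression
    if knee_db > 0.0 and 2.0 * abs(overshoot) <= knee_db:
        # inside the knee: quadratic blend between the two straight segments
        return level_db + (1.0 / ratio - 1.0) * (overshoot + knee_db / 2.0) ** 2 / (2.0 * knee_db)
    return threshold_db + overshoot / ratio  # above the knee: full ratio


def compress(x: List[float], fs: float, threshold_db: float = -20.0, ratio: float = 4.0,
             knee_db: float = 6.0, makeup_db: float = 0.0,
             attack_ms: float = 5.0, release_ms: float = 50.0) -> List[float]:
    """Hypothetical sample-by-sample compressor with attack/release ballistics on the gain."""
    alpha_attack = math.exp(-1.0 / (attack_ms * 1e-3 * fs))
    alpha_release = math.exp(-1.0 / (release_ms * 1e-3 * fs))
    smoothed_gain_db = 0.0
    y = []
    for sample in x:
        level_db = 20.0 * math.log10(abs(sample) + 1e-8)
        target_gain_db = gain_computer(level_db, threshold_db, ratio, knee_db) - level_db
        # more gain reduction requested -> attack; less -> release
        alpha = alpha_attack if target_gain_db < smoothed_gain_db else alpha_release
        smoothed_gain_db = alpha * smoothed_gain_db + (1.0 - alpha) * target_gain_db
        y.append(sample * 10.0 ** ((smoothed_gain_db + makeup_db) / 20.0))
    return y
```

With a constant full-scale input and these defaults, the output settles at the static curve's value of -15 dB (about 0.178), while signals well below the threshold pass unchanged. The branch between the attack and release coefficients is the ballistics step; a trainable version would need to express this computation in an automatic differentiation framework.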

May 2022

Direct Design of Biquad Filter Cascades with Deep Learning by Sampling Random Polynomials

Authors: Joseph T Colonel, Christian J Steinmetz, Marcus Michelen, Joshua Reiss

Accepted to ICASSP 2022


Abstract: Designing infinite impulse response filters to match an arbitrary magnitude response requires specialized techniques. Methods like modified Yule-Walker are relatively efficient, but may not be sufficiently accurate in matching high order responses. On the other hand, iterative optimization techniques often enable superior performance, but come at the cost of longer run-times and are sensitive to initial conditions, requiring manual tuning. In this work, we address some of these limitations by learning a direct mapping from the target magnitude response to the filter coefficient space with a neural network trained on millions of random filters. We demonstrate our approach enables both fast and accurate estimation of filter coefficients given a desired response. We investigate training with different families of random filters, and find training with a variety of filter families enables better generalization when estimating real-world filters, using head-related transfer functions and guitar cabinets as case studies. We compare our method against existing methods including modified Yule-Walker and gradient descent and show our approach is, on average, both faster and more accurate.
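
To make the filter coefficient space concrete, the sketch below evaluates the magnitude response of a cascade of biquads and draws random stable sections to form (response, coefficients) pairs of the kind a network could be trained on. The sampling scheme (one conjugate pole pair and one conjugate zero pair per section, with uniformly drawn radius and angle) is an assumption for illustration, not necessarily one of the filter families used in the paper, and the neural network itself is not shown. All function names are hypothetical.

```python
import cmath
import math
import random


def biquad_response(b, a, w):
    """Complex frequency response of one biquad section at angular frequency w (rad/sample)."""
    z_inv = cmath.exp(-1j * w)
    num = b[0] + b[1] * z_inv + b[2] * z_inv ** 2
    den = a[0] + a[1] * z_inv + a[2] * z_inv ** 2
    return num / den


def cascade_magnitude_db(sections, n_freqs=512):
    """Magnitude response in dB of a biquad cascade on n_freqs points from DC to Nyquist."""
    mags = []
    for k in range(n_freqs):
        w = math.pi * k / (n_freqs - 1)
        h = 1.0 + 0.0j
        for b, a in sections:
            h *= biquad_response(b, a, w)  # a cascade multiplies the section responses
        mags.append(20.0 * math.log10(abs(h) + 1e-12))
    return mags


def random_stable_section(rng):
    """Hypothetical sampler: poles strictly inside the unit circle, zeros anywhere in a disk."""
    r_pole, th_pole = rng.uniform(0.0, 0.99), rng.uniform(0.0, math.pi)
    r_zero, th_zero = rng.uniform(0.0, 1.5), rng.uniform(0.0, math.pi)
    # (1 - r e^{j th} z^-1)(1 - r e^{-j th} z^-1) = 1 - 2 r cos(th) z^-1 + r^2 z^-2
    b = [1.0, -2.0 * r_zero * math.cos(th_zero), r_zero ** 2]
    a = [1.0, -2.0 * r_pole * math.cos(th_pole), r_pole ** 2]
    return b, a


def random_training_pair(n_sections, rng, n_freqs=512):
    """One (target magnitude response, ground-truth coefficients) example."""
    sections = [random_stable_section(rng) for _ in range(n_sections)]
    return cascade_magnitude_db(sections, n_freqs), sections
```

Generating millions of such pairs is cheap compared with running an iterative optimizer per target, which is the trade-off the abstract describes.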

July 2021

Reverse Engineering of a Recording Mix with Differentiable Digital Signal Processing

Authors: Joseph T Colonel, Joshua Reiss

Published in the Journal of the Acoustical Society of America's special issue on Machine Learning for Acoustics


Abstract: A method to retrieve the parameters used to create a multitrack mix using only raw tracks and the stereo mixdown is presented. This method is able to model linear time-invariant effects such as gain, pan, equalisation, delay, and reverb. Nonlinear effects, such as distortion and compression, are not considered in this work. The optimization procedure used is stochastic gradient descent with the aid of differentiable digital signal processing modules. This method allows for a fully interpretable representation of the mixing signal chain by explicitly modelling the audio effects rather than using differentiable black-box modules. Two reverb module architectures are proposed, a “stereo reverb” model and an “individual reverb” model, and each is discussed. Objective feature measures are taken of the outputs of the two architectures when tasked with estimating a target mix and compared against a stereo gain mix baseline. A listening study is performed to measure how closely the two architectures can perceptually match a reference mix when compared to a stereo gain mix. Results show that the stereo reverb model performs best on objective measures and that there is no statistically significant difference between the participants' perception of the stereo reverb model and reference mixes.
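
The stereo gain mix baseline mentioned above can be sketched as gradient descent on per-track left and right gains, reducing the mean squared error against each channel of the target mixdown. This is a hypothetical toy: the tracks are three orthogonal sinusoids, the gradient is derived by hand rather than by automatic differentiation, and the equalisation, delay, and reverb modules of the full method are omitted.

```python
import math
from typing import List


def mix_channel(tracks: List[List[float]], gains: List[float]) -> List[float]:
    """Sum each track scaled by its gain, sample by sample."""
    n = len(tracks[0])
    return [sum(g * t[k] for g, t in zip(gains, tracks)) for k in range(n)]


def fit_channel_gains(tracks, target, lr=0.5, steps=100):
    """Gradient descent on mean((mix - target)^2) with respect to the per-track gains."""
    n = len(target)
    gains = [0.0] * len(tracks)
    for _ in range(steps):
        error = [est - ref for est, ref in zip(mix_channel(tracks, gains), target)]
        # d/dg_i of mean(error^2) is (2/n) * sum(error * track_i)
        grads = [2.0 / n * sum(err * trk[k] for k, err in enumerate(error)) for trk in tracks]
        gains = [g - lr * d for g, d in zip(gains, grads)]
    return gains


def fit_stereo_gain_mix(tracks, mix_left, mix_right, lr=0.5, steps=100):
    """Fit independent left and right gains for every track."""
    return (fit_channel_gains(tracks, mix_left, lr, steps),
            fit_channel_gains(tracks, mix_right, lr, steps))


# Toy example: three orthogonal "tracks" and a known stereo gain mix.
N = 64
example_tracks = [
    [math.sin(2 * math.pi * 3 * k / N) for k in range(N)],
    [math.sin(2 * math.pi * 7 * k / N) for k in range(N)],
    [math.cos(2 * math.pi * 5 * k / N) for k in range(N)],
]
true_left = [0.8, 0.5, 0.3]
true_right = [0.2, 0.5, 0.9]
est_left, est_right = fit_stereo_gain_mix(
    example_tracks,
    mix_channel(example_tracks, true_left),
    mix_channel(example_tracks, true_right),
)
```

With these orthogonal tracks the recovered gains match `true_left` and `true_right` to numerical precision; real stems are correlated, so the learning rate and number of steps would need tuning.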

October 2020

Low Latency Timbre Interpolation and Warping Using Autoencoding Neural Networks

Authors: Joseph T Colonel, Sam Keene

Accepted as a full manuscript to AES NYC 2020


Abstract: A lightweight algorithm for low latency timbre interpolation of two input audio streams using an autoencoding neural network is presented. Short-time Fourier transform magnitude frames of each audio stream are encoded, and a new interpolated representation is created within the autoencoder's latent space. This new representation is passed to the decoder, which outputs a spectrogram. An initial phase estimation for the new spectrogram is calculated using the original phase of the two audio streams. Inversion to the time domain is done using a Griffin-Lim iteration. A method for avoiding pops between processed batches is discussed. An open source implementation in Python is made available.

July 2020

Conditioning Autoencoder Latent Spaces for Real-Time Timbre Synthesis

Authors: Joseph T Colonel, Sam Keene

Accepted to IJCNN 2020


Abstract: We compare the performance of standard autoencoder topologies for timbre generation. We demonstrate how different activation functions used in the autoencoder's bottleneck distribute a training corpus's embedding. We show that the choice of sigmoid activation in the bottleneck produces a more bounded and uniformly distributed embedding than a leaky rectified linear unit activation. We propose a one-hot encoded chroma feature vector for use in both input augmentation and latent space conditioning. We measure the performance of these networks, and characterize the latent embeddings that arise from the use of this chroma conditioning vector. An open source, real-time timbre synthesis algorithm in Python is outlined and shared.
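
A one-hot chroma vector of the kind described above can be built from a MIDI note number by taking its pitch class. How the vector is attached to the network is an assumption here: the sketch simply concatenates it to a feature (or latent) vector, and the helper names are hypothetical.

```python
from typing import List, Sequence

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def chroma_one_hot(pitch_class: str) -> List[float]:
    """12-dimensional one-hot vector with a 1 at the given pitch class."""
    vec = [0.0] * 12
    vec[PITCH_CLASSES.index(pitch_class)] = 1.0
    return vec


def midi_to_chroma_one_hot(midi_note: int) -> List[float]:
    """Pitch class repeats every 12 semitones; MIDI note 60 is C and 69 is A."""
    return chroma_one_hot(PITCH_CLASSES[midi_note % 12])


def condition(features: Sequence[float], midi_note: int) -> List[float]:
    """Hypothetical conditioning: append the chroma vector to a feature vector."""
    return list(features) + midi_to_chroma_one_hot(midi_note)
```

Because only the pitch class is encoded, notes an octave apart share the same conditioning vector.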

October 2019

Authors: Joseph T Colonel, Joshua Reiss

Accepted as an E-Brief to AES NYC 2019

Abstract: We investigate listener preference in multitrack music production using the Mix Evaluation Dataset, which comprises 184 mixes across 19 songs. Features are extracted from the verses and choruses of stereo mixdowns. Each observation is associated with an average listener preference rating and the standard deviation of preference ratings. Principal component analysis is performed to analyze how mixes vary within the feature space. We demonstrate that virtually no correlation is found between the embedded features and either average preference or standard deviation of preference. We instead propose using principal component projections as a semantic embedding space by associating each observation with listener comments from the Mix Evaluation Dataset. Initial results do not support simple descriptions such as “width” or “loudness” for the principal component axes.
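
The principal component analysis step can be sketched with power iteration on the feature covariance matrix. This is a minimal stand-in for a library PCA: it recovers only the first principal component, assumes each row is one mix's feature vector, and uses toy data rather than features from the Mix Evaluation Dataset. The function names are hypothetical.

```python
import math
from typing import List


def center(data: List[List[float]]) -> List[List[float]]:
    """Subtract the per-feature mean from every row."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in data]


def covariance(centered: List[List[float]]) -> List[List[float]]:
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)] for i in range(d)]


def first_principal_component(data: List[List[float]], iters: int = 200) -> List[float]:
    """Power iteration: repeatedly multiply by the covariance and renormalize."""
    cov = covariance(center(data))
    d = len(cov)
    v = [1.0] * d  # assumes the start vector is not orthogonal to the top component
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def project(data: List[List[float]], component: List[float]) -> List[float]:
    """Coordinate of each centered observation along the given component."""
    return [sum(a * b for a, b in zip(row, component)) for row in center(data)]
```

Correlating these projections with per-mix preference ratings, or attaching listener comments to them, would follow the analysis the abstract describes.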

September 2018

Authors: Joseph T Colonel, Christopher Curro, Sam Keene

Accepted to DAFx 2018

Abstract: A method for musical audio synthesis using autoencoding neural networks is proposed. The autoencoder is trained to compress and reconstruct magnitude short-time Fourier transform frames. The autoencoder produces a spectrogram by activating its smallest hidden layer, and a phase response is calculated using real-time phase gradient heap integration. Taking an inverse short-time Fourier transform produces the audio signal. Our algorithm is lightweight when compared to current state-of-the-art audio-producing machine learning algorithms. We outline our design process, produce metrics, and detail an open-source Python implementation of our model.

October 2017

Authors: Joseph T Colonel, Christopher Curro, Sam Keene

Accepted as an abstract+precis to AES NYC 2017

Abstract: We present a novel architecture for a synthesizer based on an autoencoder that compresses and reconstructs magnitude short-time Fourier transform frames. This architecture outperforms previous topologies by using improved regularization, employing several activation functions, creating a focused training corpus, and implementing the Adam learning method. By applying gains to the hidden layer, users can alter the autoencoder’s output, which opens up a palette of sounds unavailable to additive/subtractive synthesizers. Furthermore, our architecture can be quickly re-trained on any sound domain, making it flexible for music synthesis applications. Samples of the autoencoder’s outputs can be found at http://soundcloud.com/ann_synth, and the code used to generate and train the autoencoder is open source, hosted at http://github.com/JTColonel/ann_synth.