Education

March 2019 - Ongoing

PhD - Queen Mary University of London

Researching within the Centre for Digital Music at QMUL, exploring the use of machine learning and artificial intelligence for music production and mixing.

September 2017 - December 2018

MEng - Cooper Union

Received a master's degree in electrical engineering from the Cooper Union in NYC. My thesis, Autoencoding Neural Networks as Musical Audio Synthesizers, outlined a methodology for using standard autoencoders for timbre synthesis and novel sound effects.

September 2011 - May 2015

BEng - Cooper Union

Received a bachelor's degree in electrical engineering with a focus in signal processing, along with a minor in mathematics.

Publications

July 2021

Authors: Joseph T Colonel, Joshua Reiss

Published in the Journal of the Acoustical Society of America's special issue on Machine Learning for Acoustics


Abstract: A method to retrieve the parameters used to create a multitrack mix using only raw tracks and the stereo mixdown is presented. This method is able to model linear time-invariant effects such as gain, pan, equalisation, delay, and reverb. Nonlinear effects, such as distortion and compression, are not considered in this work. The optimization procedure used is stochastic gradient descent, aided by differentiable digital signal processing modules. This method allows for a fully interpretable representation of the mixing signal chain by explicitly modelling the audio effects rather than using differentiable black-box modules. Two reverb module architectures are proposed, a “stereo reverb” model and an “individual reverb” model, and each is discussed. Objective feature measures are taken of the outputs of the two architectures when tasked with estimating a target mix and compared against a stereo gain mix baseline. A listening study is performed to measure how closely the two architectures can perceptually match a reference mix when compared to a stereo gain mix. Results show that the stereo reverb model performs best on objective measures and that there is no statistically significant difference between participants' perception of the stereo reverb model and the reference mixes.
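
A minimal sketch of the core idea, fitting only per-track gain and pan to a target mixdown with stochastic gradient descent (the published method also models equalisation, delay, and reverb; all names and data below are placeholders, and PyTorch is assumed):

```python
import math
import torch

# Toy illustration: estimate per-track gain and pan by gradient descent so
# that the summed stereo mix matches a target mixdown. Random tensors stand
# in for real audio; the published signal chain is far more complete.
n_tracks, n_samples = 4, 44100
tracks = torch.randn(n_tracks, n_samples)          # stand-in raw tracks
target_mix = torch.randn(2, n_samples)             # stand-in reference mixdown

gains = torch.zeros(n_tracks, requires_grad=True)  # log-gain parameters
pans = torch.zeros(n_tracks, requires_grad=True)   # pan parameters

optimizer = torch.optim.Adam([gains, pans], lr=1e-2)

for step in range(1000):
    optimizer.zero_grad()
    g = torch.exp(gains).unsqueeze(1)              # positive linear gains
    p = torch.sigmoid(pans).unsqueeze(1)           # pan position in [0, 1]
    left = (g * torch.cos(0.5 * math.pi * p) * tracks).sum(dim=0)
    right = (g * torch.sin(0.5 * math.pi * p) * tracks).sum(dim=0)
    mix = torch.stack([left, right])
    loss = torch.mean((mix - target_mix) ** 2)     # simple time-domain loss
    loss.backward()
    optimizer.step()
```

Because every stage of the chain is an explicit, differentiable DSP operation, the fitted gains and pans can be read off directly once the loss converges, which is what makes the representation interpretable.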

October 2020

Low Latency Timbre Interpolation and Warping Using Autoencoding Neural Networks

Authors: Joseph T Colonel, Sam Keene

Accepted as a full manuscript to AES NYC 2020


Abstract: A lightweight algorithm for low latency timbre interpolation of two input audio streams using an autoencoding neural network is presented. Short-time Fourier transform magnitude frames of each audio stream are encoded, and a new interpolated representation is created within the autoencoder's latent space. This new representation is passed to the decoder, which outputs a spectrogram. An initial phase estimation for the new spectrogram is calculated using the original phase of the two audio streams. Inversion to the time domain is done using a Griffin-Lim iteration. A method for avoiding pops between processed batches is discussed. An open source implementation in Python is made available.
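
A minimal sketch of the interpolation data flow, with untrained linear layers standing in for the trained encoder/decoder and synthetic tones standing in for the two input streams (PyTorch and librosa assumed; this is not the paper's open-source code):

```python
import numpy as np
import torch
import librosa

sr, n_fft = 22050, 2048
n_bins, latent_dim = n_fft // 2 + 1, 64
encoder = torch.nn.Linear(n_bins, latent_dim)           # stand-in for a trained encoder
decoder = torch.nn.Linear(latent_dim, n_bins)           # stand-in for a trained decoder

t = np.linspace(0, 2.0, 2 * sr, endpoint=False)
y_a = np.sin(2 * np.pi * 220 * t).astype(np.float32)    # stand-in input stream A
y_b = np.sin(2 * np.pi * 330 * t).astype(np.float32)    # stand-in input stream B

def mag_frames(y):
    # Magnitude STFT frames, shaped (frames, bins)
    return torch.from_numpy(np.abs(librosa.stft(y, n_fft=n_fft)).T).float()

a, b = mag_frames(y_a), mag_frames(y_b)
alpha = 0.5                                             # interpolation weight

with torch.no_grad():
    z = alpha * encoder(a) + (1 - alpha) * encoder(b)   # blend in latent space
    mag = torch.relu(decoder(z)).numpy().T              # back to (bins, frames)

# The paper seeds the inversion with a phase estimate built from the two
# inputs' original phases; here Griffin-Lim simply starts from random phase.
y_out = librosa.griffinlim(mag, n_iter=32, n_fft=n_fft)
```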

July 2020

Conditioning Autoencoder Latent Spaces for Real-Time Timbre Synthesis

Authors: Joseph T Colonel, Sam Keene

Accepted to IJCNN 2020


Abstract: We compare standard autoencoder topologies' performances for timbre generation. We demonstrate how different activation functions used in the autoencoder's bottleneck distribute a training corpus's embedding. We show that the choice of sigmoid activation in the bottleneck produces a more bounded and uniformly distributed embedding than a leaky rectified linear unit activation. We propose a one-hot encoded chroma feature vector for use in both input augmentation and latent space conditioning. We measure the performance of these networks and characterize the latent embeddings that arise from the use of this chroma conditioning vector. An open source, real-time timbre synthesis algorithm in Python is outlined and shared.
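
An illustrative sketch of the chroma conditioning idea, with hypothetical layer sizes and untrained PyTorch modules (not the published model): the one-hot chroma vector is concatenated to the input frame and again to the bottleneck code before decoding.

```python
import torch
import torch.nn as nn

n_bins, latent_dim, n_chroma = 1025, 64, 12

encoder = nn.Sequential(
    nn.Linear(n_bins + n_chroma, 512), nn.LeakyReLU(),
    nn.Linear(512, latent_dim), nn.Sigmoid(),      # sigmoid bottleneck keeps codes in [0, 1]
)
decoder = nn.Sequential(
    nn.Linear(latent_dim + n_chroma, 512), nn.LeakyReLU(),
    nn.Linear(512, n_bins), nn.ReLU(),
)

frame = torch.rand(1, n_bins)                      # stand-in magnitude STFT frame
chroma = torch.zeros(1, n_chroma)
chroma[0, 9] = 1.0                                 # one-hot pitch class, e.g. A

z = encoder(torch.cat([frame, chroma], dim=1))     # input augmentation
recon = decoder(torch.cat([z, chroma], dim=1))     # latent space conditioning
```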

October 2019

Authors: Joseph T Colonel, Joshua Reiss

Accepted as an E-Brief to AES NYC 2019

Abstract: We investigate listener preference in multitrack music production using the Mix Evaluation Dataset, comprising 184 mixes across 19 songs. Features are extracted from verses and choruses of stereo mixdowns. Each observation is associated with an average listener preference rating and standard deviation of preference ratings. Principal component analysis is performed to analyze how mixes vary within the feature space. We demonstrate that virtually no correlation is found between the embedded features and either average preference or standard deviation of preference. We instead propose using principal component projections as a semantic embedding space by associating each observation with listener comments from the Mix Evaluation Dataset. Initial results disagree with simple descriptions such as “width” or “loudness” for principal component axes.
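
An illustrative sketch of the analysis pipeline using random stand-in data (scikit-learn and NumPy assumed; not the study's code): project per-mix feature vectors onto principal components and check how strongly each component correlates with mean listener preference.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
features = rng.normal(size=(184, 20))     # placeholder: one feature vector per mix
preference = rng.uniform(size=184)        # placeholder: mean listener preference per mix

pca = PCA(n_components=5)
projections = pca.fit_transform(features)

for i in range(projections.shape[1]):
    r = np.corrcoef(projections[:, i], preference)[0, 1]
    print(f"PC{i + 1}: explained var {pca.explained_variance_ratio_[i]:.2f}, corr {r:+.2f}")
```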

September 2018

Authors: Joseph T Colonel, Christopher Curro, Sam Keene

Accepted to DAFx 2018

Abstract: A method for musical audio synthesis using autoencoding neural networks is proposed. The autoencoder is trained to compress and reconstruct magnitude short-time Fourier transform frames. The autoencoder produces a spectrogram by activating its smallest hidden layer, and a phase response is calculated using real-time phase gradient heap integration. Taking an inverse short-time Fourier transform produces the audio signal. Our algorithm is lightweight when compared to current state-of-the-art audio-producing machine learning algorithms. We outline our design process, produce metrics, and detail an open-source Python implementation of our model.
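
A minimal training sketch of the magnitude-frame autoencoder, with a synthetic tone standing in for the training corpus (PyTorch and librosa assumed; the published model pairs this with real-time phase gradient heap integration for phase recovery, which is not reproduced here):

```python
import numpy as np
import torch
import torch.nn as nn
import librosa

sr, n_fft = 22050, 2048
t = np.linspace(0, 4.0, 4 * sr, endpoint=False)
corpus = np.sin(2 * np.pi * 220 * t).astype(np.float32)     # stand-in training audio

frames = np.abs(librosa.stft(corpus, n_fft=n_fft)).T        # magnitude frames, (frames, bins)
x = torch.from_numpy(frames).float()

n_bins, latent_dim = n_fft // 2 + 1, 8
model = nn.Sequential(
    nn.Linear(n_bins, 256), nn.ReLU(),
    nn.Linear(256, latent_dim), nn.ReLU(),                  # smallest hidden layer
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, n_bins), nn.ReLU(),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(50):
    optimizer.zero_grad()
    loss = torch.mean((model(x) - x) ** 2)                  # reconstruct the frames
    loss.backward()
    optimizer.step()
```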

October 2017

Authors: Joseph T Colonel, Christopher Curro, Sam Keene

Accepted as abstract+precis to AES NYC 2017

Abstract: We present a novel architecture for a synthesizer based on an autoencoder that compresses and reconstructs magnitude short-time Fourier transform frames. This architecture outperforms previous topologies by using improved regularization, employing several activation functions, creating a focused training corpus, and implementing the Adam learning method. By multiplying gains to the hidden layer, users can alter the autoencoder's output, which opens up a palette of sounds unavailable to additive/subtractive synthesizers. Furthermore, our architecture can be quickly re-trained on any sound domain, making it flexible for music synthesis applications. Samples of the autoencoder's outputs can be found at http://soundcloud.com/ann_synth, and the code used to generate and train the autoencoder is open source, hosted at http://github.com/JTColonel/ann_synth.
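
An illustrative sketch of the "multiplying gains to the hidden layer" idea, with untrained PyTorch layers and hypothetical sizes (not the released ann_synth code): user-controlled gains scale the bottleneck activations before decoding, steering the output away from plain reconstruction.

```python
import torch
import torch.nn as nn

n_bins, latent_dim = 1025, 8
encoder = nn.Sequential(nn.Linear(n_bins, latent_dim), nn.ReLU())   # stand-in encoder
decoder = nn.Sequential(nn.Linear(latent_dim, n_bins), nn.ReLU())   # stand-in decoder

frame = torch.rand(1, n_bins)                  # stand-in magnitude STFT frame
user_gains = torch.tensor([1.0, 0.0, 2.5, 1.0, 0.2, 1.0, 3.0, 0.5])

z = encoder(frame)
altered = decoder(z * user_gains)              # scaled latent code -> altered spectrum
```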