Education
March 2019 - Ongoing
PhD - Queen Mary University of London
Researching within the Centre for Digital Music at QMUL. Exploring the use of machine learning and artificial intelligence for music production and mixing.
September 2017 - December 2018
MEng - Cooper Union
Received a master's degree in electrical engineering at the Cooper Union in NYC. My thesis, "Autoencoding Neural Networks as Musical Audio Synthesizers," outlined a methodology for using standard autoencoders for timbre synthesis and novel sound effects.
September 2011 - May 2015
BEng - Cooper Union
Received a bachelor's degree in electrical engineering with a focus on signal processing, along with a minor in mathematics.
Professional Experience
May 2022 - August 2022
iZotope - Research Intern
Worked within iZotope's research team to develop novel machine learning techniques for intelligent audio production. The position was held remotely in NYC while the primary team worked in Boston.
January 2022 - April 2022
Yamaha - Research Intern
Worked within the Vocaloid group to develop novel neural networks for controllable singing voice synthesis. The position was held remotely in London while the primary team worked in Hamamatsu, Japan.
July 2015 - June 2017
Citibank - Enterprise Operations and Technology Infrastructure Analyst
Completed two rotations within Citibank's back-office functions.
Publications
October 2022
Reverse Engineering Memoryless Distortion Effects with Differentiable Waveshapers
Authors: Joseph T Colonel, Marco Comunità, Joshua Reiss
Accepted as a full manuscript to AES NYC 2022
Abstract: We present a lightweight method of reverse engineering distortion effects using Wiener-Hammerstein models implemented in a differentiable framework. The Wiener-Hammerstein models are formulated using graphic equalizer pre-emphasis and de-emphasis filters and a parameterized waveshaping function. Several parameterized waveshaping functions are proposed and evaluated. The performance of each method is measured both objectively and subjectively on a dataset of guitar distortion emulation software plugins and guitar audio samples.
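The core idea lends itself to a compact sketch. Below is a minimal PyTorch toy, not the paper's implementation: the graphic equalizer pre-emphasis and de-emphasis filters are collapsed to scalar gains, and a single tanh stands in for the proposed waveshaping families, but it shows how every stage stays differentiable so the whole chain can be fit to a dry/wet pair by gradient descent.

```python
import torch

class TinyWienerHammerstein(torch.nn.Module):
    """Pre-emphasis -> parameterized waveshaper -> de-emphasis.
    Scalar gains stand in for the paper's graphic EQ filters, and tanh
    stands in for its parameterized waveshaping functions."""
    def __init__(self):
        super().__init__()
        self.pre = torch.nn.Parameter(torch.tensor(1.0))    # pre-emphasis gain
        self.drive = torch.nn.Parameter(torch.tensor(1.0))  # waveshaper drive
        self.post = torch.nn.Parameter(torch.tensor(1.0))   # de-emphasis gain

    def forward(self, x):
        return self.post * torch.tanh(self.drive * self.pre * x)

# Fit the model to a dry/wet pair by gradient descent.
dry = torch.randn(4096)
wet = 0.8 * torch.tanh(3.0 * dry)  # pretend output of a distortion plugin
model = TinyWienerHammerstein()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for step in range(500):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(model(dry), wet)
    loss.backward()
    opt.step()
```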
October 2022
Approximating Ballistics in a Differentiable Dynamic Range Compressor
Authors: Joseph T Colonel, Joshua Reiss
Accepted as an abstract+precis to AES NYC 2022
Abstract: We present a dynamic range compressor with ballistics implemented in a differentiable framework that can be used for differentiable digital signal processing tasks. This compressor can update the values of its threshold, compression ratio, knee width, makeup gain, attack time, and release time using stochastic gradient descent and backpropagation techniques. The performance of this technique is evaluated on a reverse engineering of audio effects task, in which the parameter settings of a dynamic range compressor are inferred from a dry and wet pair of audio samples. Techniques for initializing the parameter estimates in this reverse engineering task are outlined and discussed.
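A minimal PyTorch sketch of the approach, assuming a hard knee and a simple one-pole ballistics smoother (the paper additionally models knee width): because the sample-by-sample recursion is built from differentiable ops, gradients reach the attack and release times as well as the static parameters.

```python
import torch

def diff_compressor(x, threshold_db, ratio, attack_ms, release_ms, makeup_db, sr=44100):
    # Static gain computer (hard knee; the paper also models knee width).
    level_db = 20.0 * torch.log10(torch.abs(x) + 1e-6)
    over = torch.clamp(level_db - threshold_db, min=0.0)
    gain_db = -over * (1.0 - 1.0 / ratio)
    # One-pole ballistics: attack when reduction deepens, release otherwise.
    a_att = torch.exp(-1.0 / (attack_ms * 1e-3 * sr))
    a_rel = torch.exp(-1.0 / (release_ms * 1e-3 * sr))
    smoothed, g = [], torch.zeros(())
    for g_target in gain_db:
        a = torch.where(g_target < g, a_att, a_rel)
        g = a * g + (1.0 - a) * g_target
        smoothed.append(g)
    return x * 10.0 ** ((torch.stack(smoothed) + makeup_db) / 20.0)

# Reverse engineering: recover the settings behind a dry/wet pair.
dry = torch.randn(512)
wet = diff_compressor(dry, torch.tensor(-30.0), torch.tensor(8.0),
                      torch.tensor(5.0), torch.tensor(50.0), torch.tensor(3.0))
params = {k: torch.tensor(v, requires_grad=True)
          for k, v in [("threshold_db", -20.0), ("ratio", 4.0),
                       ("attack_ms", 10.0), ("release_ms", 100.0),
                       ("makeup_db", 0.0)]}
opt = torch.optim.Adam(params.values(), lr=1e-2)
for step in range(50):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(diff_compressor(dry, **params), wet)
    loss.backward()
    opt.step()
```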
May 2022
Direct Design of Biquad Filter Cascades with Deep Learning by Sampling Random Polynomials
Authors: Joseph T Colonel, Christian J Steinmetz, Marcus Michelen, Joshua Reiss
Accepted to ICASSP 2022
Abstract: Designing infinite impulse response filters to match an arbitrary magnitude response requires specialized techniques. Methods like modified Yule-Walker are relatively efficient, but may not be sufficiently accurate in matching high order responses. On the other hand, iterative optimization techniques often enable superior performance, but come at the cost of longer run-times and are sensitive to initial conditions, requiring manual tuning. In this work, we address some of these limitations by learning a direct mapping from the target magnitude response to the filter coefficient space with a neural network trained on millions of random filters. We demonstrate our approach enables both fast and accurate estimation of filter coefficients given a desired response. We investigate training with different families of random filters, and find training with a variety of filter families enables better generalization when estimating real-world filters, using head-related transfer functions and guitar cabinets as case studies. We compare our method against existing methods including modified Yule-Walker and gradient descent and show our approach is, on average, both faster and more accurate.
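As a sketch of the training-data side, one "family" of random filters can be drawn by sampling conjugate pole/zero pairs inside the unit circle; the (magnitude response, coefficients) pairs then supervise a network that maps responses directly to a biquad cascade. The sampling scheme below is illustrative, not the paper's exact procedure.

```python
import numpy as np
from scipy.signal import sosfreqz

def random_biquad_cascade(n_sections, rng):
    """Draw a random stable cascade: conjugate pole/zero pairs inside the
    unit circle, one of many possible random-filter families."""
    sos = []
    for _ in range(n_sections):
        rz, rp = rng.uniform(0.3, 0.99, size=2)   # radii < 1 => stable
        wz, wp = rng.uniform(0.0, np.pi, size=2)  # pole/zero angles
        b = np.poly([rz * np.exp(1j * wz), rz * np.exp(-1j * wz)]).real
        a = np.poly([rp * np.exp(1j * wp), rp * np.exp(-1j * wp)]).real
        sos.append(np.concatenate([b, a]))
    return np.asarray(sos)

rng = np.random.default_rng(0)
sos = random_biquad_cascade(n_sections=4, rng=rng)
w, h = sosfreqz(sos, worN=512)
target_db = 20.0 * np.log10(np.abs(h) + 1e-8)
# Millions of (target_db, sos) pairs like this supervise a network that
# maps a magnitude response directly to cascade coefficients.
```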
July 2021
Reverse Engineering of a Recording Mix with Differentiable Digital Signal Processing
Authors: Joseph T Colonel, Joshua Reiss
Published in the Journal of the Acoustical Society of America's special issue on Machine Learning for Acoustics
Abstract: A method to retrieve the parameters used to create a multitrack mix using only raw tracks and the stereo mixdown is presented. This method is able to model linear time-invariant effects such as gain, pan, equalisation, delay, and reverb. Nonlinear effects, such as distortion and compression, are not considered in this work. The optimization procedure used is stochastic gradient descent, aided by differentiable digital signal processing modules. This method allows for a fully interpretable representation of the mixing signal chain by explicitly modelling the audio effects rather than using differentiable black-box modules. Two reverb module architectures are proposed, a “stereo reverb” model and an “individual reverb” model, and each is discussed. Objective feature measures are taken of the outputs of the two architectures when tasked with estimating a target mix and compared against a stereo gain mix baseline. A listening study is performed to measure how closely the two architectures can perceptually match a reference mix when compared to a stereo gain mix. Results show that the stereo reverb model performs best on objective measures and there is no statistically significant difference between the participants' perception of the stereo reverb model and reference mixes.
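A stripped-down sketch of the idea, covering only the gain and pan stages of the signal chain (the paper also models equalisation, delay, and reverb): each mixing parameter is a plain tensor, so the estimated console settings remain directly interpretable after optimization.

```python
import torch

n_tracks, n = 4, 44100
tracks = torch.randn(n_tracks, n)   # raw tracks (stand-in audio)
target = torch.randn(2, n)          # in practice, the real stereo mixdown

gain_db = torch.zeros(n_tracks, requires_grad=True)  # per-track gain
pan = torch.zeros(n_tracks, requires_grad=True)      # per-track pan

opt = torch.optim.Adam([gain_db, pan], lr=1e-2)
for step in range(200):
    opt.zero_grad()
    g = 10.0 ** (gain_db / 20.0)
    theta = (torch.tanh(pan) + 1.0) * torch.pi / 4.0  # constant-power pan law
    left = ((g * torch.cos(theta)).unsqueeze(1) * tracks).sum(0)
    right = ((g * torch.sin(theta)).unsqueeze(1) * tracks).sum(0)
    loss = torch.nn.functional.mse_loss(torch.stack([left, right]), target)
    loss.backward()
    opt.step()
# gain_db and pan now hold directly interpretable mixing parameters.
```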
October 2020
Low Latency Timbre Interpolation and Warping Using Autoencoding Neural Networks
Authors: Joseph T Colonel, Sam Keene
Accepted as a full manuscript to AES NYC 2020
Abstract: A lightweight algorithm for low latency timbre interpolation of two input audio streams using an autoencoding neural network is presented. Short-time Fourier transform magnitude frames of each audio stream are encoded, and a new interpolated representation is created within the autoencoder's latent space. This new representation is passed to the decoder, which outputs a spectrogram. An initial phase estimation for the new spectrogram is calculated using the original phase of the two audio streams. Inversion to the time domain is done using a Griffin-Lim iteration. A method for avoiding pops between processed batches is discussed. An open source implementation in Python is made available.
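In outline, the algorithm reduces to a few steps. The sketch below uses placeholder encode/decode functions where the trained autoencoder would sit, and substitutes librosa's Griffin-Lim for brevity (the paper instead seeds the phase estimate from the two input streams before iterating).

```python
import numpy as np
import librosa

def encode(mag):   # placeholder for the trained encoder network
    return mag

def decode(z):     # placeholder for the trained decoder network
    return z

x1 = np.random.randn(22050)  # stand-ins for the two input audio streams
x2 = np.random.randn(22050)
S1 = np.abs(librosa.stft(x1, n_fft=1024))
S2 = np.abs(librosa.stft(x2, n_fft=1024))

alpha = 0.5                                          # interpolation knob
z = alpha * encode(S1) + (1.0 - alpha) * encode(S2)  # mix in latent space
S_new = decode(z)

# Invert the interpolated magnitude spectrogram back to audio.
y = librosa.griffinlim(S_new, n_iter=32, n_fft=1024)
```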
July 2020
Conditioning Autoencoder Latent Spaces for Real-Time Timbre Synthesis
Authors: Joseph T Colonel, Sam Keene
Accepted to IJCNN 2020
Abstract: We compare the performance of standard autoencoder topologies for timbre generation. We demonstrate how different activation functions used in the autoencoder's bottleneck distribute a training corpus's embedding. We show that the choice of sigmoid activation in the bottleneck produces a more bounded and uniformly distributed embedding than a leaky rectified linear unit activation. We propose a one-hot encoded chroma feature vector for use in both input augmentation and latent space conditioning. We measure the performance of these networks, and characterize the latent embeddings that arise from the use of this chroma conditioning vector. An open source, real-time timbre synthesis algorithm in Python is outlined and shared.
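A minimal PyTorch sketch of the conditioning scheme, with illustrative layer sizes rather than the paper's: the 12-dimensional one-hot chroma vector is appended both to the input frame (input augmentation) and to the bottleneck code (latent space conditioning), and the bottleneck uses the sigmoid activation the paper favors.

```python
import torch

class ChromaConditionedAE(torch.nn.Module):
    def __init__(self, n_bins=513, n_latent=8):
        super().__init__()
        self.encoder = torch.nn.Sequential(
            torch.nn.Linear(n_bins + 12, 256), torch.nn.LeakyReLU(),
            torch.nn.Linear(256, n_latent), torch.nn.Sigmoid())  # bounded bottleneck
        self.decoder = torch.nn.Sequential(
            torch.nn.Linear(n_latent + 12, 256), torch.nn.LeakyReLU(),
            torch.nn.Linear(256, n_bins), torch.nn.ReLU())  # nonnegative magnitudes

    def forward(self, mag, chroma):
        z = self.encoder(torch.cat([mag, chroma], dim=-1))   # input augmentation
        return self.decoder(torch.cat([z, chroma], dim=-1))  # latent conditioning

model = ChromaConditionedAE()
mag = torch.rand(16, 513)  # batch of STFT magnitude frames
chroma = torch.nn.functional.one_hot(torch.randint(0, 12, (16,)), 12).float()
recon = model(mag, chroma)
```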
October 2019
Authors: Joseph T Colonel, Joshua Reiss
Accepted as an E-Brief to AES NYC 2019
Abstract: We investigate listener preference in multitrack music production using the Mix Evaluation Dataset, comprising 184 mixes across 19 songs. Features are extracted from verses and choruses of stereo mixdowns. Each observation is associated with an average listener preference rating and standard deviation of preference ratings. Principal component analysis is performed to analyze how mixes vary within the feature space. We demonstrate that virtually no correlation is found between the embedded features and either average preference or standard deviation of preference. We instead propose using principal component projections as a semantic embedding space by associating each observation with listener comments from the Mix Evaluation Dataset. Initial results disagree with simple descriptions such as “width” or “loudness” for principal component axes.
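The analysis pipeline is straightforward to sketch: with stand-in data in place of the extracted features and ratings, it amounts to a PCA projection followed by a correlation test per component.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
features = rng.normal(size=(184, 20))    # stand-in for the extracted mix features
mean_pref = rng.uniform(0.0, 1.0, 184)   # stand-in for average preference ratings

proj = PCA(n_components=2).fit_transform(features)
for i in range(proj.shape[1]):
    r, p = pearsonr(proj[:, i], mean_pref)
    print(f"PC{i + 1} vs. mean preference: r={r:.3f}, p={p:.3f}")
```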
September 2018
Autoencoding Neural Networks as Musical Audio Synthesizers
Authors: Joseph T Colonel, Christopher Curro, Sam Keene
Accepted to DAFx 2018
Abstract: A method for musical audio synthesis using autoencoding neural networks is proposed. The autoencoder is trained to compress and reconstruct magnitude short-time Fourier transform frames. The autoencoder produces a spectrogram by activating its smallest hidden layer, and a phase response is calculated using real-time phase gradient heap integration. Taking an inverse short-time Fourier transform produces the audio signal. Our algorithm is lightweight when compared to current state-of-the-art audio-producing machine learning algorithms. We outline our design process, produce metrics, and detail an open-source Python implementation of our model.
October 2017
Authors: Joseph T Colonel, Christopher Curro, Sam Keene
Accepted as abstract+precis to AES NYC 2017
Abstract: We present a novel architecture for a synthesizer based on an autoencoder that compresses and reconstructs magnitude short-time Fourier transform frames. This architecture outperforms previous topologies by using improved regularization, employing several activation functions, creating a focused training corpus, and implementing the Adam learning method. By multiplying gains to the hidden layer, users can alter the autoencoder’s output, which opens up a palette of sounds unavailable to additive/subtractive synthesizers. Furthermore, our architecture can be quickly re-trained on any sound domain, making it flexible for music synthesis applications. Samples of the autoencoder’s outputs can be found at http://soundcloud.com/ann_synth, and the code used to generate and train the autoencoder is open source, hosted at http://github.com/JTColonel/ann_synth.
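The "multiplying gains to the hidden layer" interaction is simple to illustrate; in this sketch a placeholder module stands in for the trained autoencoder's decoder half.

```python
import torch

# Placeholder decoder standing in for the trained autoencoder's decoder half.
decoder = torch.nn.Sequential(torch.nn.Linear(8, 513), torch.nn.ReLU())

z = torch.rand(1, 8)             # latent code for one encoded input frame
gains = torch.ones(1, 8)
gains[0, 3] = 2.5                # user boosts a single hidden-layer unit
mag_warped = decoder(z * gains)  # altered spectrogram frame, ready for inversion
```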