CANNe, or Cooper's Autoencoding Neural Network, is the second iteration of the project that became my master's thesis. It is a full-fledged software synthesizer that explores the latent space constructed by an autoencoding neural network. It was accepted as a paper at DAFx-18 in Aveiro, Portugal.
First, a full autoencoder is trained to compress and reconstruct spectrograms of input sounds. The encoder is then discarded so that a musician can set the latent activations directly, exploring the space constructed during training. These activations are passed to the decoder, which outputs a spectrogram.
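The inference path above can be sketched in a few lines. This is a minimal stand-in, not the actual CANNe model: the latent and spectrogram dimensions, the single-layer decoder, and the random weights are all assumptions for illustration (a real model would load trained weights and likely stack several layers).

```python
import numpy as np

# Hypothetical dimensions; the actual CANNe model sizes are not given here.
LATENT_DIM = 8    # one "slider" per latent unit
SPEC_BINS = 513   # magnitude bins of one spectrogram frame

rng = np.random.default_rng(0)
# Stand-in for trained decoder weights (a real synth would load these).
W = rng.standard_normal((SPEC_BINS, LATENT_DIM)) * 0.1
b = np.zeros(SPEC_BINS)

def decode(z):
    """Map a latent activation vector to one non-negative spectrogram frame.

    ReLU keeps magnitudes non-negative; a trained decoder plays the same
    role, just with learned weights and more layers.
    """
    return np.maximum(W @ z + b, 0.0)

# Slider positions chosen by the musician replace the discarded encoder.
z = np.array([0.9, 0.0, 0.3, 0.0, 0.0, 0.7, 0.0, 0.1])
frame = decode(z)
```

The point of the sketch is the control flow: no audio ever enters the model at synthesis time; the user's slider values are the only input.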
The main challenge with this project came when trying to invert the spectrogram produced by the decoder. Because the spectrogram is generated purely from latent activations, it carries no phase information and is therefore not directly invertible. To overcome this, I used an initial phase estimate produced by the real-time phase gradient heap integration (RTPGHI) method, then refined it with iterative Griffin-Lim phase estimation.
Below is an example of CANNe trained on bootleg bass stems.
Note: In this implementation of CANNe, the note-selection panel along the top is ignored. I neglected to remove it before recording.