The purpose of an autoencoder is to get a latent representation z of some input X, where z = g(X).
If we had an oracle that told us the true latent representation z for each X, we could train the function g in a supervised manner and be done. We don't have such an oracle, so instead we create a decoder d(z) = X' which reconstructs the original input from the latent representation, and we compare the reconstruction to the original input.
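As a sketch of that standard setup, here is a minimal linear autoencoder trained by gradient descent in NumPy (the toy data, dimensions, and learning rate are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))        # toy data standing in for real inputs
W_e = rng.normal(size=(8, 3)) * 0.1  # encoder g: X -> z
W_d = rng.normal(size=(3, 8)) * 0.1  # decoder d: z -> X'
lr = 0.01

losses = []
for _ in range(500):
    Z = X @ W_e                      # z = g(X)
    X_rec = Z @ W_d                  # X' = d(z)
    err = X_rec - X
    losses.append((err ** 2).mean()) # reconstruction MSE
    # gradients of the MSE w.r.t. both weight matrices
    gW_d = Z.T @ err * (2 / err.size)
    gW_e = X.T @ (err @ W_d.T) * (2 / err.size)
    W_e -= lr * gW_e
    W_d -= lr * gW_d
```

The decoder exists only to provide a training signal: the loss compares X' to X, and its gradient flows back through d into g.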
Can't we train the function g without needing a decoder d? I was thinking that perhaps we can simply take the gradient of z with respect to X as our reconstruction. In TensorFlow this is a one-line change:
```python
# Instead of:
reconstruction = decoder(z)
# we use (tf.gradients returns a list of tensors, hence the [0]):
reconstruction = tf.gradients(ys=z, xs=X)[0]
```
I wrote up some code comparing this 'virtual' autoencoder to a standard convolutional autoencoder (CAE) on MNIST. It appears to train, and it reaches a significantly lower MSE than the standard CAE.
It’s not learning a useful representation of the data, but what is it doing?
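One way to see what it might be doing: tf.gradients(ys=z, xs=X) computes the gradient of sum(z) with respect to X. For a purely linear encoder z = Wx, that gradient is just the column sums of W, so it does not depend on the input at all. A tiny pure-Python sketch (the 2x3 weight matrix is hypothetical):

```python
# Hypothetical linear encoder z = W @ x with 2 latent units, 3 inputs.
W = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]

def encode(x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def grad_sum_z_wrt_x(x):
    # d(sum_i z_i)/dx_j = sum_i W[i][j] — note that x never appears.
    return [sum(row[j] for row in W) for j in range(len(x))]

print(grad_sum_z_wrt_x([0.0, 0.0, 0.0]))   # [5.0, 7.0, 9.0]
print(grad_sum_z_wrt_x([9.0, -3.0, 2.0]))  # [5.0, 7.0, 9.0] — same for any input
```

If the "reconstruction" is (near-)constant in X, the MSE can be driven down simply by shaping the encoder weights so this constant matches the average image, which might explain the low error without any useful per-input representation. (With a nonlinear encoder the gradient does vary with X through the activation pattern, but it is still a per-pixel sensitivity map of the latent units, not a reconstruction.)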
The code can be found at: https://github.com/silverjoda/Joda_research/tree/master/Machinelearning/Virtual_autoencoder