It really depends on which learning paradigm you choose.
The rationale for one-hot encoding, in my mind, is simple:
One-hot encoding allows you to use the widest set of learning algorithms with maximum robustness. If you instead use denser continuous or discrete representations (for example, a discrete integer code over n letters), your network's complexity has to go up, and how you go about optimizing it becomes very tricky and hard.
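To make that concrete, here is a quick Python sketch of one-hot versus integer encoding (the four-letter vocabulary is just an illustration, not from the original question):

```python
import numpy as np

# Hypothetical toy vocabulary; any categorical feature works the same way.
letters = ["a", "b", "c", "d"]
index = {ch: i for i, ch in enumerate(letters)}

def one_hot(ch):
    """Return a one-hot vector for a single letter."""
    vec = np.zeros(len(letters))
    vec[index[ch]] = 1.0
    return vec

# One-hot: each category gets its own orthogonal dimension,
# so no algorithm can mistake "d" for being "four times more" than "a".
print(one_hot("c"))   # [0. 0. 1. 0.]

# Integer encoding: compact, but it imposes a fake ordering (a < b < c < d)
# that the model then has to spend capacity learning to undo.
print(index["c"])     # 2
```

The fake ordering is exactly the kind of thing that forces the extra network complexity mentioned above.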
Approaches such as evolutionary training and deep forests are actually much easier to set up, and you don't need to worry about gradients (if your gradient is not, say, smooth, gradient descent will not find you a global solution of any sort, or even a local one for that matter).
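For a sense of why gradient-free methods sidestep that problem, here is a minimal (1+1) evolution strategy sketch in Python; the non-smooth objective and the mutation scale are illustrative assumptions, not from any particular framework:

```python
import numpy as np

rng = np.random.default_rng(0)

def loss(x):
    # Deliberately non-smooth objective (a kink plus a staircase);
    # gradient descent struggles here, random mutation does not care.
    return abs(x[0] - 3.0) + np.floor(abs(x[1]))

# (1+1) evolution strategy: keep a single parent, mutate it,
# and accept the child only if it scores at least as well.
parent = rng.normal(size=2)
best = loss(parent)
for _ in range(2000):
    child = parent + rng.normal(scale=0.5, size=2)
    score = loss(child)
    if score <= best:
        parent, best = child, score

print(parent, best)  # lands near x[0] = 3 despite having no usable gradient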
To be honest, this whole “deep net” hype is causing a lot of people to waste time training networks that have very low chances of working properly. It is funny that both Amazon and Google released their neural network frameworks while also running cloud computing services, which are by the way among the most crucial parts of their companies (i.e., without AWS, Amazon would not have been profitable). Hence Amazon does not go around openly bashing TensorFlow: even though it is a competitor to MXNet, most of the AWS AI business likely comes from people running TensorFlow, not MXNet.