Hello. I’m wondering what observation space to use to represent text strings. These would be fed into an RNN at every step of an episode. The issue I’m having is that these strings may vary in length. Does Gym have a solution for this, or do you just encode the string as a vector of bytes and pad the end with zeroes?
Just pad it with zeros.
Just think of your problem the way you would think of developing an AI to play chess or Go.
Probably best to have a maxLen x 26 2D array and one-hot encode it.
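For what it’s worth, here is a minimal sketch of that maxLen x 26 one-hot layout. It assumes the text only contains lowercase a-z; `MAX_LEN` and `one_hot_encode` are made-up names for illustration, not anything Gym provides. Rows past the end of the string stay all zero, which is the padding.

```python
import string

MAX_LEN = 8  # hypothetical maximum string length
ALPHABET = string.ascii_lowercase  # assumes lowercase a-z only (26 symbols)


def one_hot_encode(text, max_len=MAX_LEN):
    """Return a max_len x 26 list of lists; unused rows stay all zero."""
    if len(text) > max_len:
        raise ValueError("text is longer than max_len")
    matrix = [[0] * len(ALPHABET) for _ in range(max_len)]
    for row, char in enumerate(text):
        # str.index raises ValueError for any character outside a-z
        matrix[row][ALPHABET.index(char)] = 1
    return matrix


obs = one_hot_encode("cab")
```

Each used row has exactly one 1 in the column of its letter. A real alphabet (digits, spaces, punctuation) would need more columns than 26.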
Gym doesn’t provide agent code other than a few demos.
RNNs should cope with variable-length strings. It is one of the motivations for the RNN design, that it copes well with sequence data. You have to decide which outputs of the RNN represent your agent’s policy and/or value function. It should be OK to run the RNN multiple times for each time step and take its output at the end of each variable-length observation.
Couple of thoughts:
It may be worth having an “end of observation” token to help the RNN. This could be critical if observations do not have obvious separation between them otherwise.
Padding the input to a fixed shape as frenk1981 suggests is also valid. Which approach is better depends on the details of your problem, and you won’t know for certain unless you try both thoroughly and measure the result.
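To make the “end of observation” idea concrete, here is a rough sketch of the unpadded alternative: each string becomes a variable-length list of byte values followed by a terminator. The id `EOS = 256` and both function names are assumptions chosen for illustration (256 is just the first value outside the byte range).

```python
EOS = 256  # hypothetical end-of-observation id, outside the byte range 0..255


def to_tokens(text):
    """Encode text as UTF-8 byte values followed by the EOS token."""
    return list(text.encode("utf-8")) + [EOS]


def split_observations(stream):
    """Recover the individual strings from a concatenated token stream."""
    observations, current = [], []
    for token in stream:
        if token == EOS:
            observations.append(bytes(current).decode("utf-8"))
            current = []
        else:
            current.append(token)
    # Tokens after the last EOS are an incomplete observation and are dropped.
    return observations


stream = to_tokens("go north") + to_tokens("look")
recovered = split_observations(stream)
```

The agent would feed tokens to the RNN one at a time and read the policy or value output at the step where it sees `EOS`.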
The question wasn’t how the agent should handle the data, but how the environment should set its observation_space field to represent text strings of varying lengths, and how the data should be represented. Best answer I have so far is to use a matrix that can hold the maximum string length and pad the end of the string to the maximum length. Still unclear on whether one-hot is the best representation to have.
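In case it helps anyone else, this is roughly what the padded-bytes variant looks like. It’s only a sketch: `MAX_LEN`, `encode_padded` and `decode_padded` are hypothetical names, and it assumes byte value 0 never appears in the text so it can be reserved for padding. On the environment side, a fixed-length vector like this could presumably be declared as something like `gym.spaces.Box(low=0, high=255, shape=(MAX_LEN,), dtype=np.uint8)`.

```python
MAX_LEN = 16  # hypothetical maximum observation length in bytes


def encode_padded(text, max_len=MAX_LEN):
    """Encode text as UTF-8 byte values, zero-padded to exactly max_len."""
    data = text.encode("utf-8")
    if len(data) > max_len:
        raise ValueError("encoded text is longer than max_len")
    if 0 in data:
        raise ValueError("byte value 0 is reserved for padding")
    return list(data) + [0] * (max_len - len(data))


def decode_padded(obs):
    """Strip the trailing zero padding and decode back to a string."""
    return bytes(obs).rstrip(b"\x00").decode("utf-8")


obs = encode_padded("hi")
```

Every observation then has the same shape regardless of the string length, which is what a fixed-shape observation space requires.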
It really depends on which learning paradigm you choose.
The rationale for one-hot in my mind is simple:
One-hot encoding lets you use the widest set of learning algos with maximum robustness. If you use continuous or discrete data representations instead, such as Discrete x n letters, your network complexity has to go up, and optimizing it becomes very tricky.
Approaches such as evolutionary training and deep forests are actually much easier to set up, and you don’t need to worry about gradients (if your gradient is not smooth, gradient-based training will not find you a global solution of any sort, or a local one for that matter).
To be honest, this whole “deep net” hype is causing a lot of people to waste time training networks that have very low chances of working properly. It is funny that both Amazon and Google released their own neural network frameworks and both run cloud computing services, which by the way are the most crucial parts of their companies (i.e., without AWS, Amazon would not have been profitable, so it does not openly bash TensorFlow: even though TensorFlow is a competitor to MXNet, most of the AWS AI business likely comes from people running TensorFlow, not MXNet).