Unity ML-Agents doesn't learn well on default examples
I am trying to understand the Unity Machine Learning Agents beta.
I'm training agents on all of the provided example scenes. 3DBall worked fine, but the Area and even the Basic examples are not learning properly. The problem might be in the hyperparameters, but I'm not sure. My hyperparameters are:
### General parameters
max_steps = 5e5 # Set maximum number of steps to run environment.
run_path = "ppo" # The sub-directory name for model and summary statistics
load_model = False # Whether to load a saved model.
train_model = True # Whether to train the model.
summary_freq = 10000 # Frequency at which to save training statistics.
save_freq = 50000 # Frequency at which to save model.
env_name = "basic" # Name of the training environment file.
curriculum_file = None # Path to curriculum JSON file (None disables curriculum learning).
### Algorithm-specific parameters for tuning
gamma = 0.99 # Reward discount rate.
lambd = 0.95 # Lambda parameter for GAE.
time_horizon = 2048 # How many steps to collect per agent before adding to buffer.
beta = 1e-3 # Strength of entropy regularization
num_epoch = 5 # Number of gradient descent steps per batch of experiences.
num_layers = 2 # Number of hidden layers between state/observation encoding and value/policy layers.
epsilon = 0.2 # Acceptable threshold around ratio of old and new policy probabilities.
buffer_size = 2048 # How large the experience buffer should be before gradient descent.
learning_rate = 3e-4 # Model learning rate.
hidden_units = 64 # Number of units in hidden layer.
batch_size = 64 # How many experiences per gradient descent update step.
normalize = False # Whether to normalize the state input.
### Logging dictionary for hyperparameters
hyperparameter_dict = {'max_steps':max_steps, 'run_path':run_path, 'env_name':env_name,
'curriculum_file':curriculum_file, 'gamma':gamma, 'lambd':lambd, 'time_horizon':time_horizon,
'beta':beta, 'num_epoch':num_epoch, 'epsilon':epsilon, 'buffer_size':buffer_size,
'learning_rate':learning_rate, 'hidden_units':hidden_units, 'batch_size':batch_size}
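As an aside, here is a minimal sketch (plain Python, not part of the ML-Agents code) of building this logging dict from the variable names themselves, so a typo in a key can never diverge from the variable it is supposed to track:
_logged_names = ['max_steps', 'run_path', 'env_name', 'curriculum_file',
                 'gamma', 'lambd', 'time_horizon', 'beta', 'num_epoch',
                 'epsilon', 'buffer_size', 'learning_rate', 'hidden_units',
                 'batch_size']
# globals() holds the module-level variables defined above; a misspelled
# name raises KeyError instead of silently logging under a wrong key.
hyperparameter_dict = {name: globals()[name] for name in _logged_names}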
Its mean reward doesn't increase past 0.2 (even though 0.9+ is achievable). If anybody has trained on these examples, can you please tell me what hyperparameters you used?
Thanks in advance
Answer by ArgMagnus · Feb 15, 2018 at 11:21 AM
You can try changing the input state type to continuous in the brain. It is not really a good solution, but it gave me better results for the Basic example. I first tried both one-hot encoding and a number of hyperparameter changes, but switching to continuous, without changing anything else in the example code, got the mean reward up to ~0.85 after 8 million steps. It would still be interesting to hear how someone trained this example more efficiently while keeping the discrete state type, though.
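For anyone comparing the two setups, here is a minimal sketch of what the change amounts to (plain numpy, not the ML-Agents API; the corridor length is an assumption, check the Basic scene for the real value). With a discrete state space the agent's position is a single index that gets one-hot encoded, while a continuous state space feeds the position in directly as a float:
import numpy as np

NUM_STATES = 20  # assumed number of grid positions in the Basic corridor

def discrete_observation(position):
    # Discrete state space: the position index is one-hot encoded,
    # so the network sees a sparse NUM_STATES-dimensional vector.
    one_hot = np.zeros(NUM_STATES, dtype=np.float32)
    one_hot[position] = 1.0
    return one_hot

def continuous_observation(position):
    # Continuous state space: the position is passed as a single float,
    # which is the variant that reached ~0.85 mean reward for me.
    return np.array([float(position)], dtype=np.float32)

print(discrete_observation(3))    # [0. 0. 0. 1. 0. ... 0.]
print(continuous_observation(3))  # [3.]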