Imitation learning: student action values are too low
I'm trying to implement imitation learning for a project I am working on using Unity's ml-agents package.
I'm trying to train the AI to play a simple fighting game (move left, move right, jump, three types of punch, dodge left, dodge right). I'm using continuous output, so I get an array of floats that represent different "buttons" being pressed.
The problem I'm running into is that, although I've set up the teacher brain to read my input as a value of 1 whenever I am pressing a button, the student brain produces fairly weak outputs that fluctuate around 0.2. This makes it difficult to translate the student brain's outputs into a set of booleans. Ideally, I want the student brain to learn to jump between values near 0 and values near 1 in the same way that my human input does.
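To make the conversion problem concrete, here is a minimal Python sketch of the kind of thresholding I have in mind. It is not my actual Unity code: the button names, the 0.5 threshold, and the sample values are all made up for illustration.

```python
from typing import List, Sequence

# Hypothetical button order, matching the actions listed above.
BUTTONS = [
    "move_left", "move_right", "jump",
    "punch_a", "punch_b", "punch_c",
    "dodge_left", "dodge_right",
]


def to_pressed(actions: Sequence[float], threshold: float = 0.5) -> List[bool]:
    """Treat each continuous output as 'pressed' if it reaches the threshold."""
    return [value >= threshold for value in actions]


# Teacher-style input: clean 0/1 values (moving right while jumping).
teacher_actions = [0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]

# Student-style output: illustrative values hovering around 0.2.
student_actions = [0.15, 0.24, 0.21, 0.18, 0.19, 0.22, 0.17, 0.20]

teacher_pressed = to_pressed(teacher_actions)
student_pressed = to_pressed(student_actions)

# With a 0.5 threshold the teacher presses two buttons,
# while the student never presses anything.
pressed_names = [name for name, on in zip(BUTTONS, teacher_pressed) if on]
```

Lowering the threshold does not really help, because when every output sits near 0.2 there is no clear gap between "pressed" and "not pressed" to cut along.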
I think I need to edit some of the trainer settings, but I don't know much about them. Do I need to set normalize to false? I also saw settings about collecting multiple sets of observations. Should I reduce that so the teacher brain's button presses aren't being averaged?