ML-Agents: lots of zero observations
I am building an AI for a 2-4 player board game, but I want to use the same brain for anywhere from 2 to 4 players. Because of this, if there are only 2 players, for example, all the data about the missing 2 players would just be fed to CollectObservations as a bunch of zeroes (or negative ones, or something). Is this okay to do?
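Roughly what my CollectObservations looks like (a sketch; the PlayerData class and its fields are just placeholders for my actual game state):

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Sensors;

// Placeholder for whatever per-player state the game tracks.
public class PlayerData
{
    public int boardPosition;
    public int coins;
}

public class BoardGameAgent : Agent
{
    const int MaxPlayers = 4;
    public PlayerData[] players; // only 2-4 entries are real in a given game

    public override void CollectObservations(VectorSensor sensor)
    {
        for (int i = 0; i < MaxPlayers; i++)
        {
            if (i < players.Length)
            {
                sensor.AddObservation(players[i].boardPosition);
                sensor.AddObservation(players[i].coins);
            }
            else
            {
                // Seats that are not occupied get padded with zeroes,
                // so the observation vector always has the same size.
                sensor.AddObservation(0);
                sensor.AddObservation(0);
            }
        }
    }
}
```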
Answer by Captain_Pineapple · Sep 04, 2021 at 07:24 AM
Depends on what you mean by "okay": you can do this and Unity/ML-Agents won't complain, but your results will not be good.
Without any further information I cannot really tell you how to approach this, but it sounds to me like you should read more into how to properly design states for Reinforcement Learning agents.
Why?
Because with your current approach your "AI" would not be able to transfer a move learned in a 2-player game to a 4-player game, as the states would be completely different. That basically means you'd have to train the same behaviour three times, at which point it would be more efficient to train three separate brains.
Or look at it from a different angle: if you have N states per player and you train a brain with 4*N inputs, of which 2*N are always zero in a 2-player game, then you could learn (roughly) twice as fast just by cutting the unneeded states.
What you can do to work around this is to rework some states, for example turning "Is player 1 closer than X to me?" / "Is player 2 closer than X to me?" / "Is player 3 closer than X to me?" into "Is any player closer than X to me?" plus a second state, "How many players are closer than X to me?". This way you give the agent similar information while gaining independence from the player count.
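A minimal sketch of that kind of aggregated observation (the `opponents` list and the threshold value are placeholders for however you track the other players):

```csharp
using System.Collections.Generic;
using Unity.MLAgents;
using Unity.MLAgents.Sensors;
using UnityEngine;

public class AggregatedObsAgent : Agent
{
    public List<Transform> opponents;  // however you track the other players
    const float Threshold = 5f;        // the "X" from the example above

    public override void CollectObservations(VectorSensor sensor)
    {
        int closeCount = 0;
        foreach (var opponent in opponents)
            if (Vector3.Distance(transform.position, opponent.position) < Threshold)
                closeCount++;

        sensor.AddObservation(closeCount > 0); // "Is any player closer than X?"
        sensor.AddObservation(closeCount);     // "How many players are closer than X?"
    }
}
```

The key property is that the observation vector has the same size whether there are 2 or 4 players, so the same brain can be reused.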
This abstraction process is not easy. Take your time to do it right, as it will save you a lot of trouble down the road.
Thanks, that makes sense. I like the idea of finding a way to represent state independently of player count.
I think I will use multiple brains. Will it mess anything up to have all the possible agents in the scene (four 4-player agents, three 3-player agents, and two 2-player agents) at the start of training, when only one set is actually being trained per episode? How should I handle the inactive agents? Should I still call EndEpisode on them? Sorry if this seems off topic; it seemed somewhat related to my original question.
Would you mind sharing some info about what this board game is about? Knowing the environment is crucial if you want a proper solution to your problem. What kind of observations do you currently use?
Other than that, the best you can do is have as many agents as possible in the scene at the same time. That way you squeeze the most performance out of your PC (best to aim for around 90% CPU load, in my opinion). I'd still train only one brain at a time per scene if I were creating three separate brains; one way to handle the inactive agent sets is sketched below. I take it from this that you are not able to create more abstract observations that would make your brain independent of the player count?
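For the inactive agents, one common approach (an assumption on my part, not the only way) is to deactivate their GameObjects so they never request decisions; then there is nothing to call EndEpisode on:

```csharp
using UnityEngine;

// Sketch: activate only the agent set you are currently training.
// `agentSets` is a hypothetical grouping, e.g. one parent object for the
// 2-player table, one for the 3-player table, one for the 4-player table.
public class TrainingSetup : MonoBehaviour
{
    public GameObject[] agentSets;
    public int activeSetIndex = 0;

    void Awake()
    {
        for (int i = 0; i < agentSets.Length; i++)
            agentSets[i].SetActive(i == activeSetIndex);
        // Deactivated agents never request decisions, so you do not need
        // to call EndEpisode on them.
    }
}
```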
Here is a GIF of it to give you a better idea.
https://i.imgur.com/y93eARp.gifv
The game consists of rolling dice, answering questions to earn coins, buying and using powerups which help you or hinder opponents, and ultimately getting the shine sprite at the end of the board and returning to the start. Data for each player consists of: position on the board, how many of each of the 5 powerups they have, how many coins they have, whether they have the shine sprite yet, their dice multiplier (from a powerup), and whether they are fire-powered (from a powerup). It's not really a whole lot. Also, due to the nature of the game, once I remove animations I should be able to simulate a huge number of games, so time shouldn't be an issue. There may be a way to abstract it to not depend on the number of players, but it's not a whole lot of data to begin with, so I would prefer the AI to just have all the data and choose based on that.
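Per player, the block of observations would look roughly like this (a sketch; the PlayerState type and field names are placeholders, not my actual code):

```csharp
using Unity.MLAgents.Sensors;

// Placeholder container; field names are made up for illustration.
public struct PlayerState
{
    public int boardPosition;
    public int[] powerupCounts; // length 5, one count per powerup type
    public int coins;
    public bool hasShineSprite;
    public float diceMultiplier;
    public bool isFirePowered;
}

public static class PlayerObservations
{
    // Adds one player's block of observations to the vector sensor.
    public static void Add(VectorSensor sensor, PlayerState p)
    {
        sensor.AddObservation(p.boardPosition);
        for (int i = 0; i < p.powerupCounts.Length; i++)
            sensor.AddObservation(p.powerupCounts[i]);
        sensor.AddObservation(p.coins);
        sensor.AddObservation(p.hasShineSprite);
        sensor.AddObservation(p.diceMultiplier);
        sensor.AddObservation(p.isFirePowered);
    }
}
```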
I am actually thinking of going back to the one-brain implementation, though. The reason is that I don't want the brains for different player counts to drift apart and develop different strategies from one another if training for each player count is isolated. I want 2-, 3-, and 4-player behaviour to be consistent. I am using self-play (all the players are equal), so if I use the same brain it will always play against prior versions of itself, which makes it less likely for one of them to go off on a tangent down a different strategy than the others, if that makes sense.
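The shared-brain setup could look something like this (a sketch under assumptions: the behavior name is made up, and how self-play handles more than two teams depends on your ML-Agents version):

```csharp
using Unity.MLAgents.Policies;
using UnityEngine;

// Sketch: give every agent the same behavior name so one policy is trained
// regardless of player count; team IDs distinguish opponents for self-play.
public class SharedBrainSetup : MonoBehaviour
{
    public BehaviorParameters[] agents; // BehaviorParameters of all seated players

    void Awake()
    {
        for (int i = 0; i < agents.Length; i++)
        {
            agents[i].BehaviorName = "BoardGamePlayer"; // hypothetical shared name
            agents[i].TeamId = i; // each player is its own "team" in a free-for-all
        }
    }
}
```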