Tracking cumulative reward in ML-Agents for zero-sum games using self-play: does self-play focus on one agent in the scene at a time?
I want to be able to track the learning progress of my agents. Here is a GIF of my game: https://imgur.com/y93eARp. There are 2-4 agents that share the same policy (self-play), and each gets +10 points for winning and -10 points for losing (zero-sum).
According to the documentation (https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Using-Tensorboard.md):
Environment/Cumulative Reward - The mean cumulative episode reward over all agents. Should increase during a successful training session.
However, in a zero-sum game the mean cumulative reward over all agents will always be 0. What I am wondering is: in self-play, does the Academy system actually use only one current agent, with all other agents in the scene set to prior snapshots of that same policy to compete against? If so, it would make sense to track the cumulative reward of just that one agent, the "real" current agent.
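To illustrate the problem, here is a minimal sketch (hypothetical reward numbers, not ML-Agents code) of why the per-episode mean reward over all agents carries no learning signal in a zero-sum setup:

```python
# Minimal sketch: in a zero-sum game, averaging the episode reward over all
# agents gives ~0 no matter how skilled the agents become.

episodes = [
    [+10, -10],  # agent 0 wins, agent 1 loses
    [-10, +10],  # agent 1 wins, agent 0 loses
    [+10, -10],
]

for i, rewards in enumerate(episodes):
    mean_reward = sum(rewards) / len(rewards)
    print(f"episode {i}: rewards={rewards}, mean over agents={mean_reward}")
# Every episode's mean is 0.0, so a metric averaged over all agents
# (like Environment/Cumulative Reward, if it works that way) says nothing
# about learning progress here.
```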
At the bottom of the documentation, another metric is mentioned:
Self-Play/ELO (Self-Play) - ELO measures the relative skill level between two players. In a proper training run, the ELO of the agent should steadily increase.
This metric is specifically targeted at self-play, so perhaps it is the better one to track?
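For intuition, here is a minimal sketch of how a standard Elo update behaves (this is not ML-Agents' internal implementation; the K-factor of 32 and the 1200 starting rating are my assumptions):

```python
# Standard Elo update rule: rating rises when you beat opponents of similar
# or higher rating, independent of the raw (zero-sum) reward values.

def elo_update(rating_a, rating_b, score_a, k=32):
    """Return updated ratings; score_a is 1.0 for a win by A, 0.0 for a loss, 0.5 for a draw."""
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))
    new_a = rating_a + k * (score_a - expected_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return new_a, new_b

# If the current policy keeps beating the frozen past snapshots it plays
# against, its rating climbs even though the summed reward stays zero.
current, snapshot = 1200.0, 1200.0
for _ in range(5):
    current, snapshot = elo_update(current, snapshot, score_a=1.0)
print(current, snapshot)  # current drifts up, snapshot drifts down
```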