Will one agent control all 5 players, or will each agent control a single player?
One of the hard challenges of Dota is deciding whether or not to "trust" your teammates to do the right thing. E.g., you can aggressively go for a kill knowing that your support will back you up, but you can also aggressively go for a kill and have your support let you die, and then the whole team starts blaming and tilting because the DPS "threw". It's a fine balance. From personal experience, in lower leagues it's better to always assume that you're on your own, whereas in higher leagues you can start expecting more team play.
Another example: often several players will use their ultimate abilities at the same time, "wasting" them. It would be easy for an agent controlling all 5 players to avoid this, but how would an individual agent know whether or not to use its ult? Are the agents able to communicate with each other? If so, is there a cap on how fast they can do it? E.g., on voice, it takes a few seconds to give orders.
Seems it's five individual agents with no communication, just a reward function that shifts towards team-based rewards:
"OpenAI Five does not contain an explicit communication channel between the heroes’ neural networks. Teamwork is controlled by a hyperparameter we dubbed “team spirit”. Team spirit ranges from 0 to 1, putting a weight on how much each of OpenAI Five’s heroes should care about its individual reward function versus the average of the team’s reward functions. We anneal its value from 0 to 1 over training."
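For anyone who wants to see what that quote could look like in practice, here is a minimal sketch, assuming team spirit linearly interpolates between a hero's own reward and the team average. The function names and the linear annealing schedule are my own assumptions for illustration, not OpenAI's actual code; the quote only says the value is annealed from 0 to 1 over training.

```python
from statistics import mean


def blend_rewards(individual_rewards, team_spirit):
    """Mix each hero's own reward with the team's average reward.

    team_spirit = 0 -> each hero only cares about its own reward.
    team_spirit = 1 -> every hero receives the team average.
    Linear interpolation is an assumption based on the quoted description.
    """
    if not 0.0 <= team_spirit <= 1.0:
        raise ValueError("team_spirit must be in [0, 1]")
    team_average = mean(individual_rewards)
    return [
        (1.0 - team_spirit) * reward + team_spirit * team_average
        for reward in individual_rewards
    ]


def linear_team_spirit(step, total_steps):
    """Hypothetical schedule: ramp team spirit linearly from 0 to 1."""
    return min(1.0, max(0.0, step / total_steps))


# Made-up rewards for five heroes: one got a kill, one got an assist.
rewards = [4.0, 0.0, 0.0, 0.0, 1.0]

for step in (0, 500, 1000):
    spirit = linear_team_spirit(step, total_steps=1000)
    print(spirit, blend_rewards(rewards, spirit))
```

Early in training each hero is effectively selfish, and by the end every hero is rewarded for whatever helps the team as a whole, which is one way coordination can emerge without an explicit communication channel.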
The Dota AI system has 3 'levels' to it - team level, mode level and action level. The mode/action level can choose to ignore or respect the team level as it sees fit. [1] Additionally, they say under the "Coordination" section:
OpenAI Five does not contain an explicit communication channel between the heroes’ neural networks. Teamwork is controlled by a hyperparameter we dubbed “team spirit”. Team spirit ranges from 0 to 1, putting a weight on how much each of OpenAI Five’s heroes should care about its individual reward function versus the average of the team’s reward functions. We anneal its value from 0 to 1 over training.
To me, that reads as 5 individual agents, one for each character.