Over a year ago, OpenAI set themselves the task of building AI bots that could defeat a professional team of humans at Dota 2, a highly complex multiplayer battle strategy computer game. Since then, they’ve made unabated progress towards their goal, first beating the world’s best players in 1v1 games, then defeating semi-professional teams at 5v5 games, and finally, three weeks ago, defeating a team ranked in the top 0.05 percent of human players.
But this impressive run came to an end last week when the bots were soundly defeated at The International, the multi-million-dollar world championship Dota 2 tournament, which this year was held in Vancouver. The bots were scheduled to play a best-of-three against human teams from Wednesday to Friday but crashed out after their first two matches.
On Saturday, after the defeat, I met with OpenAI CTO Greg Brockman to ask him what this all means for the capabilities and limitations of machine learning.
Why has OpenAI decided to spend so much time and resources on solving a computer game?
We wanted to work on getting bots to play complex video games because in many ways they are a step closer to the real world than games like chess and Go, which are very cerebral, but not continuous. In chess and Go, you have time to stop and think before a move, which gives the computer a big advantage. In a complex video game like Dota 2, there’s still strategy and reasoning, but no time to pause.
So, to get a bot to play Dota well, you need to make an algorithm that isn’t just searching, but that is actually working off a type of artificial instinct. This is a very difficult problem to solve. Our hypothesis was: if you solve this hard problem, the solution will apply outside the game. And this turned out to be true. The algorithms that we’ve used to develop these Dota-playing bots have been applied to other domains, like our robotic arm that can now successfully manipulate physical objects.
So, we’re just starting to see how these skills learnt in simulated environments can be transferred to reality. And yet, by doing our experiments in this contained environment, we keep it all very low stakes. If it makes a mistake or fails, no one gets hurt.
In the previous Dota 2 matches, the bots were so convincing in their victories. What was so different this time around at The International?
I think there are two key differences. Firstly, the professional human teams are really, really good at what they do. I’ve come away with a greater appreciation of just how good these guys really are.
Secondly, whereas we had been playing modified versions of Dota in the past, in this game we took out many of those restrictions. For example, in the past, the system was able to choose the heroes [player characters] it plays with, and our system is really good at this selection process. In this game, the hero lineups were provided by a third party. In the previous games we also had a restriction in place that let the bots regenerate health without having to abandon an attack. This was also removed.
So, in these most recent games, we basically had all the challenging aspects of the game pushed to their absolute extremes. Because of this, going into the tournament, we estimated that we had a 30-40% chance of winning. The fact that we didn’t win wasn’t totally surprising for us. But I don’t feel like it was a failure either. For me failure would’ve been if we had not had interesting matches. But we held our own in both games.
What in particular do you think went wrong for the bots in these matches?
There has been a lot of conversation about how the bots used “ultimates” really poorly. An ultimate is a special ability that is unique to a certain hero in the game and can be used strategically at certain points to great effect.
In the second game, one of our bots kept using its ultimate at the wrong time, which made it ineffective. So already a lot of people are concluding that bots aren’t able to think strategically, that they’ll never be able to trade off short-term gains against long-term wins.
Do you think that’s fair?
I think that saying that a bot will never be strategic by looking at one strange behavior really misunderstands the way the bots learn how to play through reinforcement learning. The bots get better through self-play, meaning that they play themselves over and over and over, at a scale that’s hard to comprehend, and through this process they learn how to get better.
For example, when we first started working on this problem, we tested a bot against a semi-pro player in a 1v1 game. After the first few games, he beat the bot every time and he was like, “I’m getting better faster than this thing is.” He put the bot’s failure down to the fact that it had all these strange and seemingly ineffective behaviors. But then a week later he lost, and never won again.
The funny thing is, though, those behaviors that he thought were causing the bot to lose were still there. It turns out that the weird behaviors he was seeing weren’t important. If they were, the system would’ve bothered to get rid of them.
This has been the general shape of progress. Every time we start playing the next level of human players, they start by thinking that the bot is really bad. Usually it’s because it displays some behavior that seems like very bad gameplay to a human. Then it starts beating them and never stops.
The core reason as to why our system gets so much better so quickly is a much more intangible thing than some particular weird behavior. You can’t just point to one bot misusing its ultimate and make a generalization about whether it will one day be able to strategize. It’s hard to pinpoint exactly why our bots get so good.
After Garry Kasparov and Lee Sedol lost their respective matches against AIs in chess and Go, both said that they found the games unpleasant. But elite Dota players seem to be much more open to the experience and more welcoming of the bots. Why do you think this is?
Chess and Go are ancient games that people have been thinking about for thousands of years. I think that amongst elite players there was this fear that if you solve it, suddenly the game would become uninteresting for humans, or undermine millennia of accumulated knowledge.
In contrast, Dota is a new game. No one thinks that they understand it fully and everyone is very eager to learn more about it. There are so many combinations of things you could do in the game, so much room for creativity that I think the Dota players are just excited to see how the bots play and what they can learn from them.
Another important difference is that unlike chess and Go, Dota is played on a computer. This makes the idea of having this digital being defeating you less scary. I mean, why should a human, with these meat drumsticks holding a mouse, be able to beat a bot, which really has the home-ground advantage?
What function do these competitions play in our culture, other than to play out a type of entertaining human vs. machine scenario?
People broadly do not know what or how to think about AI and its capabilities. There’s lots of confusion. I mean, take Sophia the robot [a “social robot” given citizenship status in Saudi Arabia]. There are lots of people who believe that it is a real AI, both sentient and intelligent. And people are okay with this. But if I believed Sophia was real, I would be astonished. The whole world would be totally different, totally turned on its head. The fact that people believe that Sophia is real, and no one is running for the hills, that just shows me that people don’t understand how radically something like Sophia would change things for us.
While Sophia isn’t real, I do believe that eventually, algorithms not dissimilar to the ones that we’re using now are going to radically change our world, but not in the way that people expect. I think that people need time to get used to what these future AI systems are going to be like. This is why I see it as my job, as a technologist, to do things like taking on professional Dota players in a public showcase, even if we lose. I want to show people how these systems work, how they fail, and what it means when you achieve or just fall short of certain levels of capability.