How to approach HackerEarth bot challenges
The bot challenges on HackerEarth have so far all been for two players, with perfect information and with no chance. Because of this, there is only a fairly limited number of different approaches one can take to build a bot. As far as I know, there are only two reasonable strategies, and I will cover both of them in this tutorial.
Alpha-beta pruning
This is the more intuitive of the two strategies, and it is probably the easier approach for someone with little experience of writing bots. Alpha-beta pruning is a fancier variant of the idea that the best move is the one which puts the opponent in the worst position. Taken one step further, we would like to make the move which ensures that the opponent can’t make a move which puts our bot in a bad position. Of course, we can continue in the same manner and try to find the move which makes the position as good as possible for our bot some xmoves in the future, and the resulting algorithm is called Minimax or Negamax (the difference between the two is just the signs used in the return values).
Evaluation functions
One difficulty is how to know what constitutes a good position. Ideally, we would want to be able to look at a position and immediately tell which player would win if both played optimally, but for sufficiently complex games we can’t do that efficiently. Instead, we have to come up with some heuristic which tells us roughly how good the position is. This heuristic is known as an evaluation function.
If a position seems to be very good for the player A whose turn it is, the evaluation function should return a large positive value, whereas if it’s bad for A (and thus good for A‘s opponent B) the evaluation function should return a negative value. For example, in the game Reversi – Bot challenge #5, a decent evaluation function in the beginning of the game is to return the number of moves A can make minus the number of moves B would be able to make if it was their turn. A better evaluation function might also take into account that some squares are better than others, so each square on the board could be assigned a value which the evaluation function awards the player in control of that square.
An evaluation function can be very complex, so that it can take into account all the features of the board that may be important for deciding how good a position is. However, there are drawbacks to making a complicated evaluation function:
· Writing a complicated evaluation function takes more time and there’s a greater risk of bugs.
· A complicated evaluation function typically contains lots of parameters which you need to tune. For example, it might be difficult to know exactly how good a corner actually is in Reversi, especially if you’re not an expert player yourself.
· Computing a complicated evaluation function typically takes more time than a simple one, which means that the bot won’t be able to search as many moves ahead.
In the end, my evaluation functions often turn out quite complex, but it is important to start simple and gradually refine the evaluation function, always making sure that the changes you make actually are improvements (e.g. by letting the new version of your bot play against an old one many times).
Pruning
Alpha-beta pruning is the same thing as Minimax except that it avoids searching certain branches of the search tree. Alpha-beta pruning and Minimax always return the same result when searching to the same depth, but Alpha-beta pruning is often much faster.
To illustrate the idea behind Alpha-beta pruning, let’s assume that we’re searching two moves ahead (i.e. we try to maximize the evaluation function after both A and B have made one move) and that we have already found that if A makes the move m1, then B‘s best response leads to a position for which the evaluation function returns x1. Now we would like to know if the move m2 is even better than m1. Therefore, we go through each of B‘s responses to m2, and note how good the resulting positions are. If we find a response for B which results in the evaluation function returning a worse (smaller) value than x1, then we can immediately conclude that m2 is a worse move than m1 because at least one of B‘s responses to m2 puts A in a worse position than any of B‘s responses to m1. This way, we don’t even need to consider the remaining moves, so if we find B‘s refutation early on we can avoid evaluating many positions, thereby saving precious time.
As noted above, alpha-beta pruning is most effective if the best moves are searched first. If the move ordering is optimal (i.e. we always search the best move first), alpha-beta pruning roughly doubles the search depth compared to minimax, but even if the move ordering is completely random, alpha-beta pruning provides a significant improvement (the search depth is increased by about a factor 1/3).
Monte Carlo tree search
Despite its appearance, Monte Carlo tree search (MCTS) is in many ways easier than alpha-beta pruning, because it doesn’t require an evaluation function. Instead, MCTS relies on simulations of how the rest of the game might be played out.
Each simulation is divided into two phases: in the beginning, we can use information from earlier simulations to guess which moves are good and which aren’t, but in the end of the simulation we don’t have that information and therefore we make random moves instead.
The second phase is easy, but how do we use the available information to guide the selection of moves in the beginning of the simulation? It’s not a good idea to just select the move with the best win ratio, because then we could end up trying the best move only once, never trying it again because the first simulation happened to result in a loss. A good trade-off between choosing the move giving the best results so far and exploring new moves is given by the formula
where N is the number of simulations involving the current position and n is the number of simulations in which we selected that particular move. The second, slightly weird, term ensures that as N tends to infinity, so will n. This means that we will eventually investigate all the available moves thoroughly, but because of the first term, the algorithm will prefer the moves that look better.
After calling this function a large number of times from the root node, we can simply let the bot play the move giving the best win ratio. A more robust alternative is to pick the move with the most simulations.
MCTS performs better than alpha-beta pruning in games where it is difficult to design a good evaluation function, the most notable example being Go. On the other hand, if you can create an accurate evaluation function, then alpha-beta pruning is probably the way to go because using random simulations tends to be a rather inefficient way to get statistics, so a bot based on alpha-beta pruning is generally capable of searching much deeper than a bot based on MCTS.