BIOINFORMATICS ALGORITHMS.pdf (Size: 5.87 MB / Downloads: 474)

This article is presented by:

NEIL C. JONES AND PAVEL A. PEVZNER

BIOINFORMATICS ALGORITHMS

INTRODUCTION
Imagine Alice, Bob, and two piles of ten rocks. Alice and Bob are bored one Saturday afternoon so they play the following game. In each turn a player may either take one rock from a single pile, or one rock from both piles. Once the rocks are taken, they are removed from play; the player that takes the last rock wins the game. Alice moves first. It is not immediately clear what the winning strategy is, or even if there is one. Does the first player (or the second) always have an advantage? Bob tries to analyze the game and realizes that there are too many variants in the game with two piles of ten rocks (which we will refer to as the 10+10 game). Using a reductionist approach, he first tries to find a strategy for the simpler 2+2 game. He quickly sees that the second player—himself, in this case—wins any 2+2 game, so he decides to write the “winning recipe”: If Alice takes one rock from each pile, I will take the remaining rocks and win. If Alice takes one rock, I will take one rock from the same pile. As a result, there will be only one pile and it will have two rocks in it, so Alice’s only choice will be to take one of them. I will take the remaining rock to win the game. Inspired by this analysis, Bob makes a leap of faith: the second player (i.e., himself) wins in any n+n game, for n ≥ 2. Of course, every hypothesis must be confirmed by experiment, so Bob plays a few rounds with Alice. It turns out that sometimes he wins and sometimes he loses. Bob tries to come up with a simple recipe for the 3+3 game, but there are a large number of different game sequences to consider, and the recipe quickly gets too complicated. There is simply no hope of writing a recipe for the 10+10 game because the number of different strategies that Alice can take is enormous. Meanwhile, Alice quickly realizes that she will always lose the 2+2 game, but she does not lose hope of finding a winning strategy for the 3+3 game. 2 1 Introduction Moreover, she took Algorithms 101 and she understands that recipes written in the style that Bob uses will not help very much: recipe-style instructions are not a sufficiently expressive language for describing algorithms. Instead, she begins by drawing the following table filled with the symbols ↑, ←, _, and ∗. The entry in position (i, j) (i.e., the ith row and the jth column) describes the moves that Alice will make in the i + j game, with i and j rocks in piles A and B respectively. A ← indicates that she should take one stone from pile B. A ↑ indicates that she should take one stone from pile A. A _ indicates that she should take one stone from each pile, and ∗ indicates that she should not bother playing the game because she will definitely lose against an opponent who has a clue.

For example, if she is faced with the 3+3 game, she finds a _ in the third row and third column, indicating that she should take a rock from each pile. This makes Bob take the first move in a 2+2 game, which is marked with a ∗. No matter what he does, Alice wins. Suppose Bob takes a rock from pile B—this leads to the 2+1 game. Alice again consults the table by reading the entry at (2,1), seeing that she should also take a rock from pile B leaving two rocks in A. However, if Bob had instead taken a rock from pile A, Alice would consult entry (1,2) to find ↑. She again should also take a rock from pile A, leaving two rocks in pile B. Impressed by the table, Bob learns how to use it to win the 10+10 game. However, Bob does not know how to construct a similar table for the 20+20 game. The problem is not that Bob is stupid, but that he has not studied algorithms. Even if, through sheer luck, Bob figured how to always win the 1 Introduction 3 20+20 game, he could neither say with confidence that it would work no matter what Alice did, nor would he even be able to write down the recipe for the general n + n game. More embarrassing to Bob is that the a general 10+10+10 game with three piles would turn into an impossible conundrum for him. There are two things Bob could do to remedy his situation. First, he could take a class in algorithms to learn how to solve problems like the rock puzzle. Second, he could memorize a suitably large table that Alice gives him and use that to play the game. Leading questions notwithstanding, what would you do as a biologist? Of course, the answer we expect to hear from most rational people is “Why in the world do I care about a game with two nerdy people and a bunch of rocks? I’m interested in biology, and this game has nothing to do with me.” This is not actually true: the rock game is in fact the ubiquitous sequence alignment problem in disguise. Although it is not immediately clear what DNA sequence alignment and the rock game have in common, the computational idea used to solve both problems is the same. The fact that Bob was not able to find the strategy for the game indicates that he does not understand how alignment algorithms work either. He might disagree if he uses alignment algorithms or BLAST1 on a daily basis, but we argue that since he failed to come up with a strategy for the 10+10 rock game, he will also fail when confronted with a new flavor of alignment problem or a particularly complex similarity analysis. More troubling to Bob, he may find it difficult to compete with the scads of new biologists who think algorithmically about biological problems.2 Many biologists are comfortable using algorithms like BLAST without really understanding how the underlying algorithm works.