Showing posts from May, 2020

Edit distance

Biological applications often need to compare the DNA of two (or more) different organisms. A strand of DNA consists of a string of molecules called bases, where the possible bases are adenine, guanine, cytosine, and thymine. Representing each of these bases by its initial letter, we can express a strand of DNA as a string over the finite set $\{A, C, G, T\}$. For example, the DNA of one organism may be $S_1 = ACCGGTCGAGTGCGCGGAAGCCGGCCGAA$, and the DNA of another organism may be $S_2 = GTCGTTCGGAATGCCGTTGCTCTGTAAA$.

One reason to compare two strands of DNA is to determine how “similar” the two strands are, as some measure of how closely related the two organisms are. We can, and do, define similarity in many different ways. For example, we can say that two strands are similar if the number of changes needed to turn one into the other is small. [1] We formalize this notion of similarity in our Edit distance problem.

15-5 Edit distance - Problem Statement

In order to transform one sour