Code samples
Being in a field involving programming and algorithms, some of the datastructures and algorithms frequently used in bioinformatics had to be implemented during the courses. For my part, these were usually done in Python v. <= 2.6.1 together with Jesper Møjbæk. One of the goals for the implementation tasks were that they ran in ie. squared or linear time; we usually succeded in this, except the suffix trees, which for a peculiar reason ran in
.
If you want to run the scripts with Python 3.x, some updating is requried!
Pairwise alignments
How close are two DNA or protein sequences related? It is possible to find an exact score for this with pairwise alignments. All it requires is the two sequences, a scoring scheme for the matches and mismatches, and of cause some code.
Read on here!Comparing evolutionary trees
If the pariwise alignment above was extended to compute the alignment of multiple sequences, it would be possible to create an evolutionary tree, such as the Tree of Life. But a question occures, as which sequences to use to generate the evolutionary tree or which method for creating the tree from an alignment/distance score between the sequences.
It it therefore required to be able to compare two trees, to get some objective measure of how (dis)similar a set of trees are. This is not an easy task, as the number of possible topologies of a tree increases exponentially for each added leaf. So one would think, that it would take O(n!) to test for every topology. However, using the Robinson-Fould distance metric and Days algorithm, it is possible to compare a set of trees in linear time O(n), but only on the topology, not branch length. But still, linear time!
Suffix trees
Suffix trees are a very interesting datastructure that allows one to find patterns in strings and other fancy string-operations. The naive approach to constructing them is very tedious for large strings, starting at the root and traversing downwards through the tree, looking for a place to split for each suffix, which is at best squared time. McCreight and Ukkonen have both developed construction algorithms that improve on this, and Ukkonens also allows for extending the string without rebuilding the tree.
Read on here!