The Maximum Common Subsequence Problem
A sequence is a subsequence of another sequence if it can be obtained by
deleting zero or more elements from that sequence. A classic problem in computer
science is to find the maximal common subsequnce of two sequences, and its
length can be used as a similarity measure for sequences. For example the maximal
common subsequence of the two lists [3,2,8,2,3,9,4,3,9]
and
[1,3,2,3,7,9]
is [3,2,3,9]
which has length 4.
A first attempt at a solution to this problem is given by the function
mcsLength'
below. In the case where first element of the two lists are
equal, they must be part of the common subsequence. The result is then simply one plus
the length of the mcs of the remainder of the lists. And if the first elements are not
equal, the result is the maximum of the two cases obtained be removing each of the two
front elements.
mcsLength' :: Eq a => [a] -> [a] -> Int mcsLength' _ [] = 0 mcsLength' [] _ = 0 mcsLength' (x:xs) (y:ys) | x == y = 1 + mcsLength' xs ys | otherwise = max (mcsLength' xs (y:ys)) (mcsLength' (x:xs) ys)
The problem with this solution is that the two recursive expressions on the last
line can both result in evaluations of mcsLength' xs ys
. As the
recursion unfolds this will lead to an enormous amount of redundant computations.
The solution is to store the intermediate results in a table where position i j contains the length of the maximum common subsequence of the suffixes of the two strings of length i and j respectively. For the two lists in the example above the upper left part of the table would look like this (actually, this is an even smaller example table for two lists [3,2,8,2,3] and [1,3,2,3] that are sublists of the lists above. It matters because the indices of the table are created backwards!):
[] 3:[] 2:[3] 3:[2,3] 1:[3,2,3] [] 0 0 0 0 0 3:[] 0 1 1 1 1 2:[3] 0 1 2 2 2 8:[2,3] 0 1 2 2 2 2:[8,2,3] 0 1 2 2 2 3:[2,8,2,3] 0 1 2 3 3
Recursive calls can now be replaced with lookups in this table. The code below shows an implementation of mcsLength which uses this method.
mcsLength :: Eq a => [a] -> [a] -> Int mcsLength xs ys = mcsLen (length xs) (length ys) where mcsLen i j = mcsTable!!i!!j mcsTable = [[ mcsEntry i j | j<-[0..]] | i<-[0..] ] mcsEntry :: Int -> Int -> Int mcsEntry _ 0 = 0 mcsEntry 0 _ = 0 mcsEntry i j | x == y = 1 + mcsLen (i-1) (j-1) | otherwise = max (mcsLen i (j-1)) (mcsLen (i-1) j) where x = xs!!(i-1) y = ys!!(j-1)