lunduniversity.lu.se

Computer Science

Faculty of Engineering, LTH

Denna sida på svenska This page in English

The Maximum Common Subsequence Problem

A sequence is a subsequence of another sequence if it can be obtained by deleting zero or more elements from that sequence. A classic problem in computer science is to find the maximal common subsequnce of two sequences, and its length can be used as a similarity measure for sequences. For example the maximal common subsequence of the two lists [3,2,8,2,3,9,4,3,9] and [1,3,2,3,7,9] is [3,2,3,9] which has length 4.

A first attempt at a solution to this problem is given by the function mcsLength' below. In the case where first element of the two lists are equal, they must be part of the common subsequence. The result is then simply one plus the length of the mcs of the remainder of the lists. And if the first elements are not equal, the result is the maximum of the two cases obtained be removing each of the two front elements.

  mcsLength' :: Eq a => [a] -> [a] -> Int
  mcsLength' _ [] = 0
  mcsLength' [] _ = 0
  mcsLength' (x:xs) (y:ys)
    | x == y    = 1 + mcsLength' xs ys
    | otherwise = max (mcsLength' xs (y:ys)) 
                      (mcsLength' (x:xs) ys)

The problem with this solution is that the two recursive expressions on the last line can both result in evaluations of mcsLength' xs ys. As the recursion unfolds this will lead to an enormous amount of redundant computations.

The solution is to store the intermediate results in a table where position i  j contains the length of the maximum common subsequence of the suffixes of the two strings of length i and j respectively. For the two lists in the example above the upper left part of the table would look like this (actually, this is an even smaller example table for two lists [3,2,8,2,3] and [1,3,2,3] that are sublists of the lists above. It matters because the indices of the table are created backwards!):

[]        
3:[]      
2:[3]    
3:[2,3]  
1:[3,2,3]
[]         
 0
 0
 0
 0
 0
3:[]       
 0
 1
 1
 1
 1
2:[3]      
 0
 1
 2
 2
 2
8:[2,3]    
 0
 1
 2
 2
 2
2:[8,2,3]  
 0
 1
 2
 2
 2
3:[2,8,2,3]
 0
 1
 2
 3
 3

Recursive calls can now be replaced with lookups in this table. The code below shows an implementation of mcsLength which uses this method.

mcsLength :: Eq a => [a] -> [a] -> Int
mcsLength xs ys = mcsLen (length xs) (length ys)
  where
    mcsLen i j = mcsTable!!i!!j
    mcsTable = [[ mcsEntry i j | j<-[0..]] | i<-[0..] ]
       
    mcsEntry :: Int -> Int -> Int
    mcsEntry _ 0 = 0
    mcsEntry 0 _ = 0
    mcsEntry i j
      | x == y    = 1 + mcsLen (i-1) (j-1)
      | otherwise = max (mcsLen i (j-1)) 
                        (mcsLen (i-1) j)
      where
         x = xs!!(i-1)
         y = ys!!(j-1)