Sunday, May 31, 2009

Suffix Trees: Java Code as Separate Classes

In this post - Suffix Trees: Java Code as Separate Classes.
Download java source code.

Please refer previous posts about Suffix Trees:
Class diagram:

contains and indexOf functions:
public boolean contains(String str) {
    return indexOf(str) >= 0;
}
   public int indexOf(String str) {
    if (str.length() == 0)
        return -1;
   
    int index = -1;
    Node node = root;
   
    int i = 0;
    while (i<str.length()) {
        if ((node == null) || (i == text.length()))
            return -1;
   
        Edge edge = node.findEdge(str.charAt(i));
        if (edge == null)
            return -1;
   
        index = edge.getBeginIndex()-i;
        i++;
   
        for(int j=edge.getBeginIndex()+1; j<=edge.getEndIndex(); j++) {
            if (i == str.length())
                break;
            if (text.charAt(j) != str.charAt(i))
                return -1;
            i++;
        }
        node = edge.getEndNode();
    }
    return index;
}

Future plans:
  • Generalized Suffix Tree;
  • Longest Common SubString;

More information about Suffix Trees Use Cases you can find in:

2 comments:

Unknown said...

hello

your code has been a great help to understand ukkonen's algorithm a little better.

I am trying to build a program that accepts a text and a prefix and builds a prefixed subtree of the suffix tree.

For example, given cacao$ and c, it should only add cacao$ and cao$ suffixes to the suffix tree and not acao$, ao$ and o$

This - at least for my level of understanding the ukkonen algorithm - is not as easy as only calling addPrefix for certain characters of the text. Nor is it as simple as manipulating the active suffix externally (out of addPrefix).

Maybe you are aware of some other implementation/paper that accommodates this. Or can make some suggestions.

thanks in advance

Unknown said...

Hello.

I've used a highly-modified version of your code in a music composition system (as part of a university research project).

As I couldn't find licensing information anywhere, could you confirm that you're okay with this? The code shall be properly attributed.

Many thanks