Monday, June 8, 2009

Generalized Suffix Trees: Java Applet

A Generalized Suffix Tree is a Suffix Tree for a set of strings.
Usage of Generalized Suffix Trees Java Applet is similar to Suffix Trees Java Applet:
  • enter 2 strings;
  • # and $ are used as strings terminators;
  • press "Build GST" button;
  • it will build suffix tree and create visual presentation in some auto calculated layout;
  • drag and drop nodes to find best layout;
  • you need Java installed and enabled in your browser;
  • applet uses Java implementation of Ukkonen's Algorithm to build Suffix Tree.




Here are most used examples of Generalized Suffix Trees:



Source code of GeneralizedSuffixTree class that is based on SuffixTree class.
public class GeneralizedSuffixTree {
// SOH - START OF HEADING
public static final char TERMINATOR1 = '\u0001';
// STX - START OF TEXT
public static final char TERMINATOR2 = '\u0002';

private String texts[];
private char terminators[];

private String generalized;
private SuffixTree suffixTree;

public GeneralizedSuffixTree(String text1, String text2) {
this(text1, text2, TERMINATOR1, TERMINATOR2);
}

public GeneralizedSuffixTree(String text1, String text2, char terminator1, char terminator2) {
this(new String[]{text1, text2}, new char[]{terminator1, terminator2});
}

public GeneralizedSuffixTree(String texts[], char terminators[]) {
this.texts = texts;
this.terminators = terminators;

StringBuilder sb = new StringBuilder();
for (int i = 0; i < texts.length; i++) {
sb.append(texts[i]);
sb.append(terminators[i]);
}
generalized = sb.toString();
suffixTree = new SuffixTree(generalized);
fixSpanSuffixes(suffixTree.getRootNode());
}

private void fixSpanSuffixes(Node node) {
for (Edge edge : node.getEdges()) {
for (int i = 0; i < texts.length; i++) {
if ((edge.getBeginIndex() <= getGeneralizedSubstringLength(i)) &&
(edge.getEndIndex() > getGeneralizedSubstringLength(i))) {

edge.setEndIndex(getGeneralizedSubstringLength(i));
continue;
}
}
fixSpanSuffixes(edge.getEndNode());
}
}

private int getGeneralizedSubstringLength(int n) {
int length = 0;
for (int i = 0; i <= n; i++) {
length += texts[i].length() + 1;
}
return length - 1;
}
}

See also: my other Suffix Trees posts

5 comments:

Marianna said...

I've found your softwares for suffix trees marvellous! Now I am affording Generalized Suffix Trees and you have put here the principal class... Could you be so kind to show us the other classes for this applet? Updated from suffix trees to generalized suffix trees. The compiler signs error when calling SuffixTree with argument "generalized".
THANKS!!!

Illya Havsiyevych said...

Marianna,

please check full sources
http://illya.yolasite.com/resources/suffix-tree-6.zip

Hui said...

Hi, do you have a c# code for suffix tree?
I have many strings, and I want to find the longest common substring among them.

Dr. Ahmet Bulut said...

While fixing suffixes that span multiple strings:

do we need to consider the last string? i.e., the for loop to run only for the indexes from 0 to texts.length()-2, rather than texts.length()-1 ?

cheers,

a

Avinash Rulz said...

where can i find the source code where you compute longest common sub string using suffix trees