Saturday, April 18, 2009

Suffix Trees: Java Ukkonen's Algorithm

Suffix Tree is a data structure that presents the suffixes of a given string in a way that allows for a particularly fast implementation of many important string operations. Details...

Ukkonen's algorithm begins with an implicit suffix tree containing the first character of the string. Then it steps through the string adding successive characters until the tree is complete. This order addition of characters gives Ukkonen's algorithm its "on-line" property; earlier algorithms proceeded backward from the last character. The implementation of this algorithm requires O(n) (linear time).

Check Suffix Tree Java Source Code
This is a Java-port of Mark Nelson's C++ implementation of Ukkonen's algorithm.

Test run and validation is moved to SuffixTreeTest JUnit Test Case.
Could be tested with Maven or from IDE.

mvn test
...
-------------------------------------------------------
T E S T S
-------------------------------------------------------
Running com.blogspot.illyakeeplearning.suffixtree.SuffixTreeTest
Start End   Suf   First Last  String
0     5     0     1     1     a
0     3     5     0     1     ca
0     7     -1    4     4     o
3     1     -1    2     4     cao
3     4     -1    4     4     o
5     2     -1    2     4     cao
5     6     -1    4     4     o
Suffix : acao
comparing: acao to acao
Suffix : ao
comparing: ao to ao
Suffix : cacao
comparing: cacao to cacao
Suffix : cao
comparing: cao to cao
Suffix : o
comparing: o to o
All Suffixes present!
Leaf count : 5 OK
Branch count : 7 OK
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.032 sec

Results :

Tests run: 1, Failures: 0, Errors: 0, Skipped: 0

Future plans:
  • refactor to have less global variables
  • reuse java hash
References: