Suffix Tree is a data structure that presents the suffixes of a given string in a way that allows for a particularly fast implementation of many important string operations. Details...
Ukkonen's algorithm begins with an implicit suffix tree containing the first character of the string. Then it steps through the string adding successive characters until the tree is complete. This order addition of characters gives Ukkonen's algorithm its "on-line" property; earlier algorithms proceeded backward from the last character. The implementation of this algorithm requires O(n) (linear time).
Check Suffix Tree Java Source Code
This is a Java-port of Mark Nelson's C++ implementation of Ukkonen's algorithm.
Test run and validation is moved to SuffixTreeTest JUnit Test Case.
Could be tested with Maven or from IDE.
Future plans:
Ukkonen's algorithm begins with an implicit suffix tree containing the first character of the string. Then it steps through the string adding successive characters until the tree is complete. This order addition of characters gives Ukkonen's algorithm its "on-line" property; earlier algorithms proceeded backward from the last character. The implementation of this algorithm requires O(n) (linear time).
Check Suffix Tree Java Source Code
This is a Java-port of Mark Nelson's C++ implementation of Ukkonen's algorithm.
Test run and validation is moved to SuffixTreeTest JUnit Test Case.
Could be tested with Maven or from IDE.
mvn test ... ------------------------------------------------------- T E S T S ------------------------------------------------------- Running com.blogspot.illyakeeplearning.suffixtree.SuffixTreeTest Start End Suf First Last String 0 5 0 1 1 a 0 3 5 0 1 ca 0 7 -1 4 4 o 3 1 -1 2 4 cao 3 4 -1 4 4 o 5 2 -1 2 4 cao 5 6 -1 4 4 o Suffix : acao comparing: acao to acao Suffix : ao comparing: ao to ao Suffix : cacao comparing: cacao to cacao Suffix : cao comparing: cao to cao Suffix : o comparing: o to o All Suffixes present! Leaf count : 5 OK Branch count : 7 OK Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.032 sec Results : Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
Future plans:
- refactor to have less global variables
- reuse java hash
14 comments:
see also Suffix Trees: Refactored Java Code
Does it work for "banana"?
Under which license this code is available (GPL, LGPL, BSD, try before you buy, non commercial) ? This matters a lot if somebody wants to try to use the code in some non-hobby and non-research project.
I decided to check the algorithm more seriously by extending automated test that initially passes. For that, I replaced the single pre-defined string with the loop over multiple, randomly generated strings, the beginning of the test now looking as
import static java.lang.System.out;
import static java.lang.System.err;
import java.util.Random;
public class SuffixTreeTest extends TestCase {
String line;
Random r = new Random(System.currentTimeMillis());
@Override
protected void setUp() throws Exception {
super.setUp();
}
public void testSeriously() {
StringBuilder b = new StringBuilder();
for (int i = 4; i < 30; i++) {
b.setLength(0);
for (int p = 0; p < i; p++) {
b.append( (char) ('a'+r.nextInt('z'-'a')));
}
b.append('$');
line = b.toString();
System.out.println("Test "+line);
_testSuffixTree();
}
}
(..)
Such extended version reports numerous failures on various strings, always with the same error message Logic error on node 1, not a leaf or internal node!. If the author is still active, it would be really nice to get some feedback on this result, or maybe even the corrected version.
P.S. The test string in the test must be terminated by the unique termination character. While cacao passes without it (as "o" is unique in this word), banana does not. However banana$ passes without issues. The examples of the failing string would be nboggh$ or vejuw$.
Hi,
Do you have the algorithm to extract the maximal repeats from the suffix tree.
Thanks,
Luay Alawneh
Thanks Ilya, btw is your tree building in quadratic or linear time?
I really like the implementation. it is fast, and well-coded and has unit tests which make it so easy to understand and use.
Prabhakar
prabhakar.srinivasan@gmail.com
Is not possible download code.
http://illya.yolasite.com/resources/suffix-tree.zip
403: Forbiden.
Not possible to download the code
Not possible to download the code
Man, is your Java code secret now? Download is not allowed.
Dear all,
I was a bit busy last years.
But now I was able to find original sources and upload to GitHub
Great post, keep sharing...
Derek Surabaya | Derek Sidoarjo | Derek Probolinggo | Derek Pasuruan | Derek Banyuwangi | Malang | Derek Lumajang | Derek Bali | Derek Caruban | Derek Mojokerto | Derek Nganjuk | Derek Jombang | Derek Madiun | Derek Ngawi | Derek Sragen | Derek Solo | Derek Blora | Derek Cepu | Derek Jogja | Derek Salatiga | Derek Semarang | Derek Jakarta | Derek Bekasi | Derek Bandung | Derek Bogor | Derek Cirebon
Post a Comment