A suffix tree is a useful data structure for doing very powerful searches on text strings. For example, it's probably possible to design a Python dictionary interface that accepts substrings of keys, and return a list of possible keys. Very very cool stuff. (I wonder if this is what Perl's study function does.) SuffixTree is a wrapper that allows Python programmers to play with suffix trees.
(February 14, 2002)But first, I have to understand this crazy structure. At the moment, I'm trying to make the Strmat library accessible from Python by using the SWIG library. I hope that I can get this done quickly, as I'm really interested to see how the darn thing works! *grin*
My first attempt at a partial wrapping is here: py_strmat-0.5.tar.gz.
(February 18, 2002) Well, I hacked at it a little more, and it now looks almost functional: pystrmat-0.6.tar.gz. I'm very excited; I've written a small "SubstringDict" class that allows one to make a relaxed dictionary:
This looks simple enough, but this dictionary is actually traversing the SuffixTree structure from the strmat library! I expect this structure to scale very well when our keys are things like Journal abstracts. I'll do some performance tests tonight, and work on the Perl binding to strmat. Looks like I can move the SuffixTree stuff onto a separate page when this is all done.>>> import SubstringDict >>> d = SubstringDict.SubstringDict() >>> d['foobar'] = 1 >>> d['barfoo'] = 2 >>> d['forget'] = 3 >>> d['arfbag'] = 4 >>> d['a'] [1, 2, 4] >>> d['arf'] [2, 4] >>> d['oo'] [1, 2] >>> d['food'] []
(February 22, 2002) Ok, I made the wrapper a little less ugly inside; it shouldn't be a leaky faucet in terms of memory either. I've used distutils to package the whole thing.
python setup.py install should be enough to install this package. I can't seem to make my mind up what to call this thing, but I think I'll stick with ``SuffixTree'' just because it's easier to remember than 'strmat'.
(November 15, 2002) spex66 contributed a bug fix to allow SuffixTree to compile under Windows. (Actually, he wrote the patch a long time ago, but I had forgotten to put it up! My apologies.) This will packaged as SuffixTree-0.7.1.zip. Thank you!
Any suggestions or improvements would be greatly appreciated!
Back to Python.