is a method for inferring compositional
hierarchies from strings. It detects repetition and factors it out of the
string by forming rules in a grammar. The rules can be composed of
non-terminals, giving rise to a hierarchy. It is useful for recognizing lexical
structure in strings, and excels at very long sequences.
Craig Nevill-Manning, Google
Ian Witten, University of Waikato, New Zealand