THIS PROJECT MIGRATED TO https://gitlab.com/mwetoolkit/mwetoolkit3/
The Multiword Expressions toolkit aids in the automatic identification and extraction of multiword units in running text. These include idioms (kick the bucket), noun compounds (cable car), phrasal verbs (take off, give up), etc.
Even though it focuses on multiword expressions, the framework is quite complete and can also be useful in any corpus-based study in computational linguistics.
The mwetoolkit can be...
A command-line tool to extract data from XML files
XmlFind is a small tool that extracts data from an XML file in a format suited to a classical Unix shell pipeline. Think of it as a kind of find command that acts on the content of one or more XML files.
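To illustrate the general idea of turning XML content into pipeline-friendly text, here is a minimal Python sketch. It is not XmlFind's actual interface or implementation: the function name `xml_lines`, the tab-separated output layout, and the sample document are all assumptions made for this example. It prints one line per matching element, so that tools such as grep, sort, or cut can process the result.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for this illustration.
SAMPLE = """<library>
  <book id="1"><title>Dune</title></book>
  <book id="2"><title>Emma</title></book>
</library>"""


def xml_lines(xml_text, tag):
    """Yield one tab-separated line (tag, attributes, text) per element named `tag`."""
    root = ET.fromstring(xml_text)
    for elem in root.iter(tag):
        # Render attributes as key=value pairs, sorted for stable output.
        attrs = " ".join(f"{k}={v}" for k, v in sorted(elem.attrib.items()))
        # Elements without direct text content yield an empty last field.
        text = (elem.text or "").strip()
        yield f"{elem.tag}\t{attrs}\t{text}"


# One record per line, ready to be piped into other line-oriented tools.
for line in xml_lines(SAMPLE, "title"):
    print(line)
```

Emitting one record per line with a fixed field separator is what makes the output composable with standard shell utilities; the real tool may choose a different layout.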