THIS PROJECT MIGRATED TO https://gitlab.com/mwetoolkit/mwetoolkit3/

The Multiword Expressions toolkit aids in the automatic identification and extraction of multiword units in running text. These include idioms (kick the bucket), noun compounds (cable car), phrasal verbs (take off, give up), etc.

Although it focuses on multiword expressions, the framework is fairly complete and can also be useful in other corpus-based studies in computational linguistics.

The mwetoolkit can be applied to virtually any text collection, language, and MWE type. It is a command-line tool written mostly in Python. Its development started in 2010 as part of a PhD thesis, and the project remains active (see the SVN logs).

Up-to-date documentation and details about the tool can be found on the mwetoolkit website: http://mwetoolkit.sourceforge.net/

Features

  • Multi-level RegEx patterns
  • Large corpora support
  • Association measures
  • Token-based annotation
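
To illustrate what an association measure computes, here is a minimal sketch that scores adjacent word pairs with pointwise mutual information (PMI). It is not mwetoolkit code and does not use its API; the function name `bigram_pmi`, the `min_count` parameter, and the toy corpus are assumptions made for this example only.

```python
import math
from collections import Counter


def bigram_pmi(tokens, min_count=1):
    """Return {(w1, w2): PMI} for adjacent word pairs in a token list.

    Hypothetical helper for illustration; not part of mwetoolkit.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (w1, w2), count in bigrams.items():
        # Skip rare pairs: PMI is unreliable for low counts.
        if count < min_count:
            continue
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        # PMI compares the observed pair probability with the
        # probability expected if the two words were independent.
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


# Toy corpus (an assumption for the example, not real data).
tokens = "the cable car left and the cable car came back".split()
scores = bigram_pmi(tokens, min_count=2)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```

A high score means the two words co-occur more often than their individual frequencies would predict, which is the kind of evidence used to rank MWE candidates such as "cable car".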

License

GNU General Public License version 3.0 (GPLv3)

User Reviews

  • Works fine, easy to use, and the documentation is clear.

Additional Project Details

Operating Systems

BSD, Cygwin, Linux, Mac

Languages

English

Intended Audience

Science/Research

User Interface

Command-line

Programming Language

C, Python, Unix Shell

Database Environment

Flat-file, XML-based

Related Categories

Unix Shell Artificial Intelligence Software, Unix Shell Linguistics Software, Unix Shell Command Line Tools, Python Artificial Intelligence Software, Python Linguistics Software, Python Command Line Tools, C Artificial Intelligence Software, C Linguistics Software, C Command Line Tools

Registered

2010-04-08