Indexing and query tools for very large text corpora
The IMS Open Corpus Workbench is a collection of tools for managing and querying large text corpora (100 M words and more) with linguistic annotations. Its central component is the flexible and efficient query processor CQP, which can be used interactively in a terminal session, as a backend (e.g. from a Perl script), or through the Web-based GUI CQPweb.
- Index corpora into a compact and swiftly searchable format (with Unicode support!)
- Search corpora efficiently using the super-fast Corpus Query Processor (CQP)
- Queries can contain regular expressions on individual words or annotations, AND across sequences of words
- Support for indexing and querying of within-text XML elements and attribute values
- Plus CQPweb: a user-friendly online interface with lots of additional features, especially suitable for teaching and for non-specialists
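To make the idea of regular expressions on individual words or annotations, and across sequences of words, more concrete, here is a minimal Python sketch. It is a toy illustration only, not CQP's implementation or query syntax: the token data, the attribute names (`word`, `pos`, `lemma`) and the helpers `token_matches` and `find_sequences` are all hypothetical. Each pattern element is a set of per-attribute regular expressions that must match one whole token, and a pattern matches wherever consecutive tokens satisfy consecutive elements.

```python
import re

# Toy annotated corpus (hypothetical data): one dict per token,
# holding the word form plus its annotations.
TOKENS = [
    {"word": "The", "pos": "DT", "lemma": "the"},
    {"word": "large", "pos": "JJ", "lemma": "large"},
    {"word": "corpora", "pos": "NNS", "lemma": "corpus"},
    {"word": "were", "pos": "VBD", "lemma": "be"},
    {"word": "indexed", "pos": "VBN", "lemma": "index"},
    {"word": "quickly", "pos": "RB", "lemma": "quickly"},
]


def token_matches(token, constraints):
    """True if every attribute regex matches the token's whole attribute value."""
    return all(
        re.fullmatch(regex, token.get(attr, "")) is not None
        for attr, regex in constraints.items()
    )


def find_sequences(tokens, pattern):
    """Return (start, end) token positions where the pattern matches consecutively."""
    if not pattern:
        return []
    hits = []
    for start in range(len(tokens) - len(pattern) + 1):
        window = tokens[start:start + len(pattern)]
        if all(token_matches(tok, cons) for tok, cons in zip(window, pattern)):
            hits.append((start, start + len(pattern) - 1))
    return hits


# An adjective (any tag starting with JJ) followed by a token whose lemma is "corpus".
pattern = [{"pos": r"JJ.*"}, {"lemma": r"corpus"}]
print(find_sequences(TOKENS, pattern))  # [(1, 2)]
```

This naive scan checks every position in turn, which is fine for a handful of tokens; a tool built for corpora of 100 M words and more relies on its index instead.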
Thanks for a great project! Simply the best. Good, good, good. +1