A full spaCy pipeline and models for scientific/biomedical documents
The Classical Language Toolkit
Chinese XLNet pre-trained model
Resources, corpora, and tools for Chinese natural language processing
Unicode XML TEI text analysis platform
Original PyTorch implementation of the Cross-lingual Language Model
Tools to download and clean up Common Crawl data
Natural Language Processing Best Practices & Examples
Unsupervised text tokenizer focused on computational efficiency
A Chinese information extraction tool
OWL/DL ontologies for linguistic annotations
Text categorization, Arabic language processing, language modeling
We describe a simple XML format to share text documents and annotations
TextBlob is a Python library for processing textual data
A repository of software, documentation and data for NLP