RAG on Paul Graham's essays
A list of accessible speech corpora for ASR and TTS
Classical piano MIDI dataset
Reading Wikipedia to Answer Open-Domain Questions
PyTorch original implementation of Cross-lingual Language Model Pretraining
PyTorch implementation of SimCLR: A Simple Framework for Contrastive Learning of Visual Representations
Tools to download and clean up Common Crawl data
Natural Language Processing Best Practices & Examples
Unsupervised text tokenizer focused on computational efficiency
A Chinese information extraction tool
OWL/DL ontologies for linguistic annotations
Text categorization, Arabic language processing, language modeling
TensorFlow implementation of Google's Tacotron-2
Beautiful visualizations of how language differs among document types
We describe a simple XML format to share text documents and annotations
This project migrated to https://gitlab.com/mwetoolkit/mwetoolkit3/
TextBlob is a Python library for processing textual data
A repository of software, documentation and data for NLP
High-performance MoE model with MLA, MTP, and multilingual reasoning