Code release for "Detecting Twenty-thousand Classes
Unicode XML TEI text analysis platform
Editing large language models within 10 seconds
Repo for external large-scale work
RAG on Paul Graham's essays
A list of accessible speech corpora for ASR, TTS
Linking Language to Knowledge with Distributional Semantics
The Linguistic Analyzer is a tool for corpus analysis and comparison
Classical piano MIDI dataset
Set of tests for fuzzing engines
Reading Wikipedia to Answer Open-Domain Questions
PyTorch original implementation of Cross-lingual Language Model
PyTorch implementation of SimCLR: A Simple Framework
Tools to download and clean up Common Crawl data
American fuzzy lop - a security-oriented fuzzer
A recommender system for discovering GitHub repos
Natural Language Processing Best Practices & Examples
Text analysis of Egyptian schoolbooks
Unsupervised text tokenizer focused on computational efficiency
A Chinese information extraction tool
OWL/DL ontologies for linguistic annotations
Phrase-Based & Neural Unsupervised Machine Translation
@Note2 - A workbench for Biomedical Text Mining