Training data (data labeling, annotation, workflow) for all data types
Extract schema, statistics and entities from datasets
Data processing for and with foundation models
Evaluation code for various unsupervised automated metrics
Unified embedding model
An open-source NLP research library, built on PyTorch
Original PyTorch implementation of the Cross-lingual Language Model
Tools to download and clean up Common Crawl data
Natural Language Processing Best Practices & Examples
Named-entity recognition using neural networks
Text categorization, Arabic language processing, language modeling