Data processing for and with foundation models
Training data (data labeling, annotation, workflow) for all data types
Extract schema, statistics and entities from datasets
An open-source NLP research library, built on PyTorch
Original PyTorch implementation of the Cross-lingual Language Model
Tools to download and clean up Common Crawl data
Natural Language Processing Best Practices & Examples
Dataset generation for AI chatbots and NLP tasks