Name | Modified | Size | Downloads / Week |
---|---|---|---|
NVIDIA NeMo Curator 0.8.0 source code.tar.gz | 2025-05-09 | 4.6 MB | |
NVIDIA NeMo Curator 0.8.0 source code.zip | 2025-05-09 | 5.0 MB | |
README.md | 2025-05-09 | 310 Bytes | |
Totals: 3 Items | | 9.6 MB | 0 |

- Llama-Based PII Redaction
- Trafilatura Text Extractor
- Chinese & Japanese Stopwords for Text Extractors
- Writing gzip-compressed JSONL datasets
- Training dataset curation for retriever customization using hard-negative mining
- Memory-efficient pairwise similarity in Semantic Deduplication
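
To illustrate what a gzip-compressed JSONL dataset looks like on disk, here is a minimal sketch using only the Python standard library. It does not use NeMo Curator's own writer API; the helper names `write_jsonl_gz` and `read_jsonl_gz` and the sample records are hypothetical and exist only for this example.

```python
import gzip
import json
import os
import tempfile


def write_jsonl_gz(records, path):
    """Write an iterable of dicts as one JSON object per line, gzip-compressed.

    Hypothetical helper for illustration; not part of NeMo Curator.
    """
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for record in records:
            # ensure_ascii=False keeps non-ASCII text (e.g. Japanese) readable.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl_gz(path):
    """Read a gzip-compressed JSONL file back into a list of dicts."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Sample records (made up for this sketch).
docs = [{"id": 0, "text": "hello"}, {"id": 1, "text": "こんにちは"}]
path = os.path.join(tempfile.mkdtemp(), "docs.jsonl.gz")

write_jsonl_gz(docs, path)
roundtrip = read_jsonl_gz(path)
```

Writing line by line through `gzip.open` in text mode keeps memory use flat regardless of dataset size, and the resulting file can still be streamed record by record when read back.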