Data-Juicer is an open-source data processing and augmentation framework designed to enhance the quality and diversity of datasets for machine learning tasks. It includes a modular pipeline for scalable data transformation.

Features

  • Modular and extensible data processing pipeline
  • Supports data augmentation for improving model robustness
  • Predefined templates for various NLP and CV tasks
  • Scales to large datasets and distributed computing environments
  • Compatible with popular deep learning frameworks
  • Open-source with community-driven contributions
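
To illustrate what a modular processing pipeline can look like, here is a minimal sketch in plain Python. It is not Data-Juicer's actual API: the names `mapper`, `keep_if`, `run_pipeline`, `normalize_whitespace`, and `has_min_words` are hypothetical and chosen only for this example. Each stage is an independent function over a stream of samples, so stages can be added, removed, or reordered without touching the others.

```python
from typing import Callable, Iterable, Iterator, List

# Hypothetical types for this sketch; not taken from Data-Juicer.
Sample = dict
Stage = Callable[[Iterable[Sample]], Iterator[Sample]]


def mapper(fn: Callable[[Sample], Sample]) -> Stage:
    """Wrap a per-sample transformation as a pipeline stage."""
    def stage(samples: Iterable[Sample]) -> Iterator[Sample]:
        for sample in samples:
            yield fn(sample)
    return stage


def keep_if(pred: Callable[[Sample], bool]) -> Stage:
    """Wrap a per-sample predicate as a filtering stage."""
    def stage(samples: Iterable[Sample]) -> Iterator[Sample]:
        for sample in samples:
            if pred(sample):
                yield sample
    return stage


def run_pipeline(samples: Iterable[Sample], stages: List[Stage]) -> List[Sample]:
    """Chain the stages lazily, then materialize the final stream."""
    stream: Iterable[Sample] = iter(samples)
    for stage in stages:
        stream = stage(stream)
    return list(stream)


def normalize_whitespace(sample: Sample) -> Sample:
    """Collapse runs of whitespace in the 'text' field."""
    return {**sample, "text": " ".join(sample["text"].split())}


def has_min_words(n: int) -> Callable[[Sample], bool]:
    """Build a predicate that keeps samples with at least n words."""
    return lambda sample: len(sample["text"].split()) >= n


pipeline = [mapper(normalize_whitespace), keep_if(has_min_words(3))]
data = [{"text": "  hello   world  "}, {"text": "a  longer   example sentence"}]
result = run_pipeline(data, pipeline)
print(result)
```

Because the stages are generators, samples flow through one at a time, which is one common way to keep memory use flat on large inputs. A real framework would add configuration, parallelism, and error handling on top of this idea.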

License

Apache License V2.0

Additional Project Details

Operating Systems

Linux, Mac, Windows

Programming Language

Python

Related Categories

Python Natural Language Processing (NLP) Tool

Registered

2025-01-21