Data-Juicer is an open-source data processing and augmentation framework designed to enhance the quality and diversity of datasets for machine learning tasks. It includes a modular pipeline for scalable data transformation.
MARF is a general cross-platform framework with a collection of algorithms for audio (voice, speech, and sound) and natural language text analysis and recognition along with sample applications (identification, NLP, etc.) of its use, implemented in Java.