Easy DataSet is a comprehensive open-source tool designed to make creating high-quality datasets for large language model fine-tuning, retrieval-augmented generation (RAG), and evaluation as easy and automated as possible by providing intuitive interfaces and powerful parsing, segmentation, and labeling tools. It supports ingesting domain-specific documents in a wide range of formats — including PDF, Markdown, DOCX, EPUB, and plain text — and can intelligently segment, clean, and structure content into rich datasets tailored for downstream LLM training needs. The system includes automated question-generation capabilities, hierarchical label trees, and answer generation pipelines that use LLM APIs to produce coherent paired data with customizable templates. Beyond dataset creation, Easy-dataset also provides a built-in evaluation system with model testing and blind-test features, helping teams validate model performance using curated test sets.

Features

  • Document ingest and intelligent parsing (PDF, DOCX, more)
  • Automatic dataset generation for fine-tuning
  • Question and answer generation using LLMs
  • Built-in model evaluation and testing systems
  • Multiple export formats (JSON/JSONL, Hugging Face)
  • Support for diverse dataset types (dialogue, image QA)

Project Samples

Project Activity

See All Activity >

License

MIT License

Follow Easy DataSet

Easy DataSet Web Site

Other Useful Business Software
Gemini 3 and 200+ AI Models on One Platform Icon
Gemini 3 and 200+ AI Models on One Platform

Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

Build generative AI apps with Vertex AI. Switch between models without switching platforms.
Start Free
Rate This Project
Login To Rate This Project

User Reviews

Be the first to post a review of Easy DataSet!

Additional Project Details

Operating Systems

Linux, Mac, Windows

Programming Language

JavaScript

Related Categories

JavaScript Large Language Models (LLM)

Registered

2026-02-04