Easy DataSet is an open-source tool for building datasets for large language model (LLM) fine-tuning, retrieval-augmented generation (RAG), and evaluation. It ingests domain-specific documents in several formats, including PDF, Markdown, DOCX, EPUB, and plain text, then segments, cleans, and structures the content into datasets for downstream LLM training. Automated question generation, hierarchical label trees, and answer-generation pipelines call LLM APIs to produce question-answer pairs from customizable templates. Beyond dataset creation, Easy DataSet includes an evaluation system with model testing and blind-test features, so teams can validate model performance against curated test sets.
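
The segment-then-generate flow described above can be sketched roughly as follows. This is a minimal illustration, not Easy DataSet's actual implementation: the function names `split_markdown` and `build_question_prompt`, the `max_chars` limit, and the prompt wording are all hypothetical, and the sketch stops short of calling any LLM API.

```python
import re

# Hypothetical prompt template; the real tool's templates are customizable
# and are not reproduced here.
QUESTION_TEMPLATE = (
    "Write {n} questions that can be answered from the text below.\n\n{chunk}"
)


def split_markdown(text, max_chars=500):
    """Split Markdown at headings, then cap each chunk at max_chars."""
    # Zero-width split before every line starting with 1-6 '#' and a space.
    sections = re.split(r"(?m)^(?=#{1,6} )", text)
    chunks = []
    for section in sections:
        section = section.strip()
        if not section:
            continue
        # Fall back to fixed-size slices for sections longer than max_chars.
        for start in range(0, len(section), max_chars):
            chunks.append(section[start:start + max_chars])
    return chunks


def build_question_prompt(chunk, n=3):
    """Fill the (assumed) template with one chunk of source text."""
    return QUESTION_TEMPLATE.format(n=n, chunk=chunk)


if __name__ == "__main__":
    doc = "# Intro\nEasy DataSet builds datasets.\n\n## Export\nIt writes JSONL.\n"
    for chunk in split_markdown(doc):
        print(build_question_prompt(chunk, n=2))
        print("---")
```

Each prompt would then be sent to whichever LLM API the project is configured to use; splitting at headings first keeps each chunk on a single topic, which tends to yield more focused questions.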

Features

  • Document ingestion and intelligent parsing (PDF, Markdown, DOCX, EPUB, plain text)
  • Automatic dataset generation for fine-tuning
  • Question and answer generation using LLMs
  • Built-in model evaluation and testing systems
  • Multiple export formats (JSON/JSONL, Hugging Face)
  • Support for diverse dataset types (dialogue, image QA)
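
As an illustration of the JSONL export listed above, the sketch below serializes question-answer pairs to one JSON object per line. The `instruction`/`input`/`output` field layout follows the widely used Alpaca convention and is an assumption here; the source does not specify which schema Easy DataSet emits. The `to_jsonl` helper and the sample pair are likewise hypothetical.

```python
import json


def to_jsonl(pairs):
    """Serialize question-answer pairs as JSONL (one JSON object per line)."""
    lines = []
    for pair in pairs:
        # Assumed Alpaca-style record; adjust field names to the target trainer.
        record = {
            "instruction": pair["question"],
            "input": "",
            "output": pair["answer"],
        }
        # ensure_ascii=False keeps non-ASCII text readable in the output file.
        lines.append(json.dumps(record, ensure_ascii=False))
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    sample = [{"question": "What license applies?", "answer": "MIT License"}]
    with open("dataset.jsonl", "w", encoding="utf-8") as handle:
        handle.write(to_jsonl(sample))
```

JSONL suits large datasets because each line can be parsed on its own, so a training script can stream the file instead of loading it whole.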

License

MIT License

Additional Project Details

Operating Systems

Linux, Mac, Windows

Programming Language

JavaScript

Related Categories

JavaScript Large Language Models (LLM)

Registered

2026-02-04