Easy DataSet is an open-source tool for creating high-quality datasets for large language model fine-tuning, retrieval-augmented generation (RAG), and evaluation. It aims to make the process as easy and automated as possible by combining an intuitive interface with parsing, segmentation, and labeling tools. It ingests domain-specific documents in a range of formats, including PDF, Markdown, DOCX, EPUB, and plain text, and then segments, cleans, and structures the content into datasets suited to downstream LLM training. The system includes automated question generation, hierarchical label trees, and answer-generation pipelines that call LLM APIs to produce question-answer pairs from customizable templates. Beyond dataset creation, Easy DataSet provides a built-in evaluation system with model testing and blind-test features, so teams can validate model performance against curated test sets.

Features

  • Document ingestion and intelligent parsing (PDF, DOCX, and more)
  • Automatic dataset generation for fine-tuning
  • Question and answer generation using LLMs
  • Built-in model evaluation and testing systems
  • Multiple export formats (JSON/JSONL, Hugging Face)
  • Support for diverse dataset types (dialogue, image QA)
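
The flow described above (segment a document, pair each segment with a question and answer, export as JSONL) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Easy DataSet's own code or API: the function names `segment_text`, `build_records`, and `to_jsonl`, the `max_chars` limit, and the Alpaca-style `instruction`/`input`/`output` field names are all hypothetical, and `make_qa` stands in for whatever LLM call would generate the question and answer.

```python
import json


def segment_text(text, max_chars=200):
    """Greedily pack blank-line-separated paragraphs into chunks.

    Each chunk holds as many consecutive paragraphs as fit in max_chars.
    A single paragraph longer than max_chars is kept whole as its own chunk.
    """
    chunks, current = [], ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


def build_records(chunks, make_qa):
    """Turn chunks into training records.

    make_qa is a caller-supplied function (hypothetical stand-in for an
    LLM call) that maps a chunk to a (question, answer) tuple.
    """
    records = []
    for chunk in chunks:
        question, answer = make_qa(chunk)
        # Field names are an assumed Alpaca-style layout, not a confirmed schema.
        records.append({"instruction": question, "input": "", "output": answer})
    return records


def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)
```

Passing a trivial `make_qa`, such as a function that returns a fixed question and the chunk itself as the answer, is enough to check the pipeline end to end before wiring in a real model.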

License

MIT License

Additional Project Details

Operating Systems

Linux, Mac, Windows

Programming Language

JavaScript

Related Categories

JavaScript Large Language Models (LLM)

Registered

20 hours ago