DocETL is an open-source system designed to build and execute data processing pipelines powered by large language models, particularly for analyzing complex collections of documents and unstructured datasets. The platform allows developers and researchers to construct structured workflows that extract, transform, and organize information from sources such as reports, transcripts, legal documents, and other text-heavy data. Instead of relying on single prompts or ad-hoc scripts, DocETL provides a declarative pipeline framework that breaks complex document analysis tasks into manageable operations that can be optimized and orchestrated automatically. Pipelines are typically defined using a low-code YAML interface, giving users full control over prompts and processing steps while still simplifying workflow creation.

Features

  • Low-code YAML interface for defining document processing pipelines
  • Specialized operators for entity resolution and contextual document analysis
  • Agent-based optimization that improves pipeline accuracy and output quality
  • Interactive development environment for experimenting with prompts and workflows
  • Python package for running production pipelines via CLI or code
  • Support for extracting structured data from large collections of unstructured documents

Project Samples

Project Activity

See All Activity >

License

MIT License

Follow DocETL

DocETL Web Site

Other Useful Business Software
$300 in Free Credit Towards Top Cloud Services Icon
$300 in Free Credit Towards Top Cloud Services

Build VMs, containers, AI, databases, storage—all in one place.

Start your project in minutes. After credits run out, 20+ products include free monthly usage. Only pay when you're ready to scale.
Get Started
Rate This Project
Login To Rate This Project

User Reviews

Be the first to post a review of DocETL!

Additional Project Details

Programming Language

Python

Related Categories

Python Large Language Models (LLM)

Registered

2026-03-05