DocETL is an open-source system designed to build and execute data processing pipelines powered by large language models, particularly for analyzing complex collections of documents and unstructured datasets. The platform allows developers and researchers to construct structured workflows that extract, transform, and organize information from sources such as reports, transcripts, legal documents, and other text-heavy data. Instead of relying on single prompts or ad-hoc scripts, DocETL provides a declarative pipeline framework that breaks complex document analysis tasks into manageable operations that can be optimized and orchestrated automatically. Pipelines are typically defined using a low-code YAML interface, giving users full control over prompts and processing steps while still simplifying workflow creation.
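
To make the declarative idea concrete, here is a minimal Python sketch of a pipeline expressed as data and executed by a small runner. It is not DocETL's API or its YAML schema: the spec layout, the `run_pipeline` helper, the `output_key` field, and the rule-based `fake_llm` stand-in are all hypothetical names, chosen only to show how a document task can be broken into ordered, inspectable operations.

```python
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]

# Hypothetical pipeline spec: plain data describing the steps in order.
# This layout is an assumption for illustration, not DocETL's real schema.
PIPELINE: Dict[str, Any] = {
    "operations": [
        {"name": "tag_topic", "type": "map",
         "prompt": "Topic of: {text}", "output_key": "topic"},
        {"name": "keep_legal", "type": "filter",
         "prompt": "Is this legal? {topic}"},
    ]
}


def fake_llm(prompt: str) -> str:
    """Rule-based stand-in for a model call so the sketch runs offline."""
    text = prompt.lower()
    if text.startswith("topic of:"):
        return "legal" if "contract" in text else "general"
    if text.startswith("is this legal?"):
        return "yes" if text.split("?")[-1].strip() == "legal" else "no"
    return ""


def run_pipeline(spec: Dict[str, Any], docs: List[Doc],
                 llm: Callable[[str], str]) -> List[Doc]:
    """Apply each operation in the spec to the documents, in order."""
    for op in spec["operations"]:
        if op["type"] == "map":
            # A map step adds one new field to every document.
            docs = [{**d, op["output_key"]: llm(op["prompt"].format(**d))}
                    for d in docs]
        elif op["type"] == "filter":
            # A filter step keeps documents for which the model answers "yes".
            docs = [d for d in docs
                    if llm(op["prompt"].format(**d)).strip().lower() == "yes"]
        else:
            raise ValueError(f"unknown operation type: {op['type']}")
    return docs


DOCS: List[Doc] = [
    {"text": "The contract ends in May."},
    {"text": "Team lunch is on Friday."},
]

result = run_pipeline(PIPELINE, DOCS, fake_llm)
print(result)
```

Because the steps live in data rather than in control flow, a separate tool could reorder, split, or rewrite them before execution, which is the kind of automatic optimization the description refers to.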

Features

  • Low-code YAML interface for defining document processing pipelines
  • Specialized operators for entity resolution and contextual document analysis
  • Agent-based optimization that improves pipeline accuracy and output quality
  • Interactive development environment for experimenting with prompts and workflows
  • Python package for running production pipelines via CLI or code
  • Support for extracting structured data from large collections of unstructured documents

License

MIT License

Additional Project Details

Programming Language

Python

Related Categories

Python Large Language Models (LLM)

Registered

2026-03-05