DocETL is an open-source system designed to build and execute data processing pipelines powered by large language models, particularly for analyzing complex collections of documents and unstructured datasets. The platform allows developers and researchers to construct structured workflows that extract, transform, and organize information from sources such as reports, transcripts, legal documents, and other text-heavy data. Instead of relying on single prompts or ad-hoc scripts, DocETL provides a declarative pipeline framework that breaks complex document analysis tasks into manageable operations that can be optimized and orchestrated automatically. Pipelines are typically defined using a low-code YAML interface, giving users full control over prompts and processing steps while still simplifying workflow creation.

Features

  • Low-code YAML interface for defining document processing pipelines
  • Specialized operators for entity resolution and contextual document analysis
  • Agent-based optimization that improves pipeline accuracy and output quality
  • Interactive development environment for experimenting with prompts and workflows
  • Python package for running production pipelines via CLI or code
  • Support for extracting structured data from large collections of unstructured documents

Project Samples

Project Activity

See All Activity >

License

MIT License

Follow DocETL

DocETL Web Site

Other Useful Business Software
Go from Code to Production URL in Seconds Icon
Go from Code to Production URL in Seconds

Cloud Run deploys apps in any language instantly. Scales to zero. Pay only when code runs.

Skip the Kubernetes configs. Cloud Run handles HTTPS, scaling, and infrastructure automatically. Two million requests free per month.
Try it free
Rate This Project
Login To Rate This Project

User Reviews

Be the first to post a review of DocETL!

Additional Project Details

Programming Language

Python

Related Categories

Python Large Language Models (LLM)

Registered

2026-03-05