DocETL is an open-source system designed to build and execute data processing pipelines powered by large language models, particularly for analyzing complex collections of documents and unstructured datasets. The platform allows developers and researchers to construct structured workflows that extract, transform, and organize information from sources such as reports, transcripts, legal documents, and other text-heavy data. Instead of relying on single prompts or ad-hoc scripts, DocETL provides a declarative pipeline framework that breaks complex document analysis tasks into manageable operations that can be optimized and orchestrated automatically. Pipelines are typically defined using a low-code YAML interface, giving users full control over prompts and processing steps while still simplifying workflow creation.

Features

  • Low-code YAML interface for defining document processing pipelines
  • Specialized operators for entity resolution and contextual document analysis
  • Agent-based optimization that improves pipeline accuracy and output quality
  • Interactive development environment for experimenting with prompts and workflows
  • Python package for running production pipelines via CLI or code
  • Support for extracting structured data from large collections of unstructured documents

Project Samples

Project Activity

See All Activity >

License

MIT License

Follow DocETL

DocETL Web Site

Other Useful Business Software
Full-stack observability with actually useful AI | Grafana Cloud Icon
Full-stack observability with actually useful AI | Grafana Cloud

Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
Create free account
Rate This Project
Login To Rate This Project

User Reviews

Be the first to post a review of DocETL!

Additional Project Details

Programming Language

Python

Related Categories

Python Large Language Models (LLM)

Registered

2026-03-05