Showing 14 open source projects for "etl."

View related business solutions
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • Ship Agents Faster Icon
    Ship Agents Faster

    Transform your applications and workflows into powerful agentic systems at global scale.

    Gemini Enterprise Agent Platform lets you rapidly build, scale, govern and optimize production-ready agents grounded in your organization's data. The platform enables developers to build custom or pre-built agents for virtually any use case. New customers get $300 in free credits.
    Get Started Free
  • 1
    Unstract

    Unstract

    No-code LLM Platform to launch APIs and ETL Pipelines

    ...It integrates a visual Prompt Studio environment where users can iteratively design extraction schemas, compare outputs from different models, and monitor costs and accuracy side by side, making it easier to refine prompts and extraction logic before deploying at scale. Unstract supports deploying structured extraction as REST API endpoints or embedding it into data engineering ETL pipelines, which allows it to plug directly into data warehouses, cloud storage, or downstream analytics systems. Its platform works with a broad variety of file types — from PDFs and spreadsheets to images — and includes integrations with databases, cloud storage providers, and vector databases.
    Downloads: 9 This Week
    Last Update:
    See Project
  • 2
    Rubix ML

    Rubix ML

    A high-level machine learning and deep learning library for PHP

    Rubix ML is a free open-source machine learning (ML) library that allows you to build programs that learn from your data using the PHP language. We provide tools for the entire machine learning life cycle from ETL to training, cross-validation, and production with over 40 supervised and unsupervised learning algorithms. In addition, we provide tutorials and other educational content to help you get started using ML in your projects. Our intuitive interface is quick to grasp while hiding alot of power and complexity. Write less code and iterate faster leaving the hard stuff to us. ...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 3
    Open Semantic Search

    Open Semantic Search

    Open source semantic search and text analytics for large document sets

    ...It provides an integrated search server combined with a document processing pipeline that supports crawling, text extraction, and automated analysis of content from many different sources. Open Semantic Search includes an ETL framework that can ingest documents, process them through analysis steps, and enrich the data with extracted information such as named entities and metadata. It also supports optical character recognition to extract text from images and scanned documents, including images embedded inside PDF files. It integrates text mining and analytics capabilities that allow users to examine relationships, topics, and structured data within document collections.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 4
    Daft

    Daft

    Distributed DataFrame for Python designed for the cloud

    Daft is a framework for ETL, analytics and ML/AI at scale. Its familiar Python Dataframe API is built to outperform Spark in performance and ease of use. Daft plugs directly into your ML/AI stack through efficient zero-copy integrations with essential Python libraries such as Pytorch and Ray. It also allows requesting GPUs as a resource for running models. Daft runs locally with a lightweight multithreaded backend.
    Downloads: 6 This Week
    Last Update:
    See Project
  • Stop vibe-debugging. Icon
    Stop vibe-debugging.

    Plug Claude into your app's actual errors.

    AppSignal's MCP server hands Claude, Cursor, or Zed your real errors, traces, and the deploy that shipped them. AI writes the fix; you review the diff.
    Free 30 days.
  • 5
    CocoIndex

    CocoIndex

    ETL framework to index data for AI, such as RAG

    CocoIndex is an open-source framework designed for building powerful, local-first semantic search systems. It lets users index and retrieve content based on meaning rather than keywords, making it ideal for modern AI-based search applications. CocoIndex leverages vector embeddings and integrates with various models and frameworks, including OpenAI and Hugging Face, to provide high-quality semantic understanding. It’s built for transparency, ease of use, and local control over your search...
    Downloads: 10 This Week
    Last Update:
    See Project
  • 6
    Hamilton DAGWorks

    Hamilton DAGWorks

    Helps scientists define testable, modular, self-documenting dataflow

    ...As shown below, it results in readable code that can always be visualized. Hamilton loads that definition and automatically builds the DAG for you. Hamilton brings modularity and structure to any Python application moving data: ETL pipelines, ML workflows, LLM applications, RAG systems, BI dashboards, and the Hamilton UI allows you to automatically visualize, catalog, and monitor execution.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 7
    Instill Core

    Instill Core

    Instill Core is a full-stack AI infrastructure tool for data

    ...It provides an end-to-end solution that enables developers to build, deploy, and manage AI-powered applications without needing to manually stitch together multiple tools across the data and model lifecycle. The platform focuses heavily on handling unstructured data such as documents, images, audio, and video, transforming them into AI-ready formats through integrated ETL pipelines and processing workflows. Instill Core includes modular components such as pipelines, artifacts, and model services, which work together to enable flexible and scalable AI system design. It also supports retrieval-augmented generation workflows and model deployment without requiring complex GPU infrastructure management.
    Downloads: 7 This Week
    Last Update:
    See Project
  • 8
    Superduper

    Superduper

    Superduper: Integrate AI models and machine learning workflows

    ...Developers may leverage Superduper by building compositional and declarative objects that out-source the details of deployment, orchestration versioning, and more to the Superduper engine. This allows developers to completely avoid implementing MLOps, ETL pipelines, model deployment, data migration, and synchronization. Using Superduper is simply "CAPE": Connect to your data, apply arbitrary AI to that data, package and reuse the application on arbitrary data, and execute AI-database queries and predictions on the resulting AI outputs and data.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 9
    DocETL

    DocETL

    A system for agentic LLM-powered data processing and ETL

    DocETL is an open-source system designed to build and execute data processing pipelines powered by large language models, particularly for analyzing complex collections of documents and unstructured datasets. The platform allows developers and researchers to construct structured workflows that extract, transform, and organize information from sources such as reports, transcripts, legal documents, and other text-heavy data. Instead of relying on single prompts or ad-hoc scripts, DocETL...
    Downloads: 7 This Week
    Last Update:
    See Project
  • Compliant and Reliable File Transfers Backed by Top Security Certifications Icon
    Compliant and Reliable File Transfers Backed by Top Security Certifications

    Cerberus FTP Server delivers SOC 2 Type II certified security and FIPS 140-2 validated encryption.

    Stop relying on non-certified, legacy file transfer tools that creak under the weight of modern security demands. Get full audit trails, advanced access controls and more supported by an award-winning team of experts. Start your free 25-day trial today.
    Start Free Trial
  • 10
    Datapipe

    Datapipe

    Real-time, incremental ETL library for ML with record-level depend

    Datapipe is a real-time, incremental ETL library for Python with record-level dependency tracking. Datapipe is designed to streamline the creation of data processing pipelines. It excels in scenarios where data is continuously changing, requiring pipelines to adapt and process only the modified data efficiently. This library tracks dependencies for each record in the pipeline, ensuring minimal and efficient data processing.
    Downloads: 177 This Week
    Last Update:
    See Project
  • 11
    .NET for Apache Spark

    .NET for Apache Spark

    A free, open-source, and cross-platform big data analytics framework

    .NET for Apache Spark provides high-performance APIs for using Apache Spark from C# and F#. With these .NET APIs, you can access the most popular Dataframe and SparkSQL aspects of Apache Spark, for working with structured data, and Spark Structured Streaming, for working with streaming data. .NET for Apache Spark is compliant with .NET Standard - a formal specification of .NET APIs that are common across .NET implementations. This means you can use .NET for Apache Spark anywhere you write...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 12
    HPCC Systems

    HPCC Systems

    End-to-end big data in a massively scalable supercomputing platform.

    Important: As of April 20, 2026, this project can now be found at https://github.com/hpcc-systems/HPCC-Platform/releases. HPCC Systems® (www.hpccsystems.com) from LexisNexis® Risk Solutions is a proven, open source solution for Big Data insights that can be implemented by businesses of all sizes. With HPCC Systems, developers can design applications with Big Data at their core, enabling businesses to better analyze and understand data at scale, improving business time to results and...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    MentDB Projects

    MentDB Projects

    Generalized Interoperability and Strong AI

    ...Our architecture is built around the revolutionary Mentalese Query Language (MQL). MentDB Weak (Generalized Interoperability): A unified data layer enabling seamless data exchange and application integration (SOA, ETL, Data Quality). We eliminate data silos through a single, generalized data language. MentDB Strong (Strong AI / AGI): The framework for exploring and building Machine Consciousness, free will, and advanced ethical reasoning systems. Based on new-generation AI algorithms.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 14
    cocoNLP

    cocoNLP

    A Chinese information extraction tool

    ...The project blends pattern-based methods with NLP heuristics, giving developers dependable results for real-world texts like chats, comments, and user-generated content. Its API is intentionally simple, so you can drop it into scripts, ETL jobs, or dashboards without deep ML expertise. Because it aims at utility over complexity, it’s useful for prototyping data products or building lightweight text analytics where large models would be overkill. The repository also includes examples and test snippets to help you understand expected inputs and typical outputs, which shortens the learning curve for newcomers.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next
Auth0 Logo