Best Data Pipeline Software for OpenMetadata

Compare the Top Data Pipeline Software that integrates with OpenMetadata as of September 2025

This a list of Data Pipeline software that integrates with OpenMetadata. Use the filters on the left to add additional filters for products that have integrations with OpenMetadata. View the products that work with OpenMetadata in the table below.

What is Data Pipeline Software for OpenMetadata?

Data pipeline software helps businesses automate the movement, transformation, and storage of data from various sources to destinations such as data warehouses, lakes, or analytic platforms. These platforms provide tools for extracting data from multiple sources, processing it in real-time or batch, and loading it into target systems for analysis or reporting (ETL: Extract, Transform, Load). Data pipeline software often includes features for data monitoring, error handling, scheduling, and integration with other software tools, making it easier for organizations to ensure data consistency, accuracy, and flow. By using this software, businesses can streamline data workflows, improve decision-making, and ensure that data is readily available for analysis. Compare and read user reviews of the best Data Pipeline software for OpenMetadata currently available using the table below. This list is updated regularly.

  • 1
    dbt

    dbt

    dbt Labs

    dbt Labs helps data teams transform raw data into trusted, analysis-ready datasets faster. With dbt, analysts and engineers can collaborate on version-controlled SQL models, enforce testing and documentation standards, and deploy transformations reliably at scale. Built on modern software engineering best practices, dbt brings transparency and governance to every step of the data transformation workflow. Thousands of companies, from startups to Fortune 500 enterprises, rely on dbt to reduce data debt, increase trust, and accelerate insights across their organization. Whether you’re scaling data operations or just getting started, dbt empowers your team to move from raw data to actionable analytics with confidence.
    Starting Price: $100 per user per month
    View Software
    Visit Website
  • 2
    Apache Kafka

    Apache Kafka

    The Apache Software Foundation

    Apache Kafka® is an open-source, distributed streaming platform. Scale production clusters up to a thousand brokers, trillions of messages per day, petabytes of data, hundreds of thousands of partitions. Elastically expand and contract storage and processing. Stretch clusters efficiently over availability zones or connect separate clusters across geographic regions. Process streams of events with joins, aggregations, filters, transformations, and more, using event-time and exactly-once processing. Kafka’s out-of-the-box Connect interface integrates with hundreds of event sources and event sinks including Postgres, JMS, Elasticsearch, AWS S3, and more. Read, write, and process streams of events in a vast array of programming languages.
  • 3
    Dagster

    Dagster

    Dagster Labs

    Dagster is a next-generation orchestration platform for the development, production, and observation of data assets. Unlike other data orchestration solutions, Dagster provides you with an end-to-end development lifecycle. Dagster gives you control over your disparate data tools and empowers you to build, test, deploy, run, and iterate on your data pipelines. It makes you and your data teams more productive, your operations more robust, and puts you in complete control of your data processes as you scale. Dagster brings a declarative approach to the engineering of data pipelines. Your team defines the data assets required, quickly assessing their status and resolving any discrepancies. An assets-based model is clearer than a tasks-based one and becomes a unifying abstraction across the whole workflow.
    Starting Price: $0
  • 4
    Airbyte

    Airbyte

    Airbyte

    Airbyte is an open-source data integration platform designed to help businesses synchronize data from various sources to their data warehouses, lakes, or databases. The platform provides over 550 pre-built connectors and enables users to easily create custom connectors using low-code or no-code tools. Airbyte's solution is optimized for large-scale data movement, enhancing AI workflows by seamlessly integrating unstructured data into vector databases like Pinecone and Weaviate. It offers flexible deployment options, ensuring security, compliance, and governance across all models.
    Starting Price: $2.50 per credit
  • 5
    Fivetran

    Fivetran

    Fivetran

    Fivetran is a leading data integration platform that centralizes an organization’s data from various sources to enable modern data infrastructure and drive innovation. It offers over 700 fully managed connectors to move data automatically, reliably, and securely from SaaS applications, databases, ERPs, and files to data warehouses and lakes. The platform supports real-time data syncs and scalable pipelines that fit evolving business needs. Trusted by global enterprises like Dropbox, JetBlue, and Pfizer, Fivetran helps accelerate analytics, AI workflows, and cloud migrations. It features robust security certifications including SOC 1 & 2, GDPR, HIPAA, and ISO 27001. Fivetran provides an easy-to-use, customizable platform that reduces engineering time and enables faster insights.
  • 6
    Databricks Data Intelligence Platform
    The Databricks Data Intelligence Platform allows your entire organization to use data and AI. It’s built on a lakehouse to provide an open, unified foundation for all data and governance, and is powered by a Data Intelligence Engine that understands the uniqueness of your data. The winners in every industry will be data and AI companies. From ETL to data warehousing to generative AI, Databricks helps you simplify and accelerate your data and AI goals. Databricks combines generative AI with the unification benefits of a lakehouse to power a Data Intelligence Engine that understands the unique semantics of your data. This allows the Databricks Platform to automatically optimize performance and manage infrastructure in ways unique to your business. The Data Intelligence Engine understands your organization’s language, so search and discovery of new data is as easy as asking a question like you would to a coworker.
  • 7
    Apache Airflow

    Apache Airflow

    The Apache Software Foundation

    Airflow is a platform created by the community to programmatically author, schedule and monitor workflows. Airflow has a modular architecture and uses a message queue to orchestrate an arbitrary number of workers. Airflow is ready to scale to infinity. Airflow pipelines are defined in Python, allowing for dynamic pipeline generation. This allows for writing code that instantiates pipelines dynamically. Easily define your own operators and extend libraries to fit the level of abstraction that suits your environment. Airflow pipelines are lean and explicit. Parametrization is built into its core using the powerful Jinja templating engine. No more command-line or XML black-magic! Use standard Python features to create your workflows, including date time formats for scheduling and loops to dynamically generate tasks. This allows you to maintain full flexibility when building your workflows.
  • Previous
  • You're on page 1
  • Next