Business Software for Apache Airflow - Page 2

Top Software that integrates with Apache Airflow as of October 2025 - Page 2

  • 1
    IRI Voracity
    IRI, The CoSort Company

    Voracity is a high-performance, all-in-one data management platform that accelerates and consolidates the key activities of data discovery, integration, migration, governance, and analytics. Voracity helps you control your data at every stage of the lifecycle and extract maximum value from it. With Voracity you can: 1) classify, profile, and diagram enterprise data sources; 2) speed up or replace legacy sort and ETL tools; 3) migrate data to modernize and wrangle data to analyze; 4) find PII everywhere and mask it consistently to preserve referential integrity; 5) score re-identification risk and anonymize quasi-identifiers; 6) create and manage database subsets or intelligently synthesize test data; 7) package, protect, and provision big data; 8) validate, scrub, enrich, and unify data to improve its quality; and 9) manage metadata and master data. Use Voracity to comply with data privacy laws, clean up and govern the data lake, improve the reliability of your analytics, and create safe, smart test data.
  • 2
    Datakin

    Instantly reveal the order hidden within your complex data world, and always know exactly where to look for answers. Datakin automatically traces data lineage, showing your entire data ecosystem in a rich visual graph. It clearly illustrates the upstream and downstream relationships for each dataset. The Duration tab summarizes a job’s performance in a Gantt-style chart along with its upstream dependencies, making it easy to find bottlenecks. When you need to pinpoint the exact moment of a breaking change, the Compare tab shows how your jobs and datasets have changed between runs. Sometimes jobs that run successfully produce bad output. The Quality tab surfaces critical data quality metrics, showing how they change over time so anomalies become obvious. Datakin helps you find the root cause of issues quickly – and prevent new ones from occurring.
    Starting Price: $2 per month
  • 3
    Google Cloud Composer
    Cloud Composer's managed nature and Apache Airflow compatibility allow you to focus on authoring, scheduling, and monitoring your workflows rather than provisioning resources. End-to-end integration with Google Cloud products including BigQuery, Dataflow, Dataproc, Datastore, Cloud Storage, Pub/Sub, and AI Platform gives users the freedom to fully orchestrate their pipelines. Author, schedule, and monitor your workflows through a single orchestration tool, whether your pipeline lives on-premises, in multiple clouds, or fully within Google Cloud (see the DAG sketch below). Ease your transition to the cloud or maintain a hybrid data environment by orchestrating workflows that cross between on-premises and the public cloud. Create workflows that connect data, processing, and services across clouds to give you a unified data environment.
    Starting Price: $0.074 per vCPU hour
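    Because Cloud Composer runs standard Apache Airflow, workflows are ordinary Python DAG files. Below is a minimal sketch of the kind of DAG Composer schedules; it assumes a recent Airflow 2 release with the Google provider package installed, and the bucket, project, dataset, and table names are hypothetical placeholders.

```python
# Minimal Airflow DAG sketch for Cloud Composer (assumes Airflow 2.4+ and
# apache-airflow-providers-google). All resource names are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.providers.google.cloud.operators.bigquery import (
    BigQueryInsertJobOperator,
)

with DAG(
    dag_id="daily_revenue_rollup",
    start_date=datetime(2025, 1, 1),
    schedule="@daily",  # "schedule_interval" on older Airflow 2 releases
    catchup=False,
) as dag:
    # List staged files in Cloud Storage (hypothetical bucket).
    stage = BashOperator(
        task_id="stage_files",
        bash_command="gsutil ls gs://example-bucket/raw/",
    )

    # Aggregate orders into a daily rollup in BigQuery (hypothetical table).
    rollup = BigQueryInsertJobOperator(
        task_id="rollup_revenue",
        configuration={
            "query": {
                "query": (
                    "SELECT DATE(ts) AS day, SUM(amount) AS revenue "
                    "FROM `example-project.sales.orders` GROUP BY day"
                ),
                "useLegacySql": False,
            }
        },
    )

    stage >> rollup  # run the rollup only after staging succeeds
```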
  • 4
    Amazon MWAA
    Amazon Managed Workflows for Apache Airflow (MWAA) is a managed orchestration service for Apache Airflow that makes it easier to set up and operate end-to-end data pipelines in the cloud at scale. Apache Airflow is an open-source tool used to programmatically author, schedule, and monitor sequences of processes and tasks referred to as “workflows.” With Managed Workflows, you can use Airflow and Python to create workflows (see the sketch below) without having to manage the underlying infrastructure for scalability, availability, and security. Managed Workflows automatically scales its workflow execution capacity to meet your needs, and is integrated with AWS security services to help provide you with fast and secure access to data.
    Starting Price: $0.49 per hour
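    A hedged sketch of the kind of DAG you would deploy to MWAA, written in Airflow's TaskFlow style; in practice the file is uploaded to the DAGs folder of the environment's S3 bucket. The task names and data below are placeholders, and the decorators assume Airflow 2.4 or later.

```python
# TaskFlow-style DAG sketch for MWAA (assumes Airflow 2.4+); deploy by
# uploading this file to the environment's configured S3 DAGs folder.
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@hourly", start_date=datetime(2025, 1, 1), catchup=False)
def orders_pipeline():
    @task
    def extract():
        # Placeholder for pulling records from a source system.
        return [{"order_id": 1, "amount": 42.0}]

    @task
    def load(rows):
        # Placeholder for writing to a warehouse or data lake.
        print(f"loaded {len(rows)} rows")

    load(extract())


orders_pipeline()
```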
  • 5
    Telmai

    A low-code, no-code approach to data quality, delivered as SaaS for flexibility, affordability, ease of integration, and efficient support. It maintains high standards of encryption, identity management, role-based access control, data governance, and compliance. Advanced ML models detect row-value data anomalies, and the models evolve and adapt to your business and data needs. Add any number of data sources, records, and attributes; the platform is well-equipped for unpredictable volume spikes and supports both batch and streaming processing. Data is constantly monitored to provide real-time notifications with zero impact on pipeline performance. Telmai is a platform for data teams to proactively detect and investigate anomalies in real time, with a seamless no-code onboarding, integration, and investigation experience: connect to your data source, specify alerting channels, and Telmai will automatically learn from the data and alert you when there are unexpected drifts.
  • 6
    Chalk

    Powerful data engineering workflows, without the infrastructure headaches. Complex streaming, scheduling, and data backfill pipelines are all defined in simple, composable Python. Make ETL a thing of the past: fetch all of your data in real time, no matter how complex. Incorporate deep learning and LLMs into decisions alongside structured business data. Make better predictions with fresher data, don’t pay vendors to pre-fetch data you don’t use, and query data just in time for online predictions. Experiment in Jupyter, then deploy to production. Prevent train-serve skew and create new data workflows in milliseconds. Instantly monitor all of your data workflows in real time; track usage and data quality effortlessly. Know everything you computed, and replay any of your data. Integrate with the tools you already use and deploy to your own infrastructure. Decide and enforce withdrawal limits with custom hold times.
    Starting Price: Free
  • 7
    Foundational

    Identify code and optimization issues in real time, prevent data incidents pre-deploy, and govern data-impacting code changes end to end—from the operational database to the user-facing dashboard. Automated, column-level data lineage, from the operational database all the way to the reporting layer, ensures every dependency is analyzed. Foundational automates data contract enforcement by analyzing every repository from upstream to downstream, directly from source code. Use Foundational to proactively find and prevent code and data issues and to create controls and guardrails. Foundational can be set up in minutes with no code changes required.
  • 8
    Orchestra

    Orchestra is a Unified Control Plane for Data and AI Operations, designed to help data teams build, deploy, and monitor workflows with ease. It offers a declarative framework that combines code and GUI, allowing users to implement workflows 10x faster and reduce maintenance time by 50%. With real-time metadata aggregation, Orchestra provides full-stack data observability, enabling proactive alerting and rapid recovery from pipeline failures. It integrates seamlessly with tools like dbt Core, dbt Cloud, Coalesce, Airbyte, Fivetran, Snowflake, BigQuery, Databricks, and more, ensuring compatibility with existing data stacks. Orchestra's modular architecture supports AWS, Azure, and GCP, making it a versatile solution for enterprises and scale-ups aiming to streamline their data operations and build trust in their AI initiatives.
  • 9
    OpenMetadata

    OpenMetadata is an open, unified metadata platform that centralizes all metadata for data discovery, observability, and governance in a single interface. It leverages a Unified Metadata Graph and 80+ turnkey connectors to collect metadata from databases, pipelines, BI tools, ML systems, and more, providing a complete data context that enables teams to search, facet, and preview assets across their entire estate. Its API- and schema-first architecture offers extensible metadata entities and relationships, giving organizations precise control and customization over their metadata model (see the REST sketch below). Built with only four core system components, the platform is designed for simple setup, operation, and scalable performance, allowing both technical and non-technical users to collaborate on discovery, lineage, quality, observability, collaboration, and governance workflows without complex infrastructure.
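    Because the platform is API-first, metadata entities such as tables are plain REST resources. A hedged sketch of listing tables from a server follows; the port, endpoint path, and token handling are assumptions based on the project's documented conventions rather than guaranteed specifics.

```python
# Hedged sketch: listing table entities over OpenMetadata's REST API. The
# base URL, endpoint path, and token below are assumptions/placeholders.
import requests

BASE_URL = "http://localhost:8585/api"  # default local server (assumption)
TOKEN = "<bot-jwt-token>"               # issued from the OpenMetadata UI

resp = requests.get(
    f"{BASE_URL}/v1/tables",
    params={"limit": 5},
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=10,
)
resp.raise_for_status()

# Entity listings are paginated; each entry carries the metadata graph's
# view of the table, including its fully qualified name.
for table in resp.json().get("data", []):
    print(table.get("fullyQualifiedName"))
```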
  • 10
    Mode
    Mode Analytics

    Understand how users are interacting with your product and identify opportunity areas to inform your product decisions. Mode empowers one Stitch analyst to do the work of a full data team through speed, flexibility, and collaboration. Build dashboards for annual revenue, then use chart visualizations to identify anomalies quickly. Create polished, investor-ready reports or share analysis with teams for collaboration. Connect your entire tech stack to Mode and identify upstream issues to improve performance. Speed up workflows across teams with APIs and webhooks. Leverage marketing and product data to fix weak spots in your funnel, improve landing-page performance, and understand churn before it happens.
  • 11
    IBM Databand
    Monitor your data health and pipeline performance. Gain unified visibility into pipelines running on cloud-native tools like Apache Airflow, Apache Spark, Snowflake, BigQuery, and Kubernetes. An observability platform purpose-built for data engineers. Data engineering is only getting more challenging as demands from business stakeholders grow, and Databand can help you catch up. More pipelines mean more complexity: data engineers are working with more complex infrastructure than ever and pushing higher speeds of release. It’s harder to understand why a process has failed, why it’s running late, and how changes affect the quality of data outputs. Data consumers are frustrated with inconsistent results, model performance, and delays in data delivery. Not knowing exactly what data is being delivered, or precisely where failures are coming from, leads to a persistent lack of trust. Meanwhile, pipeline logs, errors, and data quality metrics are captured and stored in independent, isolated systems.
  • 12
    MaxPatrol
    Positive Technologies

    MaxPatrol is made for managing vulnerabilities and compliance on corporate information systems. Penetration testing, system checks, and compliance monitoring are at the core of MaxPatrol. Together, these mechanisms give an objective picture of the security stance across the IT infrastructure, as well as granular insight at the department, host, and application level: precisely the information needed to quickly detect vulnerabilities and prevent attacks. MaxPatrol makes it a cinch to keep an up-to-date inventory of IT assets. View information about network resources (network addresses, OS, available network applications and services), identify hardware and software in use, and monitor the state of updates. Best of all, it sees changes to your IT infrastructure: MaxPatrol doesn't blink as new accounts and hosts appear, or as hardware and software are updated. Information about the state of infrastructure security is quietly collected and processed.
  • 13
    lakeFS
    Treeverse

    lakeFS enables you to manage your data lake the way you manage your code: run parallel pipelines for experimentation and CI/CD for your data, simplifying the lives of the engineers, data scientists, and analysts who are transforming the world with data. lakeFS is an open source platform that delivers resilience and manageability to object-storage-based data lakes. With lakeFS you can build repeatable, atomic, and versioned data lake operations, from complex ETL jobs to data science and analytics. lakeFS supports AWS S3, Azure Blob Storage, and Google Cloud Storage (GCS) as its underlying storage service. It is API-compatible with S3 and works seamlessly with modern data frameworks such as Spark, Hive, AWS Athena, and Presto (see the sketch below). lakeFS provides a Git-like branching and committing model that scales to exabytes of data by utilizing S3, GCS, or Azure Blob for storage.
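    The S3 compatibility means existing clients keep working: a stock boto3 client pointed at the lakeFS endpoint addresses objects as repository/branch/path. A minimal sketch follows; the endpoint, credentials, and repository and branch names are placeholders.

```python
# Sketch of lakeFS's S3 compatibility: point an ordinary boto3 client at
# the lakeFS endpoint and address objects as <repository>/<branch>/<path>.
# Endpoint, credentials, and repository/branch names are placeholders.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://lakefs.example.com",  # your lakeFS server
    aws_access_key_id="<lakefs-access-key>",
    aws_secret_access_key="<lakefs-secret-key>",
)

# Write to an isolated experiment branch instead of production data.
s3.put_object(
    Bucket="my-repo",                       # the lakeFS repository
    Key="experiment-1/data/users.parquet",  # branch name leads the key
    Body=b"<parquet bytes>",
)

# Read the same logical path from the main branch to compare versions.
obj = s3.get_object(Bucket="my-repo", Key="main/data/users.parquet")
print(obj["ContentLength"])
```

    Because the branch is just the leading segment of the key, experiment branches behave like Git branches: pipelines write freely in isolation, and changes only reach main via a commit and merge.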
  • 14
    Datafold

    Prevent data outages by identifying and fixing data quality issues before they get into production. Go from 0 to 100% test coverage of your data pipelines in a day. Know the impact of each code change with automatic regression testing across billions of rows. Automate change management, improve data literacy, achieve compliance, and reduce incident response time. Don’t let data incidents take you by surprise. Be the first one to know with automated anomaly detection. Datafold’s easily adjustable ML model adapts to seasonality and trend patterns in your data to construct dynamic thresholds. Save hours spent on trying to understand data. Use the Data Catalog to find relevant datasets, fields, and explore distributions easily with an intuitive UI. Get interactive full-text search, data profiling, and consolidation of metadata in one place.
  • 15
    Great Expectations

    Great Expectations is a shared, open standard for data quality. It helps data teams eliminate pipeline debt through data testing, documentation, and profiling (see the sketch below). We recommend deploying within a virtual environment; if you’re not familiar with pip, virtual environments, notebooks, or git, you may want to check out the supporting resources. Many companies use Great Expectations today; check out our case studies with companies we've worked closely with to understand how they use Great Expectations in their data stack. Great Expectations Cloud is a fully managed SaaS offering, and we're taking on new private alpha members. Alpha members get first access to new features and input into the roadmap.
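    A hedged sketch of the core idea, data tests ("expectations") declared against a dataset, using the long-standing pandas-backed API; newer Great Expectations releases restructure this around data contexts and validators, so treat the exact calls as version-dependent. The column names and data are placeholders.

```python
# Hedged sketch using the older pandas-backed Great Expectations API;
# recent releases move these calls onto contexts/validators instead.
import great_expectations as ge
import pandas as pd

df = ge.from_pandas(
    pd.DataFrame({"order_id": [1, 2, 3], "amount": [10.0, None, 25.5]})
)

# Expectations are executable tests that double as documentation.
result = df.expect_column_values_to_not_be_null("order_id")
print(result.success)  # True: every order_id is present

result = df.expect_column_values_to_not_be_null("amount")
print(result.success)  # False: the second row has a null amount
```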
  • 16
    Meltano

    Meltano provides the ultimate flexibility in deployment options; own your data stack, end to end. An ever-growing library of 300+ connectors has been running in production for years. Run workflows in isolated environments, execute end-to-end tests, and version-control everything. Open source gives you the power to build your ideal data stack. Define your entire project as code and collaborate confidently with your team. The Meltano CLI enables you to rapidly create your project, making it easy to start replicating data. Meltano is designed to be the best way to run dbt to manage your transformations. Your entire data stack is defined in your project, making it simple to deploy to production. Validate your changes in development before moving to CI, and in staging before moving to production.
  • 17
    Metaphor
    Metaphor Data

    Metaphor automatically indexes warehouses, lakes, dashboards, and other pieces of your data stack. Combined with utilization, lineage, and other social popularity signals, this lets you surface the most trusted data to your users. Provide an open, 360-degree view of your data, and of conversations about it, to everyone in the organization. Meet your customers where they are: share artifacts from the catalog, including documentation, natively via Slack. Tag insightful Slack conversations and associate them with data. Collaborate across silos through organic discovery of important terms and usage patterns. Easily discover data across the entire stack, and write technical details alongside a business-friendly wiki that non-technical users can easily consume. Support your users directly in Slack and use the catalog as a data enablement tool to quickly onboard users for a more personalized experience.
  • 18
    rudol

    Unify your data catalog, reduce communication overhead, and enable quality control for any member of your company, all without deploying or installing anything. rudol is a data quality platform that helps companies understand all their data sources, no matter where they come from; reduces excessive communication in reporting processes and urgencies; and enables data quality diagnosis and issue prevention across the whole company through easy steps. With rudol, each organization can add data sources from a growing list of providers and BI tools with a standardized structure, including MySQL, PostgreSQL, Airflow, Redshift, Snowflake, Kafka, S3*, BigQuery*, MongoDB*, Tableau*, PowerBI*, Looker* (* in development). So, regardless of where the data comes from, people can understand where and how it is stored, read and collaborate on its documentation, or easily contact data owners using our integrations.
    Starting Price: $0
  • 19
    Acryl Data

    No more data catalog ghost towns. Acryl Cloud drives fast time-to-value via Shift Left practices for data producers and an intuitive UI for data consumers. Continuously detect data quality incidents in real-time, automate anomaly detection to prevent breakages, and drive fast resolution when they do occur. Acryl Cloud supports both push-based and pull-based metadata ingestion for easy maintenance, ensuring information is trustworthy, up-to-date, and definitive. Data should be operational. Go beyond simple visibility and use automated Metadata Tests to continuously expose data insights and surface new areas for improvement. Reduce confusion and accelerate resolution with clear asset ownership, automatic detection, streamlined alerts, and time-based lineage for tracing root causes.
  • 20
    Pantomath

    Organizations continuously strive to be more data-driven, building dashboards, analytics, and data pipelines across the modern data stack. Unfortunately, most organizations struggle with data reliability issues that lead to poor business decisions and an organization-wide lack of trust in data, directly impacting the bottom line. Resolving complex data issues is a manual and time-consuming process involving multiple teams, all relying on tribal knowledge to manually reverse engineer complex data pipelines across different platforms in order to identify root causes and understand the impact. Pantomath is a data pipeline observability and traceability platform for automating data operations. It continuously monitors datasets and jobs across the enterprise data ecosystem, providing context to complex data pipelines by creating automated cross-platform technical pipeline lineage.
  • 21
    Determined AI

    Distributed training without changing your model code: Determined takes care of provisioning machines, networking, data loading, and fault tolerance. Our open source deep learning platform enables you to train models in hours or minutes, not days or weeks, freeing you from arduous tasks like manual hyperparameter tuning, re-running faulty jobs, and worrying about hardware resources. Our distributed training implementation outperforms the industry standard, requires no code changes, and is fully integrated with our state-of-the-art training platform. With built-in experiment tracking and visualization, Determined records metrics automatically, makes your ML projects reproducible, and allows your team to collaborate more easily. Your researchers will be able to build on the progress of their team and innovate in their domain, instead of fretting over errors and infrastructure.
  • 22
    Azure Marketplace
    Azure Marketplace is a comprehensive online store that provides access to thousands of certified, ready-to-use software applications, services, and solutions from Microsoft and third-party vendors. It enables businesses to discover, purchase, and deploy software directly within the Azure cloud environment. The marketplace offers a wide range of products, including virtual machine images, AI and machine learning models, developer tools, security solutions, and industry-specific applications. With flexible pricing options like pay-as-you-go, free trials, and subscription models, Azure Marketplace simplifies the procurement process and centralizes billing through a single Azure invoice. It supports seamless integration with Azure services, enabling organizations to enhance their cloud infrastructure, streamline workflows, and accelerate digital transformation initiatives.
  • 23
    SDF

    SDF is a developer platform for data that enhances SQL comprehension across organizations, enabling data teams to unlock the full potential of their data. It provides a transformation layer to streamline query writing and management, an analytical database engine for local execution, and an accelerator for improved transformation processes. SDF also offers proactive quality and governance features, including reports, contracts, and impact analysis, to ensure data integrity and compliance. By representing business logic as code, SDF facilitates the classification and management of data types, enhancing the clarity and maintainability of data models. It integrates seamlessly with existing data workflows, supporting various SQL dialects and cloud environments, and is designed to scale with the growing needs of data teams. SDF's open-core architecture, built on Apache DataFusion, allows for customization and extension, fostering a collaborative ecosystem for data development.
  • 24
    Cake AI

    Cake AI is a comprehensive AI infrastructure platform that enables teams to build and deploy AI applications using hundreds of pre-integrated open source components, offering complete visibility and control. It provides a curated, end-to-end selection of fully managed, best-in-class commercial and open source AI tools, with pre-built integrations across the full breadth of components needed to move an AI application into production. Cake supports dynamic autoscaling, comprehensive security measures including role-based access control and encryption, advanced monitoring, and infrastructure flexibility across various environments, including Kubernetes clusters and cloud services such as AWS. Its data layer equips teams for data ingestion, transformation, and analytics, leveraging tools like Airflow, dbt, Prefect, Metabase, and Superset. For AI operations, Cake integrates with model catalogs like Hugging Face and supports modular workflows using LangChain, LlamaIndex, and more.
  • 25
    Soda

    Soda drives your data operations by identifying data issues, alerting the right people, and helping teams diagnose and resolve root causes. With automated and self-serve data monitoring capabilities, no data—or people—are ever left in the dark. Get ahead of data issues quickly by delivering full observability through easy instrumentation across your data workloads (see the sketch below). Empower data teams to discover data issues that automation will miss; self-service capabilities deliver the broad coverage that data monitoring needs. Alert the right people at the right time to help teams across the business diagnose, prioritize, and fix data issues. With Soda, your data never leaves your private cloud: Soda monitors data at the source and stores only metadata in your cloud.
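    A hedged sketch of how such instrumentation typically looks with soda-core's programmatic scan API; the data source name, configuration file, and table and column names are placeholders, and the exact calls may vary by release.

```python
# Hedged sketch of soda-core's programmatic scan API; the data source name,
# configuration file, and table/column names below are placeholders.
from soda.scan import Scan

scan = Scan()
scan.set_data_source_name("warehouse")  # must match configuration.yml
scan.add_configuration_yaml_file("configuration.yml")

# SodaCL checks declared inline for brevity; normally kept in a checks file.
scan.add_sodacl_yaml_str(
    """
checks for orders:
  - row_count > 0
  - missing_count(customer_id) = 0
"""
)

exit_code = scan.execute()      # non-zero when checks fail
print(scan.get_scan_results())  # per-check outcomes for alerting/triage
```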