Alternatives to Apache Airflow
Compare Apache Airflow alternatives for your business or organization using the curated list below. SourceForge ranks the best alternatives to Apache Airflow in 2024. Compare features, ratings, user reviews, pricing, and more from Apache Airflow competitors and alternatives in order to make an informed decision for your business.
-
1
JS7 JobScheduler
SOS GmbH
JS7 JobScheduler is an Open Source workload automation system designed for performance, resilience and security. It provides unlimited performance for parallel execution of jobs and workflows. JS7 offers cross-platform job execution, managed file transfer, complex no-code job dependencies and a real REST API.
Platforms:
- Cloud scheduling from Containers for Docker®, Kubernetes®, OpenShift® etc.
- True multi-platform scheduling on premises for Windows®, Linux®, AIX®, Solaris®, macOS® etc.
- Hybrid use for cloud and on premises
User Interface:
- Modern, no-code GUI for inventory management, monitoring and control with web browsers
- Near real-time information brings immediate visibility of status changes and log output of jobs and workflows
- Multi-client capability, role based access management
High Availability:
- Redundancy and Resilience based on asynchronous design and autonomous Agents
- Clustering for all JS7 products, automatic fail-over and manual switch-over -
2
Stonebranch
Stonebranch
Universal Automation Center (UAC) is a real-time IT automation platform designed to centrally manage and orchestrate tasks and processes across hybrid IT environments, from on-prem to the cloud. UAC automates and orchestrates your IT and business processes, securely manages file transfers, and centralizes the management of disparate IT job scheduling and workload automation solutions. With its event-driven automation technology, it is now possible to achieve real-time automation across your entire hybrid IT environment. UAC delivers real-time hybrid IT automation and managed file transfers (MFT) for any type of cloud, mainframe, distributed or hybrid environment. Start automating, managing and orchestrating file transfers from mainframe or disparate systems to the AWS or Azure cloud and vice versa, with no ramp-up time or cost-intensive hardware investments. -
3
ActiveBatch Workload Automation
ActiveBatch by Redwood
ActiveBatch by Redwood makes setting up and launching automation easy with no custom scripting required. With a low-code Super REST API adapter, over 100 pre-built job steps and a user-friendly drag-and-drop workflow designer, you can integrate across any system, application and data source, on-prem, in the cloud or in hybrid environments. Maintain complete control and visibility and meet SLAs with monitoring of all automation from a single pane of glass and get custom alerts via emails or SMS. Managed Smart Queues dynamically scale resources for high-volume workloads, reducing process times while the self-service portal enables business users to run and monitor workflows independently. ActiveBatch meets security and compliance standards, with ISO 27001 and SOC 2, Type II certifications, encrypted connections and regular third-party tests, always keeping security at the forefront. Along with ongoing product advancements, get the added benefit of 24x7 support and on-site training. -
4
Union Cloud
Union.ai
Union.ai is an award-winning, Flyte-based data and ML orchestrator for scalable, reproducible ML pipelines. With Union.ai, you can write your code locally and easily deploy pipelines to remote Kubernetes clusters. “Flyte’s scalability, data lineage, and caching capabilities enable us to train hundreds of models on petabytes of geospatial data, giving us an edge in our business.” — Arno, CTO at Blackshark.ai “With Flyte, we want to give the power back to biologists. We want to stand up something that they can play around with different parameters for their models because not every … parameter is fixed. We want to make sure we are giving them the power to run the analyses.” — Krishna Yeramsetty, Principal Data Scientist at Infinome “Flyte plays a vital role as a key component of Gojek's ML Platform by providing exactly that." — Pradithya Aria Pura, Principal Engineer at Gojek. Starting Price: Free (Flyte) -
5
Minitab Connect
Minitab
The best insights are based on the most complete, most accurate, and most timely data. Minitab Connect empowers data users from across the enterprise with self-serve tools to transform diverse data into a governed network of data pipelines, feed analytics initiatives and foster organization-wide collaboration. Users can effortlessly blend and explore data from databases, cloud and on-premise apps, unstructured data, spreadsheets, and more. Flexible, automated workflows accelerate every step of the data integration process, while powerful data preparation and visualization tools help yield transformative insights. Flexible, intuitive data integration tools let users connect and blend data from a variety of internal and external sources, like data warehouses, data lakes, IoT devices, SaaS applications, cloud storage, spreadsheets, and email. -
6
Rivery
Rivery
Rivery’s SaaS ETL platform provides a fully-managed solution for data ingestion, transformation, orchestration, reverse ETL and more, with built-in support for your development and deployment lifecycles.
Key Features:
- Data Workflow Templates: Extensive library of pre-built templates that enable teams to instantly create powerful data pipelines with the click of a button.
- Fully Managed: No-code, auto-scalable, and hassle-free platform. Rivery takes care of the back end, allowing teams to spend time on priorities rather than maintenance.
- Multiple Environments: Construct and clone custom environments for specific teams or projects.
- Reverse ETL: Automatically send data from cloud warehouses to business applications, marketing clouds, CDPs, and more.
Starting Price: $0.75 per credit -
7
ZenML
ZenML
Simplify your MLOps pipelines. Manage, deploy, and scale on any infrastructure with ZenML. ZenML is completely free and open-source. See the magic with just two simple commands. Set up ZenML in a matter of minutes, and start with all the tools you already use. ZenML standard interfaces ensure that your tools work together seamlessly. Gradually scale up your MLOps stack by switching out components whenever your training or deployment requirements change. Keep up with the latest changes in the MLOps world and easily integrate any new developments. Define simple and clear ML workflows without wasting time on boilerplate tooling or infrastructure code. Write portable ML code and switch from experimentation to production in seconds. Manage all your favorite MLOps tools in one place with ZenML's plug-and-play integrations. Prevent vendor lock-in by writing extensible, tooling-agnostic, and infrastructure-agnostic code. Starting Price: Free -
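A minimal sketch of what a ZenML pipeline can look like in Python, assuming a recent ZenML release where the step and pipeline decorators are importable from the top-level zenml package; the step names and toy logic are illustrative only:

```python
from typing import List

from zenml import pipeline, step


@step
def load_data() -> List[float]:
    # Stand-in for reading training data from a file or feature store.
    return [1.0, 2.0, 3.0, 4.0]


@step
def train_model(data: List[float]) -> float:
    # Toy "training": return the mean of the data as the model parameter.
    return sum(data) / len(data)


@pipeline
def training_pipeline():
    data = load_data()
    train_model(data)


if __name__ == "__main__":
    # Each run is tracked in whichever ZenML stack is currently active.
    training_pipeline()
```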
8
Activiti
Activiti
Helping businesses solve automation challenges in distributed, highly-scalable and cost-effective infrastructures. Activiti is the leading lightweight, Java-centric open-source BPMN engine supporting real-world process automation needs. Activiti Cloud is now the new generation of business automation platform offering a set of cloud-native building blocks designed to run on distributed infrastructures. Immutable, scalable & pain-free Process & Decision Runtimes designed to integrate with your cloud-native infrastructure. Scalable, storage-independent and extensible audit service. Scalable, storage-independent and extensible query service. Simplified system-to-system interactions that can scale in distributed environments. Distributed & scalable application aggregation layer. Cloud-ready secure WebSocket and subscription handling as part of GraphQL integration. -
9
Argo
Argo
Open-source tools for Kubernetes to run workflows, manage clusters and do GitOps right. Kubernetes-native workflow engine supporting DAG and step-based workflows. Declarative continuous delivery with a fully-loaded UI. Advanced Kubernetes deployment strategies such as Canary and Blue-Green made easy. Argo Workflows is an open-source container-native workflow engine for orchestrating parallel jobs on Kubernetes. Argo Workflows is implemented as a Kubernetes CRD. Model multi-step workflows as a sequence of tasks or capture the dependencies between tasks using a graph (DAG). Easily run compute-intensive jobs for machine learning or data processing in a fraction of the time using Argo Workflows on Kubernetes. Run CI/CD pipelines natively on Kubernetes without configuring complex software development products. Designed from the ground up for containers without the overhead and limitations of legacy VM and server-based environments. -
10
Airbyte
Airbyte
Get all your ELT data pipelines running in minutes, even your custom ones. Let your team focus on insights and innovation. Unify your data integration pipelines in one open-source ELT platform. Airbyte addresses all your data team's connector needs, however custom they are and whatever your scale. The data integration platform that can scale with your custom or high-volume needs. From high-volume databases to the long tail of API sources. Leverage Airbyte’s long tail of high-quality connectors that adapt to schema and API changes. Extensible to unify all native & custom ELT. Edit pre-built open-source connectors, or build new ones with our connector development kit in a few hours. Transparent and scalable pricing. Finally, a transparent and predictable cost-based pricing that scales with your data needs. You don’t need to worry about volume anymore. No more need for custom systems for your in-house scripts or database replication. Starting Price: $2.50 per credit -
11
Beamer
Beamer
Update and engage users effortlessly. Announce your latest updates and get powerful feedback with an in-app notification center, widgets and changelog. Install in-app or on your website so users can get announcements in context. Public page with your own domain, custom appearance and SEO optimization. Share your important news and updates: create and schedule posts to keep your users and site visitors in the know, and use visual content like images, videos and GIFs to get even more engagement. Use segmentation to send targeted notifications: create custom segments by industry, product, role, location, language, behavior and more to send more relevant notifications and get better results. Use push notifications to bring users back: send web push notifications to users or website visitors to make sure they get your announcements, even if they aren’t on your site. Get feedback on your latest updates and news. Starting Price: $49 per month -
12
Apache Gobblin
Apache Software Foundation
A distributed data integration framework that simplifies common aspects of Big Data integration such as data ingestion, replication, organization, and lifecycle management for both streaming and batch data ecosystems. Runs as a standalone application on a single box. Also supports embedded mode. Runs as a MapReduce application on multiple Hadoop versions. Also supports Azkaban for launching MapReduce jobs. Runs as a standalone cluster with primary and worker nodes. This mode supports high availability and can run on bare metal as well. Runs as an elastic cluster on public cloud. This mode supports high availability. Gobblin as it exists today is a framework that can be used to build different data integration applications like ingest, replication, etc. Each of these applications is typically configured as a separate job and executed through a scheduler like Azkaban. -
13
Apache Spark
Apache Software Foundation
Apache Spark™ is a unified analytics engine for large-scale data processing. Apache Spark achieves high performance for both batch and streaming data, using a state-of-the-art DAG scheduler, a query optimizer, and a physical execution engine. Spark offers over 80 high-level operators that make it easy to build parallel apps. And you can use it interactively from the Scala, Python, R, and SQL shells. Spark powers a stack of libraries including SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming. You can combine these libraries seamlessly in the same application. Spark runs on Hadoop, Apache Mesos, Kubernetes, standalone, or in the cloud. It can access diverse data sources. You can run Spark using its standalone cluster mode, on EC2, on Hadoop YARN, on Mesos, or on Kubernetes. Access data in HDFS, Alluxio, Apache Cassandra, Apache HBase, Apache Hive, and hundreds of other data sources. -
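As an illustration of those high-level operators, here is a small PySpark sketch (a word count over an in-memory DataFrame; the application name and sample rows are arbitrary):

```python
from pyspark.sql import SparkSession, functions as F

# Start (or reuse) a Spark session; the appName is arbitrary.
spark = SparkSession.builder.appName("word_count_example").getOrCreate()

# A tiny in-memory DataFrame; in practice this could come from HDFS, S3,
# Cassandra, Hive, or any other supported source.
lines = spark.createDataFrame(
    [("spark builds parallel apps",), ("spark runs on kubernetes",)],
    ["line"],
)

# A classic word count expressed with high-level DataFrame operators.
counts = (
    lines.select(F.explode(F.split(F.col("line"), " ")).alias("word"))
    .groupBy("word")
    .count()
    .orderBy(F.col("count").desc())
)
counts.show()
spark.stop()
```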
14
dbt
dbt Labs
Version control, quality assurance, documentation and modularity allow data teams to collaborate like software engineering teams. Analytics errors should be treated with the same level of urgency as bugs in a production product. Much of an analytic workflow is manual. We believe workflows should be built to execute with a single command. Data teams use dbt to codify business logic and make it accessible to the entire organization—for use in reporting, ML modeling, and operational workflows. Built-in CI/CD ensures that changes to data models move appropriately through development, staging, and production environments. dbt Cloud also provides guaranteed uptime and custom SLAs. Starting Price: $50 per user per month -
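dbt models are usually written in SQL, but on warehouse adapters that support Python models a model can also be a plain Python function. A minimal, hedged sketch, assuming an upstream model named stg_orders exists in the project:

```python
# models/orders_rollup.py -- a dbt Python model (file path and names are
# hypothetical; requires a warehouse adapter that supports Python models).
def model(dbt, session):
    # Optional configuration, equivalent to a config() block in a SQL model.
    dbt.config(materialized="table")

    # Reference an upstream model; dbt returns it as a dataframe object
    # native to the warehouse's dataframe runtime.
    orders = dbt.ref("stg_orders")

    # Transformation logic would go here; whatever dataframe is returned is
    # written back to the warehouse as this model's relation.
    return orders
```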
15
AWS Glue
Amazon
AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning, and application development. AWS Glue provides all the capabilities needed for data integration so that you can start analyzing your data and putting it to use in minutes instead of months. Data integration is the process of preparing and combining data for analytics, machine learning, and application development. It involves multiple tasks, such as discovering and extracting data from various sources; enriching, cleaning, normalizing, and combining data; and loading and organizing data in databases, data warehouses, and data lakes. These tasks are often handled by different types of users that each use different products. AWS Glue runs in a serverless environment. There is no infrastructure to manage, and AWS Glue provisions, configures, and scales the resources required to run your data integration jobs. -
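A sketch of the kind of PySpark-based script a Glue job runs; the Data Catalog database, table, and S3 path are placeholders, and the awsglue modules are only available inside the Glue runtime:

```python
import sys

from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read a table that a Glue crawler has registered in the Data Catalog.
source = glue_context.create_dynamic_frame.from_catalog(
    database="example_db", table_name="example_table"
)

# Write the data back out to S3 as Parquet.
glue_context.write_dynamic_frame.from_options(
    frame=source,
    connection_type="s3",
    connection_options={"path": "s3://example-bucket/output/"},
    format="parquet",
)
job.commit()
```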
16
AWS Step Functions
Amazon
AWS Step Functions is a serverless function orchestrator that makes it easy to sequence AWS Lambda functions and multiple AWS services into business-critical applications. Through its visual interface, you can create and run a series of checkpointed and event-driven workflows that maintain the application state. The output of one step acts as an input to the next. Each step in your application executes in order, as defined by your business logic. Orchestrating a series of individual serverless applications, managing retries, and debugging failures can be challenging. As your distributed applications become more complex, the complexity of managing them also grows. With its built-in operational controls, Step Functions manages sequencing, error handling, retry logic, and state, removing a significant operational burden from your team. AWS Step Functions lets you build visual workflows that enable fast translation of business requirements into technical requirements. Starting Price: $0.000025 -
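A sketch of defining and starting a small state machine with boto3; the Lambda function ARN, IAM role ARN, and names are placeholders:

```python
import json

import boto3

sfn = boto3.client("stepfunctions", region_name="us-east-1")

# A two-state workflow in Amazon States Language: run a Lambda, then succeed.
definition = {
    "Comment": "Minimal two-step workflow",
    "StartAt": "ExtractData",
    "States": {
        "ExtractData": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:extract",
            "Next": "Done",
        },
        "Done": {"Type": "Succeed"},
    },
}

machine = sfn.create_state_machine(
    name="example-pipeline",
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::123456789012:role/example-step-functions-role",
)

# Kick off an execution; the input becomes the first state's input.
sfn.start_execution(
    stateMachineArn=machine["stateMachineArn"],
    input=json.dumps({"date": "2024-01-01"}),
)
```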
17
Mage
Mage
Mage is a tool that transforms your data into predictions. Build, train, and deploy predictive models in minutes. No AI experience required. Increase user engagement by ranking content on your user’s home feed. Increase conversion by showing the most relevant products for a user to buy. Increase retention by predicting which users will stop using your app. Increase conversion by matching users in a marketplace. Data is the most important part in building AI. Mage will guide you through this process with suggestions on how to improve your data, making you an AI expert. AI and its predictions are difficult to understand. Mage explains every metric in-depth, teaching you how your AI model thinks. Get real-time predictions with a few lines of code. Mage makes it easy for you to integrate your AI model in any application. Starting Price: Free -
18
Kedro
Kedro
Kedro is the foundation for clean data science code. It borrows concepts from software engineering and applies them to machine-learning projects. A Kedro project provides scaffolding for complex data and machine-learning pipelines. You spend less time on tedious "plumbing" and focus instead on solving new problems. Kedro standardizes how data science code is created and ensures teams collaborate to solve problems easily. Make a seamless transition from development to production with exploratory code that you can transition to reproducible, maintainable, and modular experiments. A series of lightweight data connectors is used to save and load data across many different file formats and file systems. Starting Price: Free -
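A minimal sketch of a Kedro pipeline, assuming the dataset names (raw_orders, clean_orders, order_summary) are registered in the project's Data Catalog:

```python
import pandas as pd
from kedro.pipeline import Pipeline, node


def clean_orders(raw_orders: pd.DataFrame) -> pd.DataFrame:
    # Drop incomplete rows before any downstream aggregation.
    return raw_orders.dropna()


def summarize_orders(clean_orders: pd.DataFrame) -> pd.DataFrame:
    # Total order value per customer.
    return clean_orders.groupby("customer_id", as_index=False)["value"].sum()


def create_pipeline() -> Pipeline:
    # Each node maps a plain function onto named datasets from the catalog.
    return Pipeline(
        [
            node(clean_orders, inputs="raw_orders", outputs="clean_orders"),
            node(summarize_orders, inputs="clean_orders", outputs="order_summary"),
        ]
    )
```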
19
Kestra
Kestra
Kestra is an open-source, event-driven orchestrator that simplifies data operations and improves collaboration between engineers and business users. By bringing Infrastructure as Code best practices to data pipelines, Kestra allows you to build reliable workflows and manage them with confidence. Thanks to the declarative YAML interface for defining orchestration logic, everyone who benefits from analytics can participate in the data pipeline creation process. The UI automatically adjusts the YAML definition any time you make changes to a workflow from the UI or via an API call. Therefore, the orchestration logic is defined declaratively in code, even if some workflow components are modified in other ways. -
20
KNIME Analytics Platform
KNIME.COM
One enterprise-grade software platform, two complementary tools. Open source KNIME Analytics Platform for creating data science and commercial KNIME Server for productionizing data science. KNIME Analytics Platform is the open source software for creating data science. Intuitive, open, and continuously integrating new developments, KNIME makes understanding data and designing data science workflows and reusable components accessible to everyone. KNIME Server is the enterprise software for team-based collaboration, automation, management, and deployment of data science workflows as analytical applications and services. Non-experts are given access to data science via KNIME WebPortal or can use REST APIs. Do even more with your data using extensions for KNIME Analytics Platform. Some are developed and maintained by us at KNIME, others by the community and our trusted partners. We also have integrations with many open source projects. -
21
Lyftrondata
Lyftrondata
Whether you want to build a governed delta lake, data warehouse, or simply want to migrate from your traditional database to a modern cloud data warehouse, do it all with Lyftrondata. Simply create and manage all of your data workloads on one platform by automatically building your pipeline and warehouse. Analyze it instantly with ANSI SQL, BI/ML tools, and share it without worrying about writing any custom code. Boost the productivity of your data professionals and shorten your time to value. Define, categorize, and find all data sets in one place. Share these data sets with other experts with zero coding and drive data-driven insights. This data-sharing ability is perfect for companies that want to store their data once, share it with other experts, and use it multiple times, now and in the future. Define datasets, apply SQL transformations, or simply migrate your SQL data processing logic to any cloud data warehouse. -
22
Dagster+
Dagster Labs
Dagster is a next-generation orchestration platform for the development, production, and observation of data assets. Unlike other data orchestration solutions, Dagster provides you with an end-to-end development lifecycle. Dagster gives you control over your disparate data tools and empowers you to build, test, deploy, run, and iterate on your data pipelines. It makes you and your data teams more productive, your operations more robust, and puts you in complete control of your data processes as you scale. Dagster brings a declarative approach to the engineering of data pipelines. Your team defines the data assets required, quickly assessing their status and resolving any discrepancies. An assets-based model is clearer than a tasks-based one and becomes a unifying abstraction across the whole workflow. Starting Price: $0 -
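A minimal sketch of Dagster's asset-based approach; the asset names and toy DataFrames are illustrative:

```python
import pandas as pd
from dagster import Definitions, asset


@asset
def raw_orders() -> pd.DataFrame:
    # Stand-in for an ingestion step.
    return pd.DataFrame({"customer_id": [1, 1, 2], "value": [10.0, 5.0, 7.5]})


@asset
def order_summary(raw_orders: pd.DataFrame) -> pd.DataFrame:
    # The parameter name declares the dependency on the raw_orders asset.
    return raw_orders.groupby("customer_id", as_index=False)["value"].sum()


# Registering the assets lets the Dagster UI and daemon materialize them.
defs = Definitions(assets=[raw_orders, order_summary])
```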
23
IBM Databand
IBM
Monitor your data health and pipeline performance. Gain unified visibility for pipelines running on cloud-native tools like Apache Airflow, Apache Spark, Snowflake, BigQuery, and Kubernetes. An observability platform purpose-built for data engineers. Data engineering is only getting more challenging as demands from business stakeholders grow. Databand can help you catch up. More pipelines, more complexity. Data engineers are working with more complex infrastructure than ever and pushing higher speeds of release. It’s harder to understand why a process has failed, why it’s running late, and how changes affect the quality of data outputs. Data consumers are frustrated with inconsistent results, model performance, and delays in data delivery. Not knowing exactly what data is being delivered, or precisely where failures are coming from, leads to a persistent lack of trust. Pipeline logs, errors, and data quality metrics are captured and stored in independent, isolated systems. -
24
Dataplane
Dataplane
The concept behind Dataplane is to make it quicker and easier to construct a data mesh with robust data pipelines and automated workflows for businesses and teams of all sizes. In addition to being more user-friendly, there has been an emphasis on scaling, resilience, performance and security. Starting Price: Free -
25
Alooma
Google
Alooma enables data teams to have visibility and control. It brings data from your various data silos together into BigQuery, all in real time. Set up and flow data in minutes or customize, enrich, and transform data on the stream before it even hits the data warehouse. Never lose an event. Alooma's built-in safety nets ensure easy error handling without pausing your pipeline. Whatever the number of data sources, from low to high volume, Alooma’s infrastructure scales to your needs. -
26
Apache Flink
Apache Software Foundation
Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. Flink has been designed to run in all common cluster environments and to perform computations at in-memory speed and at any scale. Any kind of data is produced as a stream of events. Credit card transactions, sensor measurements, machine logs, or user interactions on a website or mobile application, all of this data is generated as a stream. Apache Flink excels at processing unbounded and bounded data sets. Precise control of time and state enables Flink’s runtime to run any kind of application on unbounded streams. Bounded streams are internally processed by algorithms and data structures that are specifically designed for fixed-sized data sets, yielding excellent performance. Flink is designed to work well with common cluster resource managers such as Hadoop YARN and Kubernetes, and it can also run as a standalone cluster. -
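A small PyFlink DataStream sketch for illustration, using a bounded in-memory collection in place of a real source such as Kafka:

```python
from pyflink.datastream import StreamExecutionEnvironment

env = StreamExecutionEnvironment.get_execution_environment()

# A bounded in-memory collection used as the stream source for illustration.
readings = env.from_collection(
    [("sensor-1", 21.5), ("sensor-2", 19.0), ("sensor-1", 22.1)]
)

# Keep readings above a threshold and format them for output.
(
    readings
    .filter(lambda r: r[1] > 20.0)
    .map(lambda r: f"{r[0]} reported {r[1]}")
    .print()
)

env.execute("warm_sensor_readings")
```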
27
Flowable
Flowable
Grow your company and attract new customers through outstanding customer experience and operational excellence. In today’s competitive environment leading organizations around the world are using Intelligent Business Automation solutions from Flowable to change the way they do business. Driving Customer Retention and Acquisition by delivering outstanding customer experience. Increasing Operational Excellence by driving business efficiencies and reducing working costs. Delivering increased Business Agility to adapt to changing market conditions. Enforcing Business Compliance to ensure business continuity. Flowable’s conversational engagement capabilities let you deliver a compelling mix of automated and personal service via popular chat platforms such as WhatsApp – even in highly-regulated industries. Flowable is lightning fast, with many years of real-world use. It has full support for process, case and decision modeling, and easily handles complex case management scenarios. -
28
Flyte
Union.ai
The workflow automation platform for complex, mission-critical data and ML processes at scale. Flyte makes it easy to create concurrent, scalable, and maintainable workflows for machine learning and data processing. Flyte is used in production at Lyft, Spotify, Freenome, and others. At Lyft, Flyte has been serving production model training and data processing for over four years, becoming the de-facto platform for teams like pricing, locations, ETA, mapping, autonomous, and more. In fact, Flyte manages over 10,000 unique workflows at Lyft, totaling over 1,000,000 executions every month, 20 million tasks, and 40 million containers. Flyte has been battle-tested at Lyft, Spotify, Freenome, and others. It is entirely open-source with an Apache 2.0 license under the Linux Foundation with a cross-industry overseeing committee. Configuring machine learning and data workflows can get complex and error-prone with YAML. Starting Price: Free -
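A minimal flytekit sketch showing typed tasks composed into a workflow; the task names and logic are illustrative, and the same code can run locally or be registered to a Flyte cluster:

```python
from typing import List

from flytekit import task, workflow


@task
def generate_squares(n: int) -> List[int]:
    # Stand-in for a data processing step.
    return [i * i for i in range(n)]


@task
def total(values: List[int]) -> int:
    return sum(values)


@workflow
def squares_pipeline(n: int = 5) -> int:
    return total(values=generate_squares(n=n))


if __name__ == "__main__":
    # Workflows are plain callables locally, which keeps iteration fast;
    # the same definitions can be registered to a cluster unchanged.
    print(squares_pipeline(n=5))
```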
29
Hevo
Hevo Data
Hevo Data is a no-code, bi-directional data pipeline platform specially built for modern ETL, ELT, and reverse ETL needs. It helps data teams streamline and automate org-wide data flows that result in a saving of ~10 hours of engineering time/week and 10x faster reporting, analytics, and decision making. The platform supports 100+ ready-to-use integrations across databases, SaaS applications, cloud storage, SDKs, and streaming services. Over 500 data-driven companies spread across 35+ countries trust Hevo for their data integration needs. Try Hevo today and get your fully managed data pipelines up and running in just a few minutes. Starting Price: $249/month -
30
n8n
n8n
Build complex automations 10x faster, without fighting APIs. Your days spent slogging through a spaghetti of scripts are over. Use JavaScript when you need flexibility, and the UI for everything else. n8n allows you to build flexible workflows focused on deep data integration. And with sharable templates and a user-friendly UI, the less technical people on your team can collaborate on them too. Unlike other tools, complexity is not a limitation. So you can build whatever you want — without stressing over budget. Connect APIs with no code to automate basic tasks. Or write vanilla JavaScript when you need to manipulate complex data. You can implement multiple triggers. Branch and merge your workflows. And even pause flows to wait for external events. Interface easily with any API or service with custom HTTP requests. Avoid breaking live workflows by separating dev and prod environments with unique sets of auth data. Starting Price: $20 per month -
31
Meltano
Meltano
Meltano provides the ultimate flexibility in deployment options. Own your data stack, end to end. An ever-growing library of 300+ connectors has been running in production for years. Run workflows in isolated environments, execute end-to-end tests, and version control everything. Open source gives you the power to build your ideal data stack. Define your entire project as code and collaborate confidently with your team. The Meltano CLI enables you to rapidly create your project, making it easy to start replicating data. Meltano is designed to be the best way to run dbt to manage your transformations. Your entire data stack is defined in your project, making it simple to deploy it to production. Validate your changes in development before moving to CI, and in staging before moving to production. -
32
StackStorm
StackStorm
StackStorm connects all your apps, services, and workflows. From simple if/then rules to complicated workflows, StackStorm lets you automate DevOps your way. No need to change your existing processes or workflows; StackStorm connects what you already have. Community is what makes a good product great. StackStorm is used by a lot of people around the world, and you can always count on getting answers to your questions. StackStorm can be used to automate and streamline nearly any part of your business. Here are some of the most common applications. When failures happen, StackStorm can act as Tier 1 support: it troubleshoots, fixes known problems, and escalates to humans when needed. Continuous deployment can get complex, beyond Jenkins or other specialized opinionated tools. Automate advanced CI/CD pipelines your way. ChatOps brings automation and collaboration together, transforming DevOps teams to get things done better, faster, and with style. -
33
SnapLogic
SnapLogic
Quickly ramp up, learn and use SnapLogic to create multi-point, enterprise-wide app and data integrations. Easily expose and manage pipeline APIs that extend your world. Eliminate slower, manual, error-prone methods and deliver faster results for business processes such as customer onboarding, employee onboarding and off-boarding, quote to cash, ERP SKU forecasting, support ticket creation, and more. Monitor, manage, secure, and govern your data pipelines, application integrations, and API calls, all from a single pane of glass. Launch automated workflows for any department, across your enterprise, in minutes – not days. To deliver superior employee experiences, the SnapLogic platform can bring together employee data across all your enterprise HR apps and data stores. Learn how SnapLogic can help you quickly set up seamless experiences powered by automated processes. -
34
Oracle Data Integrator
Oracle
Oracle Data Integrator is a comprehensive data integration platform that covers all data integration requirements: from high-volume, high-performance batch loads, to event-driven, trickle-feed integration processes, to SOA-enabled data services. Oracle Data Integrator (ODI) 12c, the latest version of Oracle’s strategic Data Integration offering, provides superior developer productivity and improved user experience with a redesigned flow-based declarative user interface and deeper integration with Oracle GoldenGate. ODI12c further builds on its flexible and high-performance architecture with comprehensive big data support and added parallelism when executing data integration processes. It includes interoperability with Oracle Warehouse Builder (OWB) for a quick and simple migration for OWB customers to ODI12c. Additionally, ODI can be monitored from a single solution along with other Oracle technologies and applications through the integration with Oracle Enterprise Manager 12c. -
35
Prefect
Prefect
Prefect Cloud is a command center for your workflows. Deploy from Prefect Core and instantly gain complete oversight and control. Cloud's beautiful UI lets you keep an eye on the health of your infrastructure. Stream real-time state updates and logs, kick off new runs, and receive critical information exactly when you need it. With Prefect's Hybrid Model, your code and data remain on-prem while Prefect Cloud's managed orchestration keeps everything running smoothly. The Cloud scheduler service runs asynchronously to ensure your runs start on time, every time. Advanced scheduling options allow you to schedule changes to parameter values as well as the execution environment for each run. Configure custom notifications and actions when your workflows change state. Monitor the health of all agents connected to your cloud instance and receive custom alerts when an agent goes offline. Starting Price: $0.0025 per successful task -
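For orientation, a minimal sketch of a Prefect flow in the Prefect 2.x style (the task names, retry settings, and schedule-free local run are illustrative; connecting to Prefect Cloud adds the orchestration features described above):

```python
from prefect import flow, task


@task(retries=3, retry_delay_seconds=10)
def fetch_numbers() -> list:
    # Stand-in for an API call or database query.
    return [1, 2, 3, 4]


@task
def summarize(numbers: list) -> int:
    return sum(numbers)


@flow(log_prints=True)
def nightly_rollup():
    numbers = fetch_numbers()
    print(f"Total: {summarize(numbers)}")


if __name__ == "__main__":
    # Runs locally; deploying it to Prefect Cloud adds scheduling and alerting.
    nightly_rollup()
```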
36
Stitch
Talend
Stitch is a cloud-based platform for ETL – extract, transform, and load. More than a thousand companies use Stitch to move billions of records every day from SaaS applications and databases into data warehouses and data lakes. -
37
StreamSets
StreamSets
StreamSets DataOps Platform. The data integration platform to build, run, monitor and manage smart data pipelines that deliver continuous data for DataOps, and power modern analytics and hybrid integration. Only StreamSets provides a single design experience for all design patterns for 10x greater developer productivity; smart data pipelines that are resilient to change for 80% fewer breakages; and a single pane of glass for managing and monitoring all pipelines across hybrid and cloud architectures to eliminate blind spots and control gaps. With StreamSets, you can deliver the continuous data that drives the connected enterprise. Starting Price: $1000 per month -
38
Amazon MWAA
Amazon
Amazon Managed Workflows for Apache Airflow (MWAA) is a managed orchestration service for Apache Airflow that makes it easier to set up and operate end-to-end data pipelines in the cloud at scale. Apache Airflow is an open-source tool used to programmatically author, schedule, and monitor sequences of processes and tasks referred to as “workflows.” With Managed Workflows, you can use Airflow and Python to create workflows without having to manage the underlying infrastructure for scalability, availability, and security. Managed Workflows automatically scales its workflow execution capacity to meet your needs, and is integrated with AWS security services to help provide you with fast and secure access to data. Starting Price: $0.49 per hour -
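A sketch of the kind of Airflow DAG you would place in an MWAA environment's DAGs folder; the DAG id, schedule, and task bodies are placeholders:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    print("pulling data from the source system")


def load():
    print("loading data into the warehouse")


with DAG(
    dag_id="example_daily_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)

    # The >> operator declares ordering: extract must finish before load runs.
    extract_task >> load_task
```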
39
Astera Centerprise
Astera
Astera Centerprise is a complete on-premise data integration solution that helps extract, transform, profile, cleanse, and integrate data from disparate sources in a code-free, drag-and-drop environment. The software is designed to cater to enterprise-level data integration needs and is used by Fortune 500 companies, like Wells Fargo, Xerox, HP, and more. Through process orchestration, workflow automation, job scheduling, instant data preview, and more, enterprises can easily get accurate, consolidated data for their day-to-day decision making at the speed of business. -
40
Google Cloud Composer
Google
Cloud Composer's managed nature and Apache Airflow compatibility allow you to focus on authoring, scheduling, and monitoring your workflows as opposed to provisioning resources. End-to-end integration with Google Cloud products including BigQuery, Dataflow, Dataproc, Datastore, Cloud Storage, Pub/Sub, and AI Platform gives users the freedom to fully orchestrate their pipeline. Author, schedule, and monitor your workflows through a single orchestration tool—whether your pipeline lives on-premises, in multiple clouds, or fully within Google Cloud. Ease your transition to the cloud or maintain a hybrid data environment by orchestrating workflows that cross between on-premises and the public cloud. Create workflows that connect data, processing, and services across clouds to give you a unified data environment. Starting Price: $0.074 per vCPU hour -
41
Chalk
Chalk
Powerful data engineering workflows, without the infrastructure headaches. Complex streaming, scheduling, and data backfill pipelines are all defined in simple, composable Python. Make ETL a thing of the past; fetch all of your data in real time, no matter how complex. Incorporate deep learning and LLMs into decisions alongside structured business data. Make better predictions with fresher data, don’t pay vendors to pre-fetch data you don’t use, and query data just in time for online predictions. Experiment in Jupyter, then deploy to production. Prevent train-serve skew and create new data workflows in milliseconds. Instantly monitor all of your data workflows in real time; track usage and data quality effortlessly. Know everything you computed and replay any data. Integrate with the tools you already use and deploy to your own infrastructure. Decide and enforce withdrawal limits with custom hold times. Starting Price: Free -
42
Activeeon ProActive
Activeeon
The solution provided by Activeeon is suited to fit modern challenges such as the growth of data, new infrastructures, evolving cloud strategies, new application architectures, etc. It provides orchestration and scheduling to automate and build a solid base for future growth. ProActive Workflows & Scheduling is a Java-based cross-platform workflow scheduler and resource manager that is able to run workflow tasks in multiple languages and multiple environments (Windows, Linux, Mac, Unix, etc.). ProActive Resource Manager makes compute resources available for task execution. It handles on-premises and cloud compute resources in an elastic, on-demand and distributed fashion. ProActive AI Orchestration from Activeeon empowers data engineers and data scientists with a simple, portable and scalable solution for machine learning pipelines. It provides pre-built and customizable tasks that enable automation within the machine learning lifecycle, which helps data scientists and IT operations work. Starting Price: $10,000 -
43
Yandex Data Proc
Yandex
You select the size of the cluster, node capacity, and a set of services, and Yandex Data Proc automatically creates and configures Spark and Hadoop clusters and other components. Collaborate by using Zeppelin notebooks and other web apps via a UI proxy. You get full control of your cluster with root permissions for each VM. Install your own applications and libraries on running clusters without having to restart them. Yandex Data Proc uses instance groups to automatically increase or decrease computing resources of compute subclusters based on CPU usage indicators. Data Proc allows you to create managed Hive clusters, which can reduce the probability of failures and losses caused by metadata unavailability. Save time on building ETL pipelines and pipelines for training and developing models, as well as describing other iterative tasks. The Data Proc operator is already built into Apache Airflow. Starting Price: $0.19 per hour -
44
Automate Schedule
Fortra
Powerful workload automation for centralized Linux job scheduling. When you’re able to automate all your workflows across your Windows, UNIX, Linux, and IBM i systems with a job scheduler, your IT team has more time to tackle more strategic projects that impact the bottom line. Bring isolated job schedules from cron or Windows Task Scheduler enterprise-wide. When your job scheduler integrates with your other key software applications, it’s easier to see the whole picture, leverage data across the organization, and unify your job schedules. Be more efficient so you can meet your workload automation goals. Automated job scheduling makes your life easier and transforms the way you do business. Build dynamic, event-driven job schedules across servers and take dependencies into account—supporting your business goals with better workflows. Automate Schedule offers high availability for a master server and a standby server so if an outage were to occur, important tasks would continue. -
45
Argent
Argent Software
Argent Guardian® Ultra is the world's most scalable monitoring solution for all Windows, Linux, UNIX (AIX, HP-UX, SCO, Solaris), and iSeries Servers. Using a patented agent-optional architecture, Argent Guardian® Ultra monitors servers with or without installing agents, providing the power and flexibility to define the monitoring architecture to match customers' exact needs. The days of manually scheduling and managing batch processes are over. Business process automation lowers overall IT costs, ensures application efficiency, enhances IT service and assists with compliance requirements. Argent Job Scheduler and Argent Queue Engine automate business processes, alert customers via Argent Console when issues occur and provide Service Level Agreements so that management receives the Business View of IT. Argent Job Scheduler provides a single point of control across all operating systems, applications and databases for Windows, Linux, Solaris, HP-UX, AIX, SCO and iSeries Servers. -
46
cron-job.org
cron-job.org
Cron-job.org offers free, reliable scheduling for websites and scripts, allowing tasks to be executed as frequently as every minute or as infrequently as once a year. It provides versatile scheduling options, execution predictions, and a detailed execution history with response data. Users can configure custom HTTP requests, run test jobs for verification, and receive status notifications for failures and recoveries. The platform supports multi-factor authentication for enhanced security and offers a REST API for managing cron jobs programmatically. Powered by 100% CO₂-neutral hydropower and open-source, cron-job.org has been a trusted service for over 15 years, executing millions of tasks daily. We do not share user data with third parties, including your email address, your name, or other information. Every cronjob can be executed up to 60 times an hour, i.e. every minute. Starting Price: Free -
47
JAMS
Fortra
JAMS is a centralized workload automation and job scheduling solution that runs, monitors, and manages jobs and workflows that support critical business processes. JAMS is enterprise job scheduling software that automates IT processes, from simple batch processes to scripts to complex cross-platform workflows. JAMS integrates with various technologies throughout your enterprise to provide seamless, unattended job execution, allocating resources to run jobs in a sequence, at a specified time, or based on a trigger. JAMS job scheduler lets you define, manage, and monitor critical batch processes through one centralized console. From executing simple command lines to running multi-step tasks that leverage ERPs, databases, and BI tools – JAMS orchestrates your business’s entire schedule. And it’s easy to migrate tasks over from Windows Task Scheduler, SQL Agent, or Cron with built-in conversion utilities so you can keep jobs running without heavy lifting. -
48
Ctfreak
JYP Software
Tired of maintaining your multiple crontabs? Would you like a Slack notification when one of your backups fails? Ctfreak allows you to centralize and schedule various types of tasks:
- Shell scripts (Bash/PowerShell) on multiple servers concurrently via SSH
- SQL scripts on multiple databases concurrently (MySQL/MariaDB/PostgreSQL)
- Chart reports from SQL queries
- Webhook calls
- Workflows for concurrent or sequential execution of tasks
Not to mention:
- A mobile-friendly interface
- Single Sign-On via OpenID Connect
- Notifications via Slack / Discord / Mattermost / email
- REST API
- Incoming webhooks (GitHub / GitLab / ...)
- Log retrieval and consultation
- User rights management by project
Starting Price: $109/year/instance -
49
Nextflow
Seqera Labs
Data-driven computational pipelines. Nextflow enables scalable and reproducible scientific workflows using software containers. It allows the adaptation of pipelines written in the most common scripting languages. Its fluent DSL simplifies the implementation and deployment of complex parallel and reactive workflows on clouds and clusters. Nextflow is built around the idea that Linux is the lingua franca of data science. Nextflow allows you to write a computational pipeline by making it simpler to put together many different tasks. You may reuse your existing scripts and tools and you don't need to learn a new language or API to start using it. Nextflow supports Docker and Singularity container technology. This, along with the integration of the GitHub code-sharing platform, allows you to write self-contained pipelines, manage versions, and rapidly reproduce any former configuration. Nextflow provides an abstraction layer between your pipeline's logic and the execution layer. Starting Price: Free -
50
Astro
Astronomer
For data teams looking to increase the availability of trusted data, Astronomer provides Astro, a modern data orchestration platform, powered by Apache Airflow, that enables the entire data team to build, run, and observe data pipelines-as-code. Astronomer is the commercial developer of Airflow, the de facto standard for expressing data flows as code, used by hundreds of thousands of teams across the world.