Compare the Top Data Pipeline Software in the UK as of November 2024 - Page 3

  • 1
    Google Cloud Composer
    Cloud Composer's managed nature and Apache Airflow compatibility allow you to focus on authoring, scheduling, and monitoring your workflows rather than provisioning resources. End-to-end integration with Google Cloud products including BigQuery, Dataflow, Dataproc, Datastore, Cloud Storage, Pub/Sub, and AI Platform gives users the freedom to fully orchestrate their pipeline. Author, schedule, and monitor your workflows through a single orchestration tool—whether your pipeline lives on-premises, in multiple clouds, or fully within Google Cloud. Ease your transition to the cloud or maintain a hybrid data environment by orchestrating workflows that cross between on-premises and the public cloud. Create workflows that connect data, processing, and services across clouds to give you a unified data environment.
    Starting Price: $0.074 per vCPU hour
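    For illustration, workflows in Cloud Composer are standard Apache Airflow DAGs authored in Python. The minimal sketch below is hypothetical; the DAG id, schedule, and task commands are placeholders, not part of the listing above:

      # Minimal Apache Airflow 2.x DAG of the kind Cloud Composer schedules and runs.
      # DAG id, schedule, and tasks are illustrative placeholders.
      from datetime import datetime

      from airflow import DAG
      from airflow.operators.bash import BashOperator
      from airflow.operators.python import PythonOperator


      def transform():
          # Placeholder for a transformation step (e.g., a call out to BigQuery or Dataflow).
          print("transforming extracted data")


      with DAG(
          dag_id="example_pipeline",
          start_date=datetime(2024, 1, 1),
          schedule_interval="@daily",
          catchup=False,
      ) as dag:
          extract = BashOperator(task_id="extract", bash_command="echo 'extract source data'")
          transform_task = PythonOperator(task_id="transform", python_callable=transform)
          load = BashOperator(task_id="load", bash_command="echo 'load into warehouse'")

          extract >> transform_task >> load  # declare task ordering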
  • 2
    Amazon MWAA
    Amazon Managed Workflows for Apache Airflow (MWAA) is a managed orchestration service for Apache Airflow that makes it easier to set up and operate end-to-end data pipelines in the cloud at scale. Apache Airflow is an open-source tool used to programmatically author, schedule, and monitor sequences of processes and tasks referred to as “workflows.” With Managed Workflows, you can use Airflow and Python to create workflows without having to manage the underlying infrastructure for scalability, availability, and security. Managed Workflows automatically scales its workflow execution capacity to meet your needs, and is integrated with AWS security services to help provide you with fast and secure access to data.
    Starting Price: $0.49 per hour
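    As a hedged sketch of how a managed environment is reached from code, AWS documents a pattern of exchanging a boto3 CLI token for access to the environment's Airflow CLI endpoint. The environment name, region, and DAG id below are placeholders, and the request details should be verified against the MWAA documentation:

      # Sketch: trigger an Airflow DAG in an Amazon MWAA environment via a CLI token.
      # Environment name, region, and DAG id are placeholders.
      import base64

      import boto3
      import requests

      mwaa = boto3.client("mwaa", region_name="eu-west-2")
      token = mwaa.create_cli_token(Name="my-mwaa-environment")

      resp = requests.post(
          f"https://{token['WebServerHostname']}/aws_mwaa/cli",
          headers={
              "Authorization": f"Bearer {token['CliToken']}",
              "Content-Type": "text/plain",
          },
          data="dags trigger example_pipeline",
          timeout=30,
      )
      resp.raise_for_status()
      print(base64.b64decode(resp.json()["stdout"]).decode())  # Airflow CLI output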
  • 3
    Stripe Data Pipeline
    Stripe Data Pipeline sends all your up-to-date Stripe data and reports to Snowflake or Amazon Redshift in a few clicks. Centralize your Stripe data with other business data to close your books faster and unlock richer business insights. Set up Stripe Data Pipeline in minutes and automatically receive your Stripe data and reports in your data warehouse on an ongoing basis–no code required. Create a single source of truth to speed up your financial close and access better insights. Identify your best-performing payment methods, analyze fraud by location, and more. Send your Stripe data directly to your data warehouse without involving a third-party extract, transform, and load (ETL) pipeline. Offload ongoing maintenance with a pipeline that’s built into Stripe. No matter how much data you have, your data is always complete and accurate. Automate data delivery at scale, minimize security risks, and avoid data outages and delays.
    Starting Price: 3¢ per transaction
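    Because Stripe Data Pipeline lands data directly in your warehouse, the analysis itself is ordinary SQL. The sketch below is a hypothetical example using the Snowflake Python connector; the connection details and the table and column names are illustrative assumptions, not Stripe's actual export schema:

      # Sketch: analyze Stripe data that Data Pipeline has synced into Snowflake.
      # Credentials and the charges table/columns are illustrative placeholders.
      import snowflake.connector

      conn = snowflake.connector.connect(
          account="my_account",
          user="analyst",
          password="***",
          warehouse="ANALYTICS_WH",
          database="STRIPE",
      )
      try:
          cur = conn.cursor()
          # Hypothetical analysis: charge volume by card country over the last 30 days.
          cur.execute(
              """
              SELECT card_country, COUNT(*) AS charges, SUM(amount) / 100.0 AS volume
              FROM charges
              WHERE created >= DATEADD(day, -30, CURRENT_TIMESTAMP())
              GROUP BY card_country
              ORDER BY volume DESC
              """
          )
          for row in cur.fetchall():
              print(row)
      finally:
          conn.close()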
  • 4
    Oarkflow
    Automate your business pipeline with our flow builder. Use the operations that matter to you. Bring your own service providers for email, SMS, and HTTP services. Use our advanced query builder to query and analyze CSV files with any number of fields and rows. The CSV files you upload are stored in a secure vault on our platform, along with account activity logs. We don't store any data records you request for processing.
    Starting Price: $0.0005 per task
  • 5
    datuum.ai
    Datuum is an AI-powered data integration tool that helps streamline the process of customer data onboarding. It allows for easy and fast automated data integration from various sources without coding, reducing preparation time to just a few minutes. With Datuum, organizations can efficiently extract, ingest, transform, and migrate their data and establish a single source of truth, while integrating it into their existing data storage. Datuum is a no-code product and can reduce the time spent on data-related tasks by up to 80%, freeing up time for organizations to focus on generating insights and improving the customer experience. With over 40 years of experience in data management and operations, we at Datuum have incorporated our expertise into the core of our product, addressing the key challenges faced by data engineers and managers and ensuring that the platform is user-friendly, even for non-technical specialists.
  • 6
    Montara
    Montara enables BI teams and data analysts to model and transform their data easily and seamlessly using SQL only, with benefits such as modular code, CI/CD, versioning, and automated testing and documentation. With Montara, analysts can quickly understand how changes to models affect analyses, reports, and dashboards, thanks to report-level lineage and support for third-party visualization tools such as Tableau and Looker. BI teams can also perform ad hoc analysis and create reports and dashboards directly in Montara.
    Starting Price: $100/user/month
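    As a generic illustration of SQL-only modeling (this is not Montara's actual interface; DuckDB is used here purely to show the idea of layering transformations as SQL models):

      # Generic sketch: raw data is shaped into layered models using nothing but SQL.
      # Table, view, and column names are illustrative placeholders.
      import duckdb

      con = duckdb.connect()
      con.execute("CREATE TABLE raw_orders (id INT, amount DOUBLE, status TEXT)")
      con.execute(
          "INSERT INTO raw_orders VALUES (1, 20.0, 'paid'), (2, 35.5, 'refunded'), (3, 12.0, 'paid')"
      )

      # A "staging" model and a downstream "mart" model, each defined purely in SQL.
      con.execute("""
          CREATE VIEW stg_orders AS
          SELECT id, amount, status FROM raw_orders WHERE status = 'paid'
      """)
      con.execute("""
          CREATE VIEW orders_summary AS
          SELECT COUNT(*) AS paid_orders, SUM(amount) AS revenue FROM stg_orders
      """)

      print(con.execute("SELECT * FROM orders_summary").fetchall())  # [(2, 32.0)]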
  • 7
    Kestra
    Kestra is an open-source, event-driven orchestrator that simplifies data operations and improves collaboration between engineers and business users. By bringing Infrastructure as Code best practices to data pipelines, Kestra allows you to build reliable workflows and manage them with confidence. Thanks to the declarative YAML interface for defining orchestration logic, everyone who benefits from analytics can participate in the data pipeline creation process. The UI automatically adjusts the YAML definition any time you make changes to a workflow from the UI or via an API call. Therefore, the orchestration logic is defined declaratively in code, even if some workflow components are modified in other ways.
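    To make the declarative idea concrete, here is a generic, hedged sketch of a YAML-defined flow being loaded and executed in order from Python. The schema shown is illustrative only and is not Kestra's actual flow syntax:

      # Generic sketch of declarative orchestration: tasks are declared as YAML data,
      # then interpreted and run in order. This is not Kestra's real flow format.
      import subprocess

      import yaml  # PyYAML

      FLOW_DEFINITION = """
      id: example_flow
      tasks:
        - id: extract
          command: echo "extracting"
        - id: transform
          command: echo "transforming"
        - id: load
          command: echo "loading"
      """

      flow = yaml.safe_load(FLOW_DEFINITION)
      for task in flow["tasks"]:
          print(f"running task {task['id']}")
          subprocess.run(task["command"], shell=True, check=True)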
  • 8
    Talend Pipeline Designer
    Talend Pipeline Designer is a web-based self-service application that takes raw data and makes it analytics-ready. Compose reusable pipelines to extract, improve, and transform data from almost any source, then pass it to your choice of data warehouse destinations, where it can serve as the basis for the dashboards that power your business insights. Build and deploy data pipelines in less time. Design and preview, in batch or streaming, directly in your web browser with an easy, visual UI. Scale with native support for the latest hybrid and multi-cloud technologies, and improve productivity with real-time development and debugging. Live preview lets you instantly and visually diagnose issues with your data. Make better decisions faster with dataset documentation, quality proofing, and promotion. Transform data and improve data quality with built-in functions applied across batch or streaming pipelines, turning data health into an effortless, automated discipline.
  • 9
    Datavolo
    Capture all your unstructured data for all your LLM needs. Datavolo replaces single-use, point-to-point code with fast, flexible, reusable pipelines, freeing you to focus on what matters most, doing incredible work. Datavolo is the dataflow infrastructure that gives you a competitive edge. Get fast, unencumbered access to all of your data, including the unstructured files that LLMs rely on, and power up your generative AI. Get pipelines that grow with you, in minutes, not days, without custom coding. Instantly configure from any source to any destination at any time. Trust your data because lineage is built into every pipeline. Make single-use pipelines and expensive configurations a thing of the past. Harness your unstructured data and unleash AI innovation with Datavolo, powered by Apache NiFi and built specifically for unstructured data. Our founders have spent a lifetime helping organizations make the most of their data.
    Starting Price: $36,000 per year
  • 10
    Ask On Data (Helical Insight)
    Ask On Data is a chat-based, AI-powered, open source data engineering/ETL tool. With agentic capabilities and a next-generation data stack, Ask On Data helps create data pipelines through a simple chat interface. It can be used for tasks such as data migration, data loading, data transformation, data wrangling, data cleaning, and data analysis. Data scientists can use it to get clean data, data analysts and BI engineers to create calculated tables, and data engineers to work more efficiently.
  • 11
    Unravel Data
    Unravel makes data work anywhere: on Azure, AWS, GCP, or in your own data center, optimizing performance, automating troubleshooting, and keeping costs in check. Unravel helps you monitor, manage, and improve your data pipelines in the cloud and on-premises to drive more reliable performance in the applications that power your business. Get a unified view of your entire data stack. Unravel collects performance data from every platform, system, and application on any cloud, then uses agentless technologies and machine learning to model your data pipelines from end to end. Explore, correlate, and analyze everything in your modern data and cloud environment. Unravel’s data model reveals dependencies, issues, and opportunities, how apps and resources are being used, and what’s working and what’s not. Don’t just monitor performance; quickly troubleshoot and rapidly remediate issues. Leverage AI-powered recommendations to automate performance improvements and lower costs.
  • 12
    Actifio (Google)
    Automate self-service provisioning and refresh of enterprise workloads, and integrate with your existing toolchain. High-performance data delivery and reuse for data scientists through a rich set of APIs and automation. Recover any data across any cloud from any point in time, at the same time, at scale, beyond legacy solutions. Minimize the business impact of ransomware and cyber attacks by recovering quickly with immutable backups. A unified platform to better protect, secure, retain, govern, and recover your data on-premises or in the cloud. Actifio’s patented software platform turns data silos into data pipelines. Virtual Data Pipeline (VDP) delivers full-stack data management, on-premises, hybrid, or multi-cloud, with rich application integration, SLA-based orchestration, flexible data movement, and data immutability and security.
  • 13
    Informatica Data Engineering
    Ingest, prepare, and process data pipelines at scale for AI and analytics in the cloud. Informatica’s comprehensive data engineering portfolio provides everything you need to process and prepare big data engineering workloads to fuel AI and analytics: robust data integration, data quality, streaming, masking, and data preparation capabilities. Rapidly build intelligent data pipelines with CLAIRE®-powered automation, including automatic change data capture (CDC). Ingest thousands of databases, millions of files, and streaming events. Accelerate time to value and ROI with self-service access to trusted, high-quality data. Get unbiased, real-world insights on Informatica data engineering solutions from peers you trust. Reference architectures for sustainable data engineering solutions. AI-powered data engineering in the cloud delivers the trusted, high-quality data your analysts and data scientists need to transform the business.
  • 14
    Qlik Compose
    Qlik Compose for Data Warehouses (formerly Attunity Compose for Data Warehouses) provides a modern approach by automating and optimizing data warehouse creation and operation. Qlik Compose automates designing the warehouse, generating ETL code, and quickly applying updates, all whilst leveraging best practices and proven design patterns. Qlik Compose for Data Warehouses dramatically reduces the time, cost and risk of BI projects, whether on-premises or in the cloud. Qlik Compose for Data Lakes (formerly Attunity Compose for Data Lakes) automates your data pipelines to create analytics-ready data sets. By automating data ingestion, schema creation, and continual updates, organizations realize faster time-to-value from their existing data lake investments.
  • 15
    Hazelcast
    In-Memory Computing Platform. The digital world is different. Microseconds matter. That's why the world's largest organizations rely on us to power their most time-sensitive applications at scale. New data-enabled applications can deliver transformative business power – if they meet today’s requirement of immediacy. Hazelcast solutions complement virtually any database to deliver results that are significantly faster than a traditional system of record. Hazelcast’s distributed architecture provides redundancy for continuous cluster up-time and always available data to serve the most demanding applications. Capacity grows elastically with demand, without compromising performance or availability. The fastest in-memory data grid, combined with third-generation high-speed event processing, delivered through the cloud.
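    A hedged, quickstart-style sketch with the Hazelcast Python client shows the kind of in-memory map that applications read and write at low latency; the cluster address and map contents are placeholders:

      # Sketch: store and read data in a Hazelcast in-memory map from Python.
      # Cluster address, map name, and entries are placeholders.
      import hazelcast

      client = hazelcast.HazelcastClient(cluster_members=["127.0.0.1:5701"])

      prices = client.get_map("instrument-prices").blocking()
      prices.put("AAPL", 189.12)
      prices.put("GOOG", 141.80)

      print(prices.get("AAPL"))  # served from cluster memory
      client.shutdown()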
  • 16
    Trifacta
    The fastest way to prep data and build data pipelines in the cloud. Trifacta provides visual and intelligent guidance to accelerate data preparation so you can get to insights faster. Poor data quality can sink any analytics project; Trifacta helps you understand your data so you can quickly and accurately clean it up, with all the power and none of the code. Manual, repetitive data preparation processes don’t scale. Trifacta helps you build, deploy, and manage self-service data pipelines in minutes, not months.
  • 17
    StreamScape
    Make use of Reactive Programming on the back-end without the need for specialized languages or cumbersome frameworks. Triggers, Actors and Event Collections make it easy to build data pipelines and work with data streams using simple SQL-like syntax, shielding users from the complexities of distributed system development. Extensible Data Modeling is a key feature that supports rich semantics and schema definition for representing real-world things. On-the-fly validation and data shaping rules support a variety of formats like XML and JSON, allowing you to easily describe and evolve your schema, keeping pace with changing business requirements. If you can describe it, we can query it. Know SQL and JavaScript? Then you already know how to use the data engine. Whatever the format, a powerful query language lets you instantly test logic expressions and functions, speeding up development and simplifying deployment for unmatched data agility.
  • 18
    TIBCO Data Fabric
    More data sources, more silos, more complexity, and constant change. Data architectures are challenged to keep pace—a big problem for today's data-driven organizations, and one that puts your business at risk. A data fabric is a modern distributed data architecture that includes shared data assets and optimized data fabric pipelines that you can use to address today's data challenges in a unified way. Optimized data management and integration capabilities so you can intelligently simplify, automate, and accelerate your data pipelines. Easy-to-deploy and adapt distributed data architecture that fits your complex, ever-changing technology landscape. Accelerate time to value by unlocking your distributed on-premises, cloud, and hybrid cloud data, no matter where it resides, and delivering it wherever it's needed at the pace of business.
  • 19
    Datazoom
    Improving the experience, efficiency, and profitability of streaming video requires data. Datazoom enables video publishers to better operate distributed architectures through centralizing, standardizing, and integrating data in real-time to create a more powerful data pipeline and improve observability, adaptability, and optimization solutions. Datazoom is a video data platform that continually gathers data from endpoints, like a CDN or a video player, through an ecosystem of collectors. Once the data is gathered, it is normalized using standardized data definitions. This data is then sent through available connectors to analytics platforms like Google BigQuery, Google Analytics, and Splunk and can be visualized in tools such as Looker and Superset. Datazoom is your key to a more effective and efficient data pipeline. Get the data you need in real-time. Don’t wait for your data when you need to resolve an issue immediately.
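    As a generic illustration of the normalization step described above (the source payloads and the standardized field names are hypothetical, not Datazoom's actual data dictionary):

      # Generic sketch: normalize heterogeneous video events onto one standard schema.
      # Source formats and standardized field names are hypothetical.
      def normalize_event(source: str, payload: dict) -> dict:
          """Map a collector-specific event onto standardized field names."""
          if source == "player_a":
              return {
                  "event_type": payload["evt"],
                  "session_id": payload["sid"],
                  "buffer_ms": payload.get("buf", 0),
              }
          if source == "cdn":
              return {
                  "event_type": payload["event"],
                  "session_id": payload["session"],
                  "buffer_ms": payload.get("rebuffer_duration_ms", 0),
              }
          raise ValueError(f"unknown source: {source}")


      print(normalize_event("player_a", {"evt": "rebuffer", "sid": "s-1", "buf": 420}))
      print(normalize_event("cdn", {"event": "rebuffer", "session": "s-1"}))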
  • 20
    Fosfor Decision Cloud
    Everything you need to make better business decisions. The Fosfor Decision Cloud unifies the modern data ecosystem to deliver the long-sought promise of AI: enhanced business outcomes. It brings the components of your data stack together into a modern decision stack built to amplify business outcomes, and Fosfor works seamlessly with its partners to deliver unprecedented value from your data investments.
  • 21
    Spring Cloud Data Flow
    Microservice-based streaming and batch data processing for Cloud Foundry and Kubernetes. Spring Cloud Data Flow provides tools to create complex topologies for streaming and batch data pipelines. The data pipelines consist of Spring Boot apps built using the Spring Cloud Stream or Spring Cloud Task microservice frameworks. Spring Cloud Data Flow supports a range of data processing use cases, from ETL to import/export, event streaming, and predictive analytics. The Spring Cloud Data Flow server uses Spring Cloud Deployer to deploy data pipelines made of Spring Cloud Stream or Spring Cloud Task applications onto modern platforms such as Cloud Foundry and Kubernetes. A selection of pre-built stream and task/batch starter apps for various data integration and processing scenarios facilitates learning and experimentation. Custom stream and task applications, targeting different middleware or data services, can be built using the familiar Spring Boot style programming model.
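    For illustration, a simple stream such as "http | log" can be registered and deployed through the Data Flow server's REST API. The sketch below assumes a locally running server and the /streams/definitions and /streams/deployments endpoints; the URL, stream name, and parameter details are assumptions to check against the Spring Cloud Data Flow reference:

      # Sketch: define and deploy an "http | log" stream via the Spring Cloud Data Flow
      # server's REST API. Server URL, stream name, and endpoints are assumptions.
      import requests

      SCDF_URL = "http://localhost:9393"

      # Define a stream from pre-built starter apps: an http source piped to a log sink.
      requests.post(
          f"{SCDF_URL}/streams/definitions",
          data={"name": "httplogger", "definition": "http --port=9000 | log"},
          timeout=30,
      ).raise_for_status()

      # Deploy the stream onto the configured platform (e.g., Kubernetes).
      requests.post(f"{SCDF_URL}/streams/deployments/httplogger", timeout=30).raise_for_status()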
  • 22
    Conduktor
    We created Conduktor, the all-in-one friendly interface for working with the Apache Kafka ecosystem. Develop and manage Apache Kafka with confidence using Conduktor DevTools, the all-in-one Apache Kafka desktop client, and save time for your entire team. Apache Kafka is hard to learn and to use; made by Kafka lovers, Conduktor’s best-in-class user experience is loved by developers. Conduktor offers more than just an interface over Apache Kafka: thanks to integrations with most technologies around Kafka, it gives you and your teams control of your whole data pipeline.
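    Conduktor works on top of standard Apache Kafka, so the pipelines it manages are fed by ordinary Kafka clients. The hedged sketch below uses the confluent-kafka Python client, not Conduktor's own API; the broker address and topic are placeholders:

      # Sketch: produce a record to an Apache Kafka topic that a tool such as Conduktor
      # can then inspect. Broker address, topic, and payload are placeholders.
      from confluent_kafka import Producer

      producer = Producer({"bootstrap.servers": "localhost:9092"})


      def on_delivery(err, msg):
          # Report whether the broker acknowledged the record.
          if err is not None:
              print(f"delivery failed: {err}")
          else:
              print(f"delivered to {msg.topic()} [partition {msg.partition()}]")


      producer.produce("orders", key="order-1", value='{"amount": 42.0}', callback=on_delivery)
      producer.flush()  # wait for outstanding deliveries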
  • 23
    Azkaban
    Azkaban is a distributed workflow manager, implemented at LinkedIn to solve the problem of Hadoop job dependencies. We had jobs that needed to run in order, from ETL jobs to data analytics products. Since version 3.0, we provide two modes: the standalone “solo-server” mode and the distributed multiple-executor mode. The following describes the differences between the two modes. In solo-server mode, the DB is an embedded H2 and both the web server and the executor server run in the same process. This is useful if one just wants to try things out, and it can also be used for small-scale use cases. The multiple-executor mode is for serious production environments. Its DB should be backed by MySQL instances with a master-slave setup. The web server and executor servers should ideally run on different hosts so that upgrades and maintenance don’t affect users. This multiple-host setup makes Azkaban robust and scalable.
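    To illustrate the dependency model, classic Azkaban flows are packaged as .job property files zipped together and uploaded to a project through the web server. The job names and commands below are placeholders:

      # Sketch: generate a classic Azkaban project (three .job files with dependencies)
      # and zip it for upload. Job names and commands are placeholders.
      import zipfile

      jobs = {
          "extract.job": "type=command\ncommand=echo extracting\n",
          "transform.job": "type=command\ncommand=echo transforming\ndependencies=extract\n",
          "load.job": "type=command\ncommand=echo loading\ndependencies=transform\n",
      }

      with zipfile.ZipFile("pipeline.zip", "w") as archive:
          for name, contents in jobs.items():
              archive.writestr(name, contents)

      print("wrote pipeline.zip; upload it to an Azkaban project to run the flow")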
  • 24
    Crux
    Find out why the heavy hitters are using the Crux external data automation platform to scale external data integration, transformation, and observability without increasing headcount. Our cloud-native data integration technology accelerates the ingestion, preparation, observability and ongoing delivery of any external dataset. The result is that we can ensure you get quality data in the right place, in the right format when you need it. Leverage automatic schema detection, delivery schedule inference, and lifecycle management to build pipelines from any external data source quickly. Enhance discoverability throughout your organization through a private catalog of linked and matched data products. Enrich, validate, and transform any dataset to quickly combine it with other data sources and accelerate analytics.
  • 25
    Nextflow Tower (Seqera Labs)
    Nextflow Tower is an intuitive centralized command post that enables large-scale collaborative data analysis. With Tower, users can easily launch, manage, and monitor scalable Nextflow data analysis pipelines and compute environments on-premises or on most clouds. Researchers can focus on the science that matters rather than worrying about infrastructure engineering. Compliance is simplified with predictable, auditable pipeline execution and the ability to reliably reproduce results obtained with specific data sets and pipeline versions on demand. Nextflow Tower is developed and supported by Seqera Labs, the creators and maintainers of the open-source Nextflow project. This means that users get high-quality support directly from the source. Unlike third-party frameworks that incorporate Nextflow, Tower is deeply integrated and can help users benefit from Nextflow's complete set of capabilities.
  • 26
    Pantomath
    Organizations continuously strive to be more data-driven, building dashboards, analytics, and data pipelines across the modern data stack. Unfortunately, most organizations struggle with data reliability issues, leading to poor business decisions and a lack of trust in data across the organization, directly impacting the bottom line. Resolving complex data issues is a manual and time-consuming process involving multiple teams, all relying on tribal knowledge to manually reverse-engineer complex data pipelines across different platforms to identify the root cause and understand the impact. Pantomath is a data pipeline observability and traceability platform for automating data operations. It continuously monitors datasets and jobs across the enterprise data ecosystem, providing context to complex data pipelines by creating automated cross-platform technical pipeline lineage.
  • 27
    Tarsal
    Tarsal's infinite scalability means that as your organization grows, Tarsal grows with you. Tarsal makes it easy to switch where you're sending data; today's SIEM data is tomorrow's data lake data, all with one click. Keep your SIEM and gradually migrate analytics over to a data lake. You don't have to rip anything out to use Tarsal. Some analytics just won't run on your SIEM; use Tarsal to have query-ready data on a data lake. Your SIEM is one of the biggest line items in your budget; use Tarsal to send some of that data to your data lake instead. Tarsal is the first highly scalable ETL data pipeline built for security teams. Easily exfil terabytes of data in just a few clicks, with instant normalization, and route that data to your desired destination.
  • 28
    definity
    Monitor and control everything your data pipelines do with zero code changes. Monitor data and pipelines in motion to proactively prevent downtime and quickly root-cause issues. Optimize pipeline runs and job performance to save costs and meet SLAs. Accelerate code deployments and platform upgrades while maintaining reliability and performance. Data and performance checks run in line with pipeline runs, checks on input data happen before pipelines even run, and runs can be preempted automatically. definity takes away the effort of building deep end-to-end coverage, so you are protected at every step, across every dimension. definity shifts observability to post-production to achieve ubiquity, increase coverage, and reduce manual effort. definity agents automatically run with every pipeline, with zero footprint. A unified view of data, pipelines, infra, lineage, and code for every data asset. Detect at runtime and avoid async checks. Auto-preempt runs, even on inputs.
  • 29
    DataFactory (RightData)
    DataFactory is everything you need for data integration and building efficient data pipelines. Transform raw data into information and insights in a fraction of the time of other tools. The days of writing pages of code to move and transform data are over. Drag data operations from a tool palette onto your pipeline canvas to build even the most complex pipelines. A palette of data transformations you can drag to a pipeline canvas. Build pipelines in minutes that would take hours in code. Automate and operationalize using built-in approval and version control mechanisms. It used to be that data wrangling was one tool, pipeline building was another, and machine learning was yet another. DataFactory brings these functions together, where they belong. Perform operations easily using drag-and-drop transformations. Wrangle datasets to prepare for advanced analytics. Add and operationalize ML functions like segmentation and categorization without code.
  • 30
    Lightbend
    Lightbend provides technology that enables developers to easily build data-centric applications that bring the most demanding, globally distributed applications and streaming data pipelines to life. Companies worldwide turn to Lightbend to solve the challenges of real-time, distributed data in support of their most business-critical initiatives. Akka Platform provides the building blocks that make it easy for businesses to build, deploy, and run large-scale applications that support digitally transformative initiatives. Accelerate time to value and reduce infrastructure and cloud costs with reactive microservices that take full advantage of the distributed nature of the cloud and are resilient to failure, highly efficient, and able to operate at any scale. Native support for encryption, data shredding, TLS enforcement, and continued GDPR compliance. A framework for quick construction, deployment, and management of streaming data pipelines.