Compare the Top Data Engineering Tools in the UK as of March 2025 - Page 2

  • 1
    Numbers Station

    Accelerating insights and eliminating barriers for data analysts. Numbers Station offers intelligent data stack automation: get insights from your data 10x faster with AI. Pioneered at the Stanford AI Lab and now available to your enterprise, intelligence for the modern data stack has arrived. Use natural language to get value from your messy, complex, and siloed data in minutes. Tell your data your desired output and immediately generate code for execution. Automate complex data tasks that are specific to your organization and not captured by templated solutions. Empower anyone to securely automate data-intensive workflows on the modern data stack, and free data engineers from an endless backlog of requests. Arrive at insights in minutes, not months. Uniquely designed for you and tuned for your organization’s needs. Integrated with upstream and downstream tools, including Snowflake, Databricks, Redshift, and BigQuery, with more coming. Built on dbt.
  • 2
    Chalk

    Powerful data engineering workflows, without the infrastructure headaches. Complex streaming, scheduling, and data backfill pipelines are all defined in simple, composable Python. Make ETL a thing of the past: fetch all of your data in real time, no matter how complex. Incorporate deep learning and LLMs into decisions alongside structured business data. Make better predictions with fresher data, don’t pay vendors to pre-fetch data you don’t use, and query data just in time for online predictions. Experiment in Jupyter, then deploy to production. Prevent train-serve skew and create new data workflows in milliseconds. Instantly monitor all of your data workflows in real time, and track usage and data quality effortlessly. Know everything you computed and replay any of it. Integrate with the tools you already use and deploy to your own infrastructure. Decide and enforce withdrawal limits with custom hold times.
    Starting Price: Free
  • 3
    DatErica

    DatErica: Revolutionizing Data Processing. DatErica is a cutting-edge data processing platform designed to automate and streamline data operations. Leveraging a robust technology stack including Node.js and microservice architecture, it provides scalable and flexible solutions for complex data needs. The platform offers advanced ETL capabilities, seamless data integration from various sources, and secure data warehousing. DatErica's AI-powered tools enable sophisticated data transformation and validation, ensuring accuracy and consistency. With real-time analytics, customizable dashboards, and automated reporting, users gain valuable insights for informed decision-making. The user-friendly interface simplifies workflow management, while real-time monitoring and alerts enhance operational efficiency. DatErica is ideal for data engineers, analysts, IT teams, and businesses seeking to optimize their data processes and drive growth.
    Starting Price: 9
  • 4
    NAVIK AI Platform

    Absolutdata Analytics

    An advanced analytics software platform that helps sales, marketing, technology, and operations leaders make great business decisions based on powerful data-driven insights. Addresses the breadth of AI needs across data infrastructure, data engineering, and data analytics. UI, workflows, and proprietary algorithms are tuned to the unique needs of each client. Components are modular, enabling custom configurations. Supports, augments, and automates decision making. Elimination of human biases drives better business outcomes. The AI adoption rate is unprecedented. To stay competitive, leading companies need a rapid implementation strategy that scales. To create scalable business impact, combine these four distinct capabilities.
  • 5
    Databricks Data Intelligence Platform
    The Databricks Data Intelligence Platform allows your entire organization to use data and AI. It’s built on a lakehouse to provide an open, unified foundation for all data and governance, and is powered by a Data Intelligence Engine that understands the uniqueness of your data. The winners in every industry will be data and AI companies. From ETL to data warehousing to generative AI, Databricks helps you simplify and accelerate your data and AI goals. Databricks combines generative AI with the unification benefits of a lakehouse to power a Data Intelligence Engine that understands the unique semantics of your data. This allows the Databricks Platform to automatically optimize performance and manage infrastructure in ways unique to your business. The Data Intelligence Engine understands your organization’s language, so search and discovery of new data is as easy as asking a question like you would to a coworker.
  • 6
    Fivetran

    Fivetran is the smartest way to replicate data into your warehouse. We've built the only zero-maintenance pipeline, turning months of ongoing development into a 5-minute setup. Our connectors bring data from applications and databases into one central location so that analysts can unlock profound insights about their business. Schema designs and ERDs make synced data immediately usable. Transform data into analytics-ready tables as soon as it’s loaded into your warehouse. Spend less time writing transformation code with our out-of-the-box data modeling. Connect to any Git repository and manage dbt models directly from Fivetran. Develop and deliver your product with the utmost confidence in ours. Uptime and data delivery guarantees ensure your customers’ data never goes stale. Troubleshoot fast with a global team of Support Specialists.
  • 7
    Iterative

    AI teams face challenges that require new technologies. We build these technologies. Existing data warehouses and data lakes do not fit unstructured datasets like text, images, and videos. AI goes hand in hand with software development. Built with data scientists, ML engineers, and data engineers in mind. Don’t reinvent the wheel! A fast and cost-efficient path to production. Your data is always stored by you. Your models are trained on your machines. Studio is an extension of GitHub, GitLab, or Bitbucket. Sign up for the online SaaS version or contact us to get an on-premise installation.
  • 8
    Bodo.ai

    Bodo’s powerful compute engine and parallel computing approach provide efficient execution and effective scalability, even for 10,000+ cores and petabytes of data. Bodo enables faster development and easier maintenance for data science, data engineering, and ML workloads with standard Python APIs like Pandas. Avoid frequent failures with bare-metal native code execution, and catch errors before they appear in production with end-to-end compilation. Experiment faster with large datasets on your laptop with the simplicity that only Python can provide. Write production-ready code without the hassle of refactoring for scaling on large infrastructure!
  • 9
    SiaSearch

    We want ML engineers to worry less about data engineering and focus on what they love: building better models in less time. Our product is a powerful framework that makes it 10x easier and faster for developers to explore, understand, and share visual data at scale. Automatically create custom interval attributes using pre-trained extractors or any other model. Visualize data and analyze model performance using custom attributes combined with all common KPIs. Use custom attributes to query, find rare edge cases, and curate new training data across your whole data lake. Easily save, edit, version, comment on, and share frames, sequences, or objects with colleagues or third parties. SiaSearch is a data management platform that automatically extracts frame-level, contextual metadata and uses it for fast data exploration, selection, and evaluation. Automating these tasks with metadata can more than double engineering productivity and remove the bottleneck to building industrial AI.
  • 10
    Datakin

    Instantly reveal the order hidden within your complex data world, and always know exactly where to look for answers. Datakin automatically traces data lineage, showing your entire data ecosystem in a rich visual graph. It clearly illustrates the upstream and downstream relationships for each dataset. The Duration tab summarizes a job’s performance in a Gantt-style chart along with its upstream dependencies, making it easy to find bottlenecks. When you need to pinpoint the exact moment of a breaking change, the Compare tab shows how your jobs and datasets have changed between runs. Sometimes jobs that run successfully produce bad output. The Quality tab surfaces critical data quality metrics, showing how they change over time so anomalies become obvious. Datakin helps you find the root cause of issues quickly – and prevent new ones from occurring.
    Starting Price: $2 per month
  • 11
    Feast

    Tecton

    Make your offline data available for real-time predictions without having to build custom pipelines. Ensure data consistency between offline training and online inference, eliminating train-serve skew. Standardize data engineering workflows under one consistent framework. Teams use Feast as the foundation of their internal ML platforms. Feast doesn’t require the deployment and management of dedicated infrastructure. Instead, it reuses existing infrastructure and spins up new resources when needed. Feast is aimed at teams in the following situations. You are not looking for a managed solution and are willing to manage and maintain your own implementation. You have engineers who are able to support the implementation and management of Feast. You want to run pipelines that transform raw data into features in a separate system and integrate with it. You have unique requirements and want to build on top of an open source solution.
  • 12
    datuum.ai
    Datuum is an AI-powered data integration tool that helps streamline the process of customer data onboarding. It allows for easy and fast automated data integration from various sources without coding, reducing preparation time to just a few minutes. With Datuum, organizations can efficiently extract, ingest, transform, and migrate their data and establish a single source of truth for it, while integrating it into their existing data storage. Datuum is a no-code product and can reduce the time spent on data-related tasks by up to 80%, freeing up time for organizations to focus on generating insights and improving the customer experience. With over 40 years of experience in data management and operations, we at Datuum have incorporated our expertise into the core of our product, addressing the key challenges faced by data engineers and managers and ensuring that the platform is user-friendly, even for non-technical specialists.
  • 13
    Kestra

    Kestra is an open-source, event-driven orchestrator that simplifies data operations and improves collaboration between engineers and business users. By bringing Infrastructure as Code best practices to data pipelines, Kestra allows you to build reliable workflows and manage them with confidence. Thanks to the declarative YAML interface for defining orchestration logic, everyone who benefits from analytics can participate in the data pipeline creation process. The UI automatically adjusts the YAML definition any time you make changes to a workflow from the UI or via an API call. Therefore, the orchestration logic is defined declaratively in code, even if some workflow components are modified in other ways.
  • 14
    Roseman Labs

    Roseman Labs enables you to encrypt, link, and analyze multiple data sets while safeguarding the privacy and commercial sensitivity of the actual data. This allows you to combine data sets from several parties, analyze them, and get the insights you need to optimize your processes. Tap into the unused potential of your data. With Roseman Labs, you have the power of cryptography at your fingertips through the simplicity of Python. Encrypting sensitive data allows you to analyze it while safeguarding privacy, protecting commercial sensitivity, and adhering to GDPR regulations. Generate insights from personal or commercially sensitive information, with enhanced GDPR compliance. Ensure data privacy with state-of-the-art encryption. Roseman Labs allows you to link data sets from several parties. By analyzing the combined data, you'll be able to discover which records appear in several data sets, allowing for new patterns to emerge.
  • 15
    Advana

    Advana is a next-generation no-code data engineering and data science platform designed to make implementing, accelerating, and scaling data analytics simpler and faster, giving you the freedom to focus on what matters most to you: solving your business problems. Advana includes a wide range of data analytics capabilities and features that allow you to transform, manage, and analyze your data effectively and efficiently. Modernize your legacy data analytics solutions. Deliver business value faster and cheaper by leveraging the no-code paradigm. Retain talent with domain expertise while computing technology choices evolve. Collaborate across business functions and IT seamlessly in a common user interface. Enable solution development in new technologies without acquiring new coding skills. Port your solutions to new technologies effortlessly as and when they become available.
    Starting Price: $97,000 per year
  • 16
    Ask On Data

    Helical Insight

    Ask On Data is a chat-based, AI-powered, open source data engineering and ETL tool. With agentic capabilities and a pioneering next-generation data stack, Ask On Data can help create data pipelines via a very simple chat interface. It can be used for tasks like data migration, data loading, data transformation, data wrangling, and data cleaning, as well as data analysis. Data scientists can use this tool to get clean data. Data analysts and BI engineers can use it to create calculated tables. Data engineers can also use it to increase their efficiency and achieve much more.
  • 17
    Xtract Data Automation Suite (XDAS)
    Xtract Data Automation Suite (XDAS) is a comprehensive platform designed to streamline process automation for data-intensive workflows. It offers a library of over 300 pre-built micro solutions and AI agents, enabling businesses to design and orchestrate AI-driven workflows in a no-code environment, thereby enhancing operational efficiency and accelerating digital transformation. Key components of XDAS include Bot Studio, which allows users to create custom bots and scripts; Scrape Studio, for effortless web data extraction; GenAI Studio, for developing AI agents that process unstructured data; HITL Studio, which integrates human oversight into data workflows; and XRAG Studio, for building advanced AI systems using retrieval-augmented generation techniques. By leveraging these tools, XDAS helps businesses ensure compliance, reduce time to market, enhance data accuracy, and forecast market trends across various industries.
  • 18
    SplineCloud

    SplineCloud is an open knowledge management platform designed to facilitate the discovery, formalization, and exchange of structured and reusable knowledge in science and engineering. It enables users to organize data into structured repositories, making it findable and accessible. The platform offers tools such as an online plot digitizer for extracting data from graphs and an interactive curve fitting tool that allows users to define functional relationships in datasets using smooth spline functions. Users can also reuse datasets and relations in their models and calculations by accessing them directly through the SplineCloud API or by utilizing open source client libraries for Python and MATLAB. The platform supports the development of reusable engineering and analytical applications, aiming to reduce redundancy in design processes, preserve expert knowledge, and facilitate better decision-making.
  • 19
    Informatica Data Engineering
    Ingest, prepare, and process data pipelines at scale for AI and analytics in the cloud. Informatica’s comprehensive data engineering portfolio provides everything you need to process and prepare big data engineering workloads to fuel AI and analytics: robust data integration, data quality, streaming, masking, and data preparation capabilities. Rapidly build intelligent data pipelines with CLAIRE®-powered automation, including automatic change data capture (CDC). Ingest thousands of databases, millions of files, and streaming events. Accelerate time-to-value with self-service access to trusted, high-quality data. Get unbiased, real-world insights on Informatica data engineering solutions from peers you trust. Reference architectures for sustainable data engineering solutions. AI-powered data engineering in the cloud delivers the trusted, high-quality data your analysts and data scientists need to transform business.
  • 20
    AtScale

    AtScale helps accelerate and simplify business intelligence, resulting in faster time-to-insight, better business decisions, and more ROI on your cloud analytics investment. Eliminate repetitive data engineering tasks like curating, maintaining, and delivering data for analysis. Define business definitions in one location to ensure consistent KPI reporting across BI tools. Accelerate time to insight from data while efficiently managing cloud compute costs. Leverage existing data security policies for data analytics no matter where data resides. AtScale’s Insights workbooks and models let you perform cloud OLAP multidimensional analysis on data sets from multiple providers, with no data prep or data engineering required. We provide built-in, easy-to-use dimensions and measures to help you quickly derive insights that you can use for business decisions.
  • 21
    Presto

    Presto Foundation

    Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes, ranging from gigabytes to petabytes. For data engineers who struggle with managing multiple query languages and interfaces to siloed databases and storage, Presto is the fast and reliable engine that provides one simple ANSI SQL interface for all your data analytics and your open lakehouse. Using different engines for different workloads means you will have to re-platform down the road. With Presto, you get one familiar ANSI SQL language and one engine for your data analytics, so you don't need to graduate to another lakehouse engine. Presto can be used for interactive and batch workloads and for small and large amounts of data, and it scales from a few to thousands of users. Presto gives you one simple ANSI SQL interface for all of your data in various siloed data systems, helping you join your data ecosystem together.
  • 22
    Informatica Data Engineering Streaming
    AI-powered Informatica Data Engineering Streaming enables data engineers to ingest, process, and analyze real-time streaming data for actionable insights. An advanced serverless deployment option with an integrated metering dashboard cuts admin overhead. Rapidly build intelligent data pipelines with CLAIRE®-powered automation, including automatic change data capture (CDC). Ingest thousands of databases, millions of files, and streaming events. Efficiently ingest databases, files, and streaming data for real-time data replication and streaming analytics. Find and inventory all data assets throughout your organization. Intelligently discover and prepare trusted data for advanced analytics and AI/ML projects.
  • 23
    Mosaic AIOps

    Larsen & Toubro Infotech

    LTI’s Mosaic is a converged platform that offers data engineering, advanced analytics, knowledge-led automation, IoT connectivity, and an improved solution experience to its users. Mosaic enables organizations to undertake quantum leaps in business transformation and brings an insights-driven approach to decision-making. It helps deliver pioneering analytics solutions at the intersection of the physical and digital worlds. A catalyst for enterprise ML and AI adoption: model management, training at scale, AI DevOps, MLOps, and multi-tenancy. LTI’s Mosaic AI is a cognitive AI platform designed to provide its users with an intuitive experience in building, training, deploying, and managing AI models at enterprise scale. It brings together the best AI frameworks and templates to provide a platform where users enjoy a seamless and personalized “Build-to-Run” transition on their AI workflows.
  • 24
    IBM Databand
    Monitor your data health and pipeline performance. Gain unified visibility for pipelines running on cloud-native tools like Apache Airflow, Apache Spark, Snowflake, BigQuery, and Kubernetes. An observability platform purpose-built for data engineers. Data engineering is only getting more challenging as demands from business stakeholders grow. Databand can help you catch up. More pipelines, more complexity. Data engineers are working with more complex infrastructure than ever and pushing higher speeds of release. It’s harder to understand why a process has failed, why it’s running late, and how changes affect the quality of data outputs. Data consumers are frustrated with inconsistent results, model performance, and delays in data delivery. Not knowing exactly what data is being delivered, or precisely where failures are coming from, leads to a persistent lack of trust. Pipeline logs, errors, and data quality metrics are captured and stored in independent, isolated systems.
  • 25
    Molecula

    Molecula is an enterprise feature store that simplifies, accelerates, and controls big data access to power machine-scale analytics and AI. Continuously extracting features, reducing the dimensionality of data at the source, and routing real-time feature changes into a central store enables millisecond queries, computation, and feature re-use across formats and locations without copying or moving raw data. The Molecula feature store provides data engineers, data scientists, and application developers a single access point to graduate from reporting and explaining with human-scale data to predicting and prescribing real-time business outcomes with all data. Enterprises spend a lot of money preparing, aggregating, and making numerous copies of their data for every project before they can make decisions with it. Molecula brings an entirely new paradigm for continuous, real-time data analysis to be used for all your mission-critical applications.
  • 26
    Delta Lake

    Delta Lake is an open-source storage layer that brings ACID transactions to Apache Spark™ and big data workloads. Data lakes typically have multiple data pipelines reading and writing data concurrently, and data engineers have to go through a tedious process to ensure data integrity, due to the lack of transactions. Delta Lake brings ACID transactions to your data lakes. It provides serializability, the strongest isolation level. Learn more at Diving into Delta Lake: Unpacking the Transaction Log. In big data, even the metadata itself can be "big data". Delta Lake treats metadata just like data, leveraging Spark's distributed processing power to handle all its metadata. As a result, Delta Lake can handle petabyte-scale tables with billions of partitions and files with ease. Delta Lake provides snapshots of data, enabling developers to access and revert to earlier versions of data for audits, rollbacks, or to reproduce experiments.
  • 27
    Sentrana

    Whether your data is trapped in silos or you’re generating data at the edge, Sentrana gives you the flexibility to create AI and data engineering pipelines wherever your data is. And you can share your AI, Data, and Pipelines with anyone anywhere. With Sentrana, you can achieve newfound agility to effortlessly move between compute environments, while all your data and your work replicates automatically to wherever you want. Sentrana provides a large inventory of building blocks from which you can stitch together custom AI and Data Engineering pipelines. Rapidly assemble and test many different pipelines to create the AI you need. Turn your data into AI with near-zero effort and cost. Since Sentrana is an open platform, newer cutting-edge AI building blocks that are emerging every day are put right at your fingertips. Sentrana turns the Pipelines and AI models you create into re-executable building blocks that anyone on your team can hook into their own pipelines.
  • 28
    Intergraph Smart Laser Data Engineer
    Learn how CloudWorx for Intergraph Smart 3D connects to the point cloud and enables users to build a hybrid of the existing plant and newly modeled parts. Intergraph Smart® Laser Data Engineer provides state-of-the-art point cloud rendering performance to CloudWorx for Intergraph Smart 3D users via the JetStream point cloud engine. With its instant loading and persistent full rendering of the point cloud during user actions, regardless of the size of the dataset, Smart Laser Data Engineer delivers ultimate fidelity to the user. JetStream’s centralized data storage and administrative architecture serves the high-performance point cloud to users and also provides an easy-to-use project environment, making data distribution, user access control, backups, and other IT functions easy and effective, saving time and money.
  • 29
    Knoldus

    World's largest team of Functional Programming and Fast Data engineers focused on creating customized high-performance solutions. We move from "thought" to "thing" via rapid prototyping and proof of concept. Activate an ecosystem to deliver at scale with CI/CD to support your requirements. Understanding the strategic intent and stakeholder needs to develop a shared vision. Deploy MVP to launch the product in the most efficient & expedient manner possible. Continuous improvements and enhancements to support new requirements. Building great products and providing unmatched engineering services would not be possible without the knowledge and extensive usage of the latest tools and technology. We help you to capitalize on opportunities, respond to competitive threats, and scale successful investments by reducing organizational friction from your company’s structures, processes, and culture. Knoldus helps clients identify and capture the most value and meaningful insights from data.
  • 30
    Foghub

    Simplified IT/OT Integration, Data Engineering & Real-Time Edge Intelligence. Easy to use, cross-platform, open architecture, edge computing for industrial time-series data. Foghub offers the Critical-Path to IT/OT convergence, connecting Operations (Sensors, Devices, and Systems) with Business (People, Processes, and Applications), enabling automated data acquisition, data engineering, transformations, advanced analytics and ML. Handle large variety, volume, and velocity of industrial data with out-of-the-box support for all data types, most popular industrial network protocols, OT/lab systems, and databases. Easily automate the collection of data about your production runs, batches, parts, cycle-times, process parameters, asset condition, performance, health, utilities, consumables as well as operators and their performance. Designed for scale, Foghub offers a comprehensive set of capabilities to handle large volumes and velocity of data.