Best Data Engineering Tools for Amazon Web Services (AWS)

Compare the Top Data Engineering Tools that integrate with Amazon Web Services (AWS) as of June 2025

This a list of Data Engineering tools that integrate with Amazon Web Services (AWS). Use the filters on the left to add additional filters for products that have integrations with Amazon Web Services (AWS). View the products that work with Amazon Web Services (AWS) in the table below.

What are Data Engineering Tools for Amazon Web Services (AWS)?

Data engineering tools are designed to facilitate the process of preparing and managing large datasets for analysis. These tools support tasks like data extraction, transformation, and loading (ETL), allowing engineers to build efficient data pipelines that move and process data from various sources into storage systems. They help ensure data integrity and quality by providing features for validation, cleansing, and monitoring. Data engineering tools also often include capabilities for automation, scalability, and integration with big data platforms. By streamlining complex workflows, they enable organizations to handle large-scale data operations more efficiently and support advanced analytics and machine learning initiatives. Compare and read user reviews of the best Data Engineering tools for Amazon Web Services (AWS) currently available using the table below. This list is updated regularly.

  • 1
    DataBuck

    DataBuck

    FirstEigen

    DataBuck is an AI-powered data validation platform that automates risk detection across dynamic, high-volume, and evolving data environments. DataBuck empowers your teams to: ✅ Enhance trust in analytics and reports, ensuring they are built on accurate and reliable data. ✅ Reduce maintenance costs by minimizing manual intervention. ✅ Scale operations 10x faster compared to traditional tools, enabling seamless adaptability in ever-changing data ecosystems. By proactively addressing system risks and improving data accuracy, DataBuck ensures your decision-making is driven by dependable insights. Proudly recognized in Gartner’s 2024 Market Guide for #DataObservability, DataBuck goes beyond traditional observability practices with its AI/ML innovations to deliver autonomous Data Trustability—empowering you to lead with confidence in today’s data-driven world.
    View Tool
    Visit Website
  • 2
    Composable DataOps Platform

    Composable DataOps Platform

    Composable Analytics

    Composable is an enterprise-grade DataOps platform built for business users that want to architect data intelligence solutions and deliver operational data-driven products leveraging disparate data sources, live feeds, and event data regardless of the format or structure of the data. With a modern, intuitive dataflow visual designer, built-in services to facilitate data engineering, and a composable architecture that enables abstraction and integration of any software or analytical approach, Composable is the leading integrated development environment to discover, manage, transform and analyze enterprise data.
    Starting Price: $8/hr - pay-as-you-go
  • 3
    Domo

    Domo

    Domo

    Domo puts data to work for everyone so they can multiply their impact on the business. Our cloud-native data experience platform goes beyond traditional business intelligence and analytics, making data visible and actionable with user-friendly dashboards and apps. Underpinned by a secure data foundation that connects with existing cloud and legacy systems, Domo helps companies optimize critical business processes at scale and in record time to spark the bold curiosity that powers exponential business results.
  • 4
    Qrvey

    Qrvey

    Qrvey

    Qrvey is the only solution for embedded analytics with a built-in data lake. Qrvey saves engineering teams time and money with a turnkey solution connecting your data warehouse to your SaaS application. Qrvey’s full-stack solution includes the necessary components so that your engineering team can build less. Qrvey’s multi-tenant data lake includes: - Elasticsearch as the analytics engine - A unified data pipeline for ingestion and transformation - A complete semantic layer for simple user and data security integration Qrvey’s embedded visualizations support everything from: - standard dashboards and templates - self-service reporting - user-level personalization - individual dataset creation - data-driven workflow automation Qrvey delivers this as a self-hosted package for cloud environments. This offers the best security as your data never leaves your environment while offering a better analytics experience to users. Less time and money on analytics
  • 5
    Prophecy

    Prophecy

    Prophecy

    Prophecy enables many more users - including visual ETL developers and Data Analysts. All you need to do is point-and-click and write a few SQL expressions to create your pipelines. As you use the Low-Code designer to build your workflows - you are developing high quality, readable code for Spark and Airflow that is committed to your Git. Prophecy gives you a gem builder - for you to quickly develop and rollout your own Frameworks. Examples are Data Quality, Encryption, new Sources and Targets that extend the built-in ones. Prophecy provides best practices and infrastructure as managed services – making your life and operations simple! With Prophecy, your workflows are high performance and use scale-out performance & scalability of the cloud.
    Starting Price: $299 per month
  • 6
    DQOps

    DQOps

    DQOps

    DQOps is an open-source data quality platform designed for data quality and data engineering teams that makes data quality visible to business sponsors. The platform provides an efficient user interface to quickly add data sources, configure data quality checks, and manage issues. DQOps comes with over 150 built-in data quality checks, but you can also design custom checks to detect any business-relevant data quality issues. The platform supports incremental data quality monitoring to support analyzing data quality of very big tables. Track data quality KPI scores using our built-in or custom dashboards to show progress in improving data quality to business sponsors. DQOps is DevOps-friendly, allowing you to define data quality definitions in YAML files stored in Git, run data quality checks directly from your data pipelines, or automate any action with a Python Client. DQOps works locally or as a SaaS platform.
    Starting Price: $499 per month
  • 7
    Iterative

    Iterative

    Iterative

    AI teams face challenges that require new technologies. We build these technologies. Existing data warehouses and data lakes do not fit unstructured datasets like text, images, and videos. AI hand in hand with software development. Built with data scientists, ML engineers, and data engineers in mind. Don’t reinvent the wheel! Fast and cost‑efficient path to production. Your data is always stored by you. Your models are trained on your machines. Existing data warehouses and data lakes do not fit unstructured datasets like text, images, and videos. AI teams face challenges that require new technologies. We build these technologies. Studio is an extension of GitHub, GitLab or BitBucket. Sign up for the online SaaS version or contact us to get on-premise installation
  • 8
    SiaSearch

    SiaSearch

    SiaSearch

    We want ML engineers to worry less about data engineering and focus on what they love, building better models in less time. Our product is a powerful framework that makes it 10x easier and faster for developers to explore, understand and share visual data at scale. Automatically create custom interval attributes using pre-trained extractors or any other model. Visualize data and analyze model performance using custom attributes combined with all common KPIs. Use custom attributes to query, find rare edge cases and curate new training data across your whole data lake. Easily save, edit, version, comment and share frames, sequences or objects with colleagues or 3rd parties. SiaSearch, a data management platform that automatically extracts frame-level, contextual metadata and utilizes it for fast data exploration, selection and evaluation. Automating these tasks with metadata can more than double engineering productivity and remove the bottleneck to building industrial AI.
  • 9
    Chalk

    Chalk

    Chalk

    Powerful data engineering workflows, without the infrastructure headaches. Complex streaming, scheduling, and data backfill pipelines, are all defined in simple, composable Python. Make ETL a thing of the past, fetch all of your data in real-time, no matter how complex. Incorporate deep learning and LLMs into decisions alongside structured business data. Make better predictions with fresher data, don’t pay vendors to pre-fetch data you don’t use, and query data just in time for online predictions. Experiment in Jupyter, then deploy to production. Prevent train-serve skew and create new data workflows in milliseconds. Instantly monitor all of your data workflows in real-time; track usage, and data quality effortlessly. Know everything you computed and data replay anything. Integrate with the tools you already use and deploy to your own infrastructure. Decide and enforce withdrawal limits with custom hold times.
    Starting Price: Free
  • 10
    Databricks Data Intelligence Platform
    The Databricks Data Intelligence Platform allows your entire organization to use data and AI. It’s built on a lakehouse to provide an open, unified foundation for all data and governance, and is powered by a Data Intelligence Engine that understands the uniqueness of your data. The winners in every industry will be data and AI companies. From ETL to data warehousing to generative AI, Databricks helps you simplify and accelerate your data and AI goals. Databricks combines generative AI with the unification benefits of a lakehouse to power a Data Intelligence Engine that understands the unique semantics of your data. This allows the Databricks Platform to automatically optimize performance and manage infrastructure in ways unique to your business. The Data Intelligence Engine understands your organization’s language, so search and discovery of new data is as easy as asking a question like you would to a coworker.
  • 11
    AtScale

    AtScale

    AtScale

    AtScale helps accelerate and simplify business intelligence resulting in faster time-to-insight, better business decisions, and more ROI on your Cloud analytics investment. Eliminate repetitive data engineering tasks like curating, maintaining and delivering data for analysis. Define business definitions in one location to ensure consistent KPI reporting across BI tools. Accelerate time to insight from data while efficiently managing cloud compute costs. Leverage existing data security policies for data analytics no matter where data resides. AtScale’s Insights workbooks and models let you perform Cloud OLAP multidimensional analysis on data sets from multiple providers – with no data prep or data engineering required. We provide built-in easy to use dimensions and measures to help you quickly derive insights that you can use for business decisions.
  • 12
    IBM Databand
    Monitor your data health and pipeline performance. Gain unified visibility for pipelines running on cloud-native tools like Apache Airflow, Apache Spark, Snowflake, BigQuery, and Kubernetes. An observability platform purpose built for Data Engineers. Data engineering is only getting more challenging as demands from business stakeholders grow. Databand can help you catch up. More pipelines, more complexity. Data engineers are working with more complex infrastructure than ever and pushing higher speeds of release. It’s harder to understand why a process has failed, why it’s running late, and how changes affect the quality of data outputs. Data consumers are frustrated with inconsistent results, model performance, and delays in data delivery. Not knowing exactly what data is being delivered, or precisely where failures are coming from, leads to persistent lack of trust. Pipeline logs, errors, and data quality metrics are captured and stored in independent, isolated systems.
  • 13
    Foghub

    Foghub

    Foghub

    Simplified IT/OT Integration, Data Engineering & Real-Time Edge Intelligence. Easy to use, cross-platform, open architecture, edge computing for industrial time-series data. Foghub offers the Critical-Path to IT/OT convergence, connecting Operations (Sensors, Devices, and Systems) with Business (People, Processes, and Applications), enabling automated data acquisition, data engineering, transformations, advanced analytics and ML. Handle large variety, volume, and velocity of industrial data with out-of-the-box support for all data types, most popular industrial network protocols, OT/lab systems, and databases. Easily automate the collection of data about your production runs, batches, parts, cycle-times, process parameters, asset condition, performance, health, utilities, consumables as well as operators and their performance. Designed for scale, Foghub offers a comprehensive set of capabilities to handle large volumes and velocity of data.
  • 14
    witboost

    witboost

    Agile Lab

    witboost is a modular, scalable, fast, efficient data management system for your company to truly become data driven, reduce time-to-market, it expenditures and overheads. witboost comprises a series of modules. These are building blocks that can work as standalone solutions to address and solve a single need or problem, or they can be combined to create the perfect data management ecosystem for your company. Each module improves a specific data engineering function and they can be combined to create the perfect solution to answer your specific needs, guaranteeing a blazingly fact and smooth implementation, thus dramatically reducing time-to-market, time-to-value and consequently the TCO of your data engineering infrastructure. Smart Cities need digital twins to predict needs and avoid unforeseen problems, gathering data from thousands of sources and managing ever more complex telematics.
  • 15
    DataSentics

    DataSentics

    DataSentics

    Making data science & machine learning have a real impact on organizations. We are an AI product studio, a group of 100 experienced data scientists and data engineers with a combination of experience both from the agile world of digital start-ups as well as major international corporations. We don’t end with nice slides and dashboards. The result that counts is an automated data solution in production integrated inside a real process. We do not report clickers but data scientists and data engineers. We have a strong focus on productionalizing data science solutions in the cloud with high standards of CI and automation. Building the greatest concentration of the smartest and most creative data scientists and engineers by being the most exciting and fulfilling place for them to work in Central Europe. Giving them the freedom to use our critical mass of expertise to find and iterate on the most promising data-driven opportunities, both for our clients and our own products.
  • 16
    Vaex

    Vaex

    Vaex

    At Vaex.io we aim to democratize big data and make it available to anyone, on any machine, at any scale. Cut development time by 80%, your prototype is your solution. Create automatic pipelines for any model. Empower your data scientists. Turn any laptop into a big data powerhouse, no clusters, no engineers. We provide reliable and fast data driven solutions. With our state-of-the-art technology we build and deploy machine learning models faster than anyone on the market. Turn your data scientist into big data engineers. We provide comprehensive training of your employees, enabling you to take full advantage of our technology. Combines memory mapping, a sophisticated expression system, and fast out-of-core algorithms. Efficiently visualize and explore big datasets, and build machine learning models on a single machine.
  • Previous
  • You're on page 1
  • Next