Apache Spark

Apache Software Foundation
dbt

dbt Labs

About Apache Spark

Apache Spark™ is a unified analytics engine for large-scale data processing. It achieves high performance for both batch and streaming data using a state-of-the-art DAG scheduler, a query optimizer, and a physical execution engine. Spark offers over 80 high-level operators that make it easy to build parallel apps, and you can use it interactively from the Scala, Python, R, and SQL shells. Spark powers a stack of libraries including SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming, and you can combine these libraries seamlessly in the same application. Spark runs in its standalone cluster mode, on Hadoop YARN, Apache Mesos, or Kubernetes, and in the cloud (for example on EC2). It can access data in HDFS, Alluxio, Apache Cassandra, Apache HBase, Apache Hive, and hundreds of other data sources.
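Spark's high-level operators are easiest to grasp through the classic word count. In PySpark it reads roughly `sc.textFile(path).flatMap(lambda line: line.split()).map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)`. The sketch below reproduces that operator chain in plain Python so it runs without a Spark installation. `flat_map`, `reduce_by_key`, and `word_count` are hypothetical stand-ins, not Spark APIs, and everything runs in a single process instead of being split into stages by the DAG scheduler and executed across a cluster.

```python
from collections.abc import Callable, Iterable, Iterator
from typing import TypeVar

T = TypeVar("T")
U = TypeVar("U")
K = TypeVar("K")
V = TypeVar("V")


def flat_map(f: Callable[[T], Iterable[U]], xs: Iterable[T]) -> Iterator[U]:
    """Hypothetical stand-in for RDD.flatMap: apply f to each element, flatten the results."""
    for x in xs:
        yield from f(x)


def reduce_by_key(f: Callable[[V, V], V], pairs: Iterable[tuple[K, V]]) -> dict[K, V]:
    """Hypothetical stand-in for RDD.reduceByKey: combine all values sharing a key with f."""
    acc: dict[K, V] = {}
    for key, value in pairs:
        acc[key] = f(acc[key], value) if key in acc else value
    return acc


def word_count(lines: Iterable[str]) -> dict[str, int]:
    words = flat_map(lambda line: line.split(), lines)  # like .flatMap(...)
    pairs = ((word, 1) for word in words)                # like .map(lambda w: (w, 1))
    return reduce_by_key(lambda a, b: a + b, pairs)      # like .reduceByKey(...)


if __name__ == "__main__":
    print(word_count(["Spark runs batch jobs", "Spark runs streaming jobs"]))
    # prints {'Spark': 2, 'runs': 2, 'batch': 1, 'jobs': 2, 'streaming': 1}
```

Because `flat_map` and the pair expression are generators, nothing is computed until `reduce_by_key` consumes them, which loosely mirrors how Spark defers transformations until an action forces evaluation.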

About dbt

dbt helps data teams transform raw data into trusted, analysis-ready datasets faster. With dbt, data analysts and data engineers can collaborate on version-controlled SQL models, enforce testing and documentation standards, use detailed metadata to troubleshoot and optimize pipelines, and deploy transformations reliably at scale. Built on software engineering best practices, dbt brings transparency and governance to every step of the data transformation workflow. Thousands of companies, from startups to Fortune 500 enterprises, rely on dbt to improve data quality and trust, drive efficiency, and reduce costs as they deliver AI-ready data across their organizations. Whether you’re scaling data operations or just getting started, dbt helps your team move from raw data to actionable analytics with confidence.

Platforms Supported (Apache Spark)

Windows
Mac
Linux
Cloud
On-Premises
iPhone
iPad
Android
Chromebook

Platforms Supported (dbt)

Windows
Mac
Linux
Cloud
On-Premises
iPhone
iPad
Android
Chromebook

Audience (Apache Spark)

Organizations that want a unified analytics engine for large-scale data processing

Audience (dbt)

SQL users looking for an ETL solution to engineer data transformations

Support (Apache Spark)

Phone Support
24/7 Live Support
Online

Support (dbt)

Phone Support
24/7 Live Support
Online

API (Apache Spark)

Offers API

API (dbt)

Offers API

Pricing (Apache Spark)

No information available.
Free Version
Free Trial

Pricing (dbt)

$100 per user/month
Free Version
Free Trial

Training (Apache Spark)

Documentation
Webinars
Live Online
In Person

Training (dbt)

Documentation
Webinars
Live Online
In Person

Company Information

Apache Software Foundation
Founded: 1999
United States
spark.apache.org

Company Information

dbt Labs
Founded: 2016
United States
www.getdbt.com

Alternatives (Apache Spark)

dbt

dbt Labs

Alternatives (dbt)

AWS Glue

Amazon

Categories

dbt powers the transformation layer of modern data pipelines. Once data has been ingested into a warehouse or lakehouse, dbt enables teams to clean, model, and document it so it’s ready for analytics and AI. With dbt, teams can:

- Transform raw data at scale with SQL and Jinja.
- Orchestrate pipelines with built-in dependency management and scheduling.
- Ensure trust with automated testing and continuous integration.
- Visualize lineage across models and columns for faster impact analysis.

By embedding software engineering practices into pipeline development, dbt helps data teams build reliable, production-grade pipelines that shorten time to insight and deliver AI-ready data.
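dbt's dependency management rests on `ref()`: a model selects from other models via `{{ ref('model_name') }}`, and dbt builds a DAG from those references so that models run in dependency order. The following is a simplified sketch of that idea, not dbt's implementation (dbt renders full Jinja and resolves relation names through its database adapters). It assumes models are supplied as a dict of SQL strings and uses a regex in place of Jinja; the helpers `build_order` and `compile_sql` and the `analytics` schema name are hypothetical.

```python
import re
from graphlib import TopologicalSorter

# Matches {{ ref('model') }} or {{ ref("model") }}; a simplification of real Jinja parsing.
REF_PATTERN = re.compile(r"""\{\{\s*ref\(\s*['"]([\w.]+)['"]\s*\)\s*\}\}""")


def build_order(models: dict[str, str]) -> list[str]:
    """Return model names ordered so each model comes after every model it ref()s."""
    graph = {name: set(REF_PATTERN.findall(sql)) for name, sql in models.items()}
    for name, deps in graph.items():
        missing = deps - set(models)
        if missing:
            raise ValueError(f"{name} refers to unknown model(s): {sorted(missing)}")
    # TopologicalSorter expects a mapping of node -> predecessors.
    return list(TopologicalSorter(graph).static_order())


def compile_sql(sql: str, schema: str) -> str:
    """Replace each ref() with a schema-qualified relation name."""
    return REF_PATTERN.sub(lambda m: f"{schema}.{m.group(1)}", sql)


if __name__ == "__main__":
    models = {
        "customers": (
            "select c.id, count(o.id) as n_orders from {{ ref('stg_customers') }} c "
            "left join {{ ref('stg_orders') }} o on o.customer_id = c.id group by c.id"
        ),
        "stg_orders": "select id, customer_id from raw.orders",
        "stg_customers": "select id from raw.customers",
    }
    for name in build_order(models):
        print(name, "->", compile_sql(models[name], schema="analytics"))
```

If two models reference each other, `static_order()` raises `graphlib.CycleError`, which reflects the requirement that the model graph be acyclic.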

dbt brings rigor and scalability to data preparation by enabling teams to clean, transform, and structure raw data directly in the warehouse. Instead of relying on siloed spreadsheets or manual workflows, dbt uses SQL and software engineering best practices to make data preparation reliable, repeatable, and collaborative. With dbt, teams can:

- Clean and standardize data with reusable, version-controlled models.
- Apply business logic consistently across all datasets.
- Validate outputs through automated tests before data is exposed to analysts.
- Document and share context so every prepared dataset comes with lineage and definitions.

By treating data preparation as code, dbt ensures that prepared datasets aren’t just quick fixes but trusted, governed, production-ready assets that scale with the business.
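dbt's automated tests are themselves SQL: a test is a query that selects failing records, and it passes when that query returns no rows. Built-in generic tests such as `unique` and `not_null` are declared per column in a model's YAML properties file, and dbt generates the query. The sketch below imitates those two checks against an in-memory SQLite database; the `TESTS` templates and the `run_test` helper are hypothetical approximations, not the exact SQL dbt generates for any adapter.

```python
import sqlite3

# Each template returns the *failing* rows, so an empty result means the test passes.
TESTS = {
    "not_null": "select * from {table} where {column} is null",
    "unique": (
        "select {column}, count(*) as n from {table} "
        "where {column} is not null group by {column} having count(*) > 1"
    ),
}


def run_test(conn: sqlite3.Connection, test: str, table: str, column: str) -> list[tuple]:
    """Run one column test and return its failing rows (empty list = pass)."""
    # Identifiers are formatted straight into the SQL, so only pass trusted names.
    sql = TESTS[test].format(table=table, column=column)
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("create table stg_customers (id integer, email text)")
    conn.executemany(
        "insert into stg_customers values (?, ?)",
        [(1, "a@example.com"), (2, None), (2, "b@example.com")],
    )
    for test in ("not_null", "unique"):
        for column in ("id", "email"):
            failures = run_test(conn, test, "stg_customers", column)
            status = "PASS" if not failures else f"FAIL ({len(failures)} rows)"
            print(f"{test}({column}): {status}")
```

Returning the failing rows rather than a boolean makes a failed test directly debuggable: the offending records are the test's output.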

ETL

dbt modernizes the “T” in ETL: transformation. Instead of relying on legacy pipelines or black-box transformations, dbt lets data teams build, test, and document transformations directly inside the data warehouse or lakehouse. With dbt, teams can:

- Transform raw data into analytics-ready models using SQL and Jinja.
- Ensure reliability with built-in testing, version control, and CI/CD.
- Standardize workflows across teams with reusable models and shared documentation.
- Leverage modern platforms like Snowflake, Databricks, BigQuery, and Redshift for scalable transformation.

By focusing on the transformation layer, dbt helps organizations shorten pipeline development cycles, reduce data debt, and deliver trusted insights faster, complementing the ingestion and loading tools in a modern ELT stack.
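In an ELT stack, raw data is loaded into the warehouse first and transformed there afterward. A dbt model is essentially a `select` statement, and its materialization decides whether dbt wraps it in a `create view` or a `create table ... as` statement. The sketch below plays this out with SQLite standing in for a warehouse; the `materialize` helper and the table names are hypothetical, and dbt's other materializations (such as incremental and ephemeral) are not covered.

```python
import sqlite3


def materialize(conn: sqlite3.Connection, name: str, select_sql: str, kind: str = "view") -> None:
    """Wrap a model's SELECT in DDL, loosely like dbt's view and table materializations."""
    if kind not in ("view", "table"):
        raise ValueError(f"unsupported materialization: {kind}")
    # Drop and recreate so the model can be rebuilt on every run.
    conn.execute(f"drop {kind} if exists {name}")
    conn.execute(f"create {kind} {name} as {select_sql}")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Extract + Load: land the source rows unchanged in a raw table.
    conn.execute("create table raw_orders (id integer, amount_cents integer, status text)")
    conn.executemany(
        "insert into raw_orders values (?, ?, ?)",
        [(1, 1250, "completed"), (2, 800, "returned"), (3, 4000, "completed")],
    )
    # Transform: each model is a plain SELECT executed inside the database.
    materialize(conn, "stg_orders",
                "select id, amount_cents / 100.0 as amount, status from raw_orders")
    materialize(conn, "revenue",
                "select sum(amount) as total from stg_orders where status = 'completed'",
                kind="table")
    print(conn.execute("select total from revenue").fetchone())  # prints (52.5,)
```

A view keeps the transformation cheap to rebuild and always current, while a table stores the result so downstream queries do not recompute it; choosing between them per model is the kind of trade-off dbt's materialization setting expresses.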

Streaming Analytics Features

Data Enrichment
Data Wrangling / Data Prep
Multiple Data Source Support
Process Automation
Real-time Analysis / Reporting
Visualization Dashboards

Big Data Features

Collaboration
Data Blends
Data Cleansing
Data Mining
Data Visualization
Data Warehousing
High Volume Processing
No-Code Sandbox
Predictive Analytics
Templates

Data Lineage Features

Database Change Impact Analysis
Filter Lineage Links
Implicit Connection Discovery
Lineage Object Filtering
Object Lineage Tracing
Point-in-Time Visibility
User/Client/Target Connection Visibility
Visual & Text Lineage View

Data Preparation Features

Collaboration Tools
Data Access
Data Blending
Data Cleansing
Data Governance
Data Mashup
Data Modeling
Data Transformation
Machine Learning
Visual User Interface

ETL Features

Data Analysis
Data Filtering
Data Quality Control
Job Scheduling
Match & Merge
Metadata Management
Non-Relational Transformations
Version Control

Integrations

Azure Marketplace
DQOps
Dagster
DataHub
Databricks Data Intelligence Platform
Flyte
Kestra
Sifflet
Union Cloud
VeloDB
definity
Apache Doris
Apache Iceberg
Apache Kylin
Baidu AI Cloud Stream Computing
E2E Cloud
Hue
ModelOp
SQL
eQube®-DaaS

Integrations

Azure Marketplace
DQOps
Dagster
DataHub
Databricks Data Intelligence Platform
Flyte
Kestra
Sifflet
Union Cloud
VeloDB
definity
Apache Doris
Apache Iceberg
Apache Kylin
Baidu AI Cloud Stream Computing
E2E Cloud
Hue
ModelOp
SQL
eQube®-DaaS