apache free download - SourceForge

Showing 279 open source projects for "apache"

View related business solutions

Data Management Linux Clear Filters & Widen Search

Our Free Plans just got better! | Auth0
With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.

Try free now
Context for your AI agents
Crawl websites, sync to vector databases, and power RAG applications. Pre-built integrations for LLM pipelines and AI assistants.

Build data pipelines that feed your AI models and agents without managing infrastructure. Crawl any website, transform content, and push directly to your preferred vector store. Use 10,000+ tools for RAG applications, AI assistants, and real-time knowledge bases. Monitor site changes, trigger workflows on new data, and keep your AIs fed with fresh, structured information. Cloud-native, API-first, and free to start until you need to scale.

Try for free
1

Apache HBase

Get random, realtime read/write access to your Big Data

Use Apache HBase™ when you need random, realtime read/write access to your Big Data. This project's goal is the hosting of very large tables, billions of rows X millions of columns, atop clusters of commodity hardware. Apache HBase is an open-source, distributed, versioned, non-relational database modeled after Google's Bigtable. A Distributed Storage System for Structured Data by Chang et al.

Downloads: 11 This Week

Last Update: 2025-11-14
See Project
2

Apache DevLake

Apache DevLake is an open-source dev data platform

Apache DevLake is an open-source dev data platform that ingests, analyzes, and visualizes the fragmented data from DevOps tools to extract insights for engineering excellence, developer experience, and community growth. Apache DevLake is designed for developer teams looking to make better sense of their development process and to bring a more data-driven approach to their own practices.

Downloads: 5 This Week

Last Update: 2026-01-12
See Project
3

Apache SeaTunnel

SeaTunnel is a distributed, high-performance data integration platform

SeaTunnel is a very easy-to-use ultra-high-performance distributed data integration platform that supports real-time synchronization of massive data. It can synchronize tens of billions of data stably and efficiently every day, and has been used in the production of nearly 100 companies. There are hundreds of commonly-used data sources of which versions are incompatible. With the emergence of new technologies, more data sources are appearing. It is difficult for users to find a tool that can...

Downloads: 1 This Week

Last Update: 2025-09-05
See Project
4

Apache RocketMQ

Distributed messaging and streaming platform with low latency

Apache RocketMQ is a distributed messaging and streaming platform with low latency, high performance and reliability, trillion-level capacity and flexible scalability. Messaging patterns including publish/subscribe, request/reply and streaming. Financial grade transactional message. Built-in fault tolerance and high availability configuration options base on DLedger.

Downloads: 1 This Week

Last Update: 2025-12-24
See Project
Desktop and Mobile Device Management Software
It's a modern take on desktop management that can be scaled as per organizational needs.

Desktop Central is a unified endpoint management (UEM) solution that helps in managing servers, laptops, desktops, smartphones, and tablets from a central location.

Learn More
5

Apache Seata

High-performance, open source distributed transaction solution

Seata is a distributed transaction solution for microservices that provides consistent, cross-service commits without forcing every team to adopt the same persistence model. Its architecture separates responsibilities into a global coordinator and per-service participants, so business services remain decoupled while transactions are orchestrated centrally. Multiple modes are supported—AT (automatic, SQL-based with undo logs), TCC (try-confirm-cancel), Saga (long-running compensation), and...

Downloads: 0 This Week

Last Update: 2025-09-06
See Project
6

Apache Hudi

Upserts, Deletes And Incremental Processing on Big Data

Apache Hudi (pronounced Hoodie) stands for Hadoop Upserts Deletes and Incrementals. Hudi manages the storage of large analytical datasets on DFS (Cloud stores, HDFS or any Hadoop FileSystem compatible storage). Apache Hudi is a transactional data lake platform that brings database and data warehouse capabilities to the data lake. Hudi reimagines slow old-school batch data processing with a powerful new incremental processing framework for low latency minute-level analytics. ...

Downloads: 0 This Week

Last Update: 2025-12-18
See Project
7

Apache InLong

Apache InLong - a one-stop integration framework for massive data

Apache InLong is a one-stop integration framework for massive data that provides automatic, secure and reliable data transmission capabilities. InLong supports both batch and stream data processing at the same time, which offers great power to build data analysis, modeling and other real-time applications based on streaming data. InLong (应龙) is a divine beast in Chinese mythology who guides the river into the sea, and it is regarded as a metaphor of the InLong system for reporting data streams. ...

Downloads: 0 This Week

Last Update: 2025-11-13
See Project
8

Apache Doris

MPP-based interactive SQL data warehousing for reporting and analysis

Apache Doris is a modern MPP analytical database product. It can provide sub-second queries and efficient real-time data analysis. With it's distributed architecture, up to 10PB level datasets will be well supported and easy to operate. Apache Doris can meet various data analysis demands, including history data reports, real-time data analysis, interactive data analysis, and exploratory data analysis.

Downloads: 0 This Week

Last Update: 2025-12-26
See Project
9

.NET for Apache Spark

A free, open-source, and cross-platform big data analytics framework

.NET for Apache Spark provides high-performance APIs for using Apache Spark from C# and F#. With these .NET APIs, you can access the most popular Dataframe and SparkSQL aspects of Apache Spark, for working with structured data, and Spark Structured Streaming, for working with streaming data. .NET for Apache Spark is compliant with .NET Standard - a formal specification of .NET APIs that are common across .NET implementations.

Downloads: 1 This Week

Last Update: 2025-05-13
See Project
The Original Buy Center Software.
Never Go To The Auction Again.

VAN sources private-party vehicles from over 20 platforms and provides all necessary tools to communicate with sellers and manage opportunities. Franchise and Independent dealers can boost their buy center strategies with our advanced tools and an experienced Acquisition Coaching™ team dedicated to your success.

Learn More
10

Apache Airflow Provider

Great Expectations Airflow operator

Due to apply_default decorator removal, this version of the provider requires Airflow 2.1.0+. If your Airflow version is 2.1.0, and you want to install this provider version, first upgrade Airflow to at least version 2.1.0. Otherwise, your Airflow package version will be upgraded automatically, and you will have to manually run airflow upgrade db to complete the migration. This operator currently works with the Great Expectations V3 Batch Request API only. If you would like to use the...

Downloads: 0 This Week

Last Update: 2025-02-10
See Project
11

CyberChef

A web app for encryption, encoding, compression and data analysis

CyberChef, developed by GCHQ, is a versatile web application dubbed the "Cyber Swiss Army Knife." It enables users to perform a wide array of operations on data, including encryption, encoding, compression, and analysis, all within a browser interface.

Downloads: 32 This Week

Last Update: 2025-04-23
See Project
12

IoTDB

Apache IoTDB

Apache IoTDB (Database for Internet of Things) is an IoT native database with high performance for data management and analysis, deployable on the edge and the cloud. Due to its light-weight architecture, high performance and rich feature set together with its deep integration with Apache Hadoop, Spark and Flink, Apache IoTDB can meet the requirements of massive data storage, high-speed data ingestion and complex data analysis in the IoT industrial fields.

Downloads: 3 This Week

Last Update: 3 days ago
See Project
13

Superset

Apache Superset is a data visualization and data exploration platform

Apache Superset is a modern data exploration and visualization platform. Superset is fast, lightweight, intuitive, and loaded with options that make it easy for users of all skill sets to explore and visualize their data, from simple line charts to highly detailed geospatial charts. Quickly and easily integrate and explore your data, using either our simple no-code viz builder or state-of-the-art SQL IDE.

Downloads: 16 This Week

Last Update: 4 days ago
See Project
14

Arrow Julia

Official Julia implementation of Apache Arrow

This is a pure Julia implementation of the Apache Arrow data standard. This package provides Julia AbstractVector objects for referencing data that conforms to the Arrow standard. This allows users to seamlessly interface Arrow formatted data with a great deal of existing Julia code.

Downloads: 0 This Week

Last Update: 2026-01-14
See Project
15

Cassandra Spark Connector

Apache Spark to Apache Cassandra connector

The Apache Cassandra Spark Connector allows Spark jobs (RDDs or DataFrames/Datasets) to read from and write to Cassandra tables. Compatible with Apache Cassandra (v2.1+), Spark 1.0–3.5, and Scala 2.11–2.13, it supports mapping Cassandra rows to Scala case classes, saving results back to Cassandra, and executing arbitrary CQL within Spark applications.

Downloads: 0 This Week

Last Update: 2025-08-04
See Project
16

Scio

A Scala API for Apache Beam and Google Cloud Dataflow

Scio is a Scala API developed by Spotify that builds on Apache Beam to enable expressive batch and streaming data pipelines, optimized for running on Google Cloud Dataflow. Inspired by Spark and Scalding, it provides scalable, type‑safe, and production-grade data processing, with built-in support for BigQuery, Pub/Sub, Cassandra, Elasticsearch, Redis, TensorFlow IO, and more.

Downloads: 0 This Week

Last Update: 5 days ago
See Project
17

Greenplum Database

Massive parallel data platform for analytics, machine learning and AI

Rapidly create and deploy models for complex applications in cybersecurity, predictive maintenance, risk management, fraud detection, and many other areas. With its unique cost-based query optimizer designed for large-scale data workloads, Greenplum scales interactive and batch-mode analytics to large datasets in the petabytes without degrading query performance and throughput. Based on PostgreSQL, Greenplum provides you with more control over the software you deploy, reducing vendor...

Downloads: 21 This Week

Last Update: 2024-03-08
See Project
18

Akka

Build concurrent, distributed, and resilient message-driven apps

Build powerful reactive, concurrent, and distributed applications more easily. Akka is a toolkit for building highly concurrent, distributed, and resilient message-driven applications for Java and Scala. Actors and Streams let you build systems that scale up, using the resources of a server more efficiently, and out, using multiple servers. Building on the principles of The Reactive Manifesto Akka allows you to write systems that self-heal and stay responsive in the face of failures. Up to...

Downloads: 6 This Week

Last Update: 2026-01-13
See Project
19

sparklyr

R interface for Apache Spark

sparklyr is an R package that provides seamless interfacing with Apache Spark clusters—either local or remote—while letting users write code in familiar R paradigms. It supplies a dplyr-compatible backend, Spark machine learning pipelines, SQL integration, and I/O utilities to manipulate and analyze large datasets distributed across cluster environments.

Downloads: 0 This Week

Last Update: 2025-10-03
See Project
20

Nightingale Visualization

An all-in-one observability solution

An all-in-one observability solution that aims to combine the advantages of Prometheus and Grafana. It manages alert rules and visualizes metrics, logs, and traces in a beautiful web UI.

Downloads: 3 This Week

Last Update: 2026-01-16
See Project
21

Logstash

Centralize, transform and stash your data

Logstash is a server-side data processing pipeline that dynamically ingests data from numerous sources, transforms it, and ships it to your favorite “stash” regardless of format or complexity. It supports and ingests data of all shapes, sizes and sources, dynamically transforms and prepares this data, and transports it to the output of your choice. Logstash is extensible, with over 200 plugins available to let you create and configure your pipeline how you choose.

Downloads: 6 This Week

Last Update: 2026-01-13
See Project
22

Dolphin Scheduler

A distributed and extensible workflow scheduler platform

Apache DolphinScheduler is a distributed and extensible workflow scheduler platform with powerful DAG visual interfaces, dedicated to solving complex job dependencies in the data pipeline and providing various types of jobs available `out of the box`. Dedicated to solving the complex task dependencies in data processing, making the scheduler system out of the box for data processing.

Downloads: 3 This Week

Last Update: 2026-01-13
See Project
23

DataEase

Data visualization analysis tool

...DataEase is an open-source data visualization analysis tool that helps users quickly analyze data and gain insight into business trends, so as to achieve business improvement and optimization. DataEase supports rich data source connections, can quickly create charts by dragging and dropping, and can easily share with others. Supports rich chart types (Apache ECharts / AntV), supports drag-and-drop method to quickly create dashboards. Support direct connection mode, local mode (based on Apache Doris / Kettle implementation). Support various data sources such as data warehouse/data lake, OLAP database, OLTP database, Excel data file, API, etc. Open source and open: zero threshold, quick access and installation online; quick access to user feedback, new versions released monthly. pport multiple data sharing methods to ensure data security.

Downloads: 15 This Week

Last Update: 4 days ago
See Project
24

Kibana

Your window into the Elastic Stack

Kibana is a analytics and search dashboard for Elasticsearch that allows you to visualize Elasticsearch data and efficiently navigate the Elastic Stack. With Kibana you can visualize and shape your data simply and intuitively, share visualizations for greater collaboration, organize dashboards and visualizations, and so much more.

Downloads: 6 This Week

Last Update: 2026-01-13
See Project
25

Numaflow

Kubernetes-native platform to run massively parallel data/streaming

Numaflow is a Kubernetes-native tool for running massively parallel stream processing. A Numaflow Pipeline is implemented as a Kubernetes custom resource and consists of one or more source, data processing, and sink vertices. Numaflow installs in a few minutes and is easier and cheaper to use for simple data processing applications than a full-featured stream processing platform.

Downloads: 8 This Week

Last Update: 2026-01-18
See Project