spark gap linux free download

Showing 48 open source projects for "spark gap linux"

1

.NET for Apache Spark

A free, open-source, and cross-platform big data analytics framework

...This means you can use .NET for Apache Spark anywhere you write .NET code, allowing you to reuse all the knowledge, skills, code, and libraries you already have as a .NET developer. .NET for Apache Spark runs on Windows, Linux, and macOS using .NET Core, or on Windows using .NET Framework. It also runs on all major cloud providers, including Azure HDInsight Spark, Amazon EMR Spark, and AWS & Azure Databricks.

Downloads: 1 This Week

Last Update: 2026-02-13
See Project
2

Cassandra Spark Connector

Apache Spark to Apache Cassandra connector

The Apache Cassandra Spark Connector allows Spark jobs (RDDs or DataFrames/Datasets) to read from and write to Cassandra tables. Compatible with Apache Cassandra (v2.1+), Spark 1.0–3.5, and Scala 2.11–2.13, it supports mapping Cassandra rows to Scala case classes, saving results back to Cassandra, and executing arbitrary CQL within Spark applications.

Downloads: 0 This Week

Last Update: 2025-08-04
See Project
3

SageMaker Spark Container

Docker image used to run data processing workloads

Apache Spark™ is a unified analytics engine for large-scale data processing. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools including Spark SQL for SQL and DataFrames, MLlib for machine learning, GraphX for graph processing, and Structured Streaming for stream processing. The SageMaker Spark Container is a Docker image used to run batch data...

Downloads: 0 This Week

Last Update: 2026-04-22
See Project
4

sparklyr

R interface for Apache Spark

sparklyr is an R package that provides seamless interfacing with Apache Spark clusters—either local or remote—while letting users write code in familiar R paradigms. It supplies a dplyr-compatible backend, Spark machine learning pipelines, SQL integration, and I/O utilities to manipulate and analyze large datasets distributed across cluster environments.

Downloads: 2 This Week

Last Update: 2026-04-17
See Project
5

Synapse Machine Learning

Simple and distributed Machine Learning

SynapseML (previously MMLSpark) is an open source library to simplify the creation of scalable machine learning pipelines. SynapseML builds on Apache Spark and SparkML to enable new kinds of machine learning, analytics, and model deployment workflows. SynapseML adds many deep learning and data science tools to the Spark ecosystem, including seamless integration of Spark Machine Learning pipelines with the Open Neural Network Exchange (ONNX), LightGBM, The Cognitive Services, Vowpal Wabbit,...

Downloads: 0 This Week

Last Update: 2026-04-04
See Project
6

XGBoost

Scalable and Flexible Gradient Boosting

XGBoost is an optimized distributed gradient boosting library, designed to be scalable, flexible, portable and highly efficient. It supports regression, classification, ranking and user defined objectives, and runs on all major operating systems and cloud platforms. XGBoost works by implementing machine learning algorithms under the Gradient Boosting framework. It also offers parallel tree boosting (GBDT, GBRT or GBM) that can quickly and accurately solve many data science problems....

Downloads: 2 This Week

Last Update: 2026-02-10
See Project
7

Bytewax

Python Stream Processing

Bytewax is a Python framework that simplifies event and stream processing. Because Bytewax couples the stream and event processing capabilities of Flink, Spark, and Kafka Streams with the friendly and familiar interface of Python, you can re-use the Python libraries you already know and love. Connect data sources, run stateful transformations, and write to various downstream systems with built-in connectors or existing Python libraries. Bytewax is a Python framework and Rust distributed...

Downloads: 0 This Week

Last Update: 2024-11-25
See Project
8

Population Shift Monitoring

Monitor the stability of a Pandas or Spark dataframe

popmon is a package that allows one to check the stability of a dataset. popmon works with both pandas and spark datasets. popmon creates histograms of features binned in time-slices, and compares the stability of the profiles and distributions of those histograms using statistical tests, both over time and with respect to a reference. It works with numerical, ordinal, categorical features, and the histograms can be higher-dimensional, e.g. it can also track correlations between any two...

Downloads: 0 This Week

Last Update: 2026-01-09
See Project
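
The stability check described in the popmon entry above (histograms per time slice, each compared against a reference) can be illustrated with a minimal sketch. This is not popmon's API: the function names are hypothetical, and total variation distance stands in for the statistical tests the package actually applies.

```python
from collections import Counter, defaultdict


def histograms_by_slice(records, bin_width):
    """Build one histogram (bin index -> count) per time slice.

    `records` is an iterable of (time_slice, value) pairs; both the
    record layout and the fixed-width binning are assumptions.
    """
    hists = defaultdict(Counter)
    for time_slice, value in records:
        hists[time_slice][int(value // bin_width)] += 1
    return dict(hists)


def total_variation(hist_a, hist_b):
    """Total variation distance between two normalized histograms.

    Returns 0.0 for identical shapes and 1.0 for disjoint ones.
    """
    n_a = sum(hist_a.values())
    n_b = sum(hist_b.values())
    bins = set(hist_a) | set(hist_b)
    # Counter returns 0 for bins that are missing from one histogram.
    return 0.5 * sum(abs(hist_a[b] / n_a - hist_b[b] / n_b) for b in bins)


def stability_report(records, reference_slice, bin_width=1.0):
    """Compare every time slice against the chosen reference slice."""
    hists = histograms_by_slice(records, bin_width)
    reference = hists[reference_slice]
    return {s: total_variation(h, reference) for s, h in sorted(hists.items())}


if __name__ == "__main__":
    sample = [
        ("w1", 1.0), ("w1", 1.5), ("w1", 2.2), ("w1", 2.7),
        ("w2", 1.1), ("w2", 1.9), ("w2", 2.0), ("w2", 2.5),
        ("w3", 5.0), ("w3", 5.5), ("w3", 6.1), ("w3", 6.3),
    ]
    for time_slice, distance in stability_report(sample, "w1").items():
        print(time_slice, distance)
```

A slice whose distance from the reference is close to 1 has a very different distribution and would be the one to flag for review.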
9

Apache Polaris

Apache Polaris, the interoperable, open source catalog

Apache Polaris is an open-source metadata catalog and data management service designed to manage Apache Iceberg tables in modern data lakehouse environments. It provides a centralized catalog that allows multiple compute engines and analytics systems to interact with the same datasets through a standardized interface. By implementing the Iceberg REST catalog API, Polaris enables distributed data platforms to access shared table metadata without tightly coupling storage systems and query...

Downloads: 1 This Week

Last Update: 2026-04-21
See Project
10

Explorer

Series (one-dimensional) and dataframes (two-dimensional)

Explorer brings series (one-dimensional) and data frames (two-dimensional) to Elixir for fast data exploration.

Downloads: 0 This Week

Last Update: 2025-08-17
See Project
11

Scio

A Scala API for Apache Beam and Google Cloud Dataflow

Scio is a Scala API developed by Spotify that builds on Apache Beam to enable expressive batch and streaming data pipelines, optimized for running on Google Cloud Dataflow. Inspired by Spark and Scalding, it provides scalable, type‑safe, and production-grade data processing, with built-in support for BigQuery, Pub/Sub, Cassandra, Elasticsearch, Redis, TensorFlow IO, and more.

Downloads: 0 This Week

Last Update: 2026-04-08
See Project
12

IoTDB

Apache IoTDB

Apache IoTDB (Database for Internet of Things) is an IoT native database with high performance for data management and analysis, deployable on the edge and the cloud. Due to its light-weight architecture, high performance and rich feature set together with its deep integration with Apache Hadoop, Spark and Flink, Apache IoTDB can meet the requirements of massive data storage, high-speed data ingestion and complex data analysis in the IoT industrial fields. In the scene of factories, there...

Downloads: 0 This Week

Last Update: 2026-04-14
See Project
13

Genie

Distributed Big Data Orchestration Service

Genie is a completely open source distributed job orchestration engine developed by Netflix. Genie provides REST-ful APIs to run a variety of big data jobs like Hadoop, Pig, Hive, Spark, Presto, Sqoop and more. It also provides APIs for managing the metadata of many distributed processing clusters and the commands and applications which run on them.

Downloads: 0 This Week

Last Update: 2025-08-05
See Project
14

Dolphin Scheduler

A distributed and extensible workflow scheduler platform

Apache DolphinScheduler is a distributed and extensible workflow scheduler platform with powerful visual DAG interfaces, dedicated to solving complex job dependencies in data pipelines and providing various job types out of the box. It uses a decentralized multi-master, multi-worker design with built-in HA support and overload handling. All process...

Downloads: 1 This Week

Last Update: 2026-03-01
See Project
15

visual-explainer

Agent skill + prompt templates that generate rich HTML pages

visual-explainer is an AI-oriented agent skill that converts complex terminal or analytical output into polished, human-readable HTML reports designed for quick comprehension and sharing. The project includes prompt templates and automation logic that enable coding agents to generate visual summaries such as diff reviews, architecture overviews, plan audits, and structured data tables. Its primary goal is to bridge the readability gap between raw machine output and stakeholder-friendly...

Downloads: 2 This Week

Last Update: 3 days ago
See Project
16

Koordinator

A QoS-based scheduling system that brings optimal layout and status to workloads

Koordinator is a modern scheduling system that colocates microservices, AI, and big data workloads on Kubernetes. It achieves high utilization by combining elastic resource quota, efficient pod-packing, over-commitment, and resource sharing with container resource isolation. Koordinator is high-performance, scalable, yet most importantly, proven in mass production environments. It allows you to build container orchestration systems that support enterprise production environments. Koordinator...

Downloads: 0 This Week

Last Update: 2026-04-16
See Project
17

Cucumber

Cucumber for Ruby

It’s simple. Whether open source or commercial, our collaboration tools will boost your engineering team's performance by employing Behavior-Driven Development (BDD). And with our world-class training, take it to places it’s never been. Cucumber is a tool for running automated tests written in plain language. Because they're written in plain language, they can be read by anyone on your team. Because they can be read by anyone, you can use them to help improve communication, collaboration and...

Downloads: 3 This Week

Last Update: 2026-04-14
See Project
18

Durable Streams

The open protocol for real-time sync to client applications

Durable Streams is an open protocol and reference implementation designed to standardize reliable, resumable, real-time streaming between servers and client applications using simple HTTP semantics, filling a gap left by ephemeral technologies like WebSockets and traditional SSE. It defines an append-only, offset-addressable stream primitive where each stream is mapped to a URL that clients can read from or tail, supporting catch-up reads, historical replay, and live updates with robust...

Downloads: 0 This Week

Last Update: 2026-04-14
See Project
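
The append-only, offset-addressable stream primitive mentioned in the Durable Streams entry above can be sketched in memory as follows. This is only an illustration of the idea, not the protocol or its reference implementation: the class and method names are hypothetical, and the HTTP layer is omitted entirely.

```python
class AppendOnlyStream:
    """In-memory append-only log addressed by integer offsets (illustrative)."""

    def __init__(self):
        self._entries = []

    def append(self, payload):
        """Store a payload and return the offset it was written at."""
        self._entries.append(payload)
        return len(self._entries) - 1

    def read(self, offset=0, limit=None):
        """Return (batch, next_offset) starting at `offset`.

        A client that remembers `next_offset` can resume later and
        catch up on everything appended in the meantime.
        """
        if offset < 0:
            raise ValueError("offset must be non-negative")
        end = len(self._entries)
        if limit is not None:
            end = min(end, offset + limit)
        batch = self._entries[offset:end]
        return batch, offset + len(batch)


if __name__ == "__main__":
    stream = AppendOnlyStream()
    stream.append("a")
    stream.append("b")
    batch, cursor = stream.read(0)       # historical replay from the start
    stream.append("c")
    new_batch, cursor = stream.read(cursor)  # catch-up read from saved offset
    print(batch, new_batch, cursor)
```

Because entries are never modified or removed, a saved offset always identifies the same position, which is what makes resumable reads possible in this sketch.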
19

UVdesk Open Source

Build and make a full ticketing support system

The UVdesk community helpdesk project skeleton is packaged with the bare essential utilities and tools to build and customize your own helpdesk solutions. Built on top of Symfony and Backbone.js, UVdesk community is a service-oriented, event-driven, extensible open-source helpdesk system that your organization can use to provide efficient support to your clients. At the heart of the helpdesk system, the core framework consists of all the necessary...

Downloads: 4 This Week

Last Update: 2025-09-19
See Project
20

beautiful-mermaid

Render Mermaid diagrams as beautiful SVGs or ASCII art

beautiful-mermaid is a styling and rendering toolkit built to produce visually enhanced diagrams from Mermaid syntax, aiming to bridge the gap between simple technical diagrams and rich, presentation-ready visualizations, all while preserving the lightweight text-to-diagram workflow that Mermaid offers. Instead of plain, utilitarian shapes and lines, Beautiful Mermaid applies themes, typography enhancements, color palettes, and layout optimizations so diagrams look polished and professional...

Downloads: 0 This Week

Last Update: 2026-02-26
See Project
21

F1 Race Replay

An interactive Formula 1 race visualisation and data analysis tool

F1 Race Replay is an interactive replay viewer that lets users watch and analyze recorded Formula 1 race sessions with precise control over camera angles, timing, and telemetry overlay, offering a rich experience beyond standard broadcast replays. It ingests official timing and positional data, then renders vehicle movements through track maps and 3D visualizations so fans, analysts, and engineers can review strategy, overtakes, tire degradation effects, and pit stop impacts in detail. Users...

Downloads: 0 This Week

Last Update: 2026-04-17
See Project
22

HugeGraph

A graph database that supports more than 100 billion data records

HugeGraph is a convenient, efficient, and adaptable graph database compatible with the Apache TinkerPop3 framework and the Gremlin query language. HugeGraph supports fast import of graphs with more than 10 billion vertices and edges, offers millisecond-level OLTP query capability, and can be integrated into big data platforms like Hadoop or Spark for OLAP analysis. The main scenarios of HugeGraph include correlation search, fraud detection, and knowledge graph. Not only supports...

Downloads: 0 This Week

Last Update: 2025-11-28
See Project
23

geemap

A Python package for interactive geospatial analysis and visualization

A Python package for interactive geospatial analysis and visualization with Google Earth Engine. Geemap is a Python package for geospatial analysis and visualization with Google Earth Engine (GEE), which is a cloud computing platform with a multi-petabyte catalog of satellite imagery and geospatial datasets. During the past few years, GEE has become very popular in the geospatial community and it has empowered numerous environmental applications at local, regional, and global scales. GEE...

Downloads: 0 This Week

Last Update: 2026-03-20
See Project
24

leafmap

A Python package for interactive mapping and geospatial analysis

A Python package for geospatial analysis and interactive mapping in a Jupyter environment. Leafmap is a Python package for interactive mapping and geospatial analysis with minimal coding in a Jupyter environment. It is a spin-off project of the geemap Python package, which was designed specifically to work with Google Earth Engine (GEE). However, not everyone in the geospatial community has access to the GEE cloud computing platform. Leafmap is designed to fill this gap for non-GEE users. It...

Downloads: 0 This Week

Last Update: 1 day ago
See Project
25

json-scada

A portable SCADA/IoT platform centered on the MongoDB database server.

Standard IT tools applied to SCADA/IoT (MongoDB, PostgreSQL/TimescaleDB, Node.js, C#, Golang, Grafana, etc.). MongoDB as the real-time core database, persistence layer, config store, SOE historian. Portability and interoperability over Linux, Windows, x86/64, ARM. Horizontal scalability, from a single computer to big clusters (MongoDB-sharding), Bare Metal, Docker containers, VM, cloud, or hybrid deployments. Unlimited tags, servers, and users. HTML5 Web interface. UTF-8/I18N. Protocols:...

Downloads: 9 This Week

Last Update: 2026-03-22
See Project