Search Results for "distributed computing"

Showing 1530 open source projects for "distributed computing"

1

NumPy

The fundamental package for scientific computing with Python

Fast and versatile, the NumPy vectorization, indexing, and broadcasting concepts are the de-facto standards of array computing today. NumPy offers comprehensive mathematical functions, random number generators, linear algebra routines, Fourier transforms, and more. NumPy supports a wide range of hardware and computing platforms, and plays well with distributed, GPU, and sparse array libraries. The core of NumPy is well-optimized C code. Enjoy the flexibility of Python with the speed of compiled code. ...

Downloads: 104 This Week

Last Update: 2026-07-04
See Project
2

Zipkin

Distributed tracing system to gather timing data

Zipkin is a distributed tracing system. It helps gather timing data needed to troubleshoot latency problems in service architectures. Features include both the collection and lookup of this data. If you have a trace ID in a log file, you can jump directly to it. Otherwise, you can query based on attributes such as service, operation name, tags and duration. Some interesting data will be summarized for you, such as the percentage of time spent in a service, and whether or not operations...

Downloads: 18 This Week

Last Update: 2026-04-08
See Project
3

PowerJob

Enterprise job scheduling middleware with distributed computing

...Four execution modes are supported: stand-alone, broadcast, Map, and MapReduce. MapReduce mode can make use of distributed computing resources. Both job dependency management and data communication between jobs are supported. Developers can write their processors in Java, Shell, or Python; multilingual scheduling via HTTP is planned.
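To make the MapReduce mode concrete, here is a minimal Python sketch of the general pattern: a root task is split into subtasks, the subtasks are processed in parallel, and the partial results are combined. It does not use PowerJob's API. All function names are hypothetical, and threads stand in for remote worker nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_subtasks(items, chunk_size):
    """Map step: cut the root task's input into independent subtasks."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def process_subtask(chunk):
    """Work done by one (hypothetical) worker: sum the squares in its chunk."""
    return sum(x * x for x in chunk)


def reduce_results(partials):
    """Reduce step: combine the partial results into one final answer."""
    return sum(partials)


def run_map_reduce(items, chunk_size=3, workers=4):
    subtasks = split_into_subtasks(items, chunk_size)
    # Threads stand in for remote worker nodes in this sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(process_subtask, subtasks))
    return reduce_results(partials)


if __name__ == "__main__":
    # Sum of the squares of 0..9, computed chunk by chunk.
    print(run_map_reduce(list(range(10))))
```

A real scheduler would also have to handle retries, worker failures, and data transfer between nodes, all of which this sketch leaves out.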

Downloads: 2 This Week

Last Update: 2025-08-17
See Project
4

PolarDB-X

PolarDB-X is a cloud native distributed SQL Database

PolarDB-X is a cloud-native distributed SQL database designed to handle high concurrency, massive storage, and complex querying scenarios. It features a shared-nothing architecture that decouples computing from storage, providing scalability and flexibility for various applications.

Downloads: 2 This Week

Last Update: 2025-08-22
See Project
5

Dagger.jl

A framework for out-of-core and parallel execution

Dagger.jl is a framework for out-of-core and parallel computing in Julia that allows users to construct and execute dynamic task graphs. It is designed for large-scale, distributed, and memory-efficient computations. Dagger supports lazy evaluation and scheduling across multiple threads or machines, enabling high-performance workflows for data processing, scientific computing, and machine learning.
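The idea of building a task graph first and executing it later can be sketched in a few lines. The example below is a conceptual illustration in Python, not Dagger.jl's Julia API. The `TaskGraph` class and its methods are hypothetical, and it assumes Python 3.9 or later for `graphlib`.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter


class TaskGraph:
    """Hypothetical lazy task graph: nothing runs until run() is called."""

    def __init__(self):
        self.tasks = {}  # task name -> (function, tuple of dependency names)

    def add(self, name, func, *deps):
        """Register a task; its dependencies' results become its arguments."""
        self.tasks[name] = (func, deps)

    def run(self, workers=4):
        """Execute the graph, running independent tasks on a thread pool."""
        sorter = TopologicalSorter(
            {name: deps for name, (_, deps) in self.tasks.items()}
        )
        sorter.prepare()
        results = {}
        with ThreadPoolExecutor(max_workers=workers) as pool:
            while sorter.is_active():
                # Every task whose dependencies are finished is ready now.
                futures = {}
                for name in sorter.get_ready():
                    func, deps = self.tasks[name]
                    args = [results[d] for d in deps]
                    futures[name] = pool.submit(func, *args)
                for name, future in futures.items():
                    results[name] = future.result()
                    sorter.done(name)
        return results


if __name__ == "__main__":
    graph = TaskGraph()
    graph.add("a", lambda: 2)
    graph.add("b", lambda: 3)
    graph.add("total", lambda a, b: a + b, "a", "b")
    graph.add("squared", lambda t: t * t, "total")
    print(graph.run()["squared"])
```

Here `a` and `b` have no dependencies, so they are submitted to the pool together. `total` and `squared` wait until their inputs are available.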

Downloads: 6 This Week

Last Update: 5 days ago
See Project
6

Dask

Parallel computing with task scheduling

Dask is a Python library for parallel and distributed computing, designed to scale analytics workloads from single machines to large clusters. It integrates with familiar tools like NumPy, Pandas, and scikit-learn while enabling execution across cores or nodes with minimal code changes. Dask excels at handling large datasets that don’t fit into memory and is widely used in data science, machine learning, and big data pipelines.

Downloads: 9 This Week

Last Update: 2026-07-13
See Project
7

ShardingSphere

Distributed database ecosphere

Apache ShardingSphere is an open-source ecosystem consisting of a set of distributed database solutions, including three independent products: JDBC, Proxy, and Sidecar (planned). They all provide data scale-out, distributed transactions, and distributed governance, and are applicable in a variety of situations such as Java isomorphism, heterogeneous languages, and cloud native. Apache ShardingSphere aims to make full use of the computation and storage capacity of existing databases in...

Downloads: 2 This Week

Last Update: 2026-02-23
See Project
8

BOINC

Open-source software for volunteer computing and grid computing

BOINC (Berkeley Open Infrastructure for Network Computing) is an open-source platform that enables distributed computing using volunteered computer resources. It allows researchers to harness massive amounts of processing power from public participants for scientific projects such as climate research, disease modeling, and astrophysics. BOINC supports cross-platform deployment and is backed by a large, active community.

Downloads: 2 This Week

Last Update: 2026-06-12
See Project
9

fugue

A unified interface for distributed computing

Fugue is a unified interface for distributed computing that lets users execute Python, Pandas, and SQL code on Spark, Dask, and Ray with minimal rewrites.

Downloads: 1 This Week

Last Update: 2026-02-20
See Project
10

JupyterLab

JupyterLab computational environment

JupyterLab is the next-generation web-based user interface for Project Jupyter. Try it on Binder. JupyterLab follows the Jupyter Community Guides. JupyterLab enables you to work with documents and activities such as Jupyter notebooks, text editors, terminals, and custom components in a flexible, integrated, and extensible manner. You can arrange multiple documents and activities side by side in the work area using tabs and splitters. Documents and activities integrate with each other,...

Downloads: 29 This Week

Last Update: 2026-07-21
See Project
11

Pholcus

Distributed high-concurrency crawler software written in pure golang

Pholcus is a high-concurrency crawler written in pure Go that supports distributed operation; it is intended only for programming learning and research. It supports three operating modes (stand-alone, server, and client) and three operating interfaces (Web, GUI, and command line), with simple and flexible rules, concurrent batch tasks, and rich output methods (mysql/mongodb/kafka/csv/excel, etc.). It also supports horizontal and vertical grabbing modes, and a series of advanced...

Downloads: 1 This Week

Last Update: 2026-03-03
See Project
12

HDF5

Official HDF5® Library Repository

HDF5 (Hierarchical Data Format v5) is a widely-used data management library and file format for storing large and complex scientific data sets efficiently.

Downloads: 11 This Week

Last Update: 15 hours ago
See Project
13

Infinispan

Infinispan is an open source data grid platform

Infinispan is a distributed in-memory data grid and caching system designed for high-performance computing. It allows applications to scale dynamically by distributing data across multiple nodes, reducing latency and improving resilience.

Downloads: 1 This Week

Last Update: 2026-07-01
See Project
14

EdgeChains

EdgeChains.js is a full-stack GenAI library

EdgeChains.js is a full-stack generative AI library that provides front-end, back-end, APIs, prompt management, and distributed computing capabilities, with core prompts and chains managed declaratively in Jsonnet. At EdgeChains, we take a unique approach to Generative AI - we think Generative AI is a deployment and configuration management challenge rather than a UI and library design pattern challenge. We build on top of a tech that has solved this problem in a different domain - Kubernetes Config Management - and bring that to Generative AI. ...

Downloads: 1 This Week

Last Update: 2025-01-29
See Project
15

Parallax

Parallax is a distributed model serving framework

Parallax is a decentralized inference framework designed to run large language models across distributed computing resources. Instead of relying on centralized GPU clusters in data centers, the system allows multiple heterogeneous machines to collaborate in serving AI inference workloads. Parallax divides model layers across different nodes and dynamically coordinates them to form a complete inference pipeline. A two-stage scheduling architecture determines how model layers are allocated to available hardware and how requests are routed across nodes during execution. ...
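As a rough illustration of the first scheduling stage described above (deciding which node hosts which model layers), the sketch below assigns contiguous layer ranges to nodes in proportion to a capacity score. This is an assumed heuristic for illustration, not Parallax's actual algorithm. The function name, node names, and capacity values are all hypothetical.

```python
def allocate_layers(num_layers, capacities):
    """Assign contiguous layer ranges to nodes in proportion to capacity.

    `capacities` maps a node name to a positive score (for example, GB of
    accelerator memory). Returns a dict of node name -> range of layer indices.
    """
    total = sum(capacities.values())
    shares = {node: num_layers * cap / total for node, cap in capacities.items()}
    counts = {node: int(share) for node, share in shares.items()}

    # Largest-remainder rounding: hand leftover layers to the nodes whose
    # fractional share was cut the most by truncation.
    leftover = num_layers - sum(counts.values())
    by_remainder = sorted(
        shares, key=lambda node: shares[node] - counts[node], reverse=True
    )
    for node in by_remainder[:leftover]:
        counts[node] += 1

    # Lay the layers out contiguously so each node holds one pipeline stage.
    plan, start = {}, 0
    for node in capacities:
        plan[node] = range(start, start + counts[node])
        start += counts[node]
    return plan


if __name__ == "__main__":
    plan = allocate_layers(32, {"node-a": 24, "node-b": 8})
    for node, layers in plan.items():
        print(node, list(layers)[:1], "...", len(layers), "layers")
```

The second stage, routing requests through the resulting pipeline at run time, is not covered by this sketch.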

Downloads: 0 This Week

Last Update: 2026-03-09
See Project
16

Kraken

P2P Docker registry capable of distributing TBs of data in seconds

Kraken is a P2P-powered Docker registry that focuses on scalability and availability. It is designed for Docker image management, replication, and distribution in a hybrid cloud environment. With pluggable backend support, Kraken can easily integrate into existing Docker registry setups as the distribution layer. Kraken has been in production at Uber since early 2018. In our busiest cluster, Kraken distributes more than 1 million blobs per day, including 100k 1G+ blobs. At its peak...

Downloads: 2 This Week

Last Update: 2026-07-03
See Project
17

Bacalhau

Community-driven, simple, yet powerful framework

Bacalhau is a decentralized compute platform for running jobs on data stored across distributed networks, like IPFS or Filecoin, without moving the data to centralized cloud environments. It allows developers to run containerized workloads close to where the data lives, reducing latency, cost, and privacy risks. Bacalhau supports various runtime environments and is designed to make decentralized data processing as accessible as traditional cloud computing.

Downloads: 14 This Week

Last Update: 2026-07-19
See Project
18

sparklyr

R interface for Apache Spark

sparklyr is an R package that provides seamless interfacing with Apache Spark clusters—either local or remote—while letting users write code in familiar R paradigms. It supplies a dplyr-compatible backend, Spark machine learning pipelines, SQL integration, and I/O utilities to manipulate and analyze large datasets distributed across cluster environments.

Downloads: 1 This Week

Last Update: 2026-06-19
See Project
19

Apache SeaTunnel

SeaTunnel is a distributed, high-performance data integration platform

...Data synchronization needs to support various scenarios such as offline full synchronization, offline incremental synchronization, CDC, real-time synchronization, and full database synchronization. Existing data integration and data synchronization tools often require vast computing resources or JDBC connection resources to complete real-time synchronization of a large number of small tables.

Downloads: 3 This Week

Last Update: 2026-02-18
See Project
20

Kubeflow Trainer

Distributed AI Model Training and LLM Fine-Tuning on Kubernetes

...The platform supports a wide range of machine learning frameworks, including PyTorch, JAX, Hugging Face, DeepSpeed, and XGBoost, making it highly flexible for different AI use cases. One of its key innovations is the integration of MPI-based distributed computing within Kubernetes, allowing efficient communication between nodes for high-performance training. It also includes advanced scheduling capabilities through integrations with tools like Kueue and Volcano, enabling topology-aware resource allocation and multi-cluster job orchestration.

Downloads: 0 This Week

Last Update: 2026-06-17
See Project
21

Tau

Open source distributed Platform as a Service (PaaS)

tau is the core runtime and orchestration engine of the Taubyte platform, an event-driven, distributed computing framework for building and running decentralized applications. tau handles the dynamic deployment of code, services, and data across edge and cloud environments based on real-time events. It abstracts infrastructure and simplifies application delivery by combining GitOps principles with a secure, multi-tenant execution model. tau enables seamless scalability, event-based routing, and on-demand execution without managing underlying servers.

Downloads: 3 This Week

Last Update: 2026-04-22
See Project
22

MetaCall Core

The ultimate polyglot programming experience

A polyglot runtime that enables seamless execution of multiple programming languages within the same environment, improving interoperability between different codebases.

Downloads: 5 This Week

Last Update: 6 days ago
See Project
23

Numba

NumPy aware dynamic Python compiler using LLVM

...Special decorators can create universal functions that broadcast over NumPy arrays just like NumPy functions do. Numba also works great with Jupyter notebooks for interactive computing, and with distributed execution frameworks, like Dask and Spark.

Downloads: 9 This Week

Last Update: 2026-07-01
See Project
24

Ubicloud

Open source alternative to AWS. Elastic compute, block storage

Ubicloud is an open-source cloud platform that aims to provide a decentralized alternative to traditional hyperscale cloud providers. It focuses on building a federated network of providers where individuals and organizations can contribute infrastructure and offer compute, storage, and networking resources. Ubicloud emphasizes transparency and openness: APIs, orchestration, and management layers are open, enabling users to audit and customize their infrastructure instead of relying on...

Downloads: 0 This Week

Last Update: 5 days ago
See Project
25

Datahike

A durable Datalog implementation adaptable for distribution

Datahike is a durable Datalog database powered by an efficient Datalog query engine. This project started as a port of DataScript to the hitchhiker-tree. All DataScript tests are passing, but we are still working on the internals. That said, we consider Datahike usable for medium-sized projects, since DataScript is very mature and deployed in many applications, and the hitchhiker-tree implementation is heavily tested through generative testing. We are building on the two projects and...

Downloads: 0 This Week

Last Update: 6 hours ago
See Project