Showing 170 open source projects for "spark"

  • 1
    IoTDB

    Apache IoTDB

    Apache IoTDB (Database for Internet of Things) is an IoT-native database with high performance for data management and analysis, deployable on the edge and in the cloud. Thanks to its lightweight architecture, high performance, rich feature set, and deep integration with Apache Hadoop, Spark, and Flink, Apache IoTDB can meet the requirements of massive data storage, high-speed data ingestion, and complex data analysis in industrial IoT. In a factory, for example, dozens of devices may be connected over a LAN. IoTDB can be installed on a local controller server in the factory to receive data from those devices. ...
    Downloads: 5 This Week
  • 2
    Apache Polaris

    Apache Polaris, the interoperable, open source catalog

    ...By implementing the Iceberg REST catalog API, Polaris enables distributed data platforms to access shared table metadata without tightly coupling storage systems and query engines. This design allows organizations to run queries on the same Iceberg tables using tools such as Apache Spark, Flink, Trino, and other analytics engines while maintaining consistency across platforms. Polaris also focuses on data governance, security, and interoperability within large-scale cloud data architectures. Because Iceberg tables often exist across many services in a distributed ecosystem, the catalog helps coordinate metadata, schemas, and access policies in a unified system.
    Downloads: 0 This Week
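To illustrate why a shared catalog keeps engines such as Spark and Trino consistent, here is a minimal Python sketch, assuming an Iceberg-style design in which each table name points at one current metadata file and a commit atomically swaps that pointer. `InMemoryCatalog`, its methods, and the `s3://bucket/...` paths are hypothetical; this is not Polaris's implementation or the Iceberg REST catalog API.

```python
import threading
from typing import Dict, Tuple


class InMemoryCatalog:
    """Toy catalog mapping (namespace, table) to the table's current metadata file location."""

    def __init__(self) -> None:
        self._tables: Dict[Tuple[str, str], str] = {}
        self._lock = threading.Lock()

    def create_table(self, namespace: str, name: str, metadata_location: str) -> None:
        with self._lock:
            key = (namespace, name)
            if key in self._tables:
                raise ValueError(f"table {namespace}.{name} already exists")
            self._tables[key] = metadata_location

    def load_table(self, namespace: str, name: str) -> str:
        # Every engine reads the same pointer, so they all see the same table state.
        with self._lock:
            return self._tables[(namespace, name)]

    def commit(self, namespace: str, name: str, expected: str, new: str) -> bool:
        """Swap the metadata pointer only if it still equals `expected` (compare-and-swap)."""
        with self._lock:
            key = (namespace, name)
            if self._tables.get(key) != expected:
                return False  # another engine committed first; caller must refresh and retry
            self._tables[key] = new
            return True


catalog = InMemoryCatalog()
catalog.create_table("sales", "orders", "s3://bucket/orders/metadata/v1.json")
# Two engines start from v1; only the first commit wins.
spark_ok = catalog.commit("sales", "orders", "s3://bucket/orders/metadata/v1.json",
                          "s3://bucket/orders/metadata/v2.json")
trino_ok = catalog.commit("sales", "orders", "s3://bucket/orders/metadata/v1.json",
                          "s3://bucket/orders/metadata/v2b.json")
print(spark_ok, trino_ok, catalog.load_table("sales", "orders"))
```

The compare-and-swap in `commit` is the design point: storage and query engines stay decoupled, and the catalog alone decides which metadata version is current.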
  • 3
    Serverless Java container

    A Java wrapper to run Spring, Spring Boot, Jersey, and other apps

    The AWS Serverless Java Container library is a framework that allows developers to run existing or new Java web applications, built with frameworks such as Spring, Jersey, Spark, and Struts, inside AWS Lambda with minimal modifications. It bridges the gap between traditional servlet or web-framework models and serverless functions by mapping HTTP events from API Gateway into requests your framework understands and routing responses back appropriately. This means you can keep much of your familiar Java-based architecture (controllers, filters, dependency injection) and deploy it in a serverless environment without rewriting everything from scratch. ...
    Downloads: 0 This Week
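The event-to-request mapping described above can be sketched in Python (the library itself is Java). This assumes a simplified API Gateway proxy event with `httpMethod`, `path`, `headers`, and `body` keys and a response dict with `statusCode`, `headers`, and `body`; the `Request` class, `ROUTES` table, and `lambda_handler` function are hypothetical stand-ins, not the library's API.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class Request:
    """Framework-neutral request built from the incoming HTTP event."""
    method: str
    path: str
    headers: Dict[str, str] = field(default_factory=dict)
    body: str = ""


Handler = Callable[[Request], Tuple[int, str]]

# Hypothetical route table standing in for a web framework's controllers.
ROUTES: Dict[Tuple[str, str], Handler] = {
    ("GET", "/hello"): lambda req: (200, json.dumps({"message": "hello"})),
}


def lambda_handler(event: dict, context=None) -> dict:
    """Map a proxy-style HTTP event to a Request, dispatch it, and build the response dict."""
    req = Request(
        method=event.get("httpMethod", "GET"),
        path=event.get("path", "/"),
        headers=event.get("headers") or {},
        body=event.get("body") or "",
    )
    handler = ROUTES.get((req.method, req.path))
    if handler is None:
        status, body = 404, json.dumps({"error": "not found"})
    else:
        status, body = handler(req)
    return {"statusCode": status, "headers": {"Content-Type": "application/json"}, "body": body}


print(lambda_handler({"httpMethod": "GET", "path": "/hello"}))
```

The handlers never see the raw event, which is the property that lets existing controllers run unchanged.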
  • 4
    ChatALL

    Concurrently chat with ChatGPT, Bing Chat, Bard, Alpaca, Vicuna, etc.

    Concurrently chat with ChatGPT, Bing Chat, Bard, Alpaca, Vicuna, Claude, ChatGLM, MOSS, iFlytek Spark, ERNIE and more, and discover the best answers. AI bots based on large language models (LLMs) are amazing. However, their behavior can be random, and different bots excel at different tasks. If you want the best experience, don't try them one by one. ChatALL (Chinese name: 齐叨) sends a prompt to several AI bots concurrently, helping you discover the best results.
    Downloads: 8 This Week
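ChatALL's core idea, fanning one prompt out to several bots at once, can be sketched with Python's `concurrent.futures`. The `echo_bot` and `upper_bot` functions are hypothetical placeholders for real chatbot clients; this is not ChatALL's code.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

Bot = Callable[[str], str]


# Hypothetical stand-ins for real chatbot clients.
def echo_bot(prompt: str) -> str:
    return f"echo: {prompt}"


def upper_bot(prompt: str) -> str:
    return prompt.upper()


def ask_all(prompt: str, bots: Dict[str, Bot]) -> Dict[str, str]:
    """Send the same prompt to every bot concurrently and collect replies by bot name."""
    with ThreadPoolExecutor(max_workers=max(1, len(bots))) as pool:
        futures = {name: pool.submit(bot, prompt) for name, bot in bots.items()}
        return {name: future.result() for name, future in futures.items()}


replies = ask_all("hello", {"echo": echo_bot, "upper": upper_bot})
for name, reply in replies.items():
    print(f"{name}: {reply}")
```

Threads are a reasonable choice here because real bot calls would spend most of their time waiting on the network, so the total latency is roughly that of the slowest bot rather than the sum of all of them.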
  • 5
    mlforecast

    Scalable machine learning for time series forecasting

    ...It supports multi-series forecasting, meaning you can train one model that forecasts many time series at once (common in retail, demand forecasting, etc.), rather than one model per series. The library is built to scale: behind the scenes, it can leverage distributed computing frameworks (Spark, Dask, Ray) when datasets or the number of series grow large.
    Downloads: 0 This Week
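To show what "one model that forecasts many time series" means, here is a minimal pure-Python sketch that pools lag-1 features from every series and fits a single least-squares line. The data, the single lag feature, and the linear model are assumptions chosen for brevity; mlforecast itself supports far richer features and models, and this is not its API.

```python
from typing import Dict, List, Tuple


def make_training_rows(series: Dict[str, List[float]]) -> List[Tuple[float, float]]:
    """Pool (previous value, current value) pairs from every series into one training set."""
    rows: List[Tuple[float, float]] = []
    for values in series.values():
        rows.extend(zip(values, values[1:]))
    return rows


def fit_linear(rows: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Ordinary least squares for y = a * x + b."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    a = sxy / sxx
    return a, mean_y - a * mean_x


# Hypothetical daily sales for two stores; one global model is fit on both.
series = {
    "store_a": [10.0, 12.0, 14.0, 16.0],
    "store_b": [3.0, 5.0, 7.0, 9.0],
}
a, b = fit_linear(make_training_rows(series))
# One-step-ahead forecast for every series from the single shared model.
forecasts = {name: a * values[-1] + b for name, values in series.items()}
print(forecasts)
```

Both stores grow by 2 per step, so the shared model learns y = x + 2 and forecasts 18.0 and 11.0; a per-series approach would have needed two separate fits.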
  • 6
    Explorer

    Series (one-dimensional) and dataframes (two-dimensional)

    Explorer brings series (one-dimensional) and data frames (two-dimensional) to Elixir for fast data exploration.
    Downloads: 0 This Week
  • 7
    Laravel Lang

    List of 126 languages for Laravel Framework, Laravel Jetstream, etc.

    List of 126 languages for Laravel Framework, Laravel Jetstream, Laravel Fortify, Laravel Breeze, Laravel Cashier, Laravel Nova, Laravel Spark and Laravel UI. Using this package is recommended because it lets you quickly update all the dependencies needed for application localization.
    Downloads: 0 This Week
  • 8
    Dataproc Templates

    Dataproc templates and pipelines for solving simple in-cloud data tasks

    Dataproc templates are designed to address various in-cloud data tasks, including data import/export/backup/restore and bulk API operations. These templates leverage the power of Google Cloud's Dataproc, supporting both Dataproc Serverless and Dataproc clusters. Google provides this collection of pre-implemented Dataproc templates as a reference and for easy customization.
    Downloads: 0 This Week
  • 9
    SQL Formatter

    A whitespace formatter for different query languages

    ...It supports various SQL dialects: GCP BigQuery, IBM DB2, Apache Hive, MariaDB, MySQL, Couchbase N1QL, Oracle PL/SQL, PostgreSQL, Amazon Redshift, SingleStoreDB, Snowflake, Spark, SQL Server Transact-SQL, and Trino/Presto. See the language option docs for more details. The CLI tool is installed as sql-formatter and can be invoked via npx sql-formatter. If you don't use a module bundler, clone the repository, run npm install, and grab a file from the /dist directory to use inside a script tag. This makes SQL Formatter available as the global variable window.sqlFormatter.
    Downloads: 5 This Week
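The basic idea of a whitespace formatter, normalizing spacing and breaking lines at clause keywords, can be sketched in a few lines of Python. This is a toy under stated assumptions: the `CLAUSES` list and `format_sql` function are hypothetical, it ignores string literals and comments, and it is not how the (JavaScript) SQL Formatter library works internally.

```python
import re

# Clause keywords that start a new line in this toy formatter (assumed, non-exhaustive).
CLAUSES = ["SELECT", "FROM", "WHERE", "GROUP BY", "ORDER BY", "LIMIT"]

# Match a clause keyword as a whole word, allowing any whitespace inside "GROUP BY" etc.
_PATTERN = re.compile(
    r"\s*\b(" + "|".join(c.replace(" ", r"\s+") for c in CLAUSES) + r")\b\s*",
    flags=re.IGNORECASE,
)


def format_sql(sql: str) -> str:
    """Collapse runs of whitespace, then start each clause keyword on its own line."""
    text = " ".join(sql.split())
    out = _PATTERN.sub(lambda m: "\n" + " ".join(m.group(1).upper().split()) + " ", text)
    return "\n".join(line.rstrip() for line in out.strip().splitlines())


print(format_sql("select a, b   from t where x = 1 group by a"))
```

The `\b` word boundaries keep identifiers such as `from_date` from being split; a real formatter tokenizes the query instead, which is what lets it handle literals, comments, and dialect differences.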
  • 10
    Soot

    Soot - A Java optimization framework

    Soot is a Java optimization framework. It provides four intermediate representations for analyzing and transforming Java bytecode. Baf: a streamlined representation of bytecode which is simple to manipulate. Jimple: a typed 3-address intermediate representation suitable for optimization. Shimple: an SSA variation of Jimple. Grimp: an aggregated version of Jimple suitable for decompilation and code inspection.
    Downloads: 0 This Week
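Jimple's defining property, that each statement has at most one operator and at most three operands, can be illustrated by flattening a nested expression into three-address code. The `BinOp` type, the `to_three_address` function, and the `$t` temporary naming are assumptions for illustration; Soot's real Jimple is typed and is produced from bytecode, not from an expression tree like this.

```python
from dataclasses import dataclass
from itertools import count
from typing import List, Union


@dataclass
class BinOp:
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[str, int, BinOp]


def to_three_address(expr: Expr) -> List[str]:
    """Flatten a nested expression so each instruction has at most one operator."""
    code: List[str] = []
    temps = count()

    def emit(e: Expr) -> str:
        if not isinstance(e, BinOp):
            return str(e)  # variables and constants are already simple operands
        lhs = emit(e.left)
        rhs = emit(e.right)
        tmp = f"$t{next(temps)}"  # fresh temporary holds the intermediate result
        code.append(f"{tmp} = {lhs} {e.op} {rhs}")
        return tmp

    code.append(f"return {emit(expr)}")
    return code


# (a + b) * (c - 2)
expr = BinOp("*", BinOp("+", "a", "b"), BinOp("-", "c", 2))
for line in to_three_address(expr):
    print(line)
```

Because every intermediate value gets its own name, optimizations such as common-subexpression elimination can work statement by statement, which is why an IR like this suits optimization better than raw stack bytecode.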
  • 11
    Genie

    Distributed Big Data Orchestration Service

    Genie is a completely open source distributed job orchestration engine developed by Netflix. Genie provides RESTful APIs to run a variety of big data jobs like Hadoop, Pig, Hive, Spark, Presto, Sqoop and more. It also provides APIs for managing the metadata of many distributed processing clusters and the commands and applications which run on them.
    Downloads: 1 This Week
  • 12
    MLflow

    Open source platform for the machine learning lifecycle

    MLflow is a platform to streamline machine learning development, including tracking experiments, packaging code into reproducible runs, and sharing and deploying models. MLflow offers a set of lightweight APIs that can be used with any existing machine learning application or library (TensorFlow, PyTorch, XGBoost, etc.), wherever you currently run ML code (e.g. in notebooks, standalone applications or the cloud).
    Downloads: 6 This Week
  • 13
    Scio

    A Scala API for Apache Beam and Google Cloud Dataflow

    Scio is a Scala API developed by Spotify that builds on Apache Beam to enable expressive batch and streaming data pipelines, optimized for running on Google Cloud Dataflow. Inspired by Spark and Scalding, it provides scalable, type‑safe, and production-grade data processing, with built-in support for BigQuery, Pub/Sub, Cassandra, Elasticsearch, Redis, TensorFlow IO, and more.
    Downloads: 0 This Week
  • 14
    Apache Beam

    Unified programming model for Batch and Streaming

    Apache Beam is an open source, unified programming model for defining both batch and streaming data-parallel processing pipelines, along with a set of language-specific SDKs for constructing pipelines and Runners for executing them. These pipelines are executed on one of Beam’s supported distributed processing back-ends, which include Apache Apex, Apache Flink, Apache Spark, and Google Cloud Dataflow. Beam is especially useful for embarrassingly parallel data processing tasks, and caters to the different needs and backgrounds of end users, SDK writers, and runner writers.
    Downloads: 0 This Week
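The "unified" part of Beam's model, the same transforms applied either to a bounded batch or to windows of an unbounded stream, can be sketched in plain Python. `flat_map`, `count_elements`, `run`, and `fixed_windows` are hypothetical helpers, not Beam's SDK, which models data as PCollections and processing steps as PTransforms.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


def flat_map(fn: Callable) -> Callable:
    """Return a transform that applies fn to each element and flattens the results."""
    return lambda items: [out for item in items for out in fn(item)]


def count_elements(items: Iterable) -> Dict:
    """Transform that counts occurrences of each element."""
    counts: Dict = defaultdict(int)
    for item in items:
        counts[item] += 1
    return dict(counts)


def run(pipeline: List[Callable], data: Iterable):
    """Apply each transform in order, feeding each output into the next."""
    for transform in pipeline:
        data = transform(data)
    return data


def fixed_windows(events: Iterable[Tuple[int, str]], size: int) -> Dict[int, List[str]]:
    """Assign (timestamp, value) events to fixed windows keyed by window start."""
    windows: Dict[int, List[str]] = defaultdict(list)
    for ts, value in events:
        windows[ts - ts % size].append(value)
    return dict(windows)


word_count = [flat_map(str.split), count_elements]

# Batch: run the pipeline once over a bounded collection.
batch_result = run(word_count, ["a b a", "b c"])

# Streaming-style: run the same pipeline once per 10-second window.
events = [(1, "a b"), (4, "a"), (12, "b b")]
windowed = {start: run(word_count, lines)
            for start, lines in sorted(fixed_windows(events, 10).items())}
print(batch_result)
print(windowed)
```

The pipeline definition `word_count` is written once; only the way input is grouped changes between batch and windowed execution, which is the separation Beam's model formalizes.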
  • 15
    go-chart

    go-chart is a basic charting library in Go

    Package chart is a very simple Go-native charting library that supports time-series and continuous line charts. The master branch is now on the v3.x codebase, which significantly overhauls the API. As usual, see the examples for more information. Actual chart configurations and examples can be found in the ./examples/ directory. They are simple CLI programs that write to output.png (they are also updated with go generate). Everything on the chart.Chart object has defaults that can be overridden. ...
    Downloads: 5 This Week
  • 16
    dtreeviz

    Python library for decision tree visualization & model interpretation

    A Python library for decision tree visualization and model interpretation. Decision trees are the fundamental building block of gradient boosting machines and Random Forests(tm), probably the two most popular machine learning models for structured data. Visualizing decision trees is a tremendous aid when learning how these models work and when interpreting models. The visualizations are inspired by an educational animation by R2D3, A visual introduction to machine learning. Please see How to...
    Downloads: 0 This Week
  • 17
    omegaml

    MLOps simplified. From ML Pipeline ⇨ Data Product without the hassle

    omega|ml is the innovative Python-native MLOps platform that provides a scalable development and runtime environment for your Data Products. Works from laptop to cloud.
    Downloads: 0 This Week
  • 18
    Koordinator

    A QoS-based scheduling system that brings optimal layout and status to workloads

    ...Koordinator provides a range of options for customizing scheduling policies, allowing users to fine-tune the behavior of the system to suit specific workloads such as Web Service, Spark, Presto, TensorFlow, PyTorch, etc. We provide a profile tool to help you manage workload scheduling policies, which lets you control scheduling policies without modifying the existing workload controller.
    Downloads: 1 This Week
  • 19
    Numba

    NumPy aware dynamic Python compiler using LLVM

    ...Special decorators can create universal functions that broadcast over NumPy arrays just like NumPy functions do. Numba also works great with Jupyter notebooks for interactive computing, and with distributed execution frameworks, like Dask and Spark.
    Downloads: 2 This Week
  • 20
    Smile

    Statistical machine intelligence and learning engine

    Smile is a fast and comprehensive machine learning engine. With advanced data structures and algorithms, Smile delivers state-of-the-art performance. According to a third-party benchmark, Smile significantly outperforms R, Python, Spark, H2O, and XGBoost, and is a couple of times faster than the closest competitor. Its memory usage is also very efficient. If we can train advanced machine learning models on a PC, why buy a cluster? Write applications quickly in Java, Scala, or any JVM language. Data scientists and developers can speak the same language now! ...
    Downloads: 6 This Week
  • 21
    polynote

    A better notebook for Scala (and more)

    Polynote is an innovative polyglot notebook environment that improves on traditional interactive computing tools by enabling multi-language notebooks where users can mix languages like Scala, Python, SQL, and more within a single document while sharing variables and definitions seamlessly across those languages. Designed to address shortcomings in classic notebook solutions, it blends features familiar from IDEs — such as smart autocomplete, parameter hints, and better runtime insights —...
    Downloads: 0 This Week
  • 22
    Angel

    A Flexible and Powerful Parameter Server for large-scale ML

    ...With a model-centered core design concept, Angel partitions the parameters of complex models across multiple parameter-server nodes and implements a variety of machine learning algorithms and graph algorithms using efficient model-updating interfaces and functions, as well as a flexible consistency model for synchronization. Angel is developed with Java and Scala. It supports running on YARN. With its PS Service abstraction, it supports Spark on Angel.
    Downloads: 0 This Week
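The parameter-partitioning idea can be sketched as simple range partitioning: each server owns a contiguous slice of the parameter vector, and workers pull only the values they need and push gradients back to the owning server. `ParameterServer`, `ShardedParams`, and the plain SGD update are hypothetical; Angel's actual partitioning schemes, consistency models, and APIs differ.

```python
from typing import Dict, Iterable


class ParameterServer:
    """One server node holding a contiguous slice of the model parameters."""

    def __init__(self, start: int, size: int) -> None:
        self.start = start
        self.values = [0.0] * size

    def pull(self, idx: int) -> float:
        return self.values[idx - self.start]

    def push(self, idx: int, grad: float, lr: float) -> None:
        # Apply a plain SGD update on the server that owns this parameter.
        self.values[idx - self.start] -= lr * grad


class ShardedParams:
    """Range-partitions a dim-sized parameter vector across up to num_servers servers."""

    def __init__(self, dim: int, num_servers: int) -> None:
        self.shard_size = -(-dim // num_servers)  # ceiling division
        self.servers = [
            ParameterServer(start, min(self.shard_size, dim - start))
            for start in range(0, dim, self.shard_size)
        ]

    def _owner(self, idx: int) -> ParameterServer:
        return self.servers[idx // self.shard_size]

    def pull(self, indices: Iterable[int]) -> Dict[int, float]:
        """Fetch only the parameters a worker needs, from whichever servers own them."""
        return {i: self._owner(i).pull(i) for i in indices}

    def push(self, grads: Dict[int, float], lr: float) -> None:
        """Send sparse gradients to the owning servers."""
        for i, g in grads.items():
            self._owner(i).push(i, g, lr)


params = ShardedParams(dim=10, num_servers=4)
params.push({0: 1.0, 9: -2.0}, lr=0.5)
print(params.pull([0, 9]))
```

With `dim=10` and four servers, the shards hold 3, 3, 3, and 1 parameters; a worker touching only indices 0 and 9 talks to just the first and last server, which is what keeps communication proportional to sparse updates rather than model size.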
  • 23
    DoWhy

    DoWhy is a Python library for causal inference

    ...DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks. Much like machine learning libraries have done for prediction, DoWhy is a Python library that aims to spark causal thinking and analysis. DoWhy provides a wide variety of algorithms for effect estimation, causal structure learning, diagnosis of causal structures, root cause analysis, interventions and counterfactuals. DoWhy builds on two of the most powerful frameworks for causal inference: graphical causal models and potential outcomes. ...
    Downloads: 0 This Week
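Effect estimation with graphical causal models can be illustrated by a hand-rolled backdoor adjustment, ATE = sum over z of P(z) * (E[Y | T=1, z] - E[Y | T=0, z]). The toy rows, and the assumption that the single variable z blocks every backdoor path between treatment and outcome, are illustrative; `naive_difference` and `backdoor_ate` are hypothetical helpers, not DoWhy's API.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

Row = Tuple[int, int, float]  # (confounder z, treatment t, outcome y)


def naive_difference(rows: List[Row]) -> float:
    """Difference in mean outcome between treated and untreated rows, ignoring z."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return mean(treated) - mean(control)


def backdoor_ate(rows: List[Row]) -> float:
    """Stratify on z and average the within-stratum differences, weighted by P(z).

    Assumes z blocks every backdoor path and every stratum has treated and untreated rows.
    """
    strata: Dict[int, Dict[int, List[float]]] = defaultdict(lambda: {0: [], 1: []})
    for z, t, y in rows:
        strata[z][t].append(y)
    total = 0.0
    for groups in strata.values():
        weight = (len(groups[0]) + len(groups[1])) / len(rows)
        total += weight * (mean(groups[1]) - mean(groups[0]))
    return total


# Hypothetical data: z makes treatment more likely and also raises the outcome.
rows: List[Row] = [
    (0, 0, 1.0), (0, 0, 1.0), (0, 0, 1.0), (0, 1, 2.0),
    (1, 0, 3.0), (1, 1, 4.0), (1, 1, 4.0), (1, 1, 4.0),
]
print(naive_difference(rows), backdoor_ate(rows))
```

The naive comparison reports 2.0, but within each stratum treatment adds exactly 1.0, so the adjusted estimate is 1.0: the extra 1.0 came from z pushing both treatment and outcome upward.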
  • 24
    HugeGraph

    A graph database that supports more than 100 billion records

    ...HugeGraph offers fast import for graphs with more than 10 billion vertices and edges and millisecond-level OLTP queries, and it can be integrated with big data platforms like Hadoop or Spark for OLAP analysis. The main scenarios of HugeGraph include correlation search, fraud detection, and knowledge graphs. It supports the Gremlin graph query language and a RESTful API, and also provides commonly used graph algorithm APIs. To help users easily implement various queries and analyses, HugeGraph offers a full range of accessory tools and supports distributed storage, data replication, horizontal scaling, and many built-in storage backends.
    Downloads: 2 This Week
  • 25
    Apache Bigtop

    Bigtop is an Apache Foundation project for Infrastructure Engineers

    Apache Bigtop is a project focused on building and packaging the Hadoop ecosystem and related big data components. It provides a consistent framework for testing, packaging, and deploying Hadoop distributions, including tools like HDFS, YARN, Spark, Hive, HBase, and more. By maintaining cross-platform builds (RPMs, DEBs, Docker images, and Kubernetes support), Bigtop makes it easier for organizations to deploy big data stacks in different environments. It also includes a set of integration tests and smoke tests to ensure compatibility and stability between ecosystem components. ...
    Downloads: 0 This Week