Showing 37 open source projects for "data"

  • 1
    Apache Spark

    A unified analytics engine for large-scale data processing

    ...With Spark Streaming (microbatches) and Structured Streaming, it delivers low-latency event processing suitable for real-time analytics. The built-in MLlib library provides scalable machine learning algorithms, while GraphX enables graph computations integrated with data pipelines. Spark supports multiple languages—Scala, Java, Python, R—and connects with many storage systems like HDFS, S3, Cassandra, and streaming platforms like Kafka, making it a versatile choice for big data workloads in analytics, ETL, and data science.
    Downloads: 1 This Week
  • 2
    Deequ

    Deequ is a library built on top of Apache Spark

    Deequ is a library built atop Apache Spark that enables defining “unit tests for data” — that is, formal constraints or checks on datasets to ensure data quality along dimensions such as completeness, uniqueness, value ranges, correlations, etc. It can scale to large datasets (billions of rows) by translating those data checks into Spark jobs. Deequ supports advanced features like a metrics repository for storing computed statistics over time, anomaly detection of data quality metrics, and the suggestion of likely constraints automatically for new datasets. ...
    Downloads: 0 This Week
  • 3
    Akka

    Build concurrent, distributed, and resilient message-driven apps

    ...Small memory footprint; ~2.5 million actors per GB of heap. Distributed systems without single points of failure. Load balancing and adaptive routing across nodes. Event Sourcing and CQRS with Cluster Sharding. Distributed Data for eventual consistency using CRDTs. Asynchronous non-blocking stream processing with backpressure.
    Downloads: 1 This Week
  • 4
    Synapse Machine Learning

    Simple and distributed Machine Learning

    ...SynapseML builds on Apache Spark and SparkML to enable new kinds of machine learning, analytics, and model deployment workflows. SynapseML adds many deep learning and data science tools to the Spark ecosystem, including seamless integration of Spark Machine Learning pipelines with the Open Neural Network Exchange (ONNX), LightGBM, Cognitive Services, Vowpal Wabbit, and OpenCV. These tools enable powerful and highly scalable predictive and analytical models for a variety of data sources. SynapseML also brings new networking capabilities to the Spark ecosystem. ...
    Downloads: 0 This Week
  • 5
    Chimney

    Scala library for boilerplate-free, type-safe data transformations

    Chimney is a Scala library that facilitates boilerplate-free, type-safe data transformations between different data types. It enables developers to define mappings between source and target types, ensuring that transformations are checked at compile time, thereby reducing runtime errors and enhancing code reliability.
    Downloads: 0 This Week
  • 6
    Monocle

    Optics library for Scala

    Monocle is a purely functional optics library for Scala that provides tools for accessing and transforming immutable data, including Lens, Prism, Iso, Optional, and Traversal. It enables composable, declarative modifications of deeply nested immutable structures in a concise and type-safe fashion.
    Downloads: 1 This Week
  • 7
    Slick database

    Slick (Scala Language Integrated Connection Kit) is a modern database

    Slick is a modern database query and access library for Scala. It allows you to work with stored data almost as if you were using Scala collections while at the same time giving you full control over when a database access happens and which data is transferred. You can write your database queries in Scala instead of SQL, thus profiting from the static checking, compile-time safety and compositionality of Scala. Slick features an extensible query compiler which can generate code for different backends. ...
    Downloads: 0 This Week
  • 8
    Flix

    The Flix Programming Language

    Flix is a statically typed programming language combining functional, imperative, and logic paradigms, with first‑class Datalog constraints and a polymorphic effect system. Designed to run on the JVM, Flix enforces purity tracking at compile time, supports algebraic data types, tail‑call elimination, and allows entire Datalog programs as values.
    Downloads: 0 This Week
  • 9
    Gatling

    Modern Load Testing as Code

    Gatling is a high-performance load testing tool built on the JVM that emphasizes realism, scalability, and developer ergonomics. Test scenarios are scripted in a concise Scala-based DSL, allowing you to model user journeys with think times, feeders (dynamic data), checks, and assertions all in code. Its asynchronous, non-blocking engine (backed by Netty) can drive very high concurrency from a single injector, reducing the need for large injector farms. Gatling supports HTTP out of the box as well as WebSocket, Server-Sent Events, and JMS, so you can exercise modern, real-time systems end to end. ...
    Downloads: 11 This Week
  • 10
    Scalaz

    Principled Functional Programming in Scala

    Scalaz is a foundational functional-programming library for Scala that provides type classes, data types, and syntax to write pure, composable code. It implements classic abstractions such as Functor, Applicative, Monad, Monoid, Foldable, and Traverse, along with powerful transformers (ReaderT, StateT, WriterT, OptionT, and more) to structure effects. The library offers rich data structures—\/ (disjunction), Validation, NonEmptyList, IList, and Free—that help model errors, invariants, and interpretable programs. ...
    Downloads: 0 This Week
  • 11
    X's Recommendation Algorithm

    Source code for the X Recommendation Algorithm

    ...Written primarily in Scala, it shows the architecture of large-scale recommendation systems, including candidate sourcing, ranking, and heuristics. While certain components (such as safety layers, spam detection, or private data) are excluded, the release provides valuable insights into the design of real-world machine learning–driven ranking systems. The project is intended as a reference for researchers, developers, and the public to study, experiment with, and better understand the mechanisms behind social media content.
    Downloads: 1 This Week
  • 12
    Scala 2

    Scala 2 compiler and standard library

    ...Scaladex is officially supported by Scala Center. In Scala, functions are values, and can be defined as anonymous functions with a concise syntax. In Scala, case classes are used to represent structural data types.
    Downloads: 2 This Week
  • 13
    Skunk

    A data access library for Scala + Postgres

    Skunk is a Postgres library for Scala. Skunk is powered by cats, cats-effect, scodec, and fs2. Skunk is purely functional, non-blocking, and provides a tagless-final API. Skunk gives very good error messages. Skunk embraces the Scala Code of Conduct. Skunk is pre-release software! Code and documentation are under active development! Skunk is published for Scala 2.12/2.13/3.1 and can be included in your project. Query and Command types are usually inferrable, but specifying a type ensures that...
    Downloads: 0 This Week
  • 14
    Sangria

    Scala GraphQL implementation

    ...Since GraphQL has a type system, the server defines a schema that the client can query using the introspection API. This provides the client with a set of possibilities. Once the client has this information and has decided which parts of the data it needs, it can describe its data requirements in the form of a GraphQL query. An important aspect of GraphQL is that it’s completely backend agnostic.
    Downloads: 0 This Week
  • 15
    ScalaCheck

    Property-based testing for Scala

    ScalaCheck is a library for property-based testing in Scala (and Java), inspired by Haskell’s QuickCheck. It automatically generates test inputs based on specifications, validating that properties hold across randomized scenarios, thereby enabling robust, declarative testing of edge cases and invariants.
    Downloads: 1 This Week
  • 16
    GitBucket

    A Git platform powered by Scala

    ...You can also deploy gitbucket.war to a servlet container which supports Servlet 3.0 (like Jetty, Tomcat, JBoss, etc). To upgrade GitBucket, replace gitbucket.war with the new version, after stopping GitBucket. All GitBucket data is stored in HOME/.gitbucket by default. So if you want to back up GitBucket's data, copy the directory to the backup location. If you want to try the development version of GitBucket, or want to contribute to the project, please see the Developer's Guide. It provides instructions on building from source and on setting up an IDE for debugging. ...
    Downloads: 0 This Week
  • 17
    SageMaker Spark

    A Spark library for Amazon SageMaker

    SageMaker Spark is an open-source Spark library for Amazon SageMaker. With SageMaker Spark you construct Spark ML Pipelines using Amazon SageMaker stages. These pipelines interleave native Spark ML stages and stages that interact with SageMaker training and model hosting. With SageMaker Spark, you can train on Amazon SageMaker from Spark DataFrames using Amazon-provided ML algorithms like K-Means clustering or XGBoost, and make predictions on DataFrames against SageMaker endpoints hosting...
    Downloads: 0 This Week
  • 18
    Shapeless

    Generic programming for Scala

    Shapeless is a powerful generic programming library for Scala, enabling compile-time abstraction and manipulation of types. It provides features such as HLists (heterogeneous lists), generic derivation of type class instances, dependent types, and polymorphic functions, allowing developers to write boilerplate-free, type-safe code.
    Downloads: 0 This Week
  • 19
    DataNucleus

    Java persistence using JDO, JPA or REST

    DataNucleus provides Java data persistence to a range of datastores using JDO/JPA/REST APIs. *** Note that code development is no longer on SourceForge (code on SourceForge is for versions up to 3.3.5 only) ***
    Downloads: 6 This Week
  • 20
    Algebird

    Abstract Algebra for Scala

    Algebird is Twitter’s Apache‑licensed Scala library providing abstract algebra data structures and algorithms, especially for online/streaming aggregation. It includes Monoid, Approximate, HyperLogLog, CMS, BloomFilter, Min/Max, Averaged Value types, supporting efficient distributed aggregation and approximate analytics.
    Downloads: 0 This Week
  • 21
    Binding.scala

    Reactive data-binding for Scala

    Binding.scala is a data-binding library for Scala, running on both JVM and Scala.js. Binding.scala can be used as the basis of UI frameworks; however, the latest Binding.scala 12.x no longer contains any built-in UI frameworks. For creating reactive HTML UI, you may want to check out html.scala, which is a UI framework based on Binding.scala and the successor of the previously built-in dom library.
    Downloads: 0 This Week
  • 22
    FiloDB

    Distributed Prometheus time series database

    ...Designed to ingest many millions of entities, sharded across multiple processes, with distributed querying built in. Support for indexing and fast querying over flexible tags for each time series/partition, just like Prometheus. Holds a huge amount of data in-memory thanks to columnar compression techniques. Designed for highly concurrent, low-latency workloads such as dashboards and alerting. Data immediately available for querying once ingested.
    Downloads: 0 This Week
  • 23
    CoolplaySpark

    Spark Cool Play: Spark source code analysis, Spark class library, etc.

    ...The project contains annotated examples, explanations, and exercises that guide learners through Spark’s architecture, execution model, and source code internals. It is particularly valuable for developers who want to strengthen their understanding of Spark by not only using it as a data processing engine but also exploring how its internals function. Through code analysis and commentary, CoolplaySpark helps readers connect theoretical concepts with practical implementation details. By combining book study with this repository, learners can develop both conceptual clarity and hands-on expertise in Spark’s core components.
    Downloads: 1 This Week
  • 24
    Prisma 1

    Database Tools incl. ORM, migrations and admin UI

    ...Prisma replaces traditional ORMs and simplifies database workflows. Access: type-safe database access with the auto-generated Prisma client (in JavaScript, TypeScript, Go). Migrate: declarative data modeling and migrations (optional). Manage: visual data management with Prisma Admin. It is used to build GraphQL, REST, and gRPC APIs, and a lot more. Prisma currently supports MySQL, PostgreSQL, and MongoDB. Prisma is a great fit for building REST and gRPC APIs, where it can be used in place of traditional ORMs. It provides many benefits such as type safety, a modern API, and flexible ways of reading and writing relational data.
    Downloads: 0 This Week
  • 25
    Graphcool Framework

    Graphcool is an open-source backend development framework

    ...Users could deploy Graphcool either locally (e.g. via Docker) or on a managed cloud offering. The framework also provided features like schema evolution, migrations, data loaders for performance, and built-in tooling to manage endpoints and deployments.
    Downloads: 0 This Week