Showing 220 open source projects for "data"

  • 1
    sq data wrangler

    sq is a command-line tool that provides jq-style access to structured data sources: SQL databases, or document formats like CSV or Excel. sq executes jq-like queries, or database-native SQL. It can join across sources: join a CSV file to a Postgres table, or MySQL with Excel. sq outputs to a multitude of formats including JSON, Excel, CSV, HTML, Markdown and XML, and can insert query results directly into a SQL database. sq can also inspect sources to view metadata about the source structure (tables, columns, size). ...
    Downloads: 1 This Week
  • 2
    The CUE Data Constraint Language

    The home of the CUE language. Validate and define text-based config

    CUE is an open source data constraint language which aims to simplify tasks involving defining and using data. CUE merges the notion of schema and data. The same CUE definition can simultaneously be used for validating data and act as a template to reduce boilerplate. Schema definition is enriched with fine-grained value definitions and default values. At the same time, data can be simplified by removing values implied by such detailed definitions. ...
    Downloads: 1 This Week
  • 3
    Bifrost

    Middleware for production-oriented MySQL

    Heterogeneous middleware for production environments that synchronizes MySQL, MariaDB, and Kafka data to Redis, MongoDB, ClickHouse, and other services. Bifrost can synchronize the full data set to multiple targets in real time. It supports all MySQL and MariaDB storage types. Data tables and target libraries are configured dynamically and flexibly through an interface. Multiple data sources and multiple target libraries are supported, as are both incremental and full data synchronization. A single Binlog parsing thread feeds multiple target libraries, which are synchronized in parallel. ...
    Downloads: 17 This Week
  • 4
    Volcano

    A Cloud Native Batch System (Project under CNCF)

    ...It provides a suite of mechanisms that are commonly required by many classes of batch & elastic workload including machine learning/deep learning, bioinformatics/genomics, and other "big data" applications. These types of applications typically run on generalized domain frameworks like TensorFlow, Spark, Ray, PyTorch, MPI, etc, which Volcano integrates with. Volcano builds upon a decade and a half of experience running a wide variety of high-performance workloads at scale using several systems and platforms, combined with best-of-breed ideas and practices from the open-source community. ...
    Downloads: 293 This Week
  • 5
    Numaflow

    Kubernetes-native platform to run massively parallel data/streaming

    Numaflow is a Kubernetes-native tool for running massively parallel stream processing. A Numaflow Pipeline is implemented as a Kubernetes custom resource and consists of one or more source, data processing, and sink vertices. Numaflow installs in a few minutes and is easier and cheaper to use for simple data processing applications than a full-featured stream processing platform.
    Downloads: 2 This Week
  • 6
    kpt Kubernetes

    Automate Kubernetes Configuration Editing

    kpt is a package-centric toolchain that enables a WYSIWYG configuration authoring, automation, and delivery experience, which simplifies managing Kubernetes platforms and KRM-driven infrastructure (e.g., Config Connector, Crossplane) at scale by manipulating declarative Configuration as Data.
    Downloads: 2 This Week
  • 7
    Neosync

    Open Source Data Security Platform for Developers to Monitor

    Neosync is a secure, open-source platform to generate, mask, and sync realistic test data across environments. It helps engineering teams create privacy-compliant datasets using synthetic data, transformations, and pseudonymization techniques. Designed with extensibility and data governance in mind, Neosync integrates with common databases and cloud services, enabling safe test environments for development and QA.
    Downloads: 2 This Week
  • 8
    Functional programming library golang

    functional programming library for golang

    ...For each data type, there exists a small set of composition functions. These functions have the same names across all data types, so you only have to learn a small number of function names. The semantics of functions with the same name are consistent across all data types.
    Downloads: 6 This Week
  • 9
    GeoIP

    This project automatically generates GeoIP files in multiple formats

    GeoIP is a community-maintained project that generates and publishes enhanced GeoIP/Geo-database and IP-location/routing data in multiple formats (e.g. V2Ray .dat, MaxMind .mmdb, and others) to support proxy, VPN, or routing tools requiring IP-to-country/region resolution. Rather than depending solely on the official GeoLite2 data, geoip augments and merges data sources (especially for certain regions) to improve coverage or tailor by use-case (e.g. proxy-specific rules, private networks, or region-based classification). ...
    Downloads: 4 This Week
  • 10
    kpt

    Automate Kubernetes Configuration Editing

    kpt is a package-centric toolchain that enables a WYSIWYG configuration authoring, automation, and delivery experience, which simplifies managing Kubernetes platforms and KRM-driven infrastructure (e.g., Config Connector, Crossplane) at scale by manipulating declarative Configuration as Data. Any general-purpose or domain-specific language can be used to create functions to transform and/or validate the YAML KRM input/output format, but we provide SDKs to simplify the function authoring process, in Go, TypeScript, and Starlark, a Python-like embedded language. A catalog of off-the-shelf, tested functions is also available. kpt makes configuration easy to create and transform, via reusable functions. ...
    Downloads: 3 This Week
  • 11
    Kapacitor

    Open source framework for processing, monitoring, and alerting

    Open source framework for processing, monitoring, and alerting on time series data. Kapacitor is a real-time data processing engine for monitoring and alerting, specifically designed to work with time-series data from InfluxDB.
    Downloads: 0 This Week
  • 12
    SFTPGo

    Fully featured and highly configurable SFTP server with optional HTTP

    Fully featured and highly configurable SFTP server with optional HTTP/S, FTP/S and WebDAV support. Several storage backends are supported: local filesystem, encrypted local filesystem, S3 (compatible) Object Storage, Google Cloud Storage, Azure Blob Storage, SFTP. SFTPGo is an Open Source project and you can of course use it for free but please don't ask for free support as well. Support for serving local filesystem, encrypted local filesystem, S3 Compatible Object Storage, Google Cloud...
    Downloads: 79 This Week
  • 13
    Bacalhau

    Community-driven, simple, yet powerful framework

    Bacalhau is a decentralized compute platform for running jobs on data stored across distributed networks, like IPFS or Filecoin, without moving the data to centralized cloud environments. It allows developers to run containerized workloads close to where the data lives, reducing latency, cost, and privacy risks. Bacalhau supports various runtime environments and is designed to make decentralized data processing as accessible as traditional cloud computing. ...
    Downloads: 2 This Week
  • 14
    Excelize

    Go language library for reading and writing Microsoft Excel

    ...You can build charts based on data in your worksheet or generate charts without any data in your worksheet at all.
    Downloads: 2 This Week
  • 15
    Blue Whale Configuration Platform

    Blue Whale smart cloud configuration platform

    The platform draws on experience supporting hundreds of Tencent businesses and is compatible with a variety of complex system architectures. It was born from operations and maintenance work and is specialized for it. From configuration management to job execution, task scheduling, and self-healing monitoring, and on to big data analysis of operations that assists operational decision-making, it covers the full life cycle of business operations. The open PaaS has a powerful development framework and scheduling engine, as well as a complete training system for operations and maintenance development, which helps operations and maintenance teams transform and upgrade rapidly. ...
    Downloads: 1 This Week
  • 16
    decimal

    Arbitrary-precision fixed-point decimal numbers in Go

    Arbitrary-precision fixed-point decimal numbers in Go. Note: Decimal library can "only" represent numbers with a maximum of 2^31 digits after the decimal point. The zero-value is 0, and is safe to use without initialization. Addition, subtraction, and multiplication with no loss of precision. Division with specified precision. database/sql serialization/deserialization. JSON and XML serialization/deserialization. big.Int's API is built to reduce the number of memory allocations for maximal...
    Downloads: 4 This Week
  • 17
    Pachyderm

    Data-Centric Pipelines and Data Versioning

    Data-driven pipelines automatically trigger based on detecting data changes. Automatic immutable data lineage and data versioning of all data types. Autoscaling and parallel processing built on Kubernetes for resource orchestration. Uses standard object stores for data storage with automatic deduplication. Runs across all major cloud providers and on-premises installations.
    Downloads: 0 This Week
  • 18
    Nuclio

    High-Performance Serverless event and data processing platform

    Nuclio is an open source and managed serverless platform used to minimize development and maintenance overhead and automate the deployment of data-science-based applications. Real-time performance running up to 400,000 function invocations per second. Portable across low-power devices, laptops, edge, on-prem and multi-cloud deployments. The first serverless platform supporting GPUs for optimized utilization and sharing. Automated deployment to production in a few clicks from Jupyter notebook. ...
    Downloads: 1 This Week
  • 19
    Loggie

    A lightweight, cloud-native data transfer agent and aggregator

    Loggie is a lightweight, high-performance, cloud-native agent and aggregator based on Golang. Loggie includes LogConfig/ClusterLogConfig/Interceptor/Sink CRDs, allowing for the creation of data collection, transfer, processing, and sending pipelines through simple YAML file creation. Supports deployment as an independent intermediate machine, which can receive aggregated data sent by Loggie Agent and can also be used to consume and process various data sources. Configure Filebeat and Loggie to collect logs, and send them to a Kafka topic without using client compression, with the Kafka topic partition configured as 3. ...
    Downloads: 2 This Week
  • 20
    tracetest

    Build integration and end-to-end tests in minutes

    Tracetest is a trace-based testing tool for integration and end-to-end testing using OpenTelemetry traces. Verify end-to-end transactions and side effects across microservices & event-driven apps by using trace data as test specs. Cypress and Selenium are constrained by using the browser for testing. Tracetest bypasses this entirely by using your existing OpenTelemetry instrumentation and trace data to run tests and assertions against traces in every step of a request transaction.
    Downloads: 5 This Week
  • 21
    Render

    Go package for easily rendering JSON, XML, binary data, and HTML

    ...XML: Uses the encoding/xml package to marshal data into an XML-encoded response. Binary data: Passes the incoming data straight through to the http.ResponseWriter. Text: Passes the incoming string straight through to the http.ResponseWriter. Render comes with a variety of configuration options. By default Render will attempt to load templates with a '.tmpl' extension from the "templates" directory.
    Downloads: 1 This Week
  • 22
    INI

    Package ini provides INI file read and write functionality in Go

    ...Map back and save when you get the work done. Auto-type conversion, candidate value limitation, quick slice generation, in-fly data validation. More than you can ever imagine! Multiple configuration load policies, custom data validation rules, key name and value mappers. Start hacking it now!
    Downloads: 3 This Week
  • 23
    MinIO

    High performance object storage server compatible with Amazon S3 APIs

    MinIO is a high performance object storage server that is API compatible with Amazon S3 cloud storage service. MinIO makes it easy to build high performance, cloud native data infrastructure for machine learning, analytics and application data workloads. It is incredibly fast, enabling object storage to operate as the primary storage tier for a diverse set of workloads. It is also built to be cloud native and enterprise ready. MinIO is being used worldwide in various production deployments, and is leading the way as the most downloaded object storage server in the industry.
    Downloads: 12 This Week
  • 24
    Netcap

    A framework for secure and scalable network traffic analysis

    The Netcap (NETwork CAPture) framework efficiently converts a stream of network packets into platform-neutral type-safe structured audit records that represent specific protocols or custom abstractions. These audit records can be stored on disk or exchanged over the network, and are well-suited as a data source for machine learning algorithms. Since parsing of untrusted input can be dangerous and network data is potentially malicious, a programming language that provides a garbage-collected memory-safe runtime is used for the implementation.
    Downloads: 4 This Week
  • 25
    Grafana Pyroscope

    Continuous Profiling Platform. Debug performance issues

    ...Collect, store, and analyze profiles from various external profiling tools in one central location. Link to your OpenTelemetry tracing data and get request-specific or span-specific profiles to enhance other observability data like traces and logs.
    Downloads: 3 This Week