Showing 23554 open source projects for "data"

View related business solutions
  • Managed MySQL, PostgreSQL, and SQL Databases on Google Cloud Icon
    Managed MySQL, PostgreSQL, and SQL Databases on Google Cloud

    Get back to your application and leave the database to us. Cloud SQL automatically handles backups, replication, and scaling.

    Cloud SQL is a fully managed relational database for MySQL, PostgreSQL, and SQL Server. We handle patching, backups, replication, encryption, and failover—so you can focus on your app. Migrate from on-prem or other clouds with free Database Migration Service. IDC found customers achieved 246% ROI. New customers get $300 in credits plus a 30-day free trial.
    Try Cloud SQL Free
  • Build AI Apps with Gemini 3 on Vertex AI Icon
    Build AI Apps with Gemini 3 on Vertex AI

    Access Google’s most capable multimodal models. Train, test, and deploy AI with 200+ foundation models on one platform.

    Vertex AI gives developers access to Gemini 3—Google’s most advanced reasoning and coding model—plus 200+ foundation models including Claude, Llama, and Gemma. Build generative AI apps with Vertex AI Studio, customize with fine-tuning, and deploy to production with enterprise-grade MLOps. New customers get $300 in free credits.
    Try Vertex AI Free
  • 1
    data-diff

    data-diff

    Efficiently diff rows across two different databases

    We're excited to announce the launch of a new open-source product, data-diff that makes comparing datasets across databases fast at any scale. data-diff automates data quality checks for data replication and migration. In modern data platforms, data is constantly moving between systems, and at the modern data volume and complexity, systems go out of sync all the time. Until now, there has not been any tooling to ensure that when the data is correctly copied. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 2
    atk4/data

    atk4/data

    Data Access PHP Framework for SQL & high-latency databases

    ATK Data is a data persistence and modeling framework for PHP, developed as part of the Agile Toolkit. It provides a high-level abstraction for working with databases, making it easier to define and manipulate data models with minimal boilerplate code. It supports various SQL and NoSQL databases and integrates seamlessly with Agile UI and other PHP frameworks.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 3
    Explorer

    Explorer

    Series (one-dimensional) and dataframes (two-dimensional)

    Explorer brings series (one-dimensional) and data frames (two-dimensional) to Elixir for fast data exploration.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 4
    Dynamic Data

    Dynamic Data

    Reactive collections based on Rx.Net

    ...However, typical applications are much more complicated and may apply a filter, transform the original dto and apply a sort. Even with these simple everyday operations, the complexity of the code is quickly magnified. Dynamic data has been developed to remove the tedious code of dynamically maintaining collections. It has grown to become functionally very rich with at least 60 collection-based operations which amongst other things enable filtering, sorting, grouping, joining different sources, transforms, binding, pagination, data virtualization, expiration, disposal management plus more.
    Downloads: 3 This Week
    Last Update:
    See Project
  • Build on Google Cloud with $300 in Free Credit Icon
    Build on Google Cloud with $300 in Free Credit

    New to Google Cloud? Get $300 in free credit to explore Compute Engine, BigQuery, Cloud Run, Vertex AI, and 150+ other products.

    Start your next project with $300 in free Google Cloud credit. Spin up VMs, run containers, query exabytes in BigQuery, or build AI apps with Vertex AI and Gemini. Once your credits are used, keep building with 20+ products with free monthly usage, including Compute Engine, Cloud Storage, GKE, and Cloud Run functions. Sign up to start building right away.
    Start Free Trial
  • 5
    Form-Data

    Form-Data

    A module to create readable `"multipart/form-data"` streams

    A library to create readable "multipart/form-data" streams. Can be used to submit forms and file uploads to other web applications.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 6
    Data-Juicer

    Data-Juicer

    Data processing for and with foundation models

    Data-Juicer is an open-source data processing and augmentation framework designed to enhance the quality and diversity of datasets for machine learning tasks. It includes a modular pipeline for scalable data transformation.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 7
    MDN data

    MDN data

    This repository contains general data for Web technologies

    This repository contains general data for Web technologies and is maintained by the MDN team at Mozilla.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 8
    data.table

    data.table

    Extends base R’s data for high-performance data manipulation

    data.table is an R package that extends base R’s data.frame for high-performance data manipulation. It offers concise syntax, blazing speed, and memory-efficient operations. It supports fast file reading/writing, joins, grouping, reshaping, and updates by reference. It is heavily used in large data workflows, big data in R, production pipelines, etc. Extremely efficient grouping/aggregation/summarization; can handle very large datasets (hundreds of millions to billions of rows) in memory (if available). ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    Data Formulator

    Data Formulator

    Create rich visualizations with AI

    To create rich visualizations, data analysts often need to iterate back and forth among data processing and chart specification to achieve their goals. To achieve this, analysts need not only proficiency in data transformation and visualization tools but also efforts to manage the branching history consisting of many different versions of data and charts. Recent LLM-powered AI systems have greatly improved visualization authoring experiences, for example by mitigating manual data transformation barriers via LLMs' code generation ability. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Run Any Workload on Compute Engine VMs Icon
    Run Any Workload on Compute Engine VMs

    From dev environments to AI training, choose preset or custom VMs with 1–96 vCPUs and industry-leading 99.95% uptime SLA.

    Compute Engine delivers high-performance virtual machines for web apps, databases, containers, and AI workloads. Choose from general-purpose, compute-optimized, or GPU/TPU-accelerated machine types—or build custom VMs to match your exact specs. With live migration and automatic failover, your workloads stay online. New customers get $300 in free credits.
    Try Compute Engine
  • 10
    Orange Data Mining

    Orange Data Mining

    Orange: Interactive data analysis

    Open source machine learning and data visualization. Build data analysis workflows visually, with a large, diverse toolbox. Perform simple data analysis with clever data visualization. Explore statistical distributions, box plots and scatter plots, or dive deeper with decision trees, hierarchical clustering, heatmaps, MDS and linear projections. Even your multidimensional data can become sensible in 2D, especially with clever attribute ranking and selections. ...
    Downloads: 47 This Week
    Last Update:
    See Project
  • 11
    Profile Data

    Profile Data

    Analyze computation-communication overlap in V3/R1

    profile-data is a repository that publishes profiling traces and metrics from DeepSeek’s training and inference infrastructure (especially during DeepSeek-V3 / R1 experiments). The profiling data targets insights into computation-communication overlap, pipeline scheduling (e.g. DualPipe), and how MoE / EP / parallelism strategies interact in real systems.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    Laravel Data

    Laravel Data

    Powerful data objects for Laravel

    This package enables the creation of rich data objects which can be used in various ways. Using this package you only need to describe your data once.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    Micronaut Data

    Micronaut Data

    Ahead of Time Data Repositories

    ...The problem is worse when combined with Hibernate which maintains its own meta-model as you end up with duplicate meta-models. Micronaut Data instead moves this model into the compiler. Both GORM and Spring Data use regular expressions and pattern matching in combination with runtime generated proxies to translate a method definition on a Java interface into a query at runtime. No such runtime translation exists in Micronaut Data and this work is carried out by the Micronaut compiler at compilation time.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    Azure Data Studio

    Azure Data Studio

    A data management tool that enables working with other SQL tools

    Azure Data Studio is a cross-platform database tool for data professionals who use on-premises and cloud data platforms on Windows, macOS, and Linux. Azure Data Studio offers a modern editor experience with IntelliSense, code snippets, source control integration, and an integrated terminal. It's engineered with the data platform user in mind, with the built-in charting of query result sets and customizable dashboards.
    Downloads: 16 This Week
    Last Update:
    See Project
  • 15
    Spring Data JPA

    Spring Data JPA

    Simplifies the development of creating a JPA-based data access layer

    Spring Data JPA, part of the larger Spring Data family, makes it easy to easily implement JPA-based repositories. This module deals with enhanced support for JPA-based data access layers. It makes it easier to build Spring-powered applications that use data access technologies. Implementing a data access layer of an application has been cumbersome for quite a while.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 16
    Data Structures for PHP 7

    Data Structures for PHP 7

    An extension providing efficient data structures for PHP 7

    An extension providing specialized data structures as efficient alternatives to the PHP array. Highlights the API, performance and other benefits. PHP has one data structure to rule them all. The array is a complex, flexible, master-of-none, hybrid data structure, combining the behaviour of a list and a linked map. But we use it for everything, because PHP is pragmatic: “dealing with things sensibly and realistically in a way that is based on practical rather than theoretical considerations”. ...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 17
    Spring Data MongoDB

    Spring Data MongoDB

    Provide support to increase developer productivity in Java

    The primary goal of the Spring Data project is to make it easier to build Spring-powered applications that use new data access technologies such as non-relational databases, map-reduce frameworks, and cloud-based data services. The Spring Data MongoDB project aims to provide a familiar and consistent Spring-based programming model for new datastores while retaining store-specific features and capabilities.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 18
    AWS Data Wrangler

    AWS Data Wrangler

    Pandas on AWS, easy integration with Athena, Glue, Redshift, etc.

    An AWS Professional Service open-source python initiative that extends the power of Pandas library to AWS connecting DataFrames and AWS data-related services. Easy integration with Athena, Glue, Redshift, Timestream, OpenSearch, Neptune, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON, and EXCEL). Built on top of other open-source projects like Pandas, Apache Arrow and Boto3, it offers abstracted functions to execute usual ETL tasks like load/unload data from Data Lakes, Data Warehouses, and Databases. ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 19
    Dash Data Agent

    Dash Data Agent

    Self-learning data agent that grounds its answers in layers of content

    Dash is a self-learning data agent built by the Agno AI community that generates grounded answers to English queries over structured data by synthesizing SQL and reasoning based on six layers of context, improving automatically with each run. It sidesteps common limitations of simple text-to-SQL agents by incorporating multiple context layers — including schema structure, human annotations, known query patterns, institutional knowledge from docs, machine-discovered error patterns, and live runtime context — to generate SQL queries that are both technically correct and semantically meaningful. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 20
    The Data Engineering Handbook

    The Data Engineering Handbook

    Links to everything you'd ever want to learn about data engineering

    The Data Engineering Handbook is a comprehensive, community-curated repository that aggregates essential learning resources for anyone interested in becoming a professional data engineer. Rather than being a code project itself, it’s a learning handbook that links to books, articles, tutorials, community groups, boot camps, and real-world project examples that collectively form a roadmap to mastering data engineering skills.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 21
    Spring Data Redis

    Spring Data Redis

    Provides support to increase developer productivity in Java

    Provides support to increase developer productivity in Java when using Redis, a key-value store. Uses familiar Spring concepts such as a template class for core API usage and lightweight repository-style data access. The primary goal of the Spring Data project is to make it easier to build Spring-powered applications that use new data access technologies such as non-relational databases, map-reduce frameworks, and cloud-based data services. Connection package as low-level abstraction across multiple Redis drivers (Lettuce and Jedis). Exception translation to Spring’s portable Data Access exception hierarchy for Redis driver exceptions. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 22
    Cookiecutter Data Science

    Cookiecutter Data Science

    Project structure for doing and sharing data science work

    A logical, reasonably standardized, but flexible project structure for doing and sharing data science work. When we think about data analysis, we often think just about the resulting reports, insights, or visualizations. While these end products are generally the main event, it's easy to focus on making the products look nice and ignore the quality of the code that generates them. Because these end products are created programmatically, code quality is still important! ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 23
    NYC Taxi Data

    NYC Taxi Data

    Import public NYC taxi and for-hire vehicle (Uber, Lyft)

    The nyc-taxi-data repository is a rich dataset and exploratory project around New York City taxi trip records. It collects and preprocesses large-scale trip datasets (fares, pickup/dropoff, timestamps, locations, passenger counts) to enable data analysis, modeling, and visualization efforts. The project includes scripts and notebooks for cleaning and filtering the raw data, memory-efficient processing for large CSV/Parquet files, and aggregation workflows (e.g. trips per hour, heatmaps of pickups/dropoffs). ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 24
    browser-compat-data

    browser-compat-data

    This repository contains compatibility data for Web technologies

    The browser-compat-data ("BCD") project contains machine-readable browser (and JavaScript runtime) compatibility data for Web technologies, such as Web APIs, JavaScript features, CSS properties, and more. Our goal is to document accurate compatibility data for Web technologies, so web developers may write cross-browser compatible websites more easily. BCD is used in web apps and software such as MDN Web Docs, CanIUse, Visual Studio Code, WebStorm and more.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 25
    sq data wrangler

    sq data wrangler

    sq data wrangler

    sq is a command line tool that provides jq-style access to structured data sources: SQL databases, or document formats like CSV or Excel. sq executes jq-like queries, or database-native SQL. It can join across sources: join a CSV file to a Postgres table, or MySQL with Excel. sq outputs to a multitude of formats including JSON, Excel, CSV, HTML, Markdown and XML, and can insert query results directly to a SQL database. sq can also inspect sources to view metadata about the source structure (tables, columns, size). ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • 5
  • Next
MongoDB Logo MongoDB
Gen AI apps are built with MongoDB Atlas
Atlas offers built-in vector search and global availability across 125+ regions. Start building AI apps faster, all in one place.
Try Free →