Showing 3124 open source projects for "data"

  • 1
    TensorFlow Privacy

    Library for training machine learning models with privacy for training data

    Library for training machine learning models with privacy for training data. This repository contains the source code for TensorFlow Privacy, a Python library that includes implementations of TensorFlow optimizers for training machine learning models with differential privacy. The library comes with tutorials and analysis tools for computing the privacy guarantees provided.
    Downloads: 0 This Week
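
The differentially private training mentioned in the TensorFlow Privacy entry is commonly done by clipping each example's gradient and adding Gaussian noise before averaging. The following is a minimal pure-Python sketch of that idea, not the library's API: the function names and the toy gradients are assumptions made for illustration, and no privacy accounting is performed.

```python
import math
import random

def clip_by_l2_norm(grad, max_norm):
    """Scale grad down so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]

def dp_average_gradient(per_example_grads, l2_norm_clip, noise_multiplier, rng=None):
    """Clip each example's gradient, sum, add Gaussian noise, then average.

    Hypothetical helper for illustration; not a TensorFlow Privacy function.
    """
    rng = rng or random.Random()
    clipped = [clip_by_l2_norm(g, l2_norm_clip) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    # Noise standard deviation is proportional to the clipping bound.
    stddev = noise_multiplier * l2_norm_clip
    noisy = [s + rng.gauss(0.0, stddev) for s in summed]
    return [v / len(per_example_grads) for v in noisy]

# Toy per-example gradients (made up for this sketch).
grads = [[3.0, 4.0], [0.3, 0.4]]
print(dp_average_gradient(grads, l2_norm_clip=1.0, noise_multiplier=1.1))
```

Clipping bounds how much any single example can move the update, which is what lets the added noise mask an individual's contribution.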
  • 2
    Tally

    Let agents classify your bank transactions

    Tally is an open-source, AI-assisted tool designed to automate the classification of personal financial transactions, helping users turn raw bank data into meaningful categories without manual tagging. At its core, Tally pairs a local rule engine with large language models so that an AI assistant (like Claude Code, Copilot, or any CLI agent) interprets, suggests, and categorizes expenses, savings, subscriptions, and income events based on your own rules and behavior. It generates human-readable reports and can produce HTML, JSON, or Markdown outputs to suit dashboards or personal finance workflows. ...
    Downloads: 1 This Week
  • 3
    MedGemma

    Collection of Gemma 3 variants that are trained for performance

    ...It includes multiple variants such as a 4 billion-parameter multimodal model that can process both medical images and text and a 27 billion-parameter text-only (and multimodal) model that offers deeper clinical reasoning and understanding at higher capacity, making it suitable for complex tasks like medical question answering, summarization of clinical notes, or generating reports from radiology images. The multimodal versions pair a SigLIP-based image encoder pre-trained on diverse de-identified medical imaging data.
    Downloads: 1 This Week
  • 4
    WhisperSpeech

    An Open Source text-to-speech system built by inverting Whisper

    ...The project aims to be for speech what Stable Diffusion is for images: powerful, hackable, and safe for commercial use, with code under Apache-2.0/MIT and models trained only on properly licensed data. Its architecture follows a token-based, multi-stage pipeline inspired by AudioLM and SPEAR-TTS: Whisper is used to produce semantic tokens, EnCodec compresses the waveform into acoustic tokens, and Vocos reconstructs high-fidelity audio from those tokens. The repository includes notebooks and scripts for inference, long-form synthesis, and finetuning, as well as pre-trained models and converted datasets hosted on Hugging Face. ...
    Downloads: 2 This Week
  • 5
    CLIP

    CLIP: predict the most relevant text snippet given an image

    CLIP (Contrastive Language-Image Pretraining) is a neural model that links images and text in a shared embedding space, allowing zero-shot image classification, similarity search, and multimodal alignment. It was trained on large sets of (image, caption) pairs using a contrastive objective: images and their matching text are pulled together in embedding space, while mismatches are pushed apart. Once trained, you can give it any text labels and ask it to pick which label best matches a given...
    Downloads: 1 This Week
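
The zero-shot step described in the CLIP entry (compare an image embedding against the embeddings of candidate text labels and pick the closest) can be sketched as below. This is only an illustration under stated assumptions: the embeddings are hand-written toy vectors rather than CLIP outputs, and `zero_shot_classify` and its `scale` parameter are hypothetical names, not part of the CLIP codebase.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def zero_shot_classify(image_embedding, label_embeddings, scale=10.0):
    """Return (best_label, probabilities) from a softmax over scaled similarities."""
    logits = {label: scale * cosine(image_embedding, emb)
              for label, emb in label_embeddings.items()}
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(v - peak) for label, v in logits.items()}
    total = sum(exps.values())
    probs = {label: e / total for label, e in exps.items()}
    return max(probs, key=probs.get), probs

# Toy embeddings written by hand for this sketch (NOT produced by CLIP).
labels = {
    "a photo of a dog": [1.0, 0.0, 0.0],
    "a photo of a cat": [0.0, 1.0, 0.0],
}
image = [0.9, 0.1, 0.0]
best, probs = zero_shot_classify(image, labels)
print(best, probs)
```

In a real setup the vectors would come from the model's image and text encoders; the comparison logic stays the same.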
  • 6
    NVIDIA AgentIQ

    The NVIDIA AgentIQ toolkit is an open-source library

    NVIDIA AgentIQ is an open-source toolkit designed to efficiently connect, evaluate, and accelerate teams of AI agents. It provides a framework-agnostic platform that integrates seamlessly with various data sources and tools, enabling developers to build composable and reusable agentic workflows. By treating agents, tools, and workflows as simple function calls, AgentIQ facilitates rapid development and optimization of AI-driven applications, enhancing collaboration and efficiency in complex tasks.
    Downloads: 1 This Week
  • 7
    FastUI

    Build better UIs faster

    FastUI is a library that lets developers build interactive user interfaces for FastAPI applications using Pydantic models. It automatically generates frontend components based on data schemas and endpoint logic, reducing the need for manual UI development. Designed to be type-safe, reactive, and fast, FastUI streamlines the creation of web dashboards, admin panels, and internal tools within a FastAPI backend.
    Downloads: 0 This Week
  • 8
    PyMC

    Bayesian Modeling and Probabilistic Programming in Python

    ...Built on top of computational tools like Aesara and NumPy, PyMC allows users to define models using intuitive syntax and perform inference using MCMC, variational inference, and other advanced algorithms. It’s widely used in scientific research, data science, and decision modeling.
    Downloads: 0 This Week
  • 9
    Pyodide

    Pyodide is a Python distribution for the browser and Node.js

    ...It allows developers to run Python code directly in web browsers without a server, supporting packages like NumPy, Pandas, and Matplotlib. Pyodide opens up new possibilities for interactive data analysis, scientific computing, and educational tools in web environments, all while integrating seamlessly with JavaScript.
    Downloads: 0 This Week
  • 10
    Pixoo

    A library to help you make the most out of your Pixoo 64

    Pixoo is a Python-based library for controlling Divoom Pixoo LED displays using Bluetooth Low Energy (BLE). It allows users to send images, animations, or text to Pixoo devices, enabling creative integrations like desktop widgets, real-time data displays, or custom artwork.
    Downloads: 0 This Week
  • 11
    Xiyan MCP Server

    A Model Context Protocol (MCP) server

    The XiYan MCP Server is a Model Context Protocol (MCP) server that enables natural language queries to databases, powered by XiYan-SQL, a state-of-the-art text-to-SQL model. It allows users to interact with databases using conversational language, simplifying data retrieval processes.
    Downloads: 0 This Week
  • 12
    fugue

    A unified interface for distributed computing

    Fugue is a unified interface for distributed computing that lets users execute Python, Pandas, and SQL code on Spark, Dask, and Ray with minimal rewrites.
    Downloads: 0 This Week
  • 13
    iX

    Autonomous GPT-4 agent platform

    IX is a platform for designing and deploying autonomous and [semi]-autonomous LLM-powered agents and workflows. IX provides a flexible and scalable solution for delegating tasks to AI-powered agents. Agents created with the platform can automate a wide variety of tasks while running in parallel and communicating with each other.
    Downloads: 0 This Week
  • 14
    DrissionPage

    Python based web automation tool. Powerful and elegant

    DrissionPage is a Python-based automation framework that blends the capabilities of Selenium for browser automation with Requests-HTML for fast, headless web data extraction. It enables seamless switching between browser-controlled and headless HTTP sessions within the same interface. Ideal for web scraping, testing, and automation, DrissionPage is lightweight and highly efficient, offering more flexibility than standard Selenium or Requests usage alone.
    Downloads: 0 This Week
  • 15
    Dataherald

    Interact with your SQL database, Natural Language to SQL using LLMs

    Dataherald is a platform that allows users to query structured databases using natural language, automatically converting plain English into SQL. It is designed to enable real-time, self-service analytics without needing technical knowledge of databases, making business data easily accessible to non-technical users. Dataherald focuses on speed, accuracy, and scalability for enterprise settings.
    Downloads: 0 This Week
  • 16
    ArangoDB-Community/pyArango

    Python Driver for ArangoDB with built-in validation

    PyArango is a Python driver for ArangoDB, a multi-model NoSQL database. It provides a Pythonic way to interact with ArangoDB, allowing developers to manage collections, execute AQL queries, and integrate ArangoDB's document, graph, and key-value storage models into Python applications.
    Downloads: 0 This Week
  • 17
    LangKit

    An open-source toolkit for monitoring Large Language Models (LLMs)

    LangKit is an open-source text metrics toolkit for monitoring language models. It offers an array of methods for extracting relevant signals from the input and/or output text, which are compatible with the open-source data logging library whylogs. Productionizing language models, including LLMs, comes with a range of risks because the number of possible input combinations is effectively unbounded, and so is the range of outputs they can elicit. The unstructured nature of text poses a challenge in the ML observability space, and it is a challenge worth solving, since a lack of visibility into the model's behavior can have serious consequences.
    Downloads: 0 This Week
  • 18
    TimesFM

    Pretrained time-series foundation model developed by Google Research

    TimesFM is a pretrained time-series foundation model from Google Research built for forecasting tasks, designed to generalize across many domains without requiring extensive per-dataset retraining. It provides a decoder-only model approach to forecasting, aiming for strong performance even in zero-shot or low-data settings where traditional models often struggle. The project includes code and an inference API intended to make it practical to run forecasts programmatically, with options to use different backends such as Torch or Flax depending on your environment and performance needs. Newer releases emphasize expanded context handling and more flexible forecasting outputs, including quantile forecasting so users can get uncertainty estimates rather than only point predictions. ...
    Downloads: 1 This Week
  • 19
    Hello Python

    Comprehensive tutorial repository aimed at teaching the Python program

    ...It includes over 100 classes and about 44 hours of video instruction, combined with code samples, projects, and a chat community for support. The material covers the fundamentals—variables, data types, loops, functions—as well as intermediate topics like date handling, list comprehensions, file IO, regular expressions, modules, and packages. The course is designed to be accessible: no prior programming experience required, and the resources are freely available. In addition, it is accompanied by a practical coding approach (projects) and is maintained as an open-source repository under Apache-2.0 license. ...
    Downloads: 1 This Week
  • 20
    ZeusDB Vector Database

    Blazing-fast vector DB with similarity search and metadata filtering

    ...Hybrid search is a core design goal, allowing you to mix vector, keyword, and filter queries in a single request for practical relevance. Observability and safety round out the system, with metrics, tracing, and guardrails to manage recalls, deletions, and privacy-sensitive data at scale.
    Downloads: 1 This Week
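
To make the hybrid search idea in the ZeusDB entry concrete, here is a small pure-Python sketch that applies a metadata filter, then ranks the remaining records by a weighted mix of vector similarity and keyword overlap. It does not use or reproduce ZeusDB's API: the record layout, the `hybrid_search` function, and the `alpha` weighting are all assumptions chosen for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def hybrid_search(query_vec, query_terms, metadata_filter, records, alpha=0.7, top_k=3):
    """Filter by exact metadata match, then rank by a blended score.

    Hypothetical helper: score = alpha * cosine + (1 - alpha) * keyword overlap.
    """
    terms = {t.lower() for t in query_terms}
    scored = []
    for rec in records:
        # Metadata filter: skip records that miss any required key/value.
        if any(rec["meta"].get(k) != v for k, v in metadata_filter.items()):
            continue
        vec_score = cosine(query_vec, rec["vector"])
        words = set(rec["text"].lower().split())
        kw_score = len(words & terms) / len(terms) if terms else 0.0
        scored.append((alpha * vec_score + (1 - alpha) * kw_score, rec["id"]))
    scored.sort(reverse=True)
    return [rec_id for _, rec_id in scored[:top_k]]

# Toy records invented for this sketch.
records = [
    {"id": "a", "vector": [1.0, 0.0], "text": "fast vector search", "meta": {"lang": "en"}},
    {"id": "b", "vector": [0.8, 0.6], "text": "slow keyword index", "meta": {"lang": "en"}},
    {"id": "c", "vector": [1.0, 0.0], "text": "fast vector search", "meta": {"lang": "de"}},
]
print(hybrid_search([1.0, 0.0], ["vector", "search"], {"lang": "en"}, records))
```

A production system would use an approximate nearest-neighbor index and a proper text-ranking function instead of the linear scan and term overlap used here.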
  • 21
    LangExtract

    A Python library for extracting structured information

    LangExtract is a Python library developed by Google that leverages large language models (LLMs) to extract structured information from unstructured text—such as clinical notes, research papers, or literary works—based on user-defined instructions. It is designed to transform free-form text into reliable, schema-constrained data while maintaining traceability back to the source material. Each extracted entity is precisely grounded in its original context, allowing visual inspection and validation via automatically generated interactive HTML visualizations. LangExtract supports a wide range of models, including Google Gemini, OpenAI GPT, and local LLMs via Ollama, making it adaptable to different deployment environments and compliance needs. ...
    Downloads: 1 This Week
  • 22
    Animated Drawings

    Code to accompany "A Method for Animating Children's Drawings"

    ...Users can provide rough keyframes or control constraints (pose anchors), and the system fills intermediate frames with fluid animation. The repository includes demonstration apps and notebooks where you can upload or draw shapes and watch animations play. Because the approach is data-driven, it generalizes to new drawings even with varying proportions or stylizations.
    Downloads: 1 This Week
  • 23
    DeepEP

    DeepEP: an efficient expert-parallel communication library

    ...Its core role is to implement high-throughput, low-latency all-to-all GPU communication kernels, which handle the dispatching of tokens to different experts (or shards) and then combining expert outputs back into the main data flow. Because MoE architectures require routing inputs to different experts, communication overhead can become a bottleneck — DeepEP addresses that by providing optimized GPU kernels and efficient dispatch/combining logic. The library also supports low-precision operations (such as FP8) to reduce memory and bandwidth usage during communication. ...
    Downloads: 1 This Week
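
The dispatch and combine steps named in the DeepEP entry can be illustrated with a single-process sketch: tokens are grouped by their assigned expert, each expert processes its group, and the results are written back in the original token order. This is a conceptual model only, assuming top-1 routing and plain Python lists; it involves no GPU communication, and the function names are hypothetical rather than DeepEP's API.

```python
def dispatch(tokens, expert_ids, num_experts):
    """Group tokens by destination expert, remembering original positions."""
    buckets = [[] for _ in range(num_experts)]
    for pos, (tok, eid) in enumerate(zip(tokens, expert_ids)):
        buckets[eid].append((pos, tok))
    return buckets

def combine(expert_outputs, num_tokens):
    """Scatter expert outputs back into the original token order."""
    out = [None] * num_tokens
    for bucket in expert_outputs:
        for pos, value in bucket:
            out[pos] = value
    return out

def run_moe(tokens, expert_ids, experts):
    """Dispatch tokens, apply each expert to its bucket, then combine."""
    buckets = dispatch(tokens, expert_ids, len(experts))
    processed = [
        [(pos, experts[eid](tok)) for pos, tok in bucket]
        for eid, bucket in enumerate(buckets)
    ]
    return combine(processed, len(tokens))

# Toy experts and routing decisions invented for this sketch.
experts = [lambda x: x + 1, lambda x: x * 10]
print(run_moe([1, 2, 3, 4], [0, 1, 0, 1], experts))  # [2, 20, 4, 40]
```

In a distributed setting each bucket would be sent to the device hosting that expert, which is where all-to-all communication cost arises.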
  • 24
    The Missing Semester

    The Missing Semester of Your CS Education

    The Missing Semester is a course and repository that teaches the engineering skills often skipped in traditional computer science curricula: command-line fluency, shell scripting, editors, version control, debugging, data wrangling, and automation. It includes lecture notes, exercises, and sample solutions that encourage hands-on practice rather than passive reading. The curriculum demystifies tools like bash, vim, git, and make, showing how to combine them into efficient workflows that scale from homework to production systems. Lessons dig into practical topics such as environment management, job control, shell pipelines, profiling, and reproducibility, with an emphasis on habits that save time and prevent errors. ...
    Downloads: 1 This Week
  • 25
    Mem0

    The Memory layer for AI Agents

    ...It remembers user preferences, adapts to individual needs, and continuously improves over time. Key features include enhancing future conversations by building smarter AI that learns from every interaction, reducing LLM costs by up to 80% through intelligent data filtering, delivering more accurate and personalized AI outputs by leveraging historical context, and offering easy integration with platforms like OpenAI and Claude. Mem0 suits projects such as customer support, where chatbots remember past interactions to reduce repetition and speed up resolution times; personal AI companions that recall preferences and past conversations for more meaningful interactions; and AI agents that learn from each interaction to become more personalized and effective over time.
    Downloads: 1 This Week