Showing 184 open source projects for "data"

  • 1
    mosdepth

    fast BAM/CRAM depth calculation for WGS, exome, or targeted sequencing

    mosdepth is a fast BAM/CRAM depth calculation tool for genomic data, allowing efficient computation of sequencing coverage.
    Downloads: 0 This Week
  • 2
    Homemade Machine Learning

    Python examples of popular machine learning algorithms

    homemade-machine-learning is a repository by Oleksii Trekhleb containing Python implementations of classic machine-learning algorithms done “from scratch”, meaning you don’t rely heavily on high-level libraries but instead write the logic yourself to deepen understanding. Each algorithm is accompanied by mathematical explanations, visualizations (often via Jupyter notebooks), and interactive demos so you can tweak parameters and data and observe outcomes in real time. The purpose is pedagogical: you’ll see linear regression, logistic regression, k-means clustering, neural nets, decision trees, etc., built in Python using fundamentals like NumPy and Matplotlib, not hidden behind API calls. It is well suited for learners who want to move beyond library usage to understand how algorithms operate internally: how cost functions, gradients, updates, and predictions work.
    Downloads: 0 This Week
  • 3
    Comprehensive Python Cheatsheet

    Comprehensive Python Cheatsheet

    ...The project is designed to help developers quickly recall language features without digging through full documentation, making it especially useful for both beginners and experienced programmers. It covers a broad range of topics including data structures, control flow, functions, object-oriented programming, standard library usage, and common patterns. The repository includes both web and printable versions, allowing users to access the material in multiple formats depending on their workflow. Because it is continuously maintained, the cheatsheet reflects modern Python usage and practical conventions. ...
    Downloads: 1 This Week
  • 4
    zpdf

    Zero-copy PDF text extraction library written in Zig

    zpdf is a high-performance PDF text extraction library written in Zig that focuses on speed, low overhead, and modern parsing techniques. It leans heavily on memory-mapped file reading and zero-copy patterns where possible, so it can scan large PDFs without repeatedly copying data around in memory. The library supports streaming extraction using efficient arena allocation, making it well suited for workloads that need to process big documents quickly or in batches. It implements multiple PDF decompression filters and handles common font encoding pathways, which are essential for turning raw PDF content streams into readable text. ...
    Downloads: 2 This Week
  • 5
    Computer Science Flash Cards

    Mini website for testing general CS knowledge and reinforcing coding practice

    This repository collects concise flash cards that cover the core ideas of a traditional computer science curriculum with a focus on interview readiness. The cards distill topics like time and space complexity, classic data structures, algorithmic paradigms, operating systems, networking, and databases into short, testable prompts. They are designed for spaced-repetition style study so you can cycle frequently through fundamentals until recall feels automatic. Many cards point at canonical definitions or contrasts (e.g., stack vs. queue, BFS vs. DFS) to strengthen conceptual boundaries. ...
    Downloads: 2 This Week
  • 6
    Shumai

    Fast Differentiable Tensor Library in JavaScript & TypeScript with Bun

    Shumai is an experimental differentiable tensor library for TypeScript and JavaScript, developed by Facebook Research. It provides a high-performance framework for numerical computing and machine learning within modern JavaScript runtimes. Built on Bun and Flashlight, with ArrayFire as its numerical backend, Shumai brings GPU-accelerated tensor operations, automatic differentiation, and scientific computing tools directly to JavaScript developers. It allows seamless integration of machine...
    Downloads: 2 This Week
  • 7
    spaCy

    Industrial-strength Natural Language Processing (NLP)

    spaCy is a library built on the very latest research for advanced Natural Language Processing (NLP) in Python and Cython. From its inception it has been designed for real-world applications: building real products and gathering real insights. It comes with pretrained statistical models and word vectors, convolutional neural network models, easy deep learning integration and so much more. spaCy is the fastest syntactic parser in the world according to independent benchmarks, with...
    Downloads: 2 This Week
  • 8
    PyExcelerate

    Accelerated Excel XLSX Writing Library for Python 2/3

    Accelerated Excel XLSX writing library for Python. PyExcelerate is a Python library for writing Excel-compatible XLSX spreadsheet files, with an emphasis on speed.
    Downloads: 0 This Week
  • 9
    Impacket

    A collection of Python classes for working with network protocols

    ...It features several protocols, including Ethernet, IP, TCP, UDP, ICMP, IGMP, ARP, NMB, SMB1, SMB2, SMB3, and more. Impacket's object-oriented API makes it easy to work with deep hierarchies of protocols. It can construct packets from scratch, as well as parse them from raw data.
    Downloads: 1 This Week
  • 10
    'Lightweight' GAN

    Implementation of 'lightweight' GAN, proposed in ICLR 2021

    ...Quoting the one-line summary: "converge on single gpu with few hours' training, on 1024 resolution sub-hundred images". Augmentation is essential for Lightweight GAN to work effectively in a low-data setting. You can preview how your images will be augmented before they pass into the neural network (if you use augmentation). The general recommendation is to use as many augmentations suited to your data as possible, then, after some training, disable the most destructive ones. You can turn on automatic mixed precision with the --amp flag. ...
    Downloads: 0 This Week
  • 11
    AudioCraft

    AudioCraft is a library for audio processing and generation

    ...The repo provides inference scripts, checkpoints, and simple Python APIs so you can generate clips from prompts or incorporate the models into applications. It also contains training code and recipes, so researchers can fine-tune on custom data or explore new objectives without building infrastructure from scratch. Example notebooks, CLI tools, and audio utilities help with prompt design, conditioning on reference audio, and post-processing to produce ready-to-share outputs.
    Downloads: 2 This Week
  • 12
    go1pylib

    go1pylib is a Python library designed to control the Go1 robot

    go1pylib is a Python library designed to control the Go1 robot by Unitree Robotics. It provides an easy-to-use interface for robot movement, state management, collision avoidance, battery monitoring, and MQTT communication. Ideal for research and robotics development.
    Downloads: 0 This Week
  • 13
    LangExtract

    A Python library for extracting structured information

    LangExtract is a Python library developed by Google that leverages large language models (LLMs) to extract structured information from unstructured text—such as clinical notes, research papers, or literary works—based on user-defined instructions. It is designed to transform free-form text into reliable, schema-constrained data while maintaining traceability back to the source material. Each extracted entity is precisely grounded in its original context, allowing visual inspection and validation via automatically generated interactive HTML visualizations. LangExtract supports a wide range of models, including Google Gemini, OpenAI GPT, and local LLMs via Ollama, making it adaptable to different deployment environments and compliance needs. ...
    Downloads: 1 This Week
  • 14
    DeepEP

    DeepEP: an efficient expert-parallel communication library

    ...Its core role is to implement high-throughput, low-latency all-to-all GPU communication kernels, which handle the dispatching of tokens to different experts (or shards) and then combining expert outputs back into the main data flow. Because MoE architectures require routing inputs to different experts, communication overhead can become a bottleneck — DeepEP addresses that by providing optimized GPU kernels and efficient dispatch/combining logic. The library also supports low-precision operations (such as FP8) to reduce memory and bandwidth usage during communication. ...
    Downloads: 1 This Week
  • 15
    peepDB

    CLI tool and Python library to inspect databases fast

    peepDB is an open-source command-line tool and Python library designed for developers and database administrators who need a fast and efficient way to inspect their database tables without writing SQL queries. With support for MySQL, PostgreSQL, and MariaDB, peepDB is lightweight, secure, and incredibly easy to use.
    Downloads: 0 This Week
  • 16
    borb

    borb is a library for reading, creating and manipulating PDF files

    borb is a pure Python library to read, write, and manipulate PDF documents. It represents a PDF document as a JSON-like data structure of nested lists, dictionaries, and primitives (numbers, strings, booleans, etc.). This is currently a one-man project, so the focus will always be on supporting common use cases over rare ones.
    Downloads: 0 This Week
  • 17
    Kornia

    Open Source Differentiable Computer Vision Library

    Kornia is a differentiable computer vision library for PyTorch. It consists of a set of routines and differentiable modules to solve generic computer vision problems. At its core, the package uses PyTorch as its main backend both for efficiency and to take advantage of reverse-mode auto-differentiation to define and compute the gradient of complex functions. Inspired by existing packages, this library is composed of a subset of packages containing operators that can be inserted within...
    Downloads: 1 This Week
  • 18
    Stanza

    Stanford NLP Python library for many human languages

    ...The toolkit is designed to be parallel among more than 70 languages, using the Universal Dependencies formalism. Stanza is built with highly accurate neural network components that also enable efficient training and evaluation with your own annotated data.
    Downloads: 1 This Week
  • 19
    PyOpenCL

    OpenCL integration for Python, plus shiny features

    ...PyOpenCL also includes convenient features for managing memory, compiling kernels, and interfacing with NumPy, making it a preferred choice in scientific computing, data analysis, and machine learning workflows that demand acceleration.
    Downloads: 0 This Week
  • 20
    Django Notebook

    Django + shell_plus + Jupyter notebooks made easy

    Django + shell_plus + Jupyter notebooks made easy. A Jupyter notebook with access to objects from the Django ORM is a powerful tool to introspect data and run ad-hoc queries. Built-in integration with the imported objects from django-extensions shell_plus. Saves the state between sessions so you don't need to remember what you did. Inheritance diagrams on any object, including ORM models.
    Downloads: 0 This Week
  • 21
    Anomalib

    An anomaly detection library comprising state-of-the-art algorithms

    ...Anomalib emphasizes flexibility and reproducibility: you can use its simple APIs to plug in custom models, track experiments, tune hyperparameters, and generate visualizations that highlight anomalous regions. Its design supports unsupervised or semi-supervised paradigms, making it especially powerful for scenarios where only “normal” data is readily available and defects must be detected without exhaustive labeling. Combined with its CLI and integration with optimization tools like OpenVINO, it’s suitable for both research and edge deployment tasks.
    Downloads: 0 This Week
  • 22
    claude-code-transcripts

    Tools for publishing transcripts for Claude Code sessions

    claude-code-transcripts is a command-line utility that takes session files exported from Claude Code (in JSON or JSONL format) and turns them into clean, navigable HTML transcripts that can be viewed in any modern web browser. It is designed to make the often dense and verbose outputs from AI coding sessions easier to read, share, and archive by breaking conversations into paginated, annotated pages with navigable timelines of prompts and responses. Users can run this tool locally or fetch...
    Downloads: 0 This Week
  • 23
    Penzai

    A JAX research toolkit to build, edit, & visualize neural networks

    Penzai, developed by Google DeepMind, is a JAX-based library for representing, visualizing, and manipulating neural network models as functional pytree data structures. It is designed to make machine learning research more interpretable and interactive, particularly for tasks like model surgery, ablation studies, architecture debugging, and interpretability research. Unlike conventional neural network libraries, Penzai exposes the full internal structure of models, enabling fine-grained inspection and modification after training. ...
    Downloads: 0 This Week
  • 24
    Prompt Engineering Interactive Tutorial

    Anthropic's Interactive Prompt Engineering Tutorial

    Prompt-eng-interactive-tutorial is a comprehensive, hands-on tutorial that teaches the craft of prompt engineering with Claude through guided, executable lessons. It starts with the anatomy of a good prompt and moves into techniques that deliver the “80/20” gains—separating instructions from data, specifying schemas, and setting evaluation criteria. The course leans heavily on realistic failure modes (ambiguity, hallucination, brittle instructions) and shows how to iteratively debug prompts the way you would debug code. Lessons include building prompts from scratch for common tasks like extraction, classification, transformation, and step-by-step reasoning, with checkpoints that let you compare your outputs against solid baselines. ...
    Downloads: 0 This Week
  • 25
    PyG

    Graph Neural Network Library for PyTorch

    PyG (PyTorch Geometric) is a library built upon PyTorch to easily write and train Graph Neural Networks (GNNs) for a wide range of applications related to structured data. It consists of various methods for deep learning on graphs and other irregular structures, also known as geometric deep learning, from a variety of published papers. In addition, it consists of easy-to-use mini-batch loaders for operating on many small and single giant graphs, multi-GPU support, DataPipe support, distributed graph learning via Quiver, a large number of common benchmark datasets (based on simple interfaces to create your own), the GraphGym experiment manager, and helpful transforms, both for learning on arbitrary graphs as well as on 3D meshes or point clouds. ...
    Downloads: 0 This Week