Showing 52 open source projects for "dataset"

  • 1
    Passport Index Dataset

    Passport Index 2023: visa requirements for 199 countries, in .csv

    There are 6 datasets with identical visa requirements data. Three datasets are matrix and three are long (tidy) formats. Each comes in 3 versions: with country codes as specified in ISO-2 (two-letter codes), ISO-3 (three-letter codes), and full country names from no particular standard. In distance matrices (files with matrix in the filename), the first column represents a passport (=from), each remaining column represents a destination (=to). Files in tidy format (with tidy in filename)...
    Downloads: 2 This Week
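
The relationship between the matrix and tidy layouts described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's own tooling: the function name `matrix_to_tidy`, the placeholder codes `AA` and `BB`, the cell values, and the assumed tidy column order (passport, destination, requirement) are all hypothetical. Only the matrix layout (first column is the passport, each remaining column is a destination) comes from the description.

```python
import csv
import io


def matrix_to_tidy(matrix_csv: str) -> list[tuple[str, str, str]]:
    """Unpivot a passport-by-destination matrix into (from, to, value) rows."""
    reader = csv.reader(io.StringIO(matrix_csv))
    header = next(reader)
    destinations = header[1:]  # every column after the first is a destination
    rows = []
    for record in reader:
        passport, values = record[0], record[1:]  # first column is the passport
        for destination, requirement in zip(destinations, values):
            rows.append((passport, destination, requirement))
    return rows


# Hypothetical two-country sample; codes and cell values are made up.
SAMPLE = "Passport,AA,BB\nAA,self,visa free\nBB,visa required,self\n"
tidy_rows = matrix_to_tidy(SAMPLE)
for row in tidy_rows:
    print(row)
```

Each matrix cell becomes one tidy row, so an N-by-N matrix yields N² rows.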
  • 2
    CO3D (Common Objects in 3D)

    Tooling for the Common Objects In 3D dataset

    CO3Dv2 (Common Objects in 3D, version 2) is a large-scale 3D computer vision dataset and toolkit from Facebook Research designed for training and evaluating category-level 3D reconstruction methods using real-world data. It builds upon the original CO3Dv1 dataset, expanding both scale and quality—featuring 2× more sequences and 4× more frames, with improved image fidelity, more accurate segmentation masks, and enhanced annotations for object-centric 3D reconstruction. ...
    Downloads: 2 This Week
  • 3
    all AI news

    A list of online news & info sources in the AI/ML/Data Science space

    ...Overall, it provides a foundational dataset for tracking AI industry trends and updates.
    Downloads: 4 This Week
  • 4
    NYC Taxi Data

    Import public NYC taxi and for-hire vehicle (Uber, Lyft)

    ...It also contains example analyses (spatial and temporal visualizations such as maps, time-series plots, and hotspot detection) that highlight patterns of demand, peak times, and geospatial distributions. The repository is often used as a benchmark dataset and as an example for teaching and demonstration in the data science and urban analytics communities.
    Downloads: 1 This Week
  • 5
    ARC-AGI

    The Abstraction and Reasoning Corpus

    ...The dataset is structured as grid-based puzzles, where each task requires understanding transformations such as symmetry, counting, or spatial manipulation. Unlike traditional machine learning benchmarks, ARC emphasizes generalization and reasoning over statistical pattern recognition, making it particularly challenging for current AI systems.
    Downloads: 0 This Week
  • 6
    BIMserver

    The open source BIMserver platform

    ...The main advantage of this approach is the ability to query, merge and filter the BIM model and generate IFC output (i.e. files) on the fly. Thanks to its multi-user support, multiple people can work on their own part of the dataset, while the complete dataset is updated on the fly. Other users can get notifications when the model (or a part of it) is updated.
    Downloads: 1 This Week
  • 7
    emojilib

    Emoji keyword library

    Make emoji searchable with this keyword library. If you are looking for the Unicode emoji dataset, including version, grouping, ordering, and skin tone support flag, check out unicode-emoji-json.
    Downloads: 3 This Week
  • 8
    React Chart.js

    React components for Chart.js, the most popular charting library

    ...Breaking backwards compatibility was necessary to improve performance, offer new features, and improve maintainability, but we aimed to do so only when it was worth the benefit. If your datasets lack a unique key, any event that causes the chart to re-render, such as a hover tooltip, will copy the first dataset over the other datasets, merging your lines and bars together. To track changes in the dataset series, the library needs a key to be specified; if none is found, it cannot tell the datasets apart while updating. Specify a different property to be used as the key by passing a datasetIdKey prop to your chart component.
    Downloads: 0 This Week
  • 9
    SponsorBlock

    Skip YouTube video sponsors (browser extension)

    SponsorBlock is an open-source, crowdsourced browser extension and open API for skipping sponsor segments in YouTube videos. Users submit the times at which a sponsor segment occurs from the extension, and the extension automatically skips the sponsors it knows about using a privacy-preserving query system. It also supports skipping other categories, such as intros, outros, and reminders to subscribe, and skipping to the point with highlights. The extension also features an upvote/downvote system with a weighted...
    Downloads: 25 This Week
  • 10
    UCO3D

    Uncommon Objects in 3D dataset

    uCO3D is a large-scale 3D vision dataset and toolkit centered on turn-table videos of everyday objects drawn from the LVIS taxonomy. It provides about 170,000 full videos of object instances rather than still frames, along with per-video annotations including object masks, calibrated camera poses, and multiple flavors of point clouds. Each sequence also ships with a precomputed 3D Gaussian Splat reconstruction, enabling fast, differentiable rendering workflows and modern implicit/point-based modeling experiments. ...
    Downloads: 0 This Week
  • 11
    Petastorm

    Petastorm library enables single machine or distributed training

    ...This library enables single-machine or distributed training and evaluation of deep learning models directly from datasets in Apache Parquet format. Petastorm supports popular Python-based machine learning (ML) frameworks such as TensorFlow, PyTorch, and PySpark. It can also be used from pure Python code. A dataset created using Petastorm is stored in Apache Parquet format. On top of a Parquet schema, Petastorm also stores higher-level schema information that makes multidimensional arrays a native part of a Petastorm dataset. Petastorm supports extensible data codecs, which let a user apply one of the standard data compressions (jpeg, png) or implement their own.
    Downloads: 0 This Week
  • 12
    Karpathy-Inspired Claude Code Guidelines

    A single CLAUDE.md file to improve Claude Code behavior

    Karpathy-Inspired Claude Code Guidelines is a curated learning and experimentation repository inspired by the work and teaching philosophy of Andrej Karpathy, designed to help learners build practical competence in deep learning, neural networks, and AI infrastructure. The project organizes a progressive path through exercises, notebooks, code examples, and practical mini-projects that echo Karpathy’s approach to “learning by doing,” where students build core concepts from first principles...
    Downloads: 12 This Week
  • 13
    v2ray-rules-dat

    V2Ray routing rules file enhanced version, which can replace V2Ray

    v2ray-rules-dat is a repository that compiles and distributes enhanced rule data (domain lists, geo-IP/geo-domain data, block/proxy/detect lists) intended for use with tools like V2Ray, Xray-core, and similar network/proxy frameworks. The dataset serves as an alternative or supplement to the official geoip/geosite data files, often providing more up-to-date, community-curated entries that enable better routing, blocking, or traffic management when using those proxy tools. The repository is updated regularly (synced weekly with upstream) and provides releases containing comprehensive data files (e.g. geoip.dat, geosite.dat, plus multiple .txt rule lists) along with checksums for integrity verification. ...
    Downloads: 19 This Week
  • 14
    User Agents

    A JavaScript library for generating random user agents with data

    ...Unlike simpler random user agent generators, it uses frequency-weighted datasets to ensure that generated values reflect how browsers are actually used in the wild. The dataset is updated automatically on a daily basis, ensuring that generated user agents remain current and relevant over time. In addition to user agent strings, the library can produce detailed browser fingerprint data such as screen size, platform, connection type, and device category. It also includes flexible filtering capabilities that allow developers to generate user agents matching specific criteria such as device type, operating system, or browser version.
    Downloads: 5 This Week
  • 15
    LLM Datasets

    Curated list of datasets and tools for post-training

    ...Quality is a recurring theme: examples and utilities help filter low-value samples, enforce length limits, and split train/validation consistently so results are comparable. Licensing and provenance are surfaced to encourage compliant usage and to guide dataset selection in commercial settings. For practitioners, the repo is a practical “starting pantry” that accelerates experimentation and helps keep data wrangling from dominating the project timeline.
    Downloads: 2 This Week
  • 16
    Recursive Language Models

    General plug-and-play inference library for Recursive Language Models

    ...It provides a consistent API that abstracts away many of the repetitive engineering patterns in RL research and application work, letting developers focus on modeling, experimentation, and fine-tuning rather than infrastructure plumbing. Within the framework, you can define custom agents, environments, policy networks, and reward structures while leveraging built-in dataset utilities, logging, and checkpointing for reproducible experiments. RLM also includes integration with popular simulation environments and benchmark suites, giving researchers a ready-made playground for algorithm comparison and performance tracking.
    Downloads: 0 This Week
  • 17
    fastMRI

    A large open dataset + tools to speed up MRI scans using ML

    fastMRI is a large-scale collaborative research project by Facebook AI Research (FAIR) and NYU Langone Health that explores how deep learning can accelerate magnetic resonance imaging (MRI) acquisition without compromising image quality. By enabling reconstruction of high-fidelity MR images from significantly fewer measurements, fastMRI aims to make MRI scanning faster, cheaper, and more accessible in clinical settings. The repository provides an open-source PyTorch framework with data...
    Downloads: 0 This Week
  • 18
    Jraph

    A Graph Neural Network Library in Jax

    Jraph (pronounced “giraffe”) is a lightweight JAX library developed by Google DeepMind for building and experimenting with graph neural networks (GNNs). It provides an efficient and flexible framework for representing, manipulating, and training models on graph-structured data. The core of Jraph is the GraphsTuple data structure, which enables users to define graphs with arbitrary node, edge, and global attributes, and to batch variable-sized graphs efficiently for JAX’s just-in-time...
    Downloads: 0 This Week
  • 19
    Augmentor.jl

    A fast image augmentation library in Julia for machine learning

    A fast library for increasing the number of training images by applying various transformations. Augmentor is a real-time image augmentation library designed to make artificial dataset enlargement more convenient, less error-prone, and easier to reproduce. It lets the user build a stochastic image-processing pipeline (or simply an augmentation pipeline) using image operations as building blocks. In other words, an augmentation pipeline is little more than a sequence of operations whose parameters can (but need not) be random variables.
    Downloads: 0 This Week
  • 20
    SVoice (Speech Voice Separation)

    We provide a PyTorch implementation of the paper Voice Separation

    ...Separate models are trained for different speaker counts, and the largest-capacity model dynamically determines the actual number of speakers in a mixture. The repository includes all necessary scripts for training, dataset preparation, distributed training, evaluation, and audio separation.
    Downloads: 1 This Week
  • 21
    TensorFlow Examples

    TensorFlow Tutorial and Examples for Beginners (support TF v1 & v2)

    ...For clarity and educational value, each example is accompanied by explanatory comments or markdown cells to illustrate what the code does and why — a design that makes it especially suitable for self-learners or students following along with real data. Besides raw implementations, the repo often shows best practices using higher-level constructs (e.g. dataset pipelines, estimators, layers) which reflect modern TensorFlow workflows rather than only textbook-style code.
    Downloads: 0 This Week
  • 22
    Spleeter

    Deezer source separation library including pretrained models

    Spleeter is the Deezer source separation library, written in Python with TensorFlow and shipped with pretrained models. It makes it easy to train music source separation models (assuming you have a dataset of isolated sources) and provides already-trained, state-of-the-art models for performing various flavours of separation. The 2-stem and 4-stem models achieve state-of-the-art performance on the musdb dataset. Spleeter is also very fast: it can separate audio files into 4 stems 100x faster than real time when run on a GPU. ...
    Downloads: 89 This Week
  • 23
    ReinventCommunity

    Jupyter Notebook tutorials for REINVENT 3.2

    This repository is a collection of useful Jupyter notebooks, code snippets, and example JSON files illustrating the use of REINVENT 3.2.
    Downloads: 0 This Week
  • 24
    Nerfies

    This is the code for Deformable Neural Radiance Fields

    ...The training pipeline handles imperfect captures by modeling camera poses, exposure variations, and background segmentation, producing stable geometry and appearance. A set of utilities manages dataset preparation, pose estimation, and checkpoints so researchers can reproduce results on their own footage. The work sits at the intersection of graphics and vision, showing how learned volumetric rendering can handle human motion without dense markers or studio rigs.
    Downloads: 0 This Week
  • 25
    YOLO ROS

    YOLO ROS: Real-Time Object Detection for ROS

    This is a ROS package developed for object detection in camera images. You only look once (YOLO) is a state-of-the-art, real-time object detection system. This package lets you run YOLO (V3) on GPU and CPU. The pre-trained convolutional neural network can detect the classes from the VOC and COCO datasets, or you can create a network with your own detection objects. The YOLO packages have been tested under ROS Noetic and...
    Downloads: 0 This Week