A powerful tool for creating datasets for LLM fine-tuning
This dataset code generates mathematical question-and-answer pairs
Unsplash images made available for research and machine learning
Photorealistic Synthetic Dataset for Holistic Indoor Scene
JSON to DataSet and DataSet to JSON converter for Delphi and Lazarus
Passport Index 2023: visa requirements for 199 countries, in .csv
The ExDARK dataset is the largest collection of low-light images
The first large-scale public benchmark dataset for image harmonization
Fluid, elastic data abstraction and acceleration for BigData/AI apps
GeoIP lookup over DAG-CBOR dataset loaded from IPFS
Framework to easily create LLM-powered bots over any dataset
Dataset Management Framework, a Python library and a CLI tool to build
A dataset consisting of 15,140 ChatGPT prompts from Reddit
Julia implementation of a Parquet columnar file format reader
Tooling for the Common Objects In 3D dataset
Unified open dataset enabling cross-embodiment learning for robotics
Data and tools for generating and inspecting OLMo pre-training data
Easily turn large sets of image URLs into an image dataset
10x faster string search, split, sort, and shuffle for long strings
Hub of ready-to-use datasets for ML models
Save and load data in the HDF5 file format from Julia
An in-memory database that persists on disk
An open source implementation of CLIP
Import public NYC taxi and for-hire vehicle (Uber, Lyft)
A tool for semi-automatic cell type classification, harmonization