A powerful tool for creating datasets for LLM fine-tuning
This dataset code generates mathematical question and answer pairs
Unsplash images made available for research and machine learning
Photorealistic Synthetic Dataset for Holistic Indoor Scene
JSON to DataSet and DataSet to JSON converter for Delphi and Lazarus
Passport Index 2023: visa requirements for 199 countries, in .csv
ExDARK dataset is the largest collection of low-light images
The first large-scale public benchmark dataset for image harmonization
Fluid, elastic data abstraction and acceleration for BigData/AI apps
GeoIP lookup over DAG-CBOR dataset loaded from IPFS
Framework to easily create LLM-powered bots over any dataset
A dataset consisting of 15,140 ChatGPT prompts from Reddit
Julia implementation of Parquet columnar file format reader
Dataset Management Framework, a Python library and a CLI tool to build
Tooling for the Common Objects In 3D dataset
Unified open dataset enabling cross-embodiment learning for robotics
Data and tools for generating and inspecting OLMo pre-training data
Hub of ready-to-use datasets for ML models
Image polygonal annotation with Python
A list of online news & info sources in the AI/ML/Data Science space
Easily turn large sets of image URLs into an image dataset
An in-memory database that persists on disk
Synthetic data curation for post-training and data extraction
Save and load data in the HDF5 file format from Julia
An open source implementation of CLIP