Name                      Modified    Size    Downloads / Week
4.8.0 source code.tar.gz  2026-03-16  2.2 MB
4.8.0 source code.zip     2026-03-16  2.4 MB
README.md                 2026-03-16  2.0 kB
Totals: 3 items                       4.5 MB  0

Dataset Features

```python
from datasets import load_dataset

# load raw data from a Storage Bucket on HF
ds = load_dataset("buckets/username/data-bucket", data_files=["*.jsonl"])

# or manually, using hf:// paths
ds = load_dataset("json", data_files=["hf://buckets/username/data-bucket/*.jsonl"])

# process, filter
ds = ds.map(...).filter(...)

# publish the AI-ready dataset
ds.push_to_hub("username/my-dataset-ready-for-training")
```

This also fixes a segfault in multiprocessed push_to_hub on macOS (it now uses spawn instead of fork), and it bumps the dill and multiprocess versions to support Python 3.14.
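To illustrate the spawn-versus-fork distinction mentioned above, here is a minimal sketch using only the standard library `multiprocessing` module. It is not the code used inside `datasets` (which relies on the `multiprocess` package); the helper names `get_spawn_context`, `square`, and `parallel_squares` are hypothetical and exist only for this example.

```python
import multiprocessing as mp


def get_spawn_context():
    # Request the "spawn" start method explicitly instead of relying on the
    # platform default. Spawned workers start from a fresh interpreter
    # rather than inheriting a forked copy of the parent's memory.
    return mp.get_context("spawn")


def square(x):
    # Worker functions must live at module top level so that spawned
    # processes can import and unpickle them.
    return x * x


def parallel_squares(values, num_proc=2):
    # Hypothetical helper: map `square` over `values` in spawned workers.
    ctx = get_spawn_context()
    with ctx.Pool(processes=num_proc) as pool:
        return pool.map(square, values)
```

With spawn, call `parallel_squares([1, 2, 3])` from a script under an `if __name__ == "__main__":` guard, because each worker re-imports the main module on startup.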

  • Datasets streaming iterable packaged improvements and fixes by @Michael-RDev in https://github.com/huggingface/datasets/pull/8068
  • added max_shard_size to IterableDataset.push_to_hub
  • more arrow-native iterable operations for IterableDataset
  • better support of glob patterns in archives, e.g. zip://*.jsonl::hf://datasets/username/dataset-name/data.zip
  • fixes for to_pandas, videofolder, load_dataset_builder kwargs
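The archive glob pattern in the list above uses `::` to chain filesystem layers, following the fsspec URL-chaining convention: the left part addresses files inside the archive, the right part locates the archive itself. The sketch below only shows how such a URL reads; `parse_chained_url` is a hypothetical helper, not part of `datasets` or fsspec, and it does not open any files.

```python
def parse_chained_url(url: str) -> list[tuple[str, str]]:
    """Split a chained URL into (protocol, path) layers, outermost first."""
    layers = []
    for part in url.split("::"):
        protocol, sep, path = part.partition("://")
        if not sep:
            # No "://" found: keep the part as a bare path with an empty
            # protocol (a simplification made for this sketch).
            protocol, path = "", part
        layers.append((protocol, path))
    return layers


layers = parse_chained_url(
    "zip://*.jsonl::hf://datasets/username/dataset-name/data.zip"
)
for protocol, path in layers:
    print(f"{protocol}: {path}")
# zip: *.jsonl
# hf: datasets/username/dataset-name/data.zip
```

Read left to right, the example selects every `.jsonl` member (`zip` layer) of the `data.zip` archive stored in the dataset repository (`hf` layer).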

What's Changed

New Contributors

Full Changelog: https://github.com/huggingface/datasets/compare/4.7.0...4.8.0

Source: README.md, updated 2026-03-16