| Name | Modified | Size | Downloads / Week |
|---|---|---|---|
| 4.8.0 source code.tar.gz | 2026-03-16 | 2.2 MB | |
| 4.8.0 source code.zip | 2026-03-16 | 2.4 MB | |
| README.md | 2026-03-16 | 2.0 kB | |
| Totals: 3 Items | | 4.5 MB | 0 |
Dataset Features
- Read (and write) from HF Storage Buckets: load raw data, process and save to Dataset Repos by @lhoestq in https://github.com/huggingface/datasets/pull/8064
```python
from datasets import load_dataset

# load raw data from a Storage Bucket on HF
ds = load_dataset("buckets/username/data-bucket", data_files=["*.jsonl"])
# or manually, using hf:// paths
ds = load_dataset("json", data_files=["hf://buckets/username/data-bucket/*.jsonl"])

# process, filter
ds = ds.map(...).filter(...)

# publish the AI-ready dataset
ds.push_to_hub("username/my-dataset-ready-for-training")
```
This also fixes a segfault in multiprocessed `push_to_hub` on macOS: it now uses `spawn` instead of `fork`.
It also bumps the `dill` and `multiprocess` versions to support Python 3.14.
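The difference between the `fork` and `spawn` start methods can be illustrated with Python's standard `multiprocessing` module. This is a minimal sketch, not the code `datasets` uses internally (the library relies on `multiprocess`); the helper name `spawn_pool_map` is hypothetical.

```python
import math
import multiprocessing as mp


def spawn_pool_map(func, values, processes=2):
    """Map func over values in worker processes started with "spawn".

    Hypothetical helper for illustration. With "spawn", each worker is a
    fresh interpreter rather than a fork() of the parent, so it does not
    start from a copy of the parent's memory (e.g. locks held by other
    threads, or native-library state that is not fork-safe).
    func must be importable in the child: a top-level function or a
    builtin/stdlib callable.
    """
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=processes) as pool:
        return pool.map(func, values)


if __name__ == "__main__":
    # With "spawn", children re-import the main script; this guard keeps
    # them from starting pools of their own.
    print(spawn_pool_map(math.factorial, [3, 4, 5]))  # [6, 24, 120]
```

The trade-off is start-up cost: spawned workers re-import modules and receive their function and arguments by pickling, whereas forked workers begin from a copy of the parent process.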
- Datasets streaming iterable packaged improvements and fixes by @Michael-RDev in https://github.com/huggingface/datasets/pull/8068
  - added `max_shard_size` to `IterableDataset.push_to_hub`
  - more Arrow-native iterable operations for `IterableDataset`
  - better support for glob patterns in archives, e.g. `zip://*.jsonl::hf://datasets/username/dataset-name/data.zip`
  - fixes for `to_pandas`, `videofolder`, and `load_dataset_builder` kwargs
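In a chained archive URL such as `zip://*.jsonl::hf://datasets/username/dataset-name/data.zip`, the part before `::` is the pattern inside the archive and the part after it is where the archive itself lives. The sketch below illustrates that split and a glob over a local zip using only the standard library; it is an assumption-laden illustration, not how `datasets` or `fsspec` resolve these URLs, and `split_archive_url` and `glob_in_zip` are hypothetical names. It does not fetch `hf://` paths.

```python
import fnmatch
import os
import tempfile
import zipfile


def split_archive_url(url: str) -> tuple[str, str, str]:
    """Split "zip://<pattern>::<archive_url>" into its three parts.

    Returns (protocol, pattern inside the archive, URL of the archive).
    """
    inner, sep, archive_url = url.partition("::")
    protocol, sep2, pattern = inner.partition("://")
    if not sep or not sep2:
        raise ValueError(f"not a chained archive URL: {url!r}")
    return protocol, pattern, archive_url


def glob_in_zip(zip_path: str, pattern: str) -> list[str]:
    """Return members of a local zip archive whose paths match pattern.

    Matching is done segment by segment, so "*" never crosses a "/"
    (recursive "**" patterns are not supported in this sketch).
    """
    pattern_parts = pattern.split("/")
    with zipfile.ZipFile(zip_path) as zf:
        # Skip explicit directory entries, which end with "/".
        names = [n for n in zf.namelist() if not n.endswith("/")]
    matches = []
    for name in names:
        parts = name.split("/")
        if len(parts) == len(pattern_parts) and all(
            fnmatch.fnmatchcase(part, pat) for part, pat in zip(parts, pattern_parts)
        ):
            matches.append(name)
    return sorted(matches)


if __name__ == "__main__":
    print(split_archive_url("zip://*.jsonl::hf://datasets/username/dataset-name/data.zip"))
    # ('zip', '*.jsonl', 'hf://datasets/username/dataset-name/data.zip')
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.zip")
        with zipfile.ZipFile(path, "w") as zf:
            for name in ["a.jsonl", "b.jsonl", "sub/c.jsonl", "notes.txt"]:
                zf.writestr(name, '{"x": 1}\n')
        print(glob_in_zip(path, "*.jsonl"))  # ['a.jsonl', 'b.jsonl']
```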
What's Changed
- fix reshard_data_sources by @lhoestq in https://github.com/huggingface/datasets/pull/8061
- Improve error message for invalid data_files pattern format by @kushalkkb in https://github.com/huggingface/datasets/pull/8060
- fix null filling in missing jsonl columns by @lhoestq in https://github.com/huggingface/datasets/pull/8069
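The null-filling fix concerns JSONL files where some rows lack keys that other rows have. Assuming the intended behavior is that each missing key becomes a null in that row, the plain-Python sketch below shows the resulting columns. It is only an illustration: `read_jsonl_with_nulls` is a hypothetical helper, and the library's JSON loader is Arrow-based rather than built this way.

```python
import json


def read_jsonl_with_nulls(lines: list[str]) -> dict[str, list]:
    """Parse JSONL lines into columns, filling absent keys with None.

    Hypothetical helper. Column order follows the first appearance of
    each key across all rows; blank lines are ignored.
    """
    rows = [json.loads(line) for line in lines if line.strip()]
    columns: dict[str, list] = {}
    # First pass: collect the union of keys, in first-seen order.
    for row in rows:
        for key in row:
            columns.setdefault(key, [])
    # Second pass: one value per row, None where the row lacks the key.
    for key, values in columns.items():
        values.extend(row.get(key) for row in rows)
    return columns


if __name__ == "__main__":
    lines = ['{"a": 1, "b": "x"}', '{"a": 2}', '{"b": "y", "c": true}']
    print(read_jsonl_with_nulls(lines))
    # {'a': [1, 2, None], 'b': ['x', None, 'y'], 'c': [None, None, True]}
```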
New Contributors
- @kushalkkb made their first contribution in https://github.com/huggingface/datasets/pull/8060
- @Michael-RDev made their first contribution in https://github.com/huggingface/datasets/pull/8068
Full Changelog: https://github.com/huggingface/datasets/compare/4.7.0...4.8.0