| Name | Modified | Size | Downloads / Week |
|---|---|---|---|
| 4.3.0 source code.tar.gz | 2025-10-23 | 2.1 MB | |
| 4.3.0 source code.zip | 2025-10-23 | 2.2 MB | |
| README.md | 2025-10-23 | 1.7 kB | |
| Totals: 3 Items | | 4.3 MB | 2 |
Dataset Features
Enable large-scale distributed dataset streaming:
- Keep hffs cache in workers when streaming by @lhoestq in https://github.com/huggingface/datasets/pull/7820
- Retry open hf file by @lhoestq in https://github.com/huggingface/datasets/pull/7822
These improvements require `huggingface_hub>=0.36.0` to take full effect.
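Because the streaming improvements depend on the installed `huggingface_hub` version, it can help to check that version before launching a large streaming job. The following is a minimal sketch using only the standard library. The helper names (`release_tuple`, `meets_minimum`, `hub_supports_streaming_improvements`) are hypothetical and not part of `datasets` or `huggingface_hub`, and the parsing is deliberately simplified: it compares only the leading numeric components, so a pre-release such as `0.36.0rc1` is treated the same as `0.36.0`.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum huggingface_hub version stated in these release notes.
MIN_HUB_VERSION = "0.36.0"


def release_tuple(version_string):
    """Return the leading numeric components of a version string as ints.

    Simplified on purpose: "0.36.0rc1" becomes (0, 36, 0).
    """
    parts = []
    for piece in version_string.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum=MIN_HUB_VERSION):
    """Return True if `installed` is at least `minimum` (numeric parts only)."""
    a, b = release_tuple(installed), release_tuple(minimum)
    # Pad the shorter tuple with zeros so "0.36" compares equal to "0.36.0".
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def hub_supports_streaming_improvements():
    """Return True if the installed huggingface_hub meets the stated minimum."""
    try:
        installed = version("huggingface_hub")
    except PackageNotFoundError:
        # huggingface_hub is not installed in this environment.
        return False
    return meets_minimum(installed)


if __name__ == "__main__":
    print("huggingface_hub >=", MIN_HUB_VERSION, ":",
          hub_supports_streaming_improvements())
```

If the `packaging` library is available, `packaging.version.Version` handles pre-release and development versions more rigorously than this hand-rolled comparison.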
What's Changed
- fix conda deps by @lhoestq in https://github.com/huggingface/datasets/pull/7810
- Add pyarrow's binary view to features by @delta003 in https://github.com/huggingface/datasets/pull/7795
- Fix polars cast column image by @CloseChoice in https://github.com/huggingface/datasets/pull/7800
- Allow streaming hdf5 files by @lhoestq in https://github.com/huggingface/datasets/pull/7814
- Fix batch_size default description in to_polars docstrings by @albertvillanova in https://github.com/huggingface/datasets/pull/7824
- docs: document_dataset PDFs & OCR by @ethanknights in https://github.com/huggingface/datasets/pull/7812
- Add custom fingerprint support to `from_generator` by @simonreise in https://github.com/huggingface/datasets/pull/7533
- picklable batch_fn by @lhoestq in https://github.com/huggingface/datasets/pull/7826
New Contributors
- @delta003 made their first contribution in https://github.com/huggingface/datasets/pull/7795
- @CloseChoice made their first contribution in https://github.com/huggingface/datasets/pull/7800
- @ethanknights made their first contribution in https://github.com/huggingface/datasets/pull/7812
- @simonreise made their first contribution in https://github.com/huggingface/datasets/pull/7533
Full Changelog: https://github.com/huggingface/datasets/compare/4.2.0...4.3.0