| Name | Modified | Size |
| --- | --- | --- |
| 4.2.0 source code.tar.gz | 2025-10-09 | 2.1 MB |
| 4.2.0 source code.zip | 2025-10-09 | 2.2 MB |
| README.md | 2025-10-09 | 2.1 kB |

Totals: 3 items, 4.3 MB, 0 downloads per week

Dataset Features

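* `stopping_strategy="all_exhausted_without_replacement"` option when interleaving datasets

```python
from datasets import interleave_datasets
ds = interleave_datasets(datasets, stopping_strategy="all_exhausted_without_replacement")
```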
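To make the strategy name concrete, here is a plain-Python sketch of round-robin interleaving without replacement: each source is read at most once, exhausted sources are dropped, and iteration stops once every source is exhausted. The helper `interleave_without_replacement` is hypothetical and only illustrates the idea. It is not the `datasets` implementation, and the library's exact ordering may differ.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def interleave_without_replacement(sources: List[Iterable[T]]) -> Iterator[T]:
    """Hypothetical helper: round-robin over sources, never re-reading one."""
    iterators = [iter(source) for source in sources]
    while iterators:
        still_active = []
        for it in iterators:
            try:
                yield next(it)
            except StopIteration:
                continue  # this source is exhausted; drop it from later rounds
            still_active.append(it)
        iterators = still_active


# Three toy "datasets" of different lengths.
mixed = list(interleave_without_replacement([[0, 1, 2], [10, 11], [20]]))
print(mixed)  # [0, 10, 20, 1, 11, 2]
```

Every example appears exactly once in the output, which is the property the "without replacement" wording points at.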

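* `on_bad_files` argument to control how bad Parquet files are handled

```python
from datasets import load_dataset
ds = load_dataset(parquet_dataset_id, on_bad_files="warn")
```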
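The general pattern behind an `on_bad_files`-style argument can be sketched with the standard library alone. The reader below is hypothetical: it uses JSON files instead of Parquet to stay dependency-free, and the policy names `"error"` and `"skip"` are assumptions (the snippet above only shows `"warn"`). It is an illustration of the idea, not how `datasets` handles bad files internally.

```python
import json
import os
import tempfile
import warnings
from typing import Iterator, List


def read_json_files(paths: List[str], on_bad_files: str = "error") -> Iterator[dict]:
    """Hypothetical reader: yield one record per file, applying a bad-file policy."""
    if on_bad_files not in ("error", "warn", "skip"):  # assumed policy names
        raise ValueError(f"unknown policy: {on_bad_files!r}")
    for path in paths:
        try:
            with open(path, encoding="utf-8") as f:
                record = json.load(f)
        except (OSError, json.JSONDecodeError) as err:
            if on_bad_files == "error":
                raise  # fail fast on the first unreadable file
            if on_bad_files == "warn":
                warnings.warn(f"Skipping bad file {path}: {err}")
            continue  # "warn" and "skip" both move on to the next file
        yield record


with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.json")
    bad = os.path.join(tmp, "bad.json")
    with open(good, "w", encoding="utf-8") as f:
        json.dump({"col_0": 0}, f)
    with open(bad, "w", encoding="utf-8") as f:
        f.write("{not valid json")

    # Record warnings so the effect of the "warn" policy is visible.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        records = list(read_json_files([good, bad], on_bad_files="warn"))

print(records)      # [{'col_0': 0}]
print(len(caught))  # 1
```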

* `columns` and `filters` arguments to select columns and filter rows

```python
from datasets import load_dataset
ds = load_dataset(parquet_dataset_id, columns=["col_0", "col_1"])
ds = load_dataset(parquet_dataset_id, filters=[("col_0", "==", 0)])
```

* new argument to control buffering and caching when streaming

```python
import pyarrow.dataset
from datasets import load_dataset
fragment_scan_options = pyarrow.dataset.ParquetFragmentScanOptions(cache_options=pyarrow.CacheOptions(prefetch_limit=1, range_size_limit=128 << 20))
ds = load_dataset(parquet_dataset_id, streaming=True, fragment_scan_options=fragment_scan_options)
```
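The `columns` and `filters` arguments shown above describe a projection and a list of `(column, operator, value)` predicates. The sketch below applies the same shape of arguments to in-memory rows, treating the list of tuples as a conjunction (all predicates must hold). The `select` helper and the operator table are hypothetical and cover only an assumed subset of operators. A real Parquet reader can additionally use this information to avoid reading unneeded data, which this sketch does not attempt.

```python
import operator
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]
Filter = Tuple[str, str, Any]

# Comparison operators supported by this sketch (an assumed subset).
OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def select(rows: List[Row], columns: List[str], filters: List[Filter]) -> List[Row]:
    """Hypothetical helper: keep rows matching every filter, then project columns."""
    kept = [
        row
        for row in rows
        if all(OPS[op](row[col], value) for col, op, value in filters)
    ]
    return [{name: row[name] for name in columns} for row in kept]


rows = [
    {"col_0": 0, "col_1": "a", "col_2": 1.5},
    {"col_0": 1, "col_1": "b", "col_2": 2.5},
    {"col_0": 0, "col_1": "c", "col_2": 3.5},
]
result = select(rows, columns=["col_0", "col_1"], filters=[("col_0", "==", 0)])
print(result)  # [{'col_0': 0, 'col_1': 'a'}, {'col_0': 0, 'col_1': 'c'}]
```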

What's Changed

New Contributors

Full Changelog: https://github.com/huggingface/datasets/compare/4.1.1...4.2.0

Source: README.md, updated 2025-10-09