Datumaro
Dataset Management Framework, a Python library and a CLI tool to build
Datumaro is a flexible Python-based dataset management framework and command-line tool for building, analyzing, transforming, and converting computer vision datasets in many popular formats. It supports importing and exporting annotations and images across a wide variety of standards like COCO, PASCAL VOC, YOLO, ImageNet, Cityscapes, and many more, enabling easy integration with different training pipelines and tools. Datumaro makes it easy to merge datasets, split them into training/validation/test subsets, filter or transform annotations, and validate annotation quality — all while preserving metadata and supporting detailed statistics. ...