| Name | Modified | Size | Downloads / Week |
|---|---|---|---|
| py_data_juicer-1.5.0-py3-none-any.whl | 2026-02-27 | 2.1 MB | |
| README.md | 2026-02-26 | 3.2 kB | |
| Release v1.5.0_ Partitioned Ray Executor_ Embodied-AI OPs_ OP-level Env Management source code.tar.gz | 2026-02-26 | 51.5 MB | |
| Release v1.5.0_ Partitioned Ray Executor_ Embodied-AI OPs_ OP-level Env Management source code.zip | 2026-02-26 | 52.4 MB | |
| Totals: 4 Items | | 106.1 MB | 0 |
Major Updates
- 📊 Stats: 244 files changed with 22,394 additions and 2,053 deletions, from 12 contributors
- 🗂️ New partitioned Ray executor: [#748]
  - Supports data partitioning, checkpointing, and event logging in Ray mode.
  - Improves fault tolerance, extensibility, observability, flexibility, and processing performance.
- 🤖 New OPs for embodied AI: improved processing capability to handle camera-view videos.
- 🧩 Support maintaining OP-level isolated environments in Ray mode to help resolve dependency conflicts between different OPs. [#892]
  - Environments of different OPs that share common dependencies can be merged under different strategies, and created environments are reused.
  - Based on Ray runtime environments.
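
The environment-merging idea in the last item can be illustrated with a minimal sketch. This is not Data-Juicer's implementation: the function names (`merge_op_envs`, `to_pip_list`), the OP names `op_a`/`op_b`/`op_c`, their requirement lists, and the greedy "merge unless a package is pinned to two different versions" strategy are all assumptions made for illustration. Each resulting requirement list could plausibly be used as the `pip` field of a Ray runtime environment.

```python
from typing import Dict, List, Optional, Tuple

Env = Dict[str, Optional[str]]  # package name -> pinned version (None if unpinned)


def parse_requirement(req: str) -> Tuple[str, Optional[str]]:
    """Split 'name==version' into (name, version); version is None if unpinned."""
    name, _, version = req.partition("==")
    return name.strip().lower(), (version.strip() or None)


def compatible(env_a: Env, env_b: Env) -> bool:
    """Environments conflict only if a shared package is pinned to different versions."""
    for name, version in env_a.items():
        other = env_b.get(name)
        if version and other and version != other:
            return False
    return True


def merge_op_envs(op_requirements: Dict[str, List[str]]) -> List[dict]:
    """Greedily place each OP into the first compatible group (hypothetical strategy)."""
    groups: List[dict] = []
    for op_name, reqs in op_requirements.items():
        env: Env = dict(parse_requirement(r) for r in reqs)
        for group in groups:
            if compatible(group["env"], env):
                # Keep an existing pin; otherwise adopt the new OP's version (or None).
                for name, version in env.items():
                    group["env"][name] = group["env"].get(name) or version
                group["ops"].append(op_name)
                break
        else:
            # No compatible group exists, so this OP gets its own environment.
            groups.append({"ops": [op_name], "env": env})
    return groups


def to_pip_list(env: Env) -> List[str]:
    """Render an environment back into pip-style requirement strings."""
    return [f"{name}=={version}" if version else name for name, version in env.items()]


# Illustrative input only; these OP names and pins are made up.
example_ops = {
    "op_a": ["torch==2.1.0", "numpy"],
    "op_b": ["numpy==1.26.4", "opencv-python"],
    "op_c": ["torch==1.13.1"],
}
for group in merge_op_envs(example_ops):
    print(group["ops"], to_pip_list(group["env"]))
```

With these inputs, `op_a` and `op_b` end up sharing one environment, while `op_c` gets its own because its `torch` pin conflicts with the one required by `op_a`.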
New OPs
- video_camera_calibration_static_deepcalib_mapper: Compute the camera intrinsics and field of view (FOV) for a static camera using DeepCalib. [#871]
- video_camera_calibration_static_moge_mapper: Compute the camera intrinsics and field of view (FOV) for a static camera using MoGe-2. [#871]
- video_undistort_mapper: Undistort raw videos with corresponding camera intrinsics and distortion coefficients. [#871]
- video_hand_reconstruction_hawor_mapper: Use HaWoR and MoGe-2 for hand reconstruction. [#893]
- video_camera_pose_mapper: Extract camera poses with MegaSaM and MoGe-2. [#894]
Enhancements
- Allow batch inference for image_captioning_mapper to improve processing performance. [#901]
- Optimize the logic of a branch by avoiding unnecessary function calls. [#903]
- Refactor Operator Search and Metadata Extraction for Enhanced Accuracy. [#889]
- Allow the extract_keyframes function to return only meta info, and remove sample info from error logs to reduce log size. [#904]
- Reduce the memory usage in convert_to_absolute_paths func by iterating only over the specified columns. [#907]
- Reorganize the main readme and update the tutorials in the playground to the latest version. [#908]
- Optimize issue templates: emphasize English usage and add Q&A Copilot check. [#912]
- Convert absolute paths for datasets in object storage. [#913]
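
The batch-inference item above [#901] rests on a general pattern: group inputs so the model is called once per batch instead of once per sample. The following is a minimal sketch of that pattern only, not the code of image_captioning_mapper; `caption_in_batches`, `FakeCaptioner`, and the file names are hypothetical stand-ins.

```python
from typing import Callable, Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def caption_in_batches(
    image_paths: Sequence[str],
    caption_batch: Callable[[Sequence[str]], List[str]],
    batch_size: int = 8,
) -> List[str]:
    """Call the (hypothetical) batch captioner once per batch, keeping input order."""
    captions: List[str] = []
    for batch in batched(image_paths, batch_size):
        captions.extend(caption_batch(batch))
    return captions


class FakeCaptioner:
    """Stand-in model that counts how many times it is invoked."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, batch: Sequence[str]) -> List[str]:
        self.calls += 1
        return [f"caption for {path}" for path in batch]


model = FakeCaptioner()
paths = [f"img_{i}.jpg" for i in range(10)]
captions = caption_in_batches(paths, model, batch_size=4)
print(len(captions), "captions from", model.calls, "model calls")
```

Ten images with a batch size of 4 need three model calls rather than ten. With a real model, the gain depends on how well the backend parallelizes within a batch.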
Fixed Bugs
- Fix a bug so that the minhash deduplicator can trace all duplicate items. [#906]
- Fix the "multiple values for num_proc" bug in TextFormatter. [#905]
- Fix the homepage rendering issue and remove outdated OP docs. [#910]
- Fix several bugs affecting test stability and robustness. [#918]
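
For readers unfamiliar with the "multiple values for num_proc" error fixed in [#905], the sketch below shows how this class of Python error typically arises and one common way to avoid it. It is a generic illustration, not the actual Data-Juicer code or the fix applied in the PR; `map_dataset`, `BuggyFormatter`, and `FixedFormatter` are hypothetical names.

```python
def map_dataset(data, func, num_proc=1, **kwargs):
    """Stand-in for a dataset API that accepts num_proc (hypothetical)."""
    return {"num_proc": num_proc, "rows": [func(x) for x in data]}


class BuggyFormatter:
    def __init__(self, **kwargs):
        # Stored options may already contain a num_proc entry.
        self.kwargs = kwargs

    def load(self, data, num_proc=1):
        # num_proc is passed explicitly AND again through **self.kwargs,
        # so Python raises TypeError when both are present.
        return map_dataset(data, str.upper, num_proc=num_proc, **self.kwargs)


class FixedFormatter(BuggyFormatter):
    def load(self, data, num_proc=1):
        # Drop the duplicate key from a copy so the explicit argument wins.
        kwargs = dict(self.kwargs)
        kwargs.pop("num_proc", None)
        return map_dataset(data, str.upper, num_proc=num_proc, **kwargs)


try:
    BuggyFormatter(num_proc=4).load(["a", "b"], num_proc=2)
except TypeError as exc:
    print("buggy:", exc)

print("fixed:", FixedFormatter(num_proc=4).load(["a", "b"], num_proc=2))
```

Whether the explicit argument or the stored option should take precedence is a design choice; the sketch lets the explicit argument win.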
Acknowledgement
- @dubin555 helped improve the processing performance of some OPs and functions. [#901] [#903]
- @HunterLine helped fix a bug in the minhash deduplicator so that it traces all duplicate items. [#906]
New Contributors
- @HunterLine made their first contribution in https://github.com/datajuicer/data-juicer/pull/906
- @Dludora made their first contribution in https://github.com/datajuicer/data-juicer/pull/907
All Contributors
@HYLcool @dubin555 @claude @Qirui-jiao @cmgzn @Cathy0908 @Dludora @yxdyc @gemini-code-assist @HunterLine @ext.wanghao204 @cyruszhang
Full Changelog: https://github.com/datajuicer/data-juicer/compare/v1.4.6...v1.5.0