| Name | Modified | Size | Downloads / Week |
|---|---|---|---|
| mineru-2.5.0-py3-none-any.whl | 2025-09-19 | 1.3 MB | |
| mineru-2.5.0-released source code.tar.gz | 2025-09-18 | 7.1 MB | |
| mineru-2.5.0-released source code.zip | 2025-09-18 | 7.2 MB | |
| README.md | 2025-09-18 | 6.0 kB | |
| Totals: 4 Items | | 15.6 MB | 2 |

What's Changed
- 2025/09/19 2.5.0 Released
We are officially releasing MinerU2.5, currently the most powerful multimodal large model for document parsing. With only 1.2B parameters, MinerU2.5's accuracy on the OmniDocBench benchmark comprehensively surpasses top-tier multimodal models such as Gemini 2.5 Pro, GPT-4o, and Qwen2.5-VL-72B. It also significantly outperforms leading specialized models such as dots.ocr, MonkeyOCR, and PP-StructureV3. The model has been released on the HuggingFace and ModelScope platforms, and you are welcome to download and use it.

- Core Highlights:
  - SOTA Performance with Extreme Efficiency: As a 1.2B model, it achieves State-of-the-Art (SOTA) results that exceed models in the 10B and 100B+ classes, redefining the performance-per-parameter standard in document AI.
  - Advanced Architecture for Across-the-Board Leadership: By combining a two-stage inference pipeline (decoupling layout analysis from content recognition) with a native high-resolution architecture, it achieves SOTA performance across five key areas: layout analysis, text recognition, formula recognition, table recognition, and reading order.
- Key Capability Enhancements:
  - Layout Detection: Delivers more complete results by accurately covering non-body content such as headers, footers, and page numbers. It also provides more precise element localization and more natural format reconstruction for lists and references.
  - Table Parsing: Drastically improves parsing for challenging cases, including rotated tables, borderless/semi-structured tables, and long/complex tables.
  - Formula Recognition: Significantly boosts accuracy for complex, long-form, and hybrid Chinese-English formulas, greatly enhancing the parsing capability for mathematical documents.

Additionally, with the release of vlm 2.5, we have made some adjustments to the repository:
- The vlm backend has been upgraded to version 2.5. It supports the MinerU2.5 model and is no longer compatible with the MinerU2.0-2505-0.9B model. The last version supporting the 2.0 model is mineru-2.2.2.
- VLM inference-related code has been moved to mineru_vl_utils, reducing coupling with the main mineru repository and facilitating independent iteration in the future.
- The vlm accelerated inference framework has been switched from `sglang` to `vllm`, achieving full compatibility with the vllm ecosystem, allowing users to use the MinerU2.5 model and accelerated inference on any platform that supports the vllm framework.
- Due to major upgrades in the vlm model supporting more layout types, we have made some adjustments to the structure of the parsing intermediate file `middle.json` and result file `content_list.json`. Please refer to the documentation for details.
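
The compatibility rule in the first item above can be checked before choosing a model. The following is a minimal sketch, not part of MinerU: the helper names `parse_version`, `supported_vlm_model`, and `installed_mineru_model` are hypothetical, and the code assumes that the installed release has a vlm backend at all. Only the two boundaries stated in these notes (2.2.2 and 2.5.0) are encoded; releases in between are reported as unknown.

```python
from importlib import metadata

# Boundaries taken from the release notes above.
LAST_2_0_RELEASE = (2, 2, 2)   # last release supporting MinerU2.0-2505-0.9B
FIRST_2_5_RELEASE = (2, 5, 0)  # first release supporting MinerU2.5


def parse_version(text):
    """Turn '2.5.0' (or '2.5.1rc1') into a 3-tuple of ints such as (2, 5, 0)."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad short versions such as '2.5' so tuple comparison behaves as expected.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts[:3])


def supported_vlm_model(mineru_version):
    """Return the vlm model name the notes associate with a release, or None."""
    version = parse_version(mineru_version)
    if version >= FIRST_2_5_RELEASE:
        return "MinerU2.5"
    if version <= LAST_2_0_RELEASE:
        # Assumption: the release is recent enough to ship a vlm backend.
        return "MinerU2.0-2505-0.9B"
    return None  # releases between 2.2.2 and 2.5.0 are not described here


def installed_mineru_model():
    """Look up the installed 'mineru' distribution; None if it is not installed."""
    try:
        return supported_vlm_model(metadata.version("mineru"))
    except metadata.PackageNotFoundError:
        return None
```

If a workflow still depends on the 2.0 model, pinning the package to `mineru==2.2.2` is the option the notes point to.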
Other repository optimizations:
- Removed file extension whitelist validation for input files. When input files are PDF documents or images, there are no longer requirements for file extensions, improving usability.
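
With the extension whitelist gone, a caller who wants to pre-screen inputs can inspect file contents instead of names. The sketch below illustrates that general idea using well-known file signatures for PDF, PNG, and JPEG. It is an assumption about how one might do this, not a description of how MinerU identifies files, and `sniff_kind` and `sniff_file` are hypothetical helper names.

```python
# Leading bytes ("magic numbers") of a few common input formats.
SIGNATURES = (
    (b"%PDF-", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
)


def sniff_kind(data):
    """Return 'pdf', 'png', or 'jpeg' based on the leading bytes, else None."""
    for magic, kind in SIGNATURES:
        if data.startswith(magic):
            return kind
    return None


def sniff_file(path):
    """Read only the first few bytes of a file and classify it by content."""
    with open(path, "rb") as handle:
        return sniff_kind(handle.read(16))
```

A file named `scan_0001` with no extension would still be classified as a PDF by `sniff_file` as long as its first bytes are `%PDF-`.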
New Contributors
- @e06084 made their first contribution in https://github.com/opendatalab/MinerU/pull/3489
Full Changelog: https://github.com/opendatalab/MinerU/compare/mineru-2.2.2-released...mineru-2.5.0-released