Home / mineru-2.5.0-released
Name                                      Modified     Size      Downloads / Week
mineru-2.5.0-py3-none-any.whl             2025-09-19   1.3 MB
mineru-2.5.0-released source code.tar.gz  2025-09-18   7.1 MB
mineru-2.5.0-released source code.zip     2025-09-18   7.2 MB
README.md                                 2025-09-18   6.0 kB
Totals: 4 items                                        15.6 MB   2

What's Changed

  • 2025/09/19 2.5.0 Released

We are officially releasing MinerU2.5, currently the most powerful multimodal large model for document parsing. With only 1.2B parameters, MinerU2.5's accuracy on the OmniDocBench benchmark comprehensively surpasses top-tier multimodal models like Gemini 2.5 Pro, GPT-4o, and Qwen2.5-VL-72B. It also significantly outperforms leading specialized models such as dots.ocr, MonkeyOCR, and PP-StructureV3. The model has been released on the HuggingFace and ModelScope platforms. Feel free to download and use it.

  • Core Highlights
    • SOTA Performance with Extreme Efficiency: As a 1.2B model, it achieves State-of-the-Art (SOTA) results that exceed models in the 10B and 100B+ classes, redefining the performance-per-parameter standard in document AI.
    • Advanced Architecture for Across-the-Board Leadership: By combining a two-stage inference pipeline (decoupling layout analysis from content recognition) with a native high-resolution architecture, it achieves SOTA performance across five key areas: layout analysis, text recognition, formula recognition, table recognition, and reading order.
  • Key Capability Enhancements
    • Layout Detection: Delivers more complete results by accurately covering non-body content such as headers, footers, and page numbers. It also provides more precise element localization and more natural format reconstruction for lists and references.
    • Table Parsing: Drastically improves parsing for challenging cases, including rotated tables, borderless/semi-structured tables, and long/complex tables.
    • Formula Recognition: Significantly boosts accuracy for complex, long-form, and hybrid Chinese-English formulas, greatly enhancing the parsing capability for mathematical documents.

Additionally, with the release of vlm 2.5, we have made some adjustments to the repository:

  • The vlm backend has been upgraded to version 2.5. It supports the MinerU2.5 model and is no longer compatible with the MinerU2.0-2505-0.9B model. The last version supporting the 2.0 model is mineru-2.2.2.
  • VLM inference-related code has been moved to mineru_vl_utils, reducing coupling with the main mineru repository and making independent iteration easier in the future.
  • The vlm accelerated inference framework has been switched from sglang to vllm and is fully compatible with the vllm ecosystem, so users can run the MinerU2.5 model with accelerated inference on any platform that supports the vllm framework.
  • Because the upgraded vlm model supports more layout types, we have adjusted the structure of the intermediate parsing file middle.json and the result file content_list.json. Please refer to the documentation for details.
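
The compatibility break described above can be expressed as a small version check. The following is only an illustrative sketch and is not part of MinerU: the function names `parse_version` and `expected_vlm_model` are hypothetical, the label "MinerU2.5" is used as a placeholder for the 2.5 model family, and releases that predate the vlm backend are not considered.

```python
# Hypothetical helper, not part of the mineru package.
LAST_VERSION_FOR_2_0_MODEL = (2, 2, 2)   # stated in these release notes
FIRST_VERSION_FOR_2_5_MODEL = (2, 5, 0)  # this release


def parse_version(version: str) -> tuple:
    """Turn '2.5.0' into (2, 5, 0). Suffixes such as 'rc1' are not handled."""
    return tuple(int(part) for part in version.split("."))


def expected_vlm_model(mineru_version: str) -> str:
    """Return the model family the vlm backend of a given release expects."""
    v = parse_version(mineru_version)
    if v <= LAST_VERSION_FOR_2_0_MODEL:
        return "MinerU2.0-2505-0.9B"
    if v >= FIRST_VERSION_FOR_2_5_MODEL:
        return "MinerU2.5"  # placeholder label for the 2.5 model family
    # The notes name no release between 2.2.2 and 2.5.0, so refuse to guess.
    raise ValueError(f"no known vlm backend mapping for version {mineru_version}")


if __name__ == "__main__":
    for release in ("2.2.2", "2.5.0"):
        print(release, "->", expected_vlm_model(release))
```

A check like this could run before loading model weights, so that a mismatch between the installed package and the downloaded model fails early with a clear message.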

Other repository optimizations:

  • Removed file extension whitelist validation for input files. When the input file is a PDF document or an image, its file extension no longer matters, which improves usability.
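
One common way to accept files regardless of extension is to inspect their leading bytes instead. The sketch below illustrates that general technique using only the Python standard library. It is an assumption about how such a check could look, not MinerU's actual implementation, and the names `SIGNATURES`, `sniff_kind`, and `sniff_file` are hypothetical.

```python
from typing import Optional

# Well-known file signatures ("magic bytes") for a few supported input kinds.
SIGNATURES = {
    b"%PDF-": "pdf",
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
}


def sniff_kind(data: bytes) -> Optional[str]:
    """Classify content by its leading bytes; return None if unrecognized."""
    for magic, kind in SIGNATURES.items():
        if data.startswith(magic):
            return kind
    return None


def sniff_file(path: str) -> Optional[str]:
    """Read only the first few bytes of a file and classify them."""
    with open(path, "rb") as fh:
        return sniff_kind(fh.read(16))


if __name__ == "__main__":
    # The file name and extension play no role in the decision.
    print(sniff_kind(b"%PDF-1.7\n..."))
    print(sniff_kind(b"plain text"))
```

With content-based detection, a PDF saved as `report.dat` or with no extension at all is still recognized, which matches the behavior described above.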

Full Changelog: https://github.com/opendatalab/MinerU/compare/mineru-2.2.2-released...mineru-2.5.0-released

Source: README.md, updated 2025-09-18