OpenCompass v0.5.0 Release Notes

🌟 Highlights

✨ Comprehensive Scientific Benchmarks: Integrated 10+ specialized datasets (MedXpertQA, ClimaQA, SmolInstruct, etc.) covering multiple scientific fields, including chemistry, physics, biology, and earth sciences.
✨ Cascade Evaluator: Added support for evaluation that cascades from rule-based methods to LLM judgments.
✨ New Runner: Added support for the Rjob Runner.
✨ OpenAISDK Streaming: Provided a more stable way to call the OpenAI API.
✨ New Evaluation Examples: Published the real-time evaluation config of the CompassAcademic Leaderboard and the evaluation config for Intern-S1 related benchmarks.
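To illustrate what "cascading from rules to LLM judgments" means, here is a minimal sketch. It assumes the rule stage runs first and that only answers the rule rejects are forwarded to an LLM judge. The names `rule_match`, `Verdict`, `cascade_evaluate`, and `fake_judge` are hypothetical and are not OpenCompass's `CascadeEvaluator` API. The judge is a stub standing in for a call to a judge model.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical sketch of a rule-then-LLM cascade; not OpenCompass's CascadeEvaluator API.


def rule_match(prediction: str, reference: str) -> bool:
    """Stage 1: a cheap rule, here a case-insensitive exact match after trimming."""
    return prediction.strip().lower() == reference.strip().lower()


@dataclass
class Verdict:
    correct: bool
    stage: str  # "rule" or "llm": which stage produced the verdict


def cascade_evaluate(
    prediction: str,
    reference: str,
    llm_judge: Callable[[str, str], bool],
) -> Verdict:
    """Stage 2 (the LLM judge) runs only when the rule stage rejects the answer."""
    if rule_match(prediction, reference):
        return Verdict(True, "rule")
    return Verdict(llm_judge(prediction, reference), "llm")


if __name__ == "__main__":
    # Stand-in judge: a real setup would prompt a judge model here.
    def fake_judge(pred: str, ref: str) -> bool:
        return ref.lower() in pred.lower()

    for pred in ["Paris", "The answer is Paris.", "London"]:
        print(pred, "->", cascade_evaluate(pred, "Paris", fake_judge))
```

With this ordering, the costly judge is only invoked for the minority of answers a simple rule cannot confirm, while loosely formatted but correct answers can still be credited.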


🚀 New Features

🔧 Cascade Evaluator (#1992)
🔧 Rjob Runner (#2144)
🔧 OpenAISDK Streaming (#2208)
🔧 Evaluation example for the CompassAcademic Leaderboard (#2202)
🔧 Evaluation example for Intern-S1 and scientific benchmarks (#2220)
🔧 So many new scientific datasets!
1. MedXpertQA for expert-level medical knowledge evaluation (#2002)
2. ClimaQA for climate question evaluation (#2017)
3. HealthBench for better measuring the capabilities of AI systems in health (#2099)
4. ProteinLMBench for protein-related tasks (#2064)
...


📖 Documentation

📝 Fixed 404 links between the Chinese and English docs (#2001)
📝 Added a CompassAcademic Leaderboard task tutorial (#2202)
📝 Added an Intern-S1 evaluation task tutorial (#2220)
📝 Fixed formatting problems on the dataset statistics page (#2170)
📝 Aligned the NIAH CLI command guide with the actual CLI argument parser (#2194)
📝 Set correct paths for the examples (#2198)


🐛 Bug Fixes

🔧 Fixed a comparison error in base_evaluator (#2010)
🔧 Fixed the OpenICL math evaluator config (#2007)
🔧 Added an error case for content filtering (#2167)
🔧 Fixed the OpenAI SDK integration to support gpt-5 (#2236)
🔧 Fixed dataset repetition by concatenating (#2039)
🔧 Concatenated OpenAISDK reasoning content (#2041)
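Streaming responses arrive as many small deltas, and reasoning models may send their reasoning separately from the final answer. The sketch below shows one way to accumulate both and concatenate them. It is an assumption-laden illustration, not the OpenAISDK wrapper's code: chunks are plain dicts loosely shaped like OpenAI-compatible streaming deltas (the real `openai` SDK yields objects accessed as `chunk.choices[0].delta`), `reasoning_content` is a field some OpenAI-compatible providers expose, and the blank-line separator in the hypothetical `concat_reasoning` helper is chosen only for illustration.

```python
from typing import Iterable, Mapping, Tuple

# Hypothetical sketch: chunks are plain dicts loosely shaped like OpenAI-compatible
# streaming deltas; "reasoning_content" is a field some providers expose.


def accumulate_stream(chunks: Iterable[Mapping]) -> Tuple[str, str]:
    """Collect reasoning and answer deltas from a stream, in arrival order."""
    reasoning_parts, content_parts = [], []
    for chunk in chunks:
        delta = chunk.get("delta") or {}
        if delta.get("reasoning_content"):
            reasoning_parts.append(delta["reasoning_content"])
        if delta.get("content"):
            content_parts.append(delta["content"])
    return "".join(reasoning_parts), "".join(content_parts)


def concat_reasoning(reasoning: str, content: str, sep: str = "\n\n") -> str:
    """Join reasoning and the final answer; the separator is an assumption."""
    return f"{reasoning}{sep}{content}" if reasoning else content


if __name__ == "__main__":
    stream = [
        {"delta": {"reasoning_content": "2 + 2 "}},
        {"delta": {"reasoning_content": "equals 4."}},
        {"delta": {"content": "4"}},
        {"delta": {}},  # a closing chunk may carry no text
    ]
    print(concat_reasoning(*accumulate_stream(stream)))
```

Keeping the two streams separate until the end lets a caller decide whether to score only the answer or to keep the reasoning for inspection.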


⚙ Enhancements and Refactors

⚙ Infrastructure Refactors:
- Set dump-eval-details as the default behavior (#1999)
- Refactored the OpenICL eval task (#1990)
- Added openai_extra_kwargs for API customization (#2210)

⚙ CI/CD Improvements:
- Fixed baseline score (#2000)
- Updated baseline for kernel changes in vLLM and LMDeploy (#2011)
- Updated baseline and fixed LMDeploy version (#2098)
- Added check rule (#2101)
- Updated test cases' baseline (#2184)
...


🎉 Welcome New Contributors

A warm welcome to our newest contributors:
- @Yejin0111 for MedXpertQA clinical dataset (#2002)
- @smgjch for matbench development (#2021)
- @taolinzhang for rewardbench dataset (#2029)
- @xiexinch for fixing lawbench evaluation (#2037)
- @xuxuxuxuxuxjh for ClinicBench, PubMedQA and ScienceQA datasets (#2061)
- @mar-cry for NEJM AI benchmark (#2063)
- @bio-mlhui for CARDBiomedBench dataset (#2071)
- @tchenglv520 for Lifescience subset support for MMLU & SciEval (#2059)
- @Flaick for MMLU Pro Biomedical version support (#2081)
- @yuehua-s for o4-mini model (#2083)
- @kkscilife for adding CI check rule (#2101)
- @yusun-nlp for SmolInstruct dataset (#2127)
- @soki123 for SRbench dataset (#2105)
- @suencgo for PHYBench dataset (#2125)
- @Zhouzone for updating Earth Silver benchmark (#2140)
- @uyzhang for R-Bench dataset (ICML 2025) (#2091)
- @f14-bertolotti for stabilizing MBPP evaluation (#2111)
- @fly2tomato for debugging Rjob runner (#2171)
- @debuggingworld for fixing Qwen3 model config field error (#2152)
- @blueternalness for aligning NIAH CLI command guide (#2194)
- @KADCA21 for BlueLM-2.5 API (#2193)
- @dbinthesky for KCLE feature (#2224)
- @FarongWen for EESE dataset and configs (#2223)
- @mazihan880 for CodeCompass dataset and configs (#2214)


Full Changelog: https://github.com/open-compass/opencompass/compare/0.4.2...0.5.0

Thank you for using OpenCompass! These updates empower deeper insights and more reliable evaluations. Keep exploring, and stay tuned for future innovations! 🌟

Source: README.md, updated 2025-09-01