Evaluate your LLM's response with Prometheus and GPT-4
An Efficient Web-enhanced Question Answering System
Open-source evaluation toolkit for large multi-modality models (LMMs)
The open-source post-building layer for agents
On the Structural Pruning of Large Language Models
Uncertainty Quantification for Language Models, a Python package
Leaderboard Comparing LLM Performance at Producing Hallucinations
Test and evaluate LLMs and model configurations
Code for the paper "Language models can explain neurons in language models"
Beyond the Imitation Game collaborative benchmark for measuring