[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences
Natural Gradient Boosting for Probabilistic Prediction
Evaluate your LLM's response with Prometheus and GPT4
A Tree Search Library with Flexible API for LLM Inference-Time Scaling
Autonomous harness engineering
Open-source AI marketing skills for Claude Code
An Efficient Web-enhanced Question Answering System
Agent Zero AI framework
Open source platform for the machine learning lifecycle
CLIP: Predict the most relevant text snippet given an image
Workflow that turns every post into a calibrated experiment
RAG Search API
The highest-scoring AI memory system ever benchmarked
VideoRAG: Chat with Your Videos
The most accurate natural language detection library for Python
The open source post-building layer for agents
Requirement-driven evaluation harness for AI agents and LLM
A specialized Claude Code workspace for creating long-form
Open-source evaluation toolkit of large multi-modality models (LMMs)
Multimodal embedding and reranking models built on Qwen3-VL
Frameworks for language model reinforcement learning environments
Unified Model Serving Framework
On the Structural Pruning of Large Language Models
Uncertainty Quantification for Language Models, a Python package
Leaderboard Comparing LLM Performance at Producing Hallucinations