[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences
Natural Gradient Boosting for Probabilistic Prediction
Evaluate your LLM's response with Prometheus and GPT4
A Tree Search Library with Flexible API for LLM Inference-Time Scaling
Autonomous harness engineering
Open-source AI marketing skills for Claude Code
CTFs as you need them
An Efficient Web-enhanced Question Answering System
Agent Zero AI framework
A 7-layer memory operating system for Hermes Agent
Open source platform for the machine learning lifecycle
A simple tool for reading in poorly redacted documents
CLIP, Predict the most relevant text snippet given an image
Workflow that turns every post into a calibrated experiment
RAG Search API
The highest-scoring AI memory system ever benchmarked
Feature Store for Machine Learning
VideoRAG: Chat with Your Videos
The most accurate natural language detection library for Python
The open source post-building layer for agents
Requirement-driven evaluation harness for AI agents and LLM
A specialized Claude Code workspace for creating long-form
Open-source evaluation toolkit of large multi-modality models (LMMs)
Multimodal embedding and reranking models built on Qwen3-VL
Language Model Reinforcement Learning Environment Frameworks