Requirement-driven evaluation harness for AI agents and LLM
A specialized Claude Code workspace for creating long-form
Multimodal embedding and reranking models built on Qwen3-VL
Reinforcement learning environment frameworks for language models
Unified Model Serving Framework
#1 Persistent memory for AI coding agents
On the Structural Pruning of Large Language Models
Uncertainty Quantification for Language Models, a Python package
Leaderboard Comparing LLM Performance at Producing Hallucinations
A batteries-included library for building AI-powered software
Provider-agnostic, open-source evaluation infrastructure
Test and evaluate LLMs and model configurations
Code for the "Language models can explain neurons in language models" paper
AI-powered code reviews for GitLab & Azure DevOps. Zero setup. Powered
Beyond the Imitation Game collaborative benchmark for measuring
Your Automatic Prompt Engineering Assistant for GenAI Applications
Learning to rank in TensorFlow
Natural language detection library for Rust
Jupyter Notebook tutorials for REINVENT 3.2
Platform of neural models for natural language processing
Generate embeddings from large-scale graph-structured data
TextRank implementation for Python 3
JavaScript coding game where players program a warrior to win battles
Beautiful visualizations of how language differs among document types
Lexicon and rule-based sentiment analysis tool