Fast Stable Diffusion on CPU and AI PC
Find the local LLM that actually runs and performs best
A Heterogeneous Benchmark for Information Retrieval
Geometric deep learning extension library for PyTorch
ChatGLM2-6B: An Open Bilingual Chat LLM
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
Agentic, Reasoning, and Coding (ARC) foundation models
A.S.E (AICGSecEval) is a repository-level AI-generated code security
Oobabooga - The definitive Web UI for local AI, with powerful features
Benchmarking synthetic data generation methods
Code for the paper "Evaluating Large Language Models Trained on Code"
LongBench v2 and LongBench (ACL '25 & '24)
Visual Causal Flow
MTEB: Massive Text Embedding Benchmark
Meta Agents Research Environments is a comprehensive platform
A TTS that fits in your CPU (and pocket)
Code for running inference and fine-tuning with the SAM 3 model
State-of-the-art TTS model under 25MB
CodeGeeX: An Open Multilingual Code Generation Model (KDD 2023)
Leaderboard Comparing LLM Performance at Producing Hallucinations
Python-based neural networks API
Benchmark LLMs by fighting in Street Fighter 3
Autonomous harness engineering
Provider-agnostic, open-source evaluation infrastructure
Dockerized FastAPI wrapper for the Kokoro-82M text-to-speech model