Collaborative & Open-Source Quality Assurance for all AI models
Arcade Tool Development Kit (TDK), Worker, Evals, and CLI
Evaluation suite designed to assess the performance of LLMs
AI agent harness for AI coding agents
General proxy performance testing tool based on Clash using Telegram
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
TextWorld is a sandbox learning environment for the training
Lightweight framework for evaluating large language model performance
Autonomous harness engineering
Tools such as a web browser, computer access, and a code runner for LLMs
Build high-quality LLM apps
The open-source post-building layer for agents
A framework that facilitates all stages of LLM development
A high-performance ML model serving framework that offers dynamic batching
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Learn how to develop, deploy and iterate on production-grade ML
Leaderboard Comparing LLM Performance at Producing Hallucinations
Open-source platform for managing, testing, and deploying AI apps
ComfyUI wrapper nodes for WanVideo and related models
Trainable, memory-efficient, and GPU-friendly PyTorch reproduction
Reinforcement Learning for Humanoid Robot with Zero-Shot Sim2Real
A Conversational Speech Generation Model
Code repo for "WebArena: A Realistic Web Environment for Building Autonomous Agents"
The most simple, flexible, and comprehensive OpenAI Gym trading
Hypergraph Transformer for Skeleton-based Action Recognition