Test-Time Reinforcement Learning
A modular graph-based Retrieval-Augmented Generation (RAG) system
Simple, Pythonic building blocks to evaluate LLM applications
AI Agent Evaluator & Red Team Platform
A powerful tool for automated LLM fuzzing
ChatGLM3 series: Open Bilingual Chat LLMs
Tools like web browser, computer access and code runner for LLMs
Open-weight, large-scale hybrid-attention reasoning model
Free ChatGPT & DeepSeek API Key
A state-of-the-art open visual language model
Benchmark LLMs by fighting in Street Fighter 3
LongBench v2 and LongBench (ACL '25 & '24)
A.S.E (AICGSecEval) is a repository-level AI-generated code security
An agentless approach to automatically solve software development
A security scanner for custom LLM applications
Run LLMs locally on Cloud Workstations
Code for the "Language models can explain neurons in language models" paper
Ray Aviary - evaluate multiple LLMs easily
Beyond the Imitation Game collaborative benchmark for measuring
AI agent that streamlines the entire process of data analysis
AI R&D Efficiency Improvement Research: Do-It-Yourself LoRA Training
Community for applying LLMs to robotics and a robot simulator
Implements a reference architecture for creating information systems
8.5K high-quality grade school math problems