Evaluate your LLM's response with Prometheus and GPT4
CyberStrikeAI is an AI-native security testing platform built in Go
Autoresearch-inspired autonomous skill optimization for Claude Code
Autonomous harness engineering
A skill file for removing AI tells from prose
An open-source visual programming environment
An AI-based intelligent recipe generation platform
Open-source AI marketing skills for Claude Code
An Efficient Web-enhanced Question Answering System
Open source platform for the machine learning lifecycle
CLIP: predict the most relevant text snippet given an image
Memory engine and app that is extremely fast, scalable
The highest-scoring AI memory system ever benchmarked
Workflow that turns every post into a calibrated experiment
Open-source evaluation toolkit of large multi-modality models (LMMs)
RAG Search API
Your agent writes bad React
The open source post-building layer for agents
Requirement-driven evaluation harness for AI agents and LLM
A specialized Claude Code workspace for creating long-form
Multimodal embedding and reranking models built on Qwen3-VL
Reinforcement learning environment frameworks for language models
On the Structural Pruning of Large Language Models
Uncertainty Quantification for Language Models, a Python package
Leaderboard Comparing LLM Performance at Producing Hallucinations