benchmark free download

AutoAgent AI

Autonomous harness engineering

...Instead of manually tuning prompts or workflows, developers define high-level goals in a configuration file, and the system continuously modifies its own tools, orchestration, and logic based on benchmark performance. It operates through a loop of testing, analyzing failures, and refining the agent’s configuration to maximize a scoring metric. The framework uses a single-file agent harness combined with structured tasks and evaluation suites to guide optimization. It runs inside Docker for safe execution and reproducibility. This approach shifts agent development from manual design to automated optimization. ...
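The test, analyze, refine loop described above can be pictured with a small sketch. It is not AutoAgent's actual code: `optimize`, `toy_score`, `toy_propose`, and the `temperature` setting are hypothetical names chosen for illustration, and simple hill climbing stands in for whatever search strategy the project really uses.

```python
import random
from typing import Callable, Dict, List, Tuple

Config = Dict[str, float]


def optimize(
    config: Config,
    score: Callable[[Config], float],
    propose: Callable[[Config, random.Random], Config],
    rounds: int = 50,
    seed: int = 0,
) -> Tuple[Config, float, List[float]]:
    """Hill-climb a configuration: test a candidate, keep it only if it scores higher."""
    rng = random.Random(seed)
    best, best_score = dict(config), score(config)
    history = [best_score]
    for _ in range(rounds):
        candidate = propose(best, rng)   # refine: derive a modified configuration
        candidate_score = score(candidate)  # test: run the (stand-in) benchmark
        if candidate_score > best_score:    # analyze: accept only improvements
            best, best_score = candidate, candidate_score
        history.append(best_score)
    return best, best_score, history


def toy_score(config: Config) -> float:
    # Hypothetical benchmark metric that peaks when temperature is 0.3.
    return 1.0 - abs(config["temperature"] - 0.3)


def toy_propose(config: Config, rng: random.Random) -> Config:
    # Hypothetical mutation: nudge one setting and clamp it to [0, 1].
    new = dict(config)
    new["temperature"] = min(1.0, max(0.0, new["temperature"] + rng.uniform(-0.1, 0.1)))
    return new


best, best_score, history = optimize({"temperature": 0.9}, toy_score, toy_propose)
```

Because a candidate is kept only when it beats the current best, the recorded score never decreases from one round to the next.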

Downloads: 2 This Week

Last Update: 2026-04-28

See Project

CUDA Agent

Large-Scale Agentic RL for High-Performance CUDA Kernel Generation

...The system operates in a ReAct-style loop where the agent profiles baseline implementations, writes CUDA code, compiles it in a sandbox, and iteratively refines performance. CUDA-Agent has demonstrated strong benchmark results, achieving high pass rates and significant speedups compared with compiler baselines such as torch.compile.

Downloads: 0 This Week

Last Update: 2026-03-03

See Project

Diplomacy Cicero

Code for Cicero, an AI agent that plays the game of Diplomacy

...The codebase is implemented primarily in Python with performance-critical components in C++ (via pybind11 bindings) and is configured to run in a high-GPU cluster environment. Configuration is managed via protobuf files to define tasks such as self-play, benchmark agent comparisons, and RL training. The project is now archived and read-only, reflecting that it is no longer actively developed but remains publicly available for research use.

Downloads: 2 This Week

Last Update: 2 days ago

See Project

Agentex

Open source codebase for Scale Agentex

AgentEX is an open framework from Scale for building, running, and evaluating agentic workflows, with an emphasis on reproducibility and measurable outcomes rather than ad-hoc demos. It treats an “agent” as a composition of a policy (the LLM), tools, memory, and an execution runtime so you can test the whole loop, not just prompting. The repo focuses on structured experiments: standardized tasks, canonical tool interfaces, and logs that make it possible to compare models, prompts, and tool...
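The idea of an agent as a composition of policy, tools, memory, and runtime can be sketched as follows. This is an illustrative assumption, not the Agentex API: the `Agent` class, the `scripted_policy` function, and the `"finish"` convention are invented here, and a scripted function stands in for the LLM.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# (tool name, argument); the hypothetical tool name "finish" ends the episode.
Action = Tuple[str, str]


@dataclass
class Agent:
    policy: Callable[[str, List[str]], Action]  # stands in for the LLM
    tools: Dict[str, Callable[[str], str]]      # named tool functions
    memory: List[str] = field(default_factory=list)  # log of past tool calls

    def run(self, task: str, max_steps: int = 5) -> str:
        """Minimal execution runtime: ask the policy, call the tool, record the result."""
        for _ in range(max_steps):
            name, arg = self.policy(task, self.memory)
            if name == "finish":
                return arg
            result = self.tools[name](arg)
            self.memory.append(f"{name}({arg}) -> {result}")
        return ""  # step budget exhausted without a final answer


def scripted_policy(task: str, memory: List[str]) -> Action:
    # Deterministic stand-in: call one tool, then return its result.
    if not memory:
        return ("upper", task)
    return ("finish", memory[-1].split(" -> ")[1])


agent = Agent(policy=scripted_policy, tools={"upper": str.upper})
answer = agent.run("hello")
```

Keeping the memory as a plain log means the whole loop, not just the prompt, can be inspected and compared across runs.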

Downloads: 0 This Week

Last Update: 5 days ago

See Project

Habitat-Lab

A modular high-level library to train embodied AI agents

...Providing algorithms for single and multi-agent training (via imitation or reinforcement learning, or no learning at all as in SensePlanAct pipelines), as well as tools to benchmark their performance on the defined tasks using standard metrics.

Downloads: 0 This Week

Last Update: 2026-05-07

See Project

Agent S

Agent S: an open agentic framework that uses computers like a human

...Built to operate graphical user interfaces like a human, it allows AI agents to perceive screens, reason about tasks, and execute actions across macOS, Windows, and Linux systems. The latest version, Agent S3, surpasses human-level performance on the OSWorld benchmark, demonstrating state-of-the-art results in complex multi-step computer tasks. Agent S combines powerful foundation models (such as GPT-5) with grounding models like UI-TARS to translate visual inputs into precise executable actions. It supports flexible deployment via CLI, SDK, or cloud, and integrates with multiple model providers including OpenAI, Anthropic, Gemini, Azure, and Hugging Face endpoints. ...

Downloads: 0 This Week

Last Update: 2025-12-16

See Project

OWL

Optimized Workforce Learning for General Multi-Agent Assistance

OWL (Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation) is a framework for improving multi-agent collaboration and task automation across domains. It uses dynamic agent interactions to streamline complex workflows, with the aim of making AI collaboration more natural, efficient, and adaptable. It is built on...

1 Review

Downloads: 0 This Week

Last Update: 2025-03-13

See Project

Search Results for "benchmark"

Showing 7 open source projects for "benchmark"
