Cradle is an open-source framework that enables AI agents to perform complex computer tasks by interacting with software the way a human user does. It is built around the concept of General Computer Control: the agent receives screenshots as input and acts through simulated keyboard and mouse operations. Because it relies only on what appears on screen and on standard input devices, an agent can operate any software interface without specialized APIs or predefined automation scripts. The framework combines reasoning, planning, and memory modules that let the agent observe its environment, reflect on past actions, plan future steps, and accumulate useful skills for later tasks, which together allow it to carry out long sequences of actions. Cradle agents can perform tasks across a wide variety of environments, including computer applications and video games, demonstrating the generality of the approach.
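The observe, reflect, plan, act cycle described above can be illustrated with a minimal Python sketch. Everything in it is hypothetical and not taken from Cradle's codebase: FakeScreen stands in for a real display (a "screenshot" is just a string), and plan() replaces the call to a multimodal model with a hard-coded rule. The point is the control flow: screenshots in, keyboard and mouse events out, with each step recorded in memory so the next step can reflect on it.

```python
from dataclasses import dataclass, field
from typing import List, Union

# Hypothetical action types: the agent only emits keyboard and mouse events.
@dataclass(frozen=True)
class KeyPress:
    key: str

@dataclass(frozen=True)
class MouseClick:
    x: int
    y: int

Action = Union[KeyPress, MouseClick]

@dataclass
class Memory:
    # Each record is (observation, actions taken, reflection note).
    history: List[tuple] = field(default_factory=list)

class FakeScreen:
    """Hypothetical stand-in for a real screen; the 'screenshot' is a text label."""

    def __init__(self) -> None:
        self.state = "menu"

    def screenshot(self) -> str:
        return self.state

    def execute(self, action: Action) -> None:
        # Toy dynamics: clicking the (assumed) Start button at (100, 200) opens the game.
        if isinstance(action, MouseClick) and (action.x, action.y) == (100, 200):
            self.state = "game"
        elif isinstance(action, KeyPress) and action.key == "esc":
            self.state = "menu"

def reflect(memory: Memory, observation: str) -> str:
    # Compare the new screenshot with the previous one to judge whether the last actions had an effect.
    if not memory.history:
        return "first step"
    prev_obs = memory.history[-1][0]
    return "no change" if prev_obs == observation else "state changed"

def plan(task: str, observation: str, reflection: str) -> List[Action]:
    # Placeholder for a multimodal model call; here a hard-coded rule.
    if task == "start game" and observation == "menu":
        return [MouseClick(100, 200)]
    return []

def run_agent(task: str, screen: FakeScreen, max_steps: int = 5) -> Memory:
    memory = Memory()
    for _ in range(max_steps):
        obs = screen.screenshot()        # observe: screenshot in
        note = reflect(memory, obs)      # reflect on the previous step
        actions = plan(task, obs, note)  # plan the next actions
        memory.history.append((obs, actions, note))
        if not actions:                  # planner has nothing left to do
            break
        for action in actions:           # act: keyboard/mouse events out
            screen.execute(action)
    return memory
```

Calling run_agent("start game", FakeScreen()) clicks the hypothetical Start button on the first step, observes that the screen changed on the second step, and stops once the planner returns no further actions.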
Features
- Framework for General Computer Control using screenshots as input and keyboard or mouse actions as output
- Modular architecture with reasoning, planning, and skill learning components
- Ability to interact with arbitrary software without specialized APIs
- Support for long-horizon tasks requiring multi-step decision making
- Execution across diverse environments including productivity software and games
- Skill accumulation system allowing agents to reuse learned strategies
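Skill accumulation can be pictured as a library that stores named action sequences and retrieves them for new tasks. The sketch below is a toy built on assumptions, not Cradle's implementation: SkillLibrary, its methods, and the (kind, argument) action tuples are all hypothetical, and retrieval uses plain word overlap between the task and each skill's description, where a real agent might use a language model or embedding similarity instead.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical primitive action, e.g. ("key", "m") or ("click", "100,200").
Primitive = Tuple[str, str]

@dataclass
class SkillLibrary:
    """Toy skill store: each skill is a named, described sequence of primitive actions."""

    skills: Dict[str, Tuple[str, List[Primitive]]] = field(default_factory=dict)

    def add(self, name: str, description: str, steps: List[Primitive]) -> None:
        # Register (or overwrite) a skill so later tasks can reuse it.
        self.skills[name] = (description, list(steps))

    def retrieve(self, query: str, top_k: int = 1) -> List[str]:
        # Rank skills by how many words the query shares with each description.
        query_words = set(query.lower().split())
        scored = []
        for name, (description, _) in self.skills.items():
            overlap = len(query_words & set(description.lower().split()))
            if overlap > 0:
                scored.append((overlap, name))
        # Highest overlap first; ties broken alphabetically for determinism.
        scored.sort(key=lambda item: (-item[0], item[1]))
        return [name for _, name in scored[:top_k]]

    def expand(self, name: str) -> List[Primitive]:
        # Turn a stored skill back into the keyboard/mouse primitives it encodes.
        return list(self.skills[name][1])
```

For example, after adding a hypothetical open_map skill described as "open the in-game map", the query "open the map" retrieves it, and expand("open_map") returns its stored key press, so the agent can reuse the strategy instead of rediscovering it.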