OmAgent
Build multimodal language agents for fast prototyping and production
...The framework provides abstractions and infrastructure for building AI agents that operate on text, images, video, and audio while keeping the developer-facing interface relatively simple. Rather than requiring developers to hand-write orchestration logic, the system handles task scheduling, worker coordination, and node optimization behind the scenes. Its architecture is built on a graph-based workflow engine in which each task is a node in a directed workflow, so complex reasoning pipelines can be composed from modular parts. The framework also supports several reasoning strategies commonly used in language agents, including chain-of-thought prompting, self-consistency reasoning, and ReAct-style decision loops.
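To make the graph-based workflow idea concrete, here is a minimal sketch of tasks as nodes in a directed workflow, executed in dependency order. It is not OmAgent's API: the `Workflow` class, its `add_node` and `run` methods, and the node names are hypothetical and exist only for illustration. Ordering comes from Python's standard-library `graphlib.TopologicalSorter` (Python 3.9+).

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, Sequence

# Hypothetical illustration only; none of these names come from OmAgent.
# A task receives all results produced so far and returns its own result.
Task = Callable[[Dict[str, Any]], Any]


class Workflow:
    """A tiny directed workflow: each node is a task with named predecessors."""

    def __init__(self) -> None:
        self._tasks: Dict[str, Task] = {}
        self._deps: Dict[str, Sequence[str]] = {}

    def add_node(self, name: str, task: Task, after: Sequence[str] = ()) -> None:
        """Register a task that runs once every node listed in `after` has run."""
        self._tasks[name] = task
        self._deps[name] = tuple(after)

    def run(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        """Run every node in dependency order and return inputs plus results."""
        for name, deps in self._deps.items():
            unknown = [d for d in deps if d not in self._tasks]
            if unknown:
                raise ValueError(f"node {name!r} depends on unknown nodes {unknown}")
        results = dict(inputs)
        # static_order() yields predecessors before dependents and raises
        # graphlib.CycleError if the graph contains a cycle.
        for name in TopologicalSorter(self._deps).static_order():
            results[name] = self._tasks[name](results)
        return results


# Stand-in tasks; a real agent would call models or tools here.
wf = Workflow()
wf.add_node("caption", lambda r: f"caption of {r['image']}")
wf.add_node("question", lambda r: r["text"].strip())
wf.add_node(
    "answer",
    lambda r: f"{r['question']} -> {r['caption']}",
    after=["caption", "question"],
)

results = wf.run({"image": "cat.png", "text": " what is shown? "})
print(results["answer"])
```

Because each node declares only its predecessors, a new step (for example, a verification node after `answer`) can be added without touching the existing tasks, which is the kind of modular composition the description refers to. A production engine would additionally distribute nodes across workers, which this single-process sketch does not attempt.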