A framework to enable multimodal models to operate a computer
...The framework supports features like Optical Character Recognition (OCR) and Set-of-Mark (SoM) prompting to enhance visual grounding capabilities. It is designed to be compatible with macOS, Windows, and Linux (with X server installed), and is released under the MIT license.
This project is a quest for conscious artificial intelligence. A number of prototypes will be developed as the project progresses.
This project has two subprojects:
Object Pascal-based CAI NEURAL API - https://github.com/joaopauloschuler/neural-api
Python-based K-CAI NEURAL API - https://github.com/joaopauloschuler/k-neural-api
A video of the first prototype is available:
http://www.youtube.com/watch?v=qH-IQgYy9zg
The video above shows a Popperian agent collecting mining ore from 3...
Sakura is a Knowledge Navigator and User Interface for UNIX, which implements HyperMedia and its own windowing and packing system, both in the main program and in an extensive API for Tcl and other languages.