LLM Vision is an open-source Home Assistant integration that adds multimodal large language model capabilities to the smart home. It enables Home Assistant to analyze images, video files, and live camera feeds with vision-capable AI models: instead of relying solely on traditional object detection pipelines, users can send natural language prompts about visual content and receive contextual descriptions or answers about what is happening in camera footage. The system can process events from surveillance platforms such as Frigate and turn them into meaningful summaries, notifications, or structured data for automation workflows. It also maintains a timeline of analyzed camera events that can be displayed in dashboards or queried through the assistant interface.
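The prompt-driven flow described above can be sketched as a Home Assistant service call. This is a minimal illustration only: the service name, field names, and entity IDs below are assumptions based on typical LLM Vision usage and may differ between versions and providers.

```yaml
# Hedged sketch: ask a vision-capable model about a live camera view.
# Service name and parameters are assumptions, not confirmed by this document.
service: llmvision.image_analyzer
data:
  provider: OpenAI                 # whichever AI provider is configured
  message: "Is anyone at the front door? Answer in one sentence."
  image_entity:
    - camera.front_door            # hypothetical camera entity to snapshot
  max_tokens: 100
response_variable: analysis        # holds the model's contextual answer
```

The returned response can then be used in subsequent automation actions, e.g. in a notification template.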
Features
- Multimodal analysis of images, video files, and live camera streams
- Integration with Home Assistant and surveillance tools such as Frigate
- Natural language prompts to query visual events or camera snapshots
- Timeline tracking of analyzed events for dashboards and history views
- Memory capability to recognize people, pets, and objects across events
- Support for multiple AI providers and OpenAI-compatible endpoints
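To show how the Frigate integration, prompting, and notifications can fit together, here is a hedged automation sketch. The MQTT topic, payload fields, service names, and entity IDs are illustrative assumptions; consult the integration's own documentation for the exact schema.

```yaml
# Hedged sketch: summarize a Frigate person event and notify a phone.
# Topic, payload structure, service names, and entities are assumptions.
automation:
  - alias: "Summarize Frigate person event"
    trigger:
      - platform: mqtt
        topic: frigate/events              # Frigate publishes events here by default
    condition:
      - condition: template
        value_template: "{{ trigger.payload_json['after']['label'] == 'person' }}"
    action:
      - service: llmvision.image_analyzer  # assumed LLM Vision service name
        data:
          provider: OpenAI
          message: "Describe what the person in this snapshot is doing."
          image_entity:
            - camera.front_door            # hypothetical camera entity
        response_variable: summary
      - service: notify.mobile_app_phone   # hypothetical notify target
        data:
          message: "{{ summary.response_text }}"
```

This pattern converts raw detection events into the natural language summaries and notifications described in the feature list.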