This repo contains the code for a 1D tokenizer and generator
A framework to enable multimodal models to operate a computer
Witness the aha moment of VLM with less than $3
An open phone agent model & framework
LTX-Video Support for ComfyUI
Reference PyTorch implementation and models for DINOv3
Unified Multimodal Understanding and Generation Models
Extensible workflow development framework
The most powerful Android RPA agent framework
The library to build & auto-optimize LLM applications
Official implementation of Watermark Anything with Localized Messages
Just a Better Chatbot. Powered by MCP Client & Workflows
SAPIEN Manipulation Skill Framework
Python inference and LoRA trainer package for the LTX-2 audio–video
Generating Immersive, Explorable, and Interactive 3D Worlds
Lightning fast C++/CUDA neural network framework
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
Open source MVVM framework for Web Apps
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
Open-source framework for conversational voice AI agents
Interactively analyze ML models to understand their behavior
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Programmatically build custom agentic workflows, AI Agents, RAG system
Virtual AI anchor that combines state-of-the-art technology
Taming Stable Diffusion for Lip Sync