A suite of advanced multi-modal LLMs
Director, Screenwriter, Producer, and Video Generator All-in-One
A Multi-Modal World Model for Reconstruction, Generation, and Simulation
Gemma open-weight LLM library, from Google DeepMind
Multi-Modal Neural Networks for Semantic Search, based on Mid-Fusion
Build cross-modal and multimodal applications on the cloud
Build LLM-powered Agents and "Agentic workflows"
Build multi-modal Agents with memory, knowledge, tools and reasoning
A tool to use the Ai2 Open Coding Agents Soft-Verified Agents
The all-in-one Desktop & Docker AI application with full RAG and AI
Fast and customizable framework for automatic ML model creation
Qwen3-omni is a natively end-to-end, omni-modal LLM
Official Repo For "Sa2VA: Marrying SAM2 with LLaVA
High-performance, multiplayer code editor from the creators of Atom
A Family of Open Sourced Music Foundation Models
An Open Source implementation of Notebook LM with more flexibility
Phi-3.5 for Mac: Locally-run Vision and Language Models
A Python library for learning and evaluating knowledge graph embedding
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
Benchmarking Multimodal Agents for Open-Ended Tasks
Moonshot's most powerful AI model
Deep Learning-based Image Fusion: A Survey
Superfast AI decision making and processing of multi-modal data
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
SGLang is a fast serving framework for large language models