Wan2.1: Open and Advanced Large-Scale Video Generative Model
A neural network that transforms a design mock-up into static websites
SAPIEN Manipulation Skill Framework
Python inference and LoRA trainer package for the LTX-2 audio–video
Reference PyTorch implementation and models for DINOv3
Official repo for "Sa2VA: Marrying SAM2 with LLaVA"
Let's make video diffusion practical
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
Agent S: an open agentic framework that uses computers like a human
Master the fundamentals of machine learning, deep learning
Open-source evaluation toolkit of large multi-modality models (LMMs)
Full-stack AI red-teaming platform
The most powerful Android RPA agent framework
Official implementation of Watermark Anything with Localized Messages
Label Studio is a multi-type data labeling and annotation tool
An open phone agent model & framework
Agent Skill for generating 2D sprite sheets and maps as transparent PNGs
A frontier, first-principles handbook
AI tool that converts GitHub repositories into interactive diagrams
Extension of Google Research’s PaperBanana
Multimodal Agents as Smartphone Users, an LLM-based multimodal agent
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
Qwen3-Omni is a natively end-to-end, omni-modal LLM
All-in-one AI productivity platform with agents, workflows, and IM
State-of-the-art image and video CLIP models and multimodal large language models