Fast and universal 3D reconstruction model for versatile tasks
NVIDIA Isaac GR00T N1.5 is the world's first open foundation model for generalized humanoid robot reasoning and skills
Open-source repository for the Pokee Deep Research model
A state-of-the-art open visual language model
CogView4, CogView3-Plus, and CogView3 (ECCV 2024)
Diffusion Transformer with Fine-Grained Chinese Understanding
Qwen2.5-VL is the multimodal large language model series developed by the Qwen team at Alibaba Cloud
Phi-3.5 for Mac: Locally-run Vision and Language Models
Revolutionizing Database Interactions with Private LLM Technology
Sharp Monocular Metric Depth in Less Than a Second
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
Inference framework for 1-bit LLMs
Capable of understanding text, audio, vision, and video
The official PyTorch implementation of Google's Gemma models
CodeGeeX2: A More Powerful Multilingual Code Generation Model
Chat & pretrained large vision language model
Repository of Qwen2-Audio, a chat & pretrained large audio language model
Implementation of "MobileCLIP" (CVPR 2024)
High-resolution models for human tasks
Video understanding codebase from FAIR for reproducing video models
Towards Real-World Vision-Language Understanding
CLIP: predict the most relevant text snippet given an image
Ling is an MoE LLM developed and open-sourced by InclusionAI
A Unified Framework for Text-to-3D and Image-to-3D Generation
Multimodal-Driven Architecture for Customized Video Generation
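The CLIP entry above describes zero-shot image–text matching: score an image against candidate captions and pick the most relevant one. A minimal from-scratch sketch of that scoring step, using toy NumPy vectors in place of real encoder outputs (the embeddings, temperature value, and function names here are illustrative assumptions, not the actual CLIP API):

```python
import numpy as np

def normalize(v):
    # Unit-normalize along the last axis so dot products become cosine similarities
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def clip_scores(image_emb, text_embs, temperature=0.07):
    # Cosine similarity between the image embedding and each caption embedding,
    # scaled by a temperature and turned into a probability distribution (softmax)
    img = normalize(image_emb)
    txt = normalize(text_embs)
    logits = (txt @ img) / temperature
    e = np.exp(logits - logits.max())  # subtract max for numerical stability
    return e / e.sum()

# Toy embeddings: the image vector points nearest to caption 0
image = np.array([1.0, 0.2, 0.0])
captions = np.array([
    [0.9, 0.3, 0.1],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
probs = clip_scores(image, captions)
print(probs.argmax())  # → 0
```

In the real model the embeddings come from trained image and text encoders; only the similarity-plus-softmax scoring shown here carries over directly.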