A state-of-the-art open visual language model
Phi-3.5 for Mac: Locally-run Vision and Language Models
Revolutionizing Database Interactions with Private LLM Technology
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Programmatic access to the AlphaGenome model
OCR expert VLM powered by Hunyuan's native multimodal architecture
Chat and pretrained large audio language models proposed by Alibaba Cloud
Stable Diffusion WebUI Forge is a platform on top of Stable Diffusion
Unified Multimodal Understanding and Generation Models
Code for Mesh R-CNN (ICCV 2019)
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
Language modeling in a sentence representation space
An AI-powered security review GitHub Action using Claude
Dataset of GPT-2 outputs for research in detection, biases, and more
A series of math-specific large language models built on the Qwen2 series
Implementation of the Surya Foundation Model for Heliophysics
Chinese LLaMA-2 & Alpaca-2 Large Model Phase II Project
Implementation of "MobileCLIP" CVPR 2024
Official implementation of Watermark Anything with Localized Messages
High-resolution models for human tasks
Video understanding codebase from FAIR for reproducing video models
Ling is an MoE LLM open-sourced by InclusionAI
A Unified Framework for Text-to-3D and Image-to-3D Generation
Multimodal-Driven Architecture for Customized Video Generation
Personalize Any Characters with a Scalable Diffusion Transformer