NVIDIA Isaac GR00T N1.5 is the world's first open foundation model for generalized humanoid robot reasoning and skills
Open-source framework for intelligent speech interaction
Language modeling in a sentence representation space
GLM-4 series: Open Multilingual Multimodal Chat LMs
Ling-V2 is an MoE LLM provided and open-sourced by InclusionAI
Renderer for the harmony response format to be used with gpt-oss
ICLR2024 Spotlight: curation/training code, metadata, distribution and pre-trained models for MetaCLIP
Memory-efficient and performant finetuning of Mistral's models
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
Tooling for the Common Objects In 3D dataset
Qwen2.5-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud
PyTorch code and models for the DINOv2 self-supervised learning method
Implementation of "MobileCLIP" (CVPR 2024)
Tool for exploring and debugging transformer model behaviors
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
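CLIP scores each candidate caption by the cosine similarity between the image embedding and the text embeddings, then picks the highest-scoring one. A toy sketch of that scoring step, using made-up 3-d vectors in place of real CLIP embeddings (which come from learned image/text encoders and are much higher-dimensional):

```python
import numpy as np

def most_relevant_text(image_emb, text_embs):
    # Normalize embeddings to unit length, as CLIP does before scoring
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    # Cosine similarity between the image and each candidate caption
    sims = txt @ img
    return int(np.argmax(sims)), sims

# Made-up embeddings for illustration only
image = np.array([1.0, 0.0, 0.2])
captions = np.array([
    [0.9, 0.1, 0.1],   # points roughly the same way as the image
    [0.0, 1.0, 0.0],   # orthogonal to the image
])
best, scores = most_relevant_text(image, captions)  # best == 0
```

The real model produces the embeddings with its image and text towers; only the final nearest-caption lookup is shown here.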
Pretrained time-series foundation model developed by Google Research
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
The ChatGPT Retrieval Plugin lets you easily find personal or organizational documents by asking questions in natural language
OCR expert VLM powered by Hunyuan's native multimodal architecture
Release for Improved Denoising Diffusion Probabilistic Models
StudioOllamaUI is a local, portable interface for Ollama
Open Multilingual Multimodal Chat LMs
Chinese LLaMA-2 & Alpaca-2 large language models (Phase II project)
Official repo for consistency models
Chinese LLaMA & Alpaca large language models + local CPU/GPU training