Multimodal-Driven Architecture for Customized Video Generation
Stable Virtual Camera: Generative View Synthesis with Diffusion Models
Open-source framework for intelligent speech interaction
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
Designed for text embedding and ranking tasks
High-Fidelity and Controllable Generation of Textured 3D Assets
An Efficient Agentic Model for Computer Use
The official PyTorch implementation of Google's Gemma models
Inference code for scalable emulation of protein equilibrium ensembles
26M function-calling model that runs on incredibly small devices
OpenTinker is RL-as-a-Service infrastructure for foundation models
High-resolution models for human tasks
Personalize Any Characters with a Scalable Diffusion Transformer
Pretrained time-series foundation model developed by Google Research
Open-source deep-learning framework
A PyTorch library for implementing flow matching algorithms
tiktoken is a fast BPE tokeniser for use with OpenAI's models
Controllable & emotion-expressive zero-shot TTS
DeepMind model for tracking arbitrary points in videos, with applications in robotics
Language modeling in a sentence representation space
Diversity-driven optimization and large-model reasoning ability
Multi-modal large language model designed for audio understanding
Large Multimodal Models for Video Understanding and Editing