A Pragmatic VLA Foundation Model
Tiny vision language model
Reinforcement learning environment frameworks for language models
OCR expert VLM powered by Hunyuan's native multimodal architecture
Build, evaluate and train General Multi-Agent Assistance with ease
Meta Agents Research Environments is a comprehensive platform
Talk to Your AI Agents from Anywhere
Inference code for scalable emulation of protein equilibrium ensembles
48 kHz stereo neural audio codec for general audio
Optax is a gradient processing and optimization library for JAX
A very simple framework for state-of-the-art NLP
Mentat - The AI Coding Assistant
Seamlessly integrate LLMs into scikit-learn
Neural Search
Qwen-Image-Layered: Layered Decomposition for Inherent Editability
Generate blog articles from video or audio
Controllable and fast Text-to-Speech for over 7000 languages
Towards Human-Level Text-to-Speech through Style Diffusion
AI discovers 520,000 stable inorganic crystal structures for research
DeepMind model for tracking arbitrary points across videos & robotics
Code for Mesh R-CNN, ICCV 2019
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
An open-source end-to-end VLM-based GUI Agent
Leveraging BERT and c-TF-IDF to create easily interpretable topics
Deep learning optimization library: makes distributed training easy