DeepSeek Coder: Let the Code Write Itself
Official repository for LTX-Video
A Unified Framework for Text-to-3D and Image-to-3D Generation
Tencent Hunyuan Multimodal diffusion transformer (MM-DiT) model
Hackable and optimized Transformers building blocks
Example Discord bot written in Python that uses the completions API
Open-source multi-speaker long-form text-to-speech model
ChatGPT interface with better UI
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
CodeGeeX2: A More Powerful Multilingual Code Generation Model
Tongyi Deep Research, the Leading Open-source Deep Research Agent
Unified Multimodal Understanding and Generation Models
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
Video understanding codebase from FAIR for reproducing video models
Tool for exploring and debugging transformer model behaviors
Multimodal Diffusion with Representation Alignment
Controllable & emotion-expressive zero-shot TTS
Release for Improved Denoising Diffusion Probabilistic Models
This repository contains the official implementation of FastVLM
One-click local MCP server installation in desktop apps
Pushing the Limits of Mathematical Reasoning in Open Language Models
Official implementation of DreamCraft3D
Research code artifacts for Code World Model (CWM)
Capable of understanding text, audio, vision, video
Sharp Monocular Metric Depth in Less Than a Second