Qwen2.5-VL is the multimodal large language model series
Encoder of greater-than-word length text trained on a variety of data
Multimodal model achieving SOTA performance
[CVPR 2025 Best Paper Award] VGGT
Towards Human-Level Text-to-Speech through Style Diffusion
A MATLAB package for modelling multivariate stimulus-response data
VITS2 backbone with multilingual-bert
Generate 3D objects conditioned on text or images
An open-source framework for training large multimodal models
Basaran, an open-source alternative to the OpenAI text completion API
Meta-Transformer for Unified Multimodal Learning
Neural machine translation and sequence learning using TensorFlow
Singing voice conversion based on Whisper, with LoRA for singing voice cloning
Text-to-3D & Image-to-3D & Mesh Exportation with NeRF + Diffusion
Transformer-related optimization, including BERT and GPT
CPT: A Pre-Trained Unbalanced Transformer
Text-conditional image generation model based on OpenAI's unCLIP
A latent text-to-image diffusion model
A High Performance Library for Sequence Processing and Generation
PyTorch implementation of MAE
Deep learning PyTorch library for time series forecasting
Reformer, the efficient Transformer, in PyTorch
Clone a voice in 5 seconds to generate arbitrary speech in real-time
Facebook AI Research Sequence-to-Sequence Toolkit
ALIbaba's Collection of Encoder-decoders from MinD