Multimodal-Driven Architecture for Customized Video Generation
Course to get into Large Language Models (LLMs)
A tool to snap pixels to a perfect grid
Synchronized Translation for Videos
Analyze computation-communication overlap in V3/R1
tiktoken is a fast BPE tokeniser for use with OpenAI's models
Tencent Hunyuan Multimodal diffusion transformer (MM-DiT) model
The best ChatGPT that $100 can buy
A Unified Framework for Image Customization
Bash is all you need: write a Claude Code in just 16 lines of code
Repo for external large-scale work
PyTorch original implementation of Cross-lingual Language Model
DeepMind's Tacotron-2 TensorFlow implementation
Speech recognition application builder and library
CLIP model fine-tuned for zero-shot fashion product classification
Efficient English embedding model for semantic search and retrieval