Scalable generative AI framework built for researchers and developers
Repo of Qwen2-Audio chat & pretrained large audio language model
Official Repo For "Sa2VA: Marrying SAM2 with LLaVA
OCR expert VLM powered by Hunyuan's native multimodal architecture
Open-source MCP server that gives your coding agent
Machine Learning Systems: Design and Implementation
VITS2 backbone with multilingual-bert
Chinese LLaMA-2 & Alpaca-2 Large Model Phase II Project
Implements weak-to-strong learning for training stronger ML models
Towards Ultimate Expert Specialization in Mixture-of-Experts Language
Multi-Voice and Prompt-Controlled TTS Engine
High-Resolution Image Synthesis with Latent Diffusion Models
Dataset of GPT-2 outputs for research in detection, biases, and more
Official code for Style Aligned Image Generation via Shared Attention
Di♪♪Rhythm: Blazingly Fast & Simple End-to-End Song Generation
A Conversational Speech Generation Model
Powerful open source image generation model
Open Multilingual Multimodal Chat LMs
Python 3 package for easily bypassing reCAPTCHA/reCAPTCHA Mobile/hCaptcha
Best practice TTS based on BERT and VITS
Open source implementation of Microsoft's VALL-E X zero-shot TTS model
CoTracker is a model for tracking any point (pixel) on a video
FAIR's research platform for object detection research
Code for the paper Hybrid Spectrogram and Waveform Source Separation
Text-to-image generation. The repo for the NeurIPS 2021 paper