A Python library for audio data augmentation
Clone a voice in 5 seconds to generate arbitrary speech in real-time
ComfyUI integration for Microsoft's VibeVoice text-to-speech model
Test-Time Reinforcement Learning
MiroThinker is an open-source deep research agent
Block Diffusion for Ultra-Fast Speculative Decoding
Anthropic's original performance take-home, now open for you to try
DeepMind model for tracking arbitrary points across videos & robotics
A unified framework for scalable computing
Low-latency REST API for serving text embeddings
Z80-μLM is a 2-bit quantized language model
Jittor is a high-performance deep learning framework
Large-language-model & vision-language-model based on Linear Attention
2D and 3D face alignment library built using PyTorch
Decomposable Multiscale Mixing for Time Series Forecasting
Agent framework that enables tool-use agent tasks
Driving with Graph Visual Question Answering
Unleashing 10,000+ Word Generation from Long Context LLMs
LISA: Reasoning Segmentation via Large Language Model
Anomaly detection related books, papers, videos, and toolboxes
ICLR 2024 Spotlight: curation/training code, metadata, distribution
Tongyi Deep Research, the Leading Open-source Deep Research Agent
OpenAI-style API for open large language models
OCR expert VLM powered by Hunyuan's native multimodal architecture
RGBD video generation model conditioned on camera input