Open speech-to-speech models and pipelines by Hugging Face
Algorithms for outlier, adversarial and drift detection
Search all of YouTube from the command line
Open source NLP guide with models, methods, and real use cases
Interface for OuteTTS models
Visual Causal Flow
Bidirectional token-classification model for identifiable info
HY-Motion model for 3D character animation generation
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Quick illustration of how one can easily read books together with LLMs
Large-language-model & vision-language-model based on Linear Attention
Fast multimodal LLM for real-time voice interaction and AI apps
Autoregressive Model Beats Diffusion
General-purpose image editing model that delivers high-fidelity
Diffusion Transformer with Fine-Grained Chinese Understanding
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
High-Resolution 3D Assets Generation with Large Scale Diffusion Models
A web UI for easy subtitle generation using the Whisper model
Build Vision Agents quickly with any model or video provider
MARS5 speech model (TTS) from CAMB.AI
Extract schema, statistics and entities from datasets
Context-aware desktop AI assistant that understands screen content
Data Infrastructure providing an approach to multimodal AI workloads
Build multimodal language agents for fast prototype and production
Generate Any 3D Scene in Seconds