High-Resolution 3D Assets Generation with Large Scale Diffusion Models
Official Python inference and LoRA trainer package
Official inference repo for FLUX.2 models
AI tool that removes hardcoded subtitles and text from videos locally
Synthesizing and manipulating 2048x1024 images with conditional GANs
Reverse engineering Gemini's SynthID detection
This repository contains the official implementation of FastVLM
Repo for SeedVR2 & SeedVR
OCRmyPDF adds an OCR text layer to scanned PDF files
High-Resolution Image Synthesis with Latent Diffusion Models
Recovering the Visual Space from Any Views
Qwen2.5-VL is a multimodal large language model series
Generate high-definition story short videos with one click using AI
Knowledge Graph Generation from Any Text
Wan2.1: Open and Advanced Large-Scale Video Generative Model
Stable Diffusion web UI
A Customizable Image-to-Video Model based on HunyuanVideo
Reference PyTorch implementation and models for DINOv3
Official repository for LTX-Video
Native and Compact Structured Latents for 3D Generation
GPT4V-level open-source multi-modal model based on Llama3-8B
PyTorch extensions for fast R&D prototyping and Kaggle farming
Open image model at the forefront of design
Tokenizer-Free TTS for Multilingual Speech Generation
A full spaCy pipeline and models for scientific/biomedical documents