Advancing Open-source World Models
GPT4V-level open-source multi-modal model based on Llama3-8B
The most powerful and modular diffusion model GUI, API and backend
Swing Music is a beautiful, self-hosted music player
Lightweight Python library for adding real-time multi-object tracking
The Shiptest Codebase
Harmonized and Coherent Human Image Animation
Free local self-hosted video compressor web UI
Open source terminal session recorder
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
Private chat with local GPT with documents, images, video, etc.
PyTorch code and models for VJEPA2 self-supervised learning from video
Persepolis Download Manager is a GUI for aria2
An unsupervised and free tool for image and video dataset analysis
Bazarr is a companion application to Sonarr and Radarr
Code for running inference and finetuning with the SAM 3 model
NBA Stats API via Basketball Reference
Python Socket.IO server and client
Implementation of a U-net complete with efficient attention
The Ren'Py Visual Novel Engine
Python data, Leaflet.js maps
Official code for StoryMem: Multi-shot Long Video Storytelling
Streaming Real-time Audio-Driven Avatar Generation
NVR with realtime local object detection for IP cameras
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming