Fast, offline video captioning tool optimized for entry-level GPUs.
Generate audiobooks from EPUBs, PDFs and text with captions
Easily turn large sets of image URLs into an image dataset
A robust, efficient, low-latency speech-to-text library
Let's use AI to Earn
A state-of-the-art open visual language model
Simple HTML5, YouTube and Vimeo player
Abstraction layer over YouTube's internal API
Automated YouTube Shorts pipeline
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
CLIP: predict the most relevant text snippet given an image
4M: Massively Multimodal Masked Modeling
A simple screen parsing tool towards a pure vision-based GUI agent
Qwen-Image-Layered: Layered Decomposition for Inherent Editability
ShanaEncoder is an audio/video encoding program based on FFmpeg.
Swift async text-to-image for SwiftUI apps using OpenAI
Software version control visualization
Towards Real-World Vision-Language Understanding
An enhanced HTML 5 file input for Bootstrap 5.x/4.x/3.x
A standalone lightweight auxiliary CLI video player for BlackVideo.
Implementation of Dreambooth
Packages with more than 80 components for all Delphi versions