A state-of-the-art open visual language model
Generate audiobooks from EPUBs, PDFs and text with captions
Easily turn large sets of image URLs into an image dataset
A robust, efficient, low-latency speech-to-text library
Simple HTML5, YouTube and Vimeo player
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
A simple screen parsing tool towards a pure vision-based GUI agent
Towards Real-World Vision-Language Understanding
CLIP: predict the most relevant text snippet given an image
4M: Massively Multimodal Masked Modeling
Qwen-Image-Layered: Layered Decomposition for Inherent Editability
OpenAI Swift async text-to-image for SwiftUI apps
An enhanced HTML5 file input for Bootstrap 5.x/4.x/3.x
Software version control visualization
ShanaEncoder is an audio/video encoding program based on FFmpeg.
Modern Firefox-based web browser for Windows Vista & 7!
Packages with more than 80 components for all Delphi versions
An open-source framework for training large multimodal models
A convenient and easy to use image viewer for your iOS app
The ultimate tool to automate custom Telegram message forwarding
Elegant, responsive, flexible and lightweight modal plugin with jQuery
Take back your faith
A GUI written around LuaMacros