A state-of-the-art open visual language model
Generate audiobooks from EPUBs, PDFs and text with captions
Easily turn large sets of image URLs into an image dataset
A robust, efficient, low-latency speech-to-text library
Abstraction layer over YouTube's internal API
Simple HTML5, YouTube and Vimeo player
Let's use AI to Earn
Automated YouTube Shorts pipeline
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
CLIP: predict the most relevant text snippet given an image
Qwen-Image-Layered: Layered Decomposition for Inherent Editability
4M: Massively Multimodal Masked Modeling
Swift async text-to-image for SwiftUI apps using OpenAI
Towards Real-World Vision-Language Understanding
An enhanced HTML5 file input for Bootstrap 5.x/4.x/3.x
Implementation of Dreambooth
Packages with more than 80 components for all Delphi versions
An open-source framework for training large multimodal models
The ultimate tool to automate custom Telegram message forwarding
Elegant, responsive, flexible and lightweight modal plugin with jQuery
A simple yet powerful jQuery star rating plugin with fractional rating
A lightweight, dependency-free Python library
Official implementation for UniVL video and language training models
Quickly create custom webpages from your content