Multilingual Document Layout Parsing in a Single Vision-Language Model
Code release for Cut and Learn for Unsupervised Object Detection
Automatic SQL injection and database takeover tool
Bailing is a voice dialogue robot similar to GPT-4o
Chinese XLNet pre-trained model
Framework for building neural networks
Refer and Ground Anything Anywhere at Any Granularity
Convert AI papers to GUI
End-to-end speech processing toolkit
PyTorch code and models for VJEPA2 self-supervised learning from video
Language modeling in a sentence representation space
Multi-Scale Fusion of Locally-Global Descriptors for Place Recognition
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Refactoring ChatBot+LLM, GPT-3.5-turbo, ChatGPT Bot/Voice Assistant
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Multi-modal large language model designed for audio understanding
Chat & pretrained large vision language model
Graphical User Interface Face Anonymization Tool
Visual Automation IDE — automate anything you see on screen
Mice speech to text with MX Cinnamon OS ISO
Turns the YouTube Music site into a desktop application
A subtitle generator for Japanese Adult Videos
A Python application to add watermarks (text or image) to PDF files
mice stt tts
A fast, powerful, and simple hierarchical vision transformer