A Simple and Universal Swarm Intelligence Engine
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Easy-to-use Speech Toolkit including Self-Supervised Learning model
Deep learning library
Python tool for converting files and office documents to Markdown
Time series, Timeseries, Deep Learning, Machine Learning, PyTorch, fastai
Official Repo For "Sa2VA: Marrying SAM2 with LLaVA
Personal AI, On Personal Devices
OCRmyPDF adds an OCR text layer to scanned PDF files
The agent that grows with you
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Run Local LLMs on Any Device. Open-source
Capable of understanding text, audio, vision, video
Official Python inference and LoRA trainer package
A simple, high-quality voice conversion tool focused on ease of use
Wan2.2: Open and Advanced Large-Scale Video Generative Model
Open-source, high-performance AI model with advanced reasoning
AI Fully Automated Short Video Engine
Effortless data labeling with AI support from Segment Anything
Awesome multilingual OCR toolkits based on PaddlePaddle
Robust Speech Recognition via Large-Scale Weak Supervision
3D reconstruction software
Public repository for Agent Skills
Automatic Speech Recognition with Word-level Timestamps