AutoGluon: AutoML for Image, Text, and Tabular Data
Build Vision Agents quickly with any model or video provider
Generate audiobooks from e-books
OCR model for complex documents with layout-aware structured outputs
Foundation model for image generation
Qwen2.5-VL is a multimodal large language model series
A speech-text foundation model for real-time dialogue
Document content and metadata extraction microservice
The Python code to reproduce illustrations from Machine Learning Book
Python library for scraping and analyzing online news articles easily
Controllable and fast Text-to-Speech for over 7000 languages
A Python package for segmenting geospatial data with the Segment Anything Model (SAM)
Lightweight package to simplify LLM API calls
Fast Stable Diffusion on CPU and AI PC
Adding guardrails to large language models
Qwen3-ASR is an open-source series of ASR models
Instant voice cloning by MIT and MyShell. Audio foundation model
Open-source multi-speaker long-form text-to-speech model
Scalable data preprocessing and curation toolkit for LLMs
The open-source data curation platform for LLMs
User toolkit for analyzing and interfacing with Large Language Models
Open source terminal session recorder
Public opinion analysis system
Interface for OuteTTS models
A very simple framework for state-of-the-art NLP