A high-throughput and memory-efficient inference and serving engine
State-of-the-art Parameter-Efficient Fine-Tuning
Multilingual sentence & image embeddings with BERT
MobileLLM: Optimizing Sub-billion Parameter Language Models
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
Low-code framework for building custom LLMs and neural networks
CodeGeeX: An Open Multilingual Code Generation Model (KDD 2023)
Operating LLMs in production
Gemma open-weight LLM library, from Google DeepMind
Replace OpenAI GPT with another LLM in your app
Qwen3-Coder is the code version of Qwen3
Framework and no-code GUI for fine-tuning LLMs
Unified KV Cache Compression Methods for Auto-Regressive Models
Math-specific large language models in the Qwen2 series
Designed for text embedding and ranking tasks
Toolkit for conversational AI
A Simple and Universal Swarm Intelligence Engine
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Run Local LLMs on Any Device. Open-source
Capable of understanding text, audio, vision, video
Open-source, high-performance AI model with advanced reasoning
A state-of-the-art open visual language model
Powerful AI language model (MoE) optimized for efficiency/performance
Advanced language and coding AI model
Agentic, Reasoning, and Coding (ARC) foundation models