Gemma 4 12B

Gemma 4 12B is Google DeepMind’s unified open-weight multimodal model, designed for efficient local reasoning, coding, and multimodal understanding. Unlike other Gemma 4 models that rely on separate encoders, the 12B Unified model uses an encoder-free architecture that projects raw image patches and audio waveforms directly into the language model’s embedding space, which reduces multimodal latency and simplifies fine-tuning. It accepts text, image, audio, and video inputs and produces text output, making it useful for transcription, image understanding, video analysis, coding, and agentic workflows. The model has 11.95B parameters, 48 layers, a 256K-token context window, and support for over 140 languages. It also offers configurable thinking modes, native system prompt support, and function calling, and it is described as performing strongly on benchmarks for its size. It is optimized for consumer GPUs, workstations, and streamlined local deployment.
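
To make the idea of projecting raw image patches into an embedding space concrete, here is a minimal sketch in plain Python. It is not Gemma's implementation: the patch size, embedding width, random weights, and the function names image_to_patches and project are all hypothetical, chosen only to show the general technique of cutting an image into patches and mapping each one to a vector with a single linear projection.

```python
import random

def image_to_patches(image, patch):
    """Split an H x W grayscale image (list of rows) into flattened, non-overlapping patches."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height - patch + 1, patch):
        for left in range(0, width - patch + 1, patch):
            flat = [image[top + r][left + c] for r in range(patch) for c in range(patch)]
            patches.append(flat)
    return patches

def project(patches, weights):
    """Multiply each flattened patch by a (patch_dim x embed_dim) weight matrix."""
    embed_dim = len(weights[0])
    return [
        [sum(p[i] * weights[i][j] for i in range(len(p))) for j in range(embed_dim)]
        for p in patches
    ]

PATCH = 4      # hypothetical patch size, not taken from the model
EMBED_DIM = 8  # hypothetical embedding width, not taken from the model

rng = random.Random(0)
# Toy 8 x 8 "image" and randomly initialised projection weights (illustrative only).
image = [[rng.random() for _ in range(8)] for _ in range(8)]
weights = [[rng.gauss(0, 0.02) for _ in range(EMBED_DIM)] for _ in range(PATCH * PATCH)]

patches = image_to_patches(image, PATCH)
tokens = project(patches, weights)
print(len(tokens), len(tokens[0]))  # prints "4 8": four patch vectors, each of width 8
```

In a sketch like this, each projected patch plays the role of one input vector alongside text token embeddings; how the real model handles positions, audio, and video is not covered here.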

Features

Encoder-free unified multimodal architecture
Supports text, image, audio, and video inputs
11.95B-parameter dense transformer model
256K-token context window for long-context tasks
Configurable thinking mode for reasoning workflows
Native function calling for agentic applications
Supports 140+ languages with strong multilingual coverage
Optimized for consumer GPUs and local deployment
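
As a rough aid for judging what "consumer GPU" deployment implies, the arithmetic below estimates memory for the weights alone from the 11.95B parameter count listed above. The precision options and the helper name weight_memory_gb are assumptions made for illustration; the estimate ignores the KV cache, activations, and runtime overhead, all of which add to real usage, especially at long context lengths.

```python
PARAMS = 11.95e9  # parameter count stated in the listing

# Assumed storage formats and their bytes per parameter (illustrative, not from the listing).
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_memory_gb(params, bytes_per_param):
    """Return approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9

for name, size in BYTES_PER_PARAM.items():
    # Weights only: KV cache and activations are deliberately excluded.
    print(f"{name}: about {weight_memory_gb(PARAMS, size):.1f} GB for weights")
```

At 2 bytes per parameter the weights alone come to roughly 23.9 GB, which is why lower-precision formats are commonly used on GPUs with less memory.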

License

Apache License V2.0

Additional Project Details

Registered

2026-06-03

Similar Business Software

Google AI Studio

Google AI Studio is a unified development platform that helps teams explore, build, and deploy applications using Google’s most advanced AI models, including Gemini 3.5. It brings text, image, audio, and video models together in one interactive playground. With vibe coding, developers can use...

LM-Kit.NET

LM-Kit.NET is a complete local AI runtime for .NET that lets engineering teams ship AI-powered features without cloud dependencies, per-token costs, or data leaving the network. Most .NET AI integrations stop at inference. LM-Kit.NET covers the full range of capabilities production...

Gemini Enterprise Agent Platform

Gemini Enterprise Agent Platform is a comprehensive solution from Google Cloud designed to help organizations build, scale, govern, and optimize AI agents. It represents the evolution of Vertex AI, combining advanced model development with new capabilities for agent orchestration and...

TranslateGemma

TranslateGemma is a new suite of open machine translation models from Google built on the Gemma 3 foundation that lets people and systems communicate across 55 languages with high-quality AI translation while maintaining efficiency and broad deployment flexibility. Available in 4 B, 12 B, and 27...

MiniMax M3

MiniMax M3 is an open-weight multimodal AI model designed for coding, agentic workflows, long-context reasoning, and complex automation tasks. The model combines frontier-level coding performance, native multimodal understanding, and a context window of up to 1 million tokens. MiniMax M3 uses...

Gemma 3n

Gemma 3n is our state-of-the-art open multimodal model, engineered for on-device performance and efficiency. Made for responsive, low-footprint local inference, Gemma 3n empowers a new wave of intelligent, on-the-go applications. It analyzes and responds to combined images and text, with video...