image analysis algorithm free download

FLUX.2-klein-4B

Flux 2 image generation model pure C inference

FLUX.2-klein-4B is a compact, pure C inference implementation for the FLUX.2-klein-4B image generation model. Written with a strong emphasis on simplicity, correctness, and performance, it exposes a minimal C API that can be embedded in broader applications without pulling in heavy dependencies. Because...

Downloads: 4 This Week

Last Update: 2026-02-13

DeepSeek-OCR

Contexts Optical Compression

DeepSeek-OCR is an open-source optical character recognition solution built as part of the broader DeepSeek AI vision-language ecosystem. It is designed to extract text from images, PDFs, and scanned documents, and integrates with multimodal capabilities that understand layout, context, and visual elements beyond raw character recognition. The system treats OCR not simply as “read the text” but as “understand what the text is doing in the image”—for example distinguishing captions from body...

Downloads: 3 This Week

Last Update: 2026-01-27

GLM-4.5V

GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning

GLM-4.5V is the preceding iteration in the GLM-V series that laid much of the groundwork for general multimodal reasoning and vision-language understanding. It embodies the design philosophy of mixing visual and textual modalities into a unified model capable of general-purpose reasoning, content understanding, and generation, while already supporting a wide variety of tasks: from image captioning and visual question answering to content recognition, GUI-based agents, video understanding,...

Downloads: 0 This Week

Last Update: 2026-05-16

Janus

Unified Multimodal Understanding and Generation Models

Janus is a sophisticated open-source project from DeepSeek AI that aims to unify both visual understanding and image generation in a single model architecture. Rather than having separate systems for “look and describe” and “prompt and generate”, Janus uses an autoregressive transformer framework with a decoupled visual encoder—allowing it to ingest images for comprehension and to produce images from text prompts with shared internal representations.

Downloads: 2 This Week

Last Update: 2025-10-20

MiniCPM-o

A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming

MiniCPM-o 2.6 is a cutting-edge multimodal large language model (MLLM) designed for high-performance tasks across vision, speech, and video. Capable of running on end-side devices such as smartphones and tablets, it provides powerful features like real-time speech conversation, video understanding, and multimodal live streaming. With 8 billion parameters, MiniCPM-o 2.6 surpasses its predecessors in versatility and efficiency, making it one of the most robust models available. It supports...

Downloads: 1 This Week

Last Update: 2025-05-15

GLM-4.6V

GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning

GLM-4.6V represents the latest generation of the GLM-V family and marks a major step forward in multimodal AI by combining advanced vision-language understanding with native “tool-call” capabilities, long-context reasoning, and strong generalization across domains. Unlike many vision-language models that treat images and text separately or require intermediate conversions, GLM-4.6V allows inputs such as images, screenshots or document pages directly as part of its reasoning pipeline — and...

Downloads: 0 This Week

Last Update: 2026-05-16

MediaPipe Face Detection

Detect faces in an image

The MediaPipe Face Detection model is a high-performance, real-time face detection solution that uses machine learning to identify faces in images and video streams. It is optimized for mobile and embedded platforms, offering fast and accurate face detection while maintaining a small memory footprint. This model supports multiple face detections and is highly efficient, making it suitable for a variety of applications such as augmented reality, user authentication, and facial expression analysis.

Downloads: 2 This Week

Last Update: 2025-03-19

MAE (Masked Autoencoders)

PyTorch implementation of MAE

MAE (Masked Autoencoders) is a self-supervised learning framework for visual representation learning using masked image modeling. It trains a Vision Transformer (ViT) by randomly masking a high percentage of image patches (typically 75%) and reconstructing the missing content from the remaining visible patches. This forces the model to learn semantic structure and global context without supervision. The encoder processes only the visible patches, while a lightweight decoder reconstructs the...

Downloads: 0 This Week

Last Update: 2025-10-06
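
The random masking step described for MAE can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the project's PyTorch code: the function name `random_mask_patches`, the 224x224 image size, and the 16x16 patch size are assumptions chosen for the example.

```python
import random


def random_mask_patches(num_patches, mask_ratio=0.75, seed=None):
    """Split patch indices into visible and masked sets, MAE-style.

    Hypothetical helper for illustration; returns two sorted lists
    (visible_idx, masked_idx) that together cover every patch once.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    order = list(range(num_patches))
    rng.shuffle(order)  # uniform random permutation of patch indices
    num_visible = int(num_patches * (1.0 - mask_ratio))
    visible_idx = sorted(order[:num_visible])  # patches the encoder would see
    masked_idx = sorted(order[num_visible:])   # patches left for reconstruction
    return visible_idx, masked_idx


# Assumed setup: a 224x224 image cut into 16x16 patches gives a 14x14 grid.
grid = 224 // 16
visible, masked = random_mask_patches(grid * grid, mask_ratio=0.75, seed=0)
print(len(visible), len(masked))  # 49 147
```

With a 75% mask ratio only a quarter of the patches (49 of 196 here) would reach the encoder, which is why the encoder can be large while the decoder stays lightweight.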

Gemma 4 12B

Unified multimodal Gemma model for local coding and reasoning

Gemma 4 12B is Google DeepMind’s unified open-weight multimodal model designed for efficient local reasoning, coding, and multimodal understanding. Unlike other Gemma 4 models that rely on separate encoders, the 12B Unified model uses an encoder-free architecture that projects raw image patches and audio waveforms directly into the language model’s embedding space, reducing multimodal latency and simplifying fine-tuning. It supports text, image, audio, and video inputs with text output, making it useful for transcription, image understanding, video analysis, coding, and agentic workflows. The model has 11.95B parameters, 48 layers, a 256K-token context window, and support for over 140 languages. ...

Downloads: 0 This Week

Last Update: 2026-06-03

Devstral Small 2

Lightweight 24B agentic coding model with vision and long context

Devstral Small 2 is a compact agentic language model designed for software engineering workflows, excelling at tool usage, codebase exploration, and multi-file editing. With 24B parameters and FP8 instruct tuning, it delivers strong instruction following while remaining lightweight enough for local and on-device deployment. The model achieves competitive performance on SWE-bench, validating its effectiveness for real-world coding and automation tasks. It introduces vision capabilities,...

Downloads: 0 This Week

Last Update: 2026-01-16

Gemma 4

Google’s flagship dense multimodal model for coding and reasoning

Gemma 4 is Google DeepMind’s flagship dense open-weight multimodal model, designed for high-end reasoning, coding, agentic workflows, and multimodal understanding. The model contains approximately 30.7B parameters and supports text and image inputs with text generation output, while also processing video as image-frame sequences. Built as the most capable model in the Gemma 4 family, it combines strong reasoning performance with a large 256K-token context window and configurable thinking modes. Gemma 4 31B supports native function calling, structured outputs, and more than 140 languages, making it suitable for enterprise assistants, coding agents, document analysis, and multilingual applications. ...

Downloads: 0 This Week

Last Update: 2026-06-03

Ministral 3 3B Base 2512

Small 3B-base multimodal model ideal for custom AI on edge hardware

Ministral 3 3B Base 2512 is the smallest model in the Ministral 3 family, offering a compact yet capable multimodal architecture suited for lightweight AI applications. It combines a 3.4B-parameter language model with a 0.4B vision encoder, enabling both text and image understanding in a tiny footprint. As the base pretrained model, it is not fine-tuned for instructions or reasoning, making it the ideal foundation for custom post-training, domain adaptation, or specialized downstream tasks....

Downloads: 0 This Week

Last Update: 2025-12-03

Ministral 3 3B Reasoning 2512

Compact 3B-param multimodal model for efficient on-device reasoning

Ministral 3 3B Reasoning 2512 is the smallest reasoning-capable model in the Ministral 3 family, yet delivers a surprisingly capable multimodal and multilingual base for lightweight AI applications. It pairs a 3.4B-parameter language model with a 0.4B-parameter vision encoder, enabling it to understand both text and image inputs. This reasoning-tuned variant is optimized for tasks like math, coding, and other STEM-related problem solving, making it suitable for applications that require logical reasoning, analysis, or structured thinking. Despite its modest size, the model is designed for edge deployment and can run locally, fitting in ~16 GB of VRAM in BF16 or under 8 GB of RAM/VRAM when quantized. ...

Downloads: 0 This Week

Last Update: 2025-12-03

Ministral 3 14B Base 2512

Powerful 14B-base multimodal model — flexible base for fine-tuning

Ministral 3 14B Base 2512 is the largest model in the Ministral 3 line, offering state-of-the-art language and vision capabilities in a dense, base-pretrained form. It combines a 13.5B-parameter language model with a 0.4B-parameter vision encoder, enabling both high-quality text understanding/generation and image-aware tasks. As a “base” model (i.e. not fine-tuned for instruction or reasoning), it provides a flexible foundation ideal for custom fine-tuning or downstream specialization. The...

Downloads: 0 This Week

Last Update: 2025-12-03

Search Results for "image analysis algorithm"

Showing 14 open source projects for "image analysis algorithm"
