computer vision free download

Phi-3-MLX

Phi-3.5 for Mac: Locally-run Vision and Language Models

Phi-3-Vision-MLX is an implementation of Phi-3 Vision, a lightweight multimodal model for vision and language tasks, built on Apple MLX (a machine learning framework for Apple silicon). It focuses on running vision-language models efficiently on Apple hardware such as M1 and M2 chips.

Downloads: 4 This Week

Last Update: 2025-03-13

hCaptcha Challenger

Gracefully face hCaptcha challenges with multimodal LLMs

hCaptcha Challenger is an open-source automation framework designed to solve hCaptcha verification challenges using computer vision models and multimodal reasoning techniques. The project integrates machine learning models capable of analyzing visual captcha tasks and identifying the correct responses required to pass the verification process. Instead of relying on third-party captcha-solving services or browser scripts, the system operates independently by using pretrained neural networks that can classify images, detect objects, and interpret spatial relationships. ...

Downloads: 4 This Week

Last Update: 2026-03-06

InternGPT

Open source demo platform where you can easily showcase your AI models

InternGPT is an open-source multimodal AI framework designed to extend large language models beyond text interactions into visual reasoning and image manipulation tasks. The system integrates conversational AI with computer vision models so users can interact with images, videos, and visual environments through natural language instructions. Unlike traditional chat systems that rely solely on text prompts, InternGPT allows users to interact with visual content using both language and nonverbal signals such as pointing or highlighting objects within images. ...

Downloads: 0 This Week

Last Update: 2026-03-05

InternVL

A Pioneering Open-Source Alternative to GPT-4o

InternVL is a large-scale multimodal foundation model designed to integrate computer vision and language understanding within a unified architecture. The project focuses on scaling vision models and aligning them with large language models so that they can perform tasks involving both visual and textual information. InternVL is trained on massive collections of image-text data, enabling it to learn representations that capture both visual patterns and semantic meaning. ...

Downloads: 0 This Week

Last Update: 2026-03-04

Torch Pruning

DepGraph: Towards Any Structural Pruning

...Torch-Pruning physically removes parameters rather than masking them, which results in smaller and faster models during both training and inference. The toolkit supports a wide variety of architectures used in computer vision and large language models, making it a flexible solution for model compression tasks.

Downloads: 3 This Week

Last Update: 2026-03-05

CogView4

CogView4, CogView3-Plus and CogView3 (ECCV 2024)

...It emphasizes bilingual usability, making it well-suited for cross-lingual multimodal applications. The model also supports fine-tuning and downstream customization, extending its applicability to creative content generation, human–computer interaction, and research on vision-language alignment.

Downloads: 1 This Week

Last Update: 5 days ago

StarVector

StarVector is a foundation model for SVG generation

StarVector is a multimodal foundation model designed for generating Scalable Vector Graphics (SVG) from images or textual descriptions. The system treats vector graphics creation as a code generation problem, producing SVG code that can render detailed vector images. Its architecture combines computer vision techniques with language modeling capabilities so it can understand visual inputs and textual prompts simultaneously. The model converts raster images or text instructions into structured vector representations, enabling high-quality vectorization and design generation. This approach allows StarVector to create scalable graphics that maintain visual quality regardless of resolution, which is especially useful for design tools and illustration workflows. ...

Downloads: 0 This Week

Last Update: 2026-03-05

HuixiangDou

Overcoming Group Chat Scenarios with LLM-based Technical Assistance

...This design allows the system to participate in group discussions without flooding the chat with unnecessary messages. The assistant uses retrieval and ranking methods along with language model reasoning to produce accurate answers for technical topics such as computer vision and machine learning projects. It can be integrated into messaging platforms such as WeChat or other team collaboration tools to assist developer communities.

Downloads: 4 This Week

Last Update: 2026-03-06

LLaVA

Visual Instruction Tuning: Large Language-and-Vision Assistant

Visual instruction tuning towards large language and vision models with GPT-4-level capabilities.

Downloads: 5 This Week

Last Update: 2024-02-04

EvaDB

Database system for building simpler and faster AI-powered applications

Over the last decade, AI models have radically changed the world of natural language processing and computer vision. They are accurate on various tasks ranging from question answering to object tracking in videos. To use an AI model, the user needs to program against multiple low-level libraries, such as PyTorch, Hugging Face, and OpenAI. This tedious process often leads to a complex AI app that glues together these libraries to accomplish the given task.

Downloads: 3 This Week

Last Update: 2023-11-19

Search Results for "computer vision"

Showing 10 open source projects for "computer vision"

Phi-3-MLX

hCaptcha Challenger

InternGPT

InternVL

Torch Pruning

CogView4

StarVector

HuixiangDou

LLaVA

EvaDB
