Showing 369 open source projects for "visual"

  • 1
    DTW is intended to be a voice-in, pictures-and-text-out program written in Java using Sphinx from CMU. It aims to be useful to people who have good oral/visual literacy skills but poor written literacy skills.
    Downloads: 0 This Week
  • 2
    MEM Net - Mote EMulator Network. This project will focus on: 1) MEM, a wireless sensor node (mote) emulator; 2) MEM Net, a network of emulated motes. So far, the only released package is “visual-sim-slides”. More coming next!
    Downloads: 0 This Week
  • 3
    An attempt at developing Artificial Intelligence software, based on the notion that intelligence is a set of conditional choices (in programming terms, a whole lot of if statements).
    Downloads: 0 This Week
  • 4
    Visual CCG is a set of tools to help one view and manipulate XML-based CCG (Complex Categorical Grammars) tools.
    Downloads: 0 This Week
  • 5
    Pangaea will be a robust and feature-filled game engine built using Allegro (http://alleg.sourceforge.net). It will be similar to Final Fantasy 1-3, etc., complete with map editor/world builder.
    Downloads: 0 This Week
  • 6
    This project aims to create a block-oriented visual workbench for general-purpose systems modelling and simulation. In the first milestone, we'll build a simple GUI and the basic blocks to model neural network training and internetwork topologies.
    Downloads: 0 This Week
  • 7
    Medical decision making algorithm tool. Visual design tool generates Tcl/Tk code. Non-programmers can design interactive algorithms. Generates notes for inclusion in medical record. Runs freestanding or in Tcl Plugin.
    Downloads: 0 This Week
  • 8
    A collection of user-contributed applications which use the Open Computer Vision Library.
    Downloads: 0 This Week
  • 9
    Qwen2.5-VL-7B-Instruct

    Multimodal 7B model for image, video, and text understanding tasks

    Qwen2.5-VL-7B-Instruct is a multimodal vision-language model developed by the Qwen team, designed to handle text, images, and long videos with high precision. Fine-tuned from Qwen2.5-VL, this 7-billion-parameter model can interpret visual content such as charts, documents, and user interfaces, as well as recognize common objects. It supports complex tasks like visual question answering, localization with bounding boxes, and structured output generation from documents. The model is also capable of video understanding with dynamic frame sampling and temporal reasoning, enabling it to analyze and respond to long-form videos. ...
    Downloads: 0 This Week
  • 10
    lastest

    AI-supported visual verification and tests you can actually trust.

    Lastest.cloud is a free, open-source, self-hosted visual regression and end-to-end testing platform for web applications. An AI agent records you clicking through your running app and generates Playwright tests with multi-selector fallback. Replays are deterministic and token-free, so your CI/CD bill doesn't scale with your test suite. Lastest ships three diff engines side by side: pixel (pixelmatch), structural (SSIM), and perceptual (Butteraugli), so flaky pixel diffs stop crying wolf. ...
    Downloads: 0 This Week
  • 11
    Qwen2.5-VL-3B-Instruct

    Qwen2.5-VL-3B-Instruct: Multimodal model for chat, vision & video

    ...As part of the Qwen2.5 series, it supports image-text-to-text generation with capabilities like chart reading, object localization, and structured data extraction. The model can serve as an intelligent visual agent capable of interacting with digital interfaces and understanding long-form videos by dynamically sampling resolution and frame rate. It uses a SwiGLU and RMSNorm-enhanced ViT architecture and introduces mRoPE updates for robust temporal and spatial understanding. The model supports flexible image input (file path, URL, base64) and outputs structured responses like bounding boxes or JSON, making it highly versatile in commercial and research settings. ...
    Downloads: 0 This Week
  • 12
    Kimi K2.6

    Multimodal agent model for coding, orchestration, and autonomy

    ...It is designed to handle complex end-to-end software workflows across multiple languages and domains, including front-end development, DevOps, performance optimization, and coding-driven design. Beyond coding, it can transform prompts and visual inputs into production-ready interfaces and lightweight full-stack outputs with structured layouts, interactivity, and polished visual detail. One of its most distinctive capabilities is horizontal agent scaling, supporting up to 300 sub-agents and 4,000 coordinated steps in a single run, which enables parallel task decomposition and end-to-end completion of outputs such as documents, websites, and spreadsheets. ...
    Downloads: 0 This Week
  • 13
    NoobAI XL 1.1

    Open, non-commercial SDXL model for quality image generation

    ...The model encourages open-source collaboration, requiring derivative models and LoRAs to be shared under the same terms. Sponsored by Lanyun Cloud, it represents a community-driven effort to democratize advanced, high-quality visual generation.
    Downloads: 0 This Week
  • 14
    FLUX.1-Krea-dev

    Text-to-image model optimized for artistic quality and safe generation

    FLUX.1-Krea-dev is a 12 billion parameter rectified flow transformer for text-to-image generation, developed by Black Forest Labs in collaboration with Krea. It delivers aesthetic, high-quality outputs focused on photography and visual coherence, making it a strong competitor to closed-source models. Trained using guidance distillation, it offers efficient inference while preserving creative fidelity. The model is distributed under a non-commercial license, with conditions to prevent misuse and support ethical AI development. FLUX.1-Krea-dev is available via Diffusers and ComfyUI, and integrates with the FluxPipeline for streamlined usage. ...
    Downloads: 0 This Week
  • 15
    Qwen-Image-Edit

    Advanced bilingual image editing with semantic control

    Qwen-Image-Edit is the image editing extension of Qwen-Image, a 20B parameter model that combines advanced visual and text-rendering capabilities for creative and precise editing. It leverages both Qwen2.5-VL for semantic control and a VAE Encoder for appearance control, enabling users to edit at both the content and detail level. The model excels at semantic edits like style transfer, object rotation, and novel view synthesis, while also handling precise appearance edits such as adding or removing elements without altering surrounding regions. ...
    Downloads: 0 This Week
  • 16
    OpenVLA 7B

    Vision-language-action model for robot control via images and text

    ...It takes camera images and natural language instructions as input and outputs normalized 7-DoF robot actions, enabling control of multiple robot types across various domains. Built on top of LLaMA-2 and DINOv2/SigLIP visual backbones, it allows both zero-shot inference for known robot setups and parameter-efficient fine-tuning for new domains. The model supports real-world robotics tasks, with robust generalization to environments seen in pretraining. Its actions include delta values for position, orientation, and gripper status, and can be un-normalized based on robot-specific statistics. ...
    Downloads: 0 This Week
  • 17
    fashion-clip

    CLIP model fine-tuned for zero-shot fashion product classification

    ...It supports multilingual fashion queries and works best with clean, product-style images against white backgrounds. The model can be used for product search, recommendation systems, or visual tagging in e-commerce platforms.
    Downloads: 0 This Week
  • 18
    Ministral 3 3B Base 2512

    Small 3B-base multimodal model ideal for custom AI on edge hardware

    Ministral 3 3B Base 2512 is the smallest model in the Ministral 3 family, offering a compact yet capable multimodal architecture suited for lightweight AI applications. It combines a 3.4B-parameter language model with a 0.4B vision encoder, enabling both text and image understanding in a tiny footprint. As the base pretrained model, it is not fine-tuned for instructions or reasoning, making it the ideal foundation for custom post-training, domain adaptation, or specialized downstream tasks....
    Downloads: 0 This Week
  • 19
    Ministral 3 3B Instruct 2512

    Ultra-efficient 3B multimodal instruct model built for edge deployment

    Ministral 3 3B Instruct 2512 is the smallest model in the Ministral 3 family, offering a lightweight yet capable multimodal architecture designed for edge and low-resource deployments. It includes a 3.4B-parameter language model paired with a 0.4B vision encoder, enabling it to understand both text and visual inputs. As an FP8 instruct-fine-tuned model, it is optimized for chat, instruction following, and compact agentic tasks while maintaining strong adherence to system prompts. Despite its small size, it delivers efficient real-time performance and can run locally on a single 8GB GPU, with further memory reductions through quantization. ...
    Downloads: 0 This Week