Showing 559 open source projects for "visual-mingw"

View related business solutions
  • $300 Free Credits for Your Google Cloud Projects Icon
    $300 Free Credits for Your Google Cloud Projects

    Start building on Google Cloud with $300 in free credits. No commitment, no credit card required until you're ready to scale.

    Launch your next project with $300 in free Google Cloud credits—no strings attached. Test, build, and deploy without risk. Use your credits across the entire Google Cloud platform to find what works best for your needs. After your credits are used, continue with always-free tier services. Only pay when you're ready to scale. Sign up in minutes and start exploring.
    Start Free Trial
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 1
    Perception Models

    Perception Models

    State-of-the-art Image & Video CLIP, Multimodal Large Language Models

    ...The project supports a wide range of research applications, from visual recognition and dense prediction to fine-grained multimodal understanding. Additionally, it includes several large-scale open datasets for both image and video perception.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 2
    Screenshot to Code

    Screenshot to Code

    A neural network that transforms a design mock-up into static websites

    Screenshot-to-code is a tool or prototype that attempts to convert UI screenshots (e.g., of mobile or web UIs) into code representations, likely generating layouts, HTML, CSS, or markup from image inputs. It is part of a research/proof-of-concept domain in UI automation and image-to-UI code generation. Mapping visual design to code constructs. Code/UI layout (HTML, CSS, or markup). Examples/demo scripts showing “image UI code”.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    ManiSkill

    ManiSkill

    SAPIEN Manipulation Skill Framework

    ...Developed by Hao Su Lab, it focuses on robotic manipulation with diverse, high-quality 3D tasks designed to challenge perception, control, and planning in robotics. ManiSkill provides both low-level control and visual observation spaces for realistic learning scenarios.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    Audiblez

    Audiblez

    Generate audiobooks from e-books

    Audiblez is a tool for generating high-quality .m4b audiobooks directly from .epub e-books using the Kokoro-82M neural text-to-speech model. It focuses on making audiobook creation easy and fast: from a single command, the tool splits an e-book into chapters, synthesizes audio for each section, and then merges the results into a structured audiobook with chapter-based WAV files and a final .m4b container. The Kokoro-82M model it uses is compact (82M parameters) yet natural sounding, trained...
    Downloads: 25 This Week
    Last Update:
    See Project
  • Auth0 B2B Essentials: SSO, MFA, and RBAC Built In Icon
    Auth0 B2B Essentials: SSO, MFA, and RBAC Built In

    Unlimited organizations, 3 enterprise SSO connections, role-based access control, and pro MFA included. Dev and prod tenants out of the box.

    Auth0's B2B Essentials plan gives you everything you need to ship secure multi-tenant apps. Unlimited orgs, enterprise SSO, RBAC, audit log streaming, and higher auth and API limits included. Add on M2M tokens, enterprise MFA, or additional SSO connections as you scale.
    Sign Up Free
  • 5
    PS2 Cover

    PS2 Cover

    PS2 Covers Collection

    ...Its scale and completeness make it one of the most comprehensive resources for retro gaming visuals. Overall, ps2-covers enhances the user experience of emulation by adding organized and accessible visual metadata.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    GLM-4.5V

    GLM-4.5V

    GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning

    GLM-4.5V is the preceding iteration in the GLM-V series that laid much of the groundwork for general multimodal reasoning and vision-language understanding. It embodies the design philosophy of mixing visual and textual modalities into a unified model capable of general-purpose reasoning, content understanding, and generation, while already supporting a wide variety of tasks: from image captioning and visual question answering to content recognition, GUI-based agents, video understanding, and long-document interpretation. GLM-4.5V emerged from a training framework that leverages scalable reinforcement learning (with curriculum sampling) to boost performance across tasks ranging from STEM problem solving to long-context reasoning, giving it broad applicability beyond narrow benchmarks. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    Sa2VA

    Sa2VA

    Official Repo For "Sa2VA: Marrying SAM2 with LLaVA

    Sa2VA is a cutting-edge open-source multi-modal large language model (MLLM) developed by ByteDance that unifies dense segmentation, visual understanding, and language-based reasoning across both images and videos. It merges the segmentation power of a state-of-the-art video segmentation model (based on SAM‑2) with the vision-language reasoning capabilities of a strong LLM backbone (derived from models like InternVL2.5 / Qwen-VL series), yielding a system that can answer questions about visual content, perform referring segmentation, and maintain temporal consistency across frames in video. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    VisualGLM-6B

    VisualGLM-6B

    Chinese and English multimodal conversational language model

    VisualGLM-6B is an open-source multimodal conversational language model developed by ZhipuAI that supports both images and text in Chinese and English. It builds on the ChatGLM-6B backbone, with 6.2 billion language parameters, and incorporates a BLIP2-Qformer visual module to connect vision and language. In total, the model has 7.8 billion parameters. Trained on a large bilingual dataset — including 30 million high-quality Chinese image-text pairs from CogView and 300 million English pairs — VisualGLM-6B is designed for image understanding, description, and question answering. Fine-tuning on long visual QA datasets further aligns the model’s responses with human preferences. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 9
    ART ASCII Library

    ART ASCII Library

    ASCII art library for Python

    ASCII art is also known as "computer text art". It involves the smart placement of typed special characters or letters to make a visual shape that is spread over multiple lines of text. ART is a Python lib for text converting to ASCII art fancy.
    Downloads: 2 This Week
    Last Update:
    See Project
  • Ship Agents Faster Icon
    Ship Agents Faster

    Transform your applications and workflows into powerful agentic systems at global scale.

    Gemini Enterprise Agent Platform lets you rapidly build, scale, govern and optimize production-ready agents grounded in your organization's data. The platform enables developers to build custom or pre-built agents for virtually any use case. New customers get $300 in free credits.
    Get Started Free
  • 10
    DINOv3

    DINOv3

    Reference PyTorch implementation and models for DINOv3

    DINOv3 is the third-generation iteration of Meta’s self-supervised visual representation learning framework, building upon the ideas from DINO and DINOv2. It continues the paradigm of learning strong image representations without labels using teacher–student distillation, but introduces a simplified and more scalable training recipe that performs well across datasets and architectures. DINOv3 removes the need for complex augmentations or momentum encoders, streamlining the pipeline while maintaining or improving feature quality. ...
    Downloads: 11 This Week
    Last Update:
    See Project
  • 11
    JoyAI-Echo

    JoyAI-Echo

    Pushing the Frontier of Long Audio-Visual Generation

    ...It is designed to create minute-level, multi-shot video stories from structured prompts while preserving continuity across scenes. The system uses a paired cross-modal memory bank to maintain visual identity and voice consistency over longer sequences. It also uses a distilled DMD generator to reduce inference cost and improve generation speed compared with heavier multi-step pipelines. JoyAI-Echo focuses on text-to-video and multi-shot long-video generation, while image-to-video support is not part of the current release scope. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    SmartNode

    SmartNode

    Visual simulation platform for space-based data backhaul scenarios

    smartNode is a visual simulation platform for space-based intelligent relay and satellite data-return scenarios. It models the relationship between satellites, ground stations, relay links, and content-driven task scheduling. The project includes a Python backend and a browser-based frontend, making it suitable for local simulation, teaching, and secondary development.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    NVIDIA AI Blueprint

    NVIDIA AI Blueprint

    Suite of reference architectures for building GPU-accelerated vision

    ...The project is organized around real-time video intelligence, downstream analytics, and agentic offline processing. It supports workflows such as natural-language video search, visual question answering, long-video summarization, clip retrieval, verified alerts, and incident analysis. It is designed for technical users who need deployable reference architectures for smart spaces, warehouse automation, SOP validation, monitoring, and operational video analytics. The repository includes Python agent code, Docker Compose deployment configurations, skills, scripts, and a Next.js-based UI.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    Qtile

    Qtile

    A full-featured, hackable tiling window manager written in Python

    A full-featured, hackable tiling window manager written and configured in Python. Optimize your workflow by configuring your environment to fit how you work. Efficiently use screen real-estate by automatically arranging windows with minimal visual cruft. Save your wrists from RSI by ditching the mouse and driving with the keyboard. Qtile is simple, small, and extensible. It's easy to write your own layouts, widgets, and built-in commands. Qtile is written and configured entirely in Python. Leverage the full power and flexibility of the language to make it fit your needs. The Qtile community is active and growing. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    Super Magic

    Super Magic

    All-in-one AI productivity platform with agents, workflows, and IM

    ...Magic centers around a general-purpose AI agent system called Super Magic, which can autonomously understand tasks, plan actions, execute workflows, and perform error correction. Alongside this, Magic includes a visual workflow engine that enables users to design complex AI processes using a drag-and-drop interface without requiring extensive coding knowledge. It also provides an enterprise-grade instant messaging system that integrates AI conversations with internal communication, allowing teams to collaborate while leveraging intelligent assistants. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 16
    Pixal3D

    Pixal3D

    Pixel-Aligned 3D Generation from Images

    ...It is useful for researchers, 3D artists, game asset workflows, and generative AI experiments focused on image-conditioned reconstruction. Its main value is combining generative flexibility with reconstruction-like visual fidelity.
    Downloads: 7 This Week
    Last Update:
    See Project
  • 17
    CogVLM

    CogVLM

    A state-of-the-art open visual language model

    CogVLM is an open-source visual–language model suite—and its GUI-oriented sibling CogAgent—aimed at image understanding, grounding, and multi-turn dialogue, with optional agent actions on real UI screenshots. The flagship CogVLM-17B combines ~10B visual parameters with ~7B language parameters and supports 490×490 inputs; CogAgent-18B extends this to 1120×1120 and adds plan/next-action outputs plus grounded operation coordinates for GUI tasks.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 18
    Pix2Text

    Pix2Text

    Open-Source Python3 tool for recognizing layouts, tables, and math

    An Open-Source Python3 tool for recognizing layouts, tables, math formulas, and text in images, converting them into Markdown format. A free alternative to Mathpix, empowering seamless conversion of visual content into text-based representations. 80+ languages are supported. Pix2Text (P2T) aims to be a free and open-source Python alternative to Mathpix, and it can already accomplish Mathpix's core functionality. Pix2Text (P2T) can recognize layouts, tables, images, text, and mathematical formulas, and integrate all of these contents into Markdown format. ...
    Downloads: 7 This Week
    Last Update:
    See Project
  • 19
    LTX-2

    LTX-2

    Python inference and LoRA trainer package for the LTX-2 audio–video

    LTX-2 is a powerful, open-source toolkit developed by Lightricks that provides a modular, high-performance base for building real-time graphics and visual effects applications. It is architected to give developers low-level control over rendering pipelines, GPU resource management, shader orchestration, and cross-platform abstractions so they can craft visually compelling experiences without starting from scratch. Beyond basic rendering scaffolding, LTX-2 includes optimized math libraries, resource loaders, utilities for texture and buffer handling, and integration points for native event loops and input systems. ...
    Downloads: 13 This Week
    Last Update:
    See Project
  • 20
    ROMM

    ROMM

    A beautiful, powerful, self-hosted rom manager and player

    ...The launcher includes a powerful universal search that combs through installed apps, contacts, messages, and web results to deliver quick answers without switching contexts. Romm also supports widgets, customization options, and theme choices so users can tailor the visual experience to their preferences while maintaining performance and responsiveness. Privacy is a highlight, with local indexing and search functions that operate without sending data to external servers unless explicitly permitted.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 21
    Writer Framework

    Writer Framework

    No-code in the front, Python in the back. An open-source framework

    Writer Framework is an open source platform designed to help developers build AI-powered applications by combining a visual interface builder with a Python-based backend architecture. It follows a hybrid approach where user interfaces are created using a drag-and-drop editor while business logic is implemented in Python, allowing teams to balance speed and flexibility without sacrificing control. The framework is particularly focused on AI use cases, enabling developers to integrate large language models, knowledge graphs, and custom machine learning workflows into user-facing applications. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    VLMEvalKit

    VLMEvalKit

    Open-source evaluation toolkit of large multi-modality models (LMMs)

    ...VLMEvalKit supports generation-based evaluation methods, allowing models to produce textual responses to visual inputs while measuring performance through techniques such as exact matching or language-model-assisted answer extraction.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    firerpa LAMDA

    firerpa LAMDA

    The most powerful Android RPA agent framework

    lamda is an Android RPA agent framework that provides visual remote desktop control and automation at scale, geared toward testing, automation validation, and device management. It exposes a clean UI to monitor and interact with connected devices and includes tooling to script actions reliably across apps and OS versions. The project emphasizes low-friction setup and powerful control primitives so teams can move from interactive validation to repeatable automation.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    Watermark Anything

    Watermark Anything

    Official implementation of Watermark Anything with Localized Messages

    ...Developed by Facebook Research, it provides a robust, flexible system that allows users to insert one or multiple watermarks within selected image regions while maintaining visual quality and recoverability. Unlike traditional watermarking methods that rely on uniform embedding, WAM supports spatially localized watermarks, enabling targeted protection of specific image regions or objects. The model is trained to balance imperceptibility, ensuring minimal visual distortion, with robustness against transformations and edits such as cropping or motion.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    HunyuanVideo-Foley

    HunyuanVideo-Foley

    Multimodal Diffusion with Representation Alignment

    HunyuanVideo-Foley is a multimodal diffusion model from Tencent Hunyuan for high-fidelity Foley (sound effects) audio generation synchronized to video scenes. It is designed to generate audio that matches both visual content and textual semantic cues, for use in video production, film, advertising, games, etc. The model architecture aligns audio, video, and text representations to produce realistic synchronized soundtracks. Produces high-quality 48 kHz audio output suitable for professional use. Hybrid architecture combining multimodal transformer blocks and unimodal refinement blocks. ...
    Downloads: 0 This Week
    Last Update:
    See Project
Auth0 Logo