Showing 559 open source projects for "visual-mingw"

  • 1
    Unredact

    A simple tool for reading in poorly redacted documents

    ...Unlike traditional optical character recognition (OCR), which only reads visible text, Unredact focuses on inferring missing content where redaction has been applied by analyzing surrounding context, font characteristics, and linguistic patterns to produce candidate reconstructions. It accepts a variety of input formats, automatically identifies redacted regions, and then generates text suggestions that are presented alongside visual overlays so users can choose or refine outputs.
    Downloads: 3 This Week
  • 2
    Watermark-Removal

    Machine learning image inpainting task that removes watermarks

    The Watermark-Removal repository is a machine learning project focused on removing visible watermarks from digital images using deep learning and image inpainting techniques. The system analyzes an image containing a watermark and attempts to reconstruct the underlying visual content so that the watermark is removed while preserving the original appearance of the image. The project uses neural network models inspired by research in contextual attention and gated convolution, which are methods commonly applied to image restoration tasks. Through these techniques, the model learns to identify regions of the image affected by the watermark and generate realistic replacements for the missing visual information. ...
    Downloads: 2 This Week
  • 3
    Wan2.1

    Wan2.1: Open and Advanced Large-Scale Video Generative Model

    Wan2.1 is a foundational open-source large-scale video generative model developed by the Wan team, providing high-quality video generation from text and images. It employs advanced diffusion-based architectures to produce coherent, temporally consistent videos with realistic motion and visual fidelity. Wan2.1 focuses on efficient video synthesis while maintaining rich semantic and aesthetic detail, enabling applications in content creation, entertainment, and research. The model supports text-to-video and image-to-video generation tasks with flexible resolution options suitable for various GPU hardware configurations. ...
    Downloads: 61 This Week
  • 4
    AstronRPA

    Agent-ready RPA suite with visual workflow automation tools engine

    Astron RPA is an enterprise-grade robotic process automation platform designed to help organizations and developers build automated workflows for desktop and web applications. It provides a visual workflow designer that supports low-code and no-code development, allowing users to create automation processes through a drag-and-drop interface instead of writing extensive code. It enables automation of common desktop software and browser-based tasks, making it suitable for repetitive business operations and system integrations. ...
    Downloads: 0 This Week
  • 5
    dnstwist

    Detects phishing and lookalike domains using DNS fuzzing techniques

    ...Security teams can use the tool to discover potential threats where attackers attempt to deceive users with lookalike domains. dnstwist also helps detect phishing activity by comparing web page content and visual similarity between domains using fuzzy hashing and perceptual hashing techniques. By automating DNS fuzzing and analysis, it provides organizations with an additional source of targeted threat intelligence. The tool can output results in structured formats, making it easier to integrate with security workflows or further analyze suspicious domains.
    Downloads: 3 This Week
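
    As a rough illustration of the DNS fuzzing idea described above, the sketch below generates a few typo-style variants of a domain label. It is a minimal sketch under stated assumptions: the helper name `lookalike_labels` and the three variant rules (omission, repetition, transposition) are hypothetical choices for illustration and are not taken from dnstwist's own code or command-line interface.

```python
def lookalike_labels(label):
    """Generate simple typo variants of a domain label (hypothetical helper)."""
    variants = set()
    for i in range(len(label)):
        # Omission: drop one character.
        variants.add(label[:i] + label[i + 1:])
        # Repetition: double one character.
        variants.add(label[:i] + label[i] + label[i:])
    for i in range(len(label) - 1):
        # Transposition: swap two adjacent characters.
        variants.add(label[:i] + label[i + 1] + label[i] + label[i + 2:])
    # Drop the original label and any empty result.
    variants.discard(label)
    variants.discard("")
    return sorted(variants)


# Candidate lookalike domains for an illustrative label and TLD.
candidates = [v + ".com" for v in lookalike_labels("example")]
```

    Each candidate would then still need to be checked, for instance by resolving it in DNS, before it says anything about an actual threat.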
  • 6
    AppAgent

    Multimodal Agents as Smartphone Users, an LLM-based multimodal agent

    AppAgent is an open-source multimodal agent framework designed to enable large language models to operate smartphone applications through natural interactions with graphical user interfaces. The system allows an AI agent to interpret visual information from the screen and translate natural language instructions into actions such as tapping, swiping, and navigating between application screens. Instead of requiring backend access to application APIs, the framework interacts with apps the same way a human user would, making it compatible with a wide variety of mobile applications. ...
    Downloads: 3 This Week
  • 7
    DeepWiki Open

    AI-Powered Wiki Generator for GitHub/GitLab/Bitbucket Repositories

    ...Users can enter a repository URL and the system will clone the project, build semantic embeddings of its codebase, extract architecture and relationships, generate human-readable documentation, and produce visual diagrams to help explain complex code structure. DeepWiki’s output turns raw repositories into interactive, web-style wikis complete with navigable sections, diagrams, and contextual explanations, making it easier for developers and collaborators to understand unfamiliar code. It includes an “Ask” feature that lets users query the generated wiki using RAG-style retrieval, enabling interactive question-answering and exploration.
    Downloads: 2 This Week
  • 8
    VMZ (Video Model Zoo)

    VMZ: Model Zoo for Video Modeling

    The codebase was designed to help researchers and practitioners quickly reproduce FAIR’s results and leverage robust pre-trained backbones for downstream tasks. It also integrates Gradient Blending, an audio-visual modeling method that fuses modalities effectively (available in the Caffe2 implementation). Although VMZ is now archived and no longer actively maintained, it remains a valuable reference for understanding early large-scale video model training, transfer learning, and multimodal integration strategies that influenced modern architectures like SlowFast and X3D.
    Downloads: 2 This Week
  • 9
    DeepSeek VL2

    Mixture-of-Experts Vision-Language Models for Advanced Multimodal

    ...or “Generate a caption appropriate to context”). The model supports both image understanding (vision tasks) and multimodal reasoning, and is likely used as a component in agent systems to process visual inputs as context for downstream tasks. The repository includes evaluation results (e.g. image/text alignment scores, common VL benchmarks), configuration files, and model weights (where permitted). While the internal architecture details are not fully documented publicly, the repo suggests that VL2 introduces enhancements over prior vision-language models (e.g. better scaling, cross-modal attention, more robust alignment) to improve grounding and multimodal understanding.
    Downloads: 4 This Week
  • 10
    LatentSync

    Taming Stable Diffusion for Lip Sync

    ...The system leverages a U-Net diffusion backbone, with cross-attention over audio embeddings (via an audio encoder) and reference video frames to guide generation, and applies a set of loss functions (temporal, perceptual, sync-net based) to enforce lip-sync accuracy, visual fidelity, and temporal consistency. Across versions, LatentSync has improved temporal stability and lowered resource requirements, making inference more practical (e.g. 8 GB VRAM for earlier versions, somewhat higher for the latest models).
    Downloads: 5 This Week
  • 11
    DriveLM

    Driving with Graph Visual Question Answering

    DriveLM is a research-oriented framework and dataset designed to explore how vision-language models can be integrated into autonomous driving systems. The project introduces a new paradigm called graph visual question answering that structures reasoning about driving scenes through interconnected tasks such as perception, prediction, planning, and motion control. Instead of treating autonomous driving as a purely sensor-driven pipeline, DriveLM frames it as a reasoning problem where models answer structured questions about the environment to guide decision making. ...
    Downloads: 0 This Week
  • 12
    LlamaGen

    Autoregressive Model Beats Diffusion

    LlamaGen is an open-source research project that introduces a new approach to image generation by applying the autoregressive next-token prediction paradigm used in large language models to visual generation tasks. Instead of relying on diffusion models, the framework treats images as sequences of tokens that can be generated progressively using transformer architectures similar to those used for text generation. The project explores how scaling autoregressive models and improving image tokenization techniques can produce competitive results compared with modern diffusion-based image generators. ...
    Downloads: 0 This Week
  • 13
    LISA

    LISA: Reasoning Segmentation via Large Language Model

    ...The project introduces a framework where a large language model can interpret natural language instructions and produce segmentation masks that highlight relevant regions in an image. Instead of relying solely on predefined object categories, the model is capable of reasoning about complex textual queries and translating them into visual segmentation outputs. This approach allows the system to identify objects or regions in images based on semantic descriptions, contextual reasoning, and world knowledge. The model integrates multimodal capabilities by combining language understanding with visual perception so that text instructions guide the segmentation process. ...
    Downloads: 0 This Week
  • 14
    FastVLM

    This repository contains the official implementation of FastVLM

    ...The repository documents model variants, showcases head-to-head numbers against known baselines, and explains how the encoder integrates with common LLM backbones. Apple’s research brief frames FastVLM as targeting real-time or latency-sensitive scenarios, where lowering visual token pressure is critical to interactive UX. In short, it’s a practical recipe to make VLMs fast without exotic token-selection heuristics.
    Downloads: 0 This Week
  • 15
    ML Ferret

    Refer and Ground Anything Anywhere at Any Granularity

    Ferret is Apple’s end-to-end multimodal large language model designed specifically for flexible referring and grounding: it can understand references of any granularity (boxes, points, free-form regions) and then ground open-vocabulary descriptions back onto the image. The core idea is a hybrid region representation that mixes discrete coordinates with continuous visual features, so the model can fluidly handle “any-form” referring while maintaining precise spatial localization. The repo presents the vision-language pipeline, model assets, and paper resources that show how Ferret answers questions, follows instructions, and returns grounded outputs rather than just text. In practice, this enables tasks like “find that small red icon next to the chart and describe it” where both the linguistic reference and the visual region are ambiguous without fine spatial reasoning.
    Downloads: 0 This Week
  • 16
    MoCo (Momentum Contrast)

    Self-supervised visual learning using momentum contrast in PyTorch

    MoCo is an open source PyTorch implementation developed by Facebook AI Research (FAIR) for the papers “Momentum Contrast for Unsupervised Visual Representation Learning” (He et al., 2019) and “Improved Baselines with Momentum Contrastive Learning” (Chen et al., 2020). It introduces Momentum Contrast (MoCo), a scalable approach to self-supervised learning that enables visual representation learning without labeled data. The core idea of MoCo is to maintain a dynamic dictionary with a momentum-updated encoder, allowing efficient contrastive learning across large batches. ...
    Downloads: 0 This Week
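
    The momentum-updated encoder and dynamic dictionary mentioned above can be sketched in a few lines. This is a minimal illustration, not the repository's PyTorch code: parameters are plain floats rather than tensors, and the names `momentum_update`, `key_params`, and `queue` are hypothetical. It assumes the update rule from the MoCo paper, key = m * key + (1 - m) * query, with the dictionary kept as a fixed-size first-in, first-out queue.

```python
from collections import deque


def momentum_update(key_params, query_params, m=0.999):
    """Move each key-encoder parameter a small step toward the query encoder."""
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


# The dictionary of encoded keys is a bounded queue: the newest batch of
# keys is enqueued and the oldest is dropped once the queue is full.
queue = deque(maxlen=3)
for step in range(5):
    queue.append(f"keys_from_batch_{step}")

# One momentum step with an exaggerated coefficient so the change is visible.
key_params = momentum_update([0.0, 1.0], [1.0, 1.0], m=0.9)
```

    With a coefficient close to 1, the key encoder changes slowly, which keeps the keys already stored in the queue reasonably consistent with newly encoded ones.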
  • 17
    GLM-4.6V

    GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning

    GLM-4.6V represents the latest generation of the GLM-V family and marks a major step forward in multimodal AI by combining advanced vision-language understanding with native “tool-call” capabilities, long-context reasoning, and strong generalization across domains. Unlike many vision-language models that treat images and text separately or require intermediate conversions, GLM-4.6V allows inputs such as images, screenshots or document pages directly as part of its reasoning pipeline — and...
    Downloads: 1 This Week
  • 18
    HunyuanWorld 1.0

    Generating Immersive, Explorable, and Interactive 3D Worlds

    ...The architecture integrates panoramic proxy generation, semantic layering, and hierarchical 3D reconstruction to produce high-quality scene-scale 3D worlds from both text and images. HunyuanWorld-1.0 surpasses existing open-source methods in visual quality and geometric consistency, demonstrated by superior scores in BRISQUE, NIQE, Q-Align, and CLIP metrics.
    Downloads: 3 This Week
  • 19
    GitDiagram

    AI tool that converts GitHub repositories into interactive diagrams

    GitDiagram is an open source web application designed to help developers quickly understand the structure and architecture of GitHub repositories by automatically generating interactive diagrams. It analyzes repository metadata such as the file tree and project documentation to build a visual representation of how different components of a project relate to one another. It uses an AI-powered pipeline to interpret repository structure and transform that information into system design diagrams rendered with Mermaid visualization. These diagrams provide a high-level overview of a codebase, making it easier for developers to explore unfamiliar projects or understand large and complex repositories. ...
    Downloads: 2 This Week
  • 20
    Book5_Essentials-Probability-Statistics

    Book 5: statistics made simple

    Book5_Essentials-of-Probability-and-Statistics is a Visualize-ML educational volume that introduces the statistical and probabilistic concepts underpinning modern data analysis and machine learning. The repository explains topics such as distributions, sampling, inference, and uncertainty using visual demonstrations and intuitive narratives. Its teaching philosophy prioritizes conceptual clarity over heavy formalism, making statistical thinking more approachable for beginners. The material connects probability theory directly to real analytical workflows, helping learners understand how statistics supports predictive modeling. ...
    Downloads: 0 This Week
  • 21
    GPT-Image2-Skill

    GPT Image 2 prompt gallery, image prompt library, agentic skill

    GPT-Image2-Skill is a prompt gallery, image prompt library, agent skill, and CLI for OpenAI image generation and editing workflows. It collects curated prompt examples with generated outputs so users can reuse strong visual patterns instead of starting from scratch. The project includes categories such as anime, gaming, cyberpunk, animation, character design, typography, illustration, watercolor, ink, pixel art, isometric scenes, product visuals, and food imagery. It can be installed as an agent skill for supported runtimes or used through a local CLI. ...
    Downloads: 1 This Week
  • 22
    machine-learning-refined

    Master the fundamentals of machine learning, deep learning

    machine-learning-refined is an educational repository designed to help students and practitioners understand machine learning algorithms through intuitive explanations and interactive examples. The project accompanies a series of textbooks and teaching materials that focus on making machine learning concepts accessible through visual demonstrations and simple code implementations. Instead of presenting algorithms purely through mathematical derivations, the repository emphasizes geometric intuition, visualization, and step-by-step experimentation. It includes Jupyter notebooks and scripts that illustrate core machine learning topics such as regression, classification, optimization methods, and neural networks. ...
    Downloads: 1 This Week
  • 23
    Book1_Python-For-Beginners

    The Iris Book: Addition, Subtraction, Multiplication, and Division

    Book1_Python-For-Beginners is the introductory volume of the Visualize-ML series, designed to teach Python programming to newcomers with no prior coding experience. The repository emphasizes clarity and gradual skill building, starting from fundamental syntax and moving toward practical programming patterns. It integrates visual aids and annotated code examples to help learners understand not just how Python works but why certain patterns are used. The material is structured to support self-paced learning, making it suitable for students, career switchers, and hobbyists. Because the book is part of a larger data science pathway, it also prepares readers for later work in visualization and machine learning. ...
    Downloads: 0 This Week
  • 24
    Book3_Elements-of-Mathematics

    From Addition, Subtraction, Multiplication, and Division to ML

    Book3_Elements-of-Mathematics is an open learning resource in the Visualize-ML collection that introduces core mathematical foundations required for modern data science and AI. The repository presents topics such as algebra, calculus fundamentals, and mathematical reasoning using a highly visual and beginner-friendly approach. Its goal is to reduce the intimidation barrier often associated with formal mathematics by combining diagrams, structured explanations, and applied examples. The content is organized progressively so learners can build confidence before moving into more advanced quantitative subjects. It is particularly useful for self-taught developers and students transitioning into technical fields that require mathematical literacy. ...
    Downloads: 0 This Week
  • 25
    VGGSfM

    VGGSfM: Visual Geometry Grounded Deep Structure From Motion

    VGGSfM is an advanced structure-from-motion (SfM) framework jointly developed by Meta AI Research (GenAI) and the University of Oxford’s Visual Geometry Group (VGG). It reconstructs 3D geometry, dense depth, and camera poses directly from unordered or sequential images and videos. The system combines learned feature matching and geometric optimization to generate high-quality camera calibrations, sparse/dense point clouds, and depth maps in standard COLMAP format. Version 2.0 adds support for dynamic scene handling, dense point cloud export, video-based reconstruction (1000+ frames), and integration with Gaussian Splatting pipelines. ...
    Downloads: 2 This Week