Showing 10 open source projects for "visual-mingw"

View related business solutions
  • $300 Free Credits to Build on Google Cloud Icon
    $300 Free Credits to Build on Google Cloud

    New to Google Cloud? Get $300 in credits to explore Compute Engine, BigQuery, Cloud Run, Gemini Enterprise Agent Platform, and more.

    Start your next project with $300 in free Google Cloud credit. Spin up VMs, run containers, query petabytes in BigQuery, or build agents with Gemini Enterprise Agent Platform. Once your credits are used, keep building with 20+ always-free tier products including Compute Engine, Cloud Storage, GKE, and Cloud Run functions. No commitment required—just sign up and start building.
    Claim $300 Free
  • Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • 1
    R1-V

    R1-V

    Witness the aha moment of VLM with less than $3

    R1-V is an initiative aimed at enhancing the generalization capabilities of Vision-Language Models (VLMs) through Reinforcement Learning in Visual Reasoning (RLVR). The project focuses on building a comprehensive framework that emphasizes algorithm enhancement, efficiency optimization, and task diversity to achieve general vision-language intelligence and visual/GUI agents. The team's long-term goal is to contribute impactful open-source research in this domain.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    Screenshot to Code

    Screenshot to Code

    A neural network that transforms a design mock-up into static websites

    Screenshot-to-code is a tool or prototype that attempts to convert UI screenshots (e.g., of mobile or web UIs) into code representations, likely generating layouts, HTML, CSS, or markup from image inputs. It is part of a research/proof-of-concept domain in UI automation and image-to-UI code generation. Mapping visual design to code constructs. Code/UI layout (HTML, CSS, or markup). Examples/demo scripts showing “image UI code”.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    Phi-3-MLX

    Phi-3-MLX

    Phi-3.5 for Mac: Locally-run Vision and Language Models

    Phi-3-Vision-MLX is an Apple MLX (machine learning on Apple silicon) implementation of Phi-3 Vision, a lightweight multi-modal model designed for vision and language tasks. It focuses on running vision-language AI efficiently on Apple hardware like M1 and M2 chips.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 4
    MetaCLIP

    MetaCLIP

    ICLR2024 Spotlight: curation/training code, metadata, distribution

    ...It includes utilities to fine-tune vision-language embeddings, compute prompt or adapter updates, and benchmark across transfer and retention metrics. MetaCLIP is especially suited for real-world settings where a model must continuously incorporate new visual categories or domains over time.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Secure File Transfer for Windows with Cerberus by Redwood Icon
    Secure File Transfer for Windows with Cerberus by Redwood

    Protect and share files over FTP/S, SFTP, HTTPS and SCP with the #1 rated Windows file transfer server.

    Cerberus supports unlimited users and connections on a single IP, with built-in encryption, 2FA, and a browser-based web client — all deployable in under 15 minutes with a 25-day free trial.
    Try for Free
  • 5
    VGGT

    VGGT

    [CVPR 2025 Best Paper Award] VGGT

    VGGT is a transformer-based framework aimed at unifying classic visual geometry tasks—such as depth estimation, camera pose recovery, point tracking, and correspondence—under a single model. Rather than training separate networks per task, it shares an encoder and leverages geometric heads/decoders to infer structure and motion from images or short clips. The design emphasizes consistent geometric reasoning: outputs from one head (e.g., correspondences or tracks) reinforce others (e.g., pose or depth), making the system more robust to challenging viewpoints and textures. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    OculiX

    OculiX

    Visual Automation IDE — automate anything you see on screen

    ...Key features: - Guided step-by-step recorder with live code preview - Image recognition via OpenCV 4.10 - Dual OCR: Tesseract (built-in) + PaddleOCR (neural, high precision) - Local and remote automation via integrated VNC - SSH tunnels via embedded JSch - Cross-platform: Windows, macOS (Apple Silicon M1-M4), Linux - Scripting: Jython, JRuby, Java, PowerShell, AppleScript - Java 17 recommended (Java 8+ supported) - Full CI/CD with automated builds for all platforms Used worldwide for test automation, RPA, and visual regression testing. MIT License. Maintained by oculix-org.
    Leader badge
    Downloads: 159 This Week
    Last Update:
    See Project
  • 7
    LLaVA

    LLaVA

    Visual Instruction Tuning: Large Language-and-Vision Assistant

    Visual instruction tuning towards large language and vision models with GPT-4 level capabilities.
    Downloads: 8 This Week
    Last Update:
    See Project
  • 8
    ConvNeXt

    ConvNeXt

    Code release for ConvNeXt model

    ...It revisits classic ResNet-style backbones through the lens of transformer design trends—large kernel sizes, inverted bottlenecks, layer normalization, and GELU activations—to bridge the performance gap between convolutions and attention-based models. ConvNeXt’s clean, hierarchical structure makes it efficient for both pretraining and fine-tuning across a wide range of visual recognition tasks. It achieves competitive or superior results on ImageNet and downstream datasets while being easier to deploy and train than transformers. The repository provides pretrained models, training recipes, and ablation studies demonstrating how incremental design choices collectively yield state-of-the-art performance.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    DensePose

    DensePose

    A real-time approach for mapping all human pixels of 2D RGB images

    ...The repository includes the DensePose network architecture, training code, pretrained models, and dataset tools for annotation and visualization. DensePose is widely used in augmented reality, motion capture, virtual try-on, and visual effects applications because it enables real-time 3D human mapping from 2D inputs. The model architecture builds on Mask R-CNN, using additional regression heads to predict UV coordinates that map image pixels to 3D surfaces.
    Downloads: 62 This Week
    Last Update:
    See Project
  • Build Agents and Models on One Platform Icon
    Build Agents and Models on One Platform

    Everything you need to build production-ready agents and models. Access 200+ Google and third-party AI models and tools.

    Gemini Enterprise Agent Platform is Google Cloud's comprehensive platform for developers to build, scale, govern, and optimize agents and models. Choose from Google's most advanced models and third-party models like Anthropic's Claude Model Family.
    Try It Free
  • 10
    PyTorch SimCLR

    PyTorch SimCLR

    PyTorch implementation of SimCLR: A Simple Framework

    For quite some time now, we know about the benefits of transfer learning in Computer Vision (CV) applications. Nowadays, pre-trained Deep Convolution Neural Networks (DCNNs) are the first go-to pre-solutions to learn a new task. These large models are trained on huge supervised corpora, like the ImageNet. And most important, their features are known to adapt well to new problems. This is particularly interesting when annotated training data is scarce. In situations like this, we take the...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next
Auth0 Logo