Showing 168 open source projects for "visual"

  • 1
    LLaVA

    Visual Instruction Tuning: Large Language-and-Vision Assistant

    Visual instruction tuning towards large language and vision models with GPT-4 level capabilities.
    Downloads: 1 This Week
  • 2
    MGIE

    Guiding Instruction-based Image Editing via Multimodal Large Language

    MGIE—Guiding Instruction-based Image Editing—demonstrates how a multimodal LLM can parse natural-language editing instructions and then drive image transformations accordingly. The project focuses on making edits explainable and controllable: the model interprets text guidance, reasons over image content, and outputs edits aligned with user intent. It’s positioned as an ICLR 2024 Spotlight work, with code and references that show how to connect language planning to concrete image operations....
    Downloads: 0 This Week
  • 3
    Computer vision projects

    computer vision projects | Fun AI projects related to computer vision

    ...The repository provides examples that combine machine learning models with real-world applications such as robotic arms, video analysis, and automated visual measurement systems.
    Downloads: 1 This Week
  • 4
    RAGxplorer

    Open-source tool to visualise your RAG

    ...However, RAG systems can be complex because they involve multiple components such as embedding models, vector databases, and retrieval algorithms. RAGxplorer provides visual tools that allow developers to inspect how documents are embedded, retrieved, and used to answer queries. The software can load documents, generate embeddings, and project them into reduced vector spaces so that users can visually explore relationships between queries and retrieved documents. It also includes interactive interfaces that show how retrieval affects the final output of the language model.
    Downloads: 0 This Week
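The retrieval step described above can be illustrated with a minimal sketch. This is not RAGxplorer's code or API: the function names and the toy three-dimensional "embeddings" are hypothetical, and a real system would take its vectors from an embedding model and a vector database. It only shows the underlying idea of ranking document chunks by cosine similarity to a query embedding.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_vec, doc_vecs, top_k=2):
    """Return indices of the top_k documents most similar to the query."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(doc_vecs)]
    scored.sort(reverse=True)  # highest similarity first
    return [i for _, i in scored[:top_k]]


# Toy embeddings (made up for illustration only).
doc_vecs = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
query_vec = [0.8, 0.6, 0.0]

top = retrieve(query_vec, doc_vecs, top_k=2)
print(top)
```

Plotting the same vectors after a dimensionality reduction step is what makes the query-to-document relationships visible; that projection is left out of this sketch.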
  • 5
    AI-Aimbot

    CS2, Valorant, Fortnite, APEX, every game

    ...The project emphasizes that it is intended for educational purposes to illustrate potential vulnerabilities in game design and anti-cheat systems. Because the system relies solely on visual detection rather than reading game memory, it attempts to bypass certain traditional anti-cheat detection methods.
    Downloads: 686 This Week
  • 6
    solo-learn

    Library of self-supervised methods for visual representation

    A library of self-supervised methods for unsupervised visual representation learning powered by PyTorch Lightning. We aim to provide SOTA self-supervised methods in a comparable environment while also implementing training tricks. The library is self-contained, but the models can also be used outside of solo-learn.
    Downloads: 0 This Week
  • 7
    Style Aligned

    Official code for Style Aligned Image Generation via Shared Attention

    StyleAligned is a diffusion-model editing technique and codebase that preserves the visual “style” of an original image while applying new semantic edits driven by text. Instead of fully re-generating an image—and risking changes to lighting, texture, or rendering choices—the method aligns internal features across denoising steps so the target edit inherits the source style. This alignment acts like a constraint on the model’s evolution, steering composition, palette, and brushwork even as objects or attributes change. ...
    Downloads: 0 This Week
  • 8
    Taylorplot_Neptune

    Creation of a Taylorplot for several machine learning models

    Here we present the code for creating a Taylor plot in Python to compare several machine learning models. We show the solution for displaying 10 models, but the list and number of models can be changed simply by modifying the sample list.
    Downloads: 0 This Week
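As a rough illustration of what a Taylor plot summarizes, the sketch below computes the three statistics it displays for each model: standard deviation, correlation with the reference, and centered RMS difference. This is not the project's code; the function name, the observed values, and the two model names with their predictions are hypothetical, and the plotting step is omitted.

```python
import math


def taylor_stats(reference, predicted):
    """Return (std_ref, std_pred, correlation, centered_rms_difference)."""
    n = len(reference)
    mean_r = sum(reference) / n
    mean_p = sum(predicted) / n
    dev_r = [r - mean_r for r in reference]
    dev_p = [p - mean_p for p in predicted]
    std_r = math.sqrt(sum(d * d for d in dev_r) / n)
    std_p = math.sqrt(sum(d * d for d in dev_p) / n)
    corr = sum(a * b for a, b in zip(dev_r, dev_p)) / (n * std_r * std_p)
    crmsd = math.sqrt(sum((b - a) ** 2 for a, b in zip(dev_r, dev_p)) / n)
    return std_r, std_p, corr, crmsd


# Hypothetical observations and model outputs; extend this dict to add models.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predictions = {
    "model_a": [1.1, 1.9, 3.2, 3.8, 5.1],
    "model_b": [2.0, 2.5, 2.0, 4.5, 4.0],
}

stats = {name: taylor_stats(observed, pred) for name, pred in predictions.items()}
for name, (std_r, std_p, corr, crmsd) in stats.items():
    print(f"{name}: std={std_p:.3f} corr={corr:.3f} crmsd={crmsd:.3f}")
```

The diagram works because these quantities satisfy the law of cosines, crmsd² = std_ref² + std_pred² − 2·std_ref·std_pred·corr, so a single point in polar coordinates encodes all three.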
  • 9
    ConsistencyDecoder

    Consistency Distilled Diff VAE

    ...The project provides a simple API for encoding with a Stable Diffusion VAE and decoding using the new consistency model, allowing for side-by-side comparisons with traditional decoders. It demonstrates how consistency models can enhance visual fidelity while maintaining efficiency, reducing artifacts common in GAN-decoded outputs. The repository includes installation instructions, usage examples, and visual comparisons to highlight improvements. Though compact, it provides an accessible entry point for experimenting with advanced decoding strategies in diffusion-based generative models.
    Downloads: 2 This Week
  • 10
    Nougat

    Implementation of Nougat Neural Optical Understanding

    Nougat (Neural Optical Understanding for Academic Documents) is a Visual Transformer model that performs OCR on scientific documents, converting PDF pages into a lightweight markup language. It is designed to preserve content that plain text extraction tends to lose, such as mathematical expressions and tables.
    Downloads: 0 This Week
  • 11
    hloc

    Visual localization made easy with hloc

    This is hloc, a modular toolbox for state-of-the-art 6-DoF visual localization. It implements Hierarchical Localization, leveraging image retrieval and feature matching, and is fast, accurate, and scalable. This codebase won the indoor/outdoor localization challenges at CVPR 2020 and ECCV 2020, in combination with SuperGlue, our graph neural network for feature matching. We provide step-by-step guides to localize with Aachen and InLoc, and to generate reference poses for your own data using SfM. ...
    Downloads: 0 This Week
  • 12
    finetuner

    Task-oriented finetuning for better embeddings on neural search

    ...With Finetuner, you can easily enhance the performance of pre-trained models, making them production-ready without extensive labeling or expensive hardware. Create high-quality embeddings for semantic search, visual similarity search, cross-modal text image search, recommendation systems, clustering, duplication detection, anomaly detection, or other uses. Bring considerable improvements to model performance, making the most out of as little as a few hundred training samples, and finish fine-tuning in as little as an hour.
    Downloads: 0 This Week
  • 13
    TaskMatrix

    Enable sending and receiving images during chatting

    TaskMatrix is an experimental AI ecosystem designed to connect large language models with visual foundation models, APIs, and external systems in order to complete multimodal tasks collaboratively. The project expands beyond traditional chatbot behavior by enabling AI systems to process, generate, edit, and reason about images while coordinating multiple specialized models simultaneously. Originally introduced alongside the Visual ChatGPT concept, TaskMatrix acts as an orchestration framework where a central language model delegates subtasks to domain-specific AI systems such as image generators, segmentation tools, or recognition models. ...
    Downloads: 0 This Week
  • 14
    Stable Diffusion

    A latent text-to-image diffusion model

    ...The model operates by conditioning a diffusion process on text embeddings produced by a CLIP text encoder, enabling detailed and controllable image synthesis. It was trained on large-scale image datasets and later fine-tuned to produce 512×512 images with strong visual fidelity. Because the system runs efficiently on consumer hardware compared to earlier generative models, it helped popularize local AI image generation workflows. The repository includes reference scripts and model configurations that allow researchers and developers to reproduce, modify, or extend the architecture. Overall, stable-diffusion has become a foundational tool in the generative AI ecosystem for art creation, research, and multimodal experimentation.
    Downloads: 23 This Week
  • 15
    BEVFormer

    Implementation of BEVFormer, a camera-only framework

    3D visual perception tasks, including 3D detection and map segmentation based on multi-camera images, are essential for autonomous driving systems. In this work, we present a new framework termed BEVFormer, which learns unified BEV representations with spatiotemporal transformers to support multiple autonomous driving perception tasks. In a nutshell, BEVFormer exploits both spatial and temporal information by interacting with spatial and temporal space through predefined grid-shaped BEV queries. ...
    Downloads: 0 This Week
  • 16
    Yellowbrick

    Visual analysis and diagnostic tools to facilitate ML selection

    Yellowbrick is a suite of visual diagnostic tools called "Visualizers" that extend the scikit-learn API to allow human steering of the model selection process, making model selection and hyperparameter tuning easier. Under the hood, it uses Matplotlib: Yellowbrick combines scikit-learn with matplotlib, in the best tradition of the scikit-learn documentation, to produce visualizations for your machine learning workflow.
    Downloads: 0 This Week
  • 17
    ConvNeXt

    Code release for ConvNeXt model

    ...It revisits classic ResNet-style backbones through the lens of transformer design trends—large kernel sizes, inverted bottlenecks, layer normalization, and GELU activations—to bridge the performance gap between convolutions and attention-based models. ConvNeXt’s clean, hierarchical structure makes it efficient for both pretraining and fine-tuning across a wide range of visual recognition tasks. It achieves competitive or superior results on ImageNet and downstream datasets while being easier to deploy and train than transformers. The repository provides pretrained models, training recipes, and ablation studies demonstrating how incremental design choices collectively yield state-of-the-art performance.
    Downloads: 0 This Week
  • 18
    Machine Learning Glossary

    Machine learning glossary

    Machine Learning Glossary is an open educational project that provides clear explanations of machine learning terminology and concepts through visual diagrams and concise definitions. The goal of the repository is to make machine learning topics easier to understand by presenting definitions alongside examples, visual illustrations, and references for further learning. It covers a wide range of topics including neural networks, regression models, optimization techniques, loss functions, and evaluation metrics. ...
    Downloads: 0 This Week
  • 19
    MAE (Masked Autoencoders)

    PyTorch implementation of MAE

    MAE (Masked Autoencoders) is a self-supervised learning framework for visual representation learning using masked image modeling. It trains a Vision Transformer (ViT) by randomly masking a high percentage of image patches (typically 75%) and reconstructing the missing content from the remaining visible patches. This forces the model to learn semantic structure and global context without supervision. The encoder processes only the visible patches, while a lightweight decoder reconstructs the full image—making pretraining computationally efficient. ...
    Downloads: 0 This Week
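The random masking step described above can be sketched in a few lines. This is an illustrative, dependency-free version rather than the repository's PyTorch implementation: the function name and the fixed seed are assumptions, and 196 patches corresponds to a 224×224 image split into 16×16 patches.

```python
import random


def random_masking(num_patches, mask_ratio=0.75, seed=0):
    """Split patch indices into a visible set and a masked set.

    Only the visible indices would be passed to the encoder; the decoder
    is then asked to reconstruct the patches at the masked indices.
    """
    rng = random.Random(seed)
    indices = list(range(num_patches))
    rng.shuffle(indices)
    num_keep = int(num_patches * (1 - mask_ratio))
    visible = sorted(indices[:num_keep])
    masked = sorted(indices[num_keep:])
    return visible, masked


visible, masked = random_masking(num_patches=196, mask_ratio=0.75)
print(len(visible), len(masked))
```

With a 75% mask ratio the encoder sees only a quarter of the patches, which is where the pretraining efficiency mentioned in the description comes from.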
  • 20
    GLIDE (Text2Im)

    GLIDE: a diffusion-based text-conditional image synthesis model

    glide-text2im is an open source implementation of OpenAI’s GLIDE model, which generates photorealistic images from natural language text prompts. It demonstrates how diffusion-based generative models can be conditioned on text to produce highly detailed and coherent visual outputs. The repository provides both model code and pretrained checkpoints, making it possible for researchers and developers to experiment with text-to-image synthesis. GLIDE includes advanced techniques such as classifier-free guidance, which improves the quality and alignment of generated images with the input text. The project also offers sampling scripts and utilities for exploring how diffusion models can be applied to multimodal tasks. ...
    Downloads: 1 This Week
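Classifier-free guidance, mentioned above, combines a text-conditioned and an unconditioned noise prediction at each sampling step. The sketch below shows only that combination rule on plain Python lists. The function name and the numeric values are hypothetical, and in a real sampler both predictions would come from the diffusion model itself.

```python
def classifier_free_guidance(eps_cond, eps_uncond, guidance_scale):
    """Extrapolate from the unconditional prediction toward the conditional one.

    guidance_scale = 0 gives the unconditional prediction, 1 gives the
    conditional prediction, and values above 1 push further toward the text.
    """
    return [u + guidance_scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]


# Hypothetical noise predictions for three latent values.
eps_cond = [0.5, -0.2, 0.1]
eps_uncond = [0.3, -0.1, 0.0]

guided = classifier_free_guidance(eps_cond, eps_uncond, guidance_scale=3.0)
print(guided)
```

Larger guidance scales generally trade sample diversity for closer alignment with the prompt.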
  • 21
    GANformer

    Generative Adversarial Transformers

    ...The network employs a bipartite structure that enables long-range interactions across the image while maintaining linear computational efficiency, so it can readily scale to high-resolution synthesis. The model iteratively propagates information from a set of latent variables to the evolving visual features and vice versa, to support the refinement of each in light of the other and encourage the emergence of compositional representations of objects and scenes. In contrast to the classic transformer architecture, it utilizes multiplicative integration that allows flexible region-based modulation and can thus be seen as a generalization of the successful StyleGAN network. ...
    Downloads: 0 This Week
  • 22
    Deep Feature Rotation Multimodal Image

    Implementation of Deep Feature Rotation for Multimodal Image

    ...Our approach is representative of the many ways to augment intermediate feature embeddings without much computational expense. Prepare your content image and style image; some examples are provided in data/content and data/style. We also provide a visual comparison for rotation angles that do not appear in the paper. Different rotation angles produce a diverse set of outputs, which supports the effectiveness of our method relative to other methods.
    Downloads: 0 This Week
  • 23
    DeepDanbooru

    AI-based multi-label girl image classification system

    DeepDanbooru is a deep learning system designed to automatically tag anime-style images using neural networks trained on datasets derived from the Danbooru imageboard. The project focuses on multi-label image classification, where a model predicts multiple descriptive tags that represent visual elements in an image. These tags may include characters, styles, clothing, emotions, or other attributes associated with anime artwork. The system uses convolutional neural networks trained on large datasets of tagged images to learn relationships between visual features and textual labels. Because the Danbooru dataset contains millions of images with extensive annotations, it provides a valuable training resource for machine learning models specializing in illustration analysis. ...
    Downloads: 1 This Week
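Multi-label classification differs from ordinary classification in that each tag gets its own independent probability, so several tags can be predicted for one image. The sketch below shows that final decision step only. It is not DeepDanbooru's code: the function name, the tag names, the logits, and the 0.5 threshold are all hypothetical, and the logits would normally come from the convolutional network.

```python
import math


def predict_tags(logits, tag_names, threshold=0.5):
    """Apply a per-tag sigmoid and keep every tag whose probability passes the threshold."""
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return [(name, p) for name, p in zip(tag_names, probs) if p >= threshold]


# Hypothetical network outputs, one logit per tag.
tag_names = ["long_hair", "smile", "outdoors", "hat"]
logits = [2.0, 0.3, -1.5, -0.2]

predicted = predict_tags(logits, tag_names)
print([name for name, _ in predicted])
```

Because each tag is scored independently, raising the threshold yields fewer but more confident tags, while lowering it yields more tags with more false positives.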
  • 24
    MoCo v3

    PyTorch implementation of MoCo v3

    MoCo v3 is a PyTorch reimplementation of Momentum Contrast v3 (MoCo v3), Facebook Research’s state-of-the-art self-supervised learning framework for visual representation learning using ResNet and Vision Transformer (ViT) backbones. Originally developed in TensorFlow for TPUs, this version faithfully reproduces the paper’s results on GPUs while offering an accessible and scalable PyTorch interface. MoCo v3 introduces improvements for training self-supervised ViTs by combining contrastive learning with transformer-based architectures, achieving strong linear and end-to-end fine-tuning performance on ImageNet benchmarks. ...
    Downloads: 0 This Week
  • 25
    ML workspace

    All-in-one web-based IDE specialized for machine learning

    ...This workspace is a developer tool preloaded with a variety of popular data science libraries (e.g., TensorFlow, PyTorch, Keras, Sklearn) and dev tools (e.g., Jupyter, VS Code, TensorBoard), configured and integrated to work together. It is usable as a remote kernel (Jupyter) or a remote machine (VS Code) via SSH, and is easy to deploy on Mac, Linux, and Windows via Docker. It ships with the Jupyter, JupyterLab, and Visual Studio Code web-based IDEs. By default, the workspace container has no resource constraints and can use as much of a given resource as the host’s kernel scheduler allows.
    Downloads: 2 This Week