Showing 488 open source projects for "visual"

  • 1
    Book5_Essentials-Probability-Statistics

    The book 5 of statistics in simplicity

    Book5_Essentials-of-Probability-and-Statistics is a Visualize-ML educational volume that introduces the statistical and probabilistic concepts underpinning modern data analysis and machine learning. The repository explains topics such as distributions, sampling, inference, and uncertainty using visual demonstrations and intuitive narratives. Its teaching philosophy prioritizes conceptual clarity over heavy formalism, making statistical thinking more approachable for beginners. The material connects probability theory directly to real analytical workflows, helping learners understand how statistics supports predictive modeling. ...
    Downloads: 0 This Week
  • 2
    Elyra

    Elyra extends JupyterLab with an AI centric approach

    Elyra is a set of AI-centric extensions to JupyterLab Notebooks. The Elyra Getting Started Guide includes more details on these features. A version-specific summary of new features is located on the releases page.
    Downloads: 7 This Week
  • 3
    VMZ (Video Model Zoo)

    VMZ: Model Zoo for Video Modeling

    The codebase was designed to help researchers and practitioners quickly reproduce FAIR’s results and leverage robust pre-trained backbones for downstream tasks. It also integrates Gradient Blending, an audio-visual modeling method that fuses modalities effectively (available in the Caffe2 implementation). Although VMZ is now archived and no longer actively maintained, it remains a valuable reference for understanding early large-scale video model training, transfer learning, and multimodal integration strategies that influenced modern architectures like SlowFast and X3D.
    Downloads: 1 This Week
  • 4
    Watermark Anything

    Official implementation of Watermark Anything with Localized Messages

    ...Developed by Facebook Research, it provides a robust, flexible system that allows users to insert one or multiple watermarks within selected image regions while maintaining visual quality and recoverability. Unlike traditional watermarking methods that rely on uniform embedding, WAM supports spatially localized watermarks, enabling targeted protection of specific image regions or objects. The model is trained to balance imperceptibility, ensuring minimal visual distortion, with robustness against transformations and edits such as cropping or motion.
    Downloads: 1 This Week
  • 5
    HunyuanVideo-Foley

    Multimodal Diffusion with Representation Alignment

    HunyuanVideo-Foley is a multimodal diffusion model from Tencent Hunyuan for high-fidelity Foley (sound effects) audio generation synchronized to video scenes. It is designed to generate audio that matches both visual content and textual semantic cues, for use in video production, film, advertising, games, and similar settings. The model architecture aligns audio, video, and text representations to produce realistic synchronized soundtracks. It produces high-quality 48 kHz audio output suitable for professional use, and uses a hybrid architecture combining multimodal transformer blocks and unimodal refinement blocks. ...
    Downloads: 1 This Week
  • 6
    LatentSync

    Taming Stable Diffusion for Lip Sync

    ...The system leverages a U-Net diffusion backbone, with cross-attention of audio embeddings (via an audio encoder) and reference video frames to guide generation, and applies a set of loss functions (temporal, perceptual, sync-net based) to enforce lip-sync accuracy, visual fidelity, and temporal consistency. Over versions, LatentSync has improved temporal stability and lowered resource requirements — making inference more practical (e.g. 8 GB VRAM for earlier versions, somewhat higher for latest models).
    Downloads: 3 This Week
  • 7
    VisualGLM-6B

    Chinese and English multimodal conversational language model

    VisualGLM-6B is an open-source multimodal conversational language model developed by ZhipuAI that supports both images and text in Chinese and English. It builds on the ChatGLM-6B backbone, with 6.2 billion language parameters, and incorporates a BLIP2-Qformer visual module to connect vision and language. In total, the model has 7.8 billion parameters. Trained on a large bilingual dataset — including 30 million high-quality Chinese image-text pairs from CogView and 300 million English pairs — VisualGLM-6B is designed for image understanding, description, and question answering. Fine-tuning on long visual QA datasets further aligns the model’s responses with human preferences. ...
    Downloads: 3 This Week
  • 8
    Book1_Python-For-Beginners

    The Iris Book: Addition, Subtraction, Multiplication, and Division

    Book1_Python-For-Beginners is the introductory volume of the Visualize-ML series, designed to teach Python programming to newcomers with no prior coding experience. The repository emphasizes clarity and gradual skill building, starting from fundamental syntax and moving toward practical programming patterns. It integrates visual aids and annotated code examples to help learners understand not just how Python works but why certain patterns are used. The material is structured to support self-paced learning, making it suitable for students, career switchers, and hobbyists. Because the book is part of a larger data science pathway, it also prepares readers for later work in visualization and machine learning. ...
    Downloads: 0 This Week
  • 9
    Book3_Elements-of-Mathematics

    From Addition, Subtraction, Multiplication, and Division to ML

    Book3_Elements-of-Mathematics is an open learning resource in the Visualize-ML collection that introduces core mathematical foundations required for modern data science and AI. The repository presents topics such as algebra, calculus fundamentals, and mathematical reasoning using a highly visual and beginner-friendly approach. Its goal is to reduce the intimidation barrier often associated with formal mathematics by combining diagrams, structured explanations, and applied examples. The content is organized progressively so learners can build confidence before moving into more advanced quantitative subjects. It is particularly useful for self-taught developers and students transitioning into technical fields that require mathematical literacy. ...
    Downloads: 0 This Week
  • 10
    Screenshot to Code

    A neural network that transforms a design mock-up into static websites

    Screenshot-to-code is a tool or prototype that attempts to convert UI screenshots (e.g., of mobile or web UIs) into code representations, likely generating layouts, HTML, CSS, or markup from image inputs. It belongs to the research and proof-of-concept domain of UI automation and image-to-UI code generation. Its features include mapping visual design to code constructs, code and UI layout output (HTML, CSS, or markup), and example and demo scripts showing the image-to-UI-code flow.
    Downloads: 0 This Week
  • 11
    LlamaParse

    Parse files for optimal RAG

    LlamaParse is a GenAI-native document parser that can parse complex document data for any downstream LLM use case (RAG, agents). Load in 160+ data sources and data formats, from unstructured and semi-structured to structured data (APIs, PDFs, documents, SQL, etc.). Store and index your data for different use cases. Integrate with 40+ vector stores, document stores, graph stores, and SQL database providers.
    Downloads: 0 This Week
  • 12
    Super Magic

    All-in-one AI productivity platform with agents, workflows, and IM

    ...Magic centers around a general-purpose AI agent system called Super Magic, which can autonomously understand tasks, plan actions, execute workflows, and perform error correction. Alongside this, Magic includes a visual workflow engine that enables users to design complex AI processes using a drag-and-drop interface without requiring extensive coding knowledge. It also provides an enterprise-grade instant messaging system that integrates AI conversations with internal communication, allowing teams to collaborate while leveraging intelligent assistants. ...
    Downloads: 2 This Week
  • 13
    ydata-profiling

    Create HTML profiling reports from pandas DataFrame objects

    The primary goal of ydata-profiling is to provide a one-line Exploratory Data Analysis (EDA) experience that is consistent and fast. Like the handy pandas df.describe() function, ydata-profiling delivers an extended analysis of a DataFrame, and it allows that analysis to be exported in different formats such as HTML and JSON.
    Downloads: 11 This Week
  • 14
    Web Dev for Beginners

    About 24 Lessons, 12 Weeks, Get Started as a Web Developer

    ...Each lesson includes a mix of pre-lecture quizzes, written content, assignments, challenges, and post-lecture quizzes to reinforce learning. The course also offers global accessibility with translations in more than 40 languages and built-in support for running in GitHub Codespaces or locally in Visual Studio Code. This makes it a practical and engaging way for beginners to gain a solid foundation in web development.
    Downloads: 10 This Week
  • 15
    PS2 Cover

    PS2 Covers Collection

    ...Its scale and completeness make it one of the most comprehensive resources for retro gaming visuals. Overall, ps2-covers enhances the user experience of emulation by adding organized and accessible visual metadata.
    Downloads: 0 This Week
  • 16
    dnstwist

    Detects phishing and lookalike domains using DNS fuzzing techniques

    ...Security teams can use the tool to discover potential threats where attackers attempt to deceive users with lookalike domains. dnstwist also helps detect phishing activity by comparing web page content and visual similarity between domains using fuzzy hashing and perceptual hashing techniques. By automating DNS fuzzing and analysis, it provides organizations with an additional source of targeted threat intelligence. The tool can output results in structured formats, making it easier to integrate with security workflows or further analyze suspicious domains.
    Downloads: 1 This Week
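As a rough illustration of the DNS fuzzing idea described for dnstwist, the sketch below generates a few typo-style variants of a domain name. It is not dnstwist's implementation: the function name `lookalike_domains` is hypothetical, and only three simple permutation types (omission, transposition, repetition) are shown, assuming a domain of the form `name.tld`.

```python
def lookalike_domains(domain: str) -> set:
    """Generate simple typo-style variants of a domain's first label.

    Hypothetical helper for illustration only; real tools apply many
    more permutation types and then check which variants resolve.
    """
    name, dot, tld = domain.partition(".")
    variants = set()

    # Omission: drop one character ("example" -> "exmple").
    for i in range(len(name)):
        variants.add(name[:i] + name[i + 1:])

    # Transposition: swap two adjacent characters ("example" -> "examlpe").
    for i in range(len(name) - 1):
        variants.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])

    # Repetition: double one character ("example" -> "exaample").
    for i in range(len(name)):
        variants.add(name[:i] + name[i] + name[i:])

    # Drop empty labels and the original name, then re-attach the suffix.
    variants.discard("")
    variants.discard(name)
    return {v + dot + tld for v in variants}


candidates = lookalike_domains("example.com")
print(len(candidates), "candidate lookalike domains")
```

Each candidate would then need a DNS lookup to see whether it is actually registered; that step is left out here to keep the sketch free of network access.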
  • 17
    AppAgent

    Multimodal Agents as Smartphone Users, an LLM-based multimodal agent

    AppAgent is an open-source multimodal agent framework designed to enable large language models to operate smartphone applications through natural interactions with graphical user interfaces. The system allows an AI agent to interpret visual information from the screen and translate natural language instructions into actions such as tapping, swiping, and navigating between application screens. Instead of requiring backend access to application APIs, the framework interacts with apps the same way a human user would, making it compatible with a wide variety of mobile applications. ...
    Downloads: 1 This Week
  • 18
    GLM-4.5V

    GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning

    GLM-4.5V is the preceding iteration in the GLM-V series that laid much of the groundwork for general multimodal reasoning and vision-language understanding. It embodies the design philosophy of mixing visual and textual modalities into a unified model capable of general-purpose reasoning, content understanding, and generation, while already supporting a wide variety of tasks: from image captioning and visual question answering to content recognition, GUI-based agents, video understanding, and long-document interpretation. GLM-4.5V emerged from a training framework that leverages scalable reinforcement learning (with curriculum sampling) to boost performance across tasks ranging from STEM problem solving to long-context reasoning, giving it broad applicability beyond narrow benchmarks. ...
    Downloads: 0 This Week
  • 19
    Sa2VA

    Official Repo For "Sa2VA: Marrying SAM2 with LLaVA"

    Sa2VA is a cutting-edge open-source multi-modal large language model (MLLM) developed by ByteDance that unifies dense segmentation, visual understanding, and language-based reasoning across both images and videos. It merges the segmentation power of a state-of-the-art video segmentation model (based on SAM‑2) with the vision-language reasoning capabilities of a strong LLM backbone (derived from models like InternVL2.5 / Qwen-VL series), yielding a system that can answer questions about visual content, perform referring segmentation, and maintain temporal consistency across frames in video. ...
    Downloads: 0 This Week
  • 20
    Microsoft Azure CLI

    Azure command-line interface

    ...We support tab completion for groups, commands, and some parameters. You can use the --query parameter and the JMESPath query syntax to customize your output. With the Azure CLI Tools Visual Studio Code extension, you can create .azcli files and use these features: IntelliSense for commands and their arguments; snippets for commands, inserting required arguments automatically; running the current command in the integrated terminal; running the current command and showing its output in a side-by-side editor; and showing documentation on mouse hover. ...
    Downloads: 15 This Week
  • 21
    AdalFlow

    The library to build & auto-optimize LLM applications

    AdalFlow is a framework for building AI-powered automation workflows, enabling users to design and execute intelligent automation pipelines with minimal coding.
    Downloads: 2 This Week
  • 22
    Label Studio

    Label Studio is a multi-type data labeling and annotation tool

    ...The frontend part of the Label Studio app lies in the frontend/ folder and is written in React JSX. Multi-user labeling with sign-up and login: when you create an annotation, it is tied to your account. Configurable label formats let you customize the visual interface to meet your specific labeling needs. Support for multiple data types including images, audio, text, HTML, time-series, and video.
    Downloads: 21 This Week
  • 23
    Phi-3-MLX

    Phi-3.5 for Mac: Locally-run Vision and Language Models

    Phi-3-Vision-MLX is an Apple MLX (machine learning on Apple silicon) implementation of Phi-3 Vision, a lightweight multi-modal model designed for vision and language tasks. It focuses on running vision-language AI efficiently on Apple hardware like M1 and M2 chips.
    Downloads: 7 This Week
  • 24
    Qtile

    A full-featured, hackable tiling window manager written in Python

    A full-featured, hackable tiling window manager written and configured in Python. Optimize your workflow by configuring your environment to fit how you work. Efficiently use screen real-estate by automatically arranging windows with minimal visual cruft. Save your wrists from RSI by ditching the mouse and driving with the keyboard. Qtile is simple, small, and extensible. It's easy to write your own layouts, widgets, and built-in commands. Qtile is written and configured entirely in Python. Leverage the full power and flexibility of the language to make it fit your needs. The Qtile community is active and growing. ...
    Downloads: 0 This Week
  • 25
    ROMM

    A beautiful, powerful, self-hosted rom manager and player

    ...The launcher includes a powerful universal search that combs through installed apps, contacts, messages, and web results to deliver quick answers without switching contexts. Romm also supports widgets, customization options, and theme choices so users can tailor the visual experience to their preferences while maintaining performance and responsiveness. Privacy is a highlight, with local indexing and search functions that operate without sending data to external servers unless explicitly permitted.
    Downloads: 8 This Week