Showing 168 open source projects for "visual"

  • 1
    Wan Move

    Motion-controllable Video Generation via Latent Trajectory Guidance

    Wan Move is an open-source research codebase for motion-controllable video generation that focuses on enabling fine-grained control of motion within generative video models. It is designed to guide the temporal evolution of visual content by leveraging latent trajectory guidance, allowing users to manipulate how objects move over time without modifying the underlying generative architecture. By representing motion information as dense point trajectories and integrating them into the latent space of an image-to-video model, the project produces videos with more precise and controllable motion behavior than many existing methods. ...
    Downloads: 0 This Week
  • 2
    Qwen3-VL-Embedding

    Multimodal embedding and reranking models built on Qwen3-VL

    ...The reranking model then precisely scores relevance between a given query and candidate documents, enhancing retrieval accuracy in complex multimodal tasks. Together, they support advanced information retrieval workflows such as image-text search, visual question answering (VQA), and video-text matching, while providing out-of-the-box support for more than 30 languages.
    Downloads: 0 This Week
  • 3
    MiniMind-V

    "Big Model" trains a visual multimodal VLM with 26M parameters

    MiniMind-V is an experimental open-source project that aims to train a very small multimodal vision–language model (VLM) from scratch with extremely low compute and cost, making research and experimentation accessible to more people. The repository showcases training workflows and code designed to produce a 26-million parameter model—including both image and text capabilities—using minimal resources in very little time, reflecting a trend toward democratizing AI research. MiniMind-V combines...
    Downloads: 0 This Week
  • 4
    Stable Virtual Camera

    Stable Virtual Camera: Generative View Synthesis with Diffusion Models

    ...Unlike traditional methods that require complex reconstruction or scene-specific optimization, this model allows users to generate novel views from any number of input images and define custom camera trajectories, enabling dynamic exploration of scenes. It supports various aspect ratios and can produce 3D-consistent videos up to 1,000 frames, making it a versatile tool for creators seeking to enhance visual storytelling.
    Downloads: 0 This Week
  • 5
    PaddleX

    PaddlePaddle End-to-End Development Toolkit

    PaddleX is a full-process deep learning development tool based on the core framework, development kits, and tool components of Paddle. It has three characteristics: covering the whole development process, integrating industrial practice, and being easy to use and integrate. Image classification labeling is the most basic and simplest labeling task. Users only need to put pictures belonging to the same category in the same folder. When the model is trained, we need to divide the training set, the...
    Downloads: 0 This Week
  • 6
    zvt

    Modular quant framework

    ...Your world is built by core concepts inside you, so it’s you. The zvt world is built by core concepts inside the market, so it’s zvt. The core concept of the system is visual, and the name of the interface corresponds to it one-to-one, so it is also uniform and extensible. You can write and run the strategy in your favorite IDE, and then view its related targets, factors, signals, and performance on the UI. Once you are familiar with the core concepts of the system, you can apply it to any target in the market.
    Downloads: 0 This Week
  • 7
    MolmoWeb

    Open multimodal web agent built by Ai2

    ...Unlike traditional automation tools that rely on structured HTML parsing or predefined APIs, MolmoWeb operates directly from screenshots of web pages, interpreting visual content in the same way a human user would. This approach allows it to generalize across different websites without requiring site-specific integrations, making it highly adaptable to diverse web environments.
    Downloads: 0 This Week
  • 8
    Diffusion for World Modeling

    Learning agent trained in a diffusion world model

    ...Instead of interacting directly with a real environment, the reinforcement learning agent learns within a generative model that produces frames representing the environment. This approach allows training to occur in a simulated world that captures detailed visual dynamics while reducing the need for costly interactions with real environments. The system has been applied to tasks such as Atari game simulations and demonstrations involving complex environments like first-person shooter games.
    Downloads: 0 This Week
  • 9
    FireRed-Image-Edit

    General-purpose image editing model that delivers high-fidelity

    ...It is built on a flexible text-to-image foundation model that has been extended with training paradigms including pretraining, supervised fine-tuning, and reinforcement learning to imbue the system with strong instruction following and editing consistency. The model excels in maintaining visual and text stylistic fidelity, allowing users to preserve the original artistic qualities of an image while applying creative changes according to natural language instructions. In addition to editing single images, FireRed supports multi-image editing scenarios such as virtual try-on or batch transformations, making it suitable for both creative and practical workflows.
    Downloads: 0 This Week
  • 10
    ticket

    Fast, powerful, git-native ticket tracking in a single bash script

    ...It stores each ticket as a Markdown file with YAML frontmatter, making them human-readable and easy to version control alongside your code, while also allowing IDEs to jump straight to ticket definitions. The CLI provides common subcommands to create, list, edit, close, and manage dependencies between tickets, enabling clear hierarchical task structures and visual dependency trees. Its design is rooted in the Unix philosophy of simplicity, composability, and transparency, meaning it integrates well with other standard tools like grep, jq, and ripgrep when installed. Teams can use ticket to track bugs, features, chores, and epics with priority levels and tags, all by staying within the terminal and Git ecosystem.
    Downloads: 0 This Week
  • 11
    Oasis

    Inference script for Oasis 500M

    Open-Oasis provides inference code and released weights for Oasis 500M, an interactive world model that generates gameplay frames conditioned on user keyboard input. Instead of rendering a pre-built game world, the system produces the next visual state via a diffusion-transformer approach, effectively “imagining” the world response to your actions in real time. The project focuses on enabling action-conditional frame generation so developers can experiment with interactive, model-generated environments rather than static video generation alone. Because it’s an inference-focused repository, it’s especially useful as a practical reference for running the model, wiring inputs, and producing the autoregressive sequence of gameplay frames. ...
    Downloads: 0 This Week
  • 12
    MetaCLIP

    ICLR2024 Spotlight: curation/training code, metadata, distribution

    ...It includes utilities to fine-tune vision-language embeddings, compute prompt or adapter updates, and benchmark across transfer and retention metrics. MetaCLIP is especially suited for real-world settings where a model must continuously incorporate new visual categories or domains over time.
    Downloads: 0 This Week
  • 13
    VGGT

    [CVPR 2025 Best Paper Award] VGGT

    VGGT is a transformer-based framework aimed at unifying classic visual geometry tasks—such as depth estimation, camera pose recovery, point tracking, and correspondence—under a single model. Rather than training separate networks per task, it shares an encoder and leverages geometric heads/decoders to infer structure and motion from images or short clips. The design emphasizes consistent geometric reasoning: outputs from one head (e.g., correspondences or tracks) reinforce others (e.g., pose or depth), making the system more robust to challenging viewpoints and textures. ...
    Downloads: 0 This Week
  • 14
    MiniMax-01

    Large-language-model & vision-language-model based on Linear Attention

    MiniMax-01 is the official repository for two flagship models: MiniMax-Text-01, a long-context language model, and MiniMax-VL-01, a vision-language model built on top of it. MiniMax-Text-01 uses a hybrid attention architecture that blends Lightning Attention, standard softmax attention, and Mixture-of-Experts (MoE) routing to achieve both high throughput and long-context reasoning. It has 456 billion total parameters with 45.9 billion activated per token and is trained with advanced parallel...
    Downloads: 1 This Week
  • 15
    LLaMA-Mesh

    Unifying 3D Mesh Generation with Language Models

    ...By serializing 3D geometry into text tokens, the approach allows existing transformer architectures to generate and interpret 3D models without requiring specialized visual tokenizers. The project includes a supervised fine-tuning dataset composed of interleaved text and mesh data, allowing the model to learn relationships between textual descriptions and 3D structures. As a result, the model can generate mesh models directly from text prompts, explain mesh structures in natural language, or output mixed text-and-mesh sequences. ...
    Downloads: 0 This Week
  • 16
    Map-Anything

    MapAnything: Universal Feed-Forward Metric 3D Reconstruction

    Map-Anything is a universal, feed-forward transformer for metric 3D reconstruction that predicts a scene’s geometry and camera parameters directly from visual inputs. Instead of stitching together many task-specific models, it uses a single architecture that supports a wide range of 3D tasks—multi-image structure-from-motion, multi-view stereo, monocular metric depth, registration, depth completion, and more. The model flexibly accepts different input combinations (images, intrinsics, poses, sparse or dense depth) and produces a rich set of outputs including per-pixel 3D points, camera intrinsics, camera poses, ray directions, confidence maps, and validity masks. ...
    Downloads: 0 This Week
  • 17
    AI-Codereview-Gitlab

    GitLab automatic code review tool based on large models

    AI-Codereview-Gitlab is an open-source automation tool that integrates large language models into the GitLab development workflow to perform automated code reviews. The system monitors GitLab repositories and analyzes commits or merge requests using AI models to identify potential issues, coding mistakes, and quality improvements before the code is merged. By leveraging multiple large language model providers—including OpenAI, DeepSeek, ZhipuAI, or local models through Ollama—the platform...
    Downloads: 0 This Week
  • 18
    InfiniteYou

    Flexible Photo Recrafting While Preserving Your Identity

    ...Using an architecture built around diffusion transformers (DiTs), InfiniteYou introduces a component called InfuseNet that injects identity features derived from reference images into the generation process via residual connections, so that the output closely matches the person’s identity without sacrificing visual quality or text-image alignment. The team uses a multi-stage training strategy with synthetic multi-sample data per identity to fine-tune for both identity consistency and aesthetic quality. Compared to prior methods, InfiniteYou significantly improves identity similarity, text-prompt adherence, and overall image quality, and avoids common problems such as face copy-pasting artifacts.
    Downloads: 0 This Week
  • 19
    HunyuanOCR

    OCR expert VLM powered by Hunyuan's native multimodal architecture

    HunyuanOCR is an open-source, end-to-end OCR (optical character recognition) Vision-Language Model (VLM) developed by Tencent‑Hunyuan. It’s designed to unify the entire OCR pipeline (detection, recognition, layout parsing, information extraction, translation, and even subtitle or structured output generation) into a single model inference instead of a cascade of separate tools. Despite being fairly lightweight (about 1 billion parameters), it delivers state-of-the-art performance across a...
    Downloads: 0 This Week
  • 20
    Qwen-VL

    Chat & pretrained large vision language model

    Qwen-VL is Alibaba Cloud’s vision-language large model family, designed to integrate visual and linguistic modalities. It accepts image inputs (with optional bounding boxes) and text, and produces text (and sometimes bounding boxes) as output. The model variants (VL-Plus, VL-Max, etc.) have been upgraded for better visual reasoning, text recognition from images, fine-grained understanding, and support for high image resolutions / extreme aspect ratios.
    Downloads: 0 This Week
  • 21
    airda

    airda (Air Data Agent)

    airda (Air Data Agent) is a multi-agent system for data analysis. It can understand data development and data analysis requirements, understand data, and generate SQL and Python code for tasks such as data queries, data visualization, and machine learning.
    Downloads: 0 This Week
  • 22
    Luna AI

    Virtual AI anchor that combines state-of-the-art technology

    Luna AI is a virtual AI streamer framework designed to power an interactive VTuber that can go live on major platforms and chat with viewers in real time. It is built around a core assistant persona called “Luna AI,” which can be driven by a wide range of large language models and platforms, including GPT-style APIs, Claude, LangChain-based backends, ChatGLM, Kimi, Ollama, and many others. The project supports multiple rendering backends for the avatar, such as Live2D, Unreal Engine (UE),...
    Downloads: 3 This Week
  • 23
    OculiX

    Visual Automation IDE — automate anything you see on screen

    ...Key features:
    - Guided step-by-step recorder with live code preview
    - Image recognition via OpenCV 4.10
    - Dual OCR: Tesseract (built-in) + PaddleOCR (neural, high precision)
    - Local and remote automation via integrated VNC
    - SSH tunnels via embedded JSch
    - Cross-platform: Windows, macOS (Apple Silicon M1-M4), Linux
    - Scripting: Jython, JRuby, Java, PowerShell, AppleScript
    - Java 17 recommended (Java 8+ supported)
    - Full CI/CD with automated builds for all platforms
    Used worldwide for test automation, RPA, and visual regression testing. MIT License. Maintained by oculix-org.
    Downloads: 40 This Week
  • 24
    AnimateDiff

    Plug-n-play module turning text-to-image models into animation

    ...This plug-and-play tool is compatible with a wide range of community models and facilitates the generation of animation directly from pre-existing text-to-image models. It supports various configurations to create animations with different visual styles, providing flexibility and ease of use for developers and artists interested in exploring dynamic, AI-generated animations.
    Downloads: 27 This Week
  • 25
    bitfarm-Archiv Document Management - DMS
    bitfarm-Archiv is a powerful Document Management System (DMS), Enterprise Content Management (ECM) system, and Knowledge Management System (KMS) with workflow components. Help us! Since we live in the internet age, the best way you can help is to write a short statement about your scenario and your use of the DMS, along with your experiences, and post it on your own website or in a blog or forum. It would help us most if you also added a hyperlink to our site http://www.bitfarm-archiv.com. By this...
    Downloads: 12 This Week