Showing 487 open source projects for "visual"

  • 1
    Luigi

    Python module that helps you build complex pipelines of batch jobs

    Luigi is a Python package (tested on 3.6, 3.7, 3.8, and 3.9) that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, failure handling, command-line integration, and much more. The purpose of Luigi is to address all the plumbing typically associated with long-running batch processes: you want to chain many tasks and automate them, and failures will happen. These tasks can be anything, but are typically long-running things like Hadoop...
    Downloads: 2 This Week
  • 2
    Agent Sprite Forge

    Agent Skill for generating 2D sprite sheets and map, transparent PNG

    ...The system supports multi-frame sprite generation, animation sequencing, and transparent background rendering for easier integration into game engines. Its architecture is designed around automation and repeatability, enabling developers to generate large batches of visual assets through structured prompt workflows. Overall, agent-sprite-forge acts as an AI-assisted creative tool for accelerating 2D game art production and experimentation.
    Downloads: 1 This Week
  • 3
    PyTorch3D

    PyTorch3D is FAIR's library of reusable components for deep learning

    ...The library provides fast GPU-accelerated implementations of rendering pipelines, transformations, rasterization, and lighting—making it possible to compute gradients through full 3D rendering processes. Researchers use it for tasks like shape generation, reconstruction, view synthesis, and visual reasoning. PyTorch3D also includes utilities for loading, transforming, and sampling 3D assets, so models can be trained end-to-end from 2D supervision or partial data. Its modular design allows easy extension—components like differentiable rasterizers, mesh blending, or signed distance field (SDF) modules can be swapped or combined to test new architectures quickly.
    Downloads: 1 This Week
  • 4
    Perf Book

    The book "Performance Analysis and Tuning on Modern CPU"

    This project is a practical guide to performance analysis and tuning on modern CPUs, bridging microarchitecture details with hands-on profiling. It explains how caches, TLBs, prefetchers, branch predictors, and out-of-order execution influence real program speed, then connects those concepts to concrete optimization strategies. Readers learn how to design trustworthy benchmarks, avoid measurement traps (warmup, turbo, frequency scaling), and interpret hardware performance counters. The book...
    Downloads: 1 This Week
  • 5
    OpenSwarm

    Claude code for everything except coding

    ...The included agents can handle research, data analysis, slide decks, documents, images, videos, scheduling, messaging, and other productivity tasks. It is designed for outputs like pitch decks, market research, SEO content, quarterly reports, launch campaigns, visual assets, and multimedia projects. The project can connect to external services through integrations and can be customized into purpose-specific swarms for areas such as SEO, sales, marketing, finance, customer support, or research. Its main appeal is giving technical users a forkable, terminal-based framework for building agent teams that produce polished business and creative deliverables.
    Downloads: 0 This Week
  • 6
    Viral-Clips-Crew

    Your CrewAI Powered Video Editing Assistant

    ...The project focuses on content repurposing, helping users adapt long videos into formats suitable for platforms like TikTok and YouTube Shorts. Its modular design allows customization of each processing stage, including selection logic and visual formatting. Overall, it serves as a tool for automating short-form content creation.
    Downloads: 0 This Week
  • 7
    AutoCrop-Vertical

    Smart video converter using YOLOv8 and FFmpeg

    ...It uses computer vision techniques and AI models such as YOLOv8 to analyze each frame, detect subjects, and dynamically adjust cropping decisions. Instead of applying a static center crop, the system intelligently tracks people or key objects to preserve visual focus and composition. When cropping would degrade the scene, it can switch to alternative layouts such as letterboxing to maintain context. The tool integrates FFmpeg for encoding and rendering, ensuring efficient processing and compatibility with standard video workflows. It supports multiple output aspect ratios and quality settings, allowing customization for different platforms. ...
    Downloads: 0 This Week
  • 8
    ComfyUI-HunyuanVideoWrapper

    ComfyUI wrapper nodes for HunyuanVideo

    ...The system introduces specialized nodes such as text-image encoders that allow multiple image inputs to be referenced directly within prompts. This makes it possible to guide generation using both visual and textual context simultaneously. The wrapper is designed to fit seamlessly into ComfyUI pipelines, enabling chaining with other nodes for advanced workflows. It supports prompt-based referencing of images, where placeholders in text correspond to connected inputs, allowing fine control over generation behavior. The project is particularly useful for creators experimenting with multimodal AI video synthesis.
    Downloads: 0 This Week
  • 9
    alive-progress

    A new kind of Progress Bar, with real-time throughput, ETA

    alive-progress is an advanced Python progress bar library that introduces a highly animated and adaptive approach to tracking long-running tasks. Unlike traditional static progress indicators, it dynamically adjusts spinner speed and visual feedback based on actual throughput, giving users a more intuitive sense of activity. The library is designed with performance efficiency in mind, using multithreaded updates that minimize CPU overhead and terminal noise. It includes sophisticated ETA estimation powered by exponential smoothing algorithms, improving prediction accuracy for variable workloads. ...
    Downloads: 0 This Week
  • 10
    DeepSeek VL

    Towards Real-World Vision-Language Understanding

    DeepSeek-VL is DeepSeek’s initial vision-language model that anchors their multimodal stack. It enables understanding and generation across visual and textual modalities, meaning it can process an image plus a prompt, answer questions about images, caption, classify, or reason about visuals in context. The model is likely used internally as the visual encoder backbone for agent use cases, to ground perception in downstream tasks (e.g. answering questions about a screenshot). The repository includes model weights (or pointers to them), evaluation metrics on standard vision-language benchmarks, and configuration or architecture files. ...
    Downloads: 0 This Week
  • 11
    dots.ocr

    Multilingual Document Layout Parsing in a Single Vision-Language Model

    ...It achieves state-of-the-art performance on document parsing benchmarks while maintaining a relatively compact model size, demonstrating efficiency without sacrificing accuracy. Beyond standard OCR tasks, it extends its capabilities to parse complex visual elements such as charts, diagrams, and web interfaces, converting them into structured outputs like SVG code.
    Downloads: 0 This Week
  • 12
    SimpleHTR

    Handwritten Text Recognition (HTR) system implemented with TensorFlow

    ...The project focuses on converting images of handwritten text into machine-readable digital text using neural networks. The system uses a combination of convolutional neural networks and recurrent neural networks to extract visual features and model sequential character patterns in handwriting. It also employs connectionist temporal classification (CTC) to align predicted character sequences with input images without requiring character-level segmentation. The repository provides code for training models, performing inference on handwritten text images, and evaluating recognition accuracy. ...
    Downloads: 0 This Week
  • 13
    SteadyDancer

    Harmonized and Coherent Human Image Animation

    ...The system can be used both in preprocessing pipelines for content creators and in live feedback loops for performers, giving dancers and videographers a tool to refine their visual outputs. It supports integration with standard video formats and includes customizable parameters so users can tune stabilization aggressiveness.
    Downloads: 0 This Week
  • 14
    Context Engineering

    A frontier, first-principles handbook

    ...It takes inspiration from thought leaders like Andrej Karpathy and bridges theory with practical examples, offering structured guidance on context orchestration, memory, retrieval, and state control within AI workflows. With extensive materials drawn from research, surveys, and visual explanations, the project acts as both a learning resource and a reference for practitioners looking to improve model behavior by engineering richer inputs.
    Downloads: 0 This Week
  • 15
    Grounded-Segment-Anything

    Marrying Grounding DINO with Segment Anything & Stable Diffusion

    Grounded-Segment-Anything is a research-oriented project that combines open-set object detection with pixel-level segmentation and subsequent creative workflows, enabling detection, segmentation, and high-level vision tasks guided by free-form text prompts. The core idea behind the project is to pair Grounding DINO, a zero-shot object detector that can locate objects described by natural language, with the Segment Anything Model (SAM), which can produce detailed masks for...
    Downloads: 0 This Week
  • 16
    Wan Move

    Motion-controllable Video Generation via Latent Trajectory Guidance

    Wan Move is an open-source research codebase for motion-controllable video generation that focuses on enabling fine-grained control of motion within generative video models. It is designed to guide the temporal evolution of visual content by leveraging latent trajectory guidance, allowing users to manipulate how objects move over time without modifying the underlying generative architecture. By representing motion information as dense point trajectories and integrating them into the latent space of an image-to-video model, the project produces videos with more precise and controllable motion behavior than many existing methods. ...
    Downloads: 0 This Week
  • 17
    Tally

    Let agents classify your bank transactions

    Tally is an open-source, AI-assisted tool designed to automate the classification of personal financial transactions, helping users turn raw bank data into meaningful categories without manual tagging. At its core, Tally pairs a local rule engine with large language models so that an AI assistant (like Claude Code, Copilot, or any CLI agent) interprets, suggests, and categorizes expenses, savings, subscriptions, and income events based on your own rules and behavior. It generates...
    Downloads: 0 This Week
  • 18
    Qwen3-VL-Embedding

    Multimodal embedding and reranking models built on Qwen3-VL

    ...The reranking model then precisely scores relevance between a given query and candidate documents, enhancing retrieval accuracy in complex multimodal tasks. Together, they support advanced information retrieval workflows such as image-text search, visual question answering (VQA), and video-text matching, while providing out-of-the-box support for more than 30 languages.
    Downloads: 0 This Week
  • 19
    zvt

    Modular quant framework

    ...Your world is built by core concepts inside you, so it’s you. The zvt world is built by core concepts inside the market, so it’s zvt. The core concept of the system is visual, and the name of the interface corresponds to it one-to-one, so it is also uniform and extensible. You can write and run a strategy in your favorite IDE, and then view its related targets, factors, signals, and performance in the UI. Once you are familiar with the core concepts of the system, you can apply it to any target in the market.
    Downloads: 0 This Week
  • 20
    Material Theme

    A theme for Sublime Text 3 by Mattia Astorino

    This theme brings the Material Design visual language to Sublime Text 3. If you have problems, first search for a similar issue, and report a new one only if none exists. To enable the white panels and inputs, install the addon package through Package Control by searching for "Material theme white panels". You have to disable it if you want to use the Lighter theme style.
    Downloads: 0 This Week
  • 21
    MolmoWeb

    Open multimodal web agent built by Ai2

    ...Unlike traditional automation tools that rely on structured HTML parsing or predefined APIs, MolmoWeb operates directly from screenshots of web pages, interpreting visual content in the same way a human user would. This approach allows it to generalize across different websites without requiring site-specific integrations, making it highly adaptable to diverse web environments.
    Downloads: 0 This Week
  • 22
    Diffusion for World Modeling

    Learning agent trained in a diffusion world model

    ...Instead of interacting directly with a real environment, the reinforcement learning agent learns within a generative model that produces frames representing the environment. This approach allows training to occur in a simulated world that captures detailed visual dynamics while reducing the need for costly interactions with real environments. The system has been applied to tasks such as Atari game simulations and demonstrations involving complex environments like first-person shooter games.
    Downloads: 0 This Week
  • 23
    FireRed-Image-Edit

    General-purpose image editing model that delivers high-fidelity

    ...It is built on a flexible text-to-image foundation model that has been extended with training paradigms including pretraining, supervised fine-tuning, and reinforcement learning to imbue the system with strong instruction following and editing consistency. The model excels in maintaining visual and text stylistic fidelity, allowing users to preserve the original artistic qualities of an image while applying creative changes according to natural language instructions. In addition to editing single images, FireRed supports multi-image editing scenarios such as virtual try-on or batch transformations, making it suitable for both creative and practical workflows.
    Downloads: 0 This Week
  • 24
    Unstract

    No-code LLM Platform to launch APIs and ETL Pipelines

    Unstract is a powerful open-source, no-code platform built to automate the extraction and structuring of unstructured documents using large language models and flexible workflows, enabling developers and data teams to turn messy files into organized JSON content without complex coding. It integrates a visual Prompt Studio environment where users can iteratively design extraction schemas, compare outputs from different models, and monitor costs and accuracy side by side, making it easier to refine prompts and extraction logic before deploying at scale. Unstract supports deploying structured extraction as REST API endpoints or embedding it into data engineering ETL pipelines, which allows it to plug directly into data warehouses, cloud storage, or downstream analytics systems. ...
    Downloads: 0 This Week
  • 25
    ticket

    Fast, powerful, git-native ticket tracking in a single bash script

    ...It stores each ticket as a Markdown file with YAML frontmatter, making them human-readable and easy to version control alongside your code, while also allowing IDEs to jump straight to ticket definitions. The CLI provides common subcommands to create, list, edit, close, and manage dependencies between tickets, enabling clear hierarchical task structures and visual dependency trees. Its design is rooted in the Unix philosophy of simplicity, composability, and transparency, meaning it integrates well with other standard tools like grep, jq, and ripgrep when installed. Teams can use ticket to track bugs, features, chores, and epics with priority levels and tags, all by staying within the terminal and Git ecosystem.
    Downloads: 0 This Week