Showing 373 open source projects for "visual%20scraper"

View related business solutions
  • Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • Total Network Visibility for Network Engineers and IT Managers Icon
    Total Network Visibility for Network Engineers and IT Managers

    Network monitoring and troubleshooting is hard. TotalView makes it easy.

    This means every device on your network, and every interface on every device is automatically analyzed for performance, errors, QoS, and configuration.
    Learn More
  • 1
    1D Visual Tokenization and Generation

    1D Visual Tokenization and Generation

    This repo contains the code for 1D tokenizer and generator

    The 1D Visual Tokenization and Generation project from ByteDance introduces a novel “one-dimensional” tokenizer designed for images: instead of representing images with large grids of 2D tokens (as in many prior generative/image-modeling systems), it compresses images into as few as 32 discrete tokens (or more, optionally) — thereby achieving a very compact, efficient representation that drastically speeds up generation and reconstruction while retaining strong fidelity.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    Ren'Py

    Ren'Py

    The Ren'Py Visual Novel Engine

    ...The engine handles essential visual novel conventions like save and load systems, rollback to previous text, scene transitions, and UI menus, so creators can focus on the story and player experience. Because it’s built on Python and widely supported across platforms, Ren’Py games can run on Windows, macOS, Linux, mobile devices, and even in browsers with HTML5 builds, helping developers reach a broad audience.
    Downloads: 47 This Week
    Last Update:
    See Project
  • 3
    SAM 3

    SAM 3

    Code for running inference and finetuning with SAM 3 model

    SAM 3 (Segment Anything Model 3) is a unified foundation model for promptable segmentation in both images and videos, capable of detecting, segmenting, and tracking objects. It accepts both text prompts (open-vocabulary concepts like “red car” or “goalkeeper in white”) and visual prompts (points, boxes, masks) and returns high-quality masks, boxes, and scores for the requested concepts. Compared with SAM 2, SAM 3 introduces the ability to exhaustively segment all instances of an open-vocabulary concept specified by a short phrase or exemplars, scaling to a vastly larger set of categories than traditional closed-set models. ...
    Downloads: 132 This Week
    Last Update:
    See Project
  • 4
    AIMr

    AIMr

    The best AI Aimbot for Fortnite, Valorant, CS2, R6, COD, Apex, & more

    ...The software includes various aiming enhancements, such as recoil control, silent aim, and prediction capabilities, aimed at making gameplay smoother and more competitive. AIMr also provides visual customization options like field-of-view displays and detection indicators, allowing players to tailor their interface. The system is compatible with games that use human-shaped models, and although it functions effectively out of the box, optimizing it with CUDA-accelerated OpenCV is recommended for maximum performance.
    Downloads: 644 This Week
    Last Update:
    See Project
  • The Game-Changing URL Builder For Your Marketing Campaigns Icon
    The Game-Changing URL Builder For Your Marketing Campaigns

    Saves time by automating campaign conventions and UTM Link creation

    CampaignTrackly is a SaaS-based platform that automates the process of adding tracking tags to digital links. The platform enables businesses to build a centralized, consistent and standardized link-tracking strategy in a simple and cost-effective way. It removes ambiguities, simplifies processes and empowers marketers to be in control of their campaign setup and reporting process without the need to use IT code or manual operations. Its ease of use promotes consistency and high adoption rates among platform users, resulting in enhanced insights into the customer journeys and marketing budget spend, which in turn, helps businesses optimize their promotions to increase revenue streams. The platform boasts over 45 automation features across standard Google Analytics, Adobe and Custom tags, as well as, extensive tag library management capabilities, friendly reporting functions, sophisticated team access level management and more.
    Learn More
  • 5
    R1-V

    R1-V

    Witness the aha moment of VLM with less than $3

    R1-V is an initiative aimed at enhancing the generalization capabilities of Vision-Language Models (VLMs) through Reinforcement Learning in Visual Reasoning (RLVR). The project focuses on building a comprehensive framework that emphasizes algorithm enhancement, efficiency optimization, and task diversity to achieve general vision-language intelligence and visual/GUI agents. The team's long-term goal is to contribute impactful open-source research in this domain.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    Qwen-Image-Layered

    Qwen-Image-Layered

    Qwen-Image-Layered: Layered Decomposition for Inherent Editablity

    Qwen-Image-Layered is an extension of the Qwen series of multimodal models that introduces layered image understanding, enabling the model to reason about hierarchical visual structures — such as separating foreground, background, objects, and contextual layers within an image. This architecture allows richer semantic interpretation, enabling use cases such as scene decomposition, object-level editing, layered captioning, and more fine-grained multimodal reasoning than with flat image encodings alone. ...
    Downloads: 9 This Week
    Last Update:
    See Project
  • 7
    Asm-Dude

    Asm-Dude

    Visual Studio extension for syntax highlighting assembly

    Visual Studio extension for assembly syntax highlighting and code completion in assembly files and the disassembly window. Assembly syntax highlighting and code assistance for assembly source files and the disassembly window for Visual Studio 2015, 2017 and 2019. This extension can be found in the visual studio extensions gallery or download latest installer AsmDude.vsix (v1.9.6.14).
    Downloads: 3 This Week
    Last Update:
    See Project
  • 8
    ComfyUI-LTXVideo

    ComfyUI-LTXVideo

    LTX-Video Support for ComfyUI

    ComfyUI-LTXVideo is a bridge between ComfyUI’s node-based generative workflow environment and the LTX-Video multimedia processing framework, enabling creators to orchestrate complex video tasks within a visual graph paradigm. Instead of writing code to apply effects, transitions, edits, and data flows, users can assemble nodes that represent video inputs, transformations, and outputs, letting them prototype and automate video production pipelines visually. This integration empowers non-programmers and rapid-iteration teams to harness the performance of LTX-Video while maintaining the clarity and flexibility of a dataflow graph model. ...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 9
    CogVLM

    CogVLM

    A state-of-the-art open visual language model

    CogVLM is an open-source visual–language model suite—and its GUI-oriented sibling CogAgent—aimed at image understanding, grounding, and multi-turn dialogue, with optional agent actions on real UI screenshots. The flagship CogVLM-17B combines ~10B visual parameters with ~7B language parameters and supports 490×490 inputs; CogAgent-18B extends this to 1120×1120 and adds plan/next-action outputs plus grounded operation coordinates for GUI tasks.
    Downloads: 1 This Week
    Last Update:
    See Project
  • Next-generation security awareness training. Built for AI email phishing, vishing, smishing, and deepfakes. Icon
    Next-generation security awareness training. Built for AI email phishing, vishing, smishing, and deepfakes.

    Track your GenAI risk, run multichannel deepfake simulations, and engage employees with incredible security training.

    Assess how your company's digital footprint can be leveraged by cybercriminals. Identify the most at-risk individuals using thousands of public data points and take steps to proactively defend them.
    Learn More
  • 10
    SeedVR2 Upscaler ComfyUI

    SeedVR2 Upscaler ComfyUI

    Official SeedVR2 Video Upscaler for ComfyUI

    ComfyUI-SeedVR2 Video Upscaler is an open-source integration node for the ComfyUI workflow environment that brings the advanced SeedVR2 video upscaling and restoration model directly into visual AI pipelines. This project packages the SeedVR2 architecture as a custom node for ComfyUI, letting users upscale low-resolution video or imagery inside a node-based interface without needing to write code manually. The underlying SeedVR2 model is known for delivering high-quality video enhancement with strong temporal consistency and improved detail preservation by using diffusion-based techniques that are trained specifically on video sequences. ...
    Downloads: 23 This Week
    Last Update:
    See Project
  • 11
    LLaVA

    LLaVA

    Visual Instruction Tuning: Large Language-and-Vision Assistant

    Visual instruction tuning towards large language and vision models with GPT-4 level capabilities.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 12
    Minegrub

    Minegrub

    A Grub Theme in the style of Minecraft!

    A Grub Theme in the style of Minecraft. A Grub theme inspired by Minecraft, adding visual enhancements to the bootloader.
    Downloads: 18 This Week
    Last Update:
    See Project
  • 13
    Barfi

    Barfi

    A Python visual Flow Based Programming library

    A Python visual Flow-Based Programming library that integrates into your existing workflow. Barfi is a Flow-Based Programming environment that provides a graphical programming interface. It is integratable into your existing Python workflows. A schema is built using barfi.Blocks. Then the schema is executed with barfi.ComputeEngine. Each barfi.Block has some properties that enable the FBP and schema building.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 14
    Qwen-VL

    Qwen-VL

    Chat & pretrained large vision language model

    Qwen-VL is Alibaba Cloud’s vision-language large model family, designed to integrate visual and linguistic modalities. It accepts image inputs (with optional bounding boxes) and text, and produces text (and sometimes bounding boxes) as output. The model variants (VL-Plus, VL-Max, etc.) have been upgraded for better visual reasoning, text recognition from images, fine-grained understanding, and support for high image resolutions / extreme aspect ratios.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 15
    clangd

    clangd

    clangd language server

    clangd understands your C++ code and adds smart features to your editor: code completion, compile errors, definition, and more. clangd is a language server that can work with many editors via a plugin. Here’s Visual Studio Code with the clangd plugin, demonstrating code completion.
    Downloads: 21 This Week
    Last Update:
    See Project
  • 16
    Janus

    Janus

    Unified Multimodal Understanding and Generation Models

    Janus is a sophisticated open-source project from DeepSeek AI that aims to unify both visual understanding and image generation in a single model architecture. Rather than having separate systems for “look and describe” and “prompt and generate”, Janus uses an autoregressive transformer framework with a decoupled visual encoder—allowing it to ingest images for comprehension and to produce images from text prompts with shared internal representations.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 17
    Self-Operating Computer

    Self-Operating Computer

    A framework to enable multimodal models to operate a computer

    ...Notably, it was the first known project to implement a multimodal model capable of viewing and controlling a computer screen. The framework supports features like Optical Character Recognition (OCR) and Set-of-Mark (SoM) prompting to enhance visual grounding capabilities. It is designed to be compatible with macOS, Windows, and Linux (with X server installed), and is released under the MIT license.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 18
    LlamaParse

    LlamaParse

    Parse files for optimal RAG

    LlamaParse is a GenAI-native document parser that can parse complex document data for any downstream LLM use case (RAG, agents). Load in 160+ data sources and data formats, from unstructured, and semi-structured, to structured data (API's, PDFs, documents, SQL, etc.) Store and index your data for different use cases. Integrate with 40+ vector stores, document stores, graph stores, and SQL db providers.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 19
    FastVLM

    FastVLM

    This repository contains the official implementation of FastVLM

    ...The repository documents model variants, showcases head-to-head numbers against known baselines, and explains how the encoder integrates with common LLM backbones. Apple’s research brief frames FastVLM as targeting real-time or latency-sensitive scenarios, where lowering visual token pressure is critical to interactive UX. In short, it’s a practical recipe to make VLMs fast without exotic token-selection heuristics.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 20
    HunyuanWorld 1.0

    HunyuanWorld 1.0

    Generating Immersive, Explorable, and Interactive 3D Worlds

    ...The architecture integrates panoramic proxy generation, semantic layering, and hierarchical 3D reconstruction to produce high-quality scene-scale 3D worlds from both text and images. HunyuanWorld-1.0 surpasses existing open-source methods in visual quality and geometric consistency, demonstrated by superior scores in BRISQUE, NIQE, Q-Align, and CLIP metrics.
    Downloads: 10 This Week
    Last Update:
    See Project
  • 21
    RobotCode

    RobotCode

    RobotFramework support for Visual Studio Code

    An extension that brings support for RobotFramework to Visual Studio Code, including features like code completion, debugging, test explorer, refactoring and more! With RobotCode you can edit your code with auto-completion, code navigation, syntax checking and many more.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    DeepSeek VL

    DeepSeek VL

    Towards Real-World Vision-Language Understanding

    DeepSeek-VL is DeepSeek’s initial vision-language model that anchors their multimodal stack. It enables understanding and generation across visual and textual modalities—meaning it can process an image + a prompt, answer questions about images, caption, classify, or reason about visuals in context. The model is likely used internally as the visual encoder backbone for agent use cases, to ground perception in downstream tasks (e.g. answering questions about a screenshot). The repository includes model weights (or pointers to them), evaluation metrics on standard vision + language benchmarks, and configuration or architecture files. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    Wan2.1

    Wan2.1

    Wan2.1: Open and Advanced Large-Scale Video Generative Model

    Wan2.1 is a foundational open-source large-scale video generative model developed by the Wan team, providing high-quality video generation from text and images. It employs advanced diffusion-based architectures to produce coherent, temporally consistent videos with realistic motion and visual fidelity. Wan2.1 focuses on efficient video synthesis while maintaining rich semantic and aesthetic detail, enabling applications in content creation, entertainment, and research. The model supports text-to-video and image-to-video generation tasks with flexible resolution options suitable for various GPU hardware configurations. ...
    Downloads: 62 This Week
    Last Update:
    See Project
  • 24
    ManiSkill

    ManiSkill

    SAPIEN Manipulation Skill Framework

    ...Developed by Hao Su Lab, it focuses on robotic manipulation with diverse, high-quality 3D tasks designed to challenge perception, control, and planning in robotics. ManiSkill provides both low-level control and visual observation spaces for realistic learning scenarios.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 25
    CogView4

    CogView4

    CogView4, CogView3-Plus and CogView3(ECCV 2024)

    CogView4 is the latest generation in the CogView series of vision-language foundation models, developed as a bilingual (Chinese and English) open-source system for high-quality image understanding and generation. Built on top of the GLM framework, it supports multimodal tasks including text-to-image synthesis, image captioning, and visual reasoning. Compared to previous CogView versions, CogView4 introduces architectural upgrades, improved training pipelines, and larger-scale datasets, enabling stronger alignment between textual prompts and generated visual content. It emphasizes bilingual usability, making it well-suited for cross-lingual multimodal applications. ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • 5
  • Next