Showing 16 open source projects for "code studio visual"

View related business solutions
  • Automate contact and company data extraction Icon
    Automate contact and company data extraction

    Build lead generation pipelines that pull emails, phone numbers, and company details from directories, maps, social platforms. Full API access.

    Generate leads at scale without building or maintaining scrapers. Use 10,000+ ready-made tools that handle authentication, pagination, and anti-bot protection. Pull data from business directories, social profiles, and public sources, then export to your CRM or database via API. Schedule recurring extractions, enrich existing datasets, and integrate with your workflows.
    Explore Apify Store
  • Pest Control Management Software Icon
    Pest Control Management Software

    Pocomos is a cloud-based field service solution that caters to businesses

    Built for the pest control industry, but also works great for Mosquito Control, Bin Cleaning, Window Washing, Solar Panel Cleaning, and other Home Service Businesses in need of an easy-to-use software that helps you simplify routing, scheduling, communications, payment processing, truck tracking, time tracking, and reporting.
    Learn More
  • 1
    Moondream

    Moondream

    Tiny vision language model

    Moondream is a creative code project and visual experimentation repository that explores generative graphics, aesthetic patterns, and interactive art through code. The project typically showcases procedural visualizations, algorithmic designs, and artistic experiments that push the boundaries of what can be expressed with programming languages and rendering frameworks.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 2
    SAM 3

    SAM 3

    Code for running inference and finetuning with SAM 3 model

    SAM 3 (Segment Anything Model 3) is a unified foundation model for promptable segmentation in both images and videos, capable of detecting, segmenting, and tracking objects. It accepts both text prompts (open-vocabulary concepts like “red car” or “goalkeeper in white”) and visual prompts (points, boxes, masks) and returns high-quality masks, boxes, and scores for the requested concepts. Compared with SAM 2, SAM 3 introduces the ability to exhaustively segment all instances of an...
    Downloads: 120 This Week
    Last Update:
    See Project
  • 3
    ComfyUI-LTXVideo

    ComfyUI-LTXVideo

    LTX-Video Support for ComfyUI

    ComfyUI-LTXVideo is a bridge between ComfyUI’s node-based generative workflow environment and the LTX-Video multimedia processing framework, enabling creators to orchestrate complex video tasks within a visual graph paradigm. Instead of writing code to apply effects, transitions, edits, and data flows, users can assemble nodes that represent video inputs, transformations, and outputs, letting them prototype and automate video production pipelines visually. This integration empowers non-programmers and rapid-iteration teams to harness the performance of LTX-Video while maintaining the clarity and flexibility of a dataflow graph model. ...
    Downloads: 12 This Week
    Last Update:
    See Project
  • 4
    Qwen3-VL

    Qwen3-VL

    Qwen3-VL, the multimodal large language model series by Alibaba Cloud

    Qwen3-VL is the latest multimodal large language model series from Alibaba Cloud’s Qwen team, designed to integrate advanced vision and language understanding. It represents a major upgrade in the Qwen lineup, with stronger text generation, deeper visual reasoning, and expanded multimodal comprehension. The model supports dense and Mixture-of-Experts (MoE) architectures, making it scalable from edge devices to cloud deployments, and is available in both instruction-tuned and...
    Downloads: 1 This Week
    Last Update:
    See Project
  • Most modern and flexible cloud platform for MLM companies Icon
    Most modern and flexible cloud platform for MLM companies

    ERP-class software for multi-level marketing

    For direct selling (MLM) companies, from startup to well established enterprises with millions of distributors across the world
    Learn More
  • 5
    Wan2.1

    Wan2.1

    Wan2.1: Open and Advanced Large-Scale Video Generative Model

    Wan2.1 is a foundational open-source large-scale video generative model developed by the Wan team, providing high-quality video generation from text and images. It employs advanced diffusion-based architectures to produce coherent, temporally consistent videos with realistic motion and visual fidelity. Wan2.1 focuses on efficient video synthesis while maintaining rich semantic and aesthetic detail, enabling applications in content creation, entertainment, and research. The model supports...
    Downloads: 64 This Week
    Last Update:
    See Project
  • 6
    FramePack

    FramePack

    Lets make video diffusion practical

    FramePack explores compact representations for sequences of image frames, targeting tasks where many near-duplicate frames carry redundant information. The idea is to “pack” frames by detecting shared structure and storing differences efficiently, which can accelerate training or inference on video-like data. By reducing I/O and memory bandwidth, datasets become lighter to load while models still see the essential temporal variation. The repository demonstrates both packing and unpacking...
    Downloads: 18 This Week
    Last Update:
    See Project
  • 7
    LTX-2

    LTX-2

    Python inference and LoRA trainer package for the LTX-2 audio–video

    ...The framework targets both interactive graphical applications and media-rich experiences, making it a solid foundation for games, creative tools, or visualization systems that demand both performance and flexibility. While being low-level, it also provides sensible defaults and helper abstractions that reduce boilerplate and help teams maintain clear, maintainable code.
    Downloads: 26 This Week
    Last Update:
    See Project
  • 8
    HunyuanOCR

    HunyuanOCR

    OCR expert VLM powered by Hunyuan's native multimodal architecture

    HunyuanOCR is an open-source, end-to-end OCR (optical character recognition) Vision-Language Model (VLM) developed by Tencent‑Hunyuan. It’s designed to unify the entire OCR pipeline, detection, recognition, layout parsing, information extraction, translation, and even subtitle or structured output generation, into a single model inference instead of a cascade of separate tools. Despite being fairly lightweight (about 1 billion parameters), it delivers state-of-the-art performance across a...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 9
    GLM-4.6V

    GLM-4.6V

    GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning

    GLM-4.6V represents the latest generation of the GLM-V family and marks a major step forward in multimodal AI by combining advanced vision-language understanding with native “tool-call” capabilities, long-context reasoning, and strong generalization across domains. Unlike many vision-language models that treat images and text separately or require intermediate conversions, GLM-4.6V allows inputs such as images, screenshots or document pages directly as part of its reasoning pipeline — and...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Leverage AI to Automate Medical Coding Icon
    Leverage AI to Automate Medical Coding

    Medical Coding Solution

    As a healthcare provider, you should be paid promptly for the services you provide to patients. Slow, inefficient, and error-prone manual coding keeps you from the financial peace you deserve. XpertDox’s autonomous coding solution accelerates the revenue cycle so you can focus on providing great healthcare.
    Learn More
  • 10
    Oasis

    Oasis

    Inference script for Oasis 500M

    Open-Oasis provides inference code and released weights for Oasis 500M, an interactive world model that generates gameplay frames conditioned on user keyboard input. Instead of rendering a pre-built game world, the system produces the next visual state via a diffusion-transformer approach, effectively “imagining” the world response to your actions in real time. The project focuses on enabling action-conditional frame generation so developers can experiment with interactive, model-generated environments rather than static video generation alone. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    MetaCLIP

    MetaCLIP

    ICLR2024 Spotlight: curation/training code, metadata, distribution

    MetaCLIP is a research codebase that extends the CLIP framework into a meta-learning / continual learning regime, aiming to adapt CLIP-style models to new tasks or domains efficiently. The goal is to preserve CLIP’s strong zero-shot transfer capability while enabling fast adaptation to domain shifts or novel class sets with minimal data and without catastrophic forgetting. The repository provides training logic, adaptation strategies (e.g. prompt tuning, adapter modules), and evaluation...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    Style Aligned

    Style Aligned

    Official code for Style Aligned Image Generation via Shared Attention

    StyleAligned is a diffusion-model editing technique and codebase that preserves the visual “style” of an original image while applying new semantic edits driven by text. Instead of fully re-generating an image—and risking changes to lighting, texture, or rendering choices—the method aligns internal features across denoising steps so the target edit inherits the source style. This alignment acts like a constraint on the model’s evolution, steering composition, palette, and brushwork even as...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    GLIDE (Text2Im)

    GLIDE (Text2Im)

    GLIDE: a diffusion-based text-conditional image synthesis model

    glide-text2im is an open source implementation of OpenAI’s GLIDE model, which generates photorealistic images from natural language text prompts. It demonstrates how diffusion-based generative models can be conditioned on text to produce highly detailed and coherent visual outputs. The repository provides both model code and pretrained checkpoints, making it possible for researchers and developers to experiment with text-to-image synthesis. GLIDE includes advanced techniques such as classifier-free guidance, which improves the quality and alignment of generated images with the input text. The project also offers sampling scripts and utilities for exploring how diffusion models can be applied to multimodal tasks. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 14
    gpt-oss-20b

    gpt-oss-20b

    OpenAI’s compact 20B open model for fast, agentic, and local use

    GPT-OSS-20B is OpenAI’s smaller, open-weight language model optimized for low-latency, agentic tasks, and local deployment. With 21B total parameters and 3.6B active parameters (MoE), it fits within 16GB of memory thanks to native MXFP4 quantization. Designed for high-performance reasoning, it supports Harmony response format, function calling, web browsing, and code execution. Like its larger sibling (gpt-oss-120b), it offers adjustable reasoning depth and full chain-of-thought visibility...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    gpt-oss-120b

    gpt-oss-120b

    OpenAI’s open-weight 120B model optimized for reasoning and tooling

    ...The model supports fine-tuning, chain-of-thought reasoning, and structured outputs, making it ideal for complex workflows. It operates in OpenAI’s Harmony response format and can be deployed via Transformers, vLLM, Ollama, LM Studio, and PyTorch. Developers can control the reasoning level (low, medium, high) to balance speed and depth depending on the task. Released under the Apache 2.0 license, it enables both commercial and research applications. The model supports function calling, web browsing, and code execution, streamlining intelligent agent development.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    OpenVLA 7B

    OpenVLA 7B

    Vision-language-action model for robot control via images and text

    OpenVLA 7B is a multimodal vision-language-action model trained on 970,000 robot manipulation episodes from the Open X-Embodiment dataset. It takes camera images and natural language instructions as input and outputs normalized 7-DoF robot actions, enabling control of multiple robot types across various domains. Built on top of LLaMA-2 and DINOv2/SigLIP visual backbones, it allows both zero-shot inference for known robot setups and parameter-efficient fine-tuning for new domains. The model...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next