Showing 67 open source projects for "x-art"

View related business solutions
  • Build Securely on AWS with Proven Frameworks Icon
    Build Securely on AWS with Proven Frameworks

    Lay a foundation for success with Tested Reference Architectures developed by Fortinet’s experts. Learn more in this white paper.

    Moving to the cloud brings new challenges. How can you manage a larger attack surface while ensuring great network performance? Turn to Fortinet’s Tested Reference Architectures, blueprints for designing and securing cloud environments built by cybersecurity experts. Learn more and explore use cases in this white paper.
    Download Now
  • Ship Agents Faster Icon
    Ship Agents Faster

    Transform your applications and workflows into powerful agentic systems at global scale.

    Gemini Enterprise Agent Platform lets you rapidly build, scale, govern and optimize production-ready agents grounded in your organization's data. The platform enables developers to build custom or pre-built agents for virtually any use case. New customers get $300 in free credits.
    Get Started Free
  • 1
    CodeGeeX

    CodeGeeX

    CodeGeeX: An Open Multilingual Code Generation Model (KDD 2023)

    ...Developed with MindSpore and later made PyTorch-compatible, it is capable of multilingual code generation, cross-lingual code translation, code completion, summarization, and explanation. It has been benchmarked on HumanEval-X, a multilingual program synthesis benchmark introduced alongside the model, and achieves state-of-the-art performance compared to other open models like InCoder and CodeGen. CodeGeeX also powers IDE plugins for VS Code and JetBrains, offering features like code completion, translation, debugging, and annotation. The model supports Ascend 910 and NVIDIA GPUs, with optimizations like quantization and FasterTransformer acceleration for faster inference.
    Downloads: 17 This Week
    Last Update:
    See Project
  • 2
    TRELLIS.2

    TRELLIS.2

    Native and Compact Structured Latents for 3D Generation

    TRELLIS.2 is a cutting-edge open-source model and codebase for high-fidelity 3D asset generation from 2D images, developed to push forward the state of the art in image-to-3D generation. At its core is a novel sparse voxel structure called O-Voxel that jointly encodes both geometry and surface appearance, enabling reconstruction and generation of complex 3D shapes with arbitrary topology, open surfaces, and physically based rendering (PBR) textures. The system leverages a large 4-billion-parameter architecture combining sparse 3D variational autoencoders with flow-matching transformers to produce fully textured 3D models at resolutions up to 1536³ voxels. ...
    Downloads: 47 This Week
    Last Update:
    See Project
  • 3
    Moondream

    Moondream

    Tiny vision language model

    ...It serves as both a playground for the author’s artistic curiosity and a resource for other creative coders interested in generative art techniques. The repository may include shaders, canvas/WebGL code, visual demos, and utilities that demonstrate how mathematical functions or noise patterns can be harnessed for compelling visuals.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 4
    MiniMax-M2.5

    MiniMax-M2.5

    State of the art LLM and coding model

    MiniMax-M2.5 is a state-of-the-art foundation model extensively trained with reinforcement learning across hundreds of thousands of real-world environments. It delivers leading performance in coding, agentic tool use, search, and complex office workflows, achieving top benchmark scores such as 80.2% on SWE-Bench Verified and 76.3% on BrowseComp. Designed to reason efficiently and decompose tasks like an experienced architect, M2.5 plans features, structure, and system design before generating code. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 5
    stable-diffusion.cpp

    stable-diffusion.cpp

    Diffusion model(SD,Flux,Wan,Qwen Image,Z-Image,...) inference

    stable-diffusion.cpp is a lightweight, high-performance implementation of Stable Diffusion and related generative models written entirely in portable C/C++, designed to run on virtually any device without heavy dependencies. It enables text-to-image and image-to-image generation, supports a growing set of models like SD1.x, SD2.x, SDXL, SD-Turbo, Qwen Image, and more, and is continually updated with support for cutting-edge model variants including video and image editing models. The project is built on the ggml backend, which allows efficient execution on CPUs and GPUs via backends like CUDA, Vulkan, Metal, OpenCL, and SYCL, making it suitable for everything from desktops to mobile devices. ...
    Downloads: 42 This Week
    Last Update:
    See Project
  • 6
    Perception Models

    Perception Models

    State-of-the-art Image & Video CLIP, Multimodal Large Language Models

    Perception Models is a state-of-the-art framework developed by Facebook Research for advanced image and video perception tasks. It introduces two primary components: the Perception Encoder (PE) for visual feature extraction and the Perception Language Model (PLM) for multimodal decoding and reasoning. The PE module is a family of vision encoders designed to excel in image and video understanding, surpassing models like SigLIP2, InternVideo2, and DINOv2 across multiple benchmarks. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    Granite 3.0 Language Models

    Granite 3.0 Language Models

    New set of lightweight state-of-the-art, open foundation models

    This repository introduces Granite 3.0 language models as lightweight, state-of-the-art open foundation models built to natively support multilinguality, coding, reasoning, and tool usage. A central goal is efficient deployment, including the potential to run on constrained compute resources while remaining useful for a broad span of enterprise tasks. The repo positions the models for both research and commercial use under an Apache-2.0 license, signaling permissive adoption paths. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 8
    Step-Video-T2V

    Step-Video-T2V

    State-of-the-art (SoTA) text-to-video pre-trained model

    Step-Video-T2V is a state-of-the-art text-to-video foundation model developed to generate videos from natural-language prompts; its 30B-parameter architecture is designed to produce coherent, temporally extended video sequences — up to around 204 frames — based on input text. Under the hood it uses a compressed latent representation (a Video-VAE) to reduce spatial and temporal redundancy, and a denoising diffusion (or similar) process over that latent space to generate smooth, plausible motion and visuals. ...
    Downloads: 7 This Week
    Last Update:
    See Project
  • 9
    LongCat-Image

    LongCat-Image

    Foundation model for image generation

    ...The model excels at both text-to-image generation and instruction-guided image editing, offering users versatile capabilities for creative and practical tasks—whether generating art, mockups, or adjusting existing visuals with fine control.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Stop Cyber Threats with VM-Series Next-Gen Firewall on Azure Icon
    Stop Cyber Threats with VM-Series Next-Gen Firewall on Azure

    Native application identity and user-based security for your Azure cloud

    Gain integrated visibility across all traffic in a single pass. Deploy Palo Alto Networks VM-Series to determine application identity and content while automating security policy updates via rich APIs.
    Get a free trial
  • 10
    DINOv3

    DINOv3

    Reference PyTorch implementation and models for DINOv3

    ...The model supports multiple backbone architectures, including Vision Transformers (ViT), and can handle larger image resolutions with improved stability during training. The learned embeddings generalize robustly across tasks like classification, retrieval, and segmentation without fine-tuning, showing state-of-the-art transfer performance among self-supervised models.
    Downloads: 13 This Week
    Last Update:
    See Project
  • 11
    HunyuanOCR

    HunyuanOCR

    OCR expert VLM powered by Hunyuan's native multimodal architecture

    ...It’s designed to unify the entire OCR pipeline, detection, recognition, layout parsing, information extraction, translation, and even subtitle or structured output generation, into a single model inference instead of a cascade of separate tools. Despite being fairly lightweight (about 1 billion parameters), it delivers state-of-the-art performance across a wide variety of OCR tasks, outperforming many traditional OCR systems and even other multimodal models on benchmark suites. HunyuanOCR handles complex documents: multi-column layouts, tables, mathematical formulas, mixed languages, handwritten or stylized fonts, receipts, tickets, and even video-frame subtitles. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 12
    GLM-V

    GLM-V

    GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning

    ...The repository provides both GLM-4.5V and GLM-4.1V models, designed to advance beyond basic perception toward higher-level reasoning, long-context understanding, and agent-based applications. GLM-4.5V builds on the flagship GLM-4.5-Air foundation (106B parameters, 12B active), achieving state-of-the-art results on 42 benchmarks across image, video, document, GUI, and grounding tasks. It introduces hybrid training for broad-spectrum reasoning and a Thinking Mode switch to balance speed and depth of reasoning. GLM-4.1V-9B-Thinking incorporates reinforcement learning with curriculum sampling (RLCS) and Chain-of-Thought reasoning, outperforming models much larger in scale (e.g., Qwen-2.5-VL-72B) across many benchmarks.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    TRIBE v2

    TRIBE v2

    A multimodal model for brain response prediction

    TRIBE v2 is a multimodal foundation model developed by Meta AI for predicting human brain activity from naturalistic stimuli such as video, audio, and text. It is designed for in-silico neuroscience, enabling researchers to model how the brain responds to complex real-world inputs. The system integrates state-of-the-art encoders—including LLaMA for text, V-JEPA for video, and Wav2Vec-BERT for audio—into a unified Transformer architecture. This combined representation is mapped onto the cortical surface to predict fMRI responses across thousands of brain regions. TRIBE v2 allows researchers to simulate and analyze brain activity without requiring direct human experiments. ...
    Downloads: 7 This Week
    Last Update:
    See Project
  • 14
    AlphaGenome

    AlphaGenome

    Programmatic access to the AlphaGenome model

    ...The model analyzes DNA sequences of up to 1 million base pairs in length and can deliver predictions at single-base-pair resolution for most outputs. AlphaGenome achieves state-of-the-art performance across a range of genomic prediction benchmarks, including numerous diverse variant effect prediction tasks.
    Downloads: 7 This Week
    Last Update:
    See Project
  • 15
    Step1X-Edit

    Step1X-Edit

    A SOTA open-source image editing model

    ...The model targets general-purpose editing: from object addition/removal, style changes, recoloring, retouching, background replacement, to complex transformations like changing lighting, mood, or art style. The authors trained it on a large curated dataset and benchmarked it on a newly introduced evaluation suite, showing that Step1X-Edit significantly outperforms previous open-source baselines.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    Kitten TTS

    Kitten TTS

    State-of-the-art TTS model under 25MB

    KittenTTS is an open-source, ultra-lightweight, and high-quality text-to-speech model featuring just 15 million parameters and a binary size under 25 MB. It is designed for real-time CPU-based deployment across diverse platforms. Ultra-lightweight, model size less than 25MB. CPU-optimized, runs without GPU on any device. High-quality voices, several premium voice options available. Fast inference, optimized for real-time speech synthesis.
    Downloads: 12 This Week
    Last Update:
    See Project
  • 17
    Kimi k1.5

    Kimi k1.5

    Scaling Reinforcement Learning with LLMs

    Kimi-k1.5 is an advanced open-source multimodal large-language model project that explores scaling reinforcement learning with long-context chains of thought, achieving performance that rivals or surpasses state-of-the-art models on benchmarks like LiveCodeBench, AIME, and MATH-500. The project emphasizes a simplistic yet powerful framework where the context window scales up to 128k tokens, enabling reasoning that resembles planning, reflection, and correction over a much longer sequence of data than typical models. By using techniques like partial rollouts to improve training efficiency and applying sophisticated policy optimization methods, the developers demonstrate that strong ability can emerge without relying on complex solutions like Monte Carlo tree search or value functions. ...
    Downloads: 8 This Week
    Last Update:
    See Project
  • 18
    GLM-5.1

    GLM-5.1

    GLM-5: From Vibe Coding to Agentic Engineering

    GLM-5.1 is a next-generation large language model developed by Z.ai for advanced coding, reasoning, and long-horizon agentic engineering tasks. Built as the successor to GLM-5, the model significantly improves performance in software engineering benchmarks, repository generation, and real-world terminal-based workflows. GLM-5.1 is designed to remain effective over extended problem-solving sessions, allowing it to iteratively refine strategies, analyze failures, and sustain productivity...
    Downloads: 20 This Week
    Last Update:
    See Project
  • 19
    IndexTTS2

    IndexTTS2

    Industrial-level controllable zero-shot text-to-speech system

    IndexTTS is a modern, zero-shot text-to-speech (TTS) system engineered to deliver high-quality, natural-sounding speech synthesis with few requirements and strong voice-cloning capabilities. It builds on state-of-the-art models such as XTTS and other modern neural TTS backbones, improving them with a conformer-based speech conditional encoder and upgrading the decoder to a high-quality vocoder (BigVGAN2), leading to clearer and more natural audio output. The system supports zero-shot voice cloning — meaning it can mimic a target speaker’s voice from a short reference sample — making it versatile for multi-voice uses. ...
    Downloads: 14 This Week
    Last Update:
    See Project
  • 20
    FLUX.2

    FLUX.2

    Official inference repo for FLUX.2 models

    FLUX.2 is a state-of-the-art open-weight image generation and editing model released by Black Forest Labs aimed at bridging the gap between research-grade capabilities and production-ready workflows. The model offers both text-to-image generation and powerful image editing, including editing of multiple reference images, with fidelity, consistency, and realism that push the limits of what open-source generative models have achieved.
    Downloads: 21 This Week
    Last Update:
    See Project
  • 21
    JiT

    JiT

    PyTorch implementation of JiT

    JiT is an open-source PyTorch implementation of a state-of-the-art image diffusion model designed around a minimalist yet powerful architecture for pixel-level generative modeling, based on the paper Back to Basics: Let Denoising Generative Models Denoise. Rather than predicting noise, JiT models directly predict clean image data, which the research suggests aligns better with the manifold structure of natural images and leads to stronger generative performance at high resolution. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 22
    Qwen3-Coder

    Qwen3-Coder

    Qwen3-Coder is the code version of Qwen3

    ...Its flagship version, Qwen3-Coder-480B-A35B-Instruct, features a massive 480 billion-parameter Mixture-of-Experts architecture with 35 billion active parameters, delivering top-tier performance on coding and agentic tasks. This model sets new state-of-the-art benchmarks among open models for agentic coding, browser-use, and tool-use, matching performance comparable to leading models like Claude Sonnet. Qwen3-Coder supports an exceptionally long context window of 256,000 tokens, extendable to 1 million tokens using Yarn, enabling repository-scale code understanding and generation. It is capable of handling 358 programming languages, from common to niche, making it versatile for a wide range of development environments. ...
    Downloads: 17 This Week
    Last Update:
    See Project
  • 23
    Miso TTS

    Miso TTS

    Miso TTS is an 8 billion, highly emotive text-to-speech model

    ...Designed for local deployment, it offers watermarking by default to help promote responsible use of generated audio. With its focus on emotive speech generation, Miso TTS delivers state-of-the-art performance for AI voice applications, virtual assistants, and conversational AI experiences.
    Downloads: 13 This Week
    Last Update:
    See Project
  • 24
    NVIDIA Earth2Studio

    NVIDIA Earth2Studio

    Open-source deep-learning framework

    ...The toolkit makes it easy to run deterministic and ensemble forecasts, swap models interchangeably, and process large geophysical datasets with Xarray structures, enabling experimentation with state-of-the-art deep learning models for climate and atmospheric prediction. Users can extend Earth2Studio with optional model packs, advanced data interfaces, statistical operators, and backend integrations that support flexible workflows from simple tests to large-scale operational inference.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 25
    rwkv.cpp

    rwkv.cpp

    INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model

    Besides the usual FP32, it supports FP16, quantized INT4, INT5 and INT8 inference. This project is focused on CPU, but cuBLAS is also supported. RWKV is a novel large language model architecture, with the largest model in the family having 14B parameters. In contrast to Transformer with O(n^2) attention, RWKV requires only state from the previous step to calculate logits. This makes RWKV very CPU-friendly on large context lengths.
    Downloads: 3 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • Next
Auth0 Logo