Showing 43 open source projects for "cross-platform"

  • 1
    Hunyuan3D-2.1

    From Images to High-Fidelity 3D Assets

    ...It supports both shape generation (mesh geometry) and texture generation modules, with Physically Based Rendering (PBR) texture synthesis that models realistic material effects such as reflections and subsurface scattering. It offers cross-platform support (macOS, Windows, Linux) via Python and PyTorch, including diffusers-style APIs.
    Downloads: 15 This Week
  • 2
    LTX-2

    Python inference and LoRA trainer package for the LTX-2 audio–video

    LTX-2 is a powerful, open-source toolkit developed by Lightricks that provides a modular, high-performance base for building real-time graphics and visual effects applications. It is architected to give developers low-level control over rendering pipelines, GPU resource management, shader orchestration, and cross-platform abstractions so they can craft visually compelling experiences without starting from scratch. Beyond basic rendering scaffolding, LTX-2 includes optimized math libraries, resource loaders, utilities for texture and buffer handling, and integration points for native event loops and input systems. The framework targets both interactive graphical applications and media-rich experiences, making it a solid foundation for games, creative tools, or visualization systems that demand both performance and flexibility. ...
    Downloads: 14 This Week
  • 3
    PaddleOCR

    Awesome multilingual OCR toolkits based on PaddlePaddle

    PaddleOCR offers practical multilingual Optical Character Recognition (OCR) tools that help users train better models and apply them in practice. Built on PaddlePaddle, PaddleOCR is an ultra-lightweight OCR system that supports multilingual recognition, digit recognition, vertical text recognition, and long text recognition. It features the PP-OCR series of high-quality pre-trained models, which includes: ultra-lightweight ppocr_mobile series models, general...
    Downloads: 62 This Week
  • 4
    OpenAI Realtime Embedded

    Instructions on how to use the Realtime API on microcontrollers

    ...The repo points to an ESP32 implementation (maintained in the esp32 branch) and notes that Espressif offers an official example, openai_demo. The main branch does not appear to include a full cross-platform embedded SDK (its core content is mostly links and a minimal README); instead, it serves as a starting point for integrating the Realtime API on microcontrollers.
    Downloads: 0 This Week
  • 5
    CodeGeeX

    CodeGeeX: An Open Multilingual Code Generation Model (KDD 2023)

    CodeGeeX is a large-scale multilingual code generation model with 13 billion parameters, trained on 850B tokens across more than 20 programming languages. Developed with MindSpore and later made PyTorch-compatible, it is capable of multilingual code generation, cross-lingual code translation, code completion, summarization, and explanation. It has been benchmarked on HumanEval-X, a multilingual program synthesis benchmark introduced alongside the model, and achieves state-of-the-art performance compared to other open models like InCoder and CodeGen. CodeGeeX also powers IDE plugins for VS Code and JetBrains, offering features like code completion, translation, debugging, and annotation. ...
    Downloads: 8 This Week
  • 6
    CogView4

    CogView4, CogView3-Plus and CogView3 (ECCV 2024)

    ...Compared to previous CogView versions, CogView4 introduces architectural upgrades, improved training pipelines, and larger-scale datasets, enabling stronger alignment between textual prompts and generated visual content. It emphasizes bilingual usability, making it well-suited for cross-lingual multimodal applications. The model also supports fine-tuning and downstream customization, extending its applicability to creative content generation, human–computer interaction, and research on vision-language alignment.
    Downloads: 3 This Week
  • 7
    DeepSeek VL2

    Mixture-of-Experts Vision-Language Models for Advanced Multimodal

    ...While the internal architecture details are not fully documented publicly, the repo suggests that VL2 introduces enhancements over prior vision-language models (e.g. better scaling, cross-modal attention, more robust alignment) to improve grounding and multimodal understanding.
    Downloads: 4 This Week
  • 8
    DeepSeek Coder V2

    DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models

    DeepSeek-Coder-V2 is the second generation of DeepSeek’s code generation models, refining the original DeepSeek-Coder line with improved architecture, training strategies, and benchmark performance. While the V1 models already targeted strong code understanding and generation, V2 is a Mixture-of-Experts (MoE) model further pre-trained from an intermediate DeepSeek-V2 checkpoint on an additional 6 trillion tokens, expanding programming-language support from 86 to 338 languages and extending the context length from 16K to 128K tokens. The repository provides updated model weights, evaluation...
    Downloads: 43 This Week
  • 9
    CodeGeeX2

    CodeGeeX2: A More Powerful Multilingual Code Generation Model

    ...Its backend powers the CodeGeeX IDE plugins for VS Code, JetBrains, and other editors, offering developers interactive AI assistance with features like infilling and cross-file completion.
    Downloads: 3 This Week
  • 10
    BCEmbedding

    NetEase Youdao's open-source embedding and reranker models

    ...It includes an EmbeddingModel for semantic vector generation and a RerankerModel for refining and ordering search results. The project is optimized for bilingual and cross-lingual retrieval, especially across Chinese and English. It is used as a foundation for RAG systems such as QAnything and other Youdao products. The models are designed to work directly without fine-tuning across common business scenarios such as education, medicine, law, finance, literature, FAQs, textbooks, and general conversation. ...
    Downloads: 0 This Week
  • 11
    FastSD CPU

    Fast Stable Diffusion on CPUs and AI PCs

    FastSD CPU is an optimized fork of Stable Diffusion designed to run efficiently on CPUs and devices without dedicated GPUs by leveraging Latent Consistency Models and Adversarial Diffusion Distillation techniques that accelerate inference. It focuses on bringing fast text-to-image generation to mainstream hardware like desktop CPUs, lower-end laptops, or edge devices without requiring high-end graphics processors. The repository contains multiple interfaces including a desktop GUI for simple...
    Downloads: 38 This Week
  • 12
    HeartMuLa

    A Family of Open-Source Music Foundation Models

    HeartMuLa is the open-source library and reference implementation for the HeartMuLa family of music foundation models, designed to support both music generation and music-related understanding tasks in a cohesive stack. At the center is HeartMuLa, a music language model that generates music conditioned on inputs like lyrics and tags, with multilingual support that broadens the range of lyric-driven use cases. The project also includes HeartCodec, a music codec optimized for high...
    Downloads: 11 This Week
  • 13
    Kimi K2.5

    Moonshot's most powerful AI model

    Kimi K2.5 is Moonshot AI’s open-source, native multimodal agentic model built through continual pretraining on approximately 15 trillion mixed vision and text tokens. Based on a 1T-parameter Mixture-of-Experts (MoE) architecture with 32B activated parameters, it integrates advanced language reasoning with strong visual understanding. K2.5 supports both “Thinking” and “Instant” modes, enabling either deep step-by-step reasoning or low-latency responses depending on the task. Designed for...
    Downloads: 27 This Week
  • 14
    FireRedTTS-2

    Long-form streaming TTS system for multi-speaker dialogue generation

    ...It features a specialized streaming speech tokenizer and a dual-transformer architecture that enables low latency and high-quality synthesis, making it suitable for interactive systems like chatbots, podcasts, and applications where dynamic turn-taking between speakers is essential. FireRedTTS-2 supports multilingual output and speaker flexibility, enabling scenarios that involve language switching, cross-lingual voice cloning, and expressive dialogue generation that maintains consistency over longer utterances.
    Downloads: 0 This Week
  • 15
    HY-World 2.0

    A Multi-Modal World Model for Reconstruction, Generation, and Simulation

    ...The system also improves reconstruction from multi-view images and video by upgrading its feed-forward 3D prediction components and its memory-aware view generation process. Another major part of the project is WorldLens, a rendering platform designed for interactive exploration with an engine-agnostic architecture, automatic image-based lighting, collision detection, and support for character interaction.
    Downloads: 1 This Week
  • 16
    GLM-V

    GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning

    GLM-V is an open-source vision-language model (VLM) series from ZhipuAI that extends the GLM foundation models into multimodal reasoning and perception. The repository provides both GLM-4.5V and GLM-4.1V models, designed to advance beyond basic perception toward higher-level reasoning, long-context understanding, and agent-based applications. GLM-4.5V builds on the flagship GLM-4.5-Air foundation (106B parameters, 12B active), achieving state-of-the-art results on 42 benchmarks across image,...
    Downloads: 6 This Week
  • 17
    Clay Foundation Model

    The Clay Foundation Model - An open source AI model and interface

    ...It aims to serve as a foundational tool for environmental monitoring, research, and decision-making by integrating various data sources and offering an accessible platform for analysis.
    Downloads: 1 This Week
  • 18
    HunyuanVideo-Avatar

    Tencent Hunyuan Multimodal diffusion transformer (MM-DiT) model

    HunyuanVideo-Avatar is a multimodal diffusion transformer (MM-DiT) model by Tencent Hunyuan for animating static avatar images into dynamic, emotion-controllable, and multi-character dialogue videos, conditioned on audio. It addresses challenges of motion realism, identity consistency, and emotional alignment. Innovations include a character image injection module, an Audio Emotion Module for transferring emotion cues, and a Face-Aware Audio Adapter to isolate audio effects on faces,...
    Downloads: 1 This Week
  • 19
    LingBot-VLA

    A Pragmatic VLA Foundation Model

    LingBot-VLA is an open-source Vision-Language-Action (VLA) foundation model designed to serve as a general “brain” for real-world robotic manipulation by grounding multimodal perception and language into actionable motions. It has been pretrained on tens of thousands of hours of real robotic interaction data across multiple robot platforms, which enables it to generalize well to diverse morphologies and tasks without extensive retraining for each new robot. The model aims to bridge...
    Downloads: 0 This Week
  • 20
    Qwen3-VL-Embedding

    Multimodal embedding and reranking models built on Qwen3-VL

    Qwen3-VL-Embedding (with its companion Qwen3-VL-Reranker) is a state-of-the-art multimodal embedding and reranking model suite built on the open-sourced Qwen3-VL foundation, developed to handle diverse inputs including text, images, screenshots, and videos. The core embedding model maps such inputs into semantically rich vectors in a unified representation space, enabling similarity search, clustering, and cross-modal retrieval. The reranking model then precisely scores relevance between a given query and candidate documents, enhancing retrieval accuracy in complex multimodal tasks. Together, they support advanced information retrieval workflows such as image-text search, visual question answering (VQA), and video-text matching, while providing out-of-the-box support for more than 30 languages.
    Downloads: 0 This Week
  • 21
    Sapiens

    High-resolution models for human tasks

    ...It integrates sensory inputs such as vision, audio, and proprioception into a unified learning architecture that allows agents to understand and adapt to their surroundings dynamically. The project emphasizes long-horizon reasoning and cross-modal grounding—connecting language, perception, and action into a single agentic model capable of following abstract goals. It includes simulation environments, datasets, and benchmarks for testing grounded understanding, imitation learning, and decision-making. The system’s modular pipeline supports both imitation-based and reinforcement-based training strategies, allowing flexible experimentation with different embodiments and tasks.
    Downloads: 0 This Week
  • 22
    Core AI Models

    Model export recipes, Python primitives, and Swift runtime utilities

    ...It provides export recipes that convert supported open-source models into Core AI model files. It also includes Python primitives for authoring custom PyTorch models that are better suited for Apple platform deployment. The Swift package adds runtime utilities that help developers integrate exported models into macOS and iOS apps. The repository also contains agent skills that guide coding assistants through Core AI workflows, model authoring, and compression exploration. It is useful for developers who want a curated path from model preparation to local app integration on Apple hardware.
    Downloads: 1 This Week
  • 23
    FinGPT

    Open-Source Financial Large Language Models

    ...It extends traditional GPT-style models by connecting them to live or historical financial datasets, news APIs, and economic indicators so that outputs are grounded in relevant and recent market conditions rather than generic knowledge alone. The platform typically includes tools for fine-tuning, context engineering, and prompt templating, enabling users to build specialized assistants for tasks like sentiment analysis, earnings summary generation, risk profiling, trading signal interpretation, and document extraction from financial reports.
    Downloads: 2 This Week
  • 24
    Seamless Communication

    Foundational Models for State-of-the-Art Speech and Text Translation

    ...The system architecture includes a real-time multimodal signal pipeline for audio, video, and sensor data, a dialog manager that can decide when to act (speak, gesture, point) or query, and a cross-modal reasoning layer that fuses perception with semantic context. The research prototype includes components for visual grounding (understanding when a user references something in view), gesture recognition and synthesis, and turn-taking mechanisms that mirror human conversational timing. Because latency and synchronization are critical, the codebase invests in asynchronous scheduling, overlap of perception and reasoning, and fast fallback responses.
    Downloads: 0 This Week
  • 25
    NVIDIA Isaac GR00T

    NVIDIA Isaac GR00T N1.5 is the world's first open foundation model

    NVIDIA Isaac‑GR00T N1.5 is an open-source foundation model engineered for generalized humanoid robot reasoning and manipulation skills. It accepts multimodal inputs—such as language and images—and uses a diffusion transformer architecture built upon vision-language encoders, enabling adaptive robot behaviors across diverse environments. It is designed to be customizable via post-training with real or synthetic data. The vision-language model remains frozen during both pretraining and...
    Downloads: 0 This Week