Showing 29 open source projects for "form"

  • 1
    VibeVoice

    Open-source multi-speaker long-form text-to-speech model

    VibeVoice-1.5B is Microsoft’s frontier open-source text-to-speech (TTS) model designed for generating expressive, long-form, multi-speaker conversational audio such as podcasts. Unlike traditional TTS systems, it excels in scalability, speaker consistency, and natural turn-taking for up to 90 minutes of continuous speech with as many as four distinct speakers. A key innovation is its use of continuous acoustic and semantic speech tokenizers operating at an ultra-low frame rate of 7.5 Hz, enabling high audio fidelity with efficient processing of long sequences. ...
    Downloads: 9 This Week
  • 2
    MOSS-TTS Family

    MOSS‑TTS Family open‑source speech and sound generation model

    MOSS-TTS is an open-source speech and sound generation model family built for high-fidelity, expressive, and production-oriented audio workflows. It covers long-form speech, voice cloning, multi-speaker dialogue, voice design, environmental sound effects, and real-time streaming TTS. The project is designed for complex real-world use cases where a single speech model may not be enough. Its flagship model focuses on stable long speech generation, multilingual and code-switched synthesis, pronunciation control, and zero-shot voice cloning. ...
    Downloads: 2 This Week
  • 3
    DeepGEMM

    Clean and efficient FP8 GEMM kernels with fine-grained scaling

    ...Despite its lean design, it includes fine-grained scaling strategies and optimizations inspired by cutting-edge systems, drawing on ideas from CUTLASS and CuTe in a more streamlined form.
    Downloads: 44 This Week
  • 4
    GLM-4.6

    Agentic, Reasoning, and Coding (ARC) foundation models

    GLM-4.6 is the latest iteration of Zhipu AI’s foundation model, delivering significant advancements over GLM-4.5. It introduces an extended 200K token context window, enabling more sophisticated long-context reasoning and agentic workflows. The model achieves superior coding performance, excelling in benchmarks and practical coding assistants such as Claude Code, Cline, Roo Code, and Kilo Code. Its reasoning capabilities have been strengthened, including improved tool usage during inference...
    Downloads: 47 This Week
  • 5
    DeepSeek R1

    Open-source, high-performance AI model with advanced reasoning

    DeepSeek-R1 is an open-source large language model developed by DeepSeek, designed to excel in complex reasoning tasks across domains such as mathematics, coding, and language. DeepSeek R1 offers unrestricted access for both commercial and academic use. The model employs a Mixture of Experts (MoE) architecture, comprising 671 billion total parameters with 37 billion active parameters per token, and supports a context length of up to 128,000 tokens. DeepSeek-R1's training regimen uniquely...
    Downloads: 91 This Week
  • 6
    FireRedTTS-2

    Long-form streaming TTS system for multi-speaker dialogue generation

    FireRedTTS2 is a next-generation open-source text-to-speech (TTS) system focused on long-form, streaming speech synthesis for multi-speaker dialogue, delivering stable natural speech with context-aware prosody and reliable speaker transitions that support real-time and conversational applications. It features a specialized streaming speech tokenizer and a dual-transformer architecture that enables low latency and high-quality synthesis, making it suitable for interactive systems like chatbots, podcasts, and applications where dynamic turn-taking between speakers is essential. ...
    Downloads: 0 This Week
  • 7
    Open Infra Index

    Production-tested AI infrastructure tools

    ...Instead of a single monolithic codebase, it functions more like an index or launching point: linking and documenting a set of library repos (e.g. FlashMLA, DeepEP, DeepGEMM, 3FS, etc.) that together form DeepSeek’s infrastructure stack. The repo's README describes the project as sharing “humble building blocks” of their online service—code that is documented, deployed, and battle-tested in production. The timing of its opening matches DeepSeek’s “Open-Source Week” campaign (starting around February 2025) when they gradually released internal infrastructure components publicly. ...
    Downloads: 0 This Week
  • 8
    GLM-4-Voice

    GLM-4-Voice | End-to-End Chinese-English Conversational Model

    ...The model supports real-time speech-to-text transcription, spoken dialogue understanding, and text-to-speech synthesis, making it suitable for conversational AI, virtual assistants, and accessibility applications. GLM-4-Voice builds upon the bilingual strengths of the GLM architecture, supporting both Chinese and English, and is designed to handle long-form conversations with context retention. The repository provides model weights, inference demos, and setup instructions for deploying speech-enabled AI systems.
    Downloads: 1 This Week
  • 9
    Pearl

    A Production-ready Reinforcement Learning AI Agent Library

    Pearl is a production-ready reinforcement learning and contextual bandit agent library built for real-world sequential decision making. It is organized around modular components—policy learners, replay buffers, exploration strategies, safety modules, and history summarizers—that snap together to form reliable agents with clear boundaries and strong defaults. The library implements classic and modern algorithms across two regimes: contextual bandits (e.g., LinUCB, LinTS, SquareCB, neural bandits) and fully sequential RL (e.g., DQN, PPO-style policy optimization), with attention to practical concerns like nonstationarity and dynamic action spaces. ...
    Downloads: 0 This Week
  • 10
    HunyuanOCR

    OCR expert VLM powered by Hunyuan's native multimodal architecture

    HunyuanOCR is an open-source, end-to-end OCR (optical character recognition) Vision-Language Model (VLM) developed by Tencent‑Hunyuan. It’s designed to unify the entire OCR pipeline (detection, recognition, layout parsing, information extraction, translation, and even subtitle or structured output generation) into a single model inference instead of a cascade of separate tools. Despite being fairly lightweight (about 1 billion parameters), it delivers state-of-the-art performance across a...
    Downloads: 1 This Week
  • 11
    GLM-4.6V

    GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning

    GLM-4.6V represents the latest generation of the GLM-V family and marks a major step forward in multimodal AI by combining advanced vision-language understanding with native “tool-call” capabilities, long-context reasoning, and strong generalization across domains. Unlike many vision-language models that treat images and text separately or require intermediate conversions, GLM-4.6V allows inputs such as images, screenshots or document pages directly as part of its reasoning pipeline — and...
    Downloads: 1 This Week
  • 12
    Vidi2

    Large Multimodal Models for Video Understanding and Editing

    Vidi is a family of large multimodal models developed for deep video understanding and editing tasks, integrating vision, audio, and language to allow sophisticated querying and manipulation of video content. It’s designed to process long-form, real-world videos and answer complex queries such as “when in this clip does X happen?” or “where in the frame is object Y during that moment?” — offering temporal retrieval, spatio-temporal grounding (i.e. locating objects over time + space), and even video question answering. Vidi targets applications like intelligent video editing, automated video search, content analysis, and editing assistance, enabling users to efficiently locate relevant segments and objects in hours-long footage. ...
    Downloads: 0 This Week
  • 13
    DiffRhythm

    Di♪♪Rhythm: Blazingly Fast & Simple End-to-End Song Generation

    ...Focused on music creation, it combines advanced AI techniques to produce coherent and creative audio compositions. The model utilizes a latent diffusion architecture, making it capable of producing high-quality, long-form music. It can be accessed on Huggingface, where users can interact with a demo or download the model for further use. DiffRhythm offers tools for both training and inference, and its flexibility makes it ideal for AI-based music production and research in music generation.
    Downloads: 7 This Week
  • 14
    GLM-4-32B-0414

    Open Multilingual Multimodal Chat LMs

    ...The model is pre-trained on 15 trillion tokens of high-quality data, including substantial synthetic reasoning datasets, and further enhanced with reinforcement learning and human preference alignment for improved instruction-following and function calling. Variants like GLM-Z1-32B-0414 offer deep reasoning and advanced mathematical problem-solving, while GLM-Z1-Rumination-32B-0414 specializes in long-form, complex research-style writing using scaled reinforcement learning and external search tools. Despite its large capacity, the model supports user-friendly local deployment and efficient fine-tuning with available scripts.
    Downloads: 0 This Week
  • 15
    CycleGAN

    Software that can generate photos from paintings

    CycleGAN — in its original form — is a landmark in deep learning for image-to-image translation without paired data. Rather than requiring matching image pairs between source and target domains (which are often hard or impossible to obtain), CycleGAN learns two mappings — one from domain A to B, and another back from B to A — along with a cycle-consistency loss that encourages the round-trip to reconstruct the original image.
    Downloads: 0 This Week
  • 16
    SG2Im

    Code for "Image Generation from Scene Graphs", Johnson et al, CVPR 201

    sg2im is a research codebase that learns to synthesize images from scene graphs—structured descriptions of objects and their relationships. Instead of conditioning on free-form text alone, it leverages graph structure to control layout and interactions, generating scenes that respect constraints like “person left of dog” or “cup on table.” The pipeline typically predicts object layouts (bounding boxes and masks) from the graph, then renders a realistic image conditioned on those layouts. This separation lets the model reason about geometry and composition before committing to texture and color, improving spatial fidelity. ...
    Downloads: 0 This Week
  • 17
    Qwen2.5-14B-Instruct

    Powerful 14B LLM with strong instruction and long-text handling

    Qwen2.5-14B-Instruct is a powerful instruction-tuned language model developed by the Qwen team, based on the Qwen2.5 architecture. It features 14.7 billion parameters and is optimized for tasks like dialogue, long-form generation, and structured output. The model supports context lengths up to 128K tokens and can generate up to 8K tokens, making it suitable for long-context applications. It demonstrates improved performance in coding, mathematics, and multilingual understanding across more than 29 languages. Qwen2.5-14B-Instruct is built on a transformer backbone with RoPE, SwiGLU, RMSNorm, and attention QKV bias. ...
    Downloads: 0 This Week
  • 18
    layoutlm-base-uncased

    Multimodal Transformer for document image understanding and layout

    ...LayoutLM enables better performance in tasks where the spatial arrangement of text plays a crucial role. The model uses a standard BERT-like architecture but enriches input with 2D positional embeddings. It achieves state-of-the-art results in form understanding and information extraction benchmarks. This model is particularly useful for document AI applications like document classification, question answering, and named entity recognition.
    Downloads: 0 This Week
  • 19
    QwQ-32B

    QwQ-32B is a reasoning-focused language model for complex tasks

    QwQ-32B is a 32.8 billion parameter reasoning-optimized language model developed by Qwen as part of the Qwen2.5 family, designed to outperform conventional instruction-tuned models on complex tasks. Built with RoPE positional encoding, SwiGLU activations, RMSNorm, and Attention QKV bias, it excels in multi-turn conversation and long-form reasoning. It supports an extended context length of up to 131,072 tokens and incorporates supervised fine-tuning and reinforcement learning for enhanced instruction-following capabilities. The model is capable of structured thinking and delivers competitive performance against top models like DeepSeek-R1 and o1-mini. Recommended usage involves prompts starting with <think>\n, non-greedy sampling strategies, and support for standardized outputs on math and multiple-choice tasks. ...
    Downloads: 0 This Week
  • 20
    ZAYA1-8B

    Efficient MoE reasoning model for coding and math workloads

    ...The model contains 8.4B total parameters with around 760M active during inference, allowing it to achieve strong reasoning, mathematics, and coding performance while remaining lightweight enough for efficient local or on-device deployment. ZAYA1-8B is optimized for long-form reasoning and test-time compute workflows, making it particularly effective for mathematical problem solving, coding tasks, and advanced reasoning chains. It introduces architectural innovations such as Compressed Convolutional Attention, a novel MLP-based expert router, and learned residual scaling to improve routing stability and inference efficiency. ...
    Downloads: 0 This Week
  • 21
    Qwen2.5-VL-3B-Instruct

    Qwen2.5-VL-3B-Instruct: Multimodal model for chat, vision & video

    ...As part of the Qwen2.5 series, it supports image-text-to-text generation with capabilities like chart reading, object localization, and structured data extraction. The model can serve as an intelligent visual agent capable of interacting with digital interfaces and understanding long-form videos by dynamically sampling resolution and frame rate. It uses a SwiGLU and RMSNorm-enhanced ViT architecture and introduces mRoPE updates for robust temporal and spatial understanding. The model supports flexible image input (file path, URL, base64) and outputs structured responses like bounding boxes or JSON, making it highly versatile in commercial and research settings. ...
    Downloads: 0 This Week
  • 22
    Qwen2.5-VL-7B-Instruct

    Multimodal 7B model for image, video, and text understanding tasks

    ...It supports complex tasks like visual question answering, localization with bounding boxes, and structured output generation from documents. The model is also capable of video understanding with dynamic frame sampling and temporal reasoning, enabling it to analyze and respond to long-form videos. Built with an enhanced ViT architecture using window attention, SwiGLU, and RMSNorm, it aligns closely with Qwen2.5 LLM standards. The model demonstrates high performance across benchmarks like DocVQA, ChartQA, and MMStar, and even functions as a tool-using visual agent.
    Downloads: 0 This Week
  • 23
    Jan-v1-edge

    Jan-v1-edge: efficient 1.7B reasoning model optimized for edge devices

    Jan-v1-edge is a lightweight agentic language model developed by JanHQ, designed for fast and reliable on-device execution. It is the second release in the Jan Family and was distilled from the larger Jan-v1 model, retaining strong reasoning and problem-solving capabilities while reducing its computational footprint. The model was refined through a two-stage post-training process: Supervised Fine-Tuning (SFT) to transfer knowledge from Jan-v1, followed by Reinforcement Learning with...
    Downloads: 0 This Week
  • 24
    Ministral 3 8B Reasoning 2512

    Efficient 8B multimodal model tuned for advanced reasoning tasks.

    Ministral 3 8B Reasoning 2512 is a balanced midsize model in the Ministral 3 family, delivering strong multimodal reasoning capabilities within an efficient footprint. It combines an 8.4B-parameter language model with a 0.4B vision encoder, enabling it to process both text and images for advanced reasoning tasks. This version is specifically post-trained for reasoning, making it well-suited for math, coding, and STEM applications requiring multi-step logic and problem-solving. Despite its...
    Downloads: 0 This Week
  • 25
    Ministral 3 14B Reasoning 2512

    High-precision 14B multimodal model built for advanced reasoning tasks

    ...It maintains robust system-prompt adherence, supports dozens of languages, and provides native function calling with clean JSON output for agentic workflows. The model's architecture also delivers a 256k context window, unlocking large-document analysis and long-form reasoning.
    Downloads: 0 This Week