Showing 31 open source projects for "size"

View related business solutions
  • Build Agents and Models on One Platform Icon
    Build Agents and Models on One Platform

    Everything you need to build production-ready agents and models. Access 200+ Google and third-party AI models and tools.

    Gemini Enterprise Agent Platform is Google Cloud's comprehensive platform for developers to build, scale, govern, and optimize agents and models. Choose from Google's most advanced models and third-party models like Anthropic's Claude Model Family.
    Try It Free
  • $300 Free Credits for Your Google Cloud Projects Icon
    $300 Free Credits for Your Google Cloud Projects

    Start building on Google Cloud with $300 in free credits. No commitment, no credit card required until you're ready to scale.

    Launch your next project with $300 in free Google Cloud credits—no strings attached. Test, build, and deploy without risk. Use your credits across the entire Google Cloud platform to find what works best for your needs. After your credits are used, continue with always-free tier services. Only pay when you're ready to scale. Sign up in minutes and start exploring.
    Start Free Trial
  • 1
    Kitten TTS

    Kitten TTS

    State-of-the-art TTS model under 25MB

    KittenTTS is an open-source, ultra-lightweight, and high-quality text-to-speech model featuring just 15 million parameters and a binary size under 25 MB. It is designed for real-time CPU-based deployment across diverse platforms. Ultra-lightweight, model size less than 25MB. CPU-optimized, runs without GPU on any device. High-quality voices, several premium voice options available. Fast inference, optimized for real-time speech synthesis.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 2
    FramePack

    FramePack

    Lets make video diffusion practical

    FramePack explores compact representations for sequences of image frames, targeting tasks where many near-duplicate frames carry redundant information. The idea is to “pack” frames by detecting shared structure and storing differences efficiently, which can accelerate training or inference on video-like data. By reducing I/O and memory bandwidth, datasets become lighter to load while models still see the essential temporal variation. The repository demonstrates both packing and unpacking...
    Downloads: 44 This Week
    Last Update:
    See Project
  • 3
    Z-Image

    Z-Image

    Image generation model with single-stream diffusion transformer

    ...The project includes several variants: Z-Image-Turbo, a distilled version optimized for speed and low resource consumption; Z-Image-Base, the full-capacity foundation model; and Z-Image-Edit, fine-tuned for image editing tasks. Despite its compact size, Z-Image produces outputs that closely rival those from much larger models — including strong rendering of bilingual (English and Chinese) text inside images, accurate prompt adherence, and good layout and composition.
    Downloads: 30 This Week
    Last Update:
    See Project
  • 4
    FlashMLA

    FlashMLA

    FlashMLA: Efficient Multi-head Latent Attention Kernels

    ...It provides optimized kernels for MLA decoding, including support for variable-length sequences, helping reduce latency and increase throughput in model inference systems using that attention style. The library supports both BF16 and FP16 data types, and includes a paged KV cache implementation with a block size of 64 to efficiently manage memory during decoding. On very compute-bound settings, it can reach up to ~660 TFLOPS on H800 SXM5 hardware, while in memory-bound configurations it can push memory throughput to ~3000 GB/s. The team regularly updates it with performance improvements; for example, a 2025 update claims 5 % to 15 % gains on compute-bound workloads while maintaining API compatibility.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • 5
    GLM-4.1V

    GLM-4.1V

    GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning

    GLM-4.1V — often referred to as a smaller / lighter version of the GLM-V family — offers a more resource-efficient option for users who want multimodal capabilities without requiring large compute resources. Though smaller in scale, GLM-4.1V maintains competitive performance, particularly impressive on many benchmarks for models of its size: in fact, on a number of multimodal reasoning and vision-language tasks it outperforms some much larger models from other families. It represents a trade-off: somewhat reduced capacity compared to 4.5V or 4.6V, but with benefits in terms of speed, deployability, and lower hardware requirements — making it especially useful for developers experimenting locally, building lightweight agents, or deploying on limited infrastructure. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    GLM-OCR

    GLM-OCR

    Accurate × Fast × Comprehensive

    ...Designed to handle text recognition, table parsing, formula extraction, and general information retrieval from documents containing mixed content, GLM-OCR excels across major benchmarks while remaining highly efficient with a relatively compact parameter size (~0.9B), enabling deployment in high-concurrency services and edge environments. The model’s multimodal capabilities allow it to reason across image and text content holistically, capturing structured and unstructured information from pages that include dense tables, seals, code snippets, and varied document graphics. GLM-OCR integrates a comprehensive SDK and inference toolchain that makes it easy for developers to install, invoke, and embed into production pipelines with simple commands or APIs.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 7
    MiniMind-O

    MiniMind-O

    A 0.1B Omni model trained from scratch

    ...It extends the MiniMind family by exploring a model that can handle text, audio, and image inputs while producing text and streaming speech outputs. The project is designed to make multimodal AI training more accessible by keeping the model size small enough for ordinary personal hardware. It includes both mini and full training data paths, allowing learners to run a complete workflow quickly or reproduce the released model setup more closely. The implementation emphasizes native PyTorch code instead of relying on high-level third-party abstractions. minimind-o is most useful for developers and researchers who want to understand how multimodal and speech-capable AI systems are built from the ground up.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    Step3-VL-10B

    Step3-VL-10B

    Multimodal model achieving SOTA performance

    Step3-VL-10B is an open-source multimodal foundation model developed by StepFun AI that pushes the boundaries of what compact models can achieve by combining visual and language understanding in a single architecture. Despite having only about 10 billion parameters, it delivers performance that rivals or even surpasses much larger models (10×–20× larger) on a wide range of multimodal benchmarks covering reasoning, perception, and complex tasks, positioning it as one of the most powerful...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    GLM-4.5V

    GLM-4.5V

    GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning

    GLM-4.5V is the preceding iteration in the GLM-V series that laid much of the groundwork for general multimodal reasoning and vision-language understanding. It embodies the design philosophy of mixing visual and textual modalities into a unified model capable of general-purpose reasoning, content understanding, and generation, while already supporting a wide variety of tasks: from image captioning and visual question answering to content recognition, GUI-based agents, video understanding,...
    Downloads: 1 This Week
    Last Update:
    See Project
  • Stop Cyber Threats with VM-Series Next-Gen Firewall on Azure Icon
    Stop Cyber Threats with VM-Series Next-Gen Firewall on Azure

    Native application identity and user-based security for your Azure cloud

    Gain integrated visibility across all traffic in a single pass. Deploy Palo Alto Networks VM-Series to determine application identity and content while automating security policy updates via rich APIs.
    Get a free trial
  • 10
    Qwen2-Audio

    Qwen2-Audio

    Repo of Qwen2-Audio chat & pretrained large audio language model

    Qwen2-Audio is a large audio-language model by Alibaba Cloud, part of the Qwen series. It is trained to accept various audio signal inputs (including speech, sounds, etc.) and perform both voice chat and audio analysis, producing textual responses. It supports two major modes: Voice Chat (interactive voice only input) and Audio Analysis (audio + text instructions), with both base and instruction-tuned models. It is evaluated on many benchmarks (speech recognition, translation, sound...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    Tongyi DeepResearch

    Tongyi DeepResearch

    Tongyi Deep Research, the Leading Open-source Deep Research Agent

    ...It’s built to act like a research agent: synthesizing, reasoning, retrieving information via the web and documents, and backing its outputs with evidence. The model is about 30.5 billion parameters in size, though at any given token only ~3.3B parameters are active. It uses a mix of synthetic data generation, fine-tuning and reinforcement learning; supports benchmarks like web search, document understanding, question answering, “agentic” tasks; provides inference tools, evaluation scripts, and “web agent” style interfaces. The aim is to enable more autonomous, agentic models that can perform sustained knowledge gathering, reasoning, and synthesis across multiple modalities (web, files, etc.).
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    Grok-1

    Grok-1

    Open-source, high-performance Mixture-of-Experts large language model

    ...In March 2024, xAI released Grok-1's model weights and architecture under the Apache 2.0 license, making them openly accessible to developers. The accompanying GitHub repository provides JAX example code for loading and running the model. Due to its substantial size, utilizing Grok-1 requires a machine with significant GPU memory. The repository's MoE layer implementation prioritizes correctness over efficiency, avoiding the need for custom kernels. This is a full repo snapshot ZIP file of the Grok-1 code.
    Leader badge
    Downloads: 15 This Week
    Last Update:
    See Project
  • 13
    RoBERTa for Chinese

    RoBERTa for Chinese

    RoBERTa Chinese pre-training model: RoBERTa for Chinese

    ...It provides TensorFlow and PyTorch-compatible model releases trained on large-scale Chinese text. The project follows the main RoBERTa training ideas, including removing next sentence prediction, using more diverse data, training longer, increasing batch size, and tuning optimization settings. Its training data includes news, community discussion, encyclopedia content, and other broad Chinese text sources. The repository also describes whole word masking for Chinese and provides examples for loading and fine-tuning models on sentence-pair matching tasks. Overall, it is a useful pretrained model resource for developers who want stronger Chinese BERT-style representations for classification, matching, reading comprehension, and related NLP tasks.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 14
    TimeSformer

    TimeSformer

    The official pytorch implementation of our paper

    TimeSformer is a vision transformer architecture for video that extends the standard attention mechanism into spatiotemporal attention. The model alternates attention along spatial and temporal dimensions (or designs variants like divided attention) so that it can capture both appearance and motion cues in video. Because the attention is global across frames, TimeSformer can reason about dependencies across long time spans, not just local neighborhoods. The official implementation in PyTorch...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    Llama-3.2-1B-Instruct

    Llama-3.2-1B-Instruct

    Instruction-tuned 1.2B LLM for multilingual text generation by Meta

    ...The model supports eight primary languages (including English, Spanish, Hindi, and Thai) and was trained on a curated mix of publicly available online data, with a December 2023 knowledge cutoff. Llama-3.2-1B is lightweight enough for deployment on constrained devices like smartphones, using formats like SpinQuant and QLoRA to reduce model size and latency. Despite its small size, it performs competitively across benchmarks such as MMLU, ARC, and TLDR summarization. The model is distributed under the Llama 3.2 Community License, requiring attribution and adherence to Meta’s Acceptable Use Policy.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    GLM-4.5-Air

    GLM-4.5-Air

    Compact hybrid reasoning language model for intelligent responses

    ...Open-sourced under the MIT license, it is commercially usable and integrates with transformers, vLLM, and SGLang inference frameworks. It includes FP8 variants for faster inference and reduced memory requirements. Despite its smaller size compared to full GLM-4.5, GLM-4.5-Air maintains high performance.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    t5-small

    t5-small

    T5-Small: Lightweight text-to-text transformer for NLP tasks

    ...With only 60 million parameters, T5-Small is compact and suitable for fast inference or deployment in constrained environments. It was pretrained on the C4 dataset using both unsupervised denoising and supervised learning on tasks like sentiment analysis, NLI, and QA. Despite its size, it performs competitively across 24 NLP benchmarks, making it a strong candidate for prototyping and fine-tuning. T5-Small is compatible with major deep learning frameworks including PyTorch, TensorFlow, JAX, and ONNX. The model is open-source under the Apache 2.0 license and has wide support across Hugging Face's ecosystem.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    Llama-3.2-1B

    Llama-3.2-1B

    Llama 3.2–1B: Multilingual, instruction-tuned model for mobile AI

    ...The model supports eight officially listed languages (including Spanish, German, Hindi, and Thai) but can be adapted to more. Llama 3.2-1B outperforms other open models in several benchmarks relative to its size and offers quantized versions for efficiency. It uses a refined transformer architecture with Grouped-Query Attention (GQA) and supports long context windows of up to 128k tokens.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    bart-large-cnn

    bart-large-cnn

    Summarization model fine-tuned on CNN/DailyMail articles

    ...Its architecture allows it to model both language understanding and generation tasks effectively. The model supports usage in PyTorch, TensorFlow, and JAX, and is integrated with the Hugging Face pipeline API for simple deployment. Due to its size and performance, it's widely used in real-world summarization applications such as news aggregation, legal document condensing, and content creation.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    bge-base-en-v1.5

    bge-base-en-v1.5

    Efficient English embedding model for semantic search and retrieval

    bge-base-en-v1.5 is an English sentence embedding model from BAAI optimized for dense retrieval tasks, part of the BGE (BAAI General Embedding) family. It is a fine-tuned BERT-based model designed to produce high-quality, semantically meaningful embeddings for tasks like semantic similarity, information retrieval, classification, and clustering. This version (v1.5) improves retrieval performance and stabilizes similarity score distribution without requiring instruction-based prompts. With...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    DiffusionGemma

    DiffusionGemma

    NVFP4 DiffusionGemma model for fast multimodal text generation

    ...The model supports a 256K-token context window, configurable thinking mode, native function calling, structured JSON output, and multilingual inference across 35+ languages. The NVFP4 quantization reduces weights and activations from 16-bit to 4-bit, lowering disk size and GPU memory needs for vLLM deployment.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    Gemma 4 12B

    Gemma 4 12B

    Unified multimodal Gemma model for local coding and reasoning

    ...The model has 11.95B parameters, 48 layers, a 256K-token context window, and support for over 140 languages. It also includes configurable thinking modes, native system prompt support, function calling, and strong benchmark performance for its size. It is optimized for consumer GPUs, workstations, and streamlined local deployment.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    GigaChat 3 Ultra

    GigaChat 3 Ultra

    High-performance MoE model with MLA, MTP, and multilingual reasoning

    GigaChat 3 Ultra is a flagship instruct-model built on a custom Mixture-of-Experts architecture with 702B total and 36B active parameters. It leverages Multi-head Latent Attention to compress the KV cache into latent vectors, dramatically reducing memory demand and improving inference speed at scale. The model also employs Multi-Token Prediction, enabling multi-step token generation in a single pass for up to 40% faster output through speculative and parallel decoding techniques. Its...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    Jan-v1-edge

    Jan-v1-edge

    Jan-v1-edge: efficient 1.7B reasoning model optimized for edge devices

    Jan-v1-edge is a lightweight agentic language model developed by JanHQ, designed for fast and reliable on-device execution. It is the second release in the Jan Family and was distilled from the larger Jan-v1 model, retaining strong reasoning and problem-solving capabilities while reducing its computational footprint. The model was refined through a two-stage post-training process: Supervised Fine-Tuning (SFT) to transfer knowledge from Jan-v1, followed by Reinforcement Learning with...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    Qwen-Image-Edit

    Qwen-Image-Edit

    An advanced bilingual image editing with semantic control

    ...The model excels at semantic edits like style transfer, object rotation, and novel view synthesis, while also handling precise appearance edits such as adding or removing elements without altering surrounding regions. A standout feature is its bilingual text editing in English and Chinese, which preserves original font, size, and style during modifications. Benchmarks confirm its state-of-the-art performance in image editing, establishing it as a reliable foundation for both artistic and practical tasks. Its applications span IP creation, meme generation, background changes, clothing edits, and fine corrections in artworks or calligraphy.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • Next
Auth0 Logo