27 projects for "size" with 2 filters applied:

  • 1
    Z-Image

    Image generation model with a single-stream diffusion transformer

    ...The project includes several variants: Z-Image-Turbo, a distilled version optimized for speed and low resource consumption; Z-Image-Base, the full-capacity foundation model; and Z-Image-Edit, fine-tuned for image editing tasks. Despite its compact size, Z-Image produces outputs that closely rival those from much larger models — including strong rendering of bilingual (English and Chinese) text inside images, accurate prompt adherence, and good layout and composition.
    Downloads: 30 This Week
  • 2
    FlashMLA

    FlashMLA: Efficient Multi-head Latent Attention Kernels

    ...It provides optimized kernels for MLA decoding, including support for variable-length sequences, helping reduce latency and increase throughput in inference systems that use this attention style. The library supports both BF16 and FP16 data types and includes a paged KV cache implementation with a block size of 64 to manage memory efficiently during decoding. In heavily compute-bound settings, it can reach up to ~660 TFLOPS on H800 SXM5 hardware, while in memory-bound configurations it can push memory throughput to ~3000 GB/s. The team regularly updates it with performance improvements; for example, a 2025 update claims 5% to 15% gains on compute-bound workloads while maintaining API compatibility.
    Downloads: 0 This Week
  • 3
    GLM-4.1V

    GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning

    GLM-4.1V, often described as the smaller, lighter member of the GLM-V family, offers a more resource-efficient option for users who want multimodal capabilities without large compute resources. Though smaller in scale, it remains competitive for its size: on a number of multimodal reasoning and vision-language tasks it outperforms some much larger models from other families. It represents a trade-off: somewhat reduced capacity compared to 4.5V or 4.6V, but with gains in speed, deployability, and lower hardware requirements. That makes it especially useful for developers experimenting locally, building lightweight agents, or deploying on limited infrastructure. ...
    Downloads: 0 This Week
  • 4
    GLM-OCR

    Accurate × Fast × Comprehensive

    ...Designed to handle text recognition, table parsing, formula extraction, and general information retrieval from documents containing mixed content, GLM-OCR excels across major benchmarks while remaining highly efficient with a relatively compact parameter count (~0.9B), enabling deployment in high-concurrency services and edge environments. The model’s multimodal capabilities allow it to reason across image and text content holistically, capturing structured and unstructured information from pages that include dense tables, seals, code snippets, and varied document graphics. GLM-OCR ships with a comprehensive SDK and inference toolchain that makes it easy for developers to install it, invoke it, and embed it into production pipelines with simple commands or APIs.
    Downloads: 1 This Week
  • 5
    MiniMind-O

    A 0.1B Omni model trained from scratch

    ...It extends the MiniMind family by exploring a model that can handle text, audio, and image inputs while producing text and streaming speech outputs. The project is designed to make multimodal AI training more accessible by keeping the model size small enough for ordinary personal hardware. It includes both mini and full training data paths, allowing learners to run a complete workflow quickly or reproduce the released model setup more closely. The implementation emphasizes native PyTorch code instead of relying on high-level third-party abstractions. MiniMind-O is most useful for developers and researchers who want to understand how multimodal and speech-capable AI systems are built from the ground up.
    Downloads: 0 This Week
  • 6
    Step3-VL-10B

    Multimodal model achieving SOTA performance

    Step3-VL-10B is an open-source multimodal foundation model developed by StepFun AI that pushes the boundaries of what compact models can achieve by combining visual and language understanding in a single architecture. Despite having only about 10 billion parameters, it delivers performance that rivals or even surpasses models 10× to 20× larger on a wide range of multimodal benchmarks covering reasoning, perception, and complex tasks, positioning it as one of the most powerful...
    Downloads: 0 This Week
  • 7
    GLM-4.5V

    GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning

    GLM-4.5V is the preceding iteration in the GLM-V series that laid much of the groundwork for general multimodal reasoning and vision-language understanding. It embodies the design philosophy of mixing visual and textual modalities into a unified model capable of general-purpose reasoning, content understanding, and generation, while already supporting a wide variety of tasks: from image captioning and visual question answering to content recognition, GUI-based agents, video understanding,...
    Downloads: 1 This Week
  • 8
    Grok-1

    Open-source, high-performance Mixture-of-Experts large language model

    ...In March 2024, xAI released Grok-1's model weights and architecture under the Apache 2.0 license, making them openly accessible to developers. The accompanying GitHub repository provides JAX example code for loading and running the model. Due to its substantial size, utilizing Grok-1 requires a machine with significant GPU memory. The repository's MoE layer implementation prioritizes correctness over efficiency, avoiding the need for custom kernels. This is a full repo snapshot ZIP file of the Grok-1 code.
    Downloads: 15 This Week
  • 9
    RoBERTa for Chinese

    RoBERTa Chinese pre-training model: RoBERTa for Chinese

    ...It provides TensorFlow and PyTorch-compatible model releases trained on large-scale Chinese text. The project follows the main RoBERTa training ideas, including removing next sentence prediction, using more diverse data, training longer, increasing batch size, and tuning optimization settings. Its training data includes news, community discussion, encyclopedia content, and other broad Chinese text sources. The repository also describes whole word masking for Chinese and provides examples for loading and fine-tuning models on sentence-pair matching tasks. Overall, it is a useful pretrained model resource for developers who want stronger Chinese BERT-style representations for classification, matching, reading comprehension, and related NLP tasks.
    Downloads: 4 This Week
  • 10
    TimeSformer

    The official PyTorch implementation of our paper

    TimeSformer is a vision transformer architecture for video that extends the standard attention mechanism into spatiotemporal attention. The model alternates attention along spatial and temporal dimensions (or uses variants such as divided attention) so that it can capture both appearance and motion cues in video. Because the attention is global across frames, TimeSformer can reason about dependencies across long time spans, not just local neighborhoods. The official implementation in PyTorch...
    Downloads: 0 This Week
  • 11
    Llama-3.2-1B-Instruct

    Instruction-tuned 1.2B LLM for multilingual text generation by Meta

    ...The model supports eight primary languages (including English, Spanish, Hindi, and Thai) and was trained on a curated mix of publicly available online data, with a December 2023 knowledge cutoff. Llama-3.2-1B is lightweight enough for deployment on constrained devices like smartphones, using quantization techniques such as SpinQuant and QLoRA to reduce model size and latency. Despite its small size, it performs competitively across benchmarks such as MMLU, ARC, and TLDR summarization. The model is distributed under the Llama 3.2 Community License, requiring attribution and adherence to Meta’s Acceptable Use Policy.
    Downloads: 0 This Week
  • 12
    GLM-4.5-Air

    Compact hybrid reasoning language model for intelligent responses

    ...Open-sourced under the MIT license, it is commercially usable and integrates with transformers, vLLM, and SGLang inference frameworks. It includes FP8 variants for faster inference and reduced memory requirements. Despite its smaller size compared to full GLM-4.5, GLM-4.5-Air maintains high performance.
    Downloads: 0 This Week
  • 13
    t5-small

    T5-Small: Lightweight text-to-text transformer for NLP tasks

    ...With only 60 million parameters, T5-Small is compact and suitable for fast inference or deployment in constrained environments. It was pretrained on the C4 dataset using both unsupervised denoising and supervised learning on tasks like sentiment analysis, NLI, and QA. Despite its small size, it performs competitively across 24 NLP benchmarks, making it a strong candidate for prototyping and fine-tuning. T5-Small is compatible with major deep learning frameworks including PyTorch, TensorFlow, JAX, and ONNX. The model is open-source under the Apache 2.0 license and has wide support across Hugging Face's ecosystem.
    Downloads: 0 This Week
  • 14
    Llama-3.2-1B

    Llama 3.2-1B: Multilingual, instruction-tuned model for mobile AI

    ...The model supports eight officially listed languages (including Spanish, German, Hindi, and Thai) but can be adapted to more. Llama 3.2-1B outperforms other open models in several benchmarks relative to its size and offers quantized versions for efficiency. It uses a refined transformer architecture with Grouped-Query Attention (GQA) and supports long context windows of up to 128k tokens.
    Downloads: 0 This Week
  • 15
    bart-large-cnn

    Summarization model fine-tuned on CNN/DailyMail articles

    ...Its architecture allows it to model both language understanding and generation tasks effectively. The model supports usage in PyTorch, TensorFlow, and JAX, and is integrated with the Hugging Face pipeline API for simple deployment. Due to its size and performance, it's widely used in real-world summarization applications such as news aggregation, legal document condensing, and content creation.
    Downloads: 0 This Week
  • 16
    bge-base-en-v1.5

    Efficient English embedding model for semantic search and retrieval

    bge-base-en-v1.5 is an English sentence embedding model from BAAI optimized for dense retrieval tasks, part of the BGE (BAAI General Embedding) family. It is a fine-tuned BERT-based model designed to produce high-quality, semantically meaningful embeddings for tasks like semantic similarity, information retrieval, classification, and clustering. This version (v1.5) improves retrieval performance and stabilizes similarity score distribution without requiring instruction-based prompts. With...
    Downloads: 0 This Week
  • 17
    DiffusionGemma

    NVFP4 DiffusionGemma model for fast multimodal text generation

    ...The model supports a 256K-token context window, configurable thinking mode, native function calling, structured JSON output, and multilingual inference across 35+ languages. The NVFP4 quantization reduces weights and activations from 16-bit to 4-bit, lowering disk size and GPU memory needs for vLLM deployment.
    Downloads: 0 This Week
  • 18
    Gemma 4 12B

    Unified multimodal Gemma model for local coding and reasoning

    ...The model has 11.95B parameters, 48 layers, a 256K-token context window, and support for over 140 languages. It also includes configurable thinking modes, native system prompt support, function calling, and strong benchmark performance for its size. It is optimized for consumer GPUs, workstations, and streamlined local deployment.
    Downloads: 0 This Week
  • 19
    GigaChat 3 Ultra

    High-performance MoE model with MLA, MTP, and multilingual reasoning

    GigaChat 3 Ultra is a flagship instruct model built on a custom Mixture-of-Experts architecture with 702B total and 36B active parameters. It leverages Multi-head Latent Attention to compress the KV cache into latent vectors, dramatically reducing memory demand and improving inference speed at scale. The model also employs Multi-Token Prediction, enabling multi-step token generation in a single pass for up to 40% faster output through speculative and parallel decoding techniques. Its...
    Downloads: 0 This Week
  • 20
    Jan-v1-edge

    Jan-v1-edge: efficient 1.7B reasoning model optimized for edge devices

    Jan-v1-edge is a lightweight agentic language model developed by JanHQ, designed for fast and reliable on-device execution. It is the second release in the Jan Family and was distilled from the larger Jan-v1 model, retaining strong reasoning and problem-solving capabilities while reducing its computational footprint. The model was refined through a two-stage post-training process: Supervised Fine-Tuning (SFT) to transfer knowledge from Jan-v1, followed by Reinforcement Learning with...
    Downloads: 0 This Week
  • 21
    Qwen-Image-Edit

    An advanced bilingual image editing model with semantic control

    ...The model excels at semantic edits like style transfer, object rotation, and novel view synthesis, while also handling precise appearance edits such as adding or removing elements without altering surrounding regions. A standout feature is its bilingual text editing in English and Chinese, which preserves original font, size, and style during modifications. Benchmarks confirm its state-of-the-art performance in image editing, establishing it as a reliable foundation for both artistic and practical tasks. Its applications span IP creation, meme generation, background changes, clothing edits, and fine corrections in artworks or calligraphy.
    Downloads: 0 This Week
  • 22
    Bio_ClinicalBERT

    ClinicalBERT model trained on MIMIC notes for clinical NLP tasks

    ...The training focused on improving performance in tasks like named entity recognition and natural language inference within the healthcare domain. Notes were processed using rule-based sectioning and tokenized with SciSpacy. Training was done for 150,000 steps using a batch size of 32, max sequence length of 128, and a masked language modeling objective with a 0.15 mask probability. Bio_ClinicalBERT is available through Hugging Face's Transformers library for easy integration. It supports medical AI research and applications involving electronic health record understanding, clinical decision support, and biomedical information extraction.
    Downloads: 0 This Week
  • 23
    Mistral Large 3 675B Base 2512

    Frontier-scale 675B multimodal base model for custom AI training

    Mistral Large 3 675B Base 2512 is the foundational, pre-trained version of the Mistral Large 3 family, built as a frontier-scale multimodal Mixture-of-Experts model with 41B active parameters and a total size of 675B. It was trained from scratch on 3000 H200 GPUs, making it one of the most advanced and compute-intensive open-weight models available. As the base version, it is not fine-tuned for instruction following or reasoning, making it ideal for teams planning their own domain-specific fine-tuning or custom training pipelines. The model is engineered for reliability, long-context comprehension, and stable performance across many enterprise, scientific, and knowledge-intensive workloads. ...
    Downloads: 0 This Week
  • 24
    Ministral 3 3B Base 2512

    Small 3B-base multimodal model ideal for custom AI on edge hardware

    ...It supports dozens of languages, making it practical for multilingual, global, or distributed environments. With a large 256k token context window, it can handle long documents, extended inputs, or multi-step processing workflows even at its small size.
    Downloads: 0 This Week
  • 25
    Ministral 3 3B Instruct 2512

    Ultra-efficient 3B multimodal instruct model built for edge deployment

    ...As an FP8 instruct-fine-tuned model, it is optimized for chat, instruction following, and compact agentic tasks while maintaining strong adherence to system prompts. Despite its small size, it delivers efficient real-time performance and can run locally on a single 8GB GPU, with further memory reductions through quantization. It supports dozens of languages across major global regions, making it well-suited for multilingual and embedded applications. The model also provides function calling, clean JSON output, and stable tool-use behavior, enabling it to serve as a small but effective agentic system.
    Downloads: 0 This Week
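
Several listings above pair a parameter count with a numeric precision (16-bit, FP8, 4-bit NVFP4) when describing memory needs. As a rough sketch of how those two numbers relate, the snippet below estimates the storage needed for the weights alone. The helper name `weight_memory_gb` is hypothetical, the parameter counts are the ones quoted in the listings, and the estimate deliberately ignores activations, the KV cache, and the scaling metadata that real quantization formats carry, so actual footprints will be higher.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate storage for model weights alone, in decimal gigabytes.

    Assumption: every parameter is stored at the same bit width, with no
    per-block scales or other quantization overhead.
    """
    return num_params * bits_per_weight / 8 / 1e9


# Parameter counts as quoted in the project descriptions above.
models = {
    "t5-small": 60e6,
    "GLM-OCR": 0.9e9,
    "Gemma 4 12B": 11.95e9,
    "Mistral Large 3 675B (total)": 675e9,
}

for name, params in models.items():
    row = ", ".join(
        f"{bits}-bit: {weight_memory_gb(params, bits):.2f} GB"
        for bits in (16, 8, 4)
    )
    print(f"{name}: {row}")
```

Under these assumptions, moving from 16-bit to 4-bit weights cuts weight storage by a factor of four, which is the kind of reduction the quantized listings refer to.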