• Gemini 3 and 200+ AI Models on One Platform Icon
    Gemini 3 and 200+ AI Models on One Platform

    Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

    Build generative AI apps with Vertex AI. Switch between models without switching platforms.
    Start Free
  • Try Google Cloud Risk-Free With $300 in Credit Icon
    Try Google Cloud Risk-Free With $300 in Credit

    No hidden charges. No surprise bills. Cancel anytime.

    Use your credit across every product. Compute, storage, AI, analytics. When it runs out, 20+ products stay free. You only pay when you choose to.
    Start Free
  • 1
    ChatGLM-6B

    ChatGLM-6B

    ChatGLM-6B: An Open Bilingual Dialogue Language Model

    ...It is optimized for dialogue and question answering with a balance between performance and deployability in consumer hardware settings. Support for quantized inference (INT4, INT8) to reduce GPU memory requirements. Automatic mode switching between precision/memory tradeoffs (full/quantized).
    Downloads: 7 This Week
    Last Update:
    See Project
  • 2
    Tencent-Hunyuan-Large

    Tencent-Hunyuan-Large

    Open-source large language model family from Tencent Hunyuan

    ...It is designed with long-context capabilities, quantization support, and high performance on benchmarks across general reasoning, mathematics, language understanding, and Chinese / multilingual tasks. It aims to provide competitive capability with efficient deployment and inference. FP8 quantization support to reduce memory usage (~50%) while maintaining precision. High benchmarking performance on tasks like MMLU, MATH, CMMLU, C-Eval, etc.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 3
    Paper2Slides

    Paper2Slides

    From Paper to Presentation in One Click

    ...It uses an extraction approach intended to capture critical insights comprehensively, including important visuals and data points that often get missed in naive summarization. A major focus is traceability: generated slide content is designed to remain linked back to the source material so you can verify accuracy and reduce information drift. It also offers styling flexibility, letting you use built-in themes or describe a custom design direction in natural language for themed outputs.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 4
    LLaMA-Factory

    LLaMA-Factory

    Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

    LLaMA-Factory is a fine-tuning and training framework for Meta's LLaMA language models. It enables researchers and developers to train and customize LLaMA models efficiently using advanced optimization techniques.
    Downloads: 4 This Week
    Last Update:
    See Project
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 5
    Qwen3-Omni

    Qwen3-Omni

    Qwen3-omni is a natively end-to-end, omni-modal LLM

    ...It achieves state-of-the-art results: across 36 audio and audio-visual benchmarks, it hits open-source SOTA on 32 and overall SOTA on 22, outperforming or matching strong closed-source models such as Gemini-2.5 Pro and GPT-4o. To reduce latency, especially in audio/video streaming, Talker predicts discrete speech codecs via a multi-codebook scheme and replaces heavier diffusion approaches.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 6
    MatMul-Free LM

    MatMul-Free LM

    Implementation for MatMul-free LM

    ...Since matrix multiplication is one of the most computationally expensive components of modern language models, the project explores alternative computational strategies that reduce hardware requirements while maintaining comparable performance. The architecture relies on quantization-aware training and lightweight operations to replace conventional dense matrix multiplications with more efficient alternatives. These optimizations can significantly reduce memory consumption and potentially improve computational efficiency during both training and inference. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    Qwen

    Qwen

    The official repo of Qwen chat & pretrained large language model

    Qwen is a series of large language models developed by Alibaba Cloud, consisting of various pretrained versions like Qwen-1.8B, Qwen-7B, Qwen-14B, and Qwen-72B. These models, which range from smaller to larger configurations, are designed for a wide range of natural language processing tasks. They are openly available for research and commercial use, with Qwen's code and model weights shared on GitHub. Qwen's capabilities include text generation, comprehension, and conversation, making it a...
    Downloads: 20 This Week
    Last Update:
    See Project
  • 8
    WhisperJAV

    WhisperJAV

    Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD

    WhisperJAV is an open-source speech transcription pipeline designed specifically for generating subtitles for Japanese adult video content. The project addresses challenges that standard speech recognition models face when transcribing this type of audio, which often includes low signal-to-noise ratios and large numbers of non-verbal vocalizations. Traditional automatic speech recognition systems can misinterpret these sounds as words, leading to inaccurate transcripts. WhisperJAV introduces...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 9
    LLM Council

    LLM Council

    LLM Council works together to answer your hardest questions

    ...After this peer-review process, a designated “Chairman” model synthesizes a final consolidated answer drawing on the strengths and insights of all participants. The interface looks like a familiar chat app but under the hood it implements this ensemble and consensus workflow to reduce bias and leverage diverse reasoning styles.
    Downloads: 1 This Week
    Last Update:
    See Project
  • Full-stack observability with actually useful AI | Grafana Cloud Icon
    Full-stack observability with actually useful AI | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • 10
    CogVLM

    CogVLM

    A state-of-the-art open visual language model

    ...The flagship CogVLM-17B combines ~10B visual parameters with ~7B language parameters and supports 490×490 inputs; CogAgent-18B extends this to 1120×1120 and adds plan/next-action outputs plus grounded operation coordinates for GUI tasks. The repo provides multiple ways to run models (CLI, web demo, and OpenAI-Vision–style APIs), along with quantization options that reduce VRAM needs (e.g., 4-bit). It includes checkpoints for chat, base, and grounding variants, plus recipes for model-parallel inference and LoRA fine-tuning. The documentation covers task prompts for general dialogue, visual grounding (box→caption, caption→box, caption+boxes), and GUI agent workflows that produce structured actions with bounding boxes.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 11
    Cybersecurity AI

    Cybersecurity AI

    Cybersecurity AI (CAI), the framework for AI Security

    ...Rather than being a single-purpose tool, CAI is positioned as a framework that supports building multiple security automations and integrating them into existing processes. It is designed for real-world usability, aiming to reduce friction for teams experimenting with AI agents in security operations, assessment, and response contexts. The framework emphasizes extensibility so users can connect models, tools, and supporting components depending on their environment and constraints.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 12
    BruteForceAI

    BruteForceAI

    Advanced LLM-powered brute-force tool combining AI intelligence

    BruteForceAI is an open-source security testing tool that applies large language models to the analysis of login forms and authentication flows in web applications. At a high level, the project uses AI to inspect HTML content, identify the relevant form elements, and automate selector discovery so that a tester does not need to hand-map every field before evaluation. It combines that analysis layer with automated credential testing workflows, framing itself as a more adaptive alternative to...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 13
    Chitu

    Chitu

    High-performance inference framework for large language models

    ...Chitu is designed to scale from small single-machine deployments to large distributed clusters that handle high volumes of concurrent inference requests. The system also includes performance optimizations for large models, including support for quantized formats and efficient computation operators that reduce memory usage and latency. Its architecture aims to support enterprise adoption by ensuring stable long-term operation under production workloads.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 14
    MetaScreener

    MetaScreener

    AI-powered tool for efficient abstract and PDF screening

    ...The platform can analyze both abstracts and full PDF documents, enabling automated filtering based on research criteria defined by the user. By incorporating natural language processing techniques, the system can identify potentially relevant studies and reduce the workload associated with manual screening.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    SageAttention

    SageAttention

    NeurIPS2025 Spotlight] Quantized Attention

    ...Since attention operations are often the most computationally expensive component of modern AI models, SageAttention introduces quantization techniques that significantly reduce computational overhead while preserving model accuracy. The system achieves this by using low-precision numerical formats such as INT4, FP8, or INT8 to represent key matrices within the attention computation. These optimizations allow models to perform matrix operations faster and consume less memory during inference. SageAttention is designed to function as a plug-and-play replacement for standard attention implementations, enabling developers to accelerate existing models without modifying their architecture.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    Qwen2.5-Omni

    Qwen2.5-Omni

    Capable of understanding text, audio, vision, video

    Qwen2.5-Omni is an end-to-end multimodal flagship model in the Qwen series by Alibaba Cloud, designed to process multiple modalities (text, images, audio, video) and generate responses both as text and natural speech in streaming real-time. It supports “Thinker-Talker” architecture, and introduces innovations for aligning modalities over time (for example synchronizing video/audio), robust speech generation, and low-VRAM/quantized versions to make usage more accessible. It holds...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 17
    KVCache-Factory

    KVCache-Factory

    Unified KV Cache Compression Methods for Auto-Regressive Models

    ...In large language models, the key-value cache stores intermediate attention states that enable efficient token generation during inference, but these caches can consume large amounts of GPU memory when handling long contexts. KVCache-Factory provides a platform for implementing and evaluating multiple compression strategies that reduce memory usage while preserving model performance. The framework integrates several state-of-the-art methods such as PyramidKV, SnapKV, H2O, and StreamingLLM, allowing researchers to compare and experiment with different approaches within the same environment. It also supports advanced inference configurations such as Flash Attention v2 and multi-GPU inference setups for very large models.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    Claude Code Bridge

    Claude Code Bridge

    Real-time multi-AI collaboration: Claude, Codex & Gemini

    Claude Code Bridge is an open-source command-line tool designed to enable real-time collaboration between multiple AI coding assistants within a unified development environment. The system allows developers to coordinate interactions between models such as Claude, Codex, and Gemini so that they can work together on programming tasks. By maintaining persistent shared context between these models, the tool reduces redundant prompts and minimizes token usage while allowing each AI system to...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    CAG

    CAG

    Cache-Augmented Generation: A Simple, Efficient Alternative to RAG

    ...This strategy allows the model to generate responses using the cached context directly, eliminating the need for repeated retrieval operations during runtime. As a result, the approach can significantly reduce latency and simplify system architecture compared with traditional RAG pipelines. The framework is particularly effective when the knowledge base is limited enough to fit within the extended context window of modern language models.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    Magicoder

    Magicoder

    Empowering Code Generation with OSS-Instruct

    ...This technique uses open-source code repositories as a foundation for generating more realistic and diverse instruction datasets for training language models. By grounding training data in real open-source examples, Magicoder aims to reduce bias and improve the reliability of code generation results compared to models trained solely on synthetic instructions. The project includes model implementations, training resources, and evaluation benchmarks that demonstrate how the approach improves instruction-following and code synthesis capabilities. Magicoder models are intended for tasks such as programming assistance, code explanation, automated debugging, and software documentation generation.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    Torch Pruning

    Torch Pruning

    DepGraph: Towards Any Structural Pruning

    Torch-Pruning is an open-source toolkit designed to optimize deep neural networks by performing structural pruning directly within PyTorch models. The library focuses on reducing the size and computational cost of neural networks by removing redundant parameters and channels while maintaining model performance. It introduces a graph-based algorithm called DepGraph that automatically identifies dependencies between layers, allowing parameters to be pruned safely across complex architectures....
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    LightLLM

    LightLLM

    LightLLM is a Python-based LLM (Large Language Model) inference

    ...Built primarily in Python, the project integrates optimization techniques and ideas from several leading open-source implementations, including FasterTransformer, vLLM, and FlashAttention, to accelerate token generation and reduce latency. LightLLM is designed to handle large-scale model workloads in production environments, supporting efficient batching and GPU utilization for fast inference across multiple requests. Its architecture allows models to be deployed with minimal overhead while maintaining compatibility with popular transformer-based model families such as LLaMA and GPT-style architectures.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    Xtuner

    Xtuner

    A Next-Generation Training Engine Built for Ultra-Large MoE Models

    ...The framework focuses on enabling scalable training for extremely large models while maintaining efficiency across distributed computing environments. Unlike traditional 3D parallel training strategies, XTuner introduces optimized parallelism techniques that simplify scaling and reduce system complexity when training massive models. The engine supports training models with hundreds of billions of parameters and enables long-context training with sequence lengths reaching tens of thousands of tokens. Its architecture incorporates memory-efficient optimizations that allow researchers to train large models even when computational resources are limited. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    bitsandbytes

    bitsandbytes

    Accessible large language models via k-bit quantization for PyTorch

    ...These optimizations enable large language models and other deep learning architectures to run on hardware with limited memory resources, including consumer-grade GPUs. The project includes specialized optimizers and quantized matrix operations that significantly reduce the memory footprint of training and inference workloads. By lowering the hardware requirements needed to work with large models, bitsandbytes helps make modern AI development more accessible to researchers and engineers. The library has become widely used in machine learning pipelines that rely on parameter-efficient training techniques and low-precision inference.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    Purple Llama

    Purple Llama

    Set of tools to assess and improve LLM security

    ...CyberSecEval, one of its flagship components, provides repeatable evaluations for security risk, including agent-oriented tasks such as automated patching benchmarks. The aim is to make safety practical: ship testable baselines, publish metrics, and provide drop-in implementations that reduce friction for teams adopting Llama. Documentation and sites attached to the repo walk through setup, usage, and the rationale behind each safeguard, encouraging community contributions.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • Next
MongoDB Logo MongoDB