  • 1
    GLM-V

    GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning

    GLM-V is an open-source vision-language model (VLM) series from ZhipuAI that extends the GLM foundation models into multimodal reasoning and perception. The repository provides both GLM-4.5V and GLM-4.1V models, designed to advance beyond basic perception toward higher-level reasoning, long-context understanding, and agent-based applications.
    Downloads: 7 This Week
  • 2
    llama.cpp

    LLM inference in C/C++

    ...It provides command-line tools, a server mode with an OpenAI-compatible API, model conversion utilities, and extensive backend acceleration options. llama.cpp runs on CPUs and GPUs, with support for Apple silicon, x86, RISC-V, CUDA, HIP, Vulkan, SYCL, Metal, and hybrid CPU-GPU execution. Its main value is making practical LLM inference accessible across consumer machines, servers, and specialized deployment environments.
    Downloads: 29 This Week
  • 3
    ChatGLM.cpp

    C++ implementation of ChatGLM-6B & ChatGLM2-6B & ChatGLM3 & GLM4(V)

    ChatGLM.cpp is a C++ implementation of the ChatGLM model family (ChatGLM-6B, ChatGLM2-6B, ChatGLM3, and GLM4(V)), enabling efficient local inference without requiring a Python environment. It is optimized for running on consumer hardware.
    Downloads: 0 This Week
  • 4
    tt-metal

    TT-NN operator library and TT-Metalium low-level kernel programming

    ...Instead of following a traditional GPU model centered on massive thread parallelism, the platform is built around a grid of specialized compute nodes called Tensix cores, each with local SRAM, dedicated compute units, and multiple RISC-V control processors. The SDK provides the abstractions and APIs needed to manage data movement, compute kernels, memory coordination, and execution flow across this architecture.
    Downloads: 3 This Week
  • 5
    Learn Prompting

    This website is a free, open-source guide on prompt engineering

    Contributions are welcome, and so is harsh criticism. We launched the first-ever prompt hacking competition, which ran from May 5th to June 3rd and was designed to enhance AI safety and education by challenging participants to outsmart large language models. The competition featured 10 increasingly difficult levels of prompt hacking defenses and the chance to win over $35,000 in prizes. Coding is a great skill to learn alongside prompt...
    Downloads: 0 This Week
  • 6
    DeepSeek-V4-Flash

    Efficient MoE model for million-token reasoning and coding

    ...The model uses a hybrid attention architecture that combines Compressed Sparse Attention and Heavily Compressed Attention to improve long-context efficiency, while Manifold-Constrained Hyper-Connections strengthen signal stability across layers. It is trained on more than 32T tokens and refined through a post-training pipeline that includes supervised fine-tuning, reinforcement learning, domain-specific expert cultivation, and on-policy distillation. DeepSeek-V4-Flash supports non-think, think, and think-max reasoning modes, allowing users to balance speed and depth. ...
    Downloads: 0 This Week