• MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • Streamline Azure Security with Palo Alto Networks VM-Series Icon
    Streamline Azure Security with Palo Alto Networks VM-Series

    Centrally manage physical and virtualized firewalls with Panorama

    Improve your security posture and reduce incident response time. Use the VM-Series to natively analyze Azure traffic and dynamically drive policy updates based on workload changes.
    Learn more
  • 1
    TokenSpeed

    TokenSpeed

    TokenSpeed is a speed-of-light LLM inference engine

    TokenSpeed is an LLM inference engine designed for high-performance production agent workloads. It aims to combine TensorRT-LLM-level speed with vLLM-level usability, making it relevant for teams that need fast generation without sacrificing developer ergonomics. The project is focused on the specific needs of agentic systems, where latency, throughput, and efficient scheduling matter across many short or tool-heavy requests. It builds on ideas and components from the broader open-source inference ecosystem while presenting its own execution stack. ...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 2
    ChatGLM3

    ChatGLM3

    ChatGLM3 series: Open Bilingual Chat LLMs | Open Source Bilingual Chat

    ...The repo ships Python APIs, CLI and web demos (Gradio/Streamlit), an OpenAI-format API server, and a compact fine-tuning kit. Quantization (4/8-bit), CPU/MPS support, and accelerator backends (TensorRT-LLM, OpenVINO, chatglm.cpp) enable lightweight local or edge deployment.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    Infinity

    Infinity

    Low-latency REST API for serving text-embeddings

    Infinity is a high-throughput, low-latency REST API for serving vector embeddings, supporting all sentence-transformer models and frameworks. Infinity is developed under MIT License. Infinity powers inference behind Gradient.ai and other Embedding API providers.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next