Showing 28 open source projects for "tensorrt"

View related business solutions
  • Forever Free Full-Stack Observability | Grafana Cloud Icon
    Forever Free Full-Stack Observability | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • AI-powered service management for IT and enterprise teams Icon
    AI-powered service management for IT and enterprise teams

    Enterprise-grade ITSM, for every business

    Give your IT, operations, and business teams the ability to deliver exceptional services—without the complexity. Maximize operational efficiency with refreshingly simple, AI-powered Freshservice.
    Try it Free
  • 1
    Downloads: 2 This Week
    Last Update:
    See Project
  • 2
    Hunyuan-A13B-Instruct

    Hunyuan-A13B-Instruct

    Efficient 13B MoE language model with long context and reasoning modes

    ...It excels in mathematics, science, coding, and multi-turn conversation tasks, rivaling or outperforming larger models in several areas. Deployment is supported via TensorRT-LLM, vLLM, and SGLang, with Docker images and integration guides provided. Open-source under a custom license, it's ideal for researchers and developers seeking scalable, high-context AI capabilities with optimized inference.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    GigaChat 3 Ultra

    GigaChat 3 Ultra

    High-performance MoE model with MLA, MTP, and multilingual reasoning

    GigaChat 3 Ultra is a flagship instruct-model built on a custom Mixture-of-Experts architecture with 702B total and 36B active parameters. It leverages Multi-head Latent Attention to compress the KV cache into latent vectors, dramatically reducing memory demand and improving inference speed at scale. The model also employs Multi-Token Prediction, enabling multi-step token generation in a single pass for up to 40% faster output through speculative and parallel decoding techniques. Its...
    Downloads: 0 This Week
    Last Update:
    See Project