Search Results for "tensorrt"
TokenSpeed is a speed-of-light LLM inference engine
ChatGLM3 series: Open Bilingual Chat LLMs
Low-latency REST API for serving text-embeddings
Mooncake is the serving platform for Kimi