routing free download

OpenAI Forward

An efficient forwarding service designed for LLMs

...Its main purpose is to make model access more manageable and efficient by adding operational controls such as request rate limiting, token rate limiting, caching, logging, routing, and key management around existing LLM endpoints. The project can proxy both local and cloud-hosted language model services, which makes it useful for teams that want a single control layer regardless of whether they are using something like LocalAI or a hosted provider compatible with OpenAI-style APIs. A major emphasis of the repository is asynchronous performance, using tools such as uvicorn, aiohttp, and asyncio to support high-throughput forwarding workloads.

Downloads: 0 This Week

Last Update: 2026-03-10
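
The request and token rate limiting mentioned above can be illustrated with a token bucket. This is a minimal sketch of the general technique, not OpenAI Forward's own code: the `TokenBucket` class, its parameters, and the injectable `clock` are all hypothetical names chosen for illustration.

```python
import time
from typing import Callable


class TokenBucket:
    """Hypothetical token-bucket limiter (illustrative, not the project's API)."""

    def __init__(self, capacity: float, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity              # maximum burst size
        self.refill_per_sec = refill_per_sec  # sustained rate
        self.clock = clock                    # injectable for testing
        self.tokens = capacity                # start with a full bucket
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True and spend `cost` units if the bucket holds enough."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if cost <= self.tokens:
            self.tokens -= cost
            return True
        return False
```

Under this sketch, request limiting would call `allow()` once per request, while token limiting would pass the request's token count as `cost` to a second bucket.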

MoBA

MoBA: Mixture of Block Attention for Long-Context LLMs

...Instead of forcing each token to attend to every other token in the sequence, MoBA divides the context into blocks and dynamically routes queries to only the most relevant segments of information. This routing strategy reduces the computational cost associated with traditional attention while preserving performance on reasoning and long-context tasks. The approach allows language models to scale to significantly longer input contexts without the quadratic computational cost normally associated with transformer attention mechanisms.

Downloads: 0 This Week

Last Update: 2026-03-06
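
The block routing idea described above can be sketched in a few lines. This is a simplified illustration, assuming each block is scored by the dot product between the query and the mean of the block's keys; the function names are hypothetical, and the sketch omits causal masking and the attention computation itself.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def block_means(keys: List[Vector], block_size: int) -> List[List[float]]:
    """Mean-pool the keys inside each contiguous block."""
    means = []
    for start in range(0, len(keys), block_size):
        block = keys[start:start + block_size]
        dim = len(block[0])
        means.append([sum(k[d] for k in block) / len(block) for d in range(dim)])
    return means


def select_blocks(query: Vector, keys: List[Vector],
                  block_size: int, top_k: int) -> List[int]:
    """Return indices of the top_k blocks most relevant to the query."""
    scores = [dot(query, m) for m in block_means(keys, block_size)]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:top_k])
```

The query then attends only to keys in the selected blocks, so the per-query cost grows with `top_k * block_size` rather than with the full sequence length.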

Parallax

Parallax is a distributed model serving framework

Parallax is a decentralized inference framework designed to run large language models across distributed computing resources. Instead of relying on centralized GPU clusters in data centers, the system allows multiple heterogeneous machines to collaborate in serving AI inference workloads. Parallax divides model layers across different nodes and dynamically coordinates them to form a complete inference pipeline. A two-stage scheduling architecture determines how model layers are allocated to...

Downloads: 0 This Week

Last Update: 2026-03-09
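
To make the idea of dividing model layers across heterogeneous nodes concrete, here is a minimal sketch that assigns contiguous layer ranges in proportion to each node's capacity. It is not Parallax's scheduler: the `partition_layers` function, the capacity scores, and the largest-remainder rounding are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def partition_layers(num_layers: int,
                     capacity: Dict[str, float]) -> List[Tuple[str, range]]:
    """Assign contiguous layer ranges proportional to node capacity."""
    total = sum(capacity.values())
    shares = {n: num_layers * c / total for n, c in capacity.items()}
    counts = {n: int(s) for n, s in shares.items()}
    leftover = num_layers - sum(counts.values())
    # Hand the remaining layers to the nodes with the largest fractional share.
    by_remainder = sorted(shares, key=lambda n: shares[n] - counts[n], reverse=True)
    for n in by_remainder[:leftover]:
        counts[n] += 1
    plan, start = [], 0
    for n in capacity:  # keep the caller's node order as the pipeline order
        plan.append((n, range(start, start + counts[n])))
        start += counts[n]
    return plan
```

For example, `partition_layers(32, {"a": 24, "b": 8})` gives node `a` layers 0 to 23 and node `b` layers 24 to 31, so the two nodes form one pipeline.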

MiniMax-01

Large-language-model & vision-language-model based on Linear Attention

MiniMax-01 is the official repository for two flagship models: MiniMax-Text-01, a long-context language model, and MiniMax-VL-01, a vision-language model built on top of it. MiniMax-Text-01 uses a hybrid attention architecture that blends Lightning Attention, standard softmax attention, and Mixture-of-Experts (MoE) routing to achieve both high throughput and long-context reasoning. It has 456 billion total parameters with 45.9 billion activated per token and is trained with advanced parallel strategies such as LASP+, varlen ring attention, and Expert Tensor Parallelism, enabling a training context of 1 million tokens and up to 4 million tokens at inference. ...

Downloads: 1 This Week

Last Update: 2025-12-01

LLaMA-MoE

Building Mixture-of-Experts from LLaMA with Continual Pre-training

LLaMA-MoE is an open-source project that builds mixture-of-experts language models from LLaMA through expert partitioning and continual pre-training. The repository is centered on making MoE research more accessible by offering smaller and more affordable models with only about 3.0 to 3.5 billion activated parameters, which helps reduce deployment and experimentation costs. Its architecture works by splitting LLaMA feed-forward networks into sparse experts and adding gating mechanisms so...

Downloads: 0 This Week

Last Update: 2026-03-10
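
The two ingredients named above, splitting a feed-forward layer into experts and gating between them, can be sketched as follows. This is an illustration under stated assumptions, not LLaMA-MoE's code: the contiguous, equal-sized neuron split and the softmax over the top-k gate logits are simple choices made here, and both function names are hypothetical.

```python
import math
from typing import List, Tuple


def split_neurons(hidden_size: int, num_experts: int) -> List[List[int]]:
    """Partition FFN intermediate neuron indices into equal, disjoint experts."""
    if hidden_size % num_experts:
        raise ValueError("hidden_size must be divisible by num_experts")
    size = hidden_size // num_experts
    return [list(range(e * size, (e + 1) * size)) for e in range(num_experts)]


def top_k_gate(logits: List[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and normalize their weights."""
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in chosen)  # subtract the max for stability
    exps = [math.exp(logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]
```

Only the chosen experts run for a given token, which is why the activated parameter count is much smaller than the total.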

Search Results for "routing"

Showing 5 open source projects for "routing"
