Page 2 | token free download

Nano-vLLM

A lightweight vLLM implementation built from scratch

Nano-vLLM is a lightweight implementation of the vLLM inference engine designed to run large language models efficiently while maintaining a minimal and readable codebase. The project recreates the core functionality of vLLM in a simplified architecture written in approximately a thousand lines of Python, making it easier for developers and researchers to understand how modern LLM inference systems work. Despite its compact design, Nano-vLLM incorporates advanced optimization techniques such...

Downloads: 0 This Week

Last Update: 2026-04-26

See Project

LLM TLDR

95% token savings. 155x faster queries. 16 languages

LLM TLDR is a tool that leverages large language models (LLMs) to generate concise, coherent summaries (TL;DRs) of long documents, articles, or text files, helping users quickly understand large amounts of content without reading every word. It integrates with LLM APIs to handle input texts of varying lengths and complexity, applying techniques like chunking, context management, and multi-pass summarization to preserve accuracy even when the source is very large. The system supports both...

Downloads: 0 This Week

Last Update: 2026-01-27

See Project
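
The chunking and multi-pass summarization mentioned in the LLM TLDR description can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `chunk_text`, `multi_pass_summary`, and the `summarize` callable (a stand-in for whatever LLM API call is used) are hypothetical names, and splitting on whitespace is an assumed simplification of real token counting.

```python
from typing import Callable, List


def chunk_text(text: str, max_words: int) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def multi_pass_summary(text: str, summarize: Callable[[str], str], max_words: int = 200) -> str:
    """Summarize chunk by chunk, repeating until the text fits in one chunk.

    `summarize` is a hypothetical stand-in for an LLM call.
    """
    current = text
    while len(current.split()) > max_words:
        chunks = chunk_text(current, max_words)
        reduced = " ".join(summarize(chunk) for chunk in chunks)
        if len(reduced.split()) >= len(current.split()):
            break  # the summarizer is not shrinking the text; stop to avoid looping forever
        current = reduced
    # Final pass over the (now short) text.
    return summarize(current)
```

Each pass shrinks the text until it fits in a single request, at which point one final summary is produced.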

Grok-1

Open-source, high-performance Mixture-of-Experts large language model

Grok-1 is a 314-billion-parameter Mixture-of-Experts (MoE) large language model developed by xAI. Designed to optimize computational efficiency, it activates only 25% of its weights for each input token. In March 2024, xAI released Grok-1's model weights and architecture under the Apache 2.0 license, making them openly accessible to developers. The accompanying GitHub repository provides JAX example code for loading and running the model. Because of its size, running Grok-1 requires a machine with significant GPU memory. ...

1 Review

Downloads: 32 This Week

Last Update: 2025-02-27

See Project
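
To illustrate how a Mixture-of-Experts model can activate only a fraction of its weights per token, here is a minimal top-k routing sketch. It is not taken from the Grok-1 repository: the function name `top_k_route` is hypothetical, and the choice of 8 experts with 2 selected per token is an assumption made here because it matches the 25% figure in the description.

```python
import math
from typing import List, Tuple


def top_k_route(router_logits: List[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k experts with the highest router logits for one token.

    Returns (expert_index, weight) pairs; weights are a softmax over the
    selected logits only, so they sum to 1.
    """
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    peak = max(router_logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Assumed configuration: 8 experts, 2 active per token (2 / 8 = 25% of experts).
logits = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.3, 1.0]
routes = top_k_route(logits, k=2)
active_fraction = len(routes) / len(logits)
```

Only the selected experts run for that token, and their outputs are combined using the returned weights. Shared layers such as attention always run, so the exact share of active parameters depends on the architecture.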

autollm

Ship RAG based LLM web apps in seconds

autollm is an open-source Python framework designed to make it much faster to build retrieval-augmented generation applications and expose them as usable services with minimal setup. The project focuses on simplifying the usual stack of model selection, document ingestion, vector storage, querying, and API deployment into a more unified developer experience. Its core idea is that a developer can create a query engine from a document set in just a few lines and then turn that same engine into...

Downloads: 0 This Week

Last Update: 2026-03-10

See Project

Mixtral offloading

Run Mixtral-8x7B models in Colab or consumer desktops

...The project implements techniques that allow model components to be dynamically moved between CPU memory and GPU memory during inference, significantly reducing the amount of GPU VRAM required to run the model. This approach takes advantage of the sparse activation properties of mixture-of-experts architectures, where only a subset of the expert networks is used for each token during generation. By selectively loading and caching the required experts, the system avoids keeping the entire model in GPU memory at once. The repository includes notebooks and code examples that demonstrate how to run large language models on consumer hardware such as personal GPUs or cloud notebook environments.

Downloads: 0 This Week

Last Update: 2026-03-06

See Project
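
The idea of selectively loading and caching experts can be sketched with a small least-recently-used (LRU) cache. This is an illustration under stated assumptions, not the project's implementation: the `ExpertCache` class and the `load_fn` callback (standing in for a CPU-to-GPU transfer) are hypothetical, and LRU is just one plausible eviction policy.

```python
from collections import OrderedDict
from typing import Any, Callable, List


class ExpertCache:
    """Keep at most `capacity` experts resident; evict the least recently used."""

    def __init__(self, capacity: int, load_fn: Callable[[int], Any]) -> None:
        self.capacity = capacity
        self.load_fn = load_fn  # hypothetical: copies one expert from CPU to GPU memory
        self._cache: "OrderedDict[int, Any]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, expert_id: int) -> Any:
        if expert_id in self._cache:
            self._cache.move_to_end(expert_id)  # mark as most recently used
            self.hits += 1
            return self._cache[expert_id]
        self.misses += 1
        if len(self._cache) >= self.capacity:
            self._cache.popitem(last=False)  # drop the least recently used expert
        expert = self.load_fn(expert_id)
        self._cache[expert_id] = expert
        return expert

    def resident(self) -> List[int]:
        """Expert ids currently cached, least recently used first."""
        return list(self._cache)


cache = ExpertCache(capacity=2, load_fn=lambda i: f"expert-{i}")
for expert_id in [0, 1, 0, 2, 1]:
    cache.get(expert_id)
```

With room for two experts, the access sequence above reuses expert 0 once and reloads the others on demand, which trades extra transfers for a smaller GPU memory footprint.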

Petals

Run 100B+ language models at home, BitTorrent-style

...Run large language models like BLOOM-176B collaboratively — you load a small part of the model, then team up with people serving the other parts to run inference or fine-tuning. Single-batch inference runs at ≈ 1 sec per step (token) — up to 10x faster than offloading, enough for chatbots and other interactive apps. Parallel inference reaches hundreds of tokens/sec. Beyond classic language model APIs — you can employ any fine-tuning and sampling methods, execute custom paths through the model, or see its hidden states. You get the comforts of an API with the flexibility of PyTorch. ...

Downloads: 6 This Week

Last Update: 2023-09-06

See Project

Repo of Tree of Thoughts (ToT)

Implementation of "Tree of Thoughts"

Language models are increasingly being deployed for general problem-solving across a wide range of tasks, but are still confined to token-level, left-to-right decision-making processes during inference. This means they can fall short in tasks that require exploration, strategic lookahead, or where initial decisions play a pivotal role. To surmount these challenges, we introduce a new framework for language model inference, Tree of Thoughts (ToT), which generalizes over the popular Chain of Thought approach to prompting language models and enables exploration over coherent units of text (thoughts) that serve as intermediate steps toward problem-solving. ...

Downloads: 0 This Week

Last Update: 2023-08-21

See Project
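
The exploration over intermediate "thoughts" described for Tree of Thoughts can be sketched as a breadth-first search that keeps only the best few candidates at each step. This is a toy illustration, not the repository's code: `tree_of_thoughts_bfs`, `propose`, and `score` are hypothetical names, and in a real setup both callbacks would be backed by language model calls rather than the arithmetic stand-ins used here.

```python
from typing import Callable, List, Tuple

State = Tuple[int, ...]


def tree_of_thoughts_bfs(
    root: State,
    propose: Callable[[State], List[State]],
    score: Callable[[State], float],
    depth: int,
    beam_width: int,
) -> State:
    """Expand every state in the frontier, then keep the top `beam_width` by score."""
    frontier = [root]
    for _ in range(depth):
        candidates = [child for state in frontier for child in propose(state)]
        if not candidates:
            break
        candidates.sort(key=score, reverse=True)
        frontier = candidates[:beam_width]
    return max(frontier, key=score)


# Toy task: choose three numbers from {1, 2, 3} whose sum is as close to 7 as possible.
TARGET = 7
propose = lambda state: [state + (n,) for n in (1, 2, 3)]
score = lambda state: -abs(TARGET - sum(state))

best = tree_of_thoughts_bfs((), propose, score, depth=3, beam_width=2)
```

Unlike a single left-to-right chain, the search keeps several partial solutions alive, so an early choice that scores slightly worse can still lead to the best final answer.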

Search Results for "token" - Page 2

Showing 32 open source projects for "token"


