Join/Login
Business Software
Open Source Software
For Vendors
Blog
About
More

For Vendors Help Create Join Login

Business Software

Open Source Software

SourceForge Podcast

Resources

Articles
Case Studies
Blog

Menu

Help
Create
Join
Login

Home
Open Source Software
Artificial Intelligence
Large Language Models (LLM)
Search Results

Search Results for "token"

x

Sort By:

Relevance

Clear All Filters

OS

Windows 32
Linux 31
Mac 31
More...
BSD 22
ChromeOS 22
Mobile Operating Systems 2

Category

Artificial Intelligence 32

License

OSI-Approved Open Source 30

Translations

English 1

Programming Language

Python 32

Showing 32 open source projects for "token"

View related business solutions

Large Language Models (LLM) Python Clear Filters & Widen Search

$300 Free Credits for Your Google Cloud Projects
Start building on Google Cloud with $300 in free credits. No commitment, no credit card required until you're ready to scale.

Launch your next project with $300 in free Google Cloud credits—no strings attached. Test, build, and deploy without risk. Use your credits across the entire Google Cloud platform to find what works best for your needs. After your credits are used, continue with always-free tier services. Only pay when you're ready to scale. Sign up in minutes and start exploring.

Start Free Trial
Ship Agents Faster
Transform your applications and workflows into powerful agentic systems at global scale.

Gemini Enterprise Agent Platform lets you rapidly build, scale, govern and optimize production-ready agents grounded in your organization's data. The platform enables developers to build custom or pre-built agents for virtually any use case. New customers get $300 in free credits.

Get Started Free
1

DeepSeek-V3

Powerful AI language model (MoE) optimized for efficiency/performance

DeepSeek-V3 is a robust Mixture-of-Experts (MoE) language model developed by DeepSeek, featuring a total of 671 billion parameters, with 37 billion activated per token. It employs Multi-head Latent Attention (MLA) and the DeepSeekMoE architecture to enhance computational efficiency. The model introduces an auxiliary-loss-free load balancing strategy and a multi-token prediction training objective to boost performance. Trained on 14.8 trillion diverse, high-quality tokens, DeepSeek-V3 underwent supervised fine-tuning and reinforcement learning to fully realize its capabilities. ...

1 Review

Downloads: 64 This Week

Last Update: 2025-07-09
See Project
2

TokenCost

Easy token price estimates for 400+ LLMs. TokenOps

TokenCost is an open-source developer utility designed to estimate the cost of using large language model APIs by calculating token usage and translating it into real monetary values. The tool focuses on helping developers understand how much their prompts and generated completions cost when interacting with commercial AI models. It works by counting tokens in prompts and responses before or after sending requests and then applying pricing information associated with different models. ...

Downloads: 3 This Week

Last Update: 2026-03-06
See Project
3

DeepSeek R1

Open-source, high-performance AI model with advanced reasoning

...DeepSeek R1 offers unrestricted access for both commercial and academic use. The model employs a Mixture of Experts (MoE) architecture, comprising 671 billion total parameters with 37 billion active parameters per token, and supports a context length of up to 128,000 tokens. DeepSeek-R1's training regimen uniquely integrates large-scale reinforcement learning (RL) without relying on supervised fine-tuning, enabling the model to develop advanced reasoning capabilities. This approach has resulted in performance comparable to leading models like OpenAI's o1, while maintaining cost-efficiency. ...

1 Review

Downloads: 98 This Week

Last Update: 2025-07-09
See Project
4

Headroom

Compress tool outputs, logs, files, and RAG chunks

Headroom is a context optimization layer for LLM applications that compresses information before it reaches the model. It sits between an application and an LLM provider, intercepting requests and forwarding a shorter optimized prompt. The project is designed to reduce token usage while preserving the answer quality needed for agent workflows. It can compress tool outputs, logs, RAG chunks, files, and conversation history. Headroom can be used as a transparent proxy, a Python function, a TypeScript SDK, or through integrations with frameworks such as LangChain and LiteLLM. It is useful for teams building AI agents, research tools, or LLM products where context size, cost, and latency matter.

Downloads: 0 This Week

Last Update: 2 days ago
See Project
MongoDB Atlas runs apps anywhere
Deploy in 115+ regions with the modern database for every enterprise.

MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.

Start Free
5

MiniOneRec

Minimal reproduction of OneRec

...The framework provides an end-to-end pipeline for building generative recommender systems, including semantic identifier construction, supervised fine-tuning, and reinforcement learning-based optimization. Semantic IDs are created using techniques such as quantized variational autoencoders to convert item features into token sequences that can be modeled by transformer architectures. Developers can train and evaluate recommendation models using different backbone language models while benefiting from the generative framework’s parameter efficiency and scalability.

Downloads: 1 This Week

Last Update: 2026-05-14
See Project
6

Claude Code Bridge

Real-time multi-AI collaboration: Claude, Codex & Gemini

...The system allows developers to coordinate interactions between models such as Claude, Codex, and Gemini so that they can work together on programming tasks. By maintaining persistent shared context between these models, the tool reduces redundant prompts and minimizes token usage while allowing each AI system to contribute specialized capabilities. The architecture functions as a unified launcher that manages communication between multiple AI providers and coordinates their responses within the same development session. Developers can run the tool in terminal environments and integrate it with terminal multiplexers such as tmux or advanced terminal emulators.

Downloads: 2 This Week

Last Update: 7 hours ago
See Project
7

MoBA

MoBA: Mixture of Block Attention for Long-Context LLMs

MoBA, short for Mixture of Block Attention, is an open-source research implementation of a novel attention mechanism designed to improve the efficiency of large language models processing extremely long contexts. The architecture adapts ideas from Mixture-of-Experts networks and applies them directly to the attention mechanism of transformer models. Instead of forcing each token to attend to every other token in the sequence, MoBA divides the context into blocks and dynamically routes queries to only the most relevant segments of information. This routing strategy reduces the computational cost associated with traditional attention while preserving performance on reasoning and long-context tasks. The approach allows language models to scale to significantly longer input contexts without the quadratic computational cost normally associated with transformer attention mechanisms.

Downloads: 0 This Week

Last Update: 2026-03-06
See Project
8

Kaleidoscope-SDK

User toolkit for analyzing and interfacing with Large Language Models

...Users must authenticate using their Vector Institute cluster credentials. This can be done interactively instantiating a client object. This will generate an authentication token that will be used for all subsequent requests. The token will expire after 30 days, at which point the user will be prompted to re-authenticate.

Downloads: 0 This Week

Last Update: 2024-07-10
See Project
9

OpenAI Forward

An efficient forwarding service designed for LLMs

OpenAI Forward is an open-source forwarding and reverse proxy service for large language model APIs, designed to sit between client applications and model providers. Its main purpose is to make model access more manageable and efficient by adding operational controls such as request rate limiting, token rate limiting, caching, logging, routing, and key management around existing LLM endpoints. The project can proxy both local and cloud-hosted language model services, which makes it useful for teams that want a single control layer regardless of whether they are using something like LocalAI or a hosted provider compatible with OpenAI-style APIs. ...

Downloads: 1 This Week

Last Update: 2026-03-10
See Project
Enterprise-grade ITSM, for every business
Give your IT, operations, and business teams the ability to deliver exceptional services—without the complexity.

Freshservice is an intuitive, AI-powered platform that helps IT, operations, and business teams deliver exceptional service without the usual complexity. Automate repetitive tasks, resolve issues faster, and provide seamless support across the organization. From managing incidents and assets to driving smarter decisions, Freshservice makes it easy to stay efficient and scale with confidence.

Try it Free
10

Tongyi DeepResearch

Tongyi Deep Research, the Leading Open-source Deep Research Agent

...It’s built to act like a research agent: synthesizing, reasoning, retrieving information via the web and documents, and backing its outputs with evidence. The model is about 30.5 billion parameters in size, though at any given token only ~3.3B parameters are active. It uses a mix of synthetic data generation, fine-tuning and reinforcement learning; supports benchmarks like web search, document understanding, question answering, “agentic” tasks; provides inference tools, evaluation scripts, and “web agent” style interfaces. The aim is to enable more autonomous, agentic models that can perform sustained knowledge gathering, reasoning, and synthesis across multiple modalities (web, files, etc.).

Downloads: 1 This Week

Last Update: 2026-02-27
See Project
11

GLM-4.6

Agentic, Reasoning, and Coding (ARC) foundation models

GLM-4.6 is the latest iteration of Zhipu AI’s foundation model, delivering significant advancements over GLM-4.5. It introduces an extended 200K token context window, enabling more sophisticated long-context reasoning and agentic workflows. The model achieves superior coding performance, excelling in benchmarks and practical coding assistants such as Claude Code, Cline, Roo Code, and Kilo Code. Its reasoning capabilities have been strengthened, including improved tool usage during inference and more effective integration within agent frameworks. ...

Downloads: 52 This Week

Last Update: 2026-02-01
See Project
12

MiniMax-01

Large-language-model & vision-language-model based on Linear Attention

...MiniMax-Text-01 uses a hybrid attention architecture that blends Lightning Attention, standard softmax attention, and Mixture-of-Experts (MoE) routing to achieve both high throughput and long-context reasoning. It has 456 billion total parameters with 45.9 billion activated per token and is trained with advanced parallel strategies such as LASP+, varlen ring attention, and Expert Tensor Parallelism, enabling a training context of 1 million tokens and up to 4 million tokens at inference. MiniMax-VL-01 extends this core by adding a 303M-parameter Vision Transformer and a two-layer MLP projector in a ViT–MLP–LLM framework, allowing the model to process images at dynamic resolutions up to 2016×2016.

Downloads: 2 This Week

Last Update: 2025-12-01
See Project
13

MiniMax-M1

Open-weight, large-scale hybrid-attention reasoning model

...Architecturally, it combines Mixture-of-Experts layers with lightning attention, enabling the model to support a native context length of 1 million tokens while using far fewer FLOPs than comparable reasoning models for very long generations. The team emphasizes efficient scaling of test-time compute: at 100K-token generation lengths, M1 reportedly uses only about 25 percent of the FLOPs of some competing models, making extended “think step” traces more feasible. M1 is further trained with large-scale reinforcement learning over diverse tasks.

Downloads: 1 This Week

Last Update: 2025-12-01
See Project
14

Qwen3

Qwen3 is the large language model series developed by Qwen team

Qwen3 is a cutting-edge large language model (LLM) series developed by the Qwen team at Alibaba Cloud. The latest updated version, Qwen3-235B-A22B-Instruct-2507, features significant improvements in instruction-following, reasoning, knowledge coverage, and long-context understanding up to 256K tokens. It delivers higher quality and more helpful text generation across multiple languages and domains, including mathematics, coding, science, and tool usage. Various quantized versions,...

1 Review

Downloads: 20 This Week

Last Update: 2026-01-09
See Project
15

uqlm

Uncertainty Quantification for Language Models, is a Python package

...The library includes both black-box and white-box approaches to uncertainty estimation. Black-box methods evaluate model outputs through multiple generations or comparative analysis, while white-box methods rely on token probabilities produced during inference. UQLM also supports ensemble strategies and model-as-judge approaches for evaluating responses. By combining multiple uncertainty metrics, the system provides more reliable indicators of when language model outputs may be unreliable.

Downloads: 8 This Week

Last Update: 5 days ago
See Project
16

GPUStack

Performance-optimized AI inference on your GPUs

GPUStack is an open-source GPU cluster management platform designed to simplify the deployment and operation of artificial intelligence models across heterogeneous hardware environments. The system aggregates GPU resources from multiple machines into a unified cluster so developers and administrators can run large language models and other AI workloads efficiently across distributed infrastructure. Instead of requiring complex orchestration systems such as Kubernetes, GPUStack provides a...

Downloads: 7 This Week

Last Update: 2026-04-21
See Project
17

FastDeploy

High-performance Inference and Deployment Toolkit for LLMs and VLMs

...The platform enables developers to deploy trained models quickly using optimized inference pipelines that support GPUs, specialized AI accelerators, and other hardware architectures. FastDeploy includes advanced acceleration technologies such as speculative decoding, multi-token prediction, and efficient KV cache management to improve throughput and latency during inference. It also offers compatibility with OpenAI-style APIs and vLLM-like interfaces, allowing developers to integrate deployed models easily into existing applications and services.

Downloads: 6 This Week

Last Update: 2026-04-08
See Project
18

LLM Telegram Bot

A Telegram bot for Large Language Models

...The system supports multiple modes or personas, enabling users to switch between different conversational styles or use cases. It also allows fine-tuning of generation parameters such as temperature and token limits, giving users control over response behavior. The architecture is modular, making it easy to extend or adapt for different workflows or integrations.

Downloads: 2 This Week

Last Update: 2026-04-20
See Project
19

tiny-llm

A course of learning LLM inference serving on Apple Silicon

...The project is structured as a guided course that walks developers through the process of implementing the core components required to run a modern language model, including attention mechanisms, token generation, and optimization techniques. Rather than relying on high-level machine learning frameworks, the codebase uses mostly low-level array and matrix manipulation APIs so that developers can understand exactly how model inference works internally. The project demonstrates how to load and run models such as Qwen-style architectures while progressively implementing performance improvements like KV caching, request batching, and optimized attention mechanisms. ...

Downloads: 5 This Week

Last Update: 7 days ago
See Project
20

LightLLM

LightLLM is a Python-based LLM (Large Language Model) inference

...Built primarily in Python, the project integrates optimization techniques and ideas from several leading open-source implementations, including FasterTransformer, vLLM, and FlashAttention, to accelerate token generation and reduce latency. LightLLM is designed to handle large-scale model workloads in production environments, supporting efficient batching and GPU utilization for fast inference across multiple requests. Its architecture allows models to be deployed with minimal overhead while maintaining compatibility with popular transformer-based model families such as LLaMA and GPT-style architectures.

Downloads: 2 This Week

Last Update: 2026-03-05
See Project
21

TokenSpeed

TokenSpeed is a speed-of-light LLM inference engine

...TokenSpeed is useful for developers building local or server-side LLM infrastructure for agents, coding systems, and high-volume AI applications. Its main value is providing an inference layer optimized for fast token generation under practical agent workloads.

Downloads: 1 This Week

Last Update: 3 days ago
See Project
22

KIS Open API

Korea Investment & Securities Open API Github

The open-trading-api repository from Korea Investment & Securities provides sample code and developer resources for interacting with the KIS Developers Open Trading API, which enables programmatic access to financial market data and automated trading functionality. The project is designed primarily for Python developers and AI automation environments that want to build investment applications, algorithmic trading systems, or financial analytics tools using the brokerage’s infrastructure. It...

Downloads: 0 This Week

Last Update: 2026-03-18
See Project
23

KVCache-Factory

Unified KV Cache Compression Methods for Auto-Regressive Models

KVCache-Factory is an open-source research framework designed to explore and implement unified key-value cache compression techniques for autoregressive transformer models. In large language models, the key-value cache stores intermediate attention states that enable efficient token generation during inference, but these caches can consume large amounts of GPU memory when handling long contexts. KVCache-Factory provides a platform for implementing and evaluating multiple compression strategies that reduce memory usage while preserving model performance. The framework integrates several state-of-the-art methods such as PyramidKV, SnapKV, H2O, and StreamingLLM, allowing researchers to compare and experiment with different approaches within the same environment. ...

Downloads: 0 This Week

Last Update: 2026-03-09
See Project
24

TAME LLM

Traditional Mandarin LLMs for Taiwan

TAME LLM is an open-source initiative focused on building and releasing large language models optimized for Traditional Mandarin and the linguistic context of Taiwan. The project includes models such as Llama-3-Taiwan-70B, which are fine-tuned versions of large transformer architectures trained on extensive corpora containing both Traditional Mandarin and English text. These models are designed to support applications such as conversational AI, knowledge retrieval, and domain-specific...

Downloads: 0 This Week

Last Update: 2026-03-09
See Project
25

LlamaGen

Autoregressive Model Beats Diffusion

LlamaGen is an open-source research project that introduces a new approach to image generation by applying the autoregressive next-token prediction paradigm used in large language models to visual generation tasks. Instead of relying on diffusion models, the framework treats images as sequences of tokens that can be generated progressively using transformer architectures similar to those used for text generation. The project explores how scaling autoregressive models and improving image tokenization techniques can produce competitive results compared with modern diffusion-based image generators. ...

Downloads: 0 This Week

Last Update: 2026-03-06
See Project

Previous
You're on page 1
2
Next

Related Searches

deepseek

rivals

deepseek-r1-distill-qwen-1.5b

institute

glm 4.6

android

rivals mobile aimbot

rivals mac aimbot

nvidia deepseek-r1

nvidia

Related Categories

Artificial Intelligence

SourceForge

Create a Project
Open Source Software
Business Software
Top Downloaded Projects

Company

About
Team
SourceForge Headquarters
1320 Columbia Street Suite 310
San Diego, CA 92101
+1 (858) 422-6466

Resources

Support
Site Documentation
Site Status
SourceForge Reviews

© 2026 Slashdot Media. All Rights Reserved.

Terms Privacy Opt Out Advertise