111 projects for "token" with 2 filters applied:

  • 1
    Token-Oriented Object Notation

    Token-Oriented Object Notation (TOON)

    Token-Oriented Object Notation (TOON) is an open specification and toolkit for a data serialization format designed to optimize how structured data is passed to large language models. The format aims to reduce token overhead compared with traditional formats like JSON while remaining human-readable and structurally expressive.
    Downloads: 7 This Week
  • 2
    DeepSeek-V3

    Powerful AI language model (MoE) optimized for efficiency/performance

    DeepSeek-V3 is a robust Mixture-of-Experts (MoE) language model developed by DeepSeek, featuring a total of 671 billion parameters, with 37 billion activated per token. It employs Multi-head Latent Attention (MLA) and the DeepSeekMoE architecture to enhance computational efficiency. The model introduces an auxiliary-loss-free load balancing strategy and a multi-token prediction training objective to boost performance. Trained on 14.8 trillion diverse, high-quality tokens, DeepSeek-V3 underwent supervised fine-tuning and reinforcement learning to fully realize its capabilities. ...
    Downloads: 64 This Week
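
The phrase "37 billion activated per token" refers to sparse expert routing: a gate scores every expert, but only a few are evaluated for each token. The sketch below illustrates generic top-k softmax gating in plain Python. It is not DeepSeek-V3's actual router (the auxiliary-loss-free balancing mentioned above is not modelled), and the toy experts and gate logits are hypothetical.

```python
import math

def top_k_route(gate_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    # Pick the k experts with the largest gate logits for this token.
    chosen = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    # Softmax over the chosen experts only, so their weights sum to 1.
    peak = max(gate_logits[i] for i in chosen)
    exps = {i: math.exp(gate_logits[i] - peak) for i in chosen}
    total = sum(exps.values())
    return [(i, exps[i] / total) for i in chosen]

def moe_forward(x, experts, gate_logits, k=2):
    """Weighted sum of the selected experts' outputs; unselected experts never run."""
    return sum(w * experts[i](x) for i, w in top_k_route(gate_logits, k))

# Four toy "experts" (hypothetical scalar functions); only two run per token.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: x * x, lambda x: -x]
logits = [0.1, 2.0, 2.0, -1.0]
routes = top_k_route(logits, k=2)
output = moe_forward(3.0, experts, logits, k=2)
```

Total parameter count grows with the number of experts, while per-token compute grows only with `k`, which is the trade-off the description points at.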
  • 3
    TokenCost

    Easy token price estimates for 400+ LLMs. TokenOps

    TokenCost is an open-source developer utility designed to estimate the cost of using large language model APIs by calculating token usage and translating it into real monetary values. The tool focuses on helping developers understand how much their prompts and generated completions cost when interacting with commercial AI models. It works by counting tokens in prompts and responses before or after sending requests and then applying pricing information associated with different models. ...
    Downloads: 3 This Week
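
The calculation described above (token counts multiplied by per-model prices) can be sketched as follows. This is not TokenCost's API: the model name, the prices, and the `estimate_cost` helper are hypothetical, and the token counts are assumed to come from a tokenizer run beforehand.

```python
from decimal import Decimal

# Hypothetical per-million-token prices in USD; real prices differ by provider and change over time.
PRICES_PER_MILLION = {
    "example-model": {"prompt": Decimal("3.00"), "completion": Decimal("15.00")},
}

def estimate_cost(model, prompt_tokens, completion_tokens):
    """Return the estimated USD cost of one request as a Decimal."""
    price = PRICES_PER_MILLION[model]
    million = Decimal(1_000_000)
    # Prompt and completion tokens are usually billed at different rates.
    return (price["prompt"] * prompt_tokens + price["completion"] * completion_tokens) / million

cost = estimate_cost("example-model", prompt_tokens=1_200, completion_tokens=300)
```

`Decimal` is used instead of `float` so that small per-token amounts add up without binary rounding drift.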
  • 4
    FastVLM

    This repository contains the official implementation of FastVLM

    ...Apple’s research brief frames FastVLM as targeting real-time or latency-sensitive scenarios, where lowering visual token pressure is critical to interactive UX. In short, it’s a practical recipe to make VLMs fast without exotic token-selection heuristics.
    Downloads: 0 This Week
  • 5
    OpenAI Privacy Filter

    Bidirectional token-classification model for identifiable info

    OpenAI Privacy Filter is an open-weight machine learning model designed to detect and mask personally identifiable information in text with high efficiency and contextual awareness. It operates as a bidirectional token classification system that labels sensitive data in a single forward pass rather than generating text sequentially, enabling fast processing for large datasets. The model supports long-context inputs, allowing it to analyze extensive documents without chunking, which improves consistency in redaction tasks. It can run locally on standard hardware, ensuring that sensitive information never leaves the user’s environment and supporting privacy-first workflows. ...
    Downloads: 2 This Week
  • 6
    Claude Cognitive

    Persistent context and multi-instance coordination

    ...It introduces an attention-based context router that prioritizes files and content relevant to the current development discussion — tagging them as HOT, WARM, or COLD based on recency and keyword activation — so Claude Code doesn’t waste token budget rereading irrelevant code. This context routing dramatically reduces redundant token usage and accelerates large codebase interactions by focusing only on what truly matters to the current task. Additionally, Claude-Cognitive includes a pool coordinator to share state across multiple Claude Code instances, preserving what’s been learned or completed and preventing repetitive debugging or redundant exploration.
    Downloads: 5 This Week
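
One way to picture the HOT/WARM/COLD tagging is a small classifier over recency and keyword matches. The sketch below is a guess at the idea, not the project's code: the thresholds, the `tier` and `route` helpers, and the file metadata are all hypothetical.

```python
def tier(turns_since_touched, keyword_hits, hot_within=2, warm_within=6):
    """Classify a file as HOT, WARM, or COLD from recency and keyword matches."""
    # A keyword match in the current message reactivates the file immediately.
    if keyword_hits > 0 or turns_since_touched <= hot_within:
        return "HOT"
    if turns_since_touched <= warm_within:
        return "WARM"
    return "COLD"

def route(files, message):
    """Map each path to a tier. `files` maps path -> (turns_since_touched, keywords)."""
    words = set(message.lower().split())
    return {
        path: tier(age, len(words & set(keywords)))
        for path, (age, keywords) in files.items()
    }

files = {
    "auth/login.py": (1, ["login", "session"]),
    "db/schema.sql": (5, ["migration", "schema"]),
    "docs/changelog.md": (20, ["release"]),
    "billing/invoice.py": (30, ["invoice"]),
}
tiers = route(files, "Why does the invoice total ignore discounts?")
```

Under this scheme only HOT files would be injected in full, which is where the token savings would come from; a stale file such as `billing/invoice.py` becomes HOT again as soon as the conversation mentions one of its keywords.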
  • 7
    Claw Compactor

    14-stage Fusion Pipeline for LLM token compression

    ...It addresses the challenge of finite context windows in language models by compressing or summarizing historical interactions while preserving essential information. The system works by transforming older conversation data into condensed representations that maintain continuity without exceeding token limits. This approach allows long-running agent sessions to continue operating efficiently without losing critical context. It is especially useful in autonomous workflows where agents accumulate large volumes of interaction history over time. The project aligns with broader strategies in AI systems that balance memory retention with computational constraints. ...
    Downloads: 8 This Week
  • 8
    TONL

    TONL (Token-Optimized Notation Language)

    TONL is a cutting-edge data platform built around a production-ready serialization format designed to be both compact and powerful, combining human readability with performance features that make it suitable for large-scale applications and AI workflows. It provides a serialization format that significantly reduces token usage compared with traditional JSON, which can result in lower costs and more efficient prompt size utilization in LLM-driven systems. TONL isn’t just a format — it includes a rich API for querying, indexing, modifying, and streaming data, along with tools for schema validation and TypeScript code generation. The platform comes with a complete command-line interface that supports interactive dashboards and cross-platform usage in browsers and server environments, and its high test coverage gives developers confidence in stability.
    Downloads: 0 This Week
  • 9
    Pinchtab

    High-performance browser automation bridge and orchestrator

    ...Implemented as a small standalone HTTP server, it allows any agent or script to interact with web pages using simple API calls instead of heavyweight browser frameworks. The tool emphasizes accessibility-first snapshots that dramatically reduce token usage compared to screenshot-based approaches, making it cost-effective for large-scale automation. It launches and manages its own Chrome instance while remaining framework-agnostic, so it can be used with any language or agent system. Pinchtab also supports persistent sessions, stealth automation, and both headless and headed operation modes. ...
    Downloads: 8 This Week
  • 10
    DeepSeek R1

    Open-source, high-performance AI model with advanced reasoning

    ...DeepSeek R1 offers unrestricted access for both commercial and academic use. The model employs a Mixture of Experts (MoE) architecture, comprising 671 billion total parameters with 37 billion active parameters per token, and supports a context length of up to 128,000 tokens. DeepSeek-R1's training regimen uniquely integrates large-scale reinforcement learning (RL) without relying on supervised fine-tuning, enabling the model to develop advanced reasoning capabilities. This approach has resulted in performance comparable to leading models like OpenAI's o1, while maintaining cost-efficiency. ...
    Downloads: 98 This Week
  • 11
    Lossless Claw

    LCM (Lossless Context Management) plugin for OpenClaw

    ...Instead of relying on traditional sliding-window truncation or lossy summarization, it introduces a lossless architecture that preserves all historical messages while maintaining usable context within token limits. The system stores every interaction in a persistent database and incrementally summarizes older content into a hierarchical directed acyclic graph, allowing efficient compression without discarding information. This structure enables agents to dynamically reconstruct detailed context by expanding summaries when needed, effectively simulating perfect long-term memory.
    Downloads: 6 This Week
  • 12
    Gitingest

    Create prompt-friendly codebase digests from any Git repository URL

    ...In addition to producing the code digest, Gitingest also calculates statistics about the extracted content such as repository structure, total size of the extract, and token count. Gitingest can be used as a command line utility or integrated directly into Python applications.
    Downloads: 7 This Week
  • 13
    Oh My OpenCode Slim

    Slimmed, cleaned and fine-tuned oh-my-opencode fork

    Oh My OpenCode Slim is a lightweight, optimized fork of the broader oh-my-opencode ecosystem, designed to deliver high-performance multi-agent coding workflows while significantly reducing token consumption and system overhead. It retains the core concept of orchestrating multiple specialized AI agents but streamlines their configuration, execution, and communication to make the system more efficient and practical for everyday use. The framework introduces a structured “pantheon” of agents, each with a defined role such as orchestration, exploration, and execution, allowing tasks to be automatically delegated and completed through coordinated workflows. ...
    Downloads: 4 This Week
  • 14
    OpenMonoAgent

    Terminal-native coding agent powered by local LLMs

    OpenMonoAgent.ai is a self-hosted coding agent designed to run entirely on the user’s own hardware. It pairs a .NET CLI with a local llama.cpp inference server so developers can use agentic coding workflows without cloud subscriptions or per-token billing. The project emphasizes privacy, local control, and ownership of the model, compute, and project data. It includes a terminal-native workflow, built-in tools, Docker sandboxing, and code intelligence features. The system can run on CPU or GPU and is designed to auto-configure itself when possible. OpenMonoAgent.ai is best suited for developers who want a local AI development stack with no API keys, no cloud dependency, and no telemetry.
    Downloads: 3 This Week
  • 15
    OpenSpace

    OpenSpace: Make Your Agents: Smarter, Low-Cost, Self-Evolving

    ...The platform emphasizes collective intelligence, enabling multiple agents to share learned behaviors and benefit from each other’s experiences. It also focuses on cost efficiency by reducing redundant computations and reusing successful workflows, significantly lowering token usage in repeated tasks. The framework includes monitoring and evaluation mechanisms to track skill performance and ensure reliability as systems evolve. It supports integration with various agent platforms, making it flexible and extensible across different environments.
    Downloads: 3 This Week
  • 16
    Humanizer Skill

    Claude Code skill that removes signs of AI-generated writing from text

    ...It also includes functions for transforming camelCase, snake_case, or PascalCase identifiers into spaced and capitalized representations suitable for user interfaces, reports, or documentation. Beyond text formatting, the library can handle pluralization, enumeration formatting (“A, B, and C”), and token expansion so that program-generated content feels more conversational.
    Downloads: 118 This Week
  • 17
    Tiktoken

    tiktoken is a fast BPE tokeniser for use with OpenAI's models

    tiktoken is a high-performance tokenizer library based on byte-pair encoding (BPE), designed for use with OpenAI’s models. It encodes text to token IDs and decodes token IDs back to text efficiently, with minimal overhead. Because tokenization is a fundamental step in preparing text for models, tiktoken is optimized for speed, memory, and correctness in model contexts (e.g. matching OpenAI’s internal tokenization). The repo supports multiple encodings (e.g. “cl100k_base”) and lets users switch encoding names to match different model contexts. ...
    Downloads: 5 This Week
  • 18
    MiMo-V2-Flash

    MiMo-V2-Flash: Efficient Reasoning, Coding, and Agentic Foundation

    MiMo-V2-Flash is a large Mixture-of-Experts language model designed to deliver strong reasoning, coding, and agentic-task performance while keeping inference fast and cost-efficient. It uses an MoE setup where a very large total parameter count is available, but only a smaller subset is activated per token, which helps balance capability with runtime efficiency. The project positions the model for workflows that require tool use, multi-step planning, and higher throughput, rather than only single-turn chat. Architecturally, it highlights attention and prediction choices aimed at accelerating generation while preserving instruction-following quality in complex prompts. ...
    Downloads: 4 This Week
  • 19
    MiniOneRec

    Minimal reproduction of OneRec

    ...The framework provides an end-to-end pipeline for building generative recommender systems, including semantic identifier construction, supervised fine-tuning, and reinforcement learning-based optimization. Semantic IDs are created using techniques such as quantized variational autoencoders to convert item features into token sequences that can be modeled by transformer architectures. Developers can train and evaluate recommendation models using different backbone language models while benefiting from the generative framework’s parameter efficiency and scalability.
    Downloads: 1 This Week
  • 20
    Transformer Explainer

    Learn How LLM Transformer Models Work with Interactive Visualization

    ...Through visual diagrams and interactive interfaces, the tool reveals how tokens are processed through layers such as embeddings, attention mechanisms, and feed-forward networks. Users can observe how attention weights change as the model predicts the next token, offering insight into how transformer architectures capture relationships between words. The design of the platform emphasizes educational accessibility, allowing students, researchers, and developers to explore complex machine learning concepts without requiring specialized hardware or installations.
    Downloads: 2 This Week
  • 21
    HunyuanImage-3.0

    A Powerful Native Multimodal Model for Image Generation

    ...It unifies multimodal understanding and generation in a single autoregressive framework, combining text and image modalities seamlessly rather than relying on separate image-only diffusion components. It uses a Mixture-of-Experts (MoE) architecture with many expert subnetworks to scale efficiently, deploying only a subset of experts per token, which allows large parameter counts without linear inference cost explosion. The model is intended to be competitive with closed-source image generation systems, aiming for high fidelity, prompt adherence, fine detail, and even “world knowledge” reasoning (i.e. leveraging context, semantics, or common sense in generation). The GitHub repo includes code, scripts, model loading instructions, inference utilities, prompt handling, and integration with standard ML tooling (e.g. ...
    Downloads: 2 This Week
  • 22
    RecursiveMAS

    Official Implementation for "Recursive Multi-Agent Systems"

    ...It also incorporates an inner–outer loop training approach that optimizes the entire system collectively rather than tuning each agent separately. This design improves efficiency, reduces token usage, and stabilizes learning during iterative reasoning.
    Downloads: 0 This Week
  • 23
    Streamdown

    Streaming markdown renderer for AI apps with smooth updates

    ...Streamdown is built to handle partial Markdown input gracefully, progressively enhancing the output as more text becomes available. It is especially relevant for chat interfaces, coding assistants, and any environment where responses are streamed token by token. Streamdown emphasizes performance and simplicity, ensuring that developers can integrate it without unnecessary complexity. It prioritizes correctness in Markdown rendering while maintaining responsiveness during continuous updates. Overall, it serves as a practical solution for improving the user experience of real-time generated text displays.
    Downloads: 0 This Week
  • 24
    MoBA

    MoBA: Mixture of Block Attention for Long-Context LLMs

    MoBA, short for Mixture of Block Attention, is an open-source research implementation of a novel attention mechanism designed to improve the efficiency of large language models processing extremely long contexts. The architecture adapts ideas from Mixture-of-Experts networks and applies them directly to the attention mechanism of transformer models. Instead of forcing each token to attend to every other token in the sequence, MoBA divides the context into blocks and dynamically routes queries to only the most relevant segments of information. This routing strategy reduces the computational cost associated with traditional attention while preserving performance on reasoning and long-context tasks. The approach allows language models to scale to significantly longer input contexts without the quadratic computational cost normally associated with transformer attention mechanisms.
    Downloads: 0 This Week
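
The block routing described above can be sketched in plain Python: split the keys into blocks, score each block against the query, and run softmax attention over the selected blocks only. The scoring rule (dot product with the mean key of each block) is an assumption made for illustration, causal masking is omitted, and the function names are hypothetical rather than taken from the MoBA repository.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def select_blocks(query, keys, block_size, top_k):
    """Return indices of the top_k blocks whose mean key best matches the query."""
    blocks = [keys[i:i + block_size] for i in range(0, len(keys), block_size)]
    # Summarise each block by the element-wise mean of its keys (assumed scoring rule).
    means = [[sum(col) / len(block) for col in zip(*block)] for block in blocks]
    ranked = sorted(range(len(blocks)), key=lambda b: dot(query, means[b]), reverse=True)
    return sorted(ranked[:top_k])

def block_attention(query, keys, values, block_size, top_k):
    """Softmax attention restricted to the tokens inside the selected blocks."""
    chosen = select_blocks(query, keys, block_size, top_k)
    idx = [i for b in chosen
           for i in range(b * block_size, min((b + 1) * block_size, len(keys)))]
    scores = [dot(query, keys[i]) / math.sqrt(len(query)) for i in idx]
    peak = max(scores)
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * values[i][d] for w, i in zip(weights, idx)) / total for d in range(dim)]

# Six tokens in three blocks of two; the query aligns with the middle block.
keys = [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [0.0, 0.0]]
values = [[float(i)] for i in range(6)]
query = [1.0, 0.0]
picked = select_blocks(query, keys, block_size=2, top_k=1)
out = block_attention(query, keys, values, block_size=2, top_k=1)
```

Each query touches `top_k * block_size` keys instead of the full sequence, which is how this kind of routing avoids the quadratic cost of dense attention.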
  • 25
    DFlash

    Block Diffusion for Ultra-Fast Speculative Decoding

    ...It acts as a “drafter” that proposes likely continuations which the main model then verifies, enabling significant throughput gains compared to traditional autoregressive decoding methods that generate token by token. This approach has been shown to deliver lossless acceleration on models like Qwen3-8B by combining block diffusion techniques with efficient batching, making it ideal for applications where latency and cost matter. The project includes support for multiple draft models, example integration code, and scripts to benchmark performance, and it is structured to work with popular model serving stacks like SGLang and the Hugging Face Transformers ecosystem.
    Downloads: 0 This Week
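
The draft-and-verify loop behind speculative decoding can be sketched with two toy models. This is a minimal greedy variant, not DFlash's implementation: the block-diffusion drafter is replaced by a hypothetical rule-based `draft_block`, and the target is queried once per position, whereas a real system would verify a whole block in a single batched forward pass.

```python
def speculative_decode(target_next, draft_block, prompt, max_new, block=4):
    """Greedy draft-and-verify decoding. Returns (new_tokens, accepted_draft_tokens)."""
    out = list(prompt)
    limit = len(prompt) + max_new
    accepted = 0
    while len(out) < limit:
        for token in draft_block(out, block):
            # Stop at the first draft token the target disagrees with.
            if len(out) >= limit or target_next(out) != token:
                break
            out.append(token)
            accepted += 1
        if len(out) < limit:
            # The target supplies the correction (or a bonus token), so every round makes progress.
            out.append(target_next(out))
    return out[len(prompt):], accepted

def target_next(seq):
    """Toy target model: counts upward modulo 10."""
    return (seq[-1] + 1) % 10

def draft_block(seq, n):
    """Toy drafter: agrees with the target except that it never predicts 5."""
    block, last = [], seq[-1]
    for _ in range(n):
        last = (last + 1) % 10
        last = 0 if last == 5 else last
        block.append(last)
    return block

tokens, accepted = speculative_decode(target_next, draft_block, prompt=[1], max_new=8)
```

Because a draft token is kept only when it equals the target's own greedy choice, the output is identical to decoding with the target alone, which is the sense in which the acceleration is lossless.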