Redundancy-aware KV Cache Compression for Reasoning Models
Semantic cache for LLMs. Fully integrated with LangChain
Unified KV Cache Compression Methods for Auto-Regressive Models
Cache-Augmented Generation: A Simple, Efficient Alternative to RAG
Supercharge Your LLM with the Fastest KV Cache Layer
Mooncake is the serving platform for Kimi
UCCL is an efficient communication library for GPUs
High-performance Inference and Deployment Toolkit for LLMs and VLMs
FlashMLA: Efficient Multi-head Latent Attention Kernels
Java wrapper for the popular chat & VoIP service
An open-source AI agent that brings the power of Grok
RGBD video generation model conditioned on camera input
Bring the notion of Model-as-a-Service to life
Claude Code, but it runs on your Mac for free
A course on LLM inference serving on Apple Silicon
TensorRT-LLM provides users with an easy-to-use Python API
Fully private LLM chatbot that runs entirely in the browser
Unofficial .NET client for ChatGPT
A Model Context Protocol (MCP) Gateway & Registry
A simple AI program for playing tic-tac-toe
VITS2 backbone with multilingual-bert
Calculate token/s & GPU memory requirement for any LLM
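The kind of back-of-envelope arithmetic such a calculator performs can be sketched as follows. This is a generic weights-only estimate under stated assumptions, not the tool's actual implementation; the function name and parameters are illustrative:

```python
def estimate_weight_memory_gb(num_params_b: float, bytes_per_param: float = 2) -> float:
    """Rough GPU memory (GB) needed just to hold model weights.

    num_params_b: parameter count in billions (e.g. 7 for a 7B model).
    bytes_per_param: 2 for fp16/bf16, 1 for int8, 0.5 for 4-bit quantization.

    Billions of parameters times bytes per parameter gives gigabytes directly.
    A fuller estimator would also add KV cache and activation overhead,
    which grow with batch size and context length.
    """
    return num_params_b * bytes_per_param

# A 7B model in fp16 needs roughly 14 GB for the weights alone.
print(estimate_weight_memory_gb(7))        # fp16
print(estimate_weight_memory_gb(7, 0.5))   # 4-bit quantized
```

Quantization scales this linearly: the same 7B model at 4 bits per weight fits in roughly 3.5 GB, which is why quantized variants run on consumer GPUs.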
A reactive runtime for building durable AI agents
A web UI for various audio-related neural networks
Snipe Chan is a Discord Bot that snipes deleted/edited messages