Showing 82 open source projects for "memory"

View related business solutions
  • $300 Free Credits for Your Google Cloud Projects Icon
    $300 Free Credits for Your Google Cloud Projects

    Start building on Google Cloud with $300 in free credits. No commitment, no credit card required until you're ready to scale.

    Launch your next project with $300 in free Google Cloud credits—no strings attached. Test, build, and deploy without risk. Use your credits across the entire Google Cloud platform to find what works best for your needs. After your credits are used, continue with always-free tier services. Only pay when you're ready to scale. Sign up in minutes and start exploring.
    Start Free Trial
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 1
    Kernel Memory

    Kernel Memory

    Research project. A Memory solution for users, teams, and applications

    Kernel Memory is an open-source reference architecture developed by Microsoft to help developers build memory systems for AI applications powered by large language models. The project focuses on enabling applications to store, index, and retrieve information so that AI systems can incorporate external knowledge when generating responses. It supports scenarios such as document ingestion, semantic search, and retrieval-augmented generation, allowing language models to answer questions using contextual information from private or enterprise datasets. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    SimpleMem

    SimpleMem

    SimpleMem: Efficient Lifelong Memory for LLM Agents

    SimpleMem is a lightweight memory-augmented model framework that helps developers build AI applications that retain long-term context and recall relevant information without overloading model context windows. It provides easy-to-use APIs for storing structured memory entries, querying those memories using semantic search, and retrieving context to augment prompt inputs for downstream processing.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 3
    ChatGLM-6B

    ChatGLM-6B

    ChatGLM-6B: An Open Bilingual Dialogue Language Model

    ChatGLM-6B is an open bilingual (Chinese + English) conversational language model based on the GLM architecture, with approximately 6.2 billion parameters. The project provides inference code, demos (command line, web, API), quantization support for lower memory deployment, and tools for finetuning (e.g., via P-Tuning v2). It is optimized for dialogue and question answering with a balance between performance and deployability in consumer hardware settings. Support for quantized inference (INT4, INT8) to reduce GPU memory requirements. Automatic mode switching between precision/memory tradeoffs (full/quantized).
    Downloads: 11 This Week
    Last Update:
    See Project
  • 4
    MemoryOS

    MemoryOS

    MemoryOS is designed to provide a memory operating system

    MemoryOS is an open-source framework designed to provide a structured memory management system for AI agents and large language model applications. The project addresses one of the major limitations of modern language models: their inability to maintain long-term context beyond the limits of their prompt window. MemoryOS introduces a hierarchical memory architecture inspired by operating system memory management principles, allowing agents to store, update, retrieve, and generate information from multiple layers of memory.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Enterprise-grade ITSM, for every business Icon
    Enterprise-grade ITSM, for every business

    Give your IT, operations, and business teams the ability to deliver exceptional services—without the complexity.

    Freshservice is an intuitive, AI-powered platform that helps IT, operations, and business teams deliver exceptional service without the usual complexity. Automate repetitive tasks, resolve issues faster, and provide seamless support across the organization. From managing incidents and assets to driving smarter decisions, Freshservice makes it easy to stay efficient and scale with confidence.
    Try it Free
  • 5
    AirLLM

    AirLLM

    AirLLM 70B inference with single 4GB GPU

    AirLLM is an open source Python library that enables extremely large language models to run on consumer hardware with very limited GPU memory. The project addresses one of the main barriers to local LLM experimentation by introducing a memory-efficient inference technique that loads model layers sequentially rather than storing the entire model in GPU memory. This layer-wise inference approach allows models with tens of billions of parameters to run on devices with only a few gigabytes of VRAM. ...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 6
    bitsandbytes

    bitsandbytes

    Accessible large language models via k-bit quantization for PyTorch

    bitsandbytes is an open-source library designed to make training and inference of large neural networks more efficient by dramatically reducing memory usage. Built primarily for the PyTorch ecosystem, the library introduces advanced quantization techniques that allow models to operate using reduced numerical precision while maintaining high accuracy. These optimizations enable large language models and other deep learning architectures to run on hardware with limited memory resources, including consumer-grade GPUs. ...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 7
    xLSTM

    xLSTM

    Neural Network architecture based on ideas of the original LSTM

    xLSTM is an open-source machine learning architecture that reimagines the classic Long Short-Term Memory (LSTM) network for modern large-scale language modeling and sequence processing tasks. The project introduces a new recurrent neural network design that incorporates exponential gating mechanisms and enhanced memory structures to overcome limitations of traditional LSTM models. By introducing innovations such as matrix-based memory and improved normalization techniques, xLSTM improves the ability of recurrent networks to capture long-range dependencies in sequential data. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 8
    vLLM

    vLLM

    A high-throughput and memory-efficient inference and serving engine

    vLLM is a fast and easy-to-use library for LLM inference and serving. High-throughput serving with various decoding algorithms, including parallel sampling, beam search, and more.
    Downloads: 17 This Week
    Last Update:
    See Project
  • 9
    LangChain

    LangChain

    ⚡ Building applications with LLMs through composability ⚡

    Large language models (LLMs) are emerging as a transformative technology, enabling developers to build applications that they previously could not. But using these LLMs in isolation is often not enough to create a truly powerful app - the real power comes when you can combine them with other sources of computation or knowledge. This library is aimed at assisting in the development of those types of applications.
    Downloads: 7 This Week
    Last Update:
    See Project
  • Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • 10
    Zep

    Zep

    Zep: A long-term memory store for LLM / Chatbot applications

    Easily add relevant documents, chat history memory & rich user data to your LLM app's prompts. Understands chat messages, roles, and user metadata, not just texts and embeddings. Zep Memory and VectorStore implementations are shipped with your favorite frameworks: LangChain, LangChain.js, LlamaIndex, and more. Automatically embed texts and messages using state-of-the-art opeb source models, OpenAI, or bring your own vectors.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 11
    Neuron AI

    Neuron AI

    The PHP Agentic Framework to build production-ready AI driven apps

    Neuron AI is a PHP agentic framework for building production-ready AI applications that connect models, memory, vector databases, and tools into working agents. It is designed for developers who want to create systems such as RAG pipelines, multi-agent workflows, and business process automations without having to hand-build every integration from scratch. The framework provides an Agent class that can be extended to inherit core capabilities like memory, tools, function calling, and retrieval-augmented generation. ...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 12
    PicoLM

    PicoLM

    Run a 1-billion parameter LLM on a $10 board with 256MB RAM

    PicoLM is an open-source inference framework designed to run large language models on extremely constrained hardware environments such as inexpensive single-board computers and embedded systems. The project focuses on enabling efficient local inference by optimizing memory usage, computation, and system dependencies so that relatively large models can operate on devices with minimal RAM. It is written primarily in C and designed with a minimalist architecture that removes unnecessary dependencies and external libraries. The runtime is capable of running language models with billions of parameters on devices with only a few hundred megabytes of memory, which is significantly lower than typical LLM infrastructure requirements. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 13
    llmfit

    llmfit

    157 models, 30 providers, one command to find what runs on hardware

    llmfit is a terminal-based utility that helps developers determine which large language models can realistically run on their local hardware by analyzing system resources and model requirements. The tool automatically detects CPU, RAM, GPU, and VRAM specifications, then ranks available models based on performance factors such as speed, quality, and memory fit. It provides both an interactive terminal user interface and a traditional CLI mode, enabling flexible workflows for different user preferences. llmfit also supports advanced configurations including multi-GPU setups, mixture-of-experts architectures, and dynamic quantization recommendations. By presenting clear performance estimates and compatibility guidance, the project reduces the trial-and-error typically involved in local LLM experimentation. ...
    Downloads: 31 This Week
    Last Update:
    See Project
  • 14
    MiroFish

    MiroFish

    A Simple and Universal Swarm Intelligence Engine

    ...The system extracts “seed” information from sources such as breaking news, policy documents, and market signals to construct a high-fidelity digital parallel world populated by thousands of virtual agents with independent memory and behavior rules. Users can inject variables or conditions into this simulated environment from a “god’s eye view,” enabling iterative prediction of future trends under different assumptions, which can be useful for decision support, scenario planning, or creative exploration. The engine includes both backend and frontend components, with configuration and deployment instructions for local and containerized setups, and is designed to produce detailed predictive reports based on interactions and emergent patterns within the simulated world.
    Downloads: 340 This Week
    Last Update:
    See Project
  • 15
    R-KV

    R-KV

    Redundancy-aware KV Cache Compression for Reasoning Models

    ...Modern transformer models rely heavily on KV caches during autoregressive decoding, which store intermediate attention states to accelerate generation. However, these caches can consume large amounts of memory, especially in reasoning-oriented models with long context windows. R-KV introduces a method for compressing the KV cache during decoding, allowing models to maintain reasoning performance while reducing memory consumption and computational overhead. The approach focuses on identifying which attention heads and cache components are most important for maintaining reasoning quality, allowing less critical information to be compressed or discarded. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    claude-obsidian

    claude-obsidian

    Claude + Obsidian knowledge companion

    ...The system follows the LLM Wiki pattern, where information is stored as persistent markdown files that grow richer over time through cross-referencing and synthesis. It includes features such as contradiction detection, orphaned note identification, and automatic indexing. A persistent memory layer ensures continuity across sessions, eliminating the need for repeated context. It also performs autonomous research to fill knowledge gaps and expand the knowledge base. Overall, it turns note-taking into an active, compounding intelligence system.
    Downloads: 7 This Week
    Last Update:
    See Project
  • 17
    Mooncake

    Mooncake

    Mooncake is the serving platform for Kimi

    ...Its architecture centers on a high-performance transfer engine that provides unified data transfer across different storage and networking technologies. This engine enables efficient movement of tensors and model data across heterogeneous environments such as GPU memory, system memory, and distributed storage systems. Mooncake also introduces distributed key-value cache storage that allows inference systems to reuse previously computed attention states, significantly improving throughput in large-scale deployments. The system supports advanced networking technologies such as RDMA and NVMe over Fabric, enabling high-speed communication across clusters.
    Downloads: 21 This Week
    Last Update:
    See Project
  • 18
    LangChain Rust

    LangChain Rust

    LangChain for Rust, the easiest way to write LLM-based programs

    ...The library aims to provide Rust developers with a structured framework for orchestrating prompts, chains, agents, and external tools within LLM-driven workflows. By adapting LangChain concepts to the Rust programming language, the project emphasizes performance, safety, and efficient memory management. Developers can use the framework to build chatbots, autonomous agents, and knowledge-augmented AI systems that interact with external data sources. The library provides abstractions for model providers, prompt templates, conversation memory, and vector search integrations. It also enables the construction of multi-step pipelines where LLM outputs feed into subsequent actions or tool calls.
    Downloads: 7 This Week
    Last Update:
    See Project
  • 19
    GPU Hot

    GPU Hot

    Real-time NVIDIA GPU dashboard

    ...The project offers a self-hosted web interface that streams hardware metrics directly from GPU servers, enabling developers, ML engineers, and system administrators to observe GPU utilization and system behavior in real time through a browser. The dashboard collects and displays a wide range of performance metrics including temperature, memory usage, power consumption, clock speeds, fan speed, and active processes. It can scale from monitoring a single GPU workstation to large distributed environments with dozens or even hundreds of GPUs by running lightweight containers on each node and aggregating the data centrally.
    Downloads: 7 This Week
    Last Update:
    See Project
  • 20
    LLM Telegram Bot

    LLM Telegram Bot

    A Telegram bot for Large Language Models

    ...The project is designed to provide a customizable AI assistant that can operate within Telegram conversations, supporting dynamic responses based on user input and configurable parameters. It includes features such as conversation memory, allowing the bot to maintain context across multiple messages and provide more coherent responses. The system supports multiple modes or personas, enabling users to switch between different conversational styles or use cases. It also allows fine-tuning of generation parameters such as temperature and token limits, giving users control over response behavior. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 21
    PEFT

    PEFT

    State-of-the-art Parameter-Efficient Fine-Tuning

    Parameter-Efficient Fine-Tuning (PEFT) methods enable efficient adaptation of pre-trained language models (PLMs) to various downstream applications without fine-tuning all the model's parameters. Fine-tuning large-scale PLMs is often prohibitively costly. In this regard, PEFT methods only fine-tune a small number of (extra) model parameters, thereby greatly decreasing the computational and storage costs. Recent State-of-the-Art PEFT techniques achieve performance comparable to that of full...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 22
    MaiBot

    MaiBot

    Maimaibot, a (more focused) multi-platform intelligent agent

    ...It can generate responses that imitate human speech patterns, learn slang or expressions from chat participants, and adapt its conversational style based on previous interactions. The architecture includes a memory system that stores conversation history and contextual information, allowing the bot to recall previous events and maintain continuity in discussions.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 23
    KVCache-Factory

    KVCache-Factory

    Unified KV Cache Compression Methods for Auto-Regressive Models

    ...In large language models, the key-value cache stores intermediate attention states that enable efficient token generation during inference, but these caches can consume large amounts of GPU memory when handling long contexts. KVCache-Factory provides a platform for implementing and evaluating multiple compression strategies that reduce memory usage while preserving model performance. The framework integrates several state-of-the-art methods such as PyramidKV, SnapKV, H2O, and StreamingLLM, allowing researchers to compare and experiment with different approaches within the same environment. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    uzu

    uzu

    A high-performance inference engine for AI models

    ...The engine implements a hybrid architecture in which model layers can be executed either as custom GPU kernels or through Apple’s MPSGraph API, allowing it to balance performance and compatibility depending on the workload. By utilizing Apple’s unified memory architecture, uzu reduces memory copying overhead and improves inference throughput for local AI workloads. The system includes a simple high-level API that enables developers to run models, create inference sessions, and generate outputs with minimal configuration.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    PowerInfer

    PowerInfer

    High-speed Large Language Model Serving for Local Deployment

    ...Its architecture exploits the observation that only a subset of neurons in large models are frequently activated, allowing the system to preload frequently used neurons into GPU memory while processing less common activations on the CPU. This hybrid execution strategy significantly reduces memory bottlenecks and improves overall inference speed. PowerInfer incorporates specialized algorithms and sparse operators to manage neuron activation patterns and minimize data transfers between hardware components. As a result, it enables powerful language models to run on consumer hardware while achieving performance comparable to more expensive server-grade systems.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • Next
Auth0 Logo