c memory allocator free download

Kernel Memory

Research project. A Memory solution for users, teams, and applications

Kernel Memory is an open-source reference architecture developed by Microsoft to help developers build memory systems for AI applications powered by large language models. The project focuses on enabling applications to store, index, and retrieve information so that AI systems can incorporate external knowledge when generating responses. It supports scenarios such as document ingestion, semantic search, and retrieval-augmented generation, allowing language models to answer questions using...

Downloads: 0 This Week

Last Update: 2026-03-06

See Project

PicoLM

Run a 1-billion parameter LLM on a $10 board with 256MB RAM

PicoLM is an open-source inference framework designed to run large language models on extremely constrained hardware environments such as inexpensive single-board computers and embedded systems. The project focuses on enabling efficient local inference by optimizing memory usage, computation, and system dependencies so that relatively large models can operate on devices with minimal RAM. It is written primarily in C and designed with a minimalist architecture that removes unnecessary dependencies and external libraries. The runtime is capable of running language models with billions of parameters on devices with only a few hundred megabytes of memory, which is significantly lower than typical LLM infrastructure requirements. ...

Downloads: 0 This Week

Last Update: 2026-03-09

See Project

llama.cpp

LLM inference in C/C++

llama.cpp is a high-performance C and C++ project for running large language models locally and in the cloud with minimal setup. It is built around efficient inference, broad hardware support, and the GGUF model format. The project supports many model families and has become a major foundation for local AI tools, model serving, and embedded inference workflows. It provides command-line tools, a server mode with an OpenAI-compatible API style, model conversion utilities, and extensive backend...

Downloads: 15 This Week

Last Update: 10 hours ago

See Project

PowerInfer

High-speed Large Language Model Serving for Local Deployment

PowerInfer is a high-performance inference engine designed to run large language models efficiently on personal computers equipped with consumer-grade GPUs. The project focuses on improving the performance of local AI inference by optimizing how neural network computations are distributed between CPU and GPU resources. Its architecture exploits the observation that only a subset of neurons in large models are frequently activated, allowing the system to preload frequently used neurons into...

Downloads: 0 This Week

Last Update: 2026-05-11

See Project

Tencent-Hunyuan-Large

Open-source large language model family from Tencent Hunyuan

...It is designed with long-context capabilities, quantization support, and high performance on benchmarks across general reasoning, mathematics, language understanding, and Chinese / multilingual tasks. It aims to provide competitive capability with efficient deployment and inference. FP8 quantization support to reduce memory usage (~50%) while maintaining precision. High benchmarking performance on tasks like MMLU, MATH, CMMLU, C-Eval, etc.

Downloads: 2 This Week

Last Update: 2025-09-24

See Project

llm.c

LLM training in simple, raw C/CUDA

llm.c is a minimalist, systems-level implementation of a small transformer-based language model in C that prioritizes clarity and educational value. By stripping away heavy frameworks, it exposes the core math and memory flows of embeddings, attention, and feed-forward layers. The code illustrates how to wire forward passes, losses, and simple training or inference loops with direct control over arrays and buffers. Its compact design makes it easy to trace execution, profile hotspots, and understand the cost of each operation. ...

Downloads: 0 This Week

Last Update: 2025-10-15

See Project

ChatGLM.cpp

C++ implementation of ChatGLM-6B & ChatGLM2-6B & ChatGLM3 & GLM4(V)

ChatGLM.cpp is a C++ implementation of the ChatGLM-6B model, enabling efficient local inference without requiring a Python environment. It is optimized for running on consumer hardware.

Downloads: 0 This Week

Last Update: 2025-01-21

See Project

Mooncake

Mooncake is the serving platform for Kimi

Mooncake is an open-source infrastructure platform designed to optimize large language model serving by focusing on efficient management and transfer of model data and KV cache. The platform was originally developed as part of the serving infrastructure for the Kimi large language model system. Its architecture centers on a high-performance transfer engine that provides unified data transfer across different storage and networking technologies. This engine enables efficient movement of...

Downloads: 0 This Week

Last Update: 2026-05-24

See Project

tt-metal

TT-NN operator library, and TT-Metalium low level kernel programming

tt-metal, also referred to in its documentation as TT-Metalium, is Tenstorrent’s low-level software development kit for programming applications on Tenstorrent AI accelerators. The project is designed for developers who need direct access to the company’s Tensix processor architecture, exposing a programming model that is closer to hardware control than high-level inference frameworks. Instead of following a traditional GPU model centered on massive thread parallelism, the platform is built...

Downloads: 0 This Week

Last Update: 4 days ago

See Project

RTP-LLM

Alibaba's high-performance LLM inference engine for diverse apps

RTP-LLM is an open-source large language model inference acceleration engine developed by Alibaba to provide high-performance serving infrastructure for modern LLM deployments. The system focuses on improving throughput, latency, and resource utilization when running large models in production environments. It achieves this by implementing optimized GPU kernels, batching strategies, and memory management techniques tailored for transformer inference workloads. The framework is designed for...

Downloads: 0 This Week

Last Update: 2026-03-09

See Project

nndeploy

An Easy-to-Use and High-Performance AI Deployment Framework

nndeploy is an open-source framework designed to simplify the deployment of artificial intelligence models across multiple hardware platforms and devices. The framework focuses on making it easier to transform trained AI models into production-ready applications that can run efficiently on desktops, mobile devices, servers, and edge computing hardware. Developers can use visual workflows to design and configure AI processing pipelines by connecting modular nodes that represent different...

Downloads: 0 This Week

Last Update: 2026-04-04

See Project

pgvecto.rs

Vector database plugin for Postgres, written in Rust

pgvecto.rs is a Postgres extension that provides vector similarity search functions. It is written in Rust and based on pgrx. It is currently under heavy development, please take care when using it in production. pgvecto.rs is a Postgres extension, which means that you can use it directly within your existing database. This makes it easy to integrate into your existing workflows and applications. pgvecto.rs supports filtering. You can set conditions when searching or retrieving points. This...

Downloads: 0 This Week

Last Update: 2024-11-22

See Project

Search Results for "c memory allocator"

Showing 12 open source projects for "c memory allocator"

Kernel Memory

PicoLM

llama.cpp

PowerInfer

Tencent-Hunyuan-Large

llm.c

ChatGLM.cpp

Mooncake

tt-metal

RTP-LLM

nndeploy

pgvecto.rs

Search Results for "c memory allocator"

Showing 12 open source projects for "c memory allocator"

Kernel Memory

PicoLM

llama.cpp

PowerInfer

Tencent-Hunyuan-Large

llm.c

ChatGLM.cpp

Mooncake

tt-metal

RTP-LLM

nndeploy

pgvecto.rs

Related Categories