compact free download

GLM-4.5

GLM-4.5: Open-source LLM for intelligent agents by Z.ai

GLM-4.5 is a cutting-edge open-source large language model designed by Z.ai for intelligent agent applications. The flagship GLM-4.5 model has 355 billion total parameters with 32 billion active parameters, while the compact GLM-4.5-Air version offers 106 billion total parameters and 12 billion active parameters. Both models unify reasoning, coding, and intelligent agent capabilities, providing two modes: a thinking mode for complex reasoning and tool usage, and a non-thinking mode for immediate responses. They are released under the MIT license, allowing commercial use and secondary development. ...

1 Review

Downloads: 68 This Week

Last Update: 2026-02-01
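
The parameter counts in the description imply that only a small share of each model's weights is used at a time. A minimal sketch of that arithmetic follows, using only the figures quoted above; the helper name `active_fraction` is made up for illustration.

```python
# Total and active parameter counts (in billions) as quoted in the listing.
models = {
    "GLM-4.5": (355, 32),
    "GLM-4.5-Air": (106, 12),
}

def active_fraction(total_b, active_b):
    """Share of the model's parameters that are active (hypothetical helper)."""
    return active_b / total_b

fractions = {name: active_fraction(t, a) for name, (t, a) in models.items()}

for name, frac in fractions.items():
    print(f"{name}: {frac:.1%} of parameters active")
```

Both variants activate roughly a tenth of their total parameters, which is why the active count, rather than the total, is the better guide to per-request compute.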

nano-graphrag

A simple, easy-to-hack GraphRAG implementation

...GraphRAG expands traditional RAG pipelines by constructing knowledge graphs from documents and using relationships between entities to improve the quality and reasoning of AI responses. The nano-GraphRAG project focuses on reducing complexity by providing a compact and readable codebase that preserves the core functionality of graph-based retrieval systems while remaining easy to modify and extend. The system extracts entities and relationships from documents using language models and organizes them into graph structures that can be queried during generation. Developers can integrate different storage backends and embedding engines, including vector databases and graph databases such as Neo4j, allowing flexible experimentation with hybrid retrieval methods.

Downloads: 1 This Week

Last Update: 2026-03-05
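
The graph-based retrieval idea described above can be sketched in a few lines. This is not nano-graphrag's API: the triples are hard-coded stand-ins for what a language-model extraction step might produce, and `build_graph` and `neighborhood_facts` are hypothetical names.

```python
from collections import defaultdict

# Hypothetical (subject, relation, object) triples standing in for LLM output.
triples = [
    ("Ada Lovelace", "collaborated_with", "Charles Babbage"),
    ("Charles Babbage", "designed", "Analytical Engine"),
    ("Ada Lovelace", "wrote_notes_on", "Analytical Engine"),
]

def build_graph(triples):
    """Index every triple under both of the entities it connects."""
    graph = defaultdict(list)
    for triple in triples:
        subj, _, obj = triple
        graph[subj].append(triple)
        graph[obj].append(triple)
    return graph

def neighborhood_facts(graph, entity, hops=1):
    """Collect triples reachable within `hops` edges of `entity`, breadth-first."""
    seen_nodes, frontier, facts = {entity}, [entity], []
    for _ in range(hops):
        next_frontier = []
        for node in frontier:
            for triple in graph.get(node, []):
                if triple not in facts:
                    facts.append(triple)
                for other in (triple[0], triple[2]):
                    if other not in seen_nodes:
                        seen_nodes.add(other)
                        next_frontier.append(other)
        frontier = next_frontier
    return facts

graph = build_graph(triples)
context = neighborhood_facts(graph, "Analytical Engine", hops=1)
for subj, rel, obj in context:
    print(f"{subj} {rel} {obj}")
```

In a full pipeline the collected facts would be placed in the prompt alongside vector-retrieved passages; raising `hops` widens the context at the cost of more tokens.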

llm.c

LLM training in simple, raw C/CUDA

...By stripping away heavy frameworks, it exposes the core math and memory flows of embeddings, attention, and feed-forward layers. The code illustrates how to wire forward passes, losses, and simple training or inference loops with direct control over arrays and buffers. Its compact design makes it easy to trace execution, profile hotspots, and understand the cost of each operation. Portability is a goal: it aims to compile with common toolchains and run on modest hardware for small experiments. Rather than delivering a production-grade stack, it serves as a reference and learning scaffold for people who want to “see the metal” behind LLMs.

Downloads: 0 This Week

Last Update: 2025-10-15

VibeThinker

Diversity-driven optimization and large-model reasoning ability

VibeThinker is a compact but high-capability open-source language model released by WeiboAI (Sina AI Lab). It contains about 1.5 billion parameters, far smaller than many “frontier” models, yet it is explicitly optimized for reasoning, mathematics, and code generation tasks rather than general open-domain chat. The innovation lies in its training methodology: the team uses what they call the Spectrum-to-Signal Principle (SSP), where a first stage emphasizes diversity of reasoning paths (the “spectrum” phase) and a second stage uses reinforcement techniques (the “signal” phase) to refine toward correctness and strong reasoning. ...

Downloads: 8 This Week

Last Update: 6 days ago

SkillOpt

Text-space optimizer that trains reusable natural-language skills

SkillOpt is a Microsoft research project for improving frozen LLM agents by optimizing reusable natural-language skill documents. Instead of changing model weights, it treats a compact skill file as the trainable state of the agent. The system learns from agent rollouts, reflection, bounded edits, and validation gates to produce better instructions over time. Its output is a deployable best_skill.md artifact that can be reused across agent tasks. The project is focused on making agents more effective through text-space optimization rather than traditional fine-tuning. ...

Downloads: 1 This Week

Last Update: 2026-06-02

ChatGLM3

ChatGLM3 series: Open Bilingual Chat LLMs

...The family includes base and long-context variants (8K/32K/128K). The repo ships Python APIs, CLI and web demos (Gradio/Streamlit), an OpenAI-format API server, and a compact fine-tuning kit. Quantization (4/8-bit), CPU/MPS support, and accelerator backends (TensorRT-LLM, OpenVINO, chatglm.cpp) enable lightweight local or edge deployment.

Downloads: 2 This Week

Last Update: 1 day ago

Nano-vLLM

A lightweight vLLM implementation built from scratch

...The project recreates the core functionality of vLLM in a simplified architecture written in approximately a thousand lines of Python, making it easier for developers and researchers to understand how modern LLM inference systems work. Despite its compact design, nano-vllm incorporates advanced optimization techniques such as prefix caching, tensor parallelism, and CUDA graph execution to achieve high performance during model inference. The engine is intended primarily for educational use, experimentation, and lightweight deployments where a full production-grade inference stack may be unnecessary. ...

Downloads: 1 This Week

Last Update: 2026-04-26
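
Prefix caching, one of the optimizations named above, can be illustrated with a toy block cache. This is a sketch of the general technique under simplifying assumptions, not nano-vllm's actual code: the block size, the `PrefixCache` class, and the placeholder values are all invented for the example.

```python
import hashlib

BLOCK_SIZE = 4  # tokens per cache block; an arbitrary illustrative value

def block_hashes(token_ids, block_size=BLOCK_SIZE):
    """Hash each full block chained with the previous block's hash,
    so a block's key identifies the entire prefix leading up to it."""
    hashes, prev = [], b""
    full = len(token_ids) - len(token_ids) % block_size
    for start in range(0, full, block_size):
        block = token_ids[start:start + block_size]
        digest = hashlib.sha256(prev + repr(block).encode()).digest()
        hashes.append(digest)
        prev = digest
    return hashes

class PrefixCache:
    """Toy cache: maps block hashes to a placeholder for key/value tensors."""

    def __init__(self):
        self.blocks = {}

    def match_and_insert(self, token_ids):
        """Return how many leading blocks were already cached, then cache the rest."""
        hashes = block_hashes(token_ids)
        reused = 0
        for h in hashes:
            if h not in self.blocks:
                break
            reused += 1
        for h in hashes[reused:]:
            self.blocks[h] = "kv-placeholder"
        return reused

cache = PrefixCache()
system_prompt = [1, 2, 3, 4, 5, 6, 7, 8]  # shared prefix of two requests
first = cache.match_and_insert(system_prompt + [9, 10, 11, 12])
second = cache.match_and_insert(system_prompt + [20, 21, 22, 23])
print(first, second)
```

The second request reuses the two blocks covering the shared system prompt, so only its final block would need fresh attention computation. Chaining the hashes ensures a block is reused only when everything before it matches as well.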

TONL

TONL (Token-Optimized Notation Language)

TONL is a data platform built around a serialization format designed to be compact yet human-readable, with performance features aimed at large-scale applications and AI workflows. The format uses fewer tokens than equivalent JSON, which can lower costs and make better use of the available prompt size in LLM-driven systems.

Downloads: 0 This Week

Last Update: 2026-02-07

Chat with LLMs Everywhere

Run PyTorch LLMs locally on servers, desktop and mobile

TorchChat is an open-source project from the PyTorch ecosystem designed to demonstrate how large language models can be executed efficiently across different computing environments. The project provides a compact codebase that illustrates how to run conversational AI systems using PyTorch models on laptops, servers, and mobile devices. It is intended primarily as a reference implementation that shows developers how to integrate large language models into applications without requiring a large or complex infrastructure stack. TorchChat supports running models through Python interfaces as well as integrating them directly into native applications written in languages such as C or C++. ...

Downloads: 0 This Week

Last Update: 2026-03-05

llama2.c

Inference Llama 2 in one file of pure C

...While it can technically load Meta’s official Llama 2 models, current support is limited to fp32 precision, meaning practical use is capped at models up to around 7B parameters. The goal of llama2.c is to demonstrate how a compact and transparent implementation can perform meaningful inference even with small models, emphasizing simplicity, clarity, and accessibility. The project builds upon lessons from nanoGPT and takes inspiration from llama.cpp, focusing instead on minimalism and educational value over large-scale performance.

Downloads: 2 This Week

Last Update: 2026-06-15
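
The fp32 limit mentioned above comes down to simple arithmetic: four bytes per weight. A rough sketch follows, counting weights only and ignoring activations and the KV cache. The fp16 and int8 rows are included purely for comparison and are not a statement about what llama2.c supports.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

def weight_bytes(num_params, dtype="fp32"):
    """Memory needed for the weights alone, in bytes."""
    return num_params * BYTES_PER_PARAM[dtype]

def to_gib(num_bytes):
    """Convert bytes to gibibytes (2**30 bytes)."""
    return num_bytes / 2**30

size_7b = weight_bytes(7_000_000_000, "fp32")

for dtype in BYTES_PER_PARAM:
    gib = to_gib(weight_bytes(7_000_000_000, dtype))
    print(f"7B parameters at {dtype}: {gib:.1f} GiB")
```

At fp32 a 7B-parameter model needs about 26 GiB for its weights, which is already near the limit of what a typical workstation can hold in memory.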

Cake

Distributed LLM and StableDiffusion inference

Cake is a compact framework for distributed inference of large models such as LLMs and Stable Diffusion. It splits a model across multiple devices so that hardware too small to hold the whole model can serve it together, which makes it possible to pool consumer machines, including mobile and desktop devices, into a single inference cluster.

Downloads: 0 This Week

Last Update: 2026-04-24

Llama 2 Everywhere (L2E)

Llama 2 Everywhere (L2E) is an open-source implementation of the LLaMA-2 large language model architecture designed to demonstrate how transformer-based language models can be executed with extremely minimal code. The project focuses on simplicity and educational clarity by implementing inference for LLaMA-style models in a compact C program rather than relying on large machine learning frameworks. Developers can train models using a Python training pipeline and then run inference using a lightweight C implementation that requires very few dependencies. The architecture mirrors the structure of the LLaMA-2 model family, allowing compatible model checkpoints to be converted and executed within the simplified runtime environment. ...

Downloads: 0 This Week

Last Update: 2026-03-06

ZAYA1-8B

Efficient MoE reasoning model for coding and math workloads

ZAYA1-8B is a compact Mixture-of-Experts reasoning model developed by Zyphra, designed to deliver strong capability per unit of compute with fewer than 1 billion active parameters. The model contains 8.4B total parameters, of which around 760M are active during inference, allowing it to achieve strong reasoning, mathematics, and coding performance while remaining lightweight enough for efficient local or on-device deployment.

Downloads: 0 This Week

Last Update: 2026-05-08

Search Results for "compact"

Showing 13 open source projects for "compact"

GLM-4.5

nano-graphrag

llm.c

VibeThinker

SkillOpt

ChatGLM3

Nano-vLLM

TONL

Chat with LLMs Everywhere

llama2.c

Cake

Llama 2 Everywhere (L2E)

ZAYA1-8B
