
GLM-4.5

GLM-4.5: Open-source LLM for intelligent agents by Z.ai

GLM-4.5 is a cutting-edge open-source large language model designed by Z.ai for intelligent agent applications. The flagship GLM-4.5 model has 355 billion total parameters with 32 billion active parameters, while the compact GLM-4.5-Air version offers 106 billion total parameters and 12 billion active parameters. Both models unify reasoning, coding, and intelligent agent capabilities, providing two modes: a thinking mode for complex reasoning and tool usage, and a non-thinking mode for immediate responses. They are released under the MIT license, allowing commercial use and secondary development. ...

1 Review

Downloads: 68 This Week

Last Update: 2026-02-01

See Project

nano-graphrag

A simple, easy-to-hack GraphRAG implementation

...GraphRAG extends traditional RAG pipelines by constructing knowledge graphs from documents and using relationships between entities to improve the quality and reasoning of AI responses. The nano-graphrag project focuses on reducing complexity by providing a compact and readable codebase that preserves the core functionality of graph-based retrieval systems while remaining easy to modify and extend. The system extracts entities and relationships from documents using language models and organizes them into graph structures that can be queried during generation. Developers can integrate different storage backends and embedding engines, including vector databases and graph databases such as Neo4j, allowing flexible experimentation with hybrid retrieval methods.

Downloads: 1 This Week

Last Update: 2026-03-05

See Project
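The graph-based retrieval described for nano-graphrag can be illustrated with a toy example. This is a minimal sketch, not nano-graphrag's API: the triples are hand-written stand-ins for what a language model would extract, and `build_graph` and `retrieve_context` are hypothetical helper names.

```python
from collections import defaultdict

# Hypothetical (subject, relation, object) triples. In a real pipeline a
# language model would extract these from the source documents.
TRIPLES = [
    ("Ada Lovelace", "collaborated_with", "Charles Babbage"),
    ("Charles Babbage", "designed", "Analytical Engine"),
    ("Ada Lovelace", "wrote_notes_on", "Analytical Engine"),
]

def build_graph(triples):
    """Index every triple under both of its entities."""
    graph = defaultdict(list)
    for triple in triples:
        subj, _rel, obj = triple
        graph[subj].append(triple)
        graph[obj].append(triple)
    return graph

def retrieve_context(graph, question, hops=1):
    """Collect facts within `hops` edges of entities named in the question."""
    q = question.lower()
    frontier = {entity for entity in graph if entity.lower() in q}
    visited = set(frontier)
    facts = []
    for _ in range(hops):
        next_frontier = set()
        for entity in sorted(frontier):
            for subj, rel, obj in graph[entity]:
                fact = f"{subj} {rel} {obj}"
                if fact not in facts:
                    facts.append(fact)
                for node in (subj, obj):
                    if node not in visited:
                        visited.add(node)
                        next_frontier.add(node)
        frontier = next_frontier
    return facts

graph = build_graph(TRIPLES)
for fact in retrieve_context(graph, "What did Charles Babbage design?", hops=2):
    print(fact)
```

The retrieved facts would then be placed in the prompt as context. Raising `hops` pulls in relationships that a plain similarity search over text chunks could miss, at the cost of more context to fit in the prompt.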

llm.c

LLM training in simple, raw C/CUDA

...By stripping away heavy frameworks, it exposes the core math and memory flows of embeddings, attention, and feed-forward layers. The code illustrates how to wire forward passes, losses, and simple training or inference loops with direct control over arrays and buffers. Its compact design makes it easy to trace execution, profile hotspots, and understand the cost of each operation. Portability is a goal: it aims to compile with common toolchains and run on modest hardware for small experiments. Rather than delivering a production-grade stack, it serves as a reference and learning scaffold for people who want to “see the metal” behind LLMs.

Downloads: 0 This Week

Last Update: 2025-10-15

See Project

VibeThinker

Diversity-driven optimization and large-model reasoning ability

VibeThinker is a compact but high-capability open-source language model released by WeiboAI (Sina AI Lab). It contains about 1.5 billion parameters, far smaller than many “frontier” models, yet it is explicitly optimized for reasoning, mathematics, and code generation tasks rather than general open-domain chat. The innovation lies in its training methodology: the team uses what they call the Spectrum-to-Signal Principle (SSP), where a first stage emphasizes diversity of reasoning paths (the “spectrum” phase) and a second stage uses reinforcement techniques (the “signal” phase) to refine toward correctness and strong reasoning. ...

Downloads: 8 This Week

Last Update: 5 days ago

See Project

SkillOpt

Text-space optimizer that trains reusable natural-language skills

SkillOpt is a Microsoft research project for improving frozen LLM agents by optimizing reusable natural-language skill documents. Instead of changing model weights, it treats a compact skill file as the trainable state of the agent. The system learns from agent rollouts, reflection, bounded edits, and validation gates to produce better instructions over time. Its output is a deployable best_skill.md artifact that can be reused across agent tasks. The project is focused on making agents more effective through text-space optimization rather than traditional fine-tuning. ...

Downloads: 1 This Week

Last Update: 2026-06-02

See Project

ChatGLM3

ChatGLM3 series: Open Bilingual Chat LLMs

...The family includes base and long-context variants (8K/32K/128K). The repo ships Python APIs, CLI and web demos (Gradio/Streamlit), an OpenAI-format API server, and a compact fine-tuning kit. Quantization (4/8-bit), CPU/MPS support, and accelerator backends (TensorRT-LLM, OpenVINO, chatglm.cpp) enable lightweight local or edge deployment.

Downloads: 2 This Week

Last Update: 13 hours ago

See Project

Nano-vLLM

A lightweight vLLM implementation built from scratch

...The project recreates the core functionality of vLLM in a simplified architecture written in approximately a thousand lines of Python, making it easier for developers and researchers to understand how modern LLM inference systems work. Despite its compact design, nano-vllm incorporates advanced optimization techniques such as prefix caching, tensor parallelism, and CUDA graph execution to achieve high performance during model inference. The engine is intended primarily for educational use, experimentation, and lightweight deployments where a full production-grade inference stack may be unnecessary. ...

Downloads: 1 This Week

Last Update: 2026-04-26

See Project
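Prefix caching, one of the optimizations named in the Nano-vLLM description, can be sketched in a few lines. The code below is a toy illustration under stated assumptions, not Nano-vLLM's implementation: `PrefixCache`, `block_hashes`, and the block size are hypothetical, and a string placeholder stands in for the cached key/value tensors a real engine would store.

```python
import hashlib

BLOCK_SIZE = 4  # tokens per cache block (illustrative value)

def block_hashes(tokens, block_size=BLOCK_SIZE):
    """Hash each full block chained with the hash of the blocks before it."""
    hashes = []
    prev = b""
    full_length = len(tokens) - len(tokens) % block_size
    for start in range(0, full_length, block_size):
        block = tokens[start:start + block_size]
        digest = hashlib.sha256(prev + repr(block).encode()).digest()
        hashes.append(digest)
        prev = digest
    return hashes

class PrefixCache:
    def __init__(self):
        self._blocks = {}  # chained hash -> placeholder for a cached KV block

    def match_and_insert(self, tokens):
        """Return how many leading tokens were cached, then cache the rest."""
        hashes = block_hashes(tokens)
        hits = 0
        for h in hashes:
            if h not in self._blocks:
                break
            hits += 1
        for h in hashes[hits:]:
            self._blocks[h] = "kv-block"
        return hits * BLOCK_SIZE

cache = PrefixCache()
shared_prefix = list(range(8))
print(cache.match_and_insert(shared_prefix + [20, 21]))          # first request
print(cache.match_and_insert(shared_prefix + [99, 100, 101, 102]))  # reuses prefix
```

Chaining each hash with the previous one means a block only matches when the entire preceding prefix matches too, which is what makes reuse of the cached state safe.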

Chat with LLMs Everywhere

Run PyTorch LLMs locally on servers, desktop and mobile

TorchChat is an open-source project from the PyTorch ecosystem designed to demonstrate how large language models can be executed efficiently across different computing environments. The project provides a compact codebase that illustrates how to run conversational AI systems using PyTorch models on laptops, servers, and mobile devices. It is intended primarily as a reference implementation that shows developers how to integrate large language models into applications without requiring a large or complex infrastructure stack. TorchChat supports running models through Python interfaces as well as integrating them directly into native applications written in languages such as C or C++. ...

Downloads: 0 This Week

Last Update: 2026-03-05

See Project

llama2.c

Inference Llama 2 in one file of pure C

...While it can technically load Meta’s official Llama 2 models, current support is limited to fp32 precision, so practical use is capped at models of around 7B parameters. The goal of llama2.c is to demonstrate how a compact and transparent implementation can perform meaningful inference even with small models, emphasizing simplicity, clarity, and accessibility. The project builds on lessons from nanoGPT and takes inspiration from llama.cpp, but prioritizes minimalism and educational value over large-scale performance.

Downloads: 2 This Week

Last Update: 7 days ago

See Project
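The fp32 limit noted for llama2.c can be made concrete with a little arithmetic: fp32 stores each weight in 4 bytes, so weight memory grows linearly with parameter count. The sketch below is not part of llama2.c. The function name and the list of model sizes are illustrative, and it counts weights only, ignoring activations and the KV cache.

```python
BYTES_PER_FP32 = 4  # one 32-bit float per weight

def fp32_weight_gib(n_params: float) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    return n_params * BYTES_PER_FP32 / 2**30

for label, n_params in [("7B", 7e9), ("13B", 13e9), ("70B", 70e9)]:
    print(f"{label}: about {fp32_weight_gib(n_params):.1f} GiB of fp32 weights")
```

At roughly 26 GiB for a 7B model, the weights already approach the memory of a well-equipped workstation, which is consistent with the practical ceiling described above.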

Search Results for "compact"

Showing 9 open source projects for "compact"

