sparse free download - SourceForge

GLM-5

From Vibe Coding to Agentic Engineering

...Building on earlier GLM series models, GLM-5 scales the parameter count to roughly 744 billion and expands the pre-training data, improving performance over predecessors such as GLM-4.5 on complex tasks such as multi-step reasoning, software engineering workflows, and agent orchestration. It incorporates DeepSeek Sparse Attention (DSA) to reduce deployment costs while preserving long-context capacity, which is crucial for detailed plans and agent tasks.

Downloads: 100 This Week

Last Update: 2026-05-15


GLM-5.1

GLM-5: From Vibe Coding to Agentic Engineering

...GLM-5.1 is designed to remain effective over extended problem-solving sessions, allowing it to iteratively refine strategies, analyze failures, and sustain productivity across hundreds of reasoning cycles and tool calls. The model leverages large-scale pretraining, reinforcement learning infrastructure, and sparse attention mechanisms to improve efficiency while maintaining strong long-context understanding. It supports deployment through frameworks such as vLLM, SGLang, xLLM, and KTransformers, enabling scalable local inference for enterprise and research use cases.

Downloads: 31 This Week

Last Update: 2026-05-15


PowerInfer

High-speed Large Language Model Serving for Local Deployment

...Its architecture exploits the observation that only a subset of neurons in large models is frequently activated: the system preloads these frequently used neurons into GPU memory and processes less common activations on the CPU. This hybrid execution strategy significantly reduces memory bottlenecks and improves overall inference speed. PowerInfer incorporates specialized algorithms and sparse operators to manage neuron activation patterns and minimize data transfers between hardware components. As a result, it enables powerful language models to run on consumer hardware while achieving performance comparable to more expensive server-grade systems.
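The hot/cold neuron split described above can be illustrated with a minimal Python sketch. It is not PowerInfer's code: the function names, the activation counts, and the fixed `gpu_capacity` budget are all hypothetical, and a real system would profile activations and schedule actual GPU and CPU kernels.

```python
def partition_neurons(activation_counts, gpu_capacity):
    """Split neuron indices into a GPU-resident 'hot' set and a CPU 'cold' set.

    activation_counts[i] is how often neuron i fired during profiling
    (hypothetical data); gpu_capacity is how many neurons fit on the GPU.
    """
    # Rank neurons from most to least frequently activated (ties by index).
    ranked = sorted(range(len(activation_counts)),
                    key=lambda i: (-activation_counts[i], i))
    hot = set(ranked[:gpu_capacity])
    cold = set(ranked[gpu_capacity:])
    return hot, cold


def route(active_neurons, hot):
    """Decide which device handles each neuron active for the current token."""
    on_gpu = [n for n in active_neurons if n in hot]
    on_cpu = [n for n in active_neurons if n not in hot]
    return on_gpu, on_cpu


# Hypothetical profiling result for six neurons.
counts = [900, 15, 720, 3, 410, 8]
hot, cold = partition_neurons(counts, gpu_capacity=3)

# For a token that activates neurons 0, 1 and 4, only neuron 1 falls to the CPU.
on_gpu, on_cpu = route([0, 1, 4], hot)
```

Because the frequently activated neurons are already resident on the GPU, most of the work for a typical token avoids any transfer between devices.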

Downloads: 0 This Week

Last Update: 2026-05-11


Ling

Ling is a MoE LLM provided and open-sourced by InclusionAI

Ling is a Mixture-of-Experts (MoE) large language model (LLM) provided and open-sourced by inclusionAI. The project offers different sizes (Ling-lite, Ling-plus) and emphasizes flexibility and efficiency: the ability to scale, adapt expert activation, and perform across a range of natural language and reasoning tasks. The codebase includes inference pipelines, example scripts, models, documentation, and model download infrastructure. As more developers and...

Downloads: 0 This Week

Last Update: 2025-09-30


Ring

Ring is a reasoning MoE LLM provided and open-sourced by InclusionAI

Ring is a reasoning Mixture-of-Experts (MoE) large language model (LLM) developed by inclusionAI and derived from Ling. Its design emphasizes reasoning, efficiency, and modular expert activation. The “flash” variant (Ring-flash-2.0) optimizes inference by activating only a subset of experts. The model applies reinforcement learning and reasoning optimization techniques, and its architecture and training approach are tuned for efficient, capable reasoning....

Downloads: 0 This Week

Last Update: 2025-09-30


kg-gen

Knowledge Graph Generation from Any Text

kg-gen is an open-source framework developed by the STAIR Lab that automatically generates knowledge graphs from unstructured text using large language models. The system transforms plain text sources such as documents, articles, or conversation transcripts into structured graphs composed of entities and relationships. Instead of relying on traditional rule-based extraction techniques, kg-gen uses language models to identify entities and their relationships, producing...

Downloads: 0 This Week

Last Update: 2026-03-09


Mixtral offloading

Run Mixtral-8x7B models in Colab or consumer desktops

...The project implements techniques that dynamically move model components between CPU memory and GPU memory during inference, significantly reducing the amount of GPU VRAM required to run the model. This approach takes advantage of the sparse activation of mixture-of-experts architectures, where only a subset of expert networks is used for each token during generation. By selectively loading and caching the required experts, the system avoids keeping the entire model in GPU memory at once. The repository includes notebooks and code examples that demonstrate how to run large language models on consumer hardware such as personal GPUs or cloud notebook environments.
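The idea of selectively loading and caching experts can be sketched as a small cache that holds only a few experts "on the GPU" at a time. This is an illustration, not the repository's implementation: the `ExpertCache` class, the `fake_load` function, and the choice of least-recently-used eviction are assumptions made for the example.

```python
from collections import OrderedDict


class ExpertCache:
    """Keep at most `capacity` experts resident; evict the least recently used."""

    def __init__(self, capacity, load_fn):
        self.capacity = capacity
        self.load_fn = load_fn        # hypothetical CPU-to-GPU transfer
        self._cache = OrderedDict()   # expert_id -> weights, oldest first
        self.hits = 0
        self.misses = 0

    def get(self, expert_id):
        if expert_id in self._cache:
            # Already resident: mark as most recently used.
            self._cache.move_to_end(expert_id)
            self.hits += 1
            return self._cache[expert_id]
        self.misses += 1
        if len(self._cache) >= self.capacity:
            # Make room by dropping the least recently used expert.
            self._cache.popitem(last=False)
        weights = self.load_fn(expert_id)
        self._cache[expert_id] = weights
        return weights

    def resident(self):
        """Expert ids currently cached, from least to most recently used."""
        return list(self._cache)


def fake_load(expert_id):
    # Stand-in for copying an expert's weights from CPU memory to the GPU.
    return f"weights-for-expert-{expert_id}"


cache = ExpertCache(capacity=2, load_fn=fake_load)
for expert_id in [0, 3, 0, 5, 3]:   # experts picked by the router, token by token
    cache.get(expert_id)
```

With room for two experts, the repeated request for expert 0 is served from the cache, while every other request triggers a load. Larger caches trade VRAM for fewer transfers.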

Downloads: 0 This Week

Last Update: 2026-03-06


LLaMA-MoE

Building Mixture-of-Experts from LLaMA with Continual Pre-training

...The repository is centered on making MoE research more accessible by offering smaller and more affordable models with only about 3.0 to 3.5 billion activated parameters, which helps reduce deployment and experimentation costs. Its architecture works by splitting LLaMA feed-forward networks into sparse experts and adding gating mechanisms so that only selected experts are activated during inference and training. The project is not just a model release, but also a research framework that includes multiple expert construction methods, several gating strategies, and tooling for continual pre-training on filtered SlimPajama-based datasets. ...
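As a rough sketch of splitting a feed-forward layer into experts and gating them, the Python below partitions the intermediate neurons into equal contiguous groups and picks the top-k experts per token. The description mentions several construction methods and gating strategies. The contiguous split and the softmax over the selected experts shown here are assumptions chosen for simplicity, and the function names are hypothetical.

```python
import math


def split_neurons(num_neurons, num_experts):
    """Partition FFN intermediate neuron indices into equal, disjoint experts.

    Assumes num_neurons is divisible by num_experts.
    """
    size = num_neurons // num_experts
    return [list(range(e * size, (e + 1) * size)) for e in range(num_experts)]


def top_k_gate(gate_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    top = sorted(range(len(gate_logits)), key=lambda i: -gate_logits[i])[:k]
    # Normalize with a softmax over the selected experts only.
    exps = [math.exp(gate_logits[i]) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# A toy FFN with 16 intermediate neurons split into 4 experts of 4 neurons each.
experts = split_neurons(num_neurons=16, num_experts=4)

# Hypothetical gate scores for one token; only 2 of the 4 experts are activated.
selected = top_k_gate([0.1, 2.0, -1.0, 1.5], k=2)
```

Only the neurons belonging to the selected experts take part in the forward pass for that token, which is why the activated parameter count is much smaller than the total.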

Downloads: 1 This Week

Last Update: 2026-03-10


Command A+

4-bit Command A+ model for enterprise agents and multilingual tasks

Command A+ 05-2026 W4A4 is a 4-bit quantized version of Cohere’s open-source Command A+ model, optimized for enterprise-grade agentic, multilingual, and reasoning-heavy workloads. It supports text and image inputs, generates text outputs, and uses a sparse Mixture-of-Experts Transformer architecture with 218B total parameters and 25B active parameters. The W4A4 release applies 4-bit weight and activation quantization mainly to MoE experts, preserving attention components at full precision to reduce quality loss while improving speed, latency, and hardware efficiency. Cohere recommends W4A4 for most users because it offers a smaller hardware footprint with negligible benchmark differences compared to BF16 and FP8 versions. ...
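To make "4-bit quantization" concrete, here is a minimal sketch of symmetric per-tensor quantization to the signed 4-bit range [-8, 7]. The listing does not describe the exact scheme used for the W4A4 release, so the scaling rule, function names, and sample values below are assumptions for illustration only.

```python
def quantize_int4(values):
    """Map floats to integers in [-8, 7] using one shared scale (assumed scheme)."""
    max_abs = max(abs(v) for v in values)
    # Scale so that the largest magnitude maps to 7; avoid division by zero.
    scale = max_abs / 7 if max_abs > 0 else 1.0
    q = [max(-8, min(7, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the 4-bit integers."""
    return [x * scale for x in q]


# Hypothetical expert weights.
weights = [0.7, -0.3, 0.12, 0.0]
q, scale = quantize_int4(weights)
restored = dequantize(q, scale)

# Largest rounding error introduced by the 4-bit representation.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each value is stored in 4 bits plus a shared scale, and the rounding error is bounded by half the scale. Leaving sensitive components such as attention at higher precision, as the description states, limits where this error can accumulate.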

Downloads: 0 This Week

Last Update: 2026-05-21


DeepSeek-V4-Flash

Efficient MoE model for million-token reasoning and coding

...It has 284B total parameters with 13B activated and supports a 1M-token context window, making it suitable for long-document reasoning, complex coding, agentic workflows, and large-scale information processing. The model uses a hybrid attention architecture that combines Compressed Sparse Attention and Heavily Compressed Attention to improve long-context efficiency, while Manifold-Constrained Hyper-Connections strengthen signal stability across layers. It is trained on more than 32T tokens and refined through a post-training pipeline that includes supervised fine-tuning, reinforcement learning, domain-specific expert cultivation, and on-policy distillation. ...
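The total and activated parameter counts quoted above translate into a small active fraction per token. The short calculation below uses only the figures from the description; the helper name is hypothetical.

```python
def active_fraction(total_billion, active_billion):
    """Share of parameters used per token in a sparse MoE model."""
    return active_billion / total_billion


# Figures from the description: 284B total parameters, 13B activated.
fraction = active_fraction(284, 13)
percent = round(fraction * 100, 1)
print(f"{percent}% of parameters are active per token")
```

Per-token compute scales with the activated parameters, while memory for storing the weights still scales with the total.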

Downloads: 0 This Week

Last Update: 2026-04-24


Search Results for "sparse"

Showing 10 open source projects for "sparse"

