weight free download - SourceForge

Gemma

Gemma open-weight LLM library, from Google DeepMind

...Through included tutorials and Colab notebooks, users can explore examples covering sampling, multi-modal interactions, and fine-tuning workflows. By providing accessible open-weight models, Gemma enables researchers and developers to experiment with state-of-the-art LLM architectures.

Downloads: 5 This Week

Last Update: 7 days ago

See Project

MiniMax-M1

Open-weight, large-scale hybrid-attention reasoning model

MiniMax-M1 is presented as the world’s first open-weight, large-scale hybrid-attention reasoning model, designed to push the frontier of long-context, tool-using, and deeply “thinking” language models. It is built on the MiniMax-Text-01 foundation and keeps the same massive parameter budget, but reworks the attention and training setup for better reasoning and test-time compute scaling.

Downloads: 0 This Week

Last Update: 2025-12-01

See Project

RTP-LLM

Alibaba's high-performance LLM inference engine for diverse apps

RTP-LLM is an open-source large language model inference acceleration engine developed by Alibaba to provide high-performance serving infrastructure for modern LLM deployments. The system focuses on improving throughput, latency, and resource utilization when running large models in production environments. It achieves this by implementing optimized GPU kernels, batching strategies, and memory management techniques tailored for transformer inference workloads. The framework is designed for...

Downloads: 1 This Week

Last Update: 2026-03-09

See Project

LLM-Pruner

On the Structural Pruning of Large Language Models

...The framework relies on gradient-based analysis to determine which parameters contribute least to model performance, enabling targeted structural pruning rather than simple weight removal. After pruning, the framework applies lightweight fine-tuning methods such as LoRA to recover performance using relatively small datasets and short training times.
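As a rough illustration of gradient-based structural pruning, the sketch below scores each row of a weight matrix with a first-order (Taylor-style) importance, the sum of |weight × gradient|, and keeps the highest-scoring rows. This is one common formulation and is an assumption here, not necessarily LLM-Pruner's exact criterion; the function names, the row-level grouping, and the toy numbers are all hypothetical.

```python
def group_importance(weights, grads):
    """Score each row (one structural group) as the sum of |w * dL/dw|."""
    return [
        sum(abs(w * g) for w, g in zip(w_row, g_row))
        for w_row, g_row in zip(weights, grads)
    ]


def prune_rows(weights, grads, keep):
    """Keep the `keep` highest-scoring rows, preserving their original order."""
    scores = group_importance(weights, grads)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:keep])
    return [weights[i] for i in kept], kept


# Toy 3x2 weight matrix and matching gradients (made-up values).
weights = [[0.5, -1.0], [0.01, 0.02], [2.0, 0.1]]
grads = [[0.2, 0.1], [0.5, 0.5], [0.01, 0.3]]

pruned, kept = prune_rows(weights, grads, keep=2)
print(kept)
```

Whole rows are removed rather than individual weights, which is what makes the pruning structural: the resulting matrix is simply smaller and needs no sparse kernels. A recovery step such as LoRA fine-tuning would follow and is not shown.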

Downloads: 1 This Week

Last Update: 2026-03-09

See Project

UCCL

UCCL is an efficient communication library for GPUs

...UCCL is designed to work with heterogeneous hardware environments, allowing GPUs from different vendors and network interfaces to communicate efficiently without vendor lock-in. The system also supports specialized workloads such as reinforcement learning weight transfers, key-value cache sharing, and expert parallelism for mixture-of-experts models. Its architecture emphasizes flexibility and extensibility so that developers can implement custom communication protocols tailored to specific machine learning workloads.

Downloads: 0 This Week

Last Update: 2026-05-10

See Project

MatMul-Free LM

Implementation for MatMul-free LM

MatMul-Free LM is an experimental implementation of a large language model architecture designed to eliminate traditional matrix multiplication operations used in transformer networks. Since matrix multiplication is one of the most computationally expensive components of modern language models, the project explores alternative computational strategies that reduce hardware requirements while maintaining comparable performance. The architecture relies on quantization-aware training and...

Downloads: 0 This Week

Last Update: 2026-03-05

See Project

DeepSeek LLM

DeepSeek LLM: Let there be answers

...According to the evaluation files, DeepSeek LLM 67B Chat achieves strong performance on math benchmarks under both chain-of-thought (CoT) and tool-assisted reasoning modes. The model is trained from scratch, reportedly on a large dataset of multilingual text, code, and reasoning data, and competes with other open or open-weight models. The architecture mirrors established decoder-only transformer families: a pre-norm structure, rotary position embeddings (RoPE), and grouped-query attention (GQA). It is available in both “Base” (foundation model) and “Chat” (instruction- and conversation-tuned) variants.

Downloads: 9 This Week

Last Update: 2025-10-03

See Project

DeepSeek-V4-Pro

Flagship MoE model for advanced reasoning, coding, and agents

DeepSeek-V4-Pro is a flagship open-weight Mixture-of-Experts language model designed for high-performance reasoning, coding, and agent-based workflows at scale. It features approximately 1.6 trillion total parameters with around 49B activated during inference, enabling strong efficiency while maintaining frontier-level capability. The model supports an ultra-long context window of up to 1 million tokens, making it highly suitable for long-document reasoning, large codebases, and complex multi-step tasks. ...

Downloads: 0 This Week

Last Update: 2026-04-24

See Project

ZAYA1-8B

Efficient MoE reasoning model for coding and math workloads

ZAYA1-8B is a compact Mixture-of-Experts reasoning model developed by Zyphra, designed to deliver unusually high intelligence density with fewer than 1 billion active parameters. The model contains 8.4B total parameters with around 760M active during inference, allowing it to achieve strong reasoning, mathematics, and coding performance while remaining lightweight enough for efficient local or on-device deployment. ZAYA1-8B is optimized for long-form reasoning and test-time compute...

Downloads: 0 This Week

Last Update: 2026-05-08

See Project

Qwen3.6-35B-A3B

Open multimodal model for coding, agents, and long-context tasks

Qwen3.6-35B-A3B is an open-weight multimodal model built for real-world coding, agent workflows, and long-context reasoning. It combines a causal language model with a vision encoder, supports text, image, and video inputs, and is optimized for frameworks such as Transformers, vLLM, SGLang, and KTransformers. The model emphasizes stability, responsiveness, and practical developer productivity, with major improvements in agentic coding, frontend generation, and repository-level reasoning. ...

Downloads: 0 This Week

Last Update: 2026-04-20

See Project

Command A+

4-bit Command A+ model for enterprise agents and multilingual tasks

...It supports text and image inputs, generates text outputs, and uses a sparse Mixture-of-Experts Transformer architecture with 218B total parameters and 25B active parameters. The W4A4 release applies 4-bit weight and activation quantization mainly to MoE experts, preserving attention components at full precision to reduce quality loss while improving speed, latency, and hardware efficiency. Cohere recommends W4A4 for most users because it offers a smaller hardware footprint with negligible benchmark differences compared to BF16 and FP8 versions. ...

Downloads: 0 This Week

Last Update: 6 days ago

See Project

Qwen3.6-35B-A3B-FP8

FP8 Qwen model for efficient multimodal coding and agent tasks

Qwen3.6-35B-A3B-FP8 is an FP8-quantized version of Qwen3.6 designed to deliver nearly the same performance as the original model while improving deployment efficiency. It is a multimodal open-weight model that combines a causal language model with a vision encoder, supporting text, image, and video inputs. Built for stability and real-world developer use, it emphasizes agentic coding, repository-level reasoning, and productive long-context workflows. A key capability is thinking preservation, which allows the model to retain reasoning traces from earlier messages, helping reduce repeated computation and improving consistency in iterative tasks. ...

Downloads: 0 This Week

Last Update: 2026-04-20

See Project

Search Results for "weight"

Showing 12 open source projects for "weight"
