  • 1
    Qwen2.5-Omni

    Capable of understanding text, audio, vision, video

    Qwen2.5-Omni is Alibaba Cloud's end-to-end multimodal flagship model in the Qwen series, designed to process multiple modalities (text, images, audio, video) and to generate responses as both text and natural speech, streaming in real time. It uses a “Thinker-Talker” architecture and introduces techniques for aligning modalities over time (for example, synchronizing video and audio), robust speech generation, and low-VRAM/quantized variants that make the model more accessible; a minimal usage sketch follows this entry. It holds...
    Downloads: 0 This Week
    See Project
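
To make the description concrete, here is a minimal, hypothetical text-only usage sketch via Hugging Face transformers. The class names (`Qwen2_5OmniForConditionalGeneration`, `Qwen2_5OmniProcessor`), the checkpoint id, and the `return_audio` flag are assumptions drawn from the public model card and may differ across transformers versions.

```python
# Hypothetical sketch: text-only chat with Qwen2.5-Omni via transformers.
# Class names and generate() flags are assumptions from the model card;
# verify against your installed transformers version.
import torch
from transformers import Qwen2_5OmniForConditionalGeneration, Qwen2_5OmniProcessor

model_id = "Qwen/Qwen2.5-Omni-7B"  # public checkpoint name, per the model card
processor = Qwen2_5OmniProcessor.from_pretrained(model_id)
model = Qwen2_5OmniForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

conversation = [
    {"role": "user", "content": [
        {"type": "text", "text": "Summarize what a Mixture-of-Experts layer does."},
    ]},
]
text = processor.apply_chat_template(conversation, add_generation_prompt=True, tokenize=False)
inputs = processor(text=text, return_tensors="pt").to(model.device)

# return_audio=False keeps only the "Thinker" text path and skips the
# "Talker" speech synthesis, which also saves VRAM.
output_ids = model.generate(**inputs, max_new_tokens=128, return_audio=False)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```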
  • 2
    Firefly LLM

    A framework for training and fine-tuning large language models

    Firefly is an open-source framework that simplifies the training and fine-tuning of large language models through a unified, configurable workflow. It provides a single environment for tasks such as model pre-training, instruction tuning, and preference optimization using widely adopted machine learning techniques. Its architecture supports both full-parameter training and parameter-efficient strategies such as LoRA and QLoRA (sketched after this entry), making it suitable...
    Downloads: 0 This Week
    See Project
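
Firefly itself is driven by its own config files, but the QLoRA strategy it supports can be sketched with the standard peft/transformers/bitsandbytes stack. This is a generic illustration of the technique, not Firefly's API; the base-model name is a placeholder.

```python
# Generic QLoRA setup illustrating the parameter-efficient strategy Firefly
# supports; NOT Firefly's own API, which is configured via its config files.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-2-7b-hf"  # placeholder base model
bnb = BitsAndBytesConfig(
    load_in_4bit=True,                    # quantize the frozen base weights to 4-bit
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, quantization_config=bnb, device_map="auto")

lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # adapt only the attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only a tiny fraction of weights is trainable

# From here, `model` can be handed to a standard Trainer/SFT loop on
# instruction-tuning data.
```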
  • 3
    Mixtral offloading

    Run Mixtral-8x7B models in Colab or on consumer desktops

    Mixtral-Offloading is an open-source project that enables efficient inference of large Mixture-of-Experts language models such as Mixtral-8x7B on hardware with limited GPU memory. It implements techniques that dynamically move model components between CPU memory and GPU memory during inference, significantly reducing the GPU VRAM required to run the model; a toy sketch of the caching idea follows this entry. This approach takes advantage of the sparse activation properties of mixture-of-experts...
    Downloads: 0 This Week
    See Project
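
The project's actual implementation (with mixed quantization and expert prefetching) lives in its repository; the sketch below only illustrates the core caching idea under simplified assumptions, and all names in it are hypothetical.

```python
# Hypothetical sketch of the core offloading idea: keep a small LRU cache of
# experts on the GPU and pull the rest from CPU RAM only when the router
# selects them. The real project adds quantization and smarter prefetching.
from collections import OrderedDict
import torch.nn as nn

class ExpertCache:
    def __init__(self, experts, capacity, device="cuda"):
        self.cpu_experts = experts   # list of nn.Module, initially resident on CPU
        self.capacity = capacity     # max number of experts kept on the GPU
        self.device = device
        self.gpu = OrderedDict()     # expert_id -> module on GPU, in LRU order

    def get(self, expert_id):
        if expert_id in self.gpu:
            self.gpu.move_to_end(expert_id)        # mark as most recently used
            return self.gpu[expert_id]
        if len(self.gpu) >= self.capacity:         # evict the least recently used
            old_id, old_expert = self.gpu.popitem(last=False)
            self.cpu_experts[old_id] = old_expert.to("cpu")
        expert = self.cpu_experts[expert_id].to(self.device)
        self.gpu[expert_id] = expert
        return expert

# Usage inside a MoE layer: only the top-k routed experts are fetched per
# token, so sparse activation keeps transfers (and VRAM) small.
experts = [nn.Linear(8, 8) for _ in range(8)]        # stand-ins for real experts
cache = ExpertCache(experts, capacity=2, device="cpu")  # "cpu" so the demo runs anywhere
routed = cache.get(3)
```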
  • 4
    LLaMA-MoE

    Building Mixture-of-Experts from LLaMA with Continual Pre-training

    ...The repository focuses on making MoE research more accessible by offering smaller, more affordable models with only about 3.0 to 3.5 billion activated parameters, which reduces deployment and experimentation costs. Its architecture works by splitting LLaMA feed-forward networks into sparse experts and adding gating mechanisms so that only selected experts are activated during training and inference; a toy sketch of such a layer follows this entry. The project is not just a model release but also a research framework that includes multiple expert construction methods, several gating strategies, and tooling for continual pre-training on filtered SlimPajama-based datasets. ...
    Downloads: 0 This Week
    See Project
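
As a rough picture of the layer structure described above, here is a self-contained, hypothetical PyTorch sketch of a top-k gated MoE feed-forward block. Dimensions, expert count, and top-k are illustrative, not those of the released checkpoints.

```python
# Hypothetical sketch of the layer described above: a feed-forward block's
# capacity split across experts, with a learned gate activating top-k per token.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    def __init__(self, d_model=512, d_ff=1024, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, n_experts, bias=False)  # the router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                      # x: (tokens, d_model)
        scores = self.gate(x)                  # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # normalize over the selected experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):            # only chosen experts run per token
            for e in range(len(self.experts)):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k, None] * self.experts[e](x[mask])
        return out

x = torch.randn(4, 512)
print(MoEFeedForward()(x).shape)  # torch.Size([4, 512])
```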