DeepSeek MoE
Towards Ultimate Expert Specialization in Mixture-of-Experts Language
DeepSeek-MoE (“DeepSeek MoE”) is the DeepSeek open implementation of a Mixture-of-Experts (MoE) model architecture meant to increase parameter efficiency by activating only a subset of “expert” submodules per input. The repository introduces fine-grained expert segmentation and shared expert isolation to improve specialization while controlling compute cost. For example, their MoE variant with 16.4B parameters claims comparable or better performance to standard dense models like DeepSeek 7B...