MobileLLM
MobileLLM Optimizing Sub-billion Parameter Language Models
..., deep and thin network design, embedding sharing, and grouped-query attention (GQA)—to achieve a superior trade-off between model size, inference speed, and accuracy. MobileLLM demonstrates remarkable performance, with the 125M and 350M variants outperforming previous state-of-the-art models of the same scale by up to 4.3% on zero-shot commonsense reasoning tasks.