BLOOM (BigScience Large Open-science Open-access Multilingual Language Model) is a 176-billion-parameter autoregressive language model developed by the BigScience Workshop. It generates coherent text in 46 natural languages and 13 programming languages, making it one of the most multilingual large language models publicly available. BLOOM was trained on 366 billion tokens using the Megatron-DeepSpeed framework and large-scale computational resources.

Because it is an autoregressive text generator, BLOOM can perform a variety of tasks via prompt-based learning, even without task-specific fine-tuning, by framing them as text-continuation problems. Released under the BigScience RAIL license, BLOOM promotes responsible AI usage and open-access research. Though capable and flexible, the model has known limitations, including potential biases and hallucinations, and it can be misused if deployed without safeguards. Its training and evaluation were documented transparently, including benchmark metrics, ethical considerations, and estimated carbon emissions.
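Framing a task as text continuation can be sketched as follows: a few worked examples are concatenated with an incomplete final example, and the model is asked to continue the text. The `build_prompt` helper below is illustrative, not part of any BLOOM API, and the commented-out Hugging Face `transformers` call (using the `bigscience/bloom` checkpoint) is only a sketch, since it requires downloading the model weights.

```python
# Sketch: framing French-to-English translation as text continuation
# for few-shot prompting. build_prompt is a hypothetical helper.

def build_prompt(examples, query):
    """Concatenate worked (source, target) pairs plus an open final line.

    The last line ends with "English:" so the model's continuation
    is interpreted as the translation of `query`.
    """
    lines = [f"French: {src}\nEnglish: {tgt}" for src, tgt in examples]
    lines.append(f"French: {query}\nEnglish:")  # left open for the model
    return "\n\n".join(lines)

prompt = build_prompt(
    [("Bonjour.", "Hello."), ("Merci beaucoup.", "Thank you very much.")],
    "Comment allez-vous ?",
)
print(prompt)

# The prompt would then be fed to BLOOM, e.g. via Hugging Face transformers
# (sketched only, as it downloads the large checkpoint):
# from transformers import pipeline
# generator = pipeline("text-generation", model="bigscience/bloom")
# print(generator(prompt, max_new_tokens=20)[0]["generated_text"])
```

The same pattern extends to zero-shot use: the prompt then contains only the task framing and the incomplete final example, with no worked demonstrations.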
## Features
- 176B parameters, trained with the Megatron-DeepSpeed framework
- Supports 46 natural languages and 13 programming languages
- Performs zero-shot and few-shot learning by framing tasks as text continuation
- Openly released under the BigScience RAIL license
- Capable of generating text, answering questions, and coding
- Pretrained on 366B tokens from a wide variety of sources
- Publicly available evaluation results and carbon emissions data
- Designed for research, multilingual tasks, and responsible AI use