StarCoder is a 15.5B-parameter language model developed by the BigCode project for code generation across more than 80 programming languages. It was trained on 1 trillion tokens of permissively licensed source code from The Stack v1.2, using a Fill-in-the-Middle (FIM) training objective and Multi-Query Attention for efficient inference, with an 8192-token context window and bfloat16 precision. StarCoder can generate, complete, and refactor code in a wide range of languages, with English as the primary natural language.

While StarCoder is not an instruction-tuned model, it can act as a capable technical assistant when prompted appropriately. Developers can use it for general-purpose code generation and, through the FIM sentinel tokens, exercise fine control over the prefix, middle, and suffix of a completion. The model has some limitations: generated code may contain bugs or security issues, and output that closely resembles training data may carry licensing obligations, in which case attribution must be given. StarCoder is released under the BigCode OpenRAIL-M license.
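As a concrete sketch of basic left-to-right generation, the snippet below assumes the Hugging Face `transformers` library and the `bigcode/starcoder` checkpoint (access is gated behind acceptance of the OpenRAIL-M license on the Hugging Face Hub); the prompt string and generation settings are illustrative only.

```python
# Minimal generation sketch. Assumes `pip install transformers torch`
# and access to the license-gated bigcode/starcoder checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# torch_dtype/device_map are left to the reader; the full model
# needs roughly 30 GB of memory in bfloat16.
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# Plain completion: the model continues the prompt left to right.
inputs = tokenizer("def fibonacci(n):", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0]))
```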
Features
- 15.5B parameters trained on 1T tokens from 80+ programming languages
- Supports the Fill-in-the-Middle (FIM) objective for smart code editing (see the sketch after this list)
- Multi-Query Attention and 8192-token context window
- Trained on permissively licensed GitHub code (The Stack v1.2)
- Generates code in Python, JavaScript, Java, C++, and many other languages
- Includes tools for tracing output to source code for attribution
- Trained with Megatron-LM and PyTorch on 512 A100 GPUs
- Licensed under BigCode OpenRAIL-M for responsible open use
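To illustrate the FIM support mentioned above, here is a hedged infilling sketch using the FIM sentinel tokens from the StarCoder tokenizer (`<fim_prefix>`, `<fim_suffix>`, `<fim_middle>`); the function being completed and the generation settings are invented for illustration.

```python
# FIM infilling sketch: wrap the known code in FIM sentinel tokens and
# let the model generate the missing middle. The sentinels are real
# StarCoder special tokens; the snippet itself is a made-up example.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

prefix = "def print_hello():\n    "
suffix = "\n    return None"
prompt = f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
# Everything generated after <fim_middle> is the proposed middle segment.
print(tokenizer.decode(outputs[0]))
```

The prefix-suffix-middle prompt layout lets an editor integration send the code before and after the cursor and receive only the missing span back, which is what makes FIM suited to in-place editing rather than append-only completion.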