StarCoder
AI-Powered Code Generation and Completion Model
StarCoder is a powerful 15.5B parameter model trained on code from 80+ programming languages. Generate, complete, or fill in the middle of code with high accuracy using this open model from the BigCode project, available on Hugging Face.
StarCoder is a large language model built by the BigCode project, designed to generate and complete source code across more than 80 programming languages. With 15.5 billion parameters and a focus on fill-in-the-middle training, StarCoder supports advanced code generation tasks and assists developers with high-quality, context-aware completions.
Designed to run efficiently on modern hardware and deployed via Hugging Face, StarCoder is accessible for both developers seeking code assistance and researchers evaluating open-source coding models.
StarCoder was trained on The Stack (v1.2), a deduplicated dataset containing code from over 80 programming languages. Whether you’re working in Python, JavaScript, C++, or niche languages, the model can adapt to your environment.
Unlike models limited to traditional left-to-right generation, StarCoder also supports fill-in-the-middle (FIM) tasks. This allows developers to insert missing blocks of code between existing sections, enhancing the flexibility of auto-completion and snippet generation.
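As a sketch of what FIM looks like in practice: the bigcode/starcoder checkpoint on Hugging Face documents three special tokens, <fim_prefix>, <fim_suffix>, and <fim_middle>, used to frame the code before and after the gap. The example below is a minimal illustration (the function body is a made-up placeholder), and assumes you have accepted the model’s license on Hugging Face and are authenticated:

```python
# Minimal fill-in-the-middle sketch using the FIM special tokens
# documented on the bigcode/starcoder model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# The model sees the code before and after the gap, then generates the middle.
prefix = "def print_hello_world():\n    "
suffix = "\n    print('Done')\n"
prompt = f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0]))
```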
StarCoder uses a GPT-2-style decoder architecture with Multi-Query Attention and an 8192-token context window. It’s optimized for understanding and generating long, structured code sequences, making it well suited to real-world software development tasks.
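These architectural details can be checked directly from the published model configuration. A small sketch, assuming access to the gated repository and the gpt_bigcode config fields exposed by Transformers:

```python
# Inspect the architecture without downloading the full model weights.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("bigcode/starcoder")
print(config.model_type)    # "gpt_bigcode": GPT-2-style decoder
print(config.n_positions)   # 8192-token context window
print(config.multi_query)   # True: Multi-Query Attention enabled
```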
Trained on over 1 trillion tokens, StarCoder was built using 512 A100 GPUs over a 24-day training cycle. The dataset was filtered to exclude opt-out content and includes only permissively licensed code.
StarCoder can generate new functions, complete unfinished code, and assist in writing boilerplate or repetitive logic. It's a helpful tool for prototyping, learning, and automating development workflows.
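For example, a minimal left-to-right completion using the Transformers text-generation pipeline; the docstring prompt here is an illustrative placeholder, and this loads the full 15.5B-parameter weights:

```python
# Plain code completion: the model continues the function from its signature
# and docstring.
from transformers import pipeline

generator = pipeline("text-generation", model="bigcode/starcoder")
prompt = 'def fibonacci(n):\n    """Return the n-th Fibonacci number."""\n'
completion = generator(prompt, max_new_tokens=64)
print(completion[0]["generated_text"])
```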
As an open-access model under the BigCode OpenRAIL-M license, StarCoder is ideal for academic research, benchmarking, and building downstream applications for coding tasks.
Developers can use StarCoder directly via Hugging Face Transformers with just a few lines of code, and GPU acceleration makes it practical for both local and cloud deployment.
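A minimal sketch of a GPU-friendly setup, assuming the accelerate package is installed so that device_map="auto" can place the weights across available devices:

```python
# Load StarCoder with GPU acceleration: half precision and automatic
# device placement for the 15.5B parameters.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    torch_dtype=torch.float16,   # roughly halves memory vs. float32
    device_map="auto",           # spread layers across available GPUs
)

inputs = tokenizer("def hello():", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0]))
```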
StarCoder is released under the BigCode OpenRAIL-M license. While the training data was sourced from openly licensed code, users are responsible for ensuring proper attribution and respecting license requirements when using generated code.
A searchable index is available to trace the origin of any generated code segments, allowing developers to provide proper attribution when necessary.
StarCoder has demonstrated strong performance on coding benchmarks such as HumanEval, MBPP, and DS-1000, highlighting the model’s effectiveness across general-purpose programming tasks.