Phi-2 is a 2.7 billion parameter Transformer model from Microsoft, designed for natural language understanding and code generation. It was trained on 1.4 trillion tokens drawn from filtered, high-quality web content and synthetic NLP texts generated with GPT-3.5. Despite receiving no instruction tuning or RLHF alignment, Phi-2 outperforms most models under 13B parameters on benchmarks for common sense, language understanding, and logical reasoning. It responds best to structured prompt formats for question answering, chat dialogue, and code completion, and supports a context length of 2048 tokens. Training took 14 days on 96 A100 GPUs using PyTorch, DeepSpeed, and FlashAttention. As a base model, it can still produce verbose or off-topic output, reflect societal biases, and generate inaccurate code or facts, so outputs should be verified. Phi-2 is released under the MIT license to support open research on safe, controllable language models.
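As a rough illustration, the sketch below loads Phi-2 through the Hugging Face transformers library and queries it with a QA-style "Instruct:/Output:" prompt. The model ID, generation settings, and prompt template are assumptions drawn from common usage, not a verbatim reference implementation; adjust dtype and device placement to your hardware.

```python
# Minimal sketch: load Phi-2 with transformers and prompt it in QA style.
# Assumes the "microsoft/phi-2" Hugging Face model ID and a single GPU;
# device_map="auto" requires the accelerate package.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-2"  # assumed model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to fit the 2.7B model on one GPU
    device_map="auto",
)

# Phi-2 is not instruction-tuned, so a structured "Instruct:/Output:"
# template tends to work better than a free-form question.
prompt = "Instruct: Explain what a binary search tree is.\nOutput:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=200, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```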
Features
- 2.7B parameter Transformer optimized for QA, chat, and code
- Trained on 1.4T tokens from high-quality web and synthetic data
- Excels at logical reasoning and language comprehension tasks
- Supports next-token generation with a 2048-token context window (see the sketch after this list)
- Performs well without RLHF or instruction fine-tuning
- Built with DeepSpeed, FlashAttention, and PyTorch
- MIT-licensed and openly available for research and development
- Known issues include verbosity and limited instruction adherence
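The following sketch illustrates the other two structured prompt styles (chat and code completion) and keeps the prompt within the 2048-token context window. The "Alice:/Bob:" chat template and the docstring-style code prompt are illustrative assumptions about typical Phi-2 usage, not an official API.

```python
# Sketch: chat-style and code-completion prompting, with the input truncated
# so the prompt plus the generated continuation fit in the 2048-token context.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2", torch_dtype=torch.float16, device_map="auto"
)

CONTEXT_LEN = 2048  # Phi-2's maximum context length


def generate(prompt: str, max_new_tokens: int = 128) -> str:
    # Truncate so the prompt and the new tokens both fit in the context window.
    inputs = tokenizer(
        prompt,
        return_tensors="pt",
        truncation=True,
        max_length=CONTEXT_LEN - max_new_tokens,
    ).to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)


# Chat-style prompt (assumed "Alice:/Bob:" turn format).
chat_prompt = "Alice: What's a quick way to reverse a string in Python?\nBob:"

# Code-completion prompt: a signature plus docstring for the model to complete.
code_prompt = (
    "def is_palindrome(s: str) -> bool:\n"
    '    """Return True if s reads the same forwards and backwards."""\n'
)

print(generate(chat_prompt))
print(generate(code_prompt))
```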