Overview
Phi-4 is a 14-billion-parameter dense, decoder-only Transformer from Microsoft Research, released on December 12, 2024 as the fourth generation of the Phi family of small language models. It was trained on 9.8 trillion tokens over 21 days on 1,920 H100 GPUs, drawing heavily on synthetic "textbook-like" data alongside filtered web documents and acquired academic books and Q&A sets. The training recipe deliberately prioritized data quality and reasoning over sheer parameter count.
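To put the training figures above in proportion, the arithmetic can be sketched in a few lines of Python. This is a back-of-the-envelope illustration only: training_budget is a hypothetical helper name, the parameter count is the rounded 14 billion, and the throughput figure assumes every GPU was busy for the full 21 days, so it is a wall-clock average rather than a measured rate.

```python
def training_budget(tokens, params, gpus, days):
    """Derive rough scale figures from the quoted training numbers."""
    gpu_hours = gpus * days * 24                         # total GPU-hours
    tokens_per_gpu_second = tokens / (gpu_hours * 3600)  # idealized average
    tokens_per_param = tokens / params                   # data-to-size ratio
    return gpu_hours, tokens_per_gpu_second, tokens_per_param


# Figures quoted in the overview: 9.8T tokens, 14B parameters,
# 1,920 H100 GPUs, 21 days.
gpu_hours, tok_per_gpu_s, tok_per_param = training_budget(9.8e12, 14e9, 1920, 21)

print(f"{gpu_hours:,} GPU-hours")
print(f"{tok_per_gpu_s:,.0f} tokens per GPU per second (idealized)")
print(f"{tok_per_param:,.0f} training tokens per parameter")
```

Under these assumptions the run comes to 967,680 GPU-hours and about 700 training tokens per parameter, which is one way to see the emphasis on data over model size.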
What it's good at
Phi-4 punches above its weight on reasoning and math. It scores 84.8 on MMLU, 80.4 on MATH, 56.1 on GPQA, and 82.6 on HumanEval, beating the similarly sized Qwen 2.5 14B across most of these and edging out GPT-4o on GPQA, the graduate-level science benchmark. The synthetic data, generated with techniques such as multi-agent prompting and self-revision workflows, lets it distill its GPT-4o teacher and in places surpass it, notably on graduate-level STEM questions (GPQA) and competition math (MATH). Its clear weakness is factual recall: a SimpleQA score of just 3.0 means it should not be relied on for memorized world knowledge.
Running locally
At 4-bit (Q4_K_M) quantization, Phi-4 needs about 9 GB of VRAM for its weights, so a 12 GB GPU runs it without offloading. On an 8 GB card the weights overshoot by roughly 1 GB, and Ollama or llama.cpp keeps the overflow in system RAM, trading speed for fit. The model works with transformers, vLLM, llama.cpp, and Ollama (ollama run phi4), and runs on NVIDIA, AMD ROCm, and Apple Silicon. It expects a ChatML-style chat format: each turn opens with <|im_start|> followed by the role name, <|im_sep|> separates the role from the content, and <|im_end|> closes the turn.
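The fit arithmetic and the prompt layout described above can be sketched in Python. This is a rough illustration, not how Ollama or llama.cpp actually decide placement: split_layers and format_phi4_prompt are hypothetical helper names, the 40-layer count is an assumption to check against the model's config, and every layer is treated as the same size. The turn layout (<|im_start|>role<|im_sep|>content<|im_end|>) follows the published model card; the tokenizer's own chat template remains the authority, including on any whitespace between tokens.

```python
import math


def split_layers(model_gb, vram_gb, n_layers, reserve_gb=0.0):
    """Estimate (layers on GPU, GB left in system RAM) for a quantized model."""
    per_layer_gb = model_gb / n_layers          # assumes equal-sized layers
    usable_gb = max(vram_gb - reserve_gb, 0.0)  # reserve_gb: room for context cache
    gpu_layers = min(n_layers, math.floor(usable_gb / per_layer_gb))
    spilled_gb = (n_layers - gpu_layers) * per_layer_gb
    return gpu_layers, spilled_gb


def format_phi4_prompt(messages, add_generation_prompt=True):
    """Join chat turns as <|im_start|>role<|im_sep|>content<|im_end|>."""
    parts = [
        f"<|im_start|>{m['role']}<|im_sep|>{m['content']}<|im_end|>"
        for m in messages
    ]
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant<|im_sep|>")
    return "".join(parts)


# About 9 GB of Q4_K_M weights; 40 layers is an assumed count for illustration.
print(split_layers(9.0, 12.0, n_layers=40))  # 12 GB card: everything fits
print(split_layers(9.0, 8.0, n_layers=40))   # 8 GB card: a few layers spill
print(format_phi4_prompt([{"role": "user", "content": "What is 2 + 2?"}]))
```

With llama.cpp, an estimate like this would correspond to the --n-gpu-layers option. Setting reserve_gb above zero leaves room for the context cache and pushes more layers to system RAM. The runtimes named above typically apply the chat template themselves, so a hand-built prompt matters mainly when calling a raw completion endpoint.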
License
Phi-4 is released under the MIT license, one of the most permissive options available. You can use it commercially, modify it, and redistribute it without paying fees or asking permission. The model is English-focused, with multilingual data making up only about 8% of the training data, so non-English use cases will see weaker results.
