Phi-4

A 14B open model from Microsoft Research tuned for math, reasoning, and code, competitive with much larger LLMs.

pip install -U transformers
huggingface-cli download microsoft/phi-4
# or with Ollama:
ollama run phi4

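# Assumes an OpenAI-compatible server on port 8000, e.g. one started with: vllm serve microsoft/phi-4
curl http://localhost:8000/v1/chat/completions \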
  -H "Content-Type: application/json" \
  -d '{
    "model": "microsoft/phi-4",
    "messages": [{"role": "user", "content": "Solve: 24*17"}]
  }'

import transformers
pipeline = transformers.pipeline(
    "text-generation",
    model="microsoft/phi-4",
    model_kwargs={"torch_dtype": "auto"},
    device_map="auto",
)
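messages = [{"role": "user", "content": "How should I explain the Internet?"}]
out = pipeline(messages, max_new_tokens=256)
# For chat-style input, generated_text holds the whole conversation;
# the last entry is the assistant's reply.
print(out[0]["generated_text"][-1]["content"])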

// Requires the openai npm package; run as an ES module so top-level await works.
import OpenAI from "openai";
const client = new OpenAI({ baseURL: "http://localhost:8000/v1", apiKey: "local" });
const res = await client.chat.completions.create({
  model: "microsoft/phi-4",
  messages: [{ role: "user", content: "Write a Python function for fibonacci." }],
});
console.log(res.choices[0].message.content);

More models

Name	Size	Usage	Context	Input
Phi-3.5-mini-instruct	3.8B	General chat, multilingual	128K	Text
Phi-4-mini-instruct	3.8B	Math, reasoning, function calling	128K	Text

At a glance

License: MIT
Context length: 16K tokens
Parameters: 14B dense decoder-only
Languages: primarily English
Minimum hardware: ~9 GB VRAM at 4-bit quantization; a 12 GB GPU runs it comfortably
Strengths: math, reasoning, code generation
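The ~9 GB figure follows roughly from parameter count times bits per weight. The sketch below is a weights-only estimate under two assumptions that do not come from this page: about 14.7 billion parameters (the page rounds to 14B) and an average of about 4.85 bits per weight for Q4_K_M. It ignores the KV cache and runtime buffers, which grow with context length.

```python
# Rough, weights-only VRAM estimate for a quantized model.
# Assumptions (not from this page): ~14.7e9 parameters and ~4.85 bits per
# weight on average for Q4_K_M. KV cache and runtime buffers are NOT included.

GB = 1e9  # decimal gigabytes, as used by most model download pages


def weights_gb(n_params: float, bits_per_weight: float) -> float:
    """Size of the quantized weights, in decimal GB."""
    return n_params * bits_per_weight / 8 / GB


def overflow_gb(needed_gb: float, vram_gb: float) -> float:
    """Amount that would have to be offloaded to system RAM (0 if it fits)."""
    return max(0.0, needed_gb - vram_gb)


need = weights_gb(14.7e9, 4.85)
for card in (8, 12):
    print(f"{card} GB card: need ~{need:.1f} GB, offload ~{overflow_gb(need, card):.1f} GB")
```

Under these assumptions the weights come to about 8.9 GB, in line with the ~9 GB above, leaving roughly 0.9 GB to offload on an 8 GB card. Real usage is somewhat higher once the KV cache is counted.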

Overview

Phi-4 is a 14-billion-parameter dense, decoder-only Transformer from Microsoft Research, released on December 12, 2024 as the fourth generation of the Phi family of small language models. It was trained on 9.8 trillion tokens over 21 days on 1,920 H100 GPUs, drawing heavily on synthetic "textbook-like" data alongside filtered web documents and acquired academic books and Q&A sets. The training recipe deliberately prioritized data quality and reasoning over sheer parameter count.
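As a rough sanity check on the training figures, the arithmetic below derives average per-GPU throughput and approximate training compute. It assumes all 21 days were spent training at a constant rate and uses the common "6 × parameters × tokens" rule of thumb for compute, which ignores attention FLOPs; neither result is an official Microsoft number.

```python
# Back-of-the-envelope numbers from the training facts above:
# 9.8e12 tokens, 21 days, 1,920 H100 GPUs, ~14e9 parameters.
SECONDS_PER_DAY = 86_400

tokens = 9.8e12
gpus = 1_920
days = 21
params = 14e9  # the page's rounded "14B" figure

gpu_seconds = gpus * days * SECONDS_PER_DAY
tokens_per_gpu_per_s = tokens / gpu_seconds

# Rule of thumb: training compute ~= 6 * parameters * tokens
# (forward + backward pass; ignores attention FLOPs and any restarts).
train_flops = 6 * params * tokens

print(f"~{tokens_per_gpu_per_s:,.0f} tokens/s per GPU on average")
print(f"~{train_flops:.2e} training FLOPs")
```

This works out to roughly 2,800 tokens per second per GPU and on the order of 8e23 FLOPs for the whole run.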

What it's good at

Phi-4 punches above its weight on reasoning and math. It scores 84.8 on MMLU, 80.4 on MATH, 56.1 on GPQA, and 82.6 on HumanEval, beating the similarly sized Qwen 2.5 14B on all four and outscoring GPT-4o on the GPQA graduate-level science benchmark. Its synthetic data, generated through multi-agent prompting and self-revision workflows, lets it go beyond plain distillation: Microsoft reports that it substantially surpasses its GPT-4 teacher on STEM-focused question answering. Its clear weakness is factual recall: a SimpleQA score of just 3.0 means it should not be relied on for memorized world knowledge.

Running locally

At 4-bit (Q4_K_M) quantization Phi-4 needs about 9 GB of VRAM, so a 12 GB GPU runs it without offloading. On an 8 GB card it overflows by roughly 1 GB, and Ollama or llama.cpp keeps the layers that do not fit in system RAM, trading speed for fit. It works with transformers, vLLM, llama.cpp, and Ollama (ollama run phi4), and runs on NVIDIA GPUs, AMD GPUs via ROCm, and Apple Silicon. The model expects a ChatML-style chat format in which each turn is written as <|im_start|>role<|im_sep|>content<|im_end|>; the chat template bundled with the tokenizer produces this automatically.
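To make the prompt layout concrete, here is a minimal Python sketch that renders a message list into the <|im_start|>role<|im_sep|>content<|im_end|> form. It assumes turns are joined with no separators, as in the chat template published with microsoft/phi-4 at the time of writing. format_phi4_prompt is a hypothetical helper; in practice tokenizer.apply_chat_template (or Ollama's built-in template) is the safer choice because it reads the template shipped with the model.

```python
# Minimal sketch of Phi-4's ChatML-style prompt layout.
# Assumption: turns are concatenated with nothing in between, matching the
# chat_template published with microsoft/phi-4 at the time of writing.
# format_phi4_prompt is a hypothetical helper, not a library function.

def format_phi4_prompt(messages: list[dict[str, str]], add_generation_prompt: bool = True) -> str:
    parts = []
    for msg in messages:
        role = msg["role"]
        if role not in ("system", "user", "assistant"):
            raise ValueError(f"unsupported role: {role!r}")
        parts.append(f"<|im_start|>{role}<|im_sep|>{msg['content']}<|im_end|>")
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant<|im_sep|>")
    return "".join(parts)


prompt = format_phi4_prompt([
    {"role": "system", "content": "You are a concise math tutor."},
    {"role": "user", "content": "Solve: 24*17"},
])
print(prompt)
```

Building the string by hand is mainly useful for debugging or for raw completion endpoints; chat endpoints such as the curl and OpenAI-client examples apply the template on the server side.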

License

Phi-4 is released under the MIT license, one of the most permissive options available. You can use it commercially, modify it, and redistribute it without paying fees or asking permission. Note that the model is English-focused: multilingual data made up only about 8% of its training data, so expect weaker results in other languages.

Frequently asked questions

What is Phi-4?

Phi-4 is a 14-billion-parameter dense decoder-only language model from Microsoft Research, released on December 12, 2024. It was trained on a blend of synthetic "textbook-like" data, filtered web documents, and academic Q&A sets, with a focus on reasoning quality over raw scale. Phi-4 is tuned for math, coding, and logical reasoning, and is best used with chat-formatted prompts.

What hardware do I need to run Phi-4?

At 4-bit quantization (Q4_K_M) Phi-4 needs roughly 9 GB of VRAM, so a 12 GB GPU runs it comfortably. On an 8 GB card it slightly overflows and Ollama or llama.cpp will offload the extra layers to system RAM, which works but slows generation. Apple Silicon with unified memory and CPU offloading are also supported.

Can I use Phi-4 commercially?

Yes. Microsoft released Phi-4 under the permissive MIT license, which allows commercial use, modification, and redistribution. The weights are available on Hugging Face and through Ollama, Azure AI Foundry, and GitHub Models at no cost.

How long is Phi-4's context window?

Phi-4 has a 16K-token context window. That is enough for long chat sessions, multi-step reasoning problems, and moderately sized documents, though it is shorter than the 128K windows offered by some larger competing models.
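When a conversation grows past the window, one simple strategy is to drop the oldest turns until the prompt plus the reply budget fits. The sketch below assumes the window is exactly 16,384 tokens and takes the token counter as a parameter; fit_history is a hypothetical helper, not part of any library. With transformers, a counter could be built from tokenizer.apply_chat_template, which returns token IDs by default.

```python
# Keep a chat history within Phi-4's 16K-token window by dropping the oldest
# turns. Assumptions: the window is 16,384 tokens, system messages come first,
# and count_tokens is supplied by the caller, e.g. with transformers:
#   lambda msgs: len(tokenizer.apply_chat_template(msgs, add_generation_prompt=True))
from typing import Callable

CONTEXT_TOKENS = 16_384  # assumed exact size of the "16K" window

Messages = list[dict[str, str]]


def fit_history(
    messages: Messages,
    count_tokens: Callable[[Messages], int],
    max_new_tokens: int,
    context: int = CONTEXT_TOKENS,
) -> Messages:
    """Drop the oldest non-system turns until prompt + reply fit the window."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    # Never drop the latest turn; it is the one the model must answer.
    while len(turns) > 1 and count_tokens(system + turns) + max_new_tokens > context:
        turns.pop(0)  # oldest turn first
    if count_tokens(system + turns) + max_new_tokens > context:
        raise ValueError("latest turn alone does not fit in the context window")
    return system + turns
```

Dropping whole turns keeps the surviving messages intact but can leave the history starting with an assistant reply; summarizing older turns is an alternative when earlier context matters.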

How does Phi-4 compare to larger models?

Despite being only 14B parameters, Phi-4 is competitive with much larger models on reasoning and math. It scores 84.8 on MMLU, 80.4 on MATH, and 56.1 on GPQA, beating Qwen 2.5 14B and outperforming GPT-4o on the GPQA science benchmark. Its weak spot is factual recall, where SimpleQA scores are low, so it is better at reasoning than at memorized world knowledge.