Phi-3.5-mini-instruct

Updated: 27.06.2026
Tags: Reasoning, Code, Multilingual

A 3.8B dense instruction-tuned LLM from Microsoft's Phi-3.5 family with a 128K context window and multilingual support.

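# Install dependencies and fetch the weights
pip install -U transformers accelerate
huggingface-cli download microsoft/Phi-3.5-mini-instruct
# or via Ollama:
ollama run phi3.5

# Query an OpenAI-compatible server (assumes one, e.g. vLLM, is already serving the model on port 8000)
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "microsoft/Phi-3.5-mini-instruct",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

# Python: local inference with transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("microsoft/Phi-3.5-mini-instruct", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3.5-mini-instruct")
msgs = [{"role": "user", "content": "Explain quantum computing simply."}]
# add_generation_prompt=True appends the assistant tag so the model answers instead of continuing the user turn
inputs = tokenizer.apply_chat_template(msgs, add_generation_prompt=True, return_dict=True, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens and drop special tokens
print(tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))

// JavaScript: same local server, via the openai SDK
import OpenAI from "openai";
const client = new OpenAI({ baseURL: "http://localhost:8000/v1", apiKey: "local" });
const res = await client.chat.completions.create({
  model: "microsoft/Phi-3.5-mini-instruct",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(res.choices[0].message.content);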
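
To make the apply_chat_template step in the Python snippet less opaque, here is a minimal sketch of the prompt layout it is expected to produce. The tag names (<|system|>, <|user|>, <|assistant|>, <|end|>) follow the chat format described in Microsoft's model card as I understand it. The helper render_phi35_prompt is hypothetical and written only for illustration; the tokenizer's bundled template remains the authoritative version.

```python
def render_phi35_prompt(messages, add_generation_prompt=True):
    """Render chat messages into the Phi-3.5 prompt layout (illustrative only)."""
    parts = []
    for msg in messages:
        role, content = msg["role"], msg["content"]
        if role not in ("system", "user", "assistant"):
            raise ValueError(f"unsupported role: {role!r}")
        # Each turn is a role tag, a newline, the text, then an <|end|> marker.
        parts.append(f"<|{role}|>\n{content}<|end|>\n")
    if add_generation_prompt:
        # An open assistant tag cues the model to write the reply.
        parts.append("<|assistant|>\n")
    return "".join(parts)


prompt = render_phi35_prompt([{"role": "user", "content": "Hello!"}])
print(prompt)
```

In real use, prefer tokenizer.apply_chat_template, which also handles tokenization and any template changes shipped with the model.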

More models

Name                  Size / Usage                        Context   Input
Phi-4                 Reasoning, math, code               16K       Text
Phi-4-mini-instruct   Math, reasoning, function calling   128K      Text

At a glance

  • License: MIT
  • Context length: 128K tokens
  • Parameters: 3.8B (dense)
  • Languages: 23 supported languages
  • Minimum hardware: ~3-4 GB VRAM (4-bit), runs on CPU
  • Strengths: reasoning, long-context, multilingual chat

Overview

Phi-3.5-mini-instruct is a 3.8 billion parameter language model released by Microsoft in August 2024 as part of the Phi-3.5 family. It is a dense, decoder-only Transformer that uses the same tokenizer as Phi-3 Mini, with a 32,064 token vocabulary. Microsoft built it on the Phi-3 recipe of synthetic data and heavily filtered public web text, training on 3.4 trillion tokens. It is an update to the June 2024 Phi-3 Mini instruction-tuned release, refined with additional post-training data based on user feedback.

What it's good at

For its size, the model is strong at reasoning and instruction following. Microsoft reports that it is competitive with much larger open-weight models such as Llama-3.1-8B-Instruct, Mistral-7B-Instruct-v0.3, and Mistral-Nemo-12B-Instruct on multilingual and long-context benchmarks. It scores 55.4 on multilingual MMLU and can use its full 128K context for tasks like long document summarization, document QA, and retrieval over large inputs. It supports 23 languages, including Arabic, Chinese, French, German, Japanese, and Spanish, though English remains its strongest. It also handles code, with training centered on Python and common libraries.
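
For long-document work, it helps to check whether an input fits the 128K window before sending it. The sketch below is one rough way to do that. It assumes "128K" means 131,072 tokens and uses a crude 4-characters-per-token estimate for English text; both are assumptions, and the helper names are hypothetical. For exact counts, tokenize with the model's own tokenizer instead.

```python
import math

CONTEXT_TOKENS = 128 * 1024  # assumption: "128K" = 131,072 tokens
CHARS_PER_TOKEN = 4          # rough assumption for English; measure with the real tokenizer


def estimate_tokens(text):
    """Crude token estimate from character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def split_for_context(text, reserve_tokens=2048, context_tokens=CONTEXT_TOKENS):
    """Split text into pieces that each leave reserve_tokens free for the prompt and answer."""
    budget_chars = (context_tokens - reserve_tokens) * CHARS_PER_TOKEN
    if budget_chars <= 0:
        raise ValueError("reserve_tokens must be smaller than context_tokens")
    return [text[i:i + budget_chars] for i in range(0, len(text), budget_chars)]


doc = "lorem ipsum " * 100_000  # stand-in for a long document (1.2M characters)
chunks = split_for_context(doc)
print(estimate_tokens(doc), "estimated tokens,", len(chunks), "chunks")
```

Splitting on raw character offsets can cut sentences in half; a real pipeline would more likely split on paragraph or section boundaries.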

Running locally

At 3.8B parameters, Phi-3.5-mini-instruct is cheap to run. A 4-bit quantized build fits in roughly 3-4 GB of VRAM and runs on most consumer GPUs; full FP16 needs about 8 GB. It works with Hugging Face transformers (set trust_remote_code=True), vLLM for serving, and GGUF builds through llama.cpp or Ollama for CPU and laptop use. Those figures cover short prompts: KV-cache memory grows linearly with the number of tokens in context, so using the full 128K window needs far more memory than the weights alone, and capping the context length helps on constrained hardware.
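
The memory figures above can be sanity-checked with back-of-the-envelope arithmetic. The sketch below estimates weight memory from the 3.8B parameter count and KV-cache memory from the context length. The layer count (32) and hidden size (3072, with no grouped-query attention) are assumptions based on my reading of the published model config, not figures stated on this page. The results cover weights and cache only; runtime overhead and quantization metadata come on top, which is why practical 4-bit requirements are higher than the raw weight size.

```python
GIB = 1024 ** 3

PARAMS = 3.8e9       # parameter count from the text
NUM_LAYERS = 32      # assumption: from the published config
HIDDEN_SIZE = 3072   # assumption: 32 KV heads x 96 dims, no grouped-query attention


def weight_gib(bits_per_param):
    """Memory for the weights alone at a given precision."""
    return PARAMS * bits_per_param / 8 / GIB


def kv_cache_gib(num_tokens, bytes_per_value=2):
    """KV cache size: keys and values (x2) for every layer and every token."""
    return 2 * NUM_LAYERS * HIDDEN_SIZE * bytes_per_value * num_tokens / GIB


print(f"FP16 weights:  {weight_gib(16):.1f} GiB")
print(f"4-bit weights: {weight_gib(4):.1f} GiB")
for tokens in (4096, 32768, 131072):
    print(f"KV cache @ {tokens:>6} tokens: {kv_cache_gib(tokens):.1f} GiB")
```

Under these assumptions, an FP16 KV cache for a full 131,072-token context comes to 48 GiB, several times the size of the weights, so the context cap usually matters more than the quantization level on small GPUs.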

License

Phi-3.5-mini-instruct is released under the MIT license. That allows free commercial and research use, modification, and redistribution with minimal restrictions. The weights are openly available on Hugging Face.

Frequently asked questions

How many parameters does Phi-3.5-mini-instruct have?
Phi-3.5-mini-instruct has 3.8 billion parameters. It is a dense decoder-only Transformer, so all 3.8B parameters are active during inference. It was trained on 3.4 trillion tokens.

What is the context length of Phi-3.5-mini-instruct?
Phi-3.5-mini-instruct supports a 128K token context length. This makes it suitable for long-context tasks such as long document and meeting summarization, long document QA, and information retrieval over large inputs.

Can Phi-3.5-mini-instruct be used commercially?
Yes. Microsoft released Phi-3.5-mini-instruct under the MIT license, which permits free commercial and research use, modification, and redistribution. The weights are available on Hugging Face, and the model is intended for commercial and research use across multiple languages.

Which languages does Phi-3.5-mini-instruct support?
Phi-3.5-mini-instruct supports 23 languages: Arabic, Chinese, Czech, Danish, Dutch, English, Finnish, French, German, Hebrew, Hungarian, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Russian, Spanish, Swedish, Thai, Turkish, and Ukrainian. It was trained primarily on English, so non-English performance can be lower.

What hardware is needed to run Phi-3.5-mini-instruct?
Because it has only 3.8B parameters, Phi-3.5-mini-instruct runs on modest hardware. A 4-bit quantized build fits in roughly 3-4 GB of VRAM and runs on most consumer GPUs, and it can also run on CPU through llama.cpp or Ollama. Full FP16 precision needs about 8 GB of VRAM.