Phi-3.5-mini-instruct

Reasoning

Code

Multilingual

A 3.8B dense instruction-tuned LLM from Microsoft's Phi-3.5 family with a 128K context window and multilingual support.

pip install -U transformers accelerate
huggingface-cli download microsoft/Phi-3.5-mini-instruct
# or via Ollama:
ollama run phi3.5

# assumes an OpenAI-compatible server (for example vLLM) is already listening on localhost:8000
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "microsoft/Phi-3.5-mini-instruct",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

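from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3.5-mini-instruct"
# torch_dtype="auto" keeps the checkpoint's dtype; device_map="auto" relies on accelerate
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)
msgs = [{"role": "user", "content": "Explain quantum computing simply."}]
# add_generation_prompt=True appends the assistant turn marker so the model answers instead of continuing the user turn
inputs = tokenizer.apply_chat_template(msgs, add_generation_prompt=True, return_dict=True, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256)
# decode only the newly generated tokens, not the echoed prompt
print(tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))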

import OpenAI from "openai";
const client = new OpenAI({ baseURL: "http://localhost:8000/v1", apiKey: "local" });
const res = await client.chat.completions.create({
  model: "microsoft/Phi-3.5-mini-instruct",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(res.choices[0].message.content);

More models

Name	Strengths	Context	Input
Phi-4	Reasoning, math, code	16K	Text
Phi-4-mini-instruct	Math, reasoning, function calling	128K	Text

At a glance

License: MIT
Context length: 128K tokens
Parameters: 3.8B (dense)
Languages: 23 supported languages
Minimum hardware: ~3-4 GB VRAM (4-bit), runs on CPU
Strengths: reasoning, long-context, multilingual chat

Overview

Phi-3.5-mini-instruct is a 3.8 billion parameter language model released by Microsoft in August 2024 as part of the Phi-3.5 family. It is a dense, decoder-only Transformer that uses the same tokenizer as Phi-3 Mini, with a 32,064 token vocabulary. Microsoft built it on the Phi-3 recipe of synthetic data and heavily filtered public web text, training on 3.4 trillion tokens. It is an update to the June 2024 Phi-3 Mini instruction-tuned release, refined with additional post-training data based on user feedback.

What it's good at

For its size, the model performs strongly on reasoning and instruction following. Microsoft reports that it is competitive with much larger open-weight models such as Llama-3.1-8B-Instruct, Mistral-7B-Instruct-v0.3, and Mistral-Nemo-12B-Instruct on multilingual and long-context benchmarks. It scores 55.4 on multilingual MMLU and can use its full 128K context for tasks like long document summarization, document QA, and retrieval over large inputs. It supports 23 languages, including Arabic, Chinese, French, German, Japanese, and Spanish, though English remains its strongest. It also handles code, with training centered on Python and common libraries.

Running locally

At 3.8B parameters, Phi-3.5-mini-instruct is cheap to run. A 4-bit quantized build fits in roughly 3-4 GB of VRAM and runs on most consumer GPUs; full FP16 needs about 8 GB for the weights alone (3.8B parameters at 2 bytes each is about 7.6 GB). It works with Hugging Face transformers (set trust_remote_code=True), vLLM for serving, and GGUF builds through llama.cpp or Ollama for CPU and laptop use. The key-value cache grows linearly with the number of tokens in context, so long prompts can need far more memory than the weights themselves; capping the context length helps on constrained hardware.
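
The memory figures above can be sanity-checked with some back-of-the-envelope arithmetic. The sketch below is only an estimate: the layer count, number of key-value heads, and head dimension are assumptions used for illustration (they are not stated on this page, so check them against the model's config.json), and real runtimes add overhead for activations, buffers, and quantization metadata.

```python
# Rough memory estimate for a dense decoder-only model.
# The architecture defaults are ASSUMPTIONS for illustration only;
# verify them against the model's config.json before relying on the numbers.

GIB = 1024 ** 3
N_PARAMS = 3.8e9  # parameter count stated on this page


def weight_bytes(n_params: float, bits_per_param: float) -> float:
    """Bytes needed to hold the weights at a given precision."""
    return n_params * bits_per_param / 8


def kv_cache_bytes(
    n_tokens: int,
    n_layers: int = 32,        # assumed
    n_kv_heads: int = 32,      # assumed (no grouped-query attention)
    head_dim: int = 96,        # assumed
    bytes_per_value: int = 2,  # FP16 cache
) -> int:
    """Bytes for the key-value cache of one sequence of n_tokens."""
    # Two tensors (keys and values) are cached per layer.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * n_tokens


if __name__ == "__main__":
    print(f"FP16 weights : {weight_bytes(N_PARAMS, 16) / GIB:.2f} GiB")
    print(f"4-bit weights: {weight_bytes(N_PARAMS, 4) / GIB:.2f} GiB")
    for tokens in (4096, 32768, 131072):
        print(f"KV cache @ {tokens:>6} tokens: {kv_cache_bytes(tokens) / GIB:.1f} GiB")
```

Under these assumptions the cache costs a fixed number of bytes per token, which is why a context of a few thousand tokens is comfortable on a consumer GPU while a full-length prompt is not.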

License

Phi-3.5-mini-instruct is released under the MIT license. That allows free commercial and research use, modification, and redistribution with minimal restrictions. The weights are openly available on Hugging Face.

Frequently asked questions

How many parameters does Phi-3.5-mini-instruct have?

Phi-3.5-mini-instruct has 3.8 billion parameters. It is a dense decoder-only Transformer, so all 3.8B parameters are active during inference. It was trained on 3.4 trillion tokens.

What is the context length of Phi-3.5-mini-instruct?

Phi-3.5-mini-instruct supports a 128K token context length. This makes it suitable for long-context tasks such as long document and meeting summarization, long document QA, and information retrieval over large inputs.
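
The prompt and the generated reply share the same context window, so it can help to check the token budget before sending a long document. The following is a minimal sketch, assuming that "128K" means 131,072 token positions; the constant and both helper names are hypothetical and not part of any library.

```python
# Token-budget check for a fixed context window.
# ASSUMPTION: "128K" is taken to mean 128 * 1024 = 131,072 token positions.
CONTEXT_LEN = 128 * 1024


def max_prompt_tokens(max_new_tokens: int, context_len: int = CONTEXT_LEN) -> int:
    """Largest prompt that still leaves room for max_new_tokens of output."""
    if not 0 <= max_new_tokens <= context_len:
        raise ValueError("max_new_tokens must be between 0 and context_len")
    return context_len - max_new_tokens


def fits_in_context(
    prompt_tokens: int, max_new_tokens: int, context_len: int = CONTEXT_LEN
) -> bool:
    """True if the prompt plus the requested output fits in the window."""
    return prompt_tokens + max_new_tokens <= context_len


if __name__ == "__main__":
    print(max_prompt_tokens(256))
    print(fits_in_context(130_000, 1_000))
    print(fits_in_context(131_000, 256))
```

In practice the prompt length would come from the model's own tokenizer, for example len(tokenizer(text)["input_ids"]) with Hugging Face transformers, since token counts differ between tokenizers.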

Can Phi-3.5-mini-instruct be used commercially?

Yes. Microsoft released Phi-3.5-mini-instruct under the MIT license, which permits free commercial and research use, modification, and redistribution. The weights are available on Hugging Face.

Which languages does Phi-3.5-mini-instruct support?

Phi-3.5-mini-instruct supports 23 languages: Arabic, Chinese, Czech, Danish, Dutch, English, Finnish, French, German, Hebrew, Hungarian, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Russian, Spanish, Swedish, Thai, Turkish, and Ukrainian. It was trained primarily on English, so non-English performance can be lower.

What hardware does Phi-3.5-mini-instruct need?

Because it has only 3.8B parameters, Phi-3.5-mini-instruct runs on modest hardware. A 4-bit quantized build fits in roughly 3-4 GB of VRAM and runs on most consumer GPUs, and it can also run on CPU through llama.cpp or Ollama. Full FP16 precision needs about 8 GB of VRAM for the weights.