Overview
Phi-3.5-mini-instruct is a 3.8-billion-parameter language model released by Microsoft in August 2024 as part of the Phi-3.5 family. It is a dense, decoder-only Transformer that uses the same tokenizer as Phi-3 Mini, with a 32,064-token vocabulary. Microsoft built it on the Phi-3 recipe of synthetic data and heavily filtered public web text, training on 3.4 trillion tokens. It updates the June 2024 instruction-tuned Phi-3 Mini release with additional post-training data informed by user feedback.
What it's good at
For its size, the model performs strongly on reasoning and instruction following. Microsoft reports that it is competitive with much larger open-weight models such as Llama-3.1-8B-Instruct, Mistral-7B-Instruct-v0.3, and Mistral-Nemo-12B-Instruct on multilingual and long-context benchmarks. It scores 55.4 on multilingual MMLU and accepts up to 128K tokens of context, which suits tasks like long-document summarization, document QA, and retrieval over large inputs. It supports 23 languages, including Arabic, Chinese, French, German, Japanese, and Spanish, though English remains its strongest. It also handles code, with training data centered on Python and common libraries.
Running locally
At 3.8B parameters, Phi-3.5-mini-instruct is cheap to run. A 4-bit quantized build fits in roughly 3-4 GB of VRAM and runs on most consumer GPUs; the FP16 weights need about 8 GB (3.8B parameters at 2 bytes each is about 7.6 GB). It works with Hugging Face transformers (set trust_remote_code=True), vLLM for serving, and GGUF builds through llama.cpp or Ollama for CPU and laptop use. The key-value cache grows linearly with context length, so using the full 128K window can raise memory use well beyond the weights themselves; capping the context length helps on constrained hardware.
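The memory figures above can be sanity-checked with some quick arithmetic. The sketch below is a rough estimate only, not a measurement. It assumes 32 layers, a hidden size of 3072, full multi-head attention (no grouped-query attention), and an FP16 key-value cache; none of these values appear in this overview, so check them against the model's config.json. The function names are illustrative.

```python
# Back-of-the-envelope memory estimate for Phi-3.5-mini-instruct.
# ASSUMPTIONS (not stated in this overview; verify against config.json):
N_PARAMS = 3.8e9      # parameter count, from the overview
N_LAYERS = 32         # assumed number of Transformer layers
HIDDEN_SIZE = 3072    # assumed hidden size; assumes every head keeps its own K/V

GIB = 1024 ** 3


def weight_gib(bits_per_param: float) -> float:
    """Memory for the weights alone, in GiB."""
    return N_PARAMS * bits_per_param / 8 / GIB


def kv_cache_gib(context_tokens: int, bytes_per_value: int = 2) -> float:
    """KV cache for one sequence: one key and one value vector per layer per token."""
    per_token_bytes = 2 * N_LAYERS * HIDDEN_SIZE * bytes_per_value
    return context_tokens * per_token_bytes / GIB


if __name__ == "__main__":
    for bits in (16, 4):
        print(f"weights at {bits}-bit: {weight_gib(bits):.2f} GiB")
    for ctx in (4096, 32768, 131072):
        print(f"KV cache at {ctx} tokens: {kv_cache_gib(ctx):.1f} GiB")
```

Under these assumptions the weights come to about 7.1 GiB in FP16 and about 1.8 GiB at 4 bits, while the FP16 cache for a single sequence is about 1.5 GiB at 4,096 tokens and 48 GiB at the full 131,072-token window. Real usage will be higher than the weight figures alone, since quantized formats store extra scaling data and the runtime needs room for activations.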
License
Phi-3.5-mini-instruct is released under the MIT license. That allows free commercial and research use, modification, and redistribution, provided the copyright and license notice are preserved. The weights are openly available on Hugging Face.
