Overview
Mistral-7B-Instruct-v0.2 is a 7-billion-parameter instruction-tuned language model from Mistral AI, the Paris-based lab founded in 2023. It is a chat-oriented fine-tune of the Mistral-7B-v0.2 base model and sits in the original Mistral 7B family. Released in late 2023, it was one of the most-downloaded open models of its generation and has since been superseded by v0.3. Compared with v0.1, this version widens the context window from 8K to 32K tokens, sets rope-theta to 1e6, and removes sliding-window attention.
What it's good at
The model handles general English chat, instruction following, summarization, and light coding. It uses the [INST] ... [/INST] prompt format, available through the tokenizer's chat template. On standard benchmarks the underlying Mistral 7B base outperformed Llama 2 13B across most tasks despite being roughly half the size, which is why the instruct variant became a common baseline for fine-tuning and RAG projects. Its 32K window suits longer documents and multi-turn conversations. It is primarily an English model and was not trained for native function calling, which arrived in v0.3.
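As a rough illustration of the [INST] ... [/INST] format, the sketch below assembles a multi-turn prompt by hand. The helper name `build_prompt` is hypothetical, and the exact whitespace is an assumption. In practice the tokenizer's own chat template (for example via `apply_chat_template` in transformers) should be treated as the reference.

```python
from typing import Dict, List

# Special tokens used by the Mistral tokenizer.
BOS, EOS = "<s>", "</s>"


def build_prompt(messages: List[Dict[str, str]]) -> str:
    """Hypothetical helper: render alternating user/assistant turns.

    Whitespace around the tags is an assumption; the tokenizer's
    chat template is the authoritative definition.
    """
    parts = [BOS]
    for i, msg in enumerate(messages):
        # Turns must alternate, starting with the user.
        expected = "user" if i % 2 == 0 else "assistant"
        if msg["role"] != expected:
            raise ValueError("roles must alternate user/assistant, starting with user")
        if msg["role"] == "user":
            parts.append(f"[INST] {msg['content']} [/INST]")
        else:
            # Each completed assistant turn is closed with the end-of-sequence token.
            parts.append(f"{msg['content']}{EOS}")
    return "".join(parts)


prompt = build_prompt([
    {"role": "user", "content": "Summarize RoPE in one sentence."},
    {"role": "assistant", "content": "RoPE encodes position by rotating query and key vectors."},
    {"role": "user", "content": "Now in five words."},
])
print(prompt)
```

If a string built this way is passed to a tokenizer, automatic insertion of special tokens would likely need to be disabled so the beginning-of-sequence token is not added twice.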
Running locally
At FP16 the weights alone take about 15 GB of VRAM. A 24 GB GPU runs the model without quantization with room to spare; a 16 GB card holds the weights but leaves little headroom for the KV cache, so long contexts may not fit. With 4-bit GGUF quantization through llama.cpp or Ollama, memory drops to roughly 6 GB, and the model will run on CPU with enough system RAM at reduced speed. It is supported by transformers, vLLM, llama.cpp, Ollama, and TGI, and is hosted by providers such as Cloudflare Workers AI and Fireworks.
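The memory figures above can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes roughly 7.24 billion parameters and the Mistral 7B architecture values of 32 layers, 8 key-value heads, and a head dimension of 128. The function names are illustrative, and the 4.5 bits per weight is only a stand-in for a 4-bit quantization plus its scaling metadata. Real usage adds activations and framework overhead on top.

```python
GIB = 1024 ** 3

# Assumed architecture values for Mistral 7B; verify against the model's config.json.
N_PARAMS = 7.24e9
N_LAYERS = 32
N_KV_HEADS = 8
HEAD_DIM = 128


def weight_bytes(bits_per_param: float) -> float:
    """Approximate size of the weights at a given precision."""
    return N_PARAMS * bits_per_param / 8


def kv_cache_bytes(n_tokens: int, bytes_per_value: int = 2) -> int:
    """Approximate KV cache size for one sequence of n_tokens."""
    # Leading 2 = one key vector and one value vector per layer per KV head.
    return 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * bytes_per_value * n_tokens


print(f"FP16 weights: {weight_bytes(16) / 1e9:.1f} GB")
print(f"~4.5-bit weights: {weight_bytes(4.5) / 1e9:.1f} GB")
print(f"KV cache at 32K tokens (FP16): {kv_cache_bytes(32 * 1024) / GIB:.1f} GiB")
```

Under these assumptions the FP16 weights come to about 14.5 GB and a full 32K-token FP16 cache adds about 4 GiB, which suggests why a 16 GB card is tight at long context lengths.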
License
Mistral-7B-Instruct-v0.2 is licensed under Apache 2.0, which permits commercial use, modification, and redistribution with no royalty and no copyleft obligation, provided the license text and attribution notices are retained. The model ships without built-in moderation or safety guardrails, so production deployments need their own content filtering.
