Mistral-7B-Instruct-v0.2

Reasoning

Code

A 7B instruction-tuned LLM from Mistral AI with a 32K context window, released under Apache 2.0.

pip install -U transformers torch accelerate
huggingface-cli download mistralai/Mistral-7B-Instruct-v0.2

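# Assumes an OpenAI-compatible server (e.g. vLLM: vllm serve mistralai/Mistral-7B-Instruct-v0.2) is listening on localhost:8000
curl http://localhost:8000/v1/chat/completions \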
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistralai/Mistral-7B-Instruct-v0.2",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

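import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load FP16 weights (~15 GB) onto the available GPU(s); device_map="auto" needs `accelerate`
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")
messages = [{"role": "user", "content": "What is your favourite condiment?"}]
# apply_chat_template wraps the message in the [INST] ... [/INST] format
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_dict=True, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens, without the prompt or special tokens
print(tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))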

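// Assumes a local OpenAI-compatible server on port 8000; run as an ES module so top-level await works
import OpenAI from "openai";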
const client = new OpenAI({ baseURL: "http://localhost:8000/v1", apiKey: "local" });
const res = await client.chat.completions.create({
  model: "mistralai/Mistral-7B-Instruct-v0.2",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(res.choices[0].message.content);

At a glance

License: Apache 2.0
Context length: 32K tokens
Parameters: 7B (dense)
Minimum hardware: ~6 GB VRAM (4-bit), ~15 GB (FP16)
Strengths: instruction following, English chat, reasoning, code

Overview

Mistral-7B-Instruct-v0.2 is a 7-billion-parameter instruction-tuned language model from Mistral AI, the Paris-based lab founded in 2023. It is a chat-oriented fine-tune of the Mistral-7B-v0.2 base model and sits in the original Mistral 7B family. Released in late 2023, it was one of the most-downloaded open models of its generation and has since been superseded by v0.3. Compared with v0.1, this version widens the context window from 8K to 32K tokens, sets rope-theta to 1e6, and removes sliding-window attention.
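
To make the sliding-window change concrete, here is a minimal sketch of the two attention masks using plain Python lists. The helper names `attention_mask` and `show` are hypothetical, and the window of 3 is chosen only to keep the printout small (v0.1 used a 4,096-token window). Real implementations apply the mask inside the attention kernel, and off-by-one conventions for the window edge vary between codebases.

```python
from typing import List, Optional


def attention_mask(seq_len: int, window: Optional[int] = None) -> List[List[bool]]:
    """Return mask[i][j] == True when query position i may attend to key position j.

    window=None gives a plain causal mask (full attention, as in v0.2); an integer
    keeps only the last `window` positions, including i itself (a sliding window).
    """
    return [
        [j <= i and (window is None or i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


def show(mask: List[List[bool]]) -> None:
    # Print 1 for "may attend" and 0 for "masked out"
    for row in mask:
        print(" ".join("1" if allowed else "0" for allowed in row))


show(attention_mask(6, window=3))  # sliding window: at most 3 keys per query
print()
show(attention_mask(6))            # full causal attention: every earlier token is visible
```

In the first printout the last query sees only positions 3 to 5; in the second it sees all six positions, which is the behaviour v0.2 keeps across its whole 32K window.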

What it's good at

The model handles general English chat, instruction following, summarization, and light coding. It expects the [INST] ... [/INST] prompt format, which the tokenizer's chat template applies automatically. Mistral AI reported that the original Mistral 7B base model outperformed Llama 2 13B on most standard benchmarks despite being roughly half the size, which is why the instruct variant became a common baseline for fine-tuning and RAG projects. Its 32K window suits longer documents and multi-turn conversations. It is primarily an English model and was not trained for native function calling, which arrived in v0.3.
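
The sketch below illustrates how a multi-turn conversation maps onto the [INST] ... [/INST] format. `build_mistral_prompt` is a hypothetical helper that only approximates the official chat template: the exact whitespace may differ between tokenizer revisions, and system messages are not handled. In practice, prefer `tokenizer.apply_chat_template`.

```python
from typing import Dict, List

BOS, EOS = "<s>", "</s>"  # Mistral's beginning- and end-of-sequence tokens


def build_mistral_prompt(messages: List[Dict[str, str]]) -> str:
    """Approximate the v0.2 [INST] ... [/INST] format for alternating user/assistant turns."""
    parts = [BOS]
    for i, msg in enumerate(messages):
        expected = "user" if i % 2 == 0 else "assistant"
        if msg["role"] != expected:
            raise ValueError("roles must alternate user/assistant/user/...")
        if msg["role"] == "user":
            # Each user turn is wrapped in instruction tags
            parts.append(f"[INST] {msg['content']} [/INST]")
        else:
            # Each completed assistant turn is closed with the end-of-sequence token
            parts.append(msg["content"] + EOS)
    return "".join(parts)


prompt = build_mistral_prompt([
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
    {"role": "user", "content": "Tell me a joke."},
])
print(prompt)  # <s>[INST] Hi [/INST]Hello!</s>[INST] Tell me a joke. [/INST]
```

Because the string already starts with `<s>`, tokenize it with `add_special_tokens=False` to avoid a duplicated BOS token.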

Running locally

At FP16 the weights need about 15 GB of VRAM, so a 16 GB or 24 GB GPU runs it without quantization. With 4-bit GGUF quantization through llama.cpp or Ollama, memory drops to roughly 6 GB, and it will run on CPU with enough system RAM at reduced speed. It is supported by transformers, vLLM, llama.cpp, Ollama, and TGI, and is hosted by providers such as Cloudflare Workers AI and Fireworks.
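
The memory figures above follow from simple arithmetic. This is a rough sketch assuming about 7.24B parameters and the public Mistral 7B architecture (32 layers, 8 key/value heads, head dimension 128). It counts only weights and the KV cache; activations, the CUDA context, and framework overhead are ignored, and real 4-bit GGUF files are somewhat larger because they also store quantization scales.

```python
# Rough memory estimate for Mistral-7B-Instruct-v0.2: weights + KV cache only.
# Activations, CUDA context, and framework overhead are NOT included.
PARAMS = 7.24e9   # approximate parameter count
N_LAYERS = 32
N_KV_HEADS = 8    # grouped-query attention: 8 key/value heads shared by 32 query heads
HEAD_DIM = 128    # 4096 hidden size / 32 attention heads


def weight_gb(bits_per_param: float) -> float:
    """Weight memory in decimal GB at a given precision (16 = FP16, 4 = 4-bit)."""
    return PARAMS * bits_per_param / 8 / 1e9


def kv_cache_gb(tokens: int, bytes_per_value: int = 2) -> float:
    """KV cache for one sequence: a K and a V vector per layer, KV head, and token."""
    per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * bytes_per_value
    return tokens * per_token / 1e9


print(f"FP16 weights:  {weight_gb(16):.1f} GB")
print(f"4-bit weights: {weight_gb(4):.1f} GB")
print(f"KV cache, 32K tokens at FP16: {kv_cache_gb(32_768):.1f} GB")
```

This prints roughly 14.5, 3.6, and 4.3 GB. A single sequence at the full 32K context therefore adds about 4.3 GB on top of the FP16 weights, so on a 16 GB card you may want to cap the context length (for example with vLLM's `--max-model-len`).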

License

Mistral-7B-Instruct-v0.2 is licensed under Apache 2.0. That permits commercial use, modification, and redistribution with no royalty and no copyleft obligation. Note that the model ships without built-in moderation or safety guardrails, so production deployments need their own content filtering.

Frequently asked questions

What is Mistral-7B-Instruct-v0.2?

Mistral-7B-Instruct-v0.2 is a 7-billion-parameter instruction-tuned large language model from Mistral AI. It is a fine-tuned version of the Mistral-7B-v0.2 base model, built for chat and instruction-following tasks. Compared to v0.1, it adds a 32K context window, sets rope-theta to 1e6, and drops sliding-window attention.

How much VRAM does Mistral-7B-Instruct-v0.2 need?

At full FP16 precision the model needs roughly 15 GB of VRAM, so it fits on a single 16 GB or 24 GB GPU, with more headroom for long contexts on the 24 GB card. With 4-bit quantization (GGUF via llama.cpp or Ollama), it runs in about 6 GB of VRAM and can also run on CPU with enough system RAM, at lower speed.

Can Mistral-7B-Instruct-v0.2 be used commercially?

Yes. Mistral-7B-Instruct-v0.2 is released under the Apache 2.0 license, which permits free commercial use, modification, and redistribution without royalties. The weights are downloadable from Hugging Face, and the model can be self-hosted (for example with Ollama, vLLM, or llama.cpp) or accessed through hosted providers such as Cloudflare Workers AI and Fireworks.

What is the context length of Mistral-7B-Instruct-v0.2?

Mistral-7B-Instruct-v0.2 supports a 32K-token context window, four times the 8K window of v0.1. This was achieved by raising rope-theta to 1e6 and removing sliding-window attention, so every token can attend to all earlier tokens in the window.
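
One way to build intuition for the rope-theta change is to compare rotary position embedding (RoPE) wavelengths. This sketch assumes the standard RoPE frequency formula, a head dimension of 128, and v0.1's rope-theta of 10,000; `rope_wavelengths` is a hypothetical helper. It is an illustration only: the longer context also depends on training with long sequences, which this arithmetic does not capture.

```python
import math
from typing import List

HEAD_DIM = 128  # per-head dimension in Mistral 7B


def rope_wavelengths(theta: float, head_dim: int = HEAD_DIM) -> List[float]:
    """Tokens needed for each RoPE rotation pair to complete one full turn.

    Pair i rotates by theta ** (-2i / head_dim) radians per position, so its
    wavelength is 2*pi * theta ** (2i / head_dim).
    """
    return [2 * math.pi * theta ** (2 * i / head_dim) for i in range(head_dim // 2)]


for theta in (1e4, 1e6):  # 10,000 in v0.1 vs 1,000,000 in v0.2
    slowest = rope_wavelengths(theta)[-1]
    print(f"rope_theta={theta:,.0f}: slowest pair repeats every ~{slowest:,.0f} tokens")
```

The slowest rotation pair goes from roughly 54 thousand tokens per cycle at 10,000 to roughly 5 million at 1,000,000, so positions tens of thousands of tokens apart remain far from wrapping around.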

How does Mistral-7B-Instruct-v0.2 differ from v0.3?

Mistral-7B-Instruct-v0.3 is the newer release that supersedes v0.2. Its main additions are a vocabulary extended to 32,768 tokens (up from 32,000), support for the v3 tokenizer, and native function calling. Both versions share the same 7B size and 32K context, but v0.2 lacks v0.3's tool-calling support and expanded vocabulary.