Overview
OpenELM-1_1B-Instruct is the 1.1-billion-parameter, instruction-tuned member of Apple's OpenELM family, released in April 2024. OpenELM stands for Open Efficient Language Models. The lineup spans 270M, 450M, 1.1B, and 3B sizes, each in pretrained and instruction-tuned variants. The defining idea is layer-wise scaling. Most transformers give every block the same number of attention heads and the same feed-forward width. OpenELM instead assigns fewer of both to early layers and more to later ones, spending parameters where they help accuracy most. Apple trained the models with its CoreNet library and published the full pipeline, from data preparation through evaluation.
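Layer-wise scaling can be illustrated with a small Python sketch. It follows the OpenELM paper's description, in which two per-layer factors are interpolated linearly from the first layer to the last. The factor α sets the attention width as a fraction of the model dimension, which determines the number of heads. The factor β sets the feed-forward width as a multiple of the model dimension.

Several details are assumptions meant to resemble the 1.1B model: 28 layers, model dimension 2048, head dimension 64, α from 0.5 to 1.0, and β from 0.5 to 4.0. The rounding rule and the helper names are also hypothetical simplifications, and grouped-query attention is ignored. The exact per-layer shapes of the released checkpoints may therefore differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerShape:
    num_heads: int  # attention (query) heads in this layer
    ffn_dim: int    # hidden width of the feed-forward block


def lerp(lo: float, hi: float, i: int, n: int) -> float:
    """Linearly interpolate from lo (first layer, i=0) to hi (last layer, i=n-1)."""
    return lo + (hi - lo) * i / (n - 1)


def round_to_multiple(x: float, m: int) -> int:
    """Round x to the nearest multiple of m, never below m (simplified rounding rule)."""
    return max(m, round(x / m) * m)


def layerwise_scaling(num_layers: int, model_dim: int, head_dim: int,
                      alpha: tuple, beta: tuple, ffn_divisor: int = 256) -> list:
    """Per-layer shapes: attention width scales with alpha_i, FFN width with beta_i."""
    shapes = []
    for i in range(num_layers):
        a = lerp(alpha[0], alpha[1], i, num_layers)
        b = lerp(beta[0], beta[1], i, num_layers)
        attn_width = round_to_multiple(a * model_dim, head_dim)
        ffn_dim = round_to_multiple(b * model_dim, ffn_divisor)
        shapes.append(LayerShape(attn_width // head_dim, ffn_dim))
    return shapes


# Hypothetical values resembling the 1.1B configuration; verify against the released config.
shapes = layerwise_scaling(num_layers=28, model_dim=2048, head_dim=64,
                           alpha=(0.5, 1.0), beta=(0.5, 4.0))
for idx in (0, 13, 27):
    print(idx, shapes[idx])
```

Under these assumed values, the first layer gets 16 heads and a 1,024-wide feed-forward block. The last layer gets 32 heads and an 8,192-wide block, so most parameters sit in the later layers.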
What it's good at
Apple's reported numbers are strong for a model of this size. On tasks from the LLM360 evaluation suite, the 1.1B Instruct version averages 49.94, including 71.83 on HellaSwag and 41.55 on ARC-Challenge. That puts it ahead of the base OpenELM-1_1B. At the roughly 1B scale, Apple also reports a 2.36% accuracy improvement over OLMo-1.2B while using about half as many pretraining tokens. Pretraining drew on RefinedWeb, deduplicated PILE, a subset of RedPajama, and a subset of Dolma v1.6, totaling about 1.8 trillion tokens of mostly English text. The model is therefore best suited to English prompts and short instruction-following tasks.
Running locally
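The BF16 weights take about 2.2 GB. A GPU with roughly 4 GB of VRAM therefore fits the model, with room left for activations and the KV cache. 4-bit quantization shrinks the weights to under 1 GB. Load the model through Hugging Face Transformers with trust_remote_code=True, since OpenELM ships custom modeling code. OpenELM relies on the Llama-2 tokenizer rather than shipping its own, and Apple's instructions set add_bos_token=True. Apple's generate_openelm.py script handles this setup. It also supports speculative decoding for faster inference, either by prompt lookup or with a smaller assistant model. The 2,048-token context window covers both the prompt and the generated output, which limits the model to short inputs.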
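The memory arithmetic and loading steps above can be sketched in Python. This is a minimal sketch, not Apple's generate_openelm.py. The helper names weight_memory_gb and run_openelm are hypothetical, and the 1.1e9 parameter count is a round figure taken from the model name. The tokenizer repo meta-llama/Llama-2-7b-hf is an assumption: it hosts the Llama-2 tokenizer on the Hugging Face Hub but is gated behind Meta's license, so you need approved access and a token. The setting prompt_lookup_num_tokens enables Transformers' prompt-lookup speculative decoding, and the value 10 is an arbitrary choice.

```python
CONTEXT_LENGTH = 2048  # OpenELM's context window: prompt plus generated tokens


def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for the weights alone (no activations or KV cache), in GB of 1e9 bytes."""
    return num_params * bits_per_param / 8 / 1e9


# Assumed round parameter count of 1.1e9, taken from the model name.
BF16_GB = weight_memory_gb(1.1e9, 16)  # 16 bits per weight
INT4_GB = weight_memory_gb(1.1e9, 4)   # 4 bits per weight, before quantization overhead


def run_openelm(prompt: str, max_new_tokens: int = 64) -> str:
    """Hypothetical helper: generate a completion with OpenELM-1_1B-Instruct."""
    # Heavy dependencies are imported lazily so the memory helper above runs without them.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = AutoModelForCausalLM.from_pretrained(
        "apple/OpenELM-1_1B-Instruct",
        trust_remote_code=True,       # OpenELM ships custom modeling code
        torch_dtype=torch.bfloat16,   # BF16 weights, about 2.2 GB
    ).to(device)
    # Assumption: the gated Llama-2 tokenizer repo; requires accepted access on the Hub.
    tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf", add_bos_token=True)

    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
    if input_ids.shape[-1] + max_new_tokens > CONTEXT_LENGTH:
        raise ValueError("prompt plus new tokens exceeds the 2,048-token context window")

    output_ids = model.generate(
        input_ids,
        max_new_tokens=max_new_tokens,
        pad_token_id=0,
        prompt_lookup_num_tokens=10,  # prompt-lookup speculative decoding
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

A call such as run_openelm("Once upon a time there was") downloads the weights on first use. The weight-only estimate ignores activations and the KV cache, which is why the VRAM guidance sits above 2.2 GB.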
License
The weights are published under the Apple Sample Code License (apple-amlr), which is more restrictive than permissive licenses such as Apache 2.0 or MIT. Apple releases the models without safety guarantees and recommends users run their own testing and filtering. Read the license terms before using the model commercially.
