dramabox

Updated: 25.05.2026
0.4B
Any-to-Any
QwQ

A compact 1B vision-language model from OpenBMB that runs image-text-to-text tasks fully on-device, tuned for fast inference on consumer laptops.

CLI:

atomic run Qwen/minicpm-v-4.6

cURL:

curl https://api.atomic.chat/v1/run \
  -H "Authorization: Bearer $ATOMIC_KEY" \
  -d '{"model": "Qwen/minicpm-v-4.6", "prompt": "Hello"}'

Python:

from atomic import load_model

model = load_model("Qwen/minicpm-v-4.6")
output = model.run("Your prompt here")
print(output)

JavaScript:

import { loadModel } from "atomic";

const model = await loadModel("Qwen/minicpm-v-4.6");
const output = await model.run("Your prompt here");
console.log(output);

More models

Name            Size / Usage    Context    Input
supertonic-3    321 GB          421K       Text, Image

At a glance

  • License: Apache 2.0 — free for commercial use
  • Context length: 128K tokens
  • Languages: 29 languages, English-optimized
  • Minimum hardware: 16 GB RAM, runs on Apple Silicon
  • Strengths: reasoning, coding and multilingual document understanding

Overview

This model is a multimodal large language model that unifies image, audio and text understanding to support question answering, summarization and document intelligence workflows. It is designed to run entirely on local hardware, so no data ever leaves the device and inference works fully offline.

It extends the base family with integrated speech comprehension and optical character recognition, enabling end-to-end processing of rich content such as meeting recordings, training videos and complex business documents.

Capabilities

The model performs well across a broad range of everyday tasks. Typical use cases include:

  • Document intelligence — extracting structure from contracts, reports and scanned PDFs.
  • Media analysis — captioning, search and summarization of long-form video.
  • Assistant workflows — grounded answers, drafting and step-by-step reasoning.

For best results, keep prompts specific and provide context up front: the model rewards clear, well-scoped instructions over open-ended ones.
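The advice above can be made concrete with a small helper that always puts context first and states a single, specific task. This is only a sketch: `build_prompt` and its section labels ("Context", "Task", "Constraints") are hypothetical names chosen for illustration, not a format the model requires.

```python
def build_prompt(task: str, context: str = "", constraints: tuple[str, ...] = ()) -> str:
    """Assemble a prompt: context up front, then one specific task, then constraints.

    The section labels are illustrative assumptions, not a required format.
    """
    parts = []
    if context.strip():
        parts.append("Context:\n" + context.strip())
    parts.append("Task:\n" + task.strip())
    if constraints:
        bullet_lines = "\n".join(f"- {c}" for c in constraints)
        parts.append("Constraints:\n" + bullet_lines)
    return "\n\n".join(parts)


prompt = build_prompt(
    task="Summarize the termination clause in three bullet points.",
    context="The attached document is a 12-page supplier contract.",
    constraints=("Quote clause numbers.", "Do not speculate beyond the text."),
)
print(prompt)
```

The resulting string can be passed to the model like any other prompt; the point is only that the scope and the background are fixed before the model starts generating.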

Quick start

With the runtime installed, pull the weights with one command and run the model with a second. Once the weights are cached, the model loads in seconds and the first token streams almost immediately:

atomic pull <model>
atomic run <model> --prompt "Summarize this report"

You can also call it programmatically — pass any prompt to model.run() and stream the response token by token.
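As a rough illustration of programmatic use, the sketch below wraps a `run(prompt)` call in a task-specific function. It assumes only what the Python snippet on this page shows, namely an object with a `run` method that returns printable text. The streaming interface is not documented here, so the sketch makes a single blocking call. `RunnableModel`, `summarize` and `EchoModel` are hypothetical names, and `EchoModel` is a stand-in so the code runs without the runtime installed.

```python
from typing import Protocol


class RunnableModel(Protocol):
    """Anything with run(prompt) -> str; assumed to match the loaded model object."""

    def run(self, prompt: str) -> str: ...


def summarize(model: RunnableModel, text: str, max_words: int = 100) -> str:
    """Build a scoped summarization prompt and send it in one blocking call."""
    prompt = f"Summarize the following text in at most {max_words} words.\n\n{text}"
    return model.run(prompt)


class EchoModel:
    """Hypothetical stand-in for a real model; it only reports the prompt size."""

    def run(self, prompt: str) -> str:
        return f"[received a prompt of {len(prompt)} characters]"


# With the runtime installed, the page's own snippet suggests replacing EchoModel()
# with: from atomic import load_model; model = load_model("<model>")
model = EchoModel()
print(summarize(model, "Quarterly revenue rose while costs stayed flat.", max_words=30))
```

Keeping the model behind a small protocol like this makes it easy to test calling code without loading the weights.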

License

The weights are released under a permissive open license and are available for commercial use. Full terms are described in the model license agreement.

Frequently asked questions

Can the model run offline?

Yes. Once the weights are downloaded, the model runs entirely on your device: no internet connection, API key or account is required, and no data leaves the machine.

What hardware does it need?

The model runs on consumer hardware: a recent laptop or desktop with enough memory is sufficient (16 GB RAM is the listed minimum). Quantized builds lower the requirement further, and a discrete GPU speeds up generation but is optional.
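To see why quantized builds lower the memory requirement, a back-of-the-envelope estimate helps: the weights alone take roughly the parameter count times the bits per weight, divided by eight. The sketch below uses a round 1 billion parameters purely as an illustration; it ignores activations, the context cache and runtime overhead, so real usage is higher. `weight_memory_gb` is a hypothetical helper, not part of any runtime.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Illustrative parameter count only; substitute the size of the build you use.
ILLUSTRATIVE_PARAMS = 1e9

for bits in (16, 8, 4):
    print(f"{bits}-bit weights: {weight_memory_gb(ILLUSTRATIVE_PARAMS, bits):.2f} GB")
```

Halving the bits per weight halves the weight footprint, which is the main reason a quantized build fits on machines with less memory.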