Skip to content
  1.  
  2. © 2023 – 2025 OpenRouter, Inc

    Qwen: Qwen3 VL 32B Instruct

    qwen/qwen3-vl-32b-instruct

    Created Oct 23, 2025262,144 context
    $0.35/M input tokens$1.10/M output tokens

    Qwen3-VL-32B-Instruct is a large-scale multimodal vision-language model designed for high-precision understanding and reasoning across text, images, and video. With 32 billion parameters, it combines deep visual perception with advanced text comprehension, enabling fine-grained spatial reasoning, document and scene analysis, and long-horizon video understanding.Robust OCR in 32 languages, and enhanced multimodal fusion through Interleaved-MRoPE and DeepStack architectures. Optimized for agentic interaction and visual tool use, Qwen3-VL-32B delivers state-of-the-art performance for complex real-world multimodal tasks.

    Providers for Qwen3 VL 32B Instruct

    OpenRouter routes requests to the best providers that are able to handle your prompt size and parameters, with fallbacks to maximize uptime.