
Kimi K2.5

LLM · Reasoning

Kimi K2.5 is an open-source, native multimodal agentic model built through continual pretraining on approximately 15 trillion mixed visual and text tokens atop Kimi-K2-Base.


Details

  • Modalities: text, vision
  • Version: 2.5
  • Recommended Hardware: 8xH200
  • Provider: Moonshot AI
  • Family: Kimi K2
  • Parameters: 1000B
  • Context: 256,000 tokens
  • License: MIT (Modified)

Kimi K2.5

Kimi K2.5 is an open-source, native multimodal agentic model developed by Moonshot AI. Built through continual pretraining on approximately 15 trillion mixed visual and text tokens atop Kimi-K2-Base, this model seamlessly integrates vision and language understanding with advanced agentic capabilities.

Model Overview

Kimi K2.5 represents a significant advance in multimodal AI, combining a trillion-parameter Mixture-of-Experts (MoE) architecture with native vision capabilities. The model activates roughly 32 billion parameters per token while maintaining efficiency through its expert-based design: 384 total experts, of which 8 are selected per token.

The architecture features 61 layers, Multi-Latent Attention (MLA) for efficient attention computation, and a 400M parameter MoonViT vision encoder. This design enables the model to process text, images, and video inputs within a unified framework.
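For intuition, the sketch below shows the kind of top-k expert routing such an MoE layer performs (384 experts, 8 active per token). It is a deliberately simplified illustration, not Moonshot's implementation; every name in it is hypothetical.

```python
import numpy as np

def topk_moe_layer(x, gate_w, experts, k=8):
    """Illustrative top-k Mixture-of-Experts routing.

    x       : (seq_len, d_model) token activations
    gate_w  : (d_model, n_experts) router weights
    experts : list of n_experts callables, each (d_model,) -> (d_model,)
    k       : experts activated per token (8 of 384 for Kimi K2.5)
    """
    logits = x @ gate_w                           # (seq_len, n_experts)
    topk = np.argsort(logits, axis=-1)[:, -k:]    # indices of the k best experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = logits[t, topk[t]]
        weights = np.exp(sel - sel.max())
        weights /= weights.sum()                  # softmax over the selected experts only
        for w, e in zip(weights, topk[t]):
            out[t] += w * experts[e](x[t])        # weighted sum of expert outputs
    return out

# Tiny smoke test with random weights and dummy experts.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))
gate_w = rng.normal(size=(16, 384))
experts = [lambda v, s=i: v * 0.0 + s for i in range(384)]
print(topk_moe_layer(x, gate_w, experts).shape)   # (4, 16)
```

Because only 8 of 384 experts fire per token, only about 32B of the model's 1T parameters participate in any single forward step, which is what keeps inference tractable at this scale.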

Key Capabilities

Native Multimodality

Unlike models that retrofit vision capabilities, Kimi K2.5 was pre-trained on vision-language tokens from the ground up. This native multimodal approach enables superior visual knowledge extraction and cross-modal reasoning, allowing the model to understand and reason about visual content with the same fluency as text.

Coding with Vision

Kimi K2.5 can generate code directly from visual specifications, transforming UI designs and video workflows into functional implementations. The model autonomously orchestrates tools for visual data processing, bridging the gap between design and development.
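As a rough sketch, a design-to-code request against a self-hosted, OpenAI-compatible endpoint might look like the following; the base URL, model name, and image path are placeholders, not confirmed values.

```python
import base64
from openai import OpenAI

# Placeholder endpoint and key for a self-hosted, OpenAI-compatible server.
client = OpenAI(base_url="http://YOUR_INSTANCE:8000/v1", api_key="EMPTY")

# Encode a UI mockup so it can be sent inline as a data URL.
with open("mockup.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="kimi-k2.5",  # hypothetical served model name
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            {"type": "text",
             "text": "Implement this mockup as a single HTML/CSS page."},
        ],
    }],
)
print(response.choices[0].message.content)
```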

Agent Swarm

The model introduces a novel agent swarm capability, transitioning from single-agent execution to self-directed, coordinated multi-agent workflows. Kimi K2.5 can decompose complex tasks into parallel sub-tasks and dynamically instantiate domain-specific agents to handle them, enabling sophisticated problem-solving at scale.
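Conceptually, the pattern is a fan-out/fan-in over sub-tasks. The sketch below shows only the shape of such a workflow (all names are hypothetical and the agent call is stubbed out); it is not Moonshot's orchestration logic.

```python
import asyncio

async def run_agent(role: str, subtask: str) -> str:
    """Placeholder for a model call executing one sub-task.

    In a real deployment this would prompt a Kimi K2.5 instance
    configured for the given role; here it just echoes its input.
    """
    await asyncio.sleep(0)          # stand-in for network latency
    return f"[{role}] result for: {subtask}"

async def agent_swarm(task: str, subtasks: dict[str, str]) -> str:
    # Fan out: one domain-specific agent per sub-task, run in parallel.
    results = await asyncio.gather(
        *(run_agent(role, st) for role, st in subtasks.items())
    )
    # Fan in: a coordinator agent would normally synthesize these results.
    return "\n".join(results)

print(asyncio.run(agent_swarm(
    "Research GPU pricing",
    {"searcher": "collect vendor prices", "analyst": "compare per-hour cost"},
)))
```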

Operational Modes

Kimi K2.5 supports two distinct operational modes:

Thinking Mode (Default): Provides detailed reasoning content alongside responses, ideal for complex analytical tasks. Uses temperature 1.0 and top_p 0.95 for optimal performance.

Instant Mode: Delivers faster responses with thinking disabled, suitable for straightforward queries. Uses temperature 0.6 for more focused outputs.
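To make these settings concrete, here is a minimal sketch of issuing one request per mode through an OpenAI-compatible endpoint. The sampling parameters follow the standard chat completions API; how thinking is actually toggled depends on the serving stack, so the chat_template_kwargs flag shown is an assumption, not a documented field.

```python
from openai import OpenAI

client = OpenAI(base_url="http://YOUR_INSTANCE:8000/v1", api_key="EMPTY")

# Thinking mode (default): higher-entropy sampling for deep reasoning.
thinking = client.chat.completions.create(
    model="kimi-k2.5",  # hypothetical served model name
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    temperature=1.0,
    top_p=0.95,
)

# Instant mode: lower temperature for fast, focused answers. The flag
# below for disabling thinking is an assumption; check your server's docs.
instant = client.chat.completions.create(
    model="kimi-k2.5",
    messages=[{"role": "user", "content": "Convert 72°F to Celsius."}],
    temperature=0.6,
    extra_body={"chat_template_kwargs": {"enable_thinking": False}},
)
```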

Benchmark Performance

Kimi K2.5 demonstrates strong performance across diverse evaluation benchmarks:

Reasoning and Knowledge

  • AIME 2025: 96.1
  • GPQA-Diamond: 87.6
  • MMLU-Pro: 87.1
  • HLE-Full (with tools): 50.2

Vision and Multimodal

  • MMMU-Pro: 78.5
  • VideoMMMU: 86.6
  • OCRBench: 92.3
  • OmniDocBench: 88.8
  • InfoVQA: 92.6

Coding

  • SWE-Bench Verified: 76.8
  • SWE-Bench Pro: 50.7
  • LiveCodeBench: 85.0
  • Terminal Bench 2.0: 50.8

Agentic Search

  • BrowseComp (Agent Swarm): 78.4
  • WideSearch (Agent Swarm): 79.0
  • DeepSearchQA: 77.1

Long Context

  • LongBench v2: 61.0
  • AA-LCR: 70.0

Use Cases

Kimi K2.5 excels in a variety of applications:

  • Multimodal Analysis: Understanding and reasoning about images, videos, and text in unified workflows
  • Complex Reasoning: Solving mathematical, logical, and analytical problems with detailed explanations
  • Software Engineering: Generating, reviewing, and debugging code across multiple languages
  • Visual Coding: Converting UI/UX designs directly into functional code
  • Document Understanding: Extracting information from documents using advanced OCR capabilities
  • Multi-step Problem Solving: Orchestrating tools and agents to tackle complex, multi-faceted tasks
  • Information Retrieval: Conducting thorough research using coordinated agent swarm capabilities

Quick Start Guide

  • Choose a model and click 'Deploy' above to find available GPUs recommended for this model.
  • Rent a dedicated instance preconfigured with the model you've selected.
  • Start sending requests to your model instance and get responses immediately (see the sketch below).
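For the last step, a first request might look like the sketch below. The host, port, and model name are placeholders for whatever your rented instance exposes; many serving stacks (vLLM, for example) provide an OpenAI-compatible /v1/chat/completions route, but verify against your instance's documentation.

```python
import requests

# Placeholder address; substitute your instance's public IP and port.
url = "http://YOUR_INSTANCE:8000/v1/chat/completions"

payload = {
    "model": "kimi-k2.5",  # hypothetical served model name
    "messages": [{"role": "user", "content": "Hello, Kimi!"}],
}

resp = requests.post(url, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```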
