Modalities: text, vision
Version: 2.5
Recommended GPUs: 8x H200
Developer: Moonshot AI
Model family: Kimi K2
Total parameters: 1000B
Context length: 256,000 tokens
License: MIT (Modified)
Kimi K2.5 is an open-source, native multimodal agentic model developed by Moonshot AI. Built through continual pretraining on approximately 15 trillion mixed visual and text tokens atop Kimi-K2-Base, this model seamlessly integrates vision and language understanding with advanced agentic capabilities.
Kimi K2.5 represents a significant advancement in multimodal AI, combining a trillion-parameter Mixture-of-Experts (MoE) architecture with native vision capabilities. The model activates 32 billion parameters per token, maintaining efficiency by routing each token to 8 of its 384 experts.
The architecture features 61 layers, Multi-head Latent Attention (MLA) for efficient attention computation, and a 400M-parameter MoonViT vision encoder. This design enables the model to process text, images, and video inputs within a unified framework.
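For intuition, here is a minimal sketch of the top-k gating step behind those numbers: each token is scored against all 384 experts, and only the 8 highest-scoring experts run for it. The dimensions, weight tensors, and function name are illustrative, not Kimi K2.5's actual implementation.

```python
import torch
import torch.nn.functional as F

def topk_expert_routing(hidden, router_weight, num_selected=8):
    """Sketch of MoE top-k gating: score each token against all experts,
    keep only the top-k. Shapes here are toy values, not Kimi K2.5's."""
    # hidden: [num_tokens, d_model]; router_weight: [num_experts, d_model]
    logits = hidden @ router_weight.T              # [num_tokens, 384]
    gate_probs = F.softmax(logits, dim=-1)
    topk_probs, topk_idx = gate_probs.topk(num_selected, dim=-1)
    # Renormalize so each token's selected-expert weights sum to 1
    topk_probs = topk_probs / topk_probs.sum(dim=-1, keepdim=True)
    return topk_probs, topk_idx

# 384 experts as in Kimi K2.5; d_model of 1024 is purely illustrative
tokens = torch.randn(4, 1024)
router = torch.randn(384, 1024)
probs, idx = topk_expert_routing(tokens, router)
print(idx.shape)  # torch.Size([4, 8]) -- 8 of 384 experts per token
```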
Unlike models that retrofit vision capabilities, Kimi K2.5 was pre-trained on vision-language tokens from the ground up. This native multimodal approach enables superior visual knowledge extraction and cross-modal reasoning, allowing the model to understand and reason about visual content with the same fluency as text.
Kimi K2.5 can generate code directly from visual specifications, transforming UI designs and video workflows into functional implementations. The model autonomously orchestrates tools for visual data processing, bridging the gap between design and development.
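As a sketch of what a design-to-code request might look like against an OpenAI-compatible endpoint (for example, a self-hosted instance serving Kimi K2.5), the snippet below sends a UI mockup image alongside a code-generation prompt. The base URL, API key, model id, and file name are placeholders, not documented values.

```python
import base64
from openai import OpenAI

# Placeholder endpoint and credentials -- substitute whatever your
# deployment serving Kimi K2.5 actually exposes.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

with open("ui_mockup.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="kimi-k2.5",  # placeholder model id
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            {"type": "text",
             "text": "Implement this mockup as a single HTML/CSS page."},
        ],
    }],
)
print(response.choices[0].message.content)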
The model introduces a novel agent swarm capability, transitioning from single-agent execution to self-directed, coordinated multi-agent workflows. Kimi K2.5 can decompose complex tasks into parallel sub-tasks and dynamically instantiate domain-specific agents to handle them, enabling sophisticated problem-solving at scale.
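The swarm pattern can be approximated client-side for intuition: decompose a task into sub-tasks, then fan them out as concurrent model calls. The sketch below stubs the decomposition with a fixed list and reuses the placeholder endpoint and model id from above; Kimi K2.5 itself performs this orchestration natively.

```python
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
MODEL = "kimi-k2.5"  # placeholder model id

async def run_worker(subtask: str) -> str:
    """One 'domain-specific agent': a single call handling one sub-task."""
    resp = await client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": subtask}],
    )
    return resp.choices[0].message.content

async def run_swarm(subtasks: list[str]) -> list[str]:
    # Sub-tasks run concurrently, mirroring the parallel agent workflow
    return await asyncio.gather(*(run_worker(t) for t in subtasks))

results = asyncio.run(run_swarm([
    "Summarize the requirements document.",
    "Draft the database schema.",
    "List the API endpoints needed.",
]))
```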
Kimi K2.5 supports two distinct operational modes (a request sketch for both follows this list):
Thinking Mode (Default): Provides detailed reasoning content alongside responses, ideal for complex analytical tasks. Uses temperature 1.0 and top_p 0.95 for optimal performance.
Instant Mode: Delivers faster responses with disabled thinking, suitable for straightforward queries. Uses temperature 0.6 for more focused outputs.
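Below is a hedged example of invoking each mode with the sampling settings listed above, again against an assumed OpenAI-compatible endpoint with a placeholder model id. How thinking is disabled varies by serving stack; the vLLM-style chat_template_kwargs flag shown is an assumption, not a documented Kimi K2.5 parameter.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
MODEL = "kimi-k2.5"  # placeholder model id

# Thinking mode (default): sampling settings recommended above
thinking = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    temperature=1.0,
    top_p=0.95,
)

# Instant mode: lower temperature for focused output. The template flag
# below is an assumed vLLM-style toggle, not a documented K2.5 parameter.
instant = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": "Capital of France?"}],
    temperature=0.6,
    extra_body={"chat_template_kwargs": {"enable_thinking": False}},
)
```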
Kimi K2.5 demonstrates strong performance across diverse evaluation benchmarks.
Kimi K2.5 excels in a variety of applications.
