GLM 4.7-Flash (V4.7-Flash)
Lightweight agentic, reasoning and coding model

Developer: Z.ai (GLM family)
Parameters: 31B
Context window: 128,000 tokens
License: MIT
Recommended hardware: 1x H200
GLM 4.7-Flash is a 30B-A3B Mixture of Experts (MoE) model developed by Z.ai, designed to deliver strong agentic, reasoning, and coding performance in a compact and efficient architecture. It is positioned as one of the strongest models in the 30B parameter class.
GLM 4.7-Flash uses a Mixture of Experts architecture with 30B total parameters and 3B active parameters per token. This sparse activation pattern enables the model to maintain high performance while requiring significantly fewer computational resources during inference compared to dense models of similar capability. The model supports thinking mode with preserved reasoning across conversation turns, enabling coherent multi-step agentic task completion.
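The sparse-activation idea described above can be sketched in a few lines. This is an illustrative toy, not Z.ai's actual implementation: a router scores every expert per token, but only the top-k experts run, so compute per token scales with the active parameters rather than the total. All shapes and names here are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def moe_forward(x, expert_weights, router_weights, k=2):
    """Route a token vector x to its top-k experts and mix their outputs."""
    logits = x @ router_weights                  # one score per expert
    topk = np.argsort(logits)[-k:]               # indices of the k best experts
    gates = np.exp(logits[topk])
    gates /= gates.sum()                         # softmax over selected experts only
    # Only the k selected experts do any work; the rest are never evaluated.
    return sum(g * (x @ expert_weights[i]) for g, i in zip(gates, topk))

d, n_experts = 8, 16                             # toy sizes for illustration
experts = rng.standard_normal((n_experts, d, d))
router = rng.standard_normal((d, n_experts))
y = moe_forward(rng.standard_normal(d), experts, router, k=2)
print(y.shape)  # (8,)
```

With k=2 of 16 experts active, each token touches only a fraction of the weights, which is the same mechanism that lets a 30B-class MoE model run with roughly 3B active parameters per token.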
The model was trained with emphasis on agentic task performance, coding, and reasoning. Evaluation uses a temperature of 1.0 with top-p 0.95 for general tasks; coding and agentic benchmarks use specialized settings, including a temperature of 0.7 for SWE-bench evaluations.
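The sampling settings above map directly onto an OpenAI-style chat-completions payload, which is a common interface for self-hosted deployments. A minimal sketch, assuming an OpenAI-compatible server; the `glm-4.7-flash` model id is a placeholder and your instance may expose a different name:

```python
import json

def build_payload(prompt, task="general"):
    # Settings quoted from the model card: temperature 1.0 / top-p 0.95 for
    # general tasks, temperature 0.7 for SWE-bench-style coding evaluations.
    sampling = {
        "general": {"temperature": 1.0, "top_p": 0.95},
        "coding": {"temperature": 0.7},
    }[task]
    return {
        "model": "glm-4.7-flash",  # placeholder model id (assumption)
        "messages": [{"role": "user", "content": prompt}],
        **sampling,
    }

payload = build_payload("Fix the failing test in utils.py", task="coding")
print(json.dumps(payload, indent=2))
```

The payload would then be POSTed to your instance's chat-completions endpoint with any HTTP client.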
Deploy GLM 4.7-Flash on Vast.ai for efficient access to strong agentic and reasoning capabilities with flexible GPU infrastructure.
1. Choose a model and click 'Deploy' above to find available GPUs recommended for this model.
2. Rent your dedicated instance, preconfigured with the model you've selected.
3. Start sending requests to your model instance and get responses right away.

© 2026 Vast.ai. All rights reserved.