Liquid AI's LFM2 Just Dropped — Here's How to Run It on Vast.ai
Liquid AI just released LFM2-24B-A2B, a 24-billion-parameter foundation model that activates only 2.3 billion parameters per token. Less than 10% of its total parameters are active at inference time, making it one of the most aggressively sparse mixture-of-experts models available today.
What makes LFM2 different from other MoE models isn't just the sparsity. The architecture is a hybrid design that replaces most of the attention layers with gated short convolution blocks. Out of 40 total layers, only 10 use grouped query attention — the other 30 are convolutional. This is a fundamentally different approach from transformer-only models, and it's what allows Liquid AI to push the active parameter count so low while maintaining competitive quality across benchmarks like MMLU-Pro, GPQA Diamond, IFEval, and MATH-500.
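To make the sparsity concrete, here is a back-of-envelope calculation using only the figures above. The exact per-layer parameter split isn't published, so these are illustrative ratios rather than Liquid AI's own accounting:

```python
# Illustrative sparsity arithmetic for LFM2-24B-A2B.
total_params = 24e9      # total parameters
active_params = 2.3e9    # parameters activated per token
experts, top_k = 64, 4   # experts per MoE layer, experts routed per token

# Fraction of the full model that runs for each token
active_fraction = active_params / total_params   # ~0.096

# Fraction of each MoE layer's expert weights that run per token
expert_fraction = top_k / experts                # 0.0625

print(f"{active_fraction:.1%} of total parameters active per token")
print(f"{expert_fraction:.2%} of each MoE layer's experts active per token")
```

Top-4-of-64 routing alone gives 6.25% expert activation; shared components (embeddings, attention, convolution blocks) account for the rest of the 2.3B active parameters.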
Liquid AI, an MIT spinoff, has scaled this architecture from 350M to 24B parameters following log-linear scaling laws. LFM2 is the first model in the family large enough to be broadly useful — and with day-one support for vLLM, SGLang, and llama.cpp, it's ready to deploy today.
Note: LFM2-24B-A2B is an early checkpoint trained on 17 trillion tokens without reinforcement learning or extensive post-training. A post-trained version (LFM2.5-24B-A2B) is coming.
Model Overview
| Property | Value |
|---|---|
| Developer | Liquid AI |
| Model | LFM2-24B-A2B |
| Architecture | Hybrid MoE: 30 convolutional + 10 GQA attention layers, 64 experts, top-4 routing |
| Total Parameters | 24B |
| Active Parameters | 2.3B per token |
| Context Length | 32,768 tokens |
| Training Data | 17T tokens (mixed BF16/FP8) |
| License | LFM Open License v1.0 |
| HuggingFace | LiquidAI/LFM2-24B-A2B |
Deploy LFM2-24B-A2B on Vast.ai with vLLM
At BF16, the model weights are ~48 GB — it fits on a single A100-80GB with room for a 32K context KV cache. No quantization or multi-GPU setup needed.
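A quick sanity check on that claim. BF16 stores each parameter in 2 bytes, so the arithmetic is simple (the headroom figure is what's left for KV cache and activations, not a precise cache-size estimate):

```python
# BF16 weight footprint: 2 bytes per parameter.
params = 24e9
weights_gb = params * 2 / 1e9        # 48.0 GB of weights

gpu_vram_gb = 80                     # single A100-80GB
headroom_gb = gpu_vram_gb - weights_gb
print(f"weights: {weights_gb:.0f} GB, "
      f"headroom for KV cache + activations: {headroom_gb:.0f} GB")
```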
Prerequisites
- A Vast.ai account with credits
- The Vast CLI installed and configured:
pip install --upgrade vastai # make sure you're on v0.5.0+
vastai set api-key <YOUR_API_KEY>
Find a GPU
Search for a single GPU with at least 80 GB VRAM and direct port access:
vastai search offers "gpu_ram>=80 num_gpus=1 direct_port_count>=1 cuda_vers>=12.4 cuda_vers<13" -o dph
Important: Use a machine with CUDA 12.x (not 13.x). The vLLM nightly image is built against CUDA 12 and will fail with a driver mismatch on CUDA 13 machines.
Launch the Instance
Pick an instance ID from the search results and deploy with vLLM. LFM2 requires a newer version of the transformers library than what ships in the vLLM image, so we upgrade it at startup:
vastai create instance <INSTANCE_ID> \
--image vllm/vllm-openai:nightly \
--env "-p 8000:8000" \
--disk 100 \
--onstart-cmd "pip install 'transformers>=5.0.0' && \
vllm serve LiquidAI/LFM2-24B-A2B \
--host 0.0.0.0 --port 8000 \
--max-model-len 32768 --dtype auto \
--trust-remote-code"
This upgrades transformers, pulls the model from HuggingFace, and starts an OpenAI-compatible API server on port 8000. Initial startup takes several minutes while the ~48 GB of BF16 weights download.
Once the instance is running, find your IP and port in the Vast console — click the IP button on your instance to see the port mapping.
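Since the server isn't usable until the weights finish downloading and loading, a small polling helper can wait for the endpoint to come up. This is a convenience sketch using only the Python standard library, not part of the Vast or vLLM tooling; substitute your instance's IP and port:

```python
import time
import urllib.request

def wait_for_server(base_url, timeout=1800, interval=15):
    """Poll the vLLM /v1/models endpoint until it responds, or give up."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/v1/models", timeout=5) as resp:
                if resp.status == 200:
                    return True
        except OSError:  # connection refused, timeout, DNS failure, etc.
            pass
        time.sleep(interval)
    return False

# wait_for_server("http://<VAST_IP>:<PORT>")  # fill in from the Vast console
```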
Why `nightly` and `transformers>=5`? LFM2's tokenizer uses the `TokenizersBackend` class introduced in `transformers` 5.0. The stable vLLM image pins `transformers<5`, so the tokenizer fails to load. Using `nightly` with a `transformers` upgrade resolves this. This was tested with vLLM nightly build `v0.16.0rc2`. Once a stable vLLM release ships with `transformers>=5.0.0` support, you can switch to `vllm/vllm-openai:latest` and drop the `pip install` step.
Call the API
curl http://<VAST_IP>:<PORT>/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "LiquidAI/LFM2-24B-A2B",
"messages": [{"role": "user", "content": "What makes mixture-of-experts models efficient?"}],
"max_tokens": 300
}'
Cleanup
When you're done, destroy the instance to stop charges:
vastai destroy instance <INSTANCE_ID>