Are you looking to run powerful open-source LLMs like Llama 4, Kimi K2, or Qwen3 without the hassle of managing complex infrastructure? Vast.ai makes it easy to train and deploy these models with a curated set of templates built for speed, flexibility, and scale.
Our templates cover everything from browser-based interfaces like Oobabooga and Open WebUI to optimized backends like HuggingFace TGI and vLLM.
Below, we break down the top recommended templates on Vast.ai and how you can use them to get your LLM project off the ground quickly and easily.
Oobabooga provides a user-friendly web interface for interacting with open-source LLMs. Its UI resembles the original ChatGPT style – ideal for those who prefer a graphical interface over command-line operations.
With support for multiple text generation backends in one UI/API, you can switch between models without restarting the server while retaining fine-grained control over generation settings. Models like Falcon, Llama, and Vicuna can be loaded and explored with just a few clicks.
Our Oobabooga template facilitates quick deployment, making it suitable for both beginners and experienced users.
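If you'd rather script against the instance than click through the UI, Oobabooga can also expose an OpenAI-compatible API. Here's a minimal sketch in Python, assuming the API extension is enabled on its default port 5000 – the instance address and port mapping below are placeholders for your own values:

```python
# Minimal sketch: querying Oobabooga's OpenAI-compatible API with requests.
# Assumes the API extension is enabled; INSTANCE_IP and API_PORT are
# placeholders - check your instance's port mapping on Vast.ai.
import requests

INSTANCE_IP = "your-instance-ip"  # hypothetical placeholder
API_PORT = 5000                   # Oobabooga's default API port

response = requests.post(
    f"http://{INSTANCE_IP}:{API_PORT}/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Summarize what vLLM does in one sentence."}],
        "max_tokens": 200,
        "temperature": 0.7,
    },
    timeout=60,
)
print(response.json()["choices"][0]["message"]["content"])
```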
Ideal for serving Llama 3, this template is optimized for high-performance text generation tasks using HuggingFace's Text Generation Inference (TGI) server. It supports other popular open-source LLMs from HuggingFace, including Falcon, StarCoder, BLOOM, and GPT-NeoX, and it's particularly beneficial for applications requiring low-latency responses and scalability, such as chatbots or content generation tools.
For this template, you'll need your own HuggingFace access token – and you'll also need to apply for permission to use Llama 3 on HuggingFace, since the model is hosted in a gated repository.
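Once the TGI server is running, you can query its generate endpoint over plain HTTP. The sketch below uses a placeholder address and port – swap in the values from your own instance:

```python
# Minimal sketch: calling a running TGI server's /generate endpoint.
# The URL is a placeholder; use the address and mapped port of your instance.
import requests

TGI_URL = "http://your-instance-ip:8080"  # hypothetical address and port

payload = {
    "inputs": "What is retrieval-augmented generation?",
    "parameters": {"max_new_tokens": 150, "temperature": 0.7},
}
response = requests.post(f"{TGI_URL}/generate", json=payload, timeout=60)
print(response.json()["generated_text"])
```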
Open WebUI is an extensible, user-friendly platform for self-hosted AI deployments. It operates entirely offline and supports backends like Ollama and OpenAI-compatible APIs, letting you work with open-source LLMs such as Llama 4, Kimi K2, and DeepSeek R1.
With built-in support for retrieval-augmented generation (RAG), Open WebUI seamlessly integrates document interactions and web search into the chat experience. You can even incorporate image generation capabilities and chat with multiple LLMs in parallel.
Our Ollama + WebUI template will automatically set up Open WebUI as a web-based interface and expose a port for the Ollama API, making it easy to run and interact with LLMs directly from your instance.
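Beyond the web interface, the exposed Ollama port lets you call models programmatically. Here's a minimal sketch, assuming Ollama's default port 11434 is forwarded and a model (here llama3, a hypothetical choice) has already been pulled on the instance:

```python
# Minimal sketch: hitting the Ollama API exposed by the template.
# The address is a placeholder; "llama3" stands in for whichever model
# you have pulled on the instance.
import requests

OLLAMA_URL = "http://your-instance-ip:11434"  # placeholder address

response = requests.post(
    f"{OLLAMA_URL}/api/generate",
    json={"model": "llama3", "prompt": "Explain continuous batching briefly.", "stream": False},
    timeout=120,
)
print(response.json()["response"])
```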
Optimized for serving open-source LLMs with high-throughput inference, vLLM uses PagedAttention, a memory-management technique that drastically reduces KV-cache overhead, making it easier to serve large models like Llama, Mistral, and Falcon efficiently. Notably, it supports continuous batching for faster and more efficient multi-user inference at scale.
vLLM provides an OpenAI-compatible API server for easy integration into existing workflows, and it's particularly well suited for developers building commercial or research applications that require fast and stable model responses. The vLLM framework seamlessly supports most open-source models on HuggingFace, including the ones named above as well as Mixtral, DeepSeek, LLaVA, BLOOM, GPT-NeoX, Qwen, and more.
Our vLLM template contains everything needed for you to get started – all you have to do is specify the model you want to serve and the corresponding vLLM configuration.
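Because vLLM speaks the OpenAI API, you can point the official openai Python client straight at your instance. The base URL, API key, and model name below are placeholders – match them to whatever you configured the template to serve:

```python
# Minimal sketch: talking to a vLLM instance through its OpenAI-compatible
# API using the official openai client. All values here are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="http://your-instance-ip:8000/v1",  # vLLM's default serving port is 8000
    api_key="EMPTY",  # vLLM accepts any key unless you configured one
)

completion = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # hypothetical; use the model your server loads
    messages=[{"role": "user", "content": "Give one use case for continuous batching."}],
    max_tokens=150,
)
print(completion.choices[0].message.content)
```

Since the API surface matches OpenAI's, existing tooling built against OpenAI endpoints can usually be repointed at your vLLM instance with nothing more than a base URL change.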
With Vast.ai's market-based cloud GPU rental platform, you can avoid the usual infrastructure and budget roadblocks as you spin up high-performance instances tailored to your workloads.
Our library of ready-to-use templates – covering everything from web-based interfaces to optimized inference engines – makes it easy to train and deploy LLMs at scale. We take care of the heavy lifting, so you can spend less time configuring and more time building.
Why wait? Get started with open-source LLMs on Vast.ai today!