Deploying Oobabooga for Running Large Language Models

- Team Vast

June 21, 2023

In the rapidly evolving world of machine learning, the ability to leverage powerful computing resources on demand has become crucial. The need to run large language models such as LLaMA, GPT-J, Pythia, and Falcon has grown more pronounced as these models become increasingly important in a range of AI applications. However, these models require significant computational power, typically in the form of high-performance graphics processing units (GPUs); many need 40+ GB of GPU RAM just to load.

This is where Vast and Oobabooga come in.

Vast offers on-demand GPU rentals, providing users with the raw power needed to run these models. It's known for being the most cost-effective offering on the market, making it an ideal choice for those who are budget-conscious or just starting out with AI and machine learning.

Oobabooga is a web user interface (UI) designed specifically for running large language models. It's simple and intuitive to use, requiring no advanced technical skills or extensive prior knowledge of AI and machine learning.

In this post, we will guide you through the process of deploying Oobabooga on Vast, showing you just how easy it is to run large language models on a budget.

Getting Started with Vast

The first step is to sign up for an account on Vast. You will need to add credits before you can rent an instance. Go to the billing page and choose a payment method. Larger language models require more expensive GPUs, so be sure to add enough credits.

Selecting the Oobabooga template

Vast rents out Linux Docker instances on-demand. We have a pre-built Oobabooga template that you will need to load. Load the template by clicking on this link.

Add enough storage

Underneath the Edit image and config options, there is a slider that allocates storage for your instance. Large language models are typically 70+ GB, so move the slider to at least 80 GB so your instance has enough storage to download the model you want to run.
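As a rough rule of thumb, a model's disk footprint is its parameter count times the bytes per parameter of the precision it was saved in. The sketch below is a back-of-the-envelope estimate, not an exact file size; real checkpoints add some overhead for metadata.

```python
# Rough disk-size estimate for model weights: parameters * bytes per parameter.
# Ballpark figures only -- actual checkpoints are slightly larger.

def disk_size_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate size on disk, in GB, of a model's weights."""
    bytes_total = params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 1e9

# A 30B-parameter model at different precisions:
for bits, label in [(16, "fp16"), (8, "8-bit"), (4, "4-bit")]:
    print(f"30B model, {label}: ~{disk_size_gb(30, bits):.0f} GB")
```

By this estimate an fp16 30B model is around 60 GB on disk, which is why the 80 GB storage allocation above leaves comfortable headroom for mid-sized models.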

Pick a GPU

Now comes the fun part! Vast is a marketplace of on-demand cloud GPUs, so you can pick a powerful GPU that fits your chosen configuration. Many large language models need a top-tier GPU: check how much GPU RAM your model requires before renting. The current top-of-the-line options are the A100 SXM4 80GB and A100 PCIE 80GB. To ensure your instance will have enough GPU RAM, use the GPU RAM slider in the interface.
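A quick way to estimate GPU RAM is to take the size of the weights and add a cushion for activations and the KV cache. The 1.2x overhead factor below is an assumption for illustration; real usage depends on context length and batch size.

```python
# Back-of-the-envelope GPU RAM estimate: weights plus a cushion for
# activations and the KV cache. The 1.2x overhead factor is an assumption;
# actual usage varies with context length and batch size.

def vram_needed_gb(params_billion: float, bits_per_param: int,
                   overhead: float = 1.2) -> float:
    weights_gb = params_billion * 1e9 * bits_per_param / 8 / 1e9
    return weights_gb * overhead

# An fp16 30B model wants roughly 72 GB -- hence the 80 GB A100-class cards.
print(f"fp16: ~{vram_needed_gb(30, 16):.0f} GB")
# The same model quantized to 4 bits fits in far less:
print(f"4-bit: ~{vram_needed_gb(30, 4):.0f} GB")
```

This is why quantized variants of a model can run on much cheaper GPUs than their full-precision counterparts.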

Once you find a suitable GPU, click RENT.

Running Your Models

The instance will need to load the Docker image and complete setup. Once this is complete, a blue OPEN button will appear. Click it to open the Oobabooga web UI.

Read the Oobabooga documentation on the project website. You will need to download an LLM to the instance, which can take some time depending on the internet connection speed of the machine you picked and where you are downloading the model from. HuggingFace is typically pretty fast.
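If you prefer to script the download rather than use the web UI, files can be fetched from HuggingFace with the huggingface_hub library. The repository and filename below are only an illustration, not part of any particular Oobabooga setup; substitute the model you actually want to run.

```python
# Illustrative download of a single file from a HuggingFace model repo.
# "gpt2" and "config.json" are placeholders -- substitute your own model.
# To mirror an entire repository, huggingface_hub also provides
# snapshot_download(repo_id=...).
from huggingface_hub import hf_hub_download

path = hf_hub_download(repo_id="gpt2", filename="config.json")
print(path)  # local path to the cached file
```

The downloaded files then need to land in the directory Oobabooga scans for models, per its documentation.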

Once you have the model loaded, you can start interacting with it. You can monitor generation in real time through the Oobabooga interface, and the model's output is displayed directly on screen. The whole process is seamless, and thanks to Vast and Oobabooga, you can run large language models without breaking the bank or needing advanced technical knowledge.


With the power of Vast's affordable GPU rentals and the simplicity of Oobabooga's user interface, running large language models has never been easier or more accessible. Whether you're a student, a researcher, or an AI enthusiast, this combination of tools provides an excellent way to delve into the world of AI and machine learning without the usual barriers of cost and complexity.
