How to run Mixtral 8X7B

Mixtral 8X7B is the latest open-source model from Mistral AI. It was released on Dec 8th with no documentation, explanation, or other comment from Mistral AI, via a single tweet containing only a magnet link. Mistral AI has several popular open-source LLMs, including Mistral 7B.
Mixtral 8X7B is notable in that it is a mixture-of-experts (MoE) model with exceptional capability. This guide uses some hacky implementations to get it running; once the model has been out for a few months, it will certainly gain more support from open-source tools.
To run it on Vast.ai, use an 8x 4090/3090 or a 4x A6000/A40 instance. You will need 120GB of total GPU RAM on the instance for the model. We use the typical recommended PyTorch development template and a modified version of the Llama codebase to run inference. This is a base model with no instruction fine-tuning, so proper prompting techniques are helpful.
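Because this is a base model, it continues text rather than following instructions, so completion-style, few-shot prompts tend to work best. A minimal sketch of building such a prompt (plain Python; the helper name and example strings are illustrative, not taken from the llama-mistral code):

```python
# Base models complete text rather than answer instructions, so frame the
# task as a pattern to continue: a few worked examples, then the new input.
def build_few_shot_prompt(examples, query):
    """Format (input, output) pairs plus a final input for the model to complete."""
    lines = []
    for question, answer in examples:
        lines.append(f"Q: {question}\nA: {answer}")
    lines.append(f"Q: {query}\nA:")  # the model continues after the final "A:"
    return "\n\n".join(lines)

examples = [
    ("Translate 'bonjour' to English.", "hello"),
    ("Translate 'merci' to English.", "thank you"),
]
prompt = build_few_shot_prompt(examples, "Translate 'chat' to English.")
print(prompt)
```

The same idea applies to any task: show the model two or three completed examples of the pattern, then stop mid-pattern and let it fill in the rest.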
Steps:
- Select Template:
  - Choose the Devel template for PyTorch by clicking that link or selecting it in Templates.
- Rent a Server:
  - Select an 8x 4090/3090 or a 4x A40/A6000 instance.
  - Add at least 120GB of disk space.
  - Click the Rent button.
  - Purchase credits if needed.
  - Refer to the quickstart guide for help.
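As a quick sanity check on the 120GB requirement, both suggested configurations provide 192GB of total GPU RAM (24GB per 4090/3090, 48GB per A6000/A40):

```python
# Sanity check: total GPU RAM of the suggested instance types versus the
# ~120 GB the model needs. Per-card figures are the standard VRAM sizes.
REQUIRED_GB = 120
configs = {
    "8x RTX 4090/3090": 8 * 24,  # 24 GB per card
    "4x A6000/A40":     4 * 48,  # 48 GB per card
}
for name, total in configs.items():
    status = "OK" if total >= REQUIRED_GB else "too small"
    print(f"{name}: {total} GB total -> {status}")
```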
- SSH into the Machine:
  - Obtain the SSH information from the instance card by clicking the >_ button.
  - Connect to the instance via the SSH command. It will look like this, but with the port and server IP of your instance:
    ssh -p <yourport> root@<yourserverip> -L 8080:localhost:8080
- Download the Mixtral 8X7B Model Weights:
  - Use the torrent. The download typically takes about 15 minutes, depending on server download speed.
  - Install the torrent client, then start the download (the magnet link is quoted so the shell does not interpret its & characters):
    apt install transmission-cli
    transmission-cli "magnet:?xt=urn:btih:5546272da9065eddeb6fcd7ffddeef5b75be79a7&dn=mixtral-8x7b-32kseqlen&tr=udp%3A%2F%2Fopentracker.i2p.rocks%3A6969%2Fannounce&tr=http%3A%2F%2Ftracker.openbittorrent.com%3A80%2Fannounce"
- Install llama-mistral for Inference:
  - Clone the repository:
    git clone https://github.com/dzhulgakov/llama-mistral
  - Install dependencies:
    pip install fire sentencepiece
- Run Mixtral 8X7B!:
  - Load the model weights and run the prompts in example_text_completion.py. If you use 8 GPUs, append "--num-gpus 8" to the end of the run command as in the example below; if you use 2 GPUs, omit that flag:
    cd llama-mistral
    python example_text_completion.py ../Downloads/mixtral-8x7b-32kseqlen/ ../Downloads/mixtral-8x7b-32kseqlen/tokenizer.model --num-gpus 8
  - And there you go! You will get the output of the prompts that are currently listed.
- Modify Prompts:
  - Install nano:
    apt install nano
  - Edit the file:
    nano example_text_completion.py
  - Find and modify the prompts at the bottom of the file.
  - Re-run the Python file to get new responses.
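The exact contents of example_text_completion.py in the llama-mistral fork are not reproduced here, but in the Llama example scripts the prompts are typically a plain Python list of strings. Editing them might look like this (the strings below are placeholders; since this is a base model, write each one as text for the model to continue):

```python
# Illustrative only: replace the list near the bottom of
# example_text_completion.py with your own completion-style prompts.
prompts = [
    "Simply put, the theory of relativity states that",
    "A mixture-of-experts language model works by",
]
for p in prompts:
    print(p)
```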
Thank you for using Vast.ai to run the latest open source LLM, Mixtral 8X7B. Drop us a line and let us know what you think of the model.


