December 12, 2023 | Industry
Mixtral 8X7B is the latest open source model from Mistral AI. It was released on December 8th with no documentation, explanation, or other comment from Mistral AI, via a single tweet containing a magnet link to the weights. Mistral AI's earlier open source LLMs, including Mistral 7B, are already popular.
Mixtral 8X7B is notable in that it is a mixture-of-experts (MoE) model with exceptional ability. This guide relies on some hacky implementations to get it running; once the model has been out for a few months, it will certainly gain better support in open source tooling.
To run it on Vast.ai, use an 8x 4090/3090 or a 4x A6000/A40 instance; you will need 120GB of total GPU RAM on the instance for the model. We use the typical recommended PyTorch development template and a modified fork of the Llama inference code to run the model. Note that this is a base model with no instruction fine-tuning, so proper prompting techniques are helpful.
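Because the model has no instruction tuning, few-shot prompts (seeding the prompt with solved examples for the model to continue) generally work better than bare questions. A minimal sketch of the technique; the example pairs below are illustrative, not taken from the repo:

```python
# Base models continue text rather than follow instructions,
# so seed the prompt with completed examples (few-shot prompting).
# These translation pairs are illustrative only.
examples = [
    ("Translate English to French: cheese", "fromage"),
    ("Translate English to French: bread", "pain"),
]
query = "Translate English to French: apple"

# Join the solved examples, then leave the final answer blank
# for the model to complete.
prompt = "\n".join(f"{q} => {a}" for q, a in examples)
prompt += f"\n{query} => "
print(prompt)
```

The model then only needs to continue the established pattern, which base models do reliably.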
Rent a Server:
SSH into the Machine (the -L flag makes port 8080 on the instance reachable at localhost:8080 on your own machine):
ssh -p <yourport> root@<yourserverip> -L 8080:localhost:8080
Download Mixtral 8X7B Model Weights:
apt install transmission-cli
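With transmission-cli installed, you can fetch the weights using the magnet link from Mistral AI's announcement tweet. A sketch of the invocation; the placeholder must be replaced with the actual magnet link, and -w is transmission-cli's download-directory option:

```shell
# Download the weights into ~/Downloads using the magnet link
# from Mistral AI's tweet (substitute it for the placeholder).
transmission-cli -w ~/Downloads "<magnet-link-from-tweet>"
```

The download is large, so expect it to take a while depending on swarm speed.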
Install the Modified Llama Code for Inference:
git clone https://github.com/dzhulgakov/llama-mistral
pip install fire sentencepiece
Run Mixtral 8X7B!:
The prompts to complete are defined in example_text_completion.py. If you use 8 GPUs, append "--num-gpus 8" to the end of the run command as in the example below; if you use 2 GPUs, omit that flag.
python example_text_completion.py ../Downloads/mixtral-8x7b-32kseqlen/ ../Downloads/mixtral-8x7b-32kseqlen/tokenizer.model --num-gpus 8
And there you go! You will get completions for the prompts currently listed in example_text_completion.py.
To change the prompts, install a text editor and edit the prompt list in example_text_completion.py:
apt install nano
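For a sense of what to look for before opening the file: in Meta's original Llama example script the prompts are a plain Python list, and this fork appears to follow the same layout. A hypothetical sketch (check the actual file, which may differ):

```python
# Hypothetical sketch of the prompts list inside
# example_text_completion.py -- the real file may differ.
prompts = [
    # Base-model prompts: text for the model to continue,
    # not instructions for it to follow.
    "The mixture of experts architecture works by",
    "Simply put, the theory of relativity states that",
]

# The script iterates over this list and prints a completion
# for each prompt in turn.
for p in prompts:
    print(repr(p))
```

Replace or extend the entries in that list, save, and re-run the command above to see your own completions.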
Thank you for using Vast.ai to run the latest open source LLM, Mixtral 8X7B. Drop us a line and let us know what you think of the model.