How to run Mixtral 8X7B

- Team Vast

December 12, 2023-Industry

Mixtral is the latest open source model from Mistral AI. The model was released on Dec 8th with no documentation, explanation or other comment from Mistral AI, via a single Tweet of the magnet link. Mistral AI has several popular open source LLMs, including Mistral 7B.

Mixtral 8X7B is notable in that it is a mixture of experts (MoE) model with exceptional ability. This guide uses some hacky implementations to get it to run. Once the model has been out for a few months, it will likely gain broader support in open source tools.

To run it, use an 8X 4090/3090 or a 4X A6000/A40 instance. You will need 120GB of total GPU RAM on the instance for the model. We use the recommended PyTorch development template and a modified version of llama to run inference. This is a base model with no instruction fine-tuning, so careful prompting techniques are helpful.
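
The sizing above can be checked with quick arithmetic. The sketch below assumes the standard per-card memory of each GPU (24 GB for the RTX 4090 and 3090, 48 GB for the A6000 and A40, figures that are not stated in this guide) and compares each suggested configuration against the 120GB requirement. The function and dictionary names are illustrative only.

```python
# Per-card memory in GB. These are the cards' published specs,
# an assumption added here rather than something stated in the guide.
GPU_MEMORY_GB = {"RTX 4090": 24, "RTX 3090": 24, "RTX A6000": 48, "A40": 48}

REQUIRED_GB = 120  # total GPU RAM the guide says the model needs


def total_gpu_memory_gb(gpu_name: str, count: int) -> int:
    """Return the combined memory of `count` identical GPUs."""
    return GPU_MEMORY_GB[gpu_name] * count


def fits(gpu_name: str, count: int, required_gb: int = REQUIRED_GB) -> bool:
    """True if the instance has at least `required_gb` of total GPU memory."""
    return total_gpu_memory_gb(gpu_name, count) >= required_gb


if __name__ == "__main__":
    configs = [("RTX 4090", 8), ("RTX 3090", 8), ("RTX A6000", 4), ("A40", 4)]
    for name, count in configs:
        total = total_gpu_memory_gb(name, count)
        print(f"{count}x {name}: {total} GB total, fits={fits(name, count)}")
```

Under these assumptions both suggested configurations come to 192 GB, which leaves headroom above the 120GB requirement, while four 24 GB cards (96 GB) would fall short.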


  1. Select Template:

    • Choose the Devel template for PyTorch, either from the template link or by selecting it in Templates.
  2. Rent a Server:

    • Select an 8X 4090/3090 or 4X A40/A6000
    • Add at least 120GB of disk space
    • Click the Rent button
    • Purchase credits if needed.
    • Refer to the quickstart guide for help.
  3. SSH into the Machine:

    • Obtain SSH information from the instance card by clicking on the >_ button
    • Connect to the instance via the SSH command. It will look like this, with the port and server IP of your instance:
      ssh -p <yourport> root@<yourserverip> -L 8080:localhost:8080
  4. Download Mixtral 8X7B Model Weights:

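    • Use the magnet link to download the weights over BitTorrent. This typically takes about 15 minutes, depending on server download speed.
    • To install a torrent client and start the download, run (the magnet link is quoted so the shell does not interpret the & characters):
      apt install transmission-cli
      transmission-cli "magnet:?xt=urn:btih:5546272da9065eddeb6fcd7ffddeef5b75be79a7&dn=mixtral-8x7b-32kseqlen&tr=udp%3A%2F%"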
  5. Install llama for Inference:

    • Clone the repository:
      git clone
    • Install dependencies:
      pip install fire sentencepiece
  6. Run Mixtral 8X7B!:

    • Load the model weights and run the prompts. If you use 8 GPUs, append "--num-gpus 8" to the end of the run command, as in the example below. If you use 2 GPUs, omit that flag:
      cd llama-mistral
      python ../Downloads/mixtral-8x7b-32kseqlen/ ../Downloads/mixtral-8x7b-32kseqlen/tokenizer.model --num-gpus 8
    • And there you go! You will get the output of the prompts that are currently listed.
  7. Modify Prompts:

    • Install Nano:
      apt install nano
    • Edit the file:
    • Find and modify the prompts at the bottom.
    • Re-run the Python file to get new responses.
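
Because Mixtral 8X7B is a base model, prompts usually work better when written as the start of a text for the model to continue rather than as instructions. Below is a minimal sketch of one such technique, a few-shot completion prompt. The helper `build_few_shot_prompt`, its labels, and the example pairs are hypothetical and not part of the inference code; the resulting string would replace one of the prompts edited in step 7.

```python
def build_few_shot_prompt(examples, query, input_label="English", output_label="French"):
    """Format worked examples followed by an unfinished one for the model to complete.

    `examples` is a list of (input, output) pairs; `query` is the new input.
    """
    blocks = [f"{input_label}: {src}\n{output_label}: {dst}" for src, dst in examples]
    # End on the bare output label so the model's continuation is the answer.
    blocks.append(f"{input_label}: {query}\n{output_label}:")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    examples = [("cheese", "fromage"), ("sea otter", "loutre de mer")]
    print(build_few_shot_prompt(examples, "peppermint"))
```

Ending the prompt on the output label leaves the model only one natural way to continue, which tends to keep a base model's completions short and on task.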

Thank you for using Vast to run the latest open source LLM, Mixtral 8X7B. Drop us a line and let us know what you think of the model.

