How to run Mixtral 8X7B

- Team Vast

December 12, 2023-Industry

Mixtral is the latest open source model from Mistral AI. The model was released on Dec 8th via a single Tweet of the magnet link, with no documentation, explanation, or other comment from Mistral AI. Mistral AI has several popular open source LLMs, including Mistral 7B.

Mixtral 8X7B is notable for being a mixture of experts (MoE) model with exceptional ability. This guide uses some hacky implementations to get it to run. Once the model has been out for a few months, it will likely gain more support from open source tools.

To run it, use an 8X 4090/3090 or a 4X A6000/A40 instance. You will need 120GB of total GPU RAM on the instance for the model. We use the typical recommended PyTorch development template and a modified version of llama to run inference. This is a base model with no instruction fine-tuning, so proper prompting techniques are helpful.
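As a rough sanity check on the hardware choice, the sketch below sums the VRAM of the configurations named above and compares the total with the 120GB figure. The per-card capacities (24GB for the RTX 4090 and 3090, 48GB for the A6000 and A40) are the published specifications. The function and dictionary names are illustrative only and do not come from any library.

```python
# Published VRAM per card in GB. The names below are illustrative, not a library API.
VRAM_GB = {"RTX 4090": 24, "RTX 3090": 24, "A6000": 48, "A40": 48}

REQUIRED_GB = 120  # total GPU RAM this guide asks for


def total_vram_gb(card: str, count: int) -> int:
    """Return the combined VRAM in GB of `count` identical cards."""
    return VRAM_GB[card] * count


def is_enough(card: str, count: int, required_gb: int = REQUIRED_GB) -> bool:
    """Return True if the instance has at least `required_gb` of total VRAM."""
    return total_vram_gb(card, count) >= required_gb


if __name__ == "__main__":
    configs = [("RTX 4090", 8), ("RTX 3090", 8), ("A6000", 4), ("A40", 4)]
    for card, count in configs:
        total = total_vram_gb(card, count)
        print(f"{count}X {card}: {total}GB total, enough={is_enough(card, count)}")
```

Both recommended configurations come to 192GB, which leaves headroom above the 120GB requirement. Half of either configuration (96GB) would fall short.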


  1. Select Template:

    • Choose the Devel template for PyTorch, either from its link or by selecting it in Templates.
  2. Rent a Server:

    • Select an 8X 4090/3090 or 4X A40/A6000 instance
    • Add at least 120GB of disk space
    • Click the Rent button
    • Purchase credits if needed.
    • Refer to the quickstart guide for help.
  3. SSH into the Machine:

    • Obtain SSH information from the instance card by clicking on the >_ button
    • Connect to the instance via the SSH command. It will look like this, with the port and server IP of your own instance:
      ssh -p <yourport> root@<yourserverip> -L 8080:localhost:8080
  4. Download Mixtral 8X7B Model Weights:

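    • Download the weights via the torrent magnet link. This typically takes about 15 minutes, depending on server download speed.
    • To install the torrent client and start the download, run the commands below. The magnet link is quoted so that the shell does not interpret its & characters:
      apt install transmission-cli
      transmission-cli "magnet:?xt=urn:btih:5546272da9065eddeb6fcd7ffddeef5b75be79a7&dn=mixtral-8x7b-32kseqlen&tr=udp%3A%2F%"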
  5. Install llama for Inference:

    • Clone the repository:
      git clone
    • Install dependencies:
      pip install fire sentencepiece
  6. Run Mixtral 8X7B!:

    • Load the model weights and run the prompts in the Python script. If you use 8 GPUs, append "--num-gpus 8" to the end of the run command, as in the example below. If you use 2 GPUs, remove that flag:
      cd llama-mistral
      python ../Downloads/mixtral-8x7b-32kseqlen/ ../Downloads/mixtral-8x7b-32kseqlen/tokenizer.model --num-gpus 8
    • And there you go! You will get the output for the prompts that are currently listed in the script.
  7. Modify Prompts:

    • Install Nano:
      apt install nano
    • Edit the file:
    • Find and modify the prompts at the bottom.
    • Re-run the python file to get new responses.
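Because this is a base model with no instruction fine-tuning, prompts in step 7 tend to work better when written as text for the model to continue rather than as instructions. One common technique is few-shot prompting: show a few worked examples and leave the last one open. The helper below is a minimal sketch of that idea. The function name and the "Question:"/"Answer:" layout are illustrative assumptions, not part of the inference script.

```python
from typing import List, Tuple


def few_shot_prompt(examples: List[Tuple[str, str]], query: str) -> str:
    """Format worked examples followed by an open-ended final item.

    A base model continues the text, so the prompt ends right where
    the answer to `query` should begin.
    """
    blocks = [f"Question: {q}\nAnswer: {a}" for q, a in examples]
    blocks.append(f"Question: {query}\nAnswer:")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    examples = [
        ("What is the capital of France?", "Paris"),
        ("What is the capital of Japan?", "Tokyo"),
    ]
    print(few_shot_prompt(examples, "What is the capital of Italy?"))
```

The resulting string can be pasted in as one of the prompts at the bottom of the file before re-running it.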

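The run command from step 6 can also be assembled programmatically, which helps when switching between GPU counts. This is a sketch under stated assumptions: the script name is not given in the command above, so `SCRIPT.py` is a hypothetical placeholder, and passing any `--num-gpus` value other than 8 is an assumption rather than something this guide confirms.

```python
import shlex
from pathlib import PurePosixPath
from typing import List, Optional


def build_run_command(script: str, weights_dir: str,
                      num_gpus: Optional[int] = None) -> List[str]:
    """Assemble the step 6 inference command as an argument list."""
    weights = PurePosixPath(weights_dir)
    cmd = [
        "python",
        script,                            # hypothetical placeholder name
        f"{weights}/",                     # directory holding the weights
        str(weights / "tokenizer.model"),  # tokenizer shipped with the weights
    ]
    # Leave num_gpus as None to omit the flag entirely.
    if num_gpus is not None:
        cmd += ["--num-gpus", str(num_gpus)]
    return cmd


if __name__ == "__main__":
    cmd = build_run_command("SCRIPT.py", "../Downloads/mixtral-8x7b-32kseqlen", num_gpus=8)
    print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run` from inside the llama-mistral directory, or printed with `shlex.join` and pasted into the shell.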
Thank you for using Vast to run the latest open source LLM, Mixtral 8X7B. Drop us a line and let us know what you think of the model.
