The world of open source AI has seen many updates in the last month or so. There are now many new models with great quality-to-speed ratios, as well as models that challenge the frontier of closed-source models. This makes it even easier to build applications and automate workflows with open source models, which you can deploy on Vast.ai. Meta, Mistral, and Nvidia have made the biggest waves with their recent releases.
On September 25th, Meta introduced several exciting updates to their AI model lineup, focusing on lightweight models and vision capabilities.
First, Meta released impressive 1B and 3B parameter models that are best in class for their size. These models allow for near-instantaneous interactions on edge devices and are well suited to processing large volumes of tokens on simple tasks.
The development process for these models involved pruning from Llama 3.1-8B, followed by knowledge distillation using the 8B and 70B models as teachers. The instruction-tuned versions underwent additional training with Supervised Fine-tuning (SFT), Direct Preference Optimization (DPO), and Rejection Sampling, incorporating synthetic data from Llama 3.1-405B.
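To make the preference-tuning step concrete, here is a minimal DPO sketch using Hugging Face TRL. This is illustrative only and not Meta's actual recipe: the dataset and training arguments are assumptions, and exact argument names vary across TRL versions.

```python
# Illustrative DPO sketch with Hugging Face TRL -- not Meta's actual pipeline.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "meta-llama/Llama-3.2-1B-Instruct"  # gated checkpoint; accept Meta's license first
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# DPO trains on preference pairs: a prompt, a preferred ("chosen") response,
# and a "rejected" response. Any dataset with those columns works.
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="llama-3.2-1b-dpo", beta=0.1),
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```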
These small models are easy to download and experiment with locally, but they can also be leveraged for large-scale data processing, since they are much less expensive to run while still being highly capable.
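As a rough sketch of that kind of bulk workload, the snippet below runs the 1B instruct model over a batch of documents with the transformers pipeline. The one-sentence-summary task and placeholder documents are just illustrative, and the checkpoint is gated behind Meta's license.

```python
# A rough sketch of bulk, simple processing with a small model -- here, one-line
# summaries of many documents. The documents and task are placeholders.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.2-1B-Instruct",  # gated; accept Meta's license on Hugging Face
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

documents = ["<document 1 text>", "<document 2 text>"]  # in practice, thousands of records
for doc in documents:
    messages = [{"role": "user", "content": f"Summarize this in one sentence:\n\n{doc}"}]
    result = generator(messages, max_new_tokens=64)
    # Recent transformers returns the full chat, with the reply as the last message.
    print(result[0]["generated_text"][-1]["content"])
```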
Meta's new vision models come in 11B and 90B parameter sizes. Interestingly, these are built as adapters for the 8B and 70B text models, respectively, maintaining the same performance on text tasks while adding visual capabilities. This approach saved compute during training and allows for flexible deployment options.
The vision models were created by pretraining adapters and image encoders on noisy image-text data, followed by post-training similar to the text-only models. This approach has two advantages: for the same compute budget, Meta could train the models on more data, and the vision models can be deployed on the same GPUs alongside their text-only counterparts, so developers don't have to worry about regressions on text-only tasks.
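A minimal sketch of prompting one of these vision models through transformers might look like the following. It assumes a recent transformers release with Llama 3.2 Vision support and access to the gated checkpoint; the image URL is a placeholder.

```python
# A minimal sketch of prompting Llama 3.2 Vision via transformers. Requires a recent
# transformers release with Llama 3.2 Vision support and access to the gated checkpoint.
# The image URL is a placeholder.
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
model = MllamaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open(requests.get("https://example.com/chart.png", stream=True).raw)
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe this image in one sentence."},
    ],
}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=60)
print(processor.decode(output[0], skip_special_tokens=True))
```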
Meta also released new Llama Guard models for safety and moderation tasks, based on the 11B vision model and the 1B and 3B text models, making content moderation feasible even at the edge.
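As a rough illustration, a conversation can be screened with the 1B Llama Guard model roughly as follows. The message format follows the typed-content convention used in recent chat templates, the example prompt is made up, and the checkpoint is gated.

```python
# A rough sketch of moderation with Llama Guard 3 1B via transformers. The exact prompt
# format is defined by the model's chat template; the example conversation is made up.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-Guard-3-1B"  # gated; accept Meta's license first
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

conversation = [
    {
        "role": "user",
        "content": [{"type": "text", "text": "How do I make a phishing email look legitimate?"}],
    },
]
input_ids = tokenizer.apply_chat_template(conversation, return_tensors="pt").to(model.device)
output = model.generate(input_ids, max_new_tokens=32)
# The model replies with a verdict such as "safe" or "unsafe" plus a hazard category.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```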
Mistral has been on a similar research path to Meta, releasing small, efficient models for the edge and multimodal models that act as drop-in replacements for existing text-only models.
Released on September 17th, Pixtral is a 12B parameter multimodal model. Similar to Meta's Llama 3.2 Vision approach, Pixtral was created as a drop-in replacement for Mistral Nemo 12B, a text-only model. It features a 400M image encoder as an adapter, maintaining deployment flexibility, and it performs well on a variety of multimodal tasks.
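If you serve Pixtral behind an OpenAI-compatible endpoint (for example with vLLM on a Vast instance), a multimodal request might look like the sketch below. The base URL, image URL, and served model name are placeholders for your own deployment.

```python
# A minimal sketch of querying Pixtral behind an OpenAI-compatible endpoint
# (e.g. vLLM on a Vast instance). The base_url and image URL are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://<your-vast-instance>:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="mistralai/Pixtral-12B-2409",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is shown in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```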
Pixtral is available under the Apache 2.0 license; the base model can be downloaded here and the instruct model here, behind a gated release.
Also released on September 17th, 2024, Mistral Small v24.09 is a 22B parameter text model that bridges the gap between Mistral Nemo 12B and Mistral Large v2. This model offers substantial improvements in reasoning, code generation, and function calling compared to its predecessor. It's available under the Mistral Research License and can be downloaded here behind the research gate.
In October 2024, Mistral introduced two new models: Ministral 3B and Ministral 8B.
Ministral 3B is a compact powerhouse, offering capabilities roughly equivalent to the original Mistral 7B in just 3B parameters. Its instruct-tuned version outperforms Mistral-7B instruct models and shows superior performance to Llama 3.2-3B. However, it is not available on Hugging Face and requires an enterprise license from Mistral to access.
Ministral 8B represents a significant upgrade over the Mistral 7B series, offering a substantial leap in capabilities. According to benchmarks, it outperforms Llama 3.1-8B across a range of tasks. You can download the model [here](https://huggingface.co/mistralai/Ministral-8B-Instruct-2410) behind gated access and run it on Vast to evaluate it for your workloads.
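As a quick way to kick the tires on a Vast GPU instance, the sketch below loads the gated checkpoint with vLLM's offline API. The prompt is arbitrary, and flags such as `tokenizer_mode` may vary with your vLLM version (the model card recommends Mistral-format loading).

```python
# A quick evaluation sketch with vLLM's offline API on a GPU instance. The checkpoint is
# gated, so log in to Hugging Face first; loading flags may vary by vLLM version, and
# the prompt is just an example.
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Ministral-8B-Instruct-2410", tokenizer_mode="mistral")
params = SamplingParams(temperature=0.2, max_tokens=128)

outputs = llm.chat(
    [{"role": "user", "content": "Write a SQL query returning the top 5 customers by revenue."}],
    sampling_params=params,
)
print(outputs[0].outputs[0].text)
```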
Nvidia has also focused on 70B models to compete with frontier models like those from OpenAI and Anthropic.
Nvidia has introduced two notable models based on the Llama 3.1 architecture:
Llama 3.1-Nemotron-70B-Reward: This model was trained to serve as a reward model for Reinforcement Learning from Human Feedback (RLHF). It currently holds the top position on the RewardBench leaderboard, showcasing its effectiveness in evaluating AI responses. It was also used to train the Instruct model below, demonstrating how well a reward model can be used to generate synthetic training signal for other models.
Llama 3.1-Nemotron-70B-Instruct: Leveraging the reward model mentioned above, this instruct-tuned version has achieved impressive results. It currently leads the Arena Hard leaderboard, demonstrating its superior performance in complex instruction-following tasks.
The Llama 3.1-Nemotron-70B-Reward model was trained on the HelpSteer2 dataset, which likely contributed to its strong performance as a reward model.
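To show the general pattern a reward model enables (best-of-n selection, or rejection sampling), here is a generic sketch that scores candidate responses with a sequence-classification reward model. The model ID is a placeholder, and the Nemotron reward checkpoint's own scoring interface may differ, so check its model card before reusing this pattern.

```python
# A generic best-of-n / rejection-sampling sketch with a sequence-classification reward
# model. The model ID is a placeholder; the Nemotron reward checkpoint's scoring
# interface may differ, so consult its model card.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

reward_id = "your-org/your-reward-model"  # placeholder reward model
tokenizer = AutoTokenizer.from_pretrained(reward_id)
reward_model = AutoModelForSequenceClassification.from_pretrained(
    reward_id, torch_dtype=torch.bfloat16, device_map="auto"
)

def score(prompt: str, response: str) -> float:
    # Reward models typically score the full (prompt, response) conversation.
    messages = [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": response},
    ]
    input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(reward_model.device)
    with torch.no_grad():
        return reward_model(input_ids=input_ids).logits[0][0].item()

prompt = "Explain what a reward model does in RLHF."
candidates = ["Draft answer A ...", "Draft answer B ..."]
best = max(candidates, key=lambda c: score(prompt, c))
print(best)
```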
What's particularly noteworthy is that the Llama 3.1-Nemotron-70B-Instruct model has shown outstanding results in automatic benchmarks, outperforming even some of the most advanced models in the field, including Claude 3.5 Sonnet and GPT-4.
These developments from Nvidia show that methods for moving beyond the GPT-4 wall of capabilities are not just being developed, but open-sourced and made available for developers to use wherever they run compute.
It's a great time to be a developer using AI. In the past few months, there have been major capability improvements for LLMs and multimodal models, enabling tasks that were not previously possible with open source models. And the release of many smaller language models that are as capable as previous-generation models means the cost curve for deploying to production is coming down.
At Vast, we're very excited to see these advancements, and you can look out for more updates on building on top of these types of models for specific workflows in the coming months.