January 31, 2025 | DeepSeek | DeepSeek R1 | Vast.ai | AI | Open Source
DeepSeek, a Chinese AI firm, has drawn attention in recent weeks for its release of DeepSeek R1 – a large language model (LLM) that aims to stand shoulder to shoulder with more established players. Its rapid rise has shaken up the AI race, with comparisons to OpenAI's o1 in particular.
Our Ollama + WebUI guide uses the deepseek-r1:70b model as an example. Follow it to try out DeepSeek on our GPU cloud.
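If you want to poke at the model from code once the guide's Ollama server is running, here is a minimal Python sketch that calls Ollama's local REST API. It assumes Ollama is listening on its default port (11434) and that the deepseek-r1:70b model has already been pulled; the prompt is just a placeholder.

```python
# Minimal sketch: query a locally running Ollama server that already has
# deepseek-r1:70b pulled (e.g. via `ollama pull deepseek-r1:70b`).
# Assumes Ollama's default REST endpoint on localhost:11434.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-r1:70b",
        "prompt": "Explain mixture-of-experts routing in two sentences.",
        "stream": False,  # return one JSON object instead of a token stream
    },
    timeout=600,
)
resp.raise_for_status()
print(resp.json()["response"])
```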
DeepSeek's impact isn't limited to industry insiders. It's already gaining massive popularity among everyday users. The DeepSeek mobile app, which provides a chatbot interface for DeepSeek R1, shot to the top of the Apple App Store charts within days of its release on January 20, 2025.
But the biggest reason DeepSeek is making waves? The company has seemingly disrupted the prevailing wisdom that developing cutting-edge LLMs costs a ton of money. The DeepSeek R1 LLM was developed at a fraction of the cost of comparable models from other companies – reportedly less than $6 million as opposed to hundreds of millions or even billions of dollars.
Another key factor setting it apart: DeepSeek R1 is free and open source. Researchers and developers can freely experiment with, modify, and deploy the model without the restrictions of proprietary alternatives.
The meteoric rise of the R1 LLM threw the U.S. financial markets into turmoil, prompting investors to rethink the valuation of companies like NVIDIA and reevaluate the AI prospects of tech giants like Microsoft and Meta. A significant stock market sell-off occurred on January 27, 2025.
Beyond its financial impact, though, what exactly is DeepSeek and how did it develop a model that's turning heads? Let's back up and take a closer look at this new player on the AI scene.
Based in Hangzhou, China, DeepSeek is an AI development firm that focuses on creating open-source LLMs. While it operates as an independent lab, DeepSeek is backed by High-Flyer Capital Management, a China-based quantitative hedge fund specializing in AI-driven financial strategies.
From the start, DeepSeek built its own data center clusters to train its generative AI models. However, U.S. export restrictions on advanced chips affected its operations: DeepSeek was forced to train its more recent models on NVIDIA H800 chips, a less powerful alternative to the H100 chips available to U.S. companies.
DeepSeek's first model was released in November 2023, and since then the firm has developed several different variations. But DeepSeek only catapulted to global prominence in January 2025, when its R1 model was released – along with a web interface, mobile application, and API access.
The DeepSeek R1 model focuses on logical inference, mathematical reasoning, and real-time problem-solving. Unlike conventional dense LLMs, it uses a sparse Mixture-of-Experts (MoE) architecture in which different parts of the model specialize in different tasks: for each input, only a few "expert" neural networks are activated, so most of the model's parameters sit idle on any given request.
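To make the routing idea concrete, here is a toy PyTorch sketch of sparse top-k expert routing. It is purely illustrative, not DeepSeek's actual implementation, and the sizes and names (num_experts, top_k, and so on) are made-up values for the example.

```python
# Toy sketch of sparse Mixture-of-Experts routing (illustrative only; this is
# not DeepSeek's implementation). Each token is sent to just top_k experts,
# so most expert parameters stay idle for any given input.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToySparseMoE(nn.Module):
    def __init__(self, d_model=64, d_hidden=256, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, num_experts)  # the router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                      # x: (tokens, d_model)
        scores = self.gate(x)                  # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # normalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):         # run only the selected experts
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out

tokens = torch.randn(4, 64)
print(ToySparseMoE()(tokens).shape)  # torch.Size([4, 64])
```

A production MoE layer batches and load-balances this routing rather than looping over experts, but the core idea (activate only a handful of experts per token) is the same.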
What's really sparked the buzz, however, is DeepSeek R1's ability to match the performance of OpenAI's latest ChatGPT models across a variety of tasks. So R1 is a more efficient model, developed far more cheaply, with impressive performance. It sounds almost too good to be true...
How was it trained?
According to DeepSeek, the R1 model leveraged a combination of training innovations.
Not all of this is new. DeepSeek does present some algorithmic and architectural novelties – such as the sparse MoE design and the use of reinforcement learning to refine synthetic data selection and optimize reasoning pathways during training – but the company isn't necessarily the first to implement these techniques.
Still, it has combined them in a way that makes its R1 model notably efficient and competitive.
Unfortunately, there is a catch. DeepSeek R1 reportedly relied on synthetic data from OpenAI's GPT-4o for its training, pushing some of the computational burden onto the latter, much less efficient model.
So while R1 is touted as being highly efficient, those claims only apply to the model itself, not to the entire process used to train it. The R1 model's lean MoE architecture means it is indeed efficient when running inference, but its training pipeline requires external synthetic data generated by other models like GPT-4o – and that external data generation is computationally expensive.
This means that a significant part of the total AI development cost and compute burden is still being offloaded onto other, more complex models. It's a cost-saving move for DeepSeek, but it doesn't negate the need for massive specialized compute in the industry overall.
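For readers who want a concrete picture of what "relying on an external teacher model for synthetic data" looks like in practice, here is a hedged Python sketch of the general pattern: a teacher model is asked for step-by-step solutions, and its outputs are saved as supervised fine-tuning data for a student model. This is the generic distillation-style workflow, not DeepSeek's actual (unpublished) pipeline; the model name, prompts, and file name are placeholders.

```python
# Hedged illustration of the general teacher -> student synthetic-data pattern
# described above; this is NOT DeepSeek's actual pipeline, and the model name,
# prompts, and file name are placeholders. Requires the `openai` package and an API key.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

problems = [
    "A train travels 120 km in 1.5 hours. What is its average speed?",
    "Simplify: (3x^2 * 2x^3) / x^4",
]

with open("synthetic_reasoning.jsonl", "w") as f:
    for problem in problems:
        # Ask the teacher model for a step-by-step solution; the per-example
        # cost of these calls is the "offloaded" compute the article refers to.
        reply = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": "Solve step by step, then give the final answer."},
                {"role": "user", "content": problem},
            ],
        )
        f.write(json.dumps({
            "prompt": problem,
            "completion": reply.choices[0].message.content,
        }) + "\n")

# The resulting JSONL would then feed a supervised fine-tuning run on the student model.
```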
Time will tell how sustainable DeepSeek's approach is – and what consequences it may have for the company as scrutiny over training methods increases and competition in the AI space continues to grow.
...There may even be national security concerns, as the U.S. government eyes the potential for harmful foreign influence. But that's a topic for another day.
Regardless of how one feels about it, DeepSeek R1 has certainly made a bold entrance into the world. While its efficiency claims come with important caveats, the model remains a fascinating case study in how AI development is evolving.
Plus, its open-source license allows for commercial use, which is a pretty major draw for developers around the world.
DeepSeek R1 is a massive model that will not fit on a single 8xH200 instance unless you run a quantized version. The 4-bit quantization will run on an 8xH100 or 4xH200 node, and the sweet spot may well be the 8-bit quantized model. If you are interested in trying it out, reach out to us so we can talk through your use case and help get it set up.
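As a rough sanity check on those sizing claims, here is a back-of-envelope Python sketch comparing the model's weight footprint at different precisions against common node configurations. The parameter count (~671B) and per-GPU memory figures (80 GB for the H100, 141 GB for the H200) are public specs; the 1.2x overhead factor for KV cache and activations is an assumption, so treat the output as an estimate only.

```python
# Back-of-envelope memory check for the sizing claims above. Parameter count
# (~671B for DeepSeek R1) and per-GPU memory (H100: 80 GB, H200: 141 GB) are
# public figures; the 1.2x overhead factor for KV cache/activations is a rough
# assumption, so these are estimates, not guarantees.
PARAMS = 671e9
OVERHEAD = 1.2  # assumed headroom for KV cache, activations, buffers

nodes = {"8xH100": 8 * 80, "4xH200": 4 * 141, "8xH200": 8 * 141}  # GB
precisions = {"16-bit": 2.0, "8-bit": 1.0, "4-bit": 0.5}          # bytes per parameter

for prec, bytes_per_param in precisions.items():
    need_gb = PARAMS * bytes_per_param * OVERHEAD / 1e9
    fits = [name for name, gb in nodes.items() if gb >= need_gb]
    print(f"{prec}: ~{need_gb:,.0f} GB needed -> fits on: {fits or 'none of the above'}")
```

Under these assumptions, the unquantized weights overflow even an 8xH200 node, the 8-bit version fits on 8xH200, and the 4-bit version fits on 8xH100 or 4xH200, which lines up with the guidance above.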
As always, we'd be happy to assist you in any way we can! We're here to support your AI experiments, providing the compute power you need to explore, test, and innovate – every day.