Building Your First MCP Server on Vast.ai

September 10, 2025
10 Min Read
By Team Vast

Remember when ChatGPT couldn't tell you the weather or gather top headlines? Are you still frustrated asking Gemini to create a JIRA ticket from a bug report email? Perhaps you're tired of Cursor hallucinating database column names because it's walled off from your live schema, so you're constantly copy-pasting the same context across your Claude and Cursor apps. Despite their eloquent email-writing powers, out-of-the-box Large Language Models (LLMs) are powerful predictors, not contextually-aware assistants. They operate in silos, forcing you to start with fresh context -- or none at all -- for every new task.

This constant context-juggling isn't just annoying, it's fragile. When a better model is released or your data governance rules change, do you have to re-implement your entire workflow? What if you want to easily swap out Gemini's new video generation model for one from Stable Diffusion?

With Model Context Protocol (MCP), an open protocol released by Anthropic for building truly helpful, environment-agnostic AI assistants, you can create a persistent, shareable context layer across models and tools. And with Vast.ai's secure and affordable cloud GPU instances, implementing your own MCP server has never been easier.

Model Context Protocol Architecture

MCP architecture is built on three key components:

  • Clients: The user-facing applications (like Cursor, a custom chatbot, or Claude) that coordinate between the user, the AI model, and the MCP server.
  • Servers: The adapters between your tools and AI models, exposing tools, resources, and prompts to MCP clients and the services they connect.
  • Service Providers: The software tools and data sources (like Slack, Notion, GitHub, or your company's internal database) that have existing APIs.

With MCP, any developer can build a server to connect an API to any MCP-compatible client. You no longer have to wait for official integrations from LLM providers.

An MCP server provides three categories of capabilities to the model – tools, resources, and prompts. In the following guide, you'll focus on creating MCP tools and learn to create useful actions you can invoke from your clients via a remote MCP server on Vast.ai's platform.
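As a quick illustration of how these three capability types look in code, here is a minimal sketch using the FastMCP library introduced later in this guide. The tool, resource, and prompt shown are hypothetical examples, not part of the server you'll build in this tutorial:

from fastmcp import FastMCP

mcp = FastMCP("Example Server")

# A tool: an action the model can invoke, such as looking up an order status
@mcp.tool()
def get_order_status(order_id: str) -> dict:
    return {"order_id": order_id, "status": "shipped"}

# A resource: read-only context the client can load for the model
@mcp.resource("config://greeting")
def greeting() -> str:
    return "Welcome to the example MCP server."

# A prompt: a reusable prompt template the client can offer to users
@mcp.prompt()
def summarize(text: str) -> str:
    return f"Summarize the following text in two sentences:\n\n{text}"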

Remote MCP Servers

Initially, MCP architecture only supported locally running servers using stdio transport. Since summer 2025, it also supports remote access via the streamable-http transport, enabling scalable AI workflows that span your organization: teams can integrate data services and build custom workflows directly within their existing workspaces.
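In FastMCP terms (the library used later in this guide), the difference comes down to the transport you pass when starting the server; a rough sketch:

# Local server: communicates with a client on the same machine over stdin/stdout
mcp.run(transport="stdio")

# Remote server: exposes an HTTP endpoint that clients reach over the network
mcp.run(transport="streamable-http", host="0.0.0.0", port=8080)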

Vast.ai's GPU instance rentals provide flexible, affordable hosting for as many MCP servers as you want to create – scaling up your team's AI workflows quickly and securely.

Tutorial: Build a Stock Sentiment Analysis MCP

Today, you'll build an MCP server for stock sentiment analysis correlation hosted on Vast.ai's platform. This setup provides a foundation for team-based workflows. For example, a financial analysis firm could use this MCP tool to probe market conditions directly from the Claude interface, or spur discussion by piping its output into a shared Slack channel, giving multiple teams up-to-date, shared information. To begin, let's review what you'll need for this process.

Vast.ai's GPU rentals can host a wide variety of MCP servers, easily accommodating 70B-parameter language models or image diffusion models on the most up-to-date graphics cards. The platform opens the door to creative, custom server integrations, like serving a shared, product-specific image-generation prompt to your product teams or pulling proprietary climate and energy data into your coding and marketing workflows.

By the end of this tutorial, you will understand how to build a similar bespoke MCP server for your organization's needs, and have a running blueprint for creating your collection of remote MCP servers on Vast.ai. Let's get started.

Prerequisites

  • A Vast.ai account with an API key
  • A Hugging Face account with an access token and access to the Mistral 7B Instruct model
  • Python 3.10+ environment
  1. Rent and Configure Your Vast.ai GPU Instance
    Start by selecting a GPU Instance from the Vast.ai console. For this project, you'll use a Mistral 7B LLM to summarize and analyze market sentiments, so we recommend using an RTX 4090 GPU with 24GB of VRAM. Vast.ai offers a variety of powerful graphics cards, including the A100 with 80GB of VRAM for larger LLM and diffusion tasks.
    Begin by selecting a PyTorch (Vast) template. Modify this template to launch as an interactive shell server without a Jupyter notebook instance and with 24 GB of disk space. For more information on selecting templates, please refer to the template guide.
    Now select an RTX 4090 offering at your required price point and select "Rent". Navigate to the "Instances" section in the sidebar. When the server is up and running, you should see "Running" next to "Status". Now you're ready to create your MCP server.
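    If you prefer working from the command line, the Vast.ai CLI (installed in step 5 below) can also list suitable offers. A hedged sketch, assuming the CLI's standard query syntax:

vastai search offers 'gpu_name=RTX_4090 num_gpus=1 disk_space>=24'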
  2. Create Your Environment
    In your terminal, install uv and set up your Python environment using the following command:

curl -LsSf https://astral.sh/uv/install.sh | sh

Now, restart your terminal to activate the `uv` command. Then, initialize a Python project.

uv init vast_mcp_server
cd vast_mcp_server

Finally, create and activate your virtual environment to manage project dependencies.

uv venv
source .venv/bin/activate
  3. Install Project Dependencies
    Create a requirements.txt file to outline project dependencies. You'll use this to write your server code locally and then install dependencies on your Vast.ai instance later.
touch requirements.txt

Then add the following dependencies:

fastmcp
pydantic
transformers
torch
requests
uvicorn
accelerate

Install the dependencies into your local environment using uv's pip interface (a virtual environment created with uv venv does not include pip by default):

uv pip install -r requirements.txt
  4. Create Your MCP Server
    Now you'll create your first MCP server. This server will summarize and analyze news headlines using an LLM and correlate them with real-time stock data. This is a real-time task that out-of-the-box LLMs like Claude may struggle with. You'll create three MCP tools for stock data retrieval, sentiment analysis, and correlation – all available via your remote server.
    This example uses FastMCP to facilitate simple protocol implementation. FastMCP uses simple decorators to handle the low-level implementation details, similar to the FastAPI approach.

Create a 'server.py' and 'test_server.py' file using the command:​

touch server.py test_server.py

First, you'll configure a Mistral 7B model for summarization and sentiment analysis for a given news article. You'll also set up a FastMCP server. Add the following code to the 'server.py' file.

import random
import logging
import asyncio
import os
import json
from pydantic import BaseModel
from transformers import AutoTokenizer, AutoModelForCausalLM
from fastmcp import FastMCP
import torch


logging.basicConfig(level=logging.INFO)

# Initialize the MCP server
mcp = FastMCP("Financial Market Analyzer")

# Configure Mistral 7B LLM model
MODEL_NAME = "mistralai/Mistral-7B-Instruct-v0.2"
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {DEVICE}")

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=torch.float16,  # load in half precision so the 7B model fits in 24 GB of VRAM
    device_map="auto"
)

class NewsItem(BaseModel):
    title: str
    description: str

Now, you'll create three tools. The first uses the Mistral 7B LLM to summarize and analyze a provided news article. In a real-world implementation, you may want to update this example to accept a URL as input; a sketch of that variant follows the tool code below. Add the following code:

# Analyzes the sentiment and provides a summary of a news headline using a large language model.
@mcp.tool()
def analyze_sentiment(news_item: NewsItem):
    prompt = (
        f"You are a financial news analysis expert. "
        f"Analyze the following news headline and description. "
        f"Provide a brief summary and determine the sentiment as POSITIVE, NEGATIVE, or NEUTRAL. "
        f"Respond only with a JSON object containing two keys: 'summary' and 'sentiment'.\n\n"
        f"Headline: \"{news_item.title}\"\n"
        f"Description: \"{news_item.description}\"\n\n"
        f"JSON Response:"
    )

    inputs = tokenizer(prompt, return_tensors="pt").to(DEVICE)

    with torch.no_grad():
        outputs = model.generate(**inputs, max_new_tokens=100, do_sample=False)

    generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)

    try:
        json_start_index = generated_text.find("{")
        if json_start_index != -1:
            # raw_decode parses the first JSON object and ignores any trailing text the model may generate
            analysis_result, _ = json.JSONDecoder().raw_decode(generated_text[json_start_index:])
            summary = analysis_result.get("summary", "No summary found.")
            sentiment_label = analysis_result.get("sentiment", "NEUTRAL").upper()
        else:
            summary = "Could not generate a structured summary."
            sentiment_label = "NEUTRAL"
            logging.warning("LLM response did not contain a valid JSON object.")

    except json.JSONDecodeError as e:
        summary = "Could not parse LLM response."
        sentiment_label = "NEUTRAL"
        logging.error(f"Failed to parse JSON from LLM: {e}")

    logging.info(f"Analyzed '{news_item.title}': Sentiment={sentiment_label}, Summary='{summary}'")
    return {"title": news_item.title, "sentiment": sentiment_label, "summary": summary}

Now you'll add a stock retrieval tool. In this example, you'll mock the stock data. However, in an actual implementation, you might pull this data from a stock-related API. Add the following code:

# Fetches mock real-time stock data. In a real-world implementation, you would connect to a secure financial database API.
# For this example, we are using a simple mock to simulate real-time data.
@mcp.tool()
def get_stock_data(symbol: str):
    price = round(random.uniform(100.0, 500.0), 2)
    volume = random.randint(100000, 5000000)
    logging.info(f"Fetched mock data for {symbol}: Price={price}, Volume={volume}")
    return {"symbol": symbol, "price": price, "volume": volume}
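To swap the mock for live data, you could call any market-data HTTP API from inside the tool. The sketch below is a hypothetical example: the endpoint URL, query parameters, response fields, and the STOCK_API_KEY environment variable are placeholders for whichever provider you choose.

import requests  # already in requirements.txt; add this import near the top of server.py

# Hypothetical live-data variant of get_stock_data. The endpoint, parameters,
# and response shape are placeholders for your chosen market-data provider.
@mcp.tool()
def get_stock_data_live(symbol: str):
    api_key = os.environ["STOCK_API_KEY"]  # set this on the instance; never hard-code it
    response = requests.get(
        "https://api.example-market-data.com/v1/quote",
        params={"symbol": symbol, "apikey": api_key},
        timeout=10,
    )
    response.raise_for_status()
    data = response.json()
    return {"symbol": symbol, "price": data["price"], "volume": data["volume"]}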

Finally, you'll add a tool that correlates the stock prices and news sentiment. Add the following code:

@mcp.tool()
async def correlate_data(stock_data: dict, news_analysis: dict):
    sentiment = news_analysis['sentiment']
    price = stock_data['price']

    correlation_message = "No clear correlation found."
    if sentiment == "POSITIVE" and stock_data['volume'] > 2000000:
        correlation_message = f"POSITIVE news coincided with a high trading volume. Possible impact. News Summary: {news_analysis['summary']}"
    elif sentiment == "NEGATIVE" and price < 300:
        correlation_message = f"NEGATIVE news for a low-priced stock. Potential further decline. News Summary: {news_analysis['summary']}"

    logging.info(f"Correlated data: {correlation_message}")
    return {"stock_data": stock_data, "news_analysis": news_analysis, "correlation": correlation_message}

Now, you'll add the entry point that runs your remote MCP server via streamable-http. Add the following code:

# Runs the MCP server remotely using streamable-http transport
if __name__ == "__main__":
    port = int(os.getenv("PORT", 8080))
    logging.info(f"MCP server started on port {port}")
    asyncio.run(
        mcp.run_async(
            transport="streamable-http",
            host="0.0.0.0",
            port=port,
        )
    )

Great work! Now you have three MCP tools defined on your server. After deploying via Vast.ai, you can access any of these tools remotely by integrating with Claude, Cursor, or other MCP-supporting apps. Let's upload the server to your Vast.ai GPU instance.

  5. Upload Your MCP Server to Vast.ai
    First, retrieve your API key from the Vast.ai Console. Then, install the Vast.ai CLI in your Linux terminal using the following command.
pip install --upgrade vastai

Set your API key in your local Vast CLI configuration.

vastai set api-key <your-api-key-here>

Now you can connect to your remote GPU instance over SSH. Navigate to your Vast.ai Console and select your GPU instance. Click on "Open Terminal Access". Here, you should see the SSH command for connecting to your GPU instance, with the following structure:

ssh -p XXXXX root@XXX.XXX.XXX.XXX -L XXXX:localhost:XXXX

Navigate back to your terminal and use the command to connect to your instance. If your instance is running, your terminal should log in to the remote Vast.ai instance and display a welcome message.

Since you are using a Hugging Face model, you'll need to log in to the Hugging Face CLI using the following command:

huggingface-cli login

This command will prompt you for the access token from your Hugging Face account.

Next, copy your server.py and requirements.txt files to the remote GPU instance. From the local terminal you used to create your server environment, run the following command, replacing the port and IP address with the values from your instance's SSH command:

scp -P 29946 server.py requirements.txt root@172.81.127.37:/root/

Back in the terminal connected to your Vast.ai instance, install the dependencies:

pip install -r requirements.txt

Now you're ready to run and test your server!

  6. Run and Test Your Server
    In your open terminal connected to your Vast.ai GPU instance, simply run
python server.py

If all is well, you should see the model download and the server startup success message:

Starting MCP server 'Financial Market Analyzer' with transport 'streamable-http' on http://0.0.0.0:8080/mcp
INFO: Started server process [2135]
INFO: Waiting for application startup.
INFO:mcp.server.streamable_http_manager:StreamableHTTP session manager started
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8080
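The server runs in the foreground, so it will stop if you close the SSH session. If you want it to keep serving requests after you disconnect, one simple option (among others such as tmux or a systemd unit) is to run it in the background and capture its logs:

nohup python server.py > server.log 2>&1 &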

As a final step, test the server connection to ensure your tools are available. In your local Python environment, open the 'test_server.py' file you created earlier and paste in the script below.

import asyncio
import json

from fastmcp import Client

# Test the MCP server using streamable-http transport.
async def test_server():
    async with Client("http://localhost:8080/mcp") as client:

        # List available tools
        tools = await client.list_tools()
        for tool in tools:
            print(f"Tool found: {tool.name}")

        # Call the stock data tool
        print("1. Calling 'get_stock_data' tool for INVC...")
        stock_result = await client.call_tool("get_stock_data", {"symbol": "INVC"})
        print(f"Result: {stock_result[0].text}")

        # Call the sentiment analysis and summarization tool
        print("2. Calling 'analyze_sentiment' tool...")
        news_item_payload = {
            "news_item": {
                "title": "Tech Giant Announces Record Profits",
                "description": "Shares surged after the company reported quarterly earnings that exceeded all analyst expectations, signaling strong growth."
            }
        }
        sentiment_result = await client.call_tool("analyze_sentiment", news_item_payload)
        print(f"Result: {sentiment_result[0].text}")

        # Call the correlation tool
        print("3. Calling 'correlate_data' tool...")
        correlation_payload = {
            "stock_data": json.loads(stock_result[0].text),
            "news_analysis": json.loads(sentiment_result[0].text)
        }
        correlation_result = await client.call_tool("correlate_data", correlation_payload)
        print(f"Result: {correlation_result[0].text}")

if __name__ == "__main__":
    asyncio.run(test_server())

Navigate back to your Python environment terminal and run the script:

uv run test_server.py

If your server is working correctly, you should see the list of available tools, followed by the result of each tool call in turn. Note that connecting to http://localhost:8080/mcp from your local machine assumes your SSH command forwards port 8080 to the instance (for example, -L 8080:localhost:8080); otherwise, point the client at your instance's address and port.

Congratulations! You have created your first MCP server on Vast.ai. You can now integrate it into Claude, Cursor, Windsurf, or any other MCP-supporting client, and customize, scale, and create more MCP servers for your organization's AI-enabled workflows.
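As a hedged example of what that integration can look like, many MCP clients (Cursor, for instance) accept a JSON configuration along these lines. The exact file location and schema vary by client, and the server name and address below are placeholders for your own instance:

{
  "mcpServers": {
    "financial-market-analyzer": {
      "url": "http://<your-instance-ip>:<port>/mcp"
    }
  }
}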
