Remember when ChatGPT couldn't tell you the weather or gather top headlines? Are you still frustrated asking Gemini to create a JIRA ticket from a bug report email? Perhaps you're tired of Cursor hallucinating database column names because it's walled off from your live schema, leaving you to copy-paste the same context across your Claude and Cursor apps. Despite their eloquent email-writing powers, out-of-the-box Large Language Models (LLMs) are powerful predictors, not contextually aware assistants. They operate in silos, forcing you to start with fresh context -- or none at all -- for every new task.
This constant context-juggling isn't just annoying; it's fragile. When a better model is released or your data governance rules change, do you have to re-implement your entire workflow? What if you want to easily swap out Gemini's new video generation model for one from Stable Diffusion?
With Model Context Protocol (MCP), an open-source framework released by Anthropic for building truly helpful, environment-agnostic AI assistants, you can create a persistent, shareable context layer across models and tools. And with Vast.ai's secure and affordable cloud GPU instances, implementing your own MCP server has never been easier.
Model Context Protocol Architecture
MCP architecture is built on three key components: hosts (the AI applications you already use, such as Claude Desktop or Cursor), clients (the connectors those applications maintain to each server), and servers (lightweight programs that expose capabilities over the protocol).
With MCP, any developer can build a server to connect an API to any MCP-compatible client. You no longer have to wait for official integrations from LLM providers.
An MCP server provides three categories of capabilities to the model – tools, resources, and prompts. In the following guide, you'll focus on MCP tools, learning to create useful actions you can trigger from your clients via a remote MCP server hosted on Vast.ai's platform.
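To make the three categories concrete, here is a minimal sketch using the FastMCP library you'll install later in this tutorial. The server name, resource URI, and function bodies are illustrative only.
from fastmcp import FastMCP

mcp = FastMCP("Example Server")

# Tool: an action the model can invoke (the focus of this guide)
@mcp.tool()
def add(a: int, b: int) -> int:
    return a + b

# Resource: read-only data a client can pull into the model's context
@mcp.resource("data://watchlist")
def watchlist() -> list[str]:
    return ["INVC", "ACME"]

# Prompt: a reusable prompt template the client can surface to users
@mcp.prompt()
def earnings_review(symbol: str) -> str:
    return f"Summarize the latest earnings news for {symbol}."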
Remote MCP Servers
Initially, MCP only supported locally running servers using stdio transport. With the addition of the streamable-http transport in 2025, it now supports remote access, enabling scalable, AI-enabled workflows that integrate LLMs across your organization. Teams can more easily integrate data services and build custom workflows within their workspaces.
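As a rough sketch of the difference (again assuming the FastMCP library used later in this guide), the same server object can be exposed either way; only the transport changes, and the port below is an arbitrary example.
from fastmcp import FastMCP

mcp = FastMCP("Example Server")

if __name__ == "__main__":
    # Local: the client launches this process and talks to it over stdin/stdout
    # mcp.run(transport="stdio")

    # Remote: the server listens over HTTP, so any MCP-compatible client that
    # can reach http://<host>:8000/mcp may connect
    mcp.run(transport="streamable-http", host="0.0.0.0", port=8000)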
Vast.ai's GPU instance rentals provide flexible, affordable hosting for as many MCP servers as you want to create, letting you scale up your team's AI workflows quickly and securely.
Tutorial: Build a Stock Sentiment Analysis MCP
Today, you'll build an MCP server for stock sentiment analysis correlation, hosted on Vast.ai's platform. This setup provides a foundation for team-based workflows. For example, a financial analysis firm might use this MCP tool to probe market conditions directly from the Claude interface, or spur discussion by connecting its output to a shared Slack channel, giving multiple teams up-to-date, shared information. To begin, let's review what you'll need for this process.
Vast.ai's GPU rentals can host a wide variety of MCP servers, easily accommodating 70B-parameter language models or image diffusion models on the latest graphics cards. The platform also opens the door to creative, custom server integrations, like generating shared, product-specific image prompts for your product teams or pulling proprietary climate and energy data into your coding and marketing workflows.
By the end of this tutorial, you will understand how to build a similar bespoke MCP server for your organization's needs, and have a running blueprint for creating your collection of remote MCP servers on Vast.ai. Let's get started.
Prerequisites
To follow along, you'll need a Vast.ai account with a rented GPU instance, a Hugging Face account and access token, and a recent version of Python on your local machine. Start by installing the uv package manager locally:
curl -LsSf https://astral.sh/uv/install.sh | sh
Now, restart your terminal to activate the `uv` command. Then, initialize a Python project.
uv init vast_mcp_server
cd vast_mcp_server
Next, create and activate your virtual environment, then create a requirements.txt file to manage project dependencies.
uv venv
source .venv/bin/activate
touch requirements.txt
Then add the following dependencies:
fastmcp
pydantic
transformers
torch
requests
uvicorn
accelerate
Install the dependencies using the following command.
pip install -r requirements.txt
Create a 'server.py' and 'test_server.py' file using the command:
touch server.py test_server.py
First, you'll configure a Mistral 7B model for summarization and sentiment analysis for a given news article. You'll also set up a FastMCP server. Add the following code to the 'server.py' file.
import random
import logging
import asyncio
import os
import json

from pydantic import BaseModel
from transformers import AutoTokenizer, AutoModelForCausalLM
from fastmcp import FastMCP
import torch

logging.basicConfig(level=logging.INFO)

# Initialize the MCP server
mcp = FastMCP("Financial Market Analyzer")

# Configure the Mistral 7B instruct model
MODEL_NAME = "mistralai/Mistral-7B-Instruct-v0.2"
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {DEVICE}")

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    device_map="auto"
)

# Input schema for news articles passed to the sentiment tool
class NewsItem(BaseModel):
    title: str
    description: str
Now, you'll create three tools. The first uses the Mistral 7B LLM to summarize and analyze a provided news article. In a real-world implementation, you may want to update this example to accept a URL as input; a sketch of that variant follows the tool below. Add the following code:
# Analyzes the sentiment and provides a summary of a news headline using a large language model.
@mcp.tool()
def analyze_sentiment(news_item: NewsItem):
    prompt = (
        f"You are a financial news analysis expert. "
        f"Analyze the following news headline and description. "
        f"Provide a brief summary and determine the sentiment as POSITIVE, NEGATIVE, or NEUTRAL. "
        f"Respond only with a JSON object containing two keys: 'summary' and 'sentiment'.\n\n"
        f"Headline: \"{news_item.title}\"\n"
        f"Description: \"{news_item.description}\"\n\n"
        f"JSON Response:"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(DEVICE)
    with torch.no_grad():
        outputs = model.generate(**inputs, max_new_tokens=100, do_sample=False)
    generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    try:
        # Extract the JSON object the model appended after the prompt
        json_start_index = generated_text.find("{")
        json_end_index = generated_text.rfind("}") + 1
        if json_start_index != -1 and json_end_index > json_start_index:
            json_str = generated_text[json_start_index:json_end_index]
            analysis_result = json.loads(json_str)
            summary = analysis_result.get("summary", "No summary found.")
            sentiment_label = analysis_result.get("sentiment", "NEUTRAL").upper()
        else:
            summary = "Could not generate a structured summary."
            sentiment_label = "NEUTRAL"
            logging.warning("LLM response did not contain a valid JSON object.")
    except json.JSONDecodeError as e:
        summary = "Could not parse LLM response."
        sentiment_label = "NEUTRAL"
        logging.error(f"Failed to parse JSON from LLM: {e}")
    logging.info(f"Analyzed '{news_item.title}': Sentiment={sentiment_label}, Summary='{summary}'")
    return {"title": news_item.title, "sentiment": sentiment_label, "summary": summary}
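As mentioned above, a production version of this tool might accept a URL rather than pasted text. Here is a rough sketch of what that could look like using the requests package already in your requirements. The tag-stripping is deliberately naive, and reaching the wrapped function through .fn is a FastMCP detail you may prefer to avoid by factoring the prompt-and-generate logic into a plain helper shared by both tools.
import re
import requests

# Hypothetical variant: fetch an article from a URL, crudely strip HTML tags,
# then reuse the analyze_sentiment logic on the extracted text.
@mcp.tool()
def analyze_sentiment_from_url(url: str, title: str = "Untitled article"):
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    # Naive extraction: drop tags and keep the first ~1,000 characters
    text = " ".join(re.sub(r"<[^>]+>", " ", response.text).split())[:1000]
    # analyze_sentiment is wrapped by @mcp.tool(), so call the underlying
    # function via .fn (or refactor the shared logic into a helper)
    return analyze_sentiment.fn(NewsItem(title=title, description=text))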
Now you'll add a stock retrieval tool. In this example, you'll mock the stock data; in an actual implementation, you might pull this data from a stock-related API (a sketch of that follows the tool below). Add the following code:
# Fetches mock real-time stock data. In a real-world implementation, you would connect to a secure financial database API.
# For this example, we are using a simple mock to simulate real-time data.
@mcp.tool()
def get_stock_data(symbol: str):
    price = round(random.uniform(100.0, 500.0), 2)
    volume = random.randint(100000, 5000000)
    logging.info(f"Fetched mock data for {symbol}: Price={price}, Volume={volume}")
    return {"symbol": symbol, "price": price, "volume": volume}
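If you later swap the mock for real market data, the tool body is the only thing that changes. Below is a sketch against a placeholder endpoint; the URL, response fields, and STOCK_API_KEY environment variable are hypothetical stand-ins for whatever provider you choose.
import requests

# Hypothetical real-data variant of get_stock_data. The endpoint and response
# shape below are placeholders for your actual market-data provider's API.
@mcp.tool()
def get_stock_data_live(symbol: str):
    api_key = os.environ["STOCK_API_KEY"]  # hypothetical credential
    url = f"https://api.example-market-data.com/v1/quote/{symbol}"
    response = requests.get(url, params={"apikey": api_key}, timeout=10)
    response.raise_for_status()
    quote = response.json()
    return {"symbol": symbol, "price": quote.get("price"), "volume": quote.get("volume")}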
Finally, you'll add a tool that correlates the stock prices and news sentiment. Add the following code:
# Correlates stock price and volume data with news sentiment to flag possible market impact.
@mcp.tool()
async def correlate_data(stock_data: dict, news_analysis: dict):
    sentiment = news_analysis['sentiment']
    price = stock_data['price']
    correlation_message = "No clear correlation found."
    if sentiment == "POSITIVE" and stock_data['volume'] > 2000000:
        correlation_message = f"POSITIVE news coincided with a high trading volume. Possible impact. News Summary: {news_analysis['summary']}"
    elif sentiment == "NEGATIVE" and price < 300:
        correlation_message = f"NEGATIVE news for a low-priced stock. Potential further decline. News Summary: {news_analysis['summary']}"
    logging.info(f"Correlated data: {correlation_message}")
    return {"stock_data": stock_data, "news_analysis": news_analysis, "correlation": correlation_message}
Now, you'll create the ability to run your MCP remote server via streamable-http. Add the following code:
# Runs the MCP server remotely using streamable-http transport
if __name__ == "__main__":
    port = int(os.getenv("PORT", 8080))
    logging.info(f"MCP server started on port {port}")
    asyncio.run(
        mcp.run_async(
            transport="streamable-http",
            host="0.0.0.0",
            port=port,
        )
    )
Great work! You now have three MCP tools defined on your server. After deploying to Vast.ai, you can access any of them remotely by integrating with Claude, Cursor, or other MCP-supporting apps. Let's upload the server to your Vast.ai GPU instance. First, install the Vast.ai CLI on your local machine:
pip install --upgrade vastai
Set your API key in your local Vast CLI configuration.
vastai set api-key YOUR_API_KEY_HERE
Now you'll be able to connect to your remote GPU instance over SSH. Navigate to your Vast.ai Console, select your GPU instance, and click "Open Terminal Access". You should see SSH commands for connecting to your GPU instance with the following structure:
ssh -p XXXXX root@XXX.XXX.XXX.XXX -L XXXX:localhost:XXXX
Navigate back to your terminal and run that command to connect to your instance. If the instance is running, your terminal should log in to the remote Vast.ai instance and display a welcome message.
Next, securely copy your server.py and requirements.txt files to the remote GPU instance. In the local terminal you used to create your server environment, run scp with the port and IP address from your own SSH command (the values below are from this tutorial's example instance):
scp -P 29946 server.py requirements.txt root@172.81.127.37:/root/
Back on the remote instance, install the dependencies:
pip install -r requirements.txt
Since you are using a Hugging Face model, you'll also need to log in to the Hugging Face CLI on the remote instance:
huggingface-cli login
This command will prompt you for your access token from the Hugging Face website.
Now you're ready to run and test your server! On the remote instance, start the server:
python server.py
If all is well, you should see the models downloading and the server startup success message:
Starting MCP server 'Financial Market Analyzer' with transport 'streamable-http' on http://0.0.0.0:8080/mcp
INFO: Started server process [2135]
INFO: Waiting for application startup.
INFO:mcp.server.streamable_http_manager:StreamableHTTP session manager started
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8080
As a final step, test the server connection to ensure your tools are available. In your local Python environment, open the test_server.py file you created earlier and paste in the script below.
import asyncio
import json

from fastmcp import Client

# Test the MCP server using streamable-http transport.
async def test_server():
    async with Client("http://localhost:8080/mcp") as client:
        # List available tools
        tools = await client.list_tools()
        for tool in tools:
            print(f"Tool found: {tool.name}")

        # Call the mock stock data tool
        print("1. Calling 'get_stock_data' tool for INVC...")
        stock_result = await client.call_tool("get_stock_data", {"symbol": "INVC"})
        print(f"Result: {stock_result[0].text}")

        # Call the sentiment analysis and summarization tool
        print("2. Calling 'analyze_sentiment' tool...")
        news_item_payload = {
            "news_item": {
                "title": "Tech Giant Announces Record Profits",
                "description": "Shares surged after the company reported quarterly earnings that exceeded all analyst expectations, signaling strong growth."
            }
        }
        sentiment_result = await client.call_tool("analyze_sentiment", news_item_payload)
        print(f"Result: {sentiment_result[0].text}")

        # Call the correlation tool
        print("3. Calling 'correlate_data' tool...")
        correlation_payload = {
            "stock_data": json.loads(stock_result[0].text),
            "news_analysis": json.loads(sentiment_result[0].text)
        }
        correlation_result = await client.call_tool("correlate_data", correlation_payload)
        print(f"Result: {correlation_result[0].text}")

if __name__ == "__main__":
    asyncio.run(test_server())
Navigate back to your local Python environment terminal and run the script. Because the SSH command you used includes -L port forwarding, the client can reach the remote server at localhost, assuming you forwarded port 8080.
uv run test_server.py
If your server is working correctly, you should see output listing each available tool and the result of calling each tool in turn.
Congratulations! You have created your first MCP server on Vast.ai. Now you can integrate it into Claude, Cursor, Windsurf, or any other MCP-supporting client, and go on to customize, scale, and create more MCP servers for your organization's AI-enabled workflows.