OpenAI's GPT-OSS-20B model brings transparency to AI reasoning: through the Harmony SDK, you get direct visibility into the model's thought process. Unlike traditional chatbots that hide their decision-making, GPT-OSS models expose their analytical reasoning, function calling logic, and structured responses across multiple communication channels.
This multi-channel approach changes how we interact with AI. The model separates its internal reasoning (analysis channel), function execution (commentary channel), and user-facing responses (final channel), giving developers insight into the AI's decision-making process. This transparency is essential for building AI applications where understanding "why" is as important as getting the right answer.
Running GPT-OSS on Vast.ai combines these open-weight models with affordable GPU infrastructure. Instead of relying on opaque API calls, you control your own deployment, customize its behavior, and gain visibility into the model's reasoning, all on GPUs rented through Vast.ai's marketplace.
In this guide, you'll deploy OpenAI's GPT-OSS-20B model on Vast.ai and build a weather assistant that demonstrates Harmony SDK's multi-channel reasoning system.
The first step is installing the necessary tools to interact with Vast.ai's GPU marketplace and OpenAI's Harmony SDK. These tools provide everything needed to rent GPUs, deploy models, and build applications with structured AI reasoning.
pip install --quiet vastai openai openai-harmony requests
The vastai CLI enables programmatic GPU rental and instance management. The openai library provides the client interface for model interaction, while openai-harmony adds support for the multi-channel reasoning format. The requests library will power our weather data fetching.
Next, configure your Vast.ai API credentials. This key authenticates your account and enables GPU rental through the CLI:
export VAST_API_KEY="<your-api-key>" # Get from https://cloud.vast.ai/account/
vastai set api-key $VAST_API_KEY
Your API key is available in your Vast.ai account settings. This authentication persists across sessions, so you only need to set it once per environment.
Next, find a GPU with enough memory and disk space for GPT-OSS-20B. The model requires an H100 or newer architecture. Search for offers that meet these hardware requirements:
vastai search offers " \
gpu_name in [H100_SXM, H100_NVL] \
gpu_ram >= 40 \
geolocation=US \
static_ip = true \
direct_port_count >= 1 \
verified = true \
disk_space >= 60 \
rentable = true"
This search command filters for H100 GPUs with the necessary specifications. The verified = true flag ensures you're renting from reliable providers, while static_ip and direct_port_count enable external API access. Geographic filtering (geolocation=US) can reduce latency for US-based users.
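If you'd rather pick an offer programmatically, the same search can be scripted from Python. This is a minimal sketch: it assumes your vastai CLI version supports the --raw flag for JSON output and that offers expose id and dph_total (price per hour) fields, so verify both against your installed version.
import json
import subprocess

# Same filters as the CLI search above, written as one query string
QUERY = ("gpu_name in [H100_SXM, H100_NVL] gpu_ram >= 40 geolocation=US "
         "static_ip = true direct_port_count >= 1 verified = true "
         "disk_space >= 60 rentable = true")

# --raw asks the CLI for machine-readable JSON instead of a formatted table (assumed flag)
result = subprocess.run(
    ["vastai", "search", "offers", QUERY, "--raw"],
    capture_output=True, text=True, check=True
)
offers = json.loads(result.stdout)

# Pick the cheapest matching offer by hourly price
cheapest = min(offers, key=lambda o: o["dph_total"])
print(f"Offer {cheapest['id']} at ${cheapest['dph_total']:.3f}/hr")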
With suitable hardware identified, deploy the model using vLLM's optimized inference server. vLLM provides an OpenAI-compatible API endpoint with performance optimizations through PagedAttention and continuous batching.
export INSTANCE_ID=<instance-id> # From your search results
vastai create instance $INSTANCE_ID \
--image vllm/vllm-openai:gptoss \
--env '-p 8000:8000' \
--disk 60 \
--args --model openai/gpt-oss-20b
Deployment parameters explained:
- --image vllm/vllm-openai:gptoss: vLLM's official Docker image specifically built for GPT-OSS models
- --env '-p 8000:8000': Maps container port 8000 to the host, enabling API access
- --disk 60: Allocates 60GB storage for model weights and cache
- --args --model openai/gpt-oss-20b: Specifies the exact model variant to load
The deployment process downloads model weights and initializes the inference server. vLLM handles model sharding, memory allocation, and optimization setup.
Once deployment completes, retrieve your instance's connection details through the Vast.ai console. The platform assigns a public IP address and port mapping for API access.
To connect to your deployed model, note the public IP and port mapping shown in the console. The connection information appears in the format:
XX.XX.XXX.XX:YYYY -> 8000/tcp
Where XX.XX.XXX.XX is your public IP and YYYY is the external port mapped to the container's port 8000.
Test your connection with a completion request:
from openai import OpenAI
# Your instance details from Vast.ai
VAST_IP = "<your-instance-ip>"
VAST_PORT = "<your-port>"
client = OpenAI(
api_key="EMPTY", # vLLM doesn't require authentication
base_url=f"http://{VAST_IP}:{VAST_PORT}/v1"
)
# Quick connection test
try:
response = client.completions.create(
model="openai/gpt-oss-20b",
prompt="Hello, I am"
)
print("IT WORKS! GPT-OSS is running!")
print(f"Test response: {response.choices[0].text}")
except Exception as e:
print(f"Not ready yet. Error: {e}")
print("Wait for the model to load.")
A successful response confirms your model is operational and ready for Harmony SDK integration.
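If the call fails because the weights are still downloading, you can poll the endpoint until it responds instead of retrying by hand. A small sketch, assuming vLLM's OpenAI-compatible model listing is available (the attempt count and delay are arbitrary):
import time

def wait_for_model(client, attempts=30, delay=30):
    """Poll the vLLM endpoint until it answers, up to attempts * delay seconds."""
    for i in range(attempts):
        try:
            models = client.models.list()  # OpenAI-compatible /v1/models listing
            print(f"Ready! Serving: {[m.id for m in models.data]}")
            return True
        except Exception:
            print(f"Attempt {i + 1}/{attempts}: not ready yet, retrying in {delay}s...")
            time.sleep(delay)
    return False

wait_for_model(client)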
OpenAI's Harmony is a response format designed for GPT-OSS models to structure conversations, generate reasoning output, and handle function calls. Unlike traditional single-stream outputs, Harmony enables models to separate their thoughts, actions, and responses into distinct channels.
The three channels are:
- analysis - the model's internal reasoning and chain of thought
- commentary - function calls and tool-use activity
- final - the user-facing response
This format provides transparency into AI decision-making. Developers can observe how the model analyzes problems, why it chooses specific functions, and how it constructs its final response. This visibility helps with debugging, improving prompts, and building trust in AI systems.
Let's initialize the Harmony SDK and explore its capabilities:
from openai_harmony import (
load_harmony_encoding,
HarmonyEncodingName,
Role,
Message,
Conversation,
SystemContent,
DeveloperContent
)
# Load the Harmony encoding for GPT-OSS models
enc = load_harmony_encoding(HarmonyEncodingName.HARMONY_GPT_OSS)
print("Harmony SDK loaded successfully!")
The Harmony encoding handles the special token format that enables multi-channel communication.
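To see what the encoding actually handles, you can round-trip a short string containing Harmony's special tokens. A quick sanity check using only calls shown in this guide (the sample string is illustrative, not real model output):
# Round-trip a tagged string through the encoding to confirm special tokens survive
sample = "<|start|>assistant<|channel|>final<|message|>Hello there!<|return|>"
tokens = enc.encode(sample, allowed_special='all')
print(f"Encoded into {len(tokens)} tokens")
print(enc.decode(tokens))  # should print the same tagged string back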
Before constructing our AI assistant, we need a weather data source. The wttr.in API provides free, anonymous weather data:
import requests
import json
def get_weather(city):
"""Get weather data using the free wttr.in API"""
try:
url = f"https://wttr.in/{city}?format=j1"
response = requests.get(url)
data = response.json()
current = data['current_condition'][0]
return {
"city": city,
"temperature_c": current['temp_C'],
"temperature_f": current['temp_F'],
"description": current['weatherDesc'][0]['value'],
"humidity": current['humidity'],
"wind_speed_kmh": current['windspeedKmph']
}
except Exception as e:
return {"error": f"Could not get weather for {city}"}
# Test the weather function
test_weather = get_weather("London")
print("Weather function test:")
print(json.dumps(test_weather, indent=2))
This function fetches weather data with error handling. It returns structured data that the AI can interpret and present to users.
Harmony conversations follow a hierarchical structure with three message types, each serving a distinct purpose in guiding model behavior:
def create_weather_conversation(user_query):
return Conversation.from_messages([
# 1. SYSTEM (highest priority) - Core behavior and identity
Message.from_role_and_content(
Role.SYSTEM,
SystemContent(
model_identity="You are WeatherBot, a helpful weather assistant. Always show your reasoning in the analysis channel. Valid channels: analysis, commentary, final."
)
),
# 2. DEVELOPER - Custom instructions and constraints
Message.from_role_and_content(
Role.DEVELOPER,
DeveloperContent(
instructions="Use metric units by default. If a city is ambiguous, ask for clarification. Always show your reasoning process. You have access to a get_weather function that takes a city parameter."
)
),
# 3. USER - The actual query
Message.from_role_and_content(Role.USER, user_query)
])
print("Conversation builder ready!")
The SYSTEM message establishes the assistant's identity and fundamental behavior. It defines available channels and ensures the model always explains its reasoning. This message has the highest priority and cannot be overridden by user input.
The DEVELOPER message adds application-specific logic and constraints. Here we specify metric units as default, ambiguity handling rules, and available functions. These instructions shape how the assistant interprets and responds to queries.
The USER message contains the actual query. By structuring conversations this way, we maintain consistent behavior while allowing flexible user interaction.
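If you're curious how this hierarchy is serialized, render a sample conversation and decode the prompt tokens back to text. A short sketch reusing the builder above (the exact token layout you see depends on the SDK version):
# Inspect the Harmony-formatted prompt produced by the role hierarchy
convo = create_weather_conversation("What's the weather like in Tokyo?")
prompt_tokens = enc.render_conversation_for_completion(convo, Role.ASSISTANT)
print(f"Prompt is {len(prompt_tokens)} tokens")
print(enc.decode(prompt_tokens))  # system, developer, and user messages with channel markers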
vLLM's inference server strips special tokens that Harmony uses for channel separation. We need a custom parser to reconstruct the multi-channel structure and extract function calls:
def parse_harmony_response(response_tokens):
"""Parse the model's response and execute any function calls
Why we need this: vLLM strips the special tokens that Harmony uses for multi-channel output.
We need to parse the response to extract function calls and separate channels.
"""
# Parse the response into structured messages
parsed = enc.parse_messages_from_completion_tokens(
response_tokens,
role=Role.ASSISTANT
)
channels = {
"analysis": [],
"commentary": [],
"final": []
}
for message in parsed:
# Get channel designation
channel = getattr(message, 'channel', 'final')
# message.content is a LIST of TextContent objects (not a single object)
# We must iterate through each item to extract the actual text
if hasattr(message, 'content') and isinstance(message.content, list):
for content_item in message.content:
# Extract text from TextContent objects
if hasattr(content_item, 'text'):
content_text = content_item.text
elif isinstance(content_item, str):
content_text = content_item
else:
continue
# Check if this is a function call in the commentary channel
if channel == "commentary" and "to=functions.get_weather" in content_text:
# Extract JSON arguments - they come after "json"
if "json" in content_text:
try:
import json
# Find where "json" appears and take everything after it
json_start = content_text.find("json") + 4
json_str = content_text[json_start:].strip()
# Parse the JSON arguments
func_args = json.loads(json_str)
city = func_args.get("city")
if city:
# Call the weather function
result = get_weather(city)
channels["commentary"].append({
"type": "function_call",
"function": "get_weather",
"args": func_args,
"result": result
})
else:
channels[channel].append({"type": "text", "content": content_text})
except Exception as e:
print(f"Failed to parse function call: {e}")
channels[channel].append({"type": "text", "content": content_text})
else:
channels[channel].append({"type": "text", "content": content_text})
else:
# Regular content for all channels
if channel in channels:
channels[channel].append({
"type": "text",
"content": content_text
})
return channels
print("Response parser ready!")
This parser performs several functions:
- Converts the raw completion tokens back into structured Harmony messages
- Groups each message's content under its channel (analysis, commentary, or final)
- Detects get_weather calls in the commentary channel, extracts their JSON arguments, and executes them
- Attaches each function result alongside the call so the display step can show both
The parser also handles edge cases like malformed JSON, missing arguments, and unexpected content formats.
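Before wiring the parser into the assistant, you can sanity-check it with a hand-written completion that mimics the reconstructed format used in the next step (the text below is illustrative, not real model output):
# Hand-crafted completion in the same tagged format the assistant reconstructs later
sample_completion = (
    "<|start|>assistant<|channel|>analysis<|message|>"
    "The user wants the weather in Tokyo, so I should call get_weather.<|end|>"
    "<|start|>assistant<|channel|>final<|message|>"
    "Let me check Tokyo's weather for you.<|return|>"
)
sample_tokens = enc.encode(sample_completion, allowed_special='all')
for channel, items in parse_harmony_response(sample_tokens).items():
    print(channel, items)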
Now we combine all components into a weather assistant that uses Harmony's multi-channel capabilities:
def weather_assistant(user_query, model_client):
"""Main weather assistant function with proper Harmony support"""
# Create the conversation with proper role hierarchy
conversation = create_weather_conversation(user_query)
# Render the conversation into Harmony format tokens
prompt_tokens = enc.render_conversation_for_completion(conversation, Role.ASSISTANT)
prompt_text = enc.decode(prompt_tokens)
# Send to model for completion
response = model_client.completions.create(
model="openai/gpt-oss-20b",
prompt=prompt_text,
temperature=0.7,
stop=["<|return|>", "<|call|>", "<|end|>"]
)
response_text = response.choices[0].text
# Reconstruct proper Harmony format (vLLM strips special tokens, so we add them back)
formatted = response_text
if formatted.startswith("analysis"):
formatted = "<|start|>assistant<|channel|>" + formatted
formatted = formatted.replace("analysis", "analysis<|message|>", 1)
# Handle channel transitions
formatted = formatted.replace("assistantfinal", "<|end|><|start|>assistant<|channel|>final<|message|>")
formatted = formatted.replace("assistantanalysis", "<|end|><|start|>assistant<|channel|>analysis<|message|>")
formatted = formatted.replace("assistantcommentary", "<|end|><|start|>assistant<|channel|>commentary<|message|>")
# Ensure proper termination
if not formatted.endswith(("<|return|>", "<|call|>", "<|end|>")):
formatted += "<|return|>"
# Encode with special tokens allowed
response_tokens = enc.encode(formatted, allowed_special='all')
# Parse the harmony response
try:
channels = parse_harmony_response(response_tokens)
except Exception as e:
return {
"error": str(e),
"raw_response": response_text
}
return channels
print("Weather assistant ready!")
The assistant orchestrates the interaction flow:
- Builds the conversation with the system/developer/user hierarchy
- Renders it into Harmony-format tokens and decodes them into a prompt for vLLM
- Sends the prompt to the completions endpoint with channel-aware stop tokens
- Reconstructs the special tokens that vLLM strips from the response
- Parses the result into analysis, commentary, and final channels
The temperature setting balances creativity with consistency, while the stop tokens prevent the model from generating beyond logical endpoints.
To make the multi-channel output human-readable, we need a display function that clearly presents each channel's content:
def display_harmony_result(result):
"""Display the Harmony multi-channel response in a formatted way"""
if "error" in result:
print(f"Error: {result['error']}")
return
# Show the reasoning (analysis channel)
if result.get("analysis"):
print("\n📊 AI REASONING (analysis channel):")
for item in result["analysis"]:
if item["type"] == "text":
text = item['content']
if hasattr(text, 'text'):
text = text.text
print(f" {text}")
# Show function calls (commentary channel)
if result.get("commentary"):
print("\n🔧 FUNCTION CALLS (commentary channel):")
for item in result["commentary"]:
if item["type"] == "function_call":
print(f" Calling: {item['function']}({item['args']})")
print(f"\n Weather Data Retrieved:")
weather = item['result']
if 'error' not in weather:
print(f" • Temperature: {weather['temperature_c']}°C ({weather['temperature_f']}°F)")
print(f" • Conditions: {weather['description']}")
print(f" • Humidity: {weather['humidity']}%")
print(f" • Wind: {weather['wind_speed_kmh']} km/h")
else:
print(f" • Error: {weather['error']}")
# Show final response (final channel)
if result.get("final"):
print("\n💬 FINAL RESPONSE (final channel):")
for item in result["final"]:
if item["type"] == "text":
text = item['content']
if hasattr(text, 'text'):
text = text.text
print(f" {text}")
# If no final response, generate one from the weather data
elif result.get("commentary"):
print("\n💬 FINAL RESPONSE (generated):")
for item in result["commentary"]:
if item["type"] == "function_call" and item.get("result"):
weather = item['result']
if 'error' not in weather:
print(f" The current weather in {weather['city']} is {weather['description'].lower()}.")
print(f" It's {weather['temperature_c']}°C ({weather['temperature_f']}°F) with {weather['humidity']}% humidity")
print(f" and winds at {weather['wind_speed_kmh']} km/h.")
This display function transforms raw channel data into organized output. Each channel gets its own section with formatting, making it easy to understand the AI's thought process from reasoning through execution to final response.
Let's see the weather assistant in action with different queries to demonstrate its capabilities:
# Test the weather assistant
query = "What's the weather like in Tokyo?"
print(f"\nQuery: {query}")
print("-" * 60)
result = weather_assistant(query, client)
display_harmony_result(result)
Output:
Query: What's the weather like in Tokyo?
------------------------------------------------------------
📊 AI REASONING (analysis channel):
The user asks: "What's the weather like in Tokyo?" It's a straightforward request for weather in Tokyo. There's no ambiguity. We can use the get_weather function. The user wants the weather. We should call get_weather with city="Tokyo". And then display the result. We should also show reasoning. The answer should be in metric units. The developer instructions say to use metric units by default. That may refer to temperature in Celsius. The get_weather function presumably returns data in metric units. We'll call it.
We need to show reasoning in the analysis channel. Then produce final answer with the weather.
🔧 FUNCTION CALLS (commentary channel):
Calling: get_weather({'city': 'Tokyo'})
Weather Data Retrieved:
• Temperature: 27°C (80°F)
• Conditions: Clear
• Humidity: 85%
• Wind: 23 km/h
💬 FINAL RESPONSE (generated):
The current weather in Tokyo is clear.
It's 27°C (80°F) with 85% humidity
and winds at 23 km/h.
Notice how the analysis channel reveals the model's complete reasoning process. It identifies the request type, confirms there's no ambiguity, recalls the developer instructions about metric units, and plans its response strategy. This transparency helps with understanding and debugging AI behavior.
# Test the weather assistant
query = "What's the weather like in Paris?"
print(f"\nQuery: {query}")
print("-" * 60)
result = weather_assistant(query, client)
display_harmony_result(result)
Output:
Query: What's the weather like in Paris?
------------------------------------------------------------
📊 AI REASONING (analysis channel):
We need to answer weather for Paris. But Paris is ambiguous? Possibly Paris, France or Paris, Texas? Typically default to Paris, France. But could ask clarification. The instruction says if city ambiguous, ask for clarification. But the user says "What's the weather like in Paris?" Without specifying country. We can assume Paris, France. But ambiguous: there are many Paris. Usually default is Paris, France. But the instruction says if ambiguous, ask for clarification. But "Paris" is ambiguous by definition. However guidelines say "If a city is ambiguous, ask for clarification." So we should ask clarifying question. But maybe we can interpret that user likely means Paris, France. But better to ask for clarification to be safe. So we ask: "Which Paris? Paris, France or Paris, Texas, etc." Let's ask: "Do you mean Paris, France or another Paris?" Then we can proceed.
💬 FINAL RESPONSE (final channel):
Sure! Paris can refer to several places. Could you let me know which one you're interested in? For example, Paris, France or Paris, Texas (or another city named Paris).
The model demonstrates reasoning about ambiguity. It recognizes multiple cities share the name "Paris," weighs the likelihood of different interpretations, recalls the developer instruction to ask for clarification, and decides to request more information rather than making assumptions.
# Test the weather assistant
query = "What's the weather like in Paris, France?"
print(f"\nQuery: {query}")
print("-" * 60)
result = weather_assistant(query, client)
display_harmony_result(result)
Output:
Query: What's the weather like in Paris, France?
------------------------------------------------------------
📊 AI REASONING (analysis channel):
The user asks for weather in Paris, France. That is a city. It's ambiguous? "Paris" could be Paris, France or Paris, Texas etc. The user added "France" to clarify. So it's clear: Paris, France. Should use get_weather function with city parameter "Paris, France". Ensure metric units? The function may return data. We need to call get_weather.
🔧 FUNCTION CALLS (commentary channel):
Calling: get_weather({'city': 'Paris, France'})
Weather Data Retrieved:
• Temperature: 22°C (72°F)
• Conditions: Partly cloudy
• Humidity: 46%
• Wind: 6 km/h
💬 FINAL RESPONSE (generated):
The current weather in Paris, France is partly cloudy.
It's 22°C (72°F) with 46% humidity
and winds at 6 km/h.
With clarification provided, the model proceeds confidently. The analysis channel shows it recognizes the disambiguation, confirming "Paris, France" removes ambiguity. It then executes the weather function and presents results.
This guide demonstrated deploying OpenAI's GPT-OSS-20B model on Vast.ai with the Harmony SDK for multi-channel reasoning. The weather assistant shows how the model separates its internal reasoning, function calls, and user responses into distinct channels, providing full transparency into its decision-making process.
The combination of Vast.ai's GPU infrastructure and Harmony's structured format enables building AI applications where understanding the model's reasoning is as important as getting the right answer.