How to Build an Agent on Vast.ai in Less Than 100 Lines of Python
Building an AI agent doesn't require a heavy framework such as LangChain, AutoGen, or CrewAI. In this tutorial, we build a fully functional software engineering agent in under 100 lines of Python using a Vast.ai GPU instance, Ollama for local inference, and a single shell tool to give the model real-world capabilities. If you want to understand how agentic AI actually works under the hood, this is the clearest starting point.
The agent is a simple loop that connects a language model to a shell tool. That is enough to demonstrate how agents inspect files, execute commands, and solve tasks autonomously. By the end, we will have a clear understanding of how agents function and a foundation for understanding how more complex agent systems operate.
Building a Small Agent in Python
A Python AI agent is not a magic object. It is a program that gives a model a goal, lets the model request actions, performs those actions, and returns the results to the model. The interesting behavior comes from the loop, not from a large framework.
In this article, we will build a small command-line agent in less than 100 lines of Python. The agent will use an OpenAI-compatible chat API, expose a single tool named shell, and run until the model either finishes the task or reaches a fixed turn limit.
This version is intentionally simple. It is not sandboxed. It is not a product. It is the smallest useful version of the mechanism. Later articles can harden it, add better tools, and give it safer execution boundaries.
How the AI Agent Loop Works
Most agent systems reduce to the same loop:
- Start with a conversation.
- Ask the model what to do next.
- If the model asks for a tool, run the tool.
- Add the tool result to the conversation.
- Ask the model again.
Without tools, a model can only produce text. With tools, the model can inspect the outside world and act on it. The model still does not execute anything itself. Your program executes the tool call and decides what result to feed back.
That separation is important. The model chooses an action. The host program performs it. Then the host tells the model what happened, and the model chooses its next action. In this way, the tool protocol becomes a handshake between the agent harness and the LLM.
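The loop described above can be sketched without any model at all. The example below is a minimal illustration, not the agent built in this article: `fake_model`, `add`, `HANDLERS`, and the simplified `tool_call` message field are all hypothetical stand-ins, chosen so the handshake between host and model is visible in isolation.

```python
import json

def fake_model(messages):
    """Stand-in for an LLM: requests one tool call, then finishes."""
    if not any(m["role"] == "tool" for m in messages):
        # First turn: no tool result yet, so ask the host to run a tool.
        return {"role": "assistant", "content": None,
                "tool_call": {"name": "add", "args": {"a": 2, "b": 3}}}
    # Second turn: read the tool result the host appended and answer.
    result = json.loads(messages[-1]["content"])
    return {"role": "assistant", "content": f"The sum is {result}.",
            "tool_call": None}

def add(a, b):
    """Hypothetical tool: the host runs this, never the model."""
    return a + b

HANDLERS = {"add": add}

def run_loop(goal, model, max_turns=5):
    messages = [{"role": "user", "content": goal}]   # start with a conversation
    for _ in range(max_turns):
        reply = model(messages)                      # ask the model what to do next
        messages.append(reply)
        call = reply["tool_call"]
        if call is None:                             # no tool requested: finished
            return reply["content"]
        output = HANDLERS[call["name"]](**call["args"])  # the host runs the tool
        messages.append({"role": "tool", "content": json.dumps(output)})
    return None                                      # turn limit reached

print(run_loop("What is 2 + 3?", fake_model))  # The sum is 5.
```

Swapping `fake_model` for a real chat completion call is the only conceptual change needed to turn this sketch into an agent.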
Running an AI Agent on Vast.ai
Before we start, we need a place to run our model. For this article, we'll spin up a GPU machine on Vast.ai. The RTX 5090 is the sweet spot for price/performance in this exercise, but you might want to try the RTX 6000 Blackwell for more speed and more VRAM, which helps with long contexts or with running many agents at the same time.
First, we need the Vast.ai command-line tool installed. It can do almost anything the UI can, as long as it is configured with your API key. To get a new key, log in and go to https://cloud.vast.ai/account/api/ to create one. If you haven't installed the Vast.ai CLI before, run:
python -m pip install openai
python -m pip install vastai
vastai set api-key <YOUR_API_KEY_HERE>
vastai show user # for testing, you should see yourself here
Note that we also installed the OpenAI Python library; it is the only external dependency the agent itself uses.
Templates are pre-configured Docker images for Vast.ai machines that include the operating system, drivers, and pre-installed software environments tailored for specific tasks - in this case, an image that comes ready with Ollama and the necessary dependencies to get started immediately.
Let's select a template that will allow us to get going quickly; we'll choose an Ollama image.
Then select your GPU from the drop-down. While this project only requires a single GPU, you can provision additional GPUs if your use case requires more resources. We would also recommend filtering by "North America" (or your region of the world), as you might encounter network latency if the instance is geographically distant.
Make sure your container has enough disk space. Models can be very large, so at least 64 GB is recommended for this project. Storage cannot be resized on the fly, so if you run out of space you will have to create a new instance from scratch. Although spinning up a new instance is quick, it can disrupt your workflow significantly, so allocate a reasonable amount of space up front.
You can see your instance by selecting "Instances" in the menu on the left side. It should say "Creating..." and then "Loading..."; this should take under a minute. When the status switches to "Open", you can log in and finish the configuration. If this step takes much longer, destroy the instance with the trashcan button and select another one.
Running a Model with Ollama
The script below talks to an OpenAI-compatible endpoint. For this example, the model runs on a Vast.ai GPU instance using Ollama. Ollama is not the only way to run LLMs but is generally regarded as one of the easiest to start with, and it is a very popular choice.
Since we have already provisioned our Vast.ai remote machine with Ollama installed and running, let's now connect to the instance with SSH and forward the Ollama port back to your local machine:
VASTAI_INSTANCE=$(vastai show instances -q | head -n 1)
VASTAI_SSH_URL=$(vastai ssh-url $VASTAI_INSTANCE)
ssh -i ~/.ssh/id_vastai -L 11444:localhost:11434 $VASTAI_SSH_URL
$(...) is shell command substitution: it runs the enclosed command and inserts its output in place. The first line gets the ID of the first instance listed, the second resolves it to an SSH URL, and the third connects while forwarding the Ollama port to localhost:11444 on your machine. We're assuming your SSH key is at ~/.ssh/id_vastai - adjust the path if yours differs.
Ollama listens on port 11434 on the remote machine. The -L option exposes that remote service as localhost:11444 on your laptop. From the Python program's point of view, the model server is local.
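Before running the agent, it can help to confirm that the forwarded port is actually listening. The following is a small sketch using only the standard library; `port_is_open` is a hypothetical helper name, and the defaults assume the localhost:11444 forwarding shown above. Note that a successful connection only shows that the local end of the tunnel is accepting connections, not that Ollama is healthy behind it.

```python
import socket

def port_is_open(host="localhost", port=11444, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution errors.
        return False

if __name__ == "__main__":
    if port_is_open():
        print("Something is listening on localhost:11444")
    else:
        print("Nothing is listening on localhost:11444; is the SSH tunnel up?")
```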
On the Vast.ai instance, list the installed models:
ollama list
It will probably show one preconfigured model at most. Since the model we want is not present, we need to pull (download) it:
ollama pull qwen3.6:35b-a3b
Models are typically very large, so this will probably take a few minutes. We could also have configured the template to load this model when the instance started.
The agent will use this endpoint by default:
http://localhost:11444/v1
You can override the endpoint and model with environment variables:
export OPENAI_BASE_URL=http://localhost:11444/v1
export OPENAI_MODEL=qwen3.6:35b-a3b
export OPENAI_API_KEY=dummy
The API key is dummy because Ollama does not require one, but the OpenAI Python client expects the field to exist and raises an error if no key is provided.
A Directly Executable Script
The first line of the file is a shebang:
#!/usr/bin/env python3
This tells the operating system to run the script with python3 when it is invoked as a command. That matters later when we install the script as ~/bin/agent.
The System Prompt
The system prompt defines the agent's role and operating style. We will use the Python docstring to embed this, so we have it right at the top.
"""
You are a software engineering agent.
Your objective is to complete the user's requested task, not merely describe how to complete it.
You have access to a shell tool that can inspect files, create files, run programs, and verify results.
General behavior:
- Be proactive.
- Gather information before making assumptions.
- Verify your work whenever possible.
- Prefer evidence from tool output over speculation.
- If you encounter an error, investigate it and try to fix it.
- Continue until the task is complete or you are genuinely blocked.
Communication:
- Before a new investigation, implementation, or testing phase, briefly explain what you intend to do.
- Group related actions together. Do not narrate every single command.
- Keep updates concise.
- When complete, summarize what you accomplished.
Coding workflow:
- Understand the task.
- Inspect relevant files when needed.
- Make the smallest reasonable change that solves the problem.
- Run the code.
- Run tests when available.
- Fix failures and verify again.
Safety:
- Do not intentionally destroy user data.
- Do not run dangerous system commands unless explicitly requested.
"""
The prompt is plain English because the model needs behavioral instructions, not an API manual. It tells the model to gather evidence, verify work, and continue until the task is complete or blocked.
More complex harnesses use multiple prompts, models, or agents to direct the workflow. Current models are capable enough, though, that a single prompt works surprisingly well, and this one is all we need for a tool that is useful as a real-world coding assistant.
For brevity, the script uses the module docstring as the system prompt:
{"role": "system", "content": __doc__}
That is a very useful trick for small, self-contained programs. The prompt remains at the top of the file where it is easy to read, and Python gives us access to it through __doc__.
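One detail of the docstring trick is that the raw `__doc__` string keeps its leading newline and any indentation. The sketch below shows one way to tidy it with `inspect.cleandoc` from the standard library. The `system_message` helper and the `example` function are hypothetical names used only for illustration; the agent in this article passes `__doc__` through unchanged, which also works.

```python
import inspect

def system_message(doc):
    """Build a system message from a docstring, trimming indentation and blank edges."""
    return {"role": "system", "content": inspect.cleandoc(doc or "")}

def example():
    """
    You are a helpful agent.
    Be concise.
    """

# A function docstring stands in here for the module-level __doc__.
print(system_message(example.__doc__))
# {'role': 'system', 'content': 'You are a helpful agent.\nBe concise.'}
```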
Configuring the OpenAI-Compatible LLM Client
While Ollama has its own API, the OpenAI-style chat completions API is a de facto industry standard that most LLM hosting tools support out of the box, which means we keep the flexibility to move to a different backend in the future.
The OpenAI Python client can talk to any compatible server:
import os, sys, json, subprocess, openai
MODEL = os.getenv("OPENAI_MODEL", "qwen3.6:35b-a3b")
client = openai.OpenAI(
    base_url=os.getenv("OPENAI_BASE_URL", "http://localhost:11444/v1"),
    api_key=os.getenv("OPENAI_API_KEY", "dummy"),
)
The important option is base_url. Instead of sending requests to OpenAI's hosted API, the client sends them to the Ollama server behind the SSH tunnel.
The model name may also come from the environment. The default matches the model we pulled above, but the script does not require that exact model.
Giving the Agent a Shell Tool
In a mature coding agent, we'd have a plethora of tools designed to assist the model, but the dirty little secret is: we don't actually need that. Models have been trained on an enormous amount of shell usage from sites like Stack Overflow, so they can do nearly everything they need from the command line.
If the model needs to see a file, it can run "cat filename". If it wants to save a file, it can redirect output into it with "cat > filename". As long as a command-line program exists for the job, the model can use it.
So the agent has only one tool:
def shell_tool(command):
    """Execute a shell command and return a JSON object with success, stdout, stderr, exitcode."""
    print(f"\n$ {command}")
    result = subprocess.run(
        command, shell=True, text=True, capture_output=True)
    return {"success": result.returncode == 0,
            "stdout": result.stdout,
            "stderr": result.stderr,
            "exitcode": result.returncode }
The function prints the command, runs it, and returns a dictionary containing the exit status and captured output. The model will see that dictionary after we serialize it as JSON.
The docstring is also part of the design. For this function, the docstring is not only Python documentation. It is the short description the model receives for the tool.
This tool uses shell=True. That is deliberately powerful and deliberately unsafe. Treat this script as a trusted local experiment. Do not run it in a directory or account where arbitrary shell commands would be unacceptable.
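Two cheap guards are worth knowing about even for a trusted experiment: a time limit, so a hung command cannot stall the loop, and a cap on output size, so a noisy command cannot flood the model's context. The sketch below is one possible variant, not part of the article's script. The names `shell_tool_limited` and `clip` and the limits `MAX_CHARS` and `TIMEOUT_S` are assumptions chosen for illustration, and neither guard makes the tool safe.

```python
import subprocess

MAX_CHARS = 4000   # assumed cap on output returned to the model
TIMEOUT_S = 60     # assumed per-command time limit in seconds

def clip(text, limit=MAX_CHARS):
    """Keep the tail of long output, where errors usually appear."""
    if len(text) <= limit:
        return text
    return f"[{len(text) - limit} characters omitted]\n" + text[-limit:]

def shell_tool_limited(command, timeout=TIMEOUT_S):
    """Run a shell command with a time limit and clipped output."""
    try:
        result = subprocess.run(command, shell=True, text=True,
                                capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # Report the timeout as an ordinary failed result the model can read.
        return {"success": False, "stdout": "",
                "stderr": f"timed out after {timeout}s", "exitcode": None}
    return {"success": result.returncode == 0,
            "stdout": clip(result.stdout),
            "stderr": clip(result.stderr),
            "exitcode": result.returncode}

if __name__ == "__main__":
    print(shell_tool_limited("echo hello"))
```

Returning the timeout as data rather than raising keeps the loop alive, so the model can see what happened and try something else.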
Describe the Tool to the Model
The model does not receive the Python function. It receives a JSON schema that describes the function:
TOOLS = [{
    "type": "function",
    "function": {
        "name": "shell",
        "description": shell_tool.__doc__,
        "parameters": {
            "type": "object",
            "properties": { "command": {"type": "string"} },
            "required": ["command"]}}}]
This schema says there is a tool named shell, and it requires one string argument named command.
The description comes from shell_tool.__doc__. That keeps the implementation and the model-facing documentation close together. For larger tools, a longer docstring can explain when to use the tool, what input it expects, and what shape it returns.
Sometimes we even give the model examples of how to use the tool, but in this case, models are well-versed in the language of Unix shells.
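Once there are several tools, writing each schema by hand becomes repetitive. As a rough sketch, the schema can be derived from the function itself with `inspect.signature`. The `schema_for` helper is a hypothetical name, and it makes the simplifying assumption that every parameter is a string, which holds for the shell tool but not in general.

```python
import inspect

def schema_for(func, name=None):
    """Build a function-tool schema, assuming every parameter is a string."""
    params = inspect.signature(func).parameters
    # Parameters without a default value are treated as required.
    required = [p.name for p in params.values()
                if p.default is inspect.Parameter.empty]
    return {
        "type": "function",
        "function": {
            "name": name or func.__name__,
            "description": inspect.cleandoc(func.__doc__ or ""),
            "parameters": {
                "type": "object",
                "properties": {p: {"type": "string"} for p in params},
                "required": required,
            },
        },
    }

def shell_tool(command):
    """Execute a shell command and return its output."""

print(schema_for(shell_tool, name="shell"))
```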
Implementing the Agent Loop
The run_agent function creates the conversation and drives the loop:
def run_agent(goal):
    messages = [{"role": "system", "content": __doc__},
                {"role": "user", "content": goal}]
    for _ in range(20):
        response = client.chat.completions.create(
            model=MODEL, messages=messages, tools=TOOLS)
        msg = response.choices[0].message
        if msg.content:
            print("\n" + msg.content)
        messages.append(msg)
        if not msg.tool_calls:
            return
        for call in msg.tool_calls:
            args = json.loads(call.function.arguments)
            output = {"shell": shell_tool}[call.function.name](**args)
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": json.dumps(output),
            })
    print("\nStopped after 20 turns.")
The conversation starts with two messages: the system prompt and the user's goal. Each iteration sends the full conversation to the model.
The model response may contain text, tool calls, or both. The message always has role "assistant". If there is text, the script prints it to the user. Then it appends the assistant message to the conversation. That matters because tool results must be associated with the assistant message that requested them. Note that msg is a ChatCompletionMessage object, not a plain dictionary - but the OpenAI Python client serializes it automatically when it is included in the next request, so no manual conversion is needed.
If the assistant did not request any tools, the agent is finished. In these cases, LLMs typically return content to explain or summarize what happened.
If there are tool calls, the script parses the function arguments (they arrive in string format):
args = json.loads(call.function.arguments)
Then it dispatches the call:
output = {"shell": shell_tool}[call.function.name](**args)
This one-line dictionary lookup is the tool router. With one tool, it is almost trivial. With more tools, the dictionary can grow:
{"shell": shell_tool, "read_file": read_file, "write_file": write_file}
Finally, the result goes back into the conversation as a tool message:
messages.append({
    "role": "tool",
    "tool_call_id": call.id,
    "content": json.dumps(output),
})
The tool_call_id connects the result to the specific tool call the model made. The content is serialized as JSON because the tool message carries its content as a string.
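The parsing and dispatch steps assume the model always names a known tool and sends well-formed JSON arguments. Smaller models sometimes do neither. The sketch below shows one defensive variant that turns those failures into a result the model can read and correct, rather than a crash. The names `dispatch`, `HANDLERS`, and `echo_tool` are hypothetical and stand in for the real router and tools.

```python
import json

def echo_tool(text):
    """Hypothetical stand-in tool used only for this example."""
    return {"success": True, "stdout": text}

HANDLERS = {"echo": echo_tool}

def dispatch(name, raw_arguments, handlers=HANDLERS):
    """Route a tool call, returning an error result instead of raising."""
    handler = handlers.get(name)
    if handler is None:
        return {"success": False, "error": f"unknown tool: {name}"}
    try:
        args = json.loads(raw_arguments)   # arguments arrive as a JSON string
        return handler(**args)             # TypeError if the names do not match
    except (json.JSONDecodeError, TypeError) as exc:
        return {"success": False, "error": f"bad arguments: {exc}"}

print(dispatch("echo", '{"text": "hi"}'))  # {'success': True, 'stdout': 'hi'}
print(dispatch("nope", "{}"))              # {'success': False, 'error': 'unknown tool: nope'}
```

Because the error comes back as an ordinary tool result, the model gets a chance to retry with a valid call on its next turn.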
The loop has a fixed limit of 20 model turns. That prevents a confused model from running forever. In future versions, we may want to prompt the user if they want to continue, or make this value configurable.
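Making the limit configurable takes only a few lines. The following sketch reads it from an environment variable; the name `AGENT_MAX_TURNS` and the `turn_limit` helper are assumptions made for this example and are not used by the script in this article.

```python
import os

def turn_limit(env=os.environ, default=20):
    """Read the turn limit from AGENT_MAX_TURNS (hypothetical name)."""
    raw = env.get("AGENT_MAX_TURNS", "")
    try:
        value = int(raw)
    except ValueError:
        return default          # unset or not a number: keep the default
    return value if value > 0 else default

if __name__ == "__main__":
    print(f"The loop would run for at most {turn_limit()} turns")
```

The loop would then iterate over `range(turn_limit())` instead of `range(20)`.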
Finish It Up
The final block of code makes the script executable. It checks for the user's goal in the command-line arguments and passes it to the run_agent function to initiate the interaction loop.
if __name__ == "__main__":
    if len(sys.argv) < 2:
        sys.exit(f"Usage: {sys.argv[0]} <goal>")
    run_agent(" ".join(sys.argv[1:]))
The Complete Program
Putting the pieces together gives the whole agent in under 100 lines. The logic is minimal, but it implements the core mechanics of an autonomous agent: a goal, a model, a tool, and a loop that feeds results back.
Save the following as agent.py:
#!/usr/bin/env python3
"""
You are a software engineering agent.
Your objective is to complete the user's requested task, not merely describe how to complete it.
You have access to a shell tool that can inspect files, create files, run programs, and verify results.
General behavior:
- Be proactive.
- Gather information before making assumptions.
- Verify your work whenever possible.
- Prefer evidence from tool output over speculation.
- If you encounter an error, investigate it and try to fix it.
- Continue until the task is complete or you are genuinely blocked.
Communication:
- Before a new investigation, implementation, or testing phase, briefly explain what you intend to do.
- Group related actions together. Do not narrate every single command.
- Keep updates concise.
- When complete, summarize what you accomplished.
Coding workflow:
- Understand the task.
- Inspect relevant files when needed.
- Make the smallest reasonable change that solves the problem.
- Run the code.
- Run tests when available.
- Fix failures and verify again.
Safety:
- Do not intentionally destroy user data.
- Do not run dangerous system commands unless explicitly requested.
"""
import os, sys, json, subprocess, openai

MODEL = os.getenv("OPENAI_MODEL", "qwen3.6:35b-a3b")
client = openai.OpenAI(
    base_url=os.getenv("OPENAI_BASE_URL", "http://localhost:11444/v1"),
    api_key=os.getenv("OPENAI_API_KEY", "dummy"),
)

def shell_tool(command):
    """Execute a shell command and return a JSON object with success, stdout, stderr, exitcode."""
    print(f"\n$ {command}")
    result = subprocess.run(
        command, shell=True, text=True, capture_output=True)
    return {"success": result.returncode == 0,
            "stdout": result.stdout,
            "stderr": result.stderr,
            "exitcode": result.returncode }

TOOLS = [{
    "type": "function",
    "function": {
        "name": "shell",
        "description": shell_tool.__doc__,
        "parameters": {
            "type": "object",
            "properties": { "command": {"type": "string"} },
            "required": ["command"]}}}]

def run_agent(goal):
    messages = [{"role": "system", "content": __doc__},
                {"role": "user", "content": goal}]
    for _ in range(20):
        response = client.chat.completions.create(
            model=MODEL, messages=messages, tools=TOOLS)
        msg = response.choices[0].message
        if msg.content:
            print("\n" + msg.content)
        messages.append(msg)
        if not msg.tool_calls:
            return
        for call in msg.tool_calls:
            args = json.loads(call.function.arguments)
            output = {"shell": shell_tool}[call.function.name](**args)
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": json.dumps(output),
            })
    print("\nStopped after 20 turns.")

if __name__ == "__main__":
    if len(sys.argv) < 2:
        sys.exit(f"Usage: {sys.argv[0]} <goal>")
    run_agent(" ".join(sys.argv[1:]))
That is the entire agent: under 100 lines.
Installing the Agent as a CLI Command
Make the script executable and copy it somewhere on your path:
chmod +x agent.py
mkdir -p ~/bin
cp agent.py ~/bin/agent
export PATH=~/bin:$PATH
Put the PATH line in ~/.bashrc, ~/.zshrc, or the shell startup file you use.
Now you can run:
agent inspect this project and tell me how to run the tests
The script joins all command-line arguments back into one goal, so simple requests do not need quotes.
Use quotes when the request contains characters your shell might interpret, such as ;, *, $, &, |, parentheses, or redirects:
agent "find every use of subprocess.run(..., shell=True) and explain the risk"
Other examples:
agent create a small Python script that prints the current weather from wttr.in
agent find the bug in this repository and fix it
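The quoting rule is easier to see by looking at what the script receives. The sketch below uses `shlex.split` from the standard library as an approximation of how a POSIX shell splits a command line into arguments; it does not model globbing or operators such as ; and |, which a real shell would act on when left unquoted. The `goal_from_argv` helper is a hypothetical name that mirrors the join performed in the script.

```python
import shlex

def goal_from_argv(argv):
    """Mirror the script: join everything after the program name."""
    return " ".join(argv[1:])

# Unquoted: the shell passes each word as a separate argument.
unquoted = shlex.split("agent find the bug and fix it")
print(len(unquoted))              # 7
print(goal_from_argv(unquoted))   # find the bug and fix it

# Quoted: the whole request arrives as one argument, special characters intact.
quoted = shlex.split('agent "list *.py files; then count them"')
print(len(quoted))                # 2
print(goal_from_argv(quoted))     # list *.py files; then count them
```

Either way the agent sees a single goal string; the quotes only decide whether your own shell interprets the special characters first.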
Where This Version Stops
This small agent is a starting point. It demonstrates the essential structure, but it deliberately leaves several production problems unsolved.
The first problem is sandboxing. A raw shell tool is useful for learning, but a serious agent should run commands inside a restricted workspace, container, VM, or remote execution environment.
The second problem is tool design. A shell is universal, but it is not precise. Dedicated tools for reading files, editing files, searching code, running tests, and requesting approval are easier to validate and safer to expose.
The third problem is control. Some actions can run automatically. Others should require explicit human approval. Deleting files, installing packages, pushing code, or spending money should not be treated like reading a directory listing.
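As a rough illustration of the control problem, an approval gate can sit between the model's request and the shell. In the sketch below, the pattern list `RISKY_PATTERNS` and the helpers `needs_approval` and `approved` are hypothetical and deliberately simplistic. Pattern matching on command text is easy to bypass, so this shows the shape of the idea rather than a security boundary.

```python
import re

# Assumed examples of commands that should not run unattended.
RISKY_PATTERNS = [
    r"\brm\b",
    r"\bsudo\b",
    r"\bgit\s+push\b",
    r"\bpip\s+install\b",
]

def needs_approval(command):
    """Return True if the command matches any risky pattern."""
    return any(re.search(pattern, command) for pattern in RISKY_PATTERNS)

def approved(command, ask=input):
    """Allow harmless commands; ask a human before running risky ones."""
    if not needs_approval(command):
        return True
    answer = ask(f"Run `{command}`? [y/N] ")
    return answer.strip().lower() == "y"

print(needs_approval("ls -la"))        # False
print(needs_approval("rm -rf build"))  # True
```

Passing `ask` as a parameter keeps the gate testable and leaves room to replace the terminal prompt with another approval channel later.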
The fourth problem is state. This program keeps all context in the chat history, so every session is ephemeral. A longer-running agent needs a more durable record of tasks, decisions, attempts, and changes.
The fifth problem is observability. Once an agent does real work, you need logs or traces that show prompts, tool calls, command output, errors, and final results.
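A simple starting point for observability is an append-only log with one JSON object per line. The sketch below writes to an in-memory buffer so it runs anywhere; in practice the stream would be an open file. The `log_event` helper and its field names are assumptions made for this example.

```python
import io
import json
import time

def log_event(stream, kind, **fields):
    """Append one JSON record per line to the given text stream."""
    record = {"ts": time.time(), "kind": kind, **fields}
    stream.write(json.dumps(record) + "\n")
    stream.flush()
    return record

# An in-memory buffer stands in for a log file here.
buffer = io.StringIO()
log_event(buffer, "tool_call", command="ls")
log_event(buffer, "tool_result", exitcode=0)

for line in buffer.getvalue().splitlines():
    print(json.loads(line)["kind"])
```

One record per line keeps the log easy to tail, filter, and replay, since each line can be parsed on its own.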
Those improvements matter, but they do not change the core loop. The model receives messages, requests actions, gets results, and continues. Starting with the small version makes the larger system easier to understand.
Vast.ai gives you the GPU infrastructure to run capable models locally, iterate fast, and keep costs predictable as your agent grows in complexity. Spin up an instance, swap in a different model, and see how far 89 lines can take you.
While this serves as an effective foundation, there is much more to explore. In future articles, we will refine this architecture by introducing robust sandboxing, persistent memory, and sophisticated tool sets to turn this minimal script into a production-grade system. Along the way, you'll see the nuts and bolts of using remote GPUs in your daily workflow, and you may find ideas for integrating other Vast.ai images to expand your repertoire of AI skills.



