April 4, 2025 | AI | Document Analysis | Mistral
Analyzing lengthy documents like SEC filings has traditionally required complex chunking strategies and careful management of context. However, with Mistral AI's Mistral-Small-3.1-24B-Instruct-2503 model and its 128k token context window, we can now process entire documents at once. This post demonstrates how to leverage this capability using Vast.ai's cloud GPU platform.
The Mistral Small 3.1 model represents a significant advancement in open-source language models. With 24 billion parameters, it combines state-of-the-art vision capabilities with an extended context window that makes it particularly suited for document analysis tasks.
Vast.ai's marketplace provides the GPU configurations needed for optimal performance at cost-effective prices.
In this post, we will:
- Deploy Mistral-Small-3.1-24B-Instruct-2503 on a Vast.ai GPU instance using vLLM's OpenAI-compatible server
- Download ten years of Macy's 10-K filings from the SEC's EDGAR database
- Analyze each filing in its entirety, using the 128k token context window to track Macy's store count over time
This setup demonstrates how Mistral Small 3.1 with extended context windows can transform document analysis workflows, making it possible to process and understand complex documents in their entirety.
Let's start by getting our environment set up. First, we need to install the Vast AI client and configure our access.
Head over to the Vast.ai Account Page to grab your API key. Once you have it, we'll use it to set up the client:
# In an environment of your choice
pip install --upgrade vastai
With the client installed, we can configure our API key:
# Here we will set our API key
export VAST_API_KEY="VAST_API_KEY" # Your key here
vastai set api-key $VAST_API_KEY
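To confirm the key is registered, you can ask the client for your account details. This is a quick sanity check; assuming authentication succeeded, vastai show user prints your account information.
# Should print your account details if the API key was accepted
vastai show user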
Now we are going to search for GPUs on Vast.ai to run the Mistral-Small-3.1-24B-Instruct-2503 model. This model requires specific hardware capabilities to run efficiently with vLLM's optimizations. Here are our requirements:
A minimum of 60GB GPU RAM to accommodate the model weights and the KV cache that backs the long context window (see the rough estimate after this list)
A single GPU configuration, as Mistral-Small-3.1-24B-Instruct-2503 can be efficiently served on one GPU. Note: Multi-GPU configurations are supported if higher throughput is needed.
A static IP address for a stable endpoint that we can send requests to
At least one direct port that we can forward for exposing the vLLM server's OpenAI-compatible API
At least 80GB of disk space to hold the model and other dependencies
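To see why 60GB is a reasonable floor, here is a rough back-of-the-envelope estimate. This is a sketch under simplifying assumptions (bf16 weights, no quantization); actual usage depends on vLLM's allocator and batch size.
# Rough GPU memory estimate (approximation, not exact numbers)
params_billion = 24          # model size
bytes_per_param = 2          # bf16 weights
weights_gb = params_billion * bytes_per_param  # ~48 GB for weights

# Whatever remains after the weights is what vLLM can hand to the KV cache,
# which is what ultimately limits how much of the 128k window is usable.
gpu_ram_gb = 60
kv_cache_budget_gb = gpu_ram_gb * 0.95 - weights_gb  # ~9 GB at 95% utilization

print(f"Weights: ~{weights_gb} GB, KV cache budget: ~{kv_cache_budget_gb:.0f} GB")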
We'll use vastai search offers to find the right instance.
vastai search offers "compute_cap >= 750 \
gpu_ram >= 60 \
num_gpus = 1 \
static_ip = true \
direct_port_count >= 1 \
verified = true \
disk_space >= 80 \
rentable = true"
Select a machine from the output above and copy its ID into the INSTANCE_ID variable below.
We will use vastai create instance to create an instance that:
- Uses the vllm/vllm-openai:latest docker image. This gives us an OpenAI-compatible server.
- Forwards port 8000 to the outside of the container, which is the default OpenAI server port.
- Uses --model mistralai/Mistral-Small-3.1-24B-Instruct-2503 to serve the Mistral model.
- Uses --tokenizer_mode mistral to use the Mistral tokenizer.
- Uses --config_format mistral for the correct model config.
- Uses --load_format mistral to properly load the model weights.
- Uses --gpu-memory-utilization 0.95 to maximize the context window.

Note: Ensure that you fill in your Hugging Face token for HUGGING_FACE_HUB_TOKEN to access the model. You'll need to accept the terms on the model's Hugging Face page: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503.
export INSTANCE_ID= # insert your instance ID
vastai create instance $INSTANCE_ID --image vllm/vllm-openai:latest --env '-p 8000:8000 -e HUGGING_FACE_HUB_TOKEN=HUGGING_FACE_HUB_TOKEN' --disk 80 --args --model mistralai/Mistral-Small-3.1-24B-Instruct-2503 --tokenizer_mode mistral --config_format mistral --load_format mistral --gpu-memory-utilization 0.95
Now, we need to verify that our setup is working. We first need to wait for our machine to download the image and the model and start serving. This will take a few minutes. The logs will show you when it's done.
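If you'd rather not watch the web console, the client can fetch the same logs directly. Assuming the INSTANCE_ID exported above, the logs command shows the container output; wait until vLLM reports that the API server has started.
# Fetch the container logs; repeat until the OpenAI server reports it is up
vastai logs $INSTANCE_ID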
Next, go to the Instances tab in the Vast AI Console and find the instance you just created.
At the top of the instance, there is a button with an IP address in it. Click this, and a panel will appear showing the IP address and the forwarded ports. You should see something like:
Open Ports
XX.XX.XXX.XX:YYYY -> 8000/tcp
Copy and paste the IP address (XX.XX.XXX.XX) into VAST_IP_ADDRESS and the port (YYYY) into VAST_PORT as inputs to the curl command below.
This curl command sends an OpenAI-compatible request to your vLLM server. You should see a response if everything is set up correctly.
Note: It may take a few minutes for the OpenAI server to initialize.
export VAST_IP_ADDRESS="VAST_IP_ADDRESS"
export VAST_PORT="VAST_PORT"
curl -X POST http://$VAST_IP_ADDRESS:$VAST_PORT/v1/completions -H "Content-Type: application/json" -d '{"model" : "mistralai/Mistral-Small-3.1-24B-Instruct-2503", "prompt": "Hello, how are you?", "max_tokens": 50}'
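You can also hit the server's models endpoint, which is part of the OpenAI-compatible API that vLLM exposes, to confirm the model has loaded:
# List the models the server is currently serving
curl http://$VAST_IP_ADDRESS:$VAST_PORT/v1/models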
First, we'll install the OpenAI SDK to communicate with our model on the Vast server.
pip install --upgrade openai
Next, let's configure our model using the IP address and port from the Vast.ai Console. We'll create a simple function to interact with the model, using Mistral's recommended temperature of 0.15.
from openai import OpenAI

# Your Vast.ai instance details
VAST_IP_ADDRESS = "VAST_IP_ADDRESS"
VAST_PORT = "VAST_PORT"

# Initialize the client with your server URL
client = OpenAI(
    api_key="EMPTY",  # vLLM doesn't require an actual API key
    base_url=f"http://{VAST_IP_ADDRESS}:{VAST_PORT}/v1"
)

# Simple function to chat with the model
def chat_with_model(prompt):
    response = client.chat.completions.create(
        model="mistralai/Mistral-Small-3.1-24B-Instruct-2503",
        messages=[
            {"role": "user", "content": prompt}
        ],
        temperature=0.15
    )
    return response.choices[0].message.content
Before we go on to long document analysis, let's test our function to make sure everything is working correctly.
response = chat_with_model("what is the capital of France?")
print(response)
The retail landscape has undergone significant changes, with major department stores closing across the US. Using our model's long context window, we'll analyze Macy's annual reports (10-K filings) to track their store count over the years.
First, we need to download our data. We've gathered a list of Macy's annual report URLs from the SEC's EDGAR database (https://www.sec.gov/edgar/search/). Let's create a macys_filings list to store these URLs:
import requests
from bs4 import BeautifulSoup
import re
# List of Macy's 10-K filing URLs
macys_filings = [
"https://www.sec.gov/Archives/edgar/data/794367/000162828024012734/m-20240203.htm",
"https://www.sec.gov/Archives/edgar/data/794367/000162828023009154/m-20230128.htm",
"https://www.sec.gov/Archives/edgar/data/794367/000156459022011726/m-10k_20220129.htm",
"https://www.sec.gov/Archives/edgar/data/794367/000156459021016119/m-10k_20210130.htm",
"https://www.sec.gov/Archives/edgar/data/794367/000079436720000040/m-0201202010xk.htm",
"https://www.sec.gov/Archives/edgar/data/794367/000079436719000038/m-0202201910xk.htm",
"https://www.sec.gov/Archives/edgar/data/794367/000079436718000036/m-0203201810k.htm",
"https://www.sec.gov/Archives/edgar/data/794367/000079436717000041/m-0128201710k.htm",
"https://www.sec.gov/Archives/edgar/data/794367/000079436716000221/m-0130201610k.htm",
"https://www.sec.gov/Archives/edgar/data/794367/000079436715000073/m-0131201510k.htm"
]
Next, we'll create a function that downloads and cleans the text from each filing, removing HTML formatting and other unnecessary elements:
# Simple SEC headers
headers = {
    "User-Agent": "Your Name (your.email@example.com)",
    "Accept-Encoding": "gzip, deflate",
    "Host": "www.sec.gov"
}

def download_filing_text(url):
    """Download a filing and extract its text."""
    try:
        # Download the filing
        response = requests.get(url, headers=headers, timeout=30)
        response.raise_for_status()

        # Parse the HTML and extract text
        soup = BeautifulSoup(response.text, 'html.parser')

        # Remove script and style elements
        for script in soup(["script", "style"]):
            script.decompose()

        # Get all text with minimal processing
        text = soup.get_text(separator=' ')

        # Basic cleaning - collapse whitespace
        text = re.sub(r'\s+', ' ', text).strip()

        return text
    except Exception as e:
        print(f"Error downloading filing: {str(e)}")
        return ""
Now that we have our download function ready, let's use it to process all ten years of Macy's reports. We'll loop through each URL, download the content, and store the cleaned text:
# Process all filings
filings_text = []

for i, url in enumerate(macys_filings):
    # Download and process filing
    print(f"Downloading {url} filing...")
    text = download_filing_text(url)

    if text:  # If we got text back (not empty string)
        print(f"✓ Successfully processed filing ({len(text)} characters)")
        filings_text.append(text)
    else:
        print(f"✗ Failed to process filing")

# Print summary
print("\n=== PROCESSING SUMMARY ===")
print(f"Successfully processed: {len(filings_text)}/{len(macys_filings)} filings")
Now comes the exciting part - analyzing the filings to track Macy's retail footprint over time. We'll use our model's long context window to examine each document in its entirety, asking a simple but powerful question: "How many stores did Macy's operate this fiscal year?" This approach demonstrates the advantage of processing complete documents at once - we don't need to worry about missing context or searching through different sections manually.
For each filing, we'll store the model's response in an all_responses list, building a comprehensive picture of Macy's store count evolution:
all_responses = []

for i, text in enumerate(filings_text):
    print(f"\nProcessing file {i+1}/{len(filings_text)}")
    try:
        prompt = f"How many stores did Macy's operate this fiscal year? DATA: {text}"
        response = chat_with_model(prompt)
        all_responses.append(f"File {i+1} analysis: {response}")
        print(f"✓ File {i+1} processed successfully")
        print(response)
    except Exception as e:
        print(f"✗ Error processing file {i+1}: {str(e)}")
        all_responses.append(f"File {i+1} error: {str(e)}")

# Print all responses
print("\n=== ANALYSIS OF ALL FILES ===")
for response in all_responses:
    print(f"\n{response}\n")
The model's responses show detailed information extraction - accurately identifying store counts while providing precise dates (e.g., "February 3, 2024"), section references (e.g., "Item 1. Business. General"), and geographical context (e.g., "43 states, the District of Columbia, Puerto Rico and Guam") from each filing:
=== ANALYSIS OF ALL FILES ===
File 1 analysis: As of February 3, 2024, Macy's operated 718 store locations.
File 2 analysis: As of January 28, 2023, Macy's operated 722 store locations. This information is explicitly stated in the document under the section "Item 1. Business. General" where it mentions:
"As of January 28, 2023, the Company operated 722 store locations in 43 states, the District of Columbia, Puerto Rico and Guam."
File 3 analysis: As of January 29, 2022, Macy's operated 725 store locations. This information is explicitly stated in the "Item 2. Properties" section of the 10-K filing:
"As of January 29, 2022, the operations of the Company included 725 store locations in 43 states, the District of Columbia, Puerto Rico and Guam, comprising a total of approximately 112 million square feet."
File 4 analysis: Macy's, Inc. operated 727 store locations in 43 states, the District of Columbia, Puerto Rico, and Guam as of January 30, 2021.
File 5 analysis: As of February 1, 2020, Macy's operated 775 store locations in 43 states, the District of Columbia, Puerto Rico, and Guam.
File 6 analysis: As of February 2, 2019, Macy's operated a total of **867 stores**. This information is detailed in the 10-K filing under the section discussing the company's properties and store count activity.
File 7 analysis: As of February 3, 2018, Macy's operated approximately **852 stores** in 44 states, the District of Columbia, Puerto Rico, and Guam.
File 8 analysis: As of January 28, 2017, Macy's operated a total of **829 stores**. This number includes stores across various brands such as Macy's, Bloomingdale's, and Bluemercury, located in 45 states, the District of Columbia, Guam, and Puerto Rico.
File 9 analysis: As of January 30, 2016, Macy's operated a total of **870 stores**. This included Macy's, Macy's Backstage, Bloomingdale's, Bloomingdale's Outlet, and Bluemercury stores across 45 states, the District of Columbia, Guam, and Puerto Rico.
File 10 analysis: As of January 31, 2015, Macy's operated a total of 823 stores.
Now that we have the store counts for each year, we will have Mistral summarize the results for us.
prompt = "Give me a list of macys store numbers from 2015-2024 DATA: " + " ".join(all_responses)
response = chat_with_model(prompt)
print(response)
We now have a concise summary of the number of stores that Macy's operated in each year.
Based on the provided data, here is the list of Macy's store numbers from 2015 to 2024:
- **2015:** 823 stores
- **2016:** 870 stores
- **2017:** 829 stores
- **2018:** 852 stores
- **2019:** 867 stores
- **2020:** 775 stores
- **2021:** 727 stores
- **2022:** 725 stores
- **2023:** 722 stores
- **2024:** 718 stores
This list reflects the number of store locations Macy's operated as of the specified dates each year.
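With the summary in hand, a quick chart makes the decline easy to see. This sketch simply plots the counts the model extracted above; it assumes matplotlib is installed.
import matplotlib.pyplot as plt

# Store counts as extracted by the model above (fiscal year -> store count)
years = list(range(2015, 2025))
stores = [823, 870, 829, 852, 867, 775, 727, 725, 722, 718]

plt.plot(years, stores, marker="o")
plt.title("Macy's store count by fiscal year (from 10-K filings)")
plt.xlabel("Fiscal year")
plt.ylabel("Store locations")
plt.show()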
In this post, we demonstrated how Mistral Small 3.1's 128k token context window transforms SEC filing analysis. We:
- Deployed an OpenAI-compatible vLLM server for the model on a single Vast.ai GPU
- Downloaded ten years of Macy's 10-K filings and stripped them down to plain text
- Analyzed each filing in its entirety, with no chunking, to extract store counts
- Summarized a decade of results into a single, clean list
With this foundation in place, we can tackle more sophisticated document analysis tasks. While tracking store counts offers a simple demonstration, the real value lies in analyzing complex relationships across entire documents. Financial analysts can, for example:
- Compare risk factors and management's discussion across multiple years of filings
- Trace how specific line items or strategic initiatives evolve over time
- Cross-reference statements in one section of a filing against details buried in another
This capability extends beyond financial analysis. The same approach works for legal contracts, academic papers, technical documentation, and any other domain where processing long documents as complete units provides better insights.
By combining Mistral's context window capabilities with Vast.ai's accessible GPU infrastructure, comprehensive document analysis becomes more practical and accessible for a wider range of users and use cases.