April 4, 2025 | AI | Document Analysis | Mistral
Analyzing lengthy documents like SEC filings has traditionally required complex chunking strategies and careful management of context. However, with Mistral AI's Mistral-Small-3.1-24B-Instruct-2503 model and its 128k token context window, we can now process entire documents at once. This post demonstrates how to leverage this capability using Vast.ai's cloud GPU platform.
The Mistral Small 3.1 model represents a significant advancement in open-source language models. With 24 billion parameters, it combines state-of-the-art vision capabilities with an extended context window that makes it particularly suited for document analysis tasks.
Vast.ai's marketplace provides the GPU configurations needed for optimal performance at cost-effective prices.
In this post, we will:
- Deploy Mistral-Small-3.1-24B-Instruct-2503 on a Vast.ai GPU instance using vLLM's OpenAI-compatible server
- Download ten years of Macy's 10-K filings from the SEC's EDGAR database
- Analyze each filing in its entirety, using the 128k token context window to track Macy's store count over time
This setup demonstrates how Mistral Small 3.1 with extended context windows can transform document analysis workflows, making it possible to process and understand complex documents in their entirety.
Let's start by getting our environment set up. First, we need to install the Vast AI client and configure our access.
Head over to the Vast.ai Account Page to grab your API key. Once you have it, we'll use it to set up the client:
# In an environment of your choice
pip install --upgrade vastai
With the client installed, we can configure our API key:
# Here we will set our API key
export VAST_API_KEY="VAST_API_KEY" # Your key here
vastai set api-key $VAST_API_KEY
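To confirm the key is registered, you can ask the client for your account details. This is a quick sanity check; assuming authentication succeeded, vastai show user prints your account information.
# Should print your account details if the API key was accepted
vastai show user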
Now we are going to search for GPUs on Vast.ai to run the Mistral-Small-3.1-24B-Instruct-2503 model. This model requires specific hardware capabilities to run efficiently with vLLM's optimizations. Here are our requirements:
A minimum of 60GB GPU RAM to accommodate the model weights and the KV cache that backs the long context window (see the rough estimate after this list)
A single GPU configuration, as Mistral-Small-3.1-24B-Instruct-2503 can be efficiently served on one GPU. Note: Multi-GPU configurations are supported if higher throughput is needed.
A static IP address for a stable endpoint that we can send requests to
At least one direct port that we can forward for exposing the vLLM server's OpenAI-compatible API
At least 80GB of disk space to hold the model and other dependencies
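To see why 60GB is a reasonable floor, here is a rough back-of-the-envelope estimate. This is a sketch under simplifying assumptions (bf16 weights, no quantization); actual usage depends on vLLM's allocator and batch size.
# Rough GPU memory estimate (approximation, not exact numbers)
params_billion = 24          # model size
bytes_per_param = 2          # bf16 weights
weights_gb = params_billion * bytes_per_param  # ~48 GB for weights

# Whatever remains after the weights is what vLLM can hand to the KV cache,
# which is what ultimately limits how much of the 128k window is usable.
gpu_ram_gb = 60
kv_cache_budget_gb = gpu_ram_gb * 0.95 - weights_gb  # ~9 GB at 95% utilization

print(f"Weights: ~{weights_gb} GB, KV cache budget: ~{kv_cache_budget_gb:.0f} GB")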
We'll use vastai search offers to find the right instance.
vastai search offers "compute_cap >= 750 \
gpu_ram >= 60 \
num_gpus = 1 \
static_ip = true \
direct_port_count >= 1 \
verified = true \
disk_space >= 80 \
rentable = true"
Select a machine from the output above and copy its ID into the INSTANCE_ID variable below.
We will use vastai create instance to create an instance that:
- Uses the vllm/vllm-openai:latest docker image. This gives us an OpenAI-compatible server.
- Forwards port 8000 to the outside of the container, which is the default OpenAI server port.
- Uses --model mistralai/Mistral-Small-3.1-24B-Instruct-2503 to serve the Mistral model.
- Uses --tokenizer_mode mistral to use the Mistral tokenizer.
- Uses --config_format mistral for the correct model config.
- Uses --load_format mistral to properly load the model weights.
- Uses --gpu-memory-utilization 0.95 to maximize the context window.

Note: Ensure that you fill in your Hugging Face token for HUGGING_FACE_HUB_TOKEN to access the model. You'll need to accept the terms on the model's Hugging Face page: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503.
export INSTANCE_ID= # insert your instance ID
vastai create instance $INSTANCE_ID --image vllm/vllm-openai:latest --env '-p 8000:8000 -e HUGGING_FACE_HUB_TOKEN=HUGGING_FACE_HUB_TOKEN' --disk 80 --args --model mistralai/Mistral-Small-3.1-24B-Instruct-2503 --tokenizer_mode mistral --config_format mistral --load_format mistral --gpu-memory-utilization 0.95
Now, we need to verify that our setup is working. We first need to wait for our machine to download the image and the model and start serving. This will take a few minutes. The logs will show you when it's done.
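If you'd rather not watch the web console, the client can fetch the same logs directly. Assuming the INSTANCE_ID exported above, the logs command shows the container output; wait until vLLM reports that the API server has started.
# Fetch the container logs; repeat until the OpenAI server reports it is up
vastai logs $INSTANCE_ID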
Next, go to the Instances tab in the Vast AI Console and find the instance you just created.
At the top of the instance, there is a button with an IP address in it. Click this, and a panel will appear showing the IP address and the forwarded ports. You should see something like:
Open Ports
XX.XX.XXX.XX:YYYY -> 8000/tcp
Copy and paste the IP address (XX.XX.XXX.XX) into VAST_IP_ADDRESS and the port (YYYY) into VAST_PORT as inputs to the curl command below.
This curl command sends an OpenAI-compatible request to your vLLM server. You should see a response if everything is set up correctly.
Note: It may take a few minutes for the OpenAI server to initialize.
export VAST_IP_ADDRESS="VAST_IP_ADDRESS"
export VAST_PORT="VAST_PORT"
curl -X POST http://$VAST_IP_ADDRESS:$VAST_PORT/v1/completions -H "Content-Type: application/json" -d '{"model" : "mistralai/Mistral-Small-3.1-24B-Instruct-2503", "prompt": "Hello, how are you?", "max_tokens": 50}'
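You can also hit the server's models endpoint, which is part of the OpenAI-compatible API that vLLM exposes, to confirm the model has loaded:
# List the models the server is currently serving
curl http://$VAST_IP_ADDRESS:$VAST_PORT/v1/models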
First, we'll install the OpenAI SDK to communicate with our model on the Vast server.
pip install --upgrade openai
Next, let's configure our model using the IP address and port from the Vast.ai Console. We'll create a simple function to interact with the model, using Mistral's recommended temperature of 0.15.
from openai import OpenAI

# Your Vast.ai instance details
VAST_IP_ADDRESS = "VAST_IP_ADDRESS"
VAST_PORT = "VAST_PORT"

# Initialize the client with your server URL
client = OpenAI(
    api_key="EMPTY",  # vLLM doesn't require an actual API key
    base_url=f"http://{VAST_IP_ADDRESS}:{VAST_PORT}/v1"
)

# Simple function to chat with the model
def chat_with_model(prompt):
    response = client.chat.completions.create(
        model="mistralai/Mistral-Small-3.1-24B-Instruct-2503",
        messages=[
            {"role": "user", "content": prompt}
        ],
        temperature=0.15
    )
    return response.choices[0].message.content
Before we go on to long document analysis, let's test our function to make sure everything is working correctly.
response = chat_with_model("what is the capital of France?")
print(response)
The retail landscape has undergone significant changes, with major department stores closing across the US. Using our model's long context window, we'll analyze Macy's annual reports (10-K filings) to track their store count over the years.
First, we need to download our data. We've gathered a list of Macy's annual report URLs from the SEC's EDGAR database (https://www.sec.gov/edgar/search/). Let's create a macys_filings list to store these URLs:
import requests
from bs4 import BeautifulSoup
import re
# List of Macy's 10-K filing URLs
macys_filings = [
"https://www.sec.gov/Archives/edgar/data/794367/000162828024012734/m-20240203.htm",
"https://www.sec.gov/Archives/edgar/data/794367/000162828023009154/m-20230128.htm",
"https://www.sec.gov/Archives/edgar/data/794367/000156459022011726/m-10k_20220129.htm",
"https://www.sec.gov/Archives/edgar/data/794367/000156459021016119/m-10k_20210130.htm",
"https://www.sec.gov/Archives/edgar/data/794367/000079436720000040/m-0201202010xk.htm",
"https://www.sec.gov/Archives/edgar/data/794367/000079436719000038/m-0202201910xk.htm",
"https://www.sec.gov/Archives/edgar/data/794367/000079436718000036/m-0203201810k.htm",
"https://www.sec.gov/Archives/edgar/data/794367/000079436717000041/m-0128201710k.htm",
"https://www.sec.gov/Archives/edgar/data/794367/000079436716000221/m-0130201610k.htm",
"https://www.sec.gov/Archives/edgar/data/794367/000079436715000073/m-0131201510k.htm"
]
Next, we'll create a function that downloads and cleans the text from each filing, removing HTML formatting and other unnecessary elements:
# Simple SEC headers
headers = {
    "User-Agent": "Your Name (your.email@example.com)",
    "Accept-Encoding": "gzip, deflate",
    "Host": "www.sec.gov"
}

def download_filing_text(url):
    """Download a filing and extract its text."""
    try:
        # Download the filing
        response = requests.get(url, headers=headers, timeout=30)
        response.raise_for_status()

        # Parse the HTML and extract text
        soup = BeautifulSoup(response.text, 'html.parser')

        # Remove script and style elements
        for script in soup(["script", "style"]):
            script.decompose()

        # Get all text with minimal processing
        text = soup.get_text(separator=' ')

        # Basic cleaning - collapse whitespace
        text = re.sub(r'\s+', ' ', text).strip()

        return text
    except Exception as e:
        print(f"Error downloading filing: {str(e)}")
        return ""
Now that we have our download function ready, let's use it to process all ten years of Macy's reports. We'll loop through each URL, download the content, and store the cleaned text:
# Process all filings
filings_text = []

for i, url in enumerate(macys_filings):
    # Download and process filing
    print(f"Downloading {url} filing...")
    text = download_filing_text(url)

    if text:  # If we got text back (not empty string)
        print(f"✓ Successfully processed filing ({len(text)} characters)")
        filings_text.append(text)
    else:
        print(f"✗ Failed to process filing")

# Print summary
print("\n=== PROCESSING SUMMARY ===")
print(f"Successfully processed: {len(filings_text)}/{len(macys_filings)} filings")
Now comes the exciting part - analyzing the filings to track Macy's retail footprint over time. We'll use our model's long context window to examine each document in its entirety, asking a simple but powerful question: "How many stores did Macy's operate this fiscal year?" This approach demonstrates the advantage of processing complete documents at once - we don't need to worry about missing context or searching through different sections manually.
For each filing, we'll store the model's response in an all_responses list, building a comprehensive picture of Macy's store count evolution:
all_responses = []

for i, text in enumerate(filings_text):
    print(f"\nProcessing file {i+1}/{len(filings_text)}")
    try:
        prompt = f"How many stores did Macy's operate this fiscal year? DATA: {text}"
        response = chat_with_model(prompt)
        all_responses.append(f"File {i+1} analysis: {response}")
        print(f"✓ File {i+1} processed successfully")
        print(response)
    except Exception as e:
        print(f"✗ Error processing file {i+1}: {str(e)}")
        all_responses.append(f"File {i+1} error: {str(e)}")

# Print all responses
print("\n=== ANALYSIS OF ALL FILES ===")
for response in all_responses:
    print(f"\n{response}\n")
The model's responses show detailed information extraction - accurately identifying store counts while providing precise dates (e.g., "February 3, 2024"), section references (e.g., "Item 1. Business. General"), and geographical context (e.g., "43 states, the District of Columbia, Puerto Rico and Guam") from each filing:
=== ANALYSIS OF ALL FILES ===
File 1 analysis: As of February 3, 2024, Macy's operated 718 store locations.
File 2 analysis: As of January 28, 2023, Macy's operated 722 store locations. This information is explicitly stated in the document under the section "Item 1. Business. General" where it mentions:
"As of January 28, 2023, the Company operated 722 store locations in 43 states, the District of Columbia, Puerto Rico and Guam."
File 3 analysis: As of January 29, 2022, Macy's operated 725 store locations. This information is explicitly stated in the "Item 2. Properties" section of the 10-K filing:
"As of January 29, 2022, the operations of the Company included 725 store locations in 43 states, the District of Columbia, Puerto Rico and Guam, comprising a total of approximately 112 million square feet."
File 4 analysis: Macy's, Inc. operated 727 store locations in 43 states, the District of Columbia, Puerto Rico, and Guam as of January 30, 2021.
File 5 analysis: As of February 1, 2020, Macy's operated 775 store locations in 43 states, the District of Columbia, Puerto Rico, and Guam.
File 6 analysis: As of February 2, 2019, Macy's operated a total of **867 stores**. This information is detailed in the 10-K filing under the section discussing the company's properties and store count activity.
File 7 analysis: As of February 3, 2018, Macy's operated approximately **852 stores** in 44 states, the District of Columbia, Puerto Rico, and Guam.
File 8 analysis: As of January 28, 2017, Macy's operated a total of **829 stores**. This number includes stores across various brands such as Macy's, Bloomingdale's, and Bluemercury, located in 45 states, the District of Columbia, Guam, and Puerto Rico.
File 9 analysis: As of January 30, 2016, Macy's operated a total of **870 stores**. This included Macy's, Macy's Backstage, Bloomingdale's, Bloomingdale's Outlet, and Bluemercury stores across 45 states, the District of Columbia, Guam, and Puerto Rico.
File 10 analysis: As of January 31, 2015, Macy's operated a total of 823 stores.
Now that we have the store counts for each year, we will have Mistral summarize the results for us.
prompt = "Give me a list of macys store numbers from 2015-2024 DATA: " + " ".join(all_responses)
response = chat_with_model(prompt)
print(response)
We now have a concise summary of the number of stores that Macy's operated in each year.
Based on the provided data, here is the list of Macy's store numbers from 2015 to 2024:
- **2015:** 823 stores
- **2016:** 870 stores
- **2017:** 829 stores
- **2018:** 852 stores
- **2019:** 867 stores
- **2020:** 775 stores
- **2021:** 727 stores
- **2022:** 725 stores
- **2023:** 722 stores
- **2024:** 718 stores
This list reflects the number of store locations Macy's operated as of the specified dates each year.
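With the summary in hand, a quick chart makes the decline easy to see. This sketch simply plots the counts the model extracted above; it assumes matplotlib is installed.
import matplotlib.pyplot as plt

# Store counts as extracted by the model above (fiscal year -> store count)
years = list(range(2015, 2025))
stores = [823, 870, 829, 852, 867, 775, 727, 725, 722, 718]

plt.plot(years, stores, marker="o")
plt.title("Macy's store count by fiscal year (from 10-K filings)")
plt.xlabel("Fiscal year")
plt.ylabel("Store locations")
plt.show()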
In this post, we demonstrated how Mistral Small 3.1's 128k token context window transforms SEC filing analysis. We:
- Deployed an OpenAI-compatible vLLM server for the model on a single Vast.ai GPU
- Downloaded ten years of Macy's 10-K filings and stripped them down to plain text
- Analyzed each filing in its entirety, with no chunking, to extract store counts
- Summarized a decade of results into a single, clean list
With this foundation in place, we can tackle more sophisticated document analysis tasks. While tracking store counts offers a simple demonstration, the real value lies in analyzing complex relationships across entire documents. Financial analysts can, for example:
- Compare risk factors and management's discussion across multiple years of filings
- Trace how specific line items or strategic initiatives evolve over time
- Cross-reference statements in one section of a filing against details buried in another
This capability extends beyond financial analysis. The same approach works for legal contracts, academic papers, technical documentation, and any other domain where processing long documents as complete units provides better insights.
By combining Mistral's context window capabilities with Vast.ai's accessible GPU infrastructure, comprehensive document analysis becomes more practical and accessible for a wider range of users and use cases.