DeepSeek OCR is a vision-language model from DeepSeek AI that specializes in optical character recognition and document understanding. Its core innovation, "Contexts Optical Compression", compresses long textual contexts into a compact set of vision tokens, reducing the cost of processing text-heavy documents.
## Key Features
DeepSeek OCR excels at converting documents and images into structured output, with particular strength in markdown conversion and raw text extraction. Inference is configurable through five preset modes (Tiny, Small, Base, Large, and Gundam), each defined by its own `base_size` and `image_size` parameters, so resolution can be traded against speed and memory to match processing requirements.
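The mode presets can be expressed as a small lookup table. This is a minimal sketch: the parameter values mirror those published on the model card, and the helper function is a hypothetical convenience, so verify both against the release you are using.

```python
# Hypothetical helper mapping DeepSeek OCR mode names to the
# base_size / image_size / crop_mode presets listed on the model card.
# Verify these values against the current release before relying on them.
MODES = {
    "tiny":   {"base_size": 512,  "image_size": 512,  "crop_mode": False},
    "small":  {"base_size": 640,  "image_size": 640,  "crop_mode": False},
    "base":   {"base_size": 1024, "image_size": 1024, "crop_mode": False},
    "large":  {"base_size": 1280, "image_size": 1280, "crop_mode": False},
    # "Gundam" tiles the page: a global view plus local crops.
    "gundam": {"base_size": 1024, "image_size": 640,  "crop_mode": True},
}

def mode_kwargs(name: str) -> dict:
    """Return inference kwargs for a named mode (case-insensitive)."""
    try:
        return dict(MODES[name.lower()])
    except KeyError:
        raise ValueError(f"unknown mode {name!r}; choose from {sorted(MODES)}")
```

Returning a copy of the preset lets callers tweak individual parameters without mutating the shared table.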
The model supports grounded document understanding through a dedicated grounding token in the prompt, which helps it preserve context and layout structure during OCR. It also applies a no-repeat n-gram logits processor during generation, suppressing repetitive output in a way that proves especially useful for complex table extraction tasks.
## Architecture
Built on the Transformers framework and distributed in the Safetensors format, DeepSeek OCR uses Flash Attention 2 for optimized performance on NVIDIA GPUs. Custom inference parameters, including `crop_mode`, allow flexible handling of varied document layouts and formats. Integration with vLLM enables accelerated inference with batch processing support for production workloads.
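A minimal loading sketch with Transformers is shown below. The `infer(...)` call and its arguments follow the model's published usage example, but the model id and file paths are assumptions, a CUDA GPU is required for Flash Attention 2, and the remote-code API may change between releases.

```python
def run_ocr(image_file: str, output_path: str = "./out"):
    """Sketch: load DeepSeek OCR via Transformers and OCR one image.

    Requires a CUDA GPU with Flash Attention 2 installed; the infer()
    signature follows the model's usage example and may change.
    """
    # Heavy imports are kept local so this module imports without the deps.
    import torch
    from transformers import AutoModel, AutoTokenizer

    model_id = "deepseek-ai/DeepSeek-OCR"  # assumed model id
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModel.from_pretrained(
        model_id,
        trust_remote_code=True,
        use_safetensors=True,
        _attn_implementation="flash_attention_2",  # NVIDIA GPUs only
    )
    model = model.eval().cuda().to(torch.bfloat16)

    prompt = "<image>\n<|grounding|>Convert the document to markdown."
    return model.infer(
        tokenizer,
        prompt=prompt,
        image_file=image_file,
        output_path=output_path,
        base_size=1024, image_size=1024, crop_mode=False,  # "Base" mode
    )
```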
## Use Cases
DeepSeek OCR is designed for a wide range of document processing applications:
- Document digitization and conversion to markdown format
- Table extraction from complex document layouts
- Multi-page PDF processing and analysis
- Batch OCR operations for production workflows
- Text extraction from images and scanned documents
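For batch workflows like the ones listed above, a driver typically pairs each input image with its own output location before dispatching OCR calls. The helper below is hypothetical: only the directory-walking logic is concrete, and the downstream consumer (e.g. a per-image `infer` call or a vLLM batch request) is assumed.

```python
from pathlib import Path

# Common raster formats; extend as needed for your corpus.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}

def collect_jobs(input_dir: str, output_dir: str) -> list:
    """Pair every image under input_dir with a per-file output folder.

    Hypothetical batching helper: the actual OCR step would consume
    the returned (image_path, out_dir) pairs.
    """
    in_root, out_root = Path(input_dir), Path(output_dir)
    jobs = []
    for path in sorted(in_root.rglob("*")):
        if path.suffix.lower() in IMAGE_EXTS:
            # Mirror the input tree under output_dir, one folder per image.
            out_dir = out_root / path.relative_to(in_root).with_suffix("")
            jobs.append((path, out_dir))
    return jobs
```

Sorting the walk keeps job order deterministic, which simplifies resuming interrupted batch runs.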
## Performance and Adoption
The model has achieved significant adoption in the community, with over 4 million downloads monthly. It is actively deployed in more than 78 community Spaces, demonstrating diverse real-world applications across document understanding tasks.
DeepSeek OCR is published under the MIT license, making it accessible for both commercial and non-commercial use.