Baidu Open-Sources Unlimited-OCR Document Parsing Model With Reference Sliding Window Attention Mechanism
2026-06-23 18:45

Baidu has released Unlimited-OCR, a large model for intelligent document parsing, together with a technical report. Industry speculation links the project's CTO, known as 'YY', to Wei Haoran, a former core author of DeepSeek-OCR. According to data compiled by Woofun AI, Unlimited-OCR scored 93.92% on the OmniDocBench v1.6 long-document parsing benchmark, setting a new end-to-end state-of-the-art (SOTA) record.

In traditional models the key-value cache (KV cache) grows linearly with the length of the generated output, which slows decoding and drives up GPU memory consumption. To mitigate this, Baidu introduced the Reference Sliding Window Attention mechanism (R-SWA). During decoding, the model attends only to the full set of image features plus a fixed window of the most recently generated text (128 tokens by default), so the total KV cache size stays constant. Because all image features remain visible as the window advances, image detail is not blurred during window updates, and inference speed and GPU memory usage stay stable for documents exceeding 40 pages, with a 12.7% speedup over DeepSeek-OCR. Baidu has released the code and weights under the MIT license, with support for Hugging Face Transformers, vLLM, and SGLang, and plans to extend R-SWA to automatic speech recognition (ASR) and translation tasks.
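The cache behavior described above can be illustrated with a toy sketch. This is not Baidu's implementation: the class name ReferenceSlidingWindowCache, the helper full_cache_size, and the figure of 256 image tokens are hypothetical, chosen only to show why the cache size stops growing once the text window is full. Only the 128-token default window comes from the report.

```python
from collections import deque


class ReferenceSlidingWindowCache:
    """Toy stand-in for a KV cache (hypothetical, for illustration only).

    Image entries act as a permanent reference; text entries live in a
    fixed-size window, so the oldest text entry is dropped on overflow.
    """

    def __init__(self, image_tokens, window=128):
        self.reference = list(image_tokens)   # always kept
        self.recent = deque(maxlen=window)    # bounded window of generated text

    def append(self, text_token):
        # deque(maxlen=...) discards the oldest entry automatically when full.
        self.recent.append(text_token)

    def visible(self):
        """Entries the next decoding step would be allowed to attend to."""
        return self.reference + list(self.recent)

    def __len__(self):
        return len(self.reference) + len(self.recent)


def full_cache_size(num_image_tokens, num_generated):
    """Cache size when every generated token is retained."""
    return num_image_tokens + num_generated


# Assumed sizes: 256 image tokens, 1000 generated text tokens.
cache = ReferenceSlidingWindowCache([f"img{i}" for i in range(256)], window=128)
for step in range(1000):
    cache.append(f"txt{step}")

print(len(cache))                  # 384 (256 image + 128 text)
print(full_cache_size(256, 1000))  # 1256
```

Under these assumptions the windowed cache holds 384 entries no matter how many tokens are generated, while a cache that retains every token keeps growing with output length.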
