---
license: apache-2.0
language:
- zh
- en
pipeline_tag: object-detection
tags:
- document-ai
- document-layout-analysis
- patent
- pdf
- hiro
- patsnap
datasets:
- in-house
metrics:
- precision
- recall
- f1
library_name: transformers
---

# Hiro-Layout: Document Layout Analysis for Patent and Technical PDFs

English | [简体中文](README_zh.md)

Hiro-Layout is a document layout analysis model for patent and technical PDF pages. It detects and classifies page regions such as text, titles, headers, footers, tables, formulas, chemical structures, figures, captions, search reports, bibliographies, and other patent-specific layout elements.

## Highlights

- Patent-focused layout understanding: covers common patent PDF regions and patent-specific structures.
- Technical document coverage: evaluated on both patent PDFs and NPD PDFs.
- Fine-grained taxonomy: 25 layout categories across figure, text, and complex document elements.

## Model Overview

| Item | Details |
| --- | --- |
| Model name | Hiro-Layout |
| Current artifact | `layout_model/RT-DETR_25.onnx` |
| Task | Document layout analysis / page region detection |
| Input | Rendered PDF page image |
| Output | Layout regions with class labels |
| Domains | Patent PDFs, technical/NPD PDFs |
| License | Apache-2.0 |

## Layout Taxonomy

| Group | Class | Abbr. | Chinese |
| --- | --- | --- | --- |
| figure | graph | graph | 图表 |
| figure | drawing | draw | 绘制图 |
| figure | structure diagram | struc | 结构图 |
| figure | photograph | photo | 照片 |
| figure | table | tab | 表格 |
| figure | math equation | eqn | 数学公式 |
| figure | chemical formula | chem | 化学式 |
| figure | noise | noise | 噪声 |
| text | text | text | 文本 |
| text | title | title | 标题 |
| text | section title | sec | 章节标题 |
| text | page header | head | 页眉 |
| text | page footer | foot | 页脚 |
| text | marginal note | mnote | 边注 |
| text | caption | cap | 说明 |
| text | figure number | figno | 编号 |
| text | line number | lineno | 行号 |
| text | column number | colno | 栏号 |
| text | sequence | seq | 序列表 |
| complex | figure complex | figcx | 图片组 |
| complex | chemical reaction | rxn | 反应式 |
| complex | bibliography | bib | 著录页 |
| complex | search report | srep | 搜索报告 |
| complex | Table of Contents | toc | 目录 |
| complex | reference | ref | 参考文献 |

## Benchmarks

Metrics are reported as Precision, Recall, and F1.

| Benchmark | Labels | Precision | Recall | F1 |
| --- | ---: | ---: | ---: | ---: |
| Patent PDF | 33,054 | 0.8144 | 0.7711 | 0.7922 |
| NPD PDF | 17,769 | 0.7090 | 0.6983 | 0.7036 |

### Patent PDF

| # | Group | Abbr. | Class | Chinese | Labels | Precision | Recall | F1 |
|---:|---|---|---|---|---:|---:|---:|---:|
| 1 | figure | graph | graph | 图表 | 215 | 0.7611 | 0.8000 | 0.7800 |
| 2 | figure | draw | drawing | 绘制图 | 420 | 0.8649 | 0.3048 | 0.4507 |
| 3 | figure | struc | structure diagram | 结构图 | 626 | 0.6579 | 0.8355 | 0.7361 |
| 4 | figure | photo | photograph | 照片 | 147 | 0.8378 | 0.8435 | 0.8407 |
| 5 | figure | tab | table | 表格 | 198 | 0.7759 | 0.9091 | 0.8372 |
| 6 | figure | eqn | math equation | 数学公式 | 399 | 0.7762 | 0.6692 | 0.7187 |
| 7 | figure | chem | chemical formula | 化学式 | 1,099 | 0.8792 | 0.8944 | 0.8868 |
| 8 | figure | noise | noise | 噪声 | 1,241 | 0.7025 | 0.7687 | 0.7341 |
| 9 | text | text | text | 文本 | 17,668 | 0.8182 | 0.8062 | 0.8122 |
| 10 | text | title | title | 标题 | 601 | 0.9117 | 0.8070 | 0.8561 |
| 11 | text | sec | section title | 章节标题 | 1,394 | 0.7968 | 0.7088 | 0.7502 |
| 12 | text | head | page header | 页眉 | 3,074 | 0.8187 | 0.7788 | 0.7983 |
| 13 | text | foot | page footer | 页脚 | 1,012 | 0.7432 | 0.6433 | 0.6896 |
| 14 | text | mnote | marginal note | 边注 | 421 | 0.7794 | 0.5202 | 0.6239 |
| 15 | text | cap | caption | 说明 | 80 | 0.6842 | 0.4875 | 0.5693 |
| 16 | text | figno | figure number | 编号 | 1,389 | 0.8955 | 0.7466 | 0.8143 |
| 17 | text | lineno | line number | 行号 | 341 | 0.7759 | 0.6598 | 0.7132 |
| 18 | text | colno | column number | 栏号 | 449 | 0.6964 | 0.4699 | 0.5612 |
| 19 | text | seq | sequence | 序列表 | 136 | 0.4430 | 0.2574 | 0.3256 |
| 20 | complex | figcx | figure complex | 图片组 | 1,416 | 0.8657 | 0.7373 | 0.7963 |
| 21 | complex | rxn | chemical reaction | 反应式 | 150 | 0.8898 | 0.7000 | 0.7836 |
| 22 | complex | bib | bibliography | 著录页 | 470 | 0.9615 | 0.7979 | 0.8721 |
| 23 | complex | srep | search report | 搜索报告 | 106 | 0.9052 | 0.9906 | 0.9459 |
| 24 | complex | toc | Table of Contents | 目录 | 0 | 0.0000 | 0.0000 | 0.0000 |
| 25 | complex | ref | reference | 参考文献 | 2 | 0.0000 | 0.0000 | 0.0000 |
| ALL | | | | | 33,054 | 0.8144 | 0.7711 | 0.7922 |

### NPD PDF

| # | Group | Abbr. | Class | Chinese | Labels | Precision | Recall | F1 |
|---:|---|---|---|---|---:|---:|---:|---:|
| 1 | figure | graph | graph | 图表 | 248 | 0.6838 | 0.6976 | 0.6906 |
| 2 | figure | draw | drawing | 绘制图 | 9 | 0.0000 | 0.0000 | 0.0000 |
| 3 | figure | struc | structure diagram | 结构图 | 341 | 0.7454 | 0.7126 | 0.7286 |
| 4 | figure | photo | photograph | 照片 | 82 | 0.6071 | 0.6220 | 0.6145 |
| 5 | figure | tab | table | 表格 | 209 | 0.7533 | 0.8182 | 0.7844 |
| 6 | figure | eqn | math equation | 数学公式 | 298 | 0.6789 | 0.5604 | 0.6140 |
| 7 | figure | chem | chemical formula | 化学式 | 388 | 0.7324 | 0.8325 | 0.7793 |
| 8 | figure | noise | noise | 噪声 | 695 | 0.4823 | 0.4302 | 0.4548 |
| 9 | text | text | text | 文本 | 9,119 | 0.6943 | 0.7625 | 0.7268 |
| 10 | text | title | title | 标题 | 304 | 0.7130 | 0.5395 | 0.6142 |
| 11 | text | sec | section title | 章节标题 | 1,539 | 0.7337 | 0.6160 | 0.6697 |
| 12 | text | head | page header | 页眉 | 1,246 | 0.7464 | 0.7111 | 0.7283 |
| 13 | text | foot | page footer | 页脚 | 1,339 | 0.7711 | 0.6468 | 0.7035 |
| 14 | text | mnote | marginal note | 边注 | 190 | 0.5714 | 0.2947 | 0.3889 |
| 15 | text | cap | caption | 说明 | 573 | 0.8711 | 0.5899 | 0.7034 |
| 16 | text | figno | figure number | 编号 | 149 | 0.6078 | 0.4161 | 0.4940 |
| 17 | text | lineno | line number | 行号 | 41 | 0.6667 | 0.9268 | 0.7755 |
| 18 | text | colno | column number | 栏号 | 0 | 0.0000 | 0.0000 | 0.0000 |
| 19 | text | seq | sequence | 序列表 | 18 | 0.7000 | 0.3889 | 0.5000 |
| 20 | complex | figcx | figure complex | 图片组 | 734 | 0.7657 | 0.7480 | 0.7567 |
| 21 | complex | rxn | chemical reaction | 反应式 | 36 | 0.8947 | 0.4722 | 0.6182 |
| 22 | complex | bib | bibliography | 著录页 | 0 | 0.0000 | 0.0000 | 0.0000 |
| 23 | complex | srep | search report | 搜索报告 | 3 | 0.4286 | 1.0000 | 0.6000 |
| 24 | complex | toc | Table of Contents | 目录 | 76 | 0.8475 | 0.6579 | 0.7407 |
| 25 | complex | ref | reference | 参考文献 | 132 | 0.8148 | 0.3333 | 0.4731 |
| ALL | | | | | 17,769 | 0.7090 | 0.6983 | 0.7036 |

## Usage

The current model artifact is an ONNX export:

```text
layout_model/RT-DETR_25.onnx
```

The model can be loaded with ONNXRuntime:

```python
import onnxruntime as ort

session = ort.InferenceSession("layout_model/RT-DETR_25.onnx")
print("inputs:", [i.name for i in session.get_inputs()])
print("outputs:", [o.name for o in session.get_outputs()])
```

Use `labels.json` for the 25-class label mapping.

## Repository Files

| File | Purpose |
| --- | --- |
| `README.md` | Hugging Face model card in English |
| `README_zh.md` | Chinese model card |
| `EVALUATION.md` | Detailed benchmark results derived from the workbook |
| `labels.json` | Machine-readable 25-class label mapping |
| `layout_model/RT-DETR_25.onnx` | ONNX model artifact |
| `requirements.txt` | Minimal dependencies for ONNX loading and image preprocessing |
| `LICENSE` | Apache-2.0 license |
| `DISCLAIMER.md` | Model limitations and responsible-use notes |
| `NOTICE` | Copyright and trademark notice |
| `OPEN_SOURCE_CHECKLIST.md` | Release checklist before public upload |

## Limitations

- Layout predictions may be inaccurate on low-resolution scans, heavily rotated pages, handwritten documents, unusual patent formats, or unseen page templates.
- Small objects and sparse categories can have unstable metrics when the evaluation set has very few labels.
- The model should not be used as the sole source of truth for legal, compliance, filing, archival, or customer-facing workflows without human review.
- Users are responsible for ensuring they have the right to process and share any documents used with this model.

## License

This project is released under the Apache License 2.0. See [LICENSE](LICENSE).

## Copyright Notice

Copyright (c) 2026 Patsnap. All rights reserved except as expressly licensed under the applicable license terms.

Hiro-Layout, Hiro, Patsnap, and any associated names, logos, product names, service names, designs, and slogans are trademarks or registered trademarks of Patsnap or its affiliates. No trademark license is granted under the open source license or any model license unless expressly stated.
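
## Appendix: Recomputing F1 from Precision and Recall

The benchmark tables report Precision, Recall, and F1 but do not state the F1 formula. The sketch below assumes the usual definition, the harmonic mean of precision and recall, and recomputes the two overall F1 values from the Benchmarks table. The helper name `f1_from_pr` is hypothetical and is not part of this repository.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed F1 definition).

    Returns 0.0 when both inputs are zero, matching the 0.0000 entries
    shown for classes without correct detections.
    """
    total = precision + recall
    if total == 0.0:
        return 0.0
    return 2.0 * precision * recall / total


# (precision, recall, reported F1) copied from the overall Benchmarks table.
reported = {
    "Patent PDF": (0.8144, 0.7711, 0.7922),
    "NPD PDF": (0.7090, 0.6983, 0.7036),
}

for name, (p, r, f1) in reported.items():
    print(f"{name}: recomputed={f1_from_pr(p, r):.4f} reported={f1:.4f}")
```

Because the published precision and recall are already rounded to four decimals, a value recomputed this way can differ from the reported F1 in the last digit for some per-class rows.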