---
license: apache-2.0
language:
- zh
- en
pipeline_tag: object-detection
tags:
- document-ai
- document-layout-analysis
- patent
- pdf
- hiro
- patsnap
datasets:
- in-house
metrics:
- precision
- recall
- f1
library_name: transformers
---

# Hiro-Layout: Document Layout Analysis for Patent and Technical PDFs

English | [简体中文](README_zh.md)

Hiro-Layout is a document layout analysis model for patent and technical PDF pages. It detects and classifies page regions such as text, titles, headers, footers, tables, formulas, chemical structures, figures, captions, search reports, bibliographies, and other patent-specific layout elements.

## Highlights

- Patent-focused layout understanding: covers common patent PDF regions and patent-specific structures.
- Technical document coverage: evaluated on both patent PDFs and NPD PDFs.
- Fine-grained taxonomy: 25 layout categories across figure, text, and complex document elements.

## Model Overview

| Item | Details |
| --- | --- |
| Model name | Hiro-Layout |
| Current artifact | `layout_model/RT-DETR_25.onnx` |
| Task | Document layout analysis / page region detection |
| Input | Rendered PDF page image |
| Output | Layout regions with class labels |
| Domains | Patent PDFs, technical/NPD PDFs |
| License | Apache-2.0 |

## Layout Taxonomy

| Group | Class | Abbr. | Chinese |
| --- | --- | --- | --- |
| figure | graph | graph | 图表 |
| figure | drawing | draw | 绘制图 |
| figure | structure diagram | struc | 结构图 |
| figure | photograph | photo | 照片 |
| figure | table | tab | 表格 |
| figure | math equation | eqn | 数学公式 |
| figure | chemical formula | chem | 化学式 |
| figure | noise | noise | 噪声 |
| text | text | text | 文本 |
| text | title | title | 标题 |
| text | section title | sec | 章节标题 |
| text | page header | head | 页眉 |
| text | page footer | foot | 页脚 |
| text | marginal note | mnote | 边注 |
| text | caption | cap | 说明 |
| text | figure number | figno | 编号 |
| text | line number | lineno | 行号 |
| text | column number | colno | 栏号 |
| text | sequence | seq | 序列表 |
| complex | figure complex | figcx | 图片组 |
| complex | chemical reaction | rxn | 反应式 |
| complex | bibliography | bib | 著录页 |
| complex | search report | srep | 搜索报告 |
| complex | Table of Contents | toc | 目录 |
| complex | reference | ref | 参考文献 |

## Benchmarks

Metrics are reported as Precision, Recall, and F1.

| Benchmark | Labels | Precision | Recall | F1 |
| --- | ---: | ---: | ---: | ---: |
| Patent PDF | 33,054 | 0.8144 | 0.7711 | 0.7922 |
| NPD PDF | 17,769 | 0.7090 | 0.6983 | 0.7036 |
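
The F1 values in the summary table are consistent with the usual harmonic mean of precision and recall, $F_1 = 2PR / (P + R)$. The sketch below recomputes them from the reported precision and recall as a sanity check. The function name `f1_score` is illustrative and is not part of this repository.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall copied from the summary table above.
patent_f1 = round(f1_score(0.8144, 0.7711), 4)
npd_f1 = round(f1_score(0.7090, 0.6983), 4)

print(patent_f1)  # 0.7922
print(npd_f1)     # 0.7036
```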

### Patent PDF

| # | Group | Abbr. | Class | Chinese | Labels | Precision | Recall | F1 |
|---:|---|---|---|---|---:|---:|---:|---:|
| 1 | figure | graph | graph | 图表 | 215 | 0.7611 | 0.8000 | 0.7800 |
| 2 | figure | draw | drawing | 绘制图 | 420 | 0.8649 | 0.3048 | 0.4507 |
| 3 | figure | struc | structure diagram | 结构图 | 626 | 0.6579 | 0.8355 | 0.7361 |
| 4 | figure | photo | photograph | 照片 | 147 | 0.8378 | 0.8435 | 0.8407 |
| 5 | figure | tab | table | 表格 | 198 | 0.7759 | 0.9091 | 0.8372 |
| 6 | figure | eqn | math equation | 数学公式 | 399 | 0.7762 | 0.6692 | 0.7187 |
| 7 | figure | chem | chemical formula | 化学式 | 1,099 | 0.8792 | 0.8944 | 0.8868 |
| 8 | figure | noise | noise | 噪声 | 1,241 | 0.7025 | 0.7687 | 0.7341 |
| 9 | text | text | text | 文本 | 17,668 | 0.8182 | 0.8062 | 0.8122 |
| 10 | text | title | title | 标题 | 601 | 0.9117 | 0.8070 | 0.8561 |
| 11 | text | sec | section title | 章节标题 | 1,394 | 0.7968 | 0.7088 | 0.7502 |
| 12 | text | head | page header | 页眉 | 3,074 | 0.8187 | 0.7788 | 0.7983 |
| 13 | text | foot | page footer | 页脚 | 1,012 | 0.7432 | 0.6433 | 0.6896 |
| 14 | text | mnote | marginal note | 边注 | 421 | 0.7794 | 0.5202 | 0.6239 |
| 15 | text | cap | caption | 说明 | 80 | 0.6842 | 0.4875 | 0.5693 |
| 16 | text | figno | figure number | 编号 | 1,389 | 0.8955 | 0.7466 | 0.8143 |
| 17 | text | lineno | line number | 行号 | 341 | 0.7759 | 0.6598 | 0.7132 |
| 18 | text | colno | column number | 栏号 | 449 | 0.6964 | 0.4699 | 0.5612 |
| 19 | text | seq | sequence | 序列表 | 136 | 0.4430 | 0.2574 | 0.3256 |
| 20 | complex | figcx | figure complex | 图片组 | 1,416 | 0.8657 | 0.7373 | 0.7963 |
| 21 | complex | rxn | chemical reaction | 反应式 | 150 | 0.8898 | 0.7000 | 0.7836 |
| 22 | complex | bib | bibliography | 著录页 | 470 | 0.9615 | 0.7979 | 0.8721 |
| 23 | complex | srep | search report | 搜索报告 | 106 | 0.9052 | 0.9906 | 0.9459 |
| 24 | complex | toc | Table of Contents | 目录 | 0 | 0.0000 | 0.0000 | 0.0000 |
| 25 | complex | ref | reference | 参考文献 | 2 | 0.0000 | 0.0000 | 0.0000 |
| ALL |  |  |  |  | 33,054 | 0.8144 | 0.7711 | 0.7922 |

### NPD PDF

| # | Group | Abbr. | Class | Chinese | Labels | Precision | Recall | F1 |
|---:|---|---|---|---|---:|---:|---:|---:|
| 1 | figure | graph | graph | 图表 | 248 | 0.6838 | 0.6976 | 0.6906 |
| 2 | figure | draw | drawing | 绘制图 | 9 | 0.0000 | 0.0000 | 0.0000 |
| 3 | figure | struc | structure diagram | 结构图 | 341 | 0.7454 | 0.7126 | 0.7286 |
| 4 | figure | photo | photograph | 照片 | 82 | 0.6071 | 0.6220 | 0.6145 |
| 5 | figure | tab | table | 表格 | 209 | 0.7533 | 0.8182 | 0.7844 |
| 6 | figure | eqn | math equation | 数学公式 | 298 | 0.6789 | 0.5604 | 0.6140 |
| 7 | figure | chem | chemical formula | 化学式 | 388 | 0.7324 | 0.8325 | 0.7793 |
| 8 | figure | noise | noise | 噪声 | 695 | 0.4823 | 0.4302 | 0.4548 |
| 9 | text | text | text | 文本 | 9,119 | 0.6943 | 0.7625 | 0.7268 |
| 10 | text | title | title | 标题 | 304 | 0.7130 | 0.5395 | 0.6142 |
| 11 | text | sec | section title | 章节标题 | 1,539 | 0.7337 | 0.6160 | 0.6697 |
| 12 | text | head | page header | 页眉 | 1,246 | 0.7464 | 0.7111 | 0.7283 |
| 13 | text | foot | page footer | 页脚 | 1,339 | 0.7711 | 0.6468 | 0.7035 |
| 14 | text | mnote | marginal note | 边注 | 190 | 0.5714 | 0.2947 | 0.3889 |
| 15 | text | cap | caption | 说明 | 573 | 0.8711 | 0.5899 | 0.7034 |
| 16 | text | figno | figure number | 编号 | 149 | 0.6078 | 0.4161 | 0.4940 |
| 17 | text | lineno | line number | 行号 | 41 | 0.6667 | 0.9268 | 0.7755 |
| 18 | text | colno | column number | 栏号 | 0 | 0.0000 | 0.0000 | 0.0000 |
| 19 | text | seq | sequence | 序列表 | 18 | 0.7000 | 0.3889 | 0.5000 |
| 20 | complex | figcx | figure complex | 图片组 | 734 | 0.7657 | 0.7480 | 0.7567 |
| 21 | complex | rxn | chemical reaction | 反应式 | 36 | 0.8947 | 0.4722 | 0.6182 |
| 22 | complex | bib | bibliography | 著录页 | 0 | 0.0000 | 0.0000 | 0.0000 |
| 23 | complex | srep | search report | 搜索报告 | 3 | 0.4286 | 1.0000 | 0.6000 |
| 24 | complex | toc | Table of Contents | 目录 | 76 | 0.8475 | 0.6579 | 0.7407 |
| 25 | complex | ref | reference | 参考文献 | 132 | 0.8148 | 0.3333 | 0.4731 |
| ALL |  |  |  |  | 17,769 | 0.7090 | 0.6983 | 0.7036 |

## Usage

The current model artifact is an ONNX export:

```text
layout_model/RT-DETR_25.onnx
```

The model can be loaded with ONNX Runtime:

```python
import onnxruntime as ort

session = ort.InferenceSession("layout_model/RT-DETR_25.onnx")
print("inputs:", [i.name for i in session.get_inputs()])
print("outputs:", [o.name for o in session.get_outputs()])
```

Use `labels.json` for the 25-class label mapping.
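
As a rough sketch of how class indices might be turned into readable labels, the example below filters detections by score and attaches the abbreviations from the Layout Taxonomy table. It rests on several assumptions that this card does not confirm: that class indices are zero-based and follow the order of the taxonomy table, that the model output can be unpacked into separate class, score, and box sequences, and that 0.5 is a sensible score threshold. The helper `decode_detections` is hypothetical. Check `labels.json` and the printed output names before relying on any of this.

```python
# Assumption: index order matches the Layout Taxonomy table (0 = graph ... 24 = ref).
# The authoritative mapping is labels.json; verify against it.
ABBREVIATIONS = [
    "graph", "draw", "struc", "photo", "tab", "eqn", "chem", "noise",
    "text", "title", "sec", "head", "foot", "mnote", "cap", "figno",
    "lineno", "colno", "seq", "figcx", "rxn", "bib", "srep", "toc", "ref",
]


def decode_detections(class_ids, scores, boxes, threshold=0.5):
    """Keep detections scoring at least `threshold` and attach class abbreviations."""
    results = []
    for class_id, score, box in zip(class_ids, scores, boxes):
        if score < threshold:
            continue  # drop low-confidence regions
        results.append({
            "label": ABBREVIATIONS[int(class_id)],
            "score": float(score),
            "box": [float(v) for v in box],
        })
    return results


# Hypothetical raw values for illustration only, not real model output.
detections = decode_detections(
    class_ids=[8, 4, 22],
    scores=[0.91, 0.30, 0.77],
    boxes=[[10, 20, 300, 60], [15, 80, 290, 400], [0, 0, 600, 800]],
)
for det in detections:
    print(det["label"], det["score"])
```

The box coordinate convention (corner or center format, absolute or normalized) depends on how the ONNX model was exported, so it is left uninterpreted here.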

## Repository Files

| File | Purpose |
| --- | --- |
| `README.md` | Hugging Face model card in English |
| `README_zh.md` | Chinese model card |
| `EVALUATION.md` | Detailed benchmark results derived from the workbook |
| `labels.json` | Machine-readable 25-class label mapping |
| `layout_model/RT-DETR_25.onnx` | ONNX model artifact |
| `requirements.txt` | Minimal dependencies for ONNX loading and image preprocessing |
| `LICENSE` | Apache-2.0 license |
| `DISCLAIMER.md` | Model limitations and responsible-use notes |
| `NOTICE` | Copyright and trademark notice |
| `OPEN_SOURCE_CHECKLIST.md` | Release checklist before public upload |

## Limitations

- Layout predictions may be inaccurate on low-resolution scans, heavily rotated pages, handwritten documents, unusual patent formats, or unseen page templates.
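- Metrics for small objects and sparse categories can be unstable when the evaluation set has very few labels; for example, `ref` has only 2 labels in the Patent PDF benchmark and `srep` has only 3 in the NPD PDF benchmark.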
- The model should not be used as the sole source of truth for legal, compliance, filing, archival, or customer-facing workflows without human review.
- Users are responsible for ensuring they have the right to process and share any documents used with this model.
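
To make the point about sparse categories concrete, the sketch below shows how much recall moves when a single ground-truth region is found or missed, using label counts taken from the benchmark tables. The function `recall_step` is illustrative only. With zero labels, recall is undefined, so the 0.0000 entries for classes with no labels (for example `toc` in the Patent PDF benchmark) are best read as "not measured" rather than as a failure.

```python
def recall_step(num_labels: int):
    """Change in recall caused by one additional hit or miss; None if undefined."""
    if num_labels == 0:
        return None  # no ground truth, so recall cannot be computed
    return 1.0 / num_labels


# Label counts copied from the benchmark tables.
label_counts = {
    "Patent PDF / text": 17668,
    "Patent PDF / cap": 80,
    "NPD PDF / srep": 3,
    "Patent PDF / toc": 0,
}

for name, count in label_counts.items():
    step = recall_step(count)
    if step is None:
        print(f"{name}: recall undefined (0 labels)")
    else:
        print(f"{name}: one detection shifts recall by {step:.4f}")
```

For `srep` in the NPD PDF benchmark, a single missed region would lower recall by about a third, which is why per-class numbers for such categories should be interpreted with caution.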

## License

This project is released under the Apache License 2.0. See [LICENSE](LICENSE).

## Copyright Notice

Copyright (c) 2026 Patsnap. All rights reserved except as expressly licensed under the applicable license terms.

Hiro-Layout, Hiro, Patsnap, and any associated names, logos, product names, service names, designs, and slogans are trademarks or registered trademarks of Patsnap or its affiliates. No trademark license is granted under the open source license or any model license unless expressly stated.