File size: 5,077 Bytes
fc08c11
 
b81a426
fc08c11
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
# Evaluation Results

The benchmark contains two evaluation sets:

- `Patent PDF`
- `NPD PDF`

Metrics are reported as Precision, Recall, and F1. The `ALL` rows below are copied from the workbook.

## Summary

| Benchmark | Labels | Precision | Recall | F1 |
| --- | ---: | ---: | ---: | ---: |
| Patent PDF | 33,054 | 0.8144 | 0.7711 | 0.7922 |
| NPD PDF | 17,769 | 0.7090 | 0.6983 | 0.7036 |

## Patent PDF

| # | Group | Abbr. | Class | Chinese | Labels | Precision | Recall | F1 |
|---:|---|---|---|---|---:|---:|---:|---:|
| 1 | figure | graph | graph | 图表 | 215 | 0.7611 | 0.8000 | 0.7800 |
| 2 | figure | draw | drawing | 绘制图 | 420 | 0.8649 | 0.3048 | 0.4507 |
| 3 | figure | struc | structure diagram | 结构图 | 626 | 0.6579 | 0.8355 | 0.7361 |
| 4 | figure | photo | photograph | 照片 | 147 | 0.8378 | 0.8435 | 0.8407 |
| 5 | figure | tab | table | 表格 | 198 | 0.7759 | 0.9091 | 0.8372 |
| 6 | figure | eqn | math equation | 数学公式 | 399 | 0.7762 | 0.6692 | 0.7187 |
| 7 | figure | chem | chemical formula | 化学式 | 1,099 | 0.8792 | 0.8944 | 0.8868 |
| 8 | figure | noise | noise | 噪声 | 1,241 | 0.7025 | 0.7687 | 0.7341 |
| 9 | text | text | text | 文本 | 17,668 | 0.8182 | 0.8062 | 0.8122 |
| 10 | text | title | title | 标题 | 601 | 0.9117 | 0.8070 | 0.8561 |
| 11 | text | sec | section title | 章节标题 | 1,394 | 0.7968 | 0.7088 | 0.7502 |
| 12 | text | head | page header | 页眉 | 3,074 | 0.8187 | 0.7788 | 0.7983 |
| 13 | text | foot | page footer | 页脚 | 1,012 | 0.7432 | 0.6433 | 0.6896 |
| 14 | text | mnote | marginal note | 边注 | 421 | 0.7794 | 0.5202 | 0.6239 |
| 15 | text | cap | caption | 说明 | 80 | 0.6842 | 0.4875 | 0.5693 |
| 16 | text | figno | figure number | 编号 | 1,389 | 0.8955 | 0.7466 | 0.8143 |
| 17 | text | lineno | line number | 行号 | 341 | 0.7759 | 0.6598 | 0.7132 |
| 18 | text | colno | column number | 栏号 | 449 | 0.6964 | 0.4699 | 0.5612 |
| 19 | text | seq | sequence | 序列表 | 136 | 0.4430 | 0.2574 | 0.3256 |
| 20 | complex | figcx | figure complex | 图片组 | 1,416 | 0.8657 | 0.7373 | 0.7963 |
| 21 | complex | rxn | chemical reaction | 反应式 | 150 | 0.8898 | 0.7000 | 0.7836 |
| 22 | complex | bib | bibliography | 著录页 | 470 | 0.9615 | 0.7979 | 0.8721 |
| 23 | complex | srep | search report | 搜索报告 | 106 | 0.9052 | 0.9906 | 0.9459 |
| 24 | complex | toc | Table of Contents | 目录 | 0 | 0.0000 | 0.0000 | 0.0000 |
| 25 | complex | ref | reference | 参考文献 | 2 | 0.0000 | 0.0000 | 0.0000 |
| ALL |  |  |  |  | 33,054 | 0.8144 | 0.7711 | 0.7922 |

## NPD PDF

| # | Group | Abbr. | Class | Chinese | Labels | Precision | Recall | F1 |
|---:|---|---|---|---|---:|---:|---:|---:|
| 1 | figure | graph | graph | 图表 | 248 | 0.6838 | 0.6976 | 0.6906 |
| 2 | figure | draw | drawing | 绘制图 | 9 | 0.0000 | 0.0000 | 0.0000 |
| 3 | figure | struc | structure diagram | 结构图 | 341 | 0.7454 | 0.7126 | 0.7286 |
| 4 | figure | photo | photograph | 照片 | 82 | 0.6071 | 0.6220 | 0.6145 |
| 5 | figure | tab | table | 表格 | 209 | 0.7533 | 0.8182 | 0.7844 |
| 6 | figure | eqn | math equation | 数学公式 | 298 | 0.6789 | 0.5604 | 0.6140 |
| 7 | figure | chem | chemical formula | 化学式 | 388 | 0.7324 | 0.8325 | 0.7793 |
| 8 | figure | noise | noise | 噪声 | 695 | 0.4823 | 0.4302 | 0.4548 |
| 9 | text | text | text | 文本 | 9,119 | 0.6943 | 0.7625 | 0.7268 |
| 10 | text | title | title | 标题 | 304 | 0.7130 | 0.5395 | 0.6142 |
| 11 | text | sec | section title | 章节标题 | 1,539 | 0.7337 | 0.6160 | 0.6697 |
| 12 | text | head | page header | 页眉 | 1,246 | 0.7464 | 0.7111 | 0.7283 |
| 13 | text | foot | page footer | 页脚 | 1,339 | 0.7711 | 0.6468 | 0.7035 |
| 14 | text | mnote | marginal note | 边注 | 190 | 0.5714 | 0.2947 | 0.3889 |
| 15 | text | cap | caption | 说明 | 573 | 0.8711 | 0.5899 | 0.7034 |
| 16 | text | figno | figure number | 编号 | 149 | 0.6078 | 0.4161 | 0.4940 |
| 17 | text | lineno | line number | 行号 | 41 | 0.6667 | 0.9268 | 0.7755 |
| 18 | text | colno | column number | 栏号 | 0 | 0.0000 | 0.0000 | 0.0000 |
| 19 | text | seq | sequence | 序列表 | 18 | 0.7000 | 0.3889 | 0.5000 |
| 20 | complex | figcx | figure complex | 图片组 | 734 | 0.7657 | 0.7480 | 0.7567 |
| 21 | complex | rxn | chemical reaction | 反应式 | 36 | 0.8947 | 0.4722 | 0.6182 |
| 22 | complex | bib | bibliography | 著录页 | 0 | 0.0000 | 0.0000 | 0.0000 |
| 23 | complex | srep | search report | 搜索报告 | 3 | 0.4286 | 1.0000 | 0.6000 |
| 24 | complex | toc | Table of Contents | 目录 | 76 | 0.8475 | 0.6579 | 0.7407 |
| 25 | complex | ref | reference | 参考文献 | 132 | 0.8148 | 0.3333 | 0.4731 |
| ALL |  |  |  |  | 17,769 | 0.7090 | 0.6983 | 0.7036 |

## Notes

- Rows with zero labels are retained because they appear in the source workbook.
- The summary table uses the workbook's `ALL` rows rather than recomputing aggregate F1 from per-class rows.
- The benchmark is in-house and should be described as such unless the dataset is later cleared for public release.