How to use from the
Use from the
Transformers library
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("token-classification", model="microsoft/xdoc-base-funsd")
# Load model directly
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("microsoft/xdoc-base-funsd")
model = AutoModelForTokenClassification.from_pretrained("microsoft/xdoc-base-funsd")
Quick Links

XDoc

Introduction

XDoc is a unified pre-trained model that deals with different document formats in a single model. With only 36.7% parameters, XDoc achieves comparable or better performance on downstream tasks, which is cost-effective for real-world deployment.

XDoc: Unified Pre-training for Cross-Format Document Understanding Jingye Chen, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei, EMNLP 2022

Citation

If you find XDoc helpful, please cite us:

@article{chen2022xdoc,
  title={XDoc: Unified Pre-training for Cross-Format Document Understanding},
  author={Chen, Jingye and Lv, Tengchao and Cui, Lei and Zhang, Cha and Wei, Furu},
  journal={arXiv preprint arXiv:2210.02849},
  year={2022}
}
Downloads last month
13
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Paper for microsoft/xdoc-base-funsd