Image-Text-to-Text
PaddleOCR
Safetensors
English
Chinese
multilingual
paddleocr_vl
ERNIE4.5
PaddlePaddle
image-to-text
ocr
document-parse
layout
table
formula
chart
seal
spotting
conversational
custom_code
Eval Results
Instructions to use PaddlePaddle/PaddleOCR-VL-1.5 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PaddleOCR
How to use PaddlePaddle/PaddleOCR-VL-1.5 with PaddleOCR:
# See https://www.paddleocr.ai/latest/version3.x/pipeline_usage/PaddleOCR-VL.html to installation from paddleocr import PaddleOCRVL pipeline = PaddleOCRVL(pipeline_version="v1.5") output = pipeline.predict("path/to/document_image.png") for res in output: res.print() res.save_to_json(save_path="output") res.save_to_markdown(save_path="output") - Notebooks
- Google Colab
- Kaggle
Repetition makes this unusable
#17
by zero1zero - opened
Added variations of small to large repetition_penalty and/or presence_penalty and it continuously had problems. I stayed with the recommended greedy decoding but gave low temp a try and it still showed issues.
Seems like this must be an inherent issue with the model? Has anyone seen this work without eventually getting into a repeat loop?
This seems to happen mainly with table structure or visual elements where it wants to use "[-|_|.]" to represent something. This seems like its related to table parsing but also happens on representing visual content or spacing. I'm using the "OCR" task.
Hosted on a 4090 through vllm.