GaroOCR

License: CC BY 4.0

OCR model for the Garo (grt_Latn) language, fine-tuned from microsoft/Florence-2-base-ft on Garo text images.

Developed by MWire Labs, Shillong, Meghalaya; part of an ongoing effort to build foundational AI for Northeast Indian languages.


Model Details

  • Base model: microsoft/Florence-2-base-ft
  • Parameters: 231M
  • Language: Garo (Achik)
  • Task: OCR (image → text)
  • Training samples: 80,000
  • Epochs: 5
  • Character accuracy: 93.13%
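
The card does not say how the character accuracy figure was computed. A common definition is one minus the character error rate, where the error rate is the Levenshtein (edit) distance divided by the number of reference characters. The sketch below assumes that definition and corpus-level aggregation; the function names are illustrative and not part of this repository.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference character
                curr[j - 1] + 1,     # insert a hypothesis character
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def character_accuracy(references, hypotheses) -> float:
    """Assumed definition: 1 - (total edits / total reference characters)."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    # Clamp at 0 because heavy insertion errors can push the ratio above 1.
    return max(0.0, 1.0 - total_edits / total_chars)


# Placeholder strings: one substitution in eight characters.
print(character_accuracy(["garo ocr"], ["garo 0cr"]))
```

If the reported number used a different normalization (per-sample averaging, for example), results from this sketch will not match it exactly.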

Training Setup

  • Hardware: NVIDIA A40 (48GB)
  • Precision: bfloat16
  • Batch size: 4 (effective 16 with gradient accumulation)
  • Learning rate: 3e-4 with cosine scheduler
  • Max label length: 128 tokens
  • Task prompt: <OCR> (Florence-2 uppercase token)
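
The numbers above imply a gradient accumulation factor and a total optimizer step count, which the following sketch works out together with a cosine learning-rate curve. It assumes a single GPU, that all 80,000 samples are seen once per epoch, and a plain cosine decay to zero with no warmup; the card states none of these, so treat the schedule as illustrative.

```python
import math

TRAIN_SAMPLES = 80_000   # from the model card
EPOCHS = 5               # from the model card
PER_STEP_BATCH = 4       # from the model card
EFFECTIVE_BATCH = 16     # from the model card
PEAK_LR = 3e-4           # from the model card

# Assumption: one GPU, so accumulation alone bridges 4 -> 16.
GRAD_ACCUM_STEPS = EFFECTIVE_BATCH // PER_STEP_BATCH

# Assumption: every sample is used once per epoch.
STEPS_PER_EPOCH = math.ceil(TRAIN_SAMPLES / EFFECTIVE_BATCH)
TOTAL_STEPS = STEPS_PER_EPOCH * EPOCHS


def cosine_lr(step: int, total_steps: int = TOTAL_STEPS,
              peak_lr: float = PEAK_LR) -> float:
    """Cosine decay from peak_lr to 0 (no warmup; an assumption)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


print(GRAD_ACCUM_STEPS, STEPS_PER_EPOCH, TOTAL_STEPS)
print(cosine_lr(0), cosine_lr(TOTAL_STEPS // 2), cosine_lr(TOTAL_STEPS))
```

Under these assumptions training runs for 25,000 optimizer steps, and the learning rate falls to half its peak at the midpoint.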

Usage

from transformers import AutoProcessor, AutoModelForCausalLM
from PIL import Image
import torch

# Florence-2 ships custom modeling code, hence trust_remote_code=True.
processor = AutoProcessor.from_pretrained("MWirelabs/garo-ocr", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "MWirelabs/garo-ocr",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).cuda()

# Load the image as RGB and build inputs with the <OCR> task prompt.
image = Image.open("your_image.png").convert("RGB")
inputs = processor(text="<OCR>", images=image, return_tensors="pt")
inputs = {k: v.cuda() for k, v in inputs.items()}
# Match the model's bfloat16 weights; input_ids stay as integers.
inputs["pixel_values"] = inputs["pixel_values"].to(torch.bfloat16)

# Generate without tracking gradients; 128 matches the training label length.
with torch.no_grad():
    generated = model.generate(
        pixel_values=inputs["pixel_values"],
        input_ids=inputs["input_ids"],
        max_new_tokens=128,
    )

# Decode the first (only) sequence and drop special tokens.
text = processor.tokenizer.decode(generated[0], skip_special_tokens=True)
print(text)

Note: Use transformers==4.38.2 for compatibility.
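
One way to catch a mismatched install early is to check the version before loading the model. This is a minimal sketch using the standard library's importlib.metadata; the helper names are hypothetical, and the exact-match policy is an assumption based on the pin above.

```python
from importlib import metadata

REQUIRED_TRANSFORMERS = "4.38.2"  # version named in the note above


def installed_version(package: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(installed, required: str = REQUIRED_TRANSFORMERS) -> bool:
    """True only when the installed version equals the pinned one exactly."""
    return installed is not None and installed == required


found = installed_version("transformers")
if not matches_pin(found):
    print(f"transformers {found!r} found; this card recommends {REQUIRED_TRANSFORMERS}")
```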


Limitations

  • Max reliable output length is ~128 tokens (the max label length used in training)
  • Part of MWire Labs' mono-language series; a multilingual NE-OCR model covering more Northeast Indian languages is in development
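
For images whose text would exceed the ~128-token limit, one possible workaround is to split the page into horizontal strips and run OCR on each strip separately. The card does not describe such a procedure; the sketch below is an assumption, with a hypothetical helper that only computes crop boxes in the (left, upper, right, lower) order used by PIL's Image.crop. Fixed-height strips can cut through a line of text, so a small overlap or a proper line detector may be preferable.

```python
def strip_boxes(width: int, height: int, strip_height: int, overlap: int = 0):
    """Return crop boxes that cover the image from top to bottom.

    Each box is (left, upper, right, lower). Consecutive strips share
    `overlap` pixels so text cut at one boundary appears whole in a neighbor.
    """
    if strip_height <= 0:
        raise ValueError("strip_height must be positive")
    if not 0 <= overlap < strip_height:
        raise ValueError("overlap must be in [0, strip_height)")

    boxes = []
    top = 0
    step = strip_height - overlap
    while top < height:
        bottom = min(top + strip_height, height)
        boxes.append((0, top, width, bottom))
        if bottom == height:
            break  # the last strip reached the bottom edge
        top += step
    return boxes


# Hypothetical 800x250 page cut into 100-pixel strips with 20 pixels of overlap.
for box in strip_boxes(800, 250, strip_height=100, overlap=20):
    print(box)
```

Each box could then be passed to image.crop(box) and the crop fed through the usage code above, joining the decoded strings in order.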
