# GaroOCR
An OCR model for the Garo (`grt_Latn`) language, fine-tuned from `microsoft/Florence-2-base-ft` on images of Garo text.
Developed by MWire Labs, Shillong, Meghalaya; part of an ongoing effort to build foundational AI for Northeast Indian languages.
## Model Details
| Property | Value |
|----------|-------|
| Base model | `microsoft/Florence-2-base-ft` |
| Parameters | 231M |
| Language | Garo (Achik) |
| Task | OCR (image → text) |
| Training samples | 80,000 |
| Epochs | 5 |
| Character accuracy | 93.13% |
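The character-accuracy figure above can be reproduced under the common assumption that it is computed as 1 − CER, where CER is the Levenshtein edit distance between prediction and reference divided by the reference length (the card does not state the exact metric, so treat this as a sketch, not the evaluation script):

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert/delete/substitute, cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def character_accuracy(references, predictions):
    """1 - CER, micro-averaged over the whole evaluation set."""
    errors = sum(levenshtein(r, p) for r, p in zip(references, predictions))
    total = sum(len(r) for r in references)
    return 1.0 - errors / total
```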
## Training Setup
- Hardware: NVIDIA A40 (48GB)
- Precision: bfloat16
- Batch size: 4 (effective 16 with gradient accumulation)
- Learning rate: 3e-4 with cosine scheduler
- Max label length: 128 tokens
- Task prompt: `<OCR>` (Florence-2 uppercase task token)
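For reference, the effective batch size and the cosine schedule above can be sketched in plain Python. The original run's warmup behaviour is not stated, so this assumes a plain cosine decay from 3e-4 to 0:

```python
import math

BASE_LR = 3e-4
PER_DEVICE_BATCH = 4
GRAD_ACCUM_STEPS = 4  # 4 x 4 gives the effective batch of 16
EFFECTIVE_BATCH = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS

def cosine_lr(step: int, total_steps: int, base_lr: float = BASE_LR) -> float:
    """Cosine decay from base_lr at step 0 down to 0 at total_steps."""
    return 0.5 * base_lr * (1 + math.cos(math.pi * step / total_steps))
```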
## Usage
```python
from transformers import AutoProcessor, AutoModelForCausalLM
from PIL import Image
import torch

# Load the processor and model (trust_remote_code is required for Florence-2)
processor = AutoProcessor.from_pretrained("MWirelabs/garo-ocr", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "MWirelabs/garo-ocr",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).cuda()

# Prepare the image and the <OCR> task prompt
image = Image.open("your_image.png").convert("RGB")
inputs = processor(text="<OCR>", images=image, return_tensors="pt")
inputs = {k: v.cuda() for k, v in inputs.items()}
inputs["pixel_values"] = inputs["pixel_values"].to(torch.bfloat16)

# Generate and decode the recognized text
with torch.no_grad():
    generated = model.generate(
        pixel_values=inputs["pixel_values"],
        input_ids=inputs["input_ids"],
        max_new_tokens=128,
    )
text = processor.tokenizer.decode(generated[0], skip_special_tokens=True)
print(text)
```
Note: use `transformers==4.38.2` for compatibility (`pip install transformers==4.38.2`).
## Limitations
- Max reliable output length is ~128 tokens
- Part of MWire Labs' mono-language series; a multilingual NE-OCR model covering more Northeast Indian languages is in development
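Because output is capped at roughly 128 tokens, long pages may need to be split into horizontal strips and OCR'd strip by strip, then the results concatenated. A minimal sketch of the crop-box computation (the 400 px strip height and 40 px overlap are arbitrary assumptions; pass each box to `Image.crop` before running the model):

```python
def strip_boxes(width: int, height: int, strip_height: int = 400, overlap: int = 40):
    """Return (left, upper, right, lower) crop boxes covering a page top to bottom.

    Consecutive strips overlap slightly so a text line cut at a strip
    boundary appears whole in at least one strip.
    """
    boxes = []
    top = 0
    while top < height:
        bottom = min(top + strip_height, height)
        boxes.append((0, top, width, bottom))
        if bottom == height:
            break
        top = bottom - overlap
    return boxes
```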