is_pay

is_pay is a lightweight sequence classification model that predicts whether a given text string contains wage or salary information. It was fine-tuned from lyeonii/bert-tiny, so it is small and fast enough for high-throughput filtering pipelines.

Basic Usage

You can run this model with the standard Hugging Face text-classification pipeline. The example below truncates each input to 64 tokens.

from transformers import pipeline, AutoModelForSequenceClassification, AutoTokenizer

model_name = "loyoladatamining/is_pay"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
# Truncation is configured on the pipeline below, so the tokenizer needs no extra arguments
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Create text classification pipeline
nlp = pipeline(
    "text-classification", 
    model=model, 
    tokenizer=tokenizer,
    max_length=64,
    truncation=True
)

# Inference
text = "The starting salary for this position is $75,000 per year."
result = nlp(text)
print(result)

Output Format

The pipeline returns a list with one dictionary per input text. Each dictionary holds the predicted binary class label and its confidence score:

[
  {
    "label": "LABEL_1",
    "score": 0.9942
  }
]
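
To make the relationship between the label and the score concrete, here is a minimal sketch of how a text-classification pipeline typically turns a two-class model's raw logits into this output: it applies a softmax and reports the highest-probability class. The logit values and the helper names (softmax, to_prediction) are hypothetical and chosen for illustration only; they are not taken from the model.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)
    # Subtract the maximum before exponentiating for numerical stability.
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_prediction(logits):
    """Return the top class in the same shape as one pipeline result."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": f"LABEL_{best}", "score": probs[best]}


# Hypothetical logits ordered as [LABEL_0, LABEL_1]; real values come from the model.
example = to_prediction([-2.0, 3.0])
print(example)
```

Because the two probabilities sum to 1, the reported score for the winning class is always at least 0.5 under this assumption.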

Label Mapping

LABEL_0: The text does not contain wage or salary information.
LABEL_1: The text contains wage or salary information.
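
In a filtering pipeline, the label mapping above can be applied with a small helper. The sketch below is one possible approach, not part of the model's API: has_pay, filter_pay_texts, and the threshold parameter are hypothetical names, and fake_classifier is a stand-in so the example runs without downloading the model. In practice you would pass the nlp pipeline from Basic Usage instead.

```python
def has_pay(result, threshold=0.5):
    """True when one pipeline result predicts LABEL_1 with enough confidence."""
    return result["label"] == "LABEL_1" and result["score"] >= threshold


def filter_pay_texts(classifier, texts, threshold=0.5):
    """Keep only the texts flagged as containing wage or salary information.

    `classifier` is any callable that maps a list of strings to a list of
    {"label", "score"} dictionaries, such as a text-classification pipeline.
    """
    texts = list(texts)
    results = classifier(texts)
    return [text for text, result in zip(texts, results) if has_pay(result, threshold)]


def fake_classifier(texts):
    """Stand-in for the real pipeline: flags any text containing a dollar sign."""
    return [
        {"label": "LABEL_1", "score": 0.99} if "$" in text
        else {"label": "LABEL_0", "score": 0.97}
        for text in texts
    ]


postings = [
    "Pay: $22/hour",
    "Must be able to lift 50 pounds.",
    "Flexible schedule and free parking.",
]
kept = filter_pay_texts(fake_classifier, postings, threshold=0.9)
print(kept)
```

Raising the threshold trades recall for precision; a suitable value depends on the downstream use and should be checked against labeled examples.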

Citation

If you find is_pay useful in your work, please consider citing:

@article{meisenbacher2025extracting,
  title={Extracting O*NET Features from the NLx Corpus to Build Public Use Aggregate Labor Market Data},
  author={Meisenbacher, Stephen and Nestorov, Svetlozar and Norlander, Peter},
  year={2025}
}

Model Details

Model size: 4.39M parameters
Tensor type: F32 (Safetensors)
Base model: lyeonii/bert-tiny