skill-classifier-base-v2

skill-classifier-base-v2 is a lightweight, efficient binary sequence classification model designed for sentence-level skill statement classification. It detects whether a given sentence mentions a skill that might be required on the job. It is fine-tuned from the compact prajjwal1/bert-small model.

Basic Usage

You can deploy this model using the standard Hugging Face text-classification pipeline.

```python
from transformers import pipeline, AutoModelForSequenceClassification, AutoTokenizer

model_name = "loyoladatamining/skill-classifier-base-v2"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Create a text-classification pipeline; inputs longer than 64 tokens are truncated
nlp = pipeline(
    "text-classification",
    model=model,
    tokenizer=tokenizer,
    max_length=64,
    truncation=True,
)

# Inference on a single sentence
text = "Proficient in Python programming, SQL databases, and cloud infrastructure management."
result = nlp(text)
print(result)
```
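Because the model classifies individual sentences, a longer text such as a full job posting would typically be split into sentences first, with each sentence classified on its own. Splitting also keeps each input short, which matters because the pipeline above truncates inputs to 64 tokens. The sketch below assumes a naive regex sentence splitter (a real system might use a proper sentence tokenizer) and takes the classifier as a parameter, so it can be driven by the `nlp` pipeline above or by a stub. The helpers `split_sentences` and `extract_skill_sentences` are hypothetical, not part of the model or of `transformers`.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter (an assumption): break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def extract_skill_sentences(
    text: str,
    classify: Callable[[List[str]], List[dict]],
    threshold: float = 0.5,
) -> List[str]:
    """Return the sentences labeled LABEL_1 with a score of at least `threshold`."""
    sentences = split_sentences(text)
    if not sentences:
        return []
    # A text-classification pipeline called on a list of strings
    # returns one {"label": ..., "score": ...} dict per input.
    results = classify(sentences)
    return [
        sentence
        for sentence, result in zip(sentences, results)
        if result["label"] == "LABEL_1" and result["score"] >= threshold
    ]


# Usage with the pipeline defined above (downloads the model on first use):
# posting = "We are a growing team. Proficient in Python and SQL. Benefits include remote work."
# print(extract_skill_sentences(posting, nlp))
```

For a binary classifier the top label's score is always at least 0.5, so the default threshold simply keeps every sentence predicted as LABEL_1; raising it trades recall for precision.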

Output Format

The model returns a list containing a single classification result with the predicted binary label and its associated confidence score:

```json
[
  {
    "label": "LABEL_1",
    "score": 0.9912
  }
]
```

Label Mapping

  • LABEL_0: The text does not contain any skill statements.
  • LABEL_1: The text contains a skill statement or skill language.
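Downstream code usually wants a boolean rather than the raw `LABEL_0`/`LABEL_1` strings. A minimal sketch, assuming the label mapping above and the single-input output format shown earlier; `LABEL_TO_IS_SKILL` and `to_bool` are hypothetical names, not part of the model or the `transformers` API:

```python
from typing import Dict, List

# Raw pipeline labels mapped to the meanings documented above.
LABEL_TO_IS_SKILL: Dict[str, bool] = {
    "LABEL_0": False,  # no skill statement
    "LABEL_1": True,   # contains a skill statement or skill language
}


def to_bool(result: List[dict]) -> bool:
    """Convert a single-input result such as [{"label": "LABEL_1", "score": 0.99}] to a bool."""
    if len(result) != 1:
        raise ValueError(f"expected exactly one prediction, got {len(result)}")
    label = result[0]["label"]
    if label not in LABEL_TO_IS_SKILL:
        raise ValueError(f"unexpected label: {label!r}")
    return LABEL_TO_IS_SKILL[label]


print(to_bool([{"label": "LABEL_1", "score": 0.9912}]))  # True
```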

Evaluation

The performance of skill-classifier-base-v2 was evaluated against its previous iteration (skill-classifier-base) using the loyoladatamining/usajobs_validation dataset.

The new version substantially outperforms the previous one on the skill detection portion of the dataset:

| Model | Accuracy | F1 |
|---|---|---|
| skill-classifier-base | 0.8335 | 0.8437 |
| skill-classifier-base-v2 | 0.9748 | 0.9749 |
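Accuracy is the fraction of sentences classified correctly; F1 is the harmonic mean of precision and recall. The card does not state whether the reported F1 is computed for the skill class only or averaged across both classes, so the sketch below computes the positive-class (LABEL_1) F1 as one common choice. The labels are made-up toy values, not data from loyoladatamining/usajobs_validation.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the gold labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_positive(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 score for the positive class (here 1 = skill statement, i.e. LABEL_1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels: 1 = skill statement, 0 = no skill statement
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 0, 0]
print(accuracy(y_true, y_pred))     # 0.625
print(f1_positive(y_true, y_pred))  # ≈ 0.571 (precision 2/3, recall 1/2)
```

For binary labels these match scikit-learn's `accuracy_score` and `f1_score` with its default `pos_label=1`.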

Citation

If you find this model useful in your work, please consider citing:

```bibtex
@article{meisenbacher2025extracting,
  title={Extracting O*NET Features from the NLx Corpus to Build Public Use Aggregate Labor Market Data},
  author={Meisenbacher, Stephen and Nestorov, Svetlozar and Norlander, Peter},
  year={2025}
}
```

Model size: 28.8M parameters (F32 tensors, Safetensors format).