skill-classifier-base-v2

skill-classifier-base-v2 is a lightweight binary sequence classification model for sentence-level skill statement classification: it predicts whether a given sentence mentions a skill that might be required on the job. It is built on top of the compact prajjwal1/bert-small model.

Basic Usage

You can run this model with the standard Hugging Face text-classification pipeline:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

model_name = "loyoladatamining/skill-classifier-base-v2"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Create a text-classification pipeline; inputs longer than 64 tokens are truncated
nlp = pipeline(
    "text-classification",
    model=model,
    tokenizer=tokenizer,
    max_length=64,
    truncation=True,
)

# Inference
text = "Proficient in Python programming, SQL databases, and cloud infrastructure management."
result = nlp(text)
print(result)
```

Output Format

For a single input string, the pipeline returns a list containing one classification result: the predicted binary label and its confidence score. For the example above, the output looks like this:

[
  {
    "label": "LABEL_1",
    "score": 0.9912
  }
]
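By default the pipeline reports only the top label, so the score is the probability of whichever label won. The sketch below recovers the probability of LABEL_1 (skill) in either case. It assumes the two class scores come from a softmax and therefore sum to 1; the helper name `skill_probability` is hypothetical and not part of the model or the transformers library.

```python
from typing import Any, Dict, List


def skill_probability(result: List[Dict[str, Any]]) -> float:
    """Return P(LABEL_1) from a single-input text-classification result.

    Assumes the default top-1 output of a two-label model whose scores
    come from a softmax, so the two class probabilities sum to 1.
    """
    top = result[0]
    if top["label"] == "LABEL_1":
        return top["score"]
    # The top label is LABEL_0, so the remaining probability mass is LABEL_1's.
    return 1.0 - top["score"]


print(skill_probability([{"label": "LABEL_1", "score": 0.9912}]))  # 0.9912
print(round(skill_probability([{"label": "LABEL_0", "score": 0.9}]), 4))  # 0.1
```

Recent versions of transformers also accept `top_k=None` in the pipeline call to return scores for both labels directly; check the documentation for your installed version.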

Label Mapping

LABEL_0: The text does not contain any skill statements.
LABEL_1: The text contains a skill statement or skill language.
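Because the model classifies individual sentences, a longer text such as a full job posting has to be split before classification. The sketch below applies the label mapping above to keep only the sentences labeled LABEL_1. The regex sentence splitter is a naive assumption (a proper sentence tokenizer will handle abbreviations better), and the helper names and the keyword stand-in classifier are hypothetical. In practice you would pass the `nlp` pipeline from Basic Usage as `classify`, since a text-classification pipeline called on a list of strings returns one result dict per string.

```python
import re
from typing import Any, Callable, Dict, List

# Label mapping from this model card: LABEL_1 means the sentence contains a skill.
LABEL_TO_IS_SKILL = {"LABEL_0": False, "LABEL_1": True}


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def skill_sentences(
    text: str,
    classify: Callable[[List[str]], List[Dict[str, Any]]],
) -> List[str]:
    """Return the sentences of `text` that `classify` labels as LABEL_1."""
    sentences = split_sentences(text)
    if not sentences:
        return []
    results = classify(sentences)  # one {"label", "score"} dict per sentence
    return [s for s, r in zip(sentences, results) if LABEL_TO_IS_SKILL[r["label"]]]


# Hypothetical keyword stand-in for the real pipeline, for illustration only.
def keyword_stub(sentences: List[str]) -> List[Dict[str, Any]]:
    return [{"label": "LABEL_1" if "Python" in s else "LABEL_0", "score": 1.0}
            for s in sentences]


posting = "We are hiring an analyst. Proficient in Python and SQL. Remote work is available."
print(skill_sentences(posting, keyword_stub))  # ['Proficient in Python and SQL.']
```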

Evaluation

The performance of skill-classifier-base-v2 was evaluated against its previous iteration (skill-classifier-base) using the loyoladatamining/usajobs_validation dataset.

On the skill detection portion of the dataset, the new version substantially outperforms its predecessor, raising accuracy from 0.8335 to 0.9748 and F1 from 0.8437 to 0.9749:

Model	Accuracy	F1
skill-classifier-base	0.8335	0.8437
skill-classifier-base-v2	0.9748	0.9749
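For reference, the sketch below shows how accuracy and F1 are computed from gold and predicted labels. The card does not state which F1 variant was reported (binary, macro, or weighted), so this assumes binary F1 on the positive class LABEL_1 (encoded as 1); it illustrates the metric definitions and is not the authors' evaluation script. The function name `accuracy_and_f1` is hypothetical.

```python
from typing import List, Tuple


def accuracy_and_f1(y_true: List[int], y_pred: List[int], positive: int = 1) -> Tuple[float, float]:
    """Accuracy and binary F1 for the positive class (1 = LABEL_1, 0 = LABEL_0)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return correct / len(y_true), f1


acc, f1 = accuracy_and_f1([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print(round(acc, 4), round(f1, 4))  # 0.6 0.6667
```

scikit-learn's `accuracy_score` and `f1_score` (with its default `average="binary"`) compute the same quantities.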

Citation

If you find this model useful in your work, please consider citing:

@article{meisenbacher2025extracting,
  title={Extracting O*NET Features from the NLx Corpus to Build Public Use Aggregate Labor Market Data},
  author={Meisenbacher, Stephen and Nestorov, Svetlozar and Norlander, Peter},
  year={2025}
}


Model Details

Weights format: Safetensors
Model size: 28.8M parameters
Tensor type: F32
Base model: prajjwal1/bert-small
Fine-tuned from: loyoladatamining/skill-classifier-base (itself fine-tuned from prajjwal1/bert-small)