# A-PROOF ICF-domains Classification

## Description
A fine-tuned multi-label classification model that detects 17 WHO-ICF domains in Dutch clinical text. The model is based on a pre-trained Dutch medical language model (link to be added): a RoBERTa model trained from scratch on clinical notes from the Amsterdam UMC.
## ICF domains
The model can detect 17 categories, which were chosen due to their relevance to recovery from COVID-19:
| ICF code | Domain | Name in repo |
|---|---|---|
| b1300 | Energy level | ENR |
| b140 | Attention functions | ATT |
| b152 | Emotional functions | STM |
| b440 | Respiration functions | ADM |
| b455 | Exercise tolerance functions | INS |
| b530 | Weight maintenance functions | MBW |
| d450 | Walking | FAC |
| d550 | Eating | ETN |
| d840-d859 | Work and employment | BER |
| b280 | Sensations of pain | SOP |
| b134 | Sleep functions | SLP |
| d760 | Family relationships | FML |
| b164 | Higher-level cognitive functions | HLC |
| d465 | Moving around using equipment | MAE |
| d410 | Changing basic body position | CBP |
| b230 | Hearing functions | HRN |
| d240 | Handling stress and other psychological demands | HSP |
## Intended uses and limitations
- The model was fine-tuned (trained, validated and tested) on medical records from the Amsterdam UMC (the two academic medical centers of Amsterdam). It might perform differently on text from a different hospital or text from non-hospital sources (e.g. GP records).
- The model was fine-tuned with the Simple Transformers library. This library is built on top of Transformers, but the model cannot be used directly with Transformers pipelines and classes; doing so would generate incorrect outputs. For this reason, the inference API on this page is disabled.
## How to use
To generate predictions with the model, use the Simple Transformers library:
```python
from simpletransformers.classification import MultiLabelClassificationModel

model = MultiLabelClassificationModel(
    'roberta',
    'CLTL/icf-domains',
    use_cuda=False,
)

# "For 5-6 days now, progressive shortness of breath (already short of breath
# when walking short distances), which was not the case before."
example = 'Nu sinds 5-6 dagen progressieve benauwdheidsklachten (bij korte stukken lopen al kortademig), terwijl dit eerder niet zo was.'

predictions, raw_outputs = model.predict([example])
```
The predictions look like this:
```
[[0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
```
The indices of the multi-label vector correspond to:

```
[ENR-b1300, ATT-b140, STM-b152, ADM-b440, INS-b455, MBW-b530, FAC-d450, ETN-d550, BER-d840-d859, SOP-b280, SLP-b134, FML-d760, HLC-b164, MAE-d465, CBP-d410, HRN-b230, HSP-d240]
```
In other words, the above prediction corresponds to assigning the labels ADM, FAC and INS to the example sentence.
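The mapping from a binary prediction vector to domain names can be sketched with a small hypothetical helper (the `LABELS` list follows the index order documented above; `decode` is not part of the model or library):

```python
# Label names in the index order of the model's prediction vector.
LABELS = [
    "ENR", "ATT", "STM", "ADM", "INS", "MBW", "FAC", "ETN", "BER",
    "SOP", "SLP", "FML", "HLC", "MAE", "CBP", "HRN", "HSP",
]

def decode(prediction):
    """Return the names of the ICF domains whose flag is set to 1."""
    return [label for label, flag in zip(LABELS, prediction) if flag == 1]

# Flags set at the ADM, INS and FAC positions:
print(decode([0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]))
# → ['ADM', 'INS', 'FAC']
```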
The raw outputs look like this:
```
[[0.51907885 0.00268032 0.0030862  0.03066113 0.00616694 0.64720929
  0.67348498 0.0118863  0.0046311 ]]
```
For this model, a label flips from 0 to 1 when its raw output (the per-label probability) exceeds the threshold of 0.5.
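The thresholding step can be sketched as follows; `apply_threshold` is a hypothetical helper, not a library function, shown here on the raw output values above:

```python
def apply_threshold(raw_output, threshold=0.5):
    """Convert per-label probabilities to binary flags (1 if above threshold)."""
    return [1 if p > threshold else 0 for p in raw_output]

probs = [0.51907885, 0.00268032, 0.0030862, 0.03066113, 0.00616694,
         0.64720929, 0.67348498, 0.0118863, 0.0046311]
print(apply_threshold(probs))
# → [1, 0, 0, 0, 0, 1, 1, 0, 0]
```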
## Training data
- The training data consists of clinical notes from medical records (in Dutch) of the Amsterdam UMC. Due to privacy constraints, the data cannot be released.
- The annotation guidelines used for the project can be found here.
## Training procedure
The default training parameters of Simple Transformers were used, including:
- Optimizer: AdamW
- Learning rate: 4e-5
- Num train epochs: 1
- Train batch size: 8
- Threshold: 0.5
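A configuration sketch of how these parameters map onto Simple Transformers, assuming the library's `MultiLabelClassificationArgs`; the base-model path and the training DataFrame are placeholders, not the actual values used in the project:

```python
from simpletransformers.classification import (
    MultiLabelClassificationArgs,
    MultiLabelClassificationModel,
)

# Parameters listed above; all other settings remain at library defaults.
model_args = MultiLabelClassificationArgs(
    num_train_epochs=1,
    learning_rate=4e-5,
    train_batch_size=8,
    threshold=0.5,
)

model = MultiLabelClassificationModel(
    "roberta",
    "path/to/pretrained-dutch-medical-roberta",  # placeholder base model
    num_labels=17,
    args=model_args,
)

# train_df: a pandas DataFrame with columns ["text", "labels"],
# where "labels" is a 17-dim binary list per row.
# model.train_model(train_df)
```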
## Authors and references

### Authors
Jenia Kim, Piek Vossen
### References
Kim, Jenia, Stella Verkijk, Edwin Geleijn, Marieke van der Leeden, Carel Meskers, Caroline Meskers, Sabina van der Veen, Piek Vossen, and Guy Widdershoven. "Modeling Dutch medical texts for detecting functional categories and levels of COVID-19 patients." In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pp. 4577-4585. 2022.
```bibtex
@inproceedings{kim2022modeling,
  title={Modeling Dutch medical texts for detecting functional categories and levels of COVID-19 patients},
  author={Kim, Jenia and Verkijk, Stella and Geleijn, Edwin and van der Leeden, Marieke and Meskers, Carel and Meskers, Caroline and van der Veen, Sabina and Vossen, Piek and Widdershoven, Guy},
  booktitle={Proceedings of the Thirteenth Language Resources and Evaluation Conference},
  pages={4577--4585},
  year={2022}
}
```