When using m3e-base as the embedding model, is there a recommended setting for the text length limit?

#20

by demonai - opened Aug 8, 2023

Discussion

demonai

Aug 8, 2023

When using m3e-base as the embedding model, is there a recommended setting for the text length limit?

MokaHR

Moka HR SaSS org Aug 10, 2023

512 tokens works well; that is the length inputs were truncated to during training.
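
A minimal sketch of what a 512-token cap can look like in practice. It assumes the limit includes the two BERT special tokens ([CLS] and [SEP]), leaving 510 content tokens per sequence; that accounting is an assumption, not something stated in this thread. The helper names `truncate_token_ids` and `chunk_token_ids` are hypothetical and work on plain lists of token ids rather than on any particular tokenizer's output.

```python
from typing import List

MAX_LEN = 512      # limit suggested in the reply above
NUM_SPECIAL = 2    # assumption: one [CLS] and one [SEP] are added per sequence


def truncate_token_ids(token_ids: List[int],
                       max_len: int = MAX_LEN,
                       num_special: int = NUM_SPECIAL) -> List[int]:
    """Keep only as many content tokens as fit once special tokens are added."""
    budget = max_len - num_special
    if budget <= 0:
        raise ValueError("max_len must be larger than num_special")
    return token_ids[:budget]


def chunk_token_ids(token_ids: List[int],
                    max_len: int = MAX_LEN,
                    num_special: int = NUM_SPECIAL) -> List[List[int]]:
    """Split a long sequence into consecutive windows that each fit the limit."""
    budget = max_len - num_special
    if budget <= 0:
        raise ValueError("max_len must be larger than num_special")
    return [token_ids[i:i + budget] for i in range(0, len(token_ids), budget)]


if __name__ == "__main__":
    fake_ids = list(range(1200))  # stand-in for real token ids
    print(len(truncate_token_ids(fake_ids)))
    print([len(chunk) for chunk in chunk_token_ids(fake_ids)])
```

Truncation simply drops everything past the budget, while chunking keeps the whole text at the cost of producing several embeddings per document. If the model is loaded through sentence-transformers, the comparable knob is usually the model's `max_seq_length` attribute.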

06-mingming-Max

Jan 8, 2024

But I see this in m3e-base's tokenizer config:
"clean_up_tokenization_spaces": true,
"cls_token": "[CLS]",
"do_lower_case": true,
"mask_token": "[MASK]",
"model_max_length": 1000000000000000019884624838656,
"pad_token": "[PAD]",
"sep_token": "[SEP]",
"strip_accents": null,
"tokenize_chinese_chars": true,
"tokenizer_class": "BertTokenizer",
"unk_token": "[UNK]"

Doesn't this mean the text length is essentially unlimited?
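
One observation that may help here: the value 1000000000000000019884624838656 is exactly `int(1e30)`, which the transformers library uses as a placeholder when a tokenizer config sets no maximum length. It therefore signals "no limit configured in the tokenizer" rather than "the model accepts unlimited input". The sketch below illustrates how the two limits could be reconciled. It assumes the model's own config reports `max_position_embeddings` of 512, as is typical for BERT-base checkpoints; that value is an assumption here and should be checked against the model's config.json. The helper names `is_unset` and `effective_max_length` are hypothetical.

```python
# Placeholder that transformers assigns when no tokenizer max length is configured.
VERY_LARGE_INTEGER = int(1e30)

# Value copied from the tokenizer config quoted above.
TOKENIZER_MODEL_MAX_LENGTH = 1000000000000000019884624838656

# Assumption: typical BERT-base position-embedding size; verify in config.json.
ASSUMED_MAX_POSITION_EMBEDDINGS = 512


def is_unset(model_max_length: int) -> bool:
    """True when the tokenizer limit is only the library's placeholder value."""
    return model_max_length >= VERY_LARGE_INTEGER


def effective_max_length(model_max_length: int,
                         max_position_embeddings: int) -> int:
    """The usable length is bounded by the smaller of the two limits."""
    return min(model_max_length, max_position_embeddings)


if __name__ == "__main__":
    print(is_unset(TOKENIZER_MODEL_MAX_LENGTH))
    print(effective_max_length(TOKENIZER_MODEL_MAX_LENGTH,
                               ASSUMED_MAX_POSITION_EMBEDDINGS))
```

Under these assumptions, the tokenizer will not truncate on its own unless a maximum length is passed explicitly, so inputs longer than the model's position embeddings need to be truncated or split by the caller.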
