When using m3e-base as an embedding model, is there a recommended setting for the text length limit?
#20
by demonai - opened
When using m3e-base as an embedding model, is there a recommended setting for the text length limit?
512 is a good choice; that is the length inputs were truncated to during training.
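To make "truncated to 512" concrete, here is a minimal sketch of BERT-style truncation on a plain token list. The helper `truncate_tokens` is hypothetical and not part of any library, and it assumes the limit counts the `[CLS]` and `[SEP]` markers, as is usual for BERT tokenizers.

```python
# Hypothetical helper for illustration only; not the model's actual preprocessing code.
def truncate_tokens(tokens, max_length=512, cls_token="[CLS]", sep_token="[SEP]"):
    """Keep at most max_length tokens, counting the [CLS] and [SEP] markers."""
    body_budget = max_length - 2  # reserve two slots for the special tokens
    return [cls_token] + list(tokens[:body_budget]) + [sep_token]


# Stand-in for a long tokenized document (one token per character here).
long_doc = ["字"] * 2000
truncated = truncate_tokens(long_doc)

print(len(truncated))                 # 512
print(truncated[0], truncated[-1])    # [CLS] [SEP]
```

Anything past the budget is simply dropped, so text beyond that point does not influence the embedding. In sentence-transformers, the `max_seq_length` attribute on a loaded `SentenceTransformer` is the usual place to apply such a limit.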
But I see this in m3e-base's tokenizer config:
"clean_up_tokenization_spaces": true,
"cls_token": "[CLS]",
"do_lower_case": true,
"mask_token": "[MASK]",
"model_max_length": 1000000000000000019884624838656,
"pad_token": "[PAD]",
"sep_token": "[SEP]",
"strip_accents": null,
"tokenize_chinese_chars": true,
"tokenizer_class": "BertTokenizer",
"unk_token": "[UNK]"
Doesn't this mean the text length is essentially unlimited?
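A note on that number, as a sketch rather than an authoritative answer: the `model_max_length` value above equals `int(1e30)`, which the transformers library uses as a placeholder when no maximum length was saved with the tokenizer. It therefore likely means "not set" rather than "unlimited". The snippet below checks the equality and shows one way to pick a practical limit. `POSITION_LIMIT = 512` is an assumption based on the reply above and on typical BERT-style encoders, and `effective_max_length` is a hypothetical helper.

```python
# The value from the tokenizer config quoted in this thread.
configured = 1000000000000000019884624838656

# 1e30 is not exactly representable as a float, so int(1e30) has these odd trailing digits.
print(configured == int(1e30))  # True

# Assumption: the encoder was trained with inputs of at most 512 tokens.
POSITION_LIMIT = 512


def effective_max_length(model_max_length, position_limit=POSITION_LIMIT):
    """Hypothetical helper: use the smaller of the tokenizer and encoder limits."""
    return min(model_max_length, position_limit)


print(effective_max_length(configured))  # 512
```

Under that assumption, the tokenizer will not truncate on its own, so the limit has to be set explicitly when encoding.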