Using embeddings to do sentence similarity

#16

by bilalmalik4321 - opened May 18, 2023

May 18, 2023

Has anyone used the embeddings to calculate sentence similarity like the example card? If so, what are the steps you took to do this?

mintujohnson

May 22, 2023

•

edited May 22, 2023

This is actually a straightforward task, thanks to the Hugging Face sentence-transformers utilities.
We just need to encode the sentences and compare their embeddings using a similarity score utility.

Step 1: Encode the sentences to be compared

from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')
embeddings1 = model.encode(sentences1, convert_to_tensor=True)
embeddings2 = model.encode(sentences2, convert_to_tensor=True)

(where sentences1 and sentences2 are lists of sentences, i.e. strings)

Step 2: Compute the similarity using a similarity matrix

(using cosine similarity or dot product; cosine similarity is shown here)

from sentence_transformers import util
cosine_scores = util.cos_sim(embeddings1, embeddings2)
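To make the shape of that result concrete: util.cos_sim returns a matrix in which entry [i][j] is the cosine similarity between embeddings1[i] and embeddings2[j]. The sketch below reproduces that calculation in plain Python on toy 3-dimensional vectors. The helper names (cosine_similarity, cosine_matrix) and the toy vectors are made up for illustration and are not part of sentence-transformers; real embeddings from all-MiniLM-L6-v2 have many more dimensions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Note: a zero vector would cause a division by zero here
    return dot / (norm_u * norm_v)

def cosine_matrix(rows_a, rows_b):
    """Build the pairwise matrix: entry [i][j] compares rows_a[i] with rows_b[j]."""
    return [[cosine_similarity(u, v) for v in rows_b] for u in rows_a]

# Toy vectors standing in for real sentence embeddings (illustrative only)
toy_embeddings1 = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
toy_embeddings2 = [[3.0, 0.0, 0.0], [0.0, 1.0, 1.0]]

toy_scores = cosine_matrix(toy_embeddings1, toy_embeddings2)
for row in toy_scores:
    print(["{:.4f}".format(score) for score in row])
```

Vectors pointing in the same direction score 1.0 regardless of length, and orthogonal vectors score 0.0, which is why cosine similarity is insensitive to embedding magnitude.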

Step 3: Output the pairs with their score

for i in range(len(sentences1)):
    print("{} \t\t {} \t\t Score: {:.4f}".format(sentences1[i], sentences2[i], cosine_scores[i][i]))
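Note that the loop in Step 3 only reads the diagonal of the matrix, so it compares sentences1[i] with sentences2[i] and assumes both lists have the same length. If you instead want every sentence in one list compared against every sentence in the other, you can walk the whole matrix and sort the pairs by score. A minimal sketch follows; the sentences and the score values are made-up placeholders, not output from the model, and the variable names are hypothetical. With the real tensor from Step 2, the same cosine_scores[i][j] indexing should apply.

```python
# Placeholder inputs: invented sentences and an invented score matrix
# shaped like the result of Step 2 (rows: sentences1, columns: sentences2)
example_sentences1 = ["The cat sits outside", "A man is playing guitar"]
example_sentences2 = [
    "The dog plays in the garden",
    "A woman watches TV",
    "Someone plays an instrument",
]
example_scores = [
    [0.28, 0.05, 0.10],
    [0.12, 0.03, 0.61],
]

# Collect every (score, i, j) combination, not just the diagonal
all_pairs = [
    (example_scores[i][j], i, j)
    for i in range(len(example_sentences1))
    for j in range(len(example_sentences2))
]

# Highest similarity first
all_pairs.sort(reverse=True)

for score, i, j in all_pairs:
    print("{} \t\t {} \t\t Score: {:.4f}".format(
        example_sentences1[i], example_sentences2[j], score))
```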

For more details, see the Sentence-Transformers documentation:
https://www.sbert.net/docs/usage/semantic_textual_similarity.html

gerkim62

Aug 19, 2023
