USE – Universal Sentence Encoder

At a high level, the Universal Sentence Encoder (USE) consists of an encoder that maps any sentence to a fixed-length sentence embedding, which can then be used for downstream NLP tasks.

The encoder part comes in two forms, and either of them can be used:

  • Transformer – here the encoder part of the original Transformer architecture is used. It consists of 6 stacked transformer layers, each containing a self-attention module followed by a feed-forward network. The resulting context-aware word embeddings are summed element-wise and divided by the square root of the sentence length to account for differences in sentence length, giving a 512-dimensional sentence embedding as output.
  • Deep averaging network (DAN) – the embeddings of the words and bigrams present in a sentence are averaged together and then passed through a 4-layer feed-forward network to produce a 512-dimensional sentence embedding. The word and bigram embeddings are learned during training. (Both pooling strategies are illustrated in the sketch after this list.)
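
To make the two pooling strategies concrete, below is a minimal NumPy sketch. The random vectors stand in for the learned context-aware word embeddings (transformer variant) and the learned word/bigram embeddings (DAN variant); the shapes, layer weights, and variable names are illustrative assumptions, not USE's actual internals.

Python3

import numpy as np

rng = np.random.default_rng(0)
seq_len, dim = 6, 512                     # toy sentence length and embedding size

# Toy stand-ins for learned representations (illustrative only)
context_aware = rng.normal(size=(seq_len, dim))             # transformer encoder outputs
word_and_bigram = rng.normal(size=(2 * seq_len - 1, dim))   # word + bigram embeddings

# Transformer variant: element-wise sum scaled by sqrt(sentence length)
transformer_sentence_vec = context_aware.sum(axis=0) / np.sqrt(seq_len)

# DAN variant: average the word/bigram embeddings, then pass the result
# through a 4-layer feed-forward network (random toy weights here)
h = word_and_bigram.mean(axis=0)
for _ in range(4):
    W = rng.normal(size=(dim, dim)) * 0.01
    b = np.zeros(dim)
    h = np.maximum(h @ W + b, 0.0)        # ReLU feed-forward layer
dan_sentence_vec = h

print(transformer_sentence_vec.shape, dan_sentence_vec.shape)   # (512,) (512,)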

Training of the USE

The USE is trained on a variety of unsupervised and supervised tasks, such as Skip-Thought-style prediction and natural language inference (NLI), following the steps below.

  • Convert the sentences to lowercase and tokenize them.
  • Depending on the type of encoder, each sentence is converted to a 512-dimensional vector.
  • The resulting sentence embeddings feed the training tasks, and the task loss is used to update the model parameters (a schematic sketch of this idea follows the list).
  • The trained model is then applied to new sentences to produce their 512-dimensional sentence embeddings.
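
The actual USE training setup (a shared encoder trained jointly on Skip-Thought-style, conversational response, and SNLI objectives) is more involved than can be shown here. The following is only a schematic Keras sketch of the idea in the steps above: the sentence embedding feeds a task-specific head, and the task loss updates the encoder's parameters. The vocabulary size, number of classes, and random batch are made-up assumptions.

Python3

import tensorflow as tf

VOCAB_SIZE, EMBED_DIM, NUM_CLASSES = 10_000, 512, 3   # illustrative sizes

# Toy DAN-style encoder: average token embeddings, then a feed-forward layer
encoder = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(EMBED_DIM, activation="relu"),
])

# Task-specific head (e.g. a 3-way classification task); its loss
# backpropagates through the shared encoder and updates its parameters
toy_model = tf.keras.Sequential([
    encoder,
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
toy_model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Dummy batch of integer-encoded (tokenized) sentences and labels
x = tf.random.uniform((8, 12), maxval=VOCAB_SIZE, dtype=tf.int32)
y = tf.random.uniform((8,), maxval=NUM_CLASSES, dtype=tf.int32)
toy_model.fit(x, y, epochs=1, verbose=0)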

(Figure: Training of the encoder)

Python Implementation

We load the Universal Sentence Encoder’s TF Hub module.

  • module_url contains the URL to load the Universal Sentence Encoder (version 4) from TensorFlow Hub.
  • The hub.load function is used to load the Universal Sentence Encoder model from the specified URL (module_url).
  • We define a function named embed that takes an input text and returns the embeddings using the loaded Universal Sentence Encoder model.

Python3

import tensorflow as tf
import tensorflow_hub as hub
import numpy as np

# Load the Universal Sentence Encoder (version 4) from TensorFlow Hub
module_url = "https://tfhub.dev/google/universal-sentence-encoder/4"
model = hub.load(module_url)
print("module %s loaded" % module_url)


def embed(texts):
    # Return a 512-dimensional embedding for each input sentence
    return model(texts)


We compute the similarity scores for our sample data below. Note that scipy's distance.cosine returns the cosine distance, so the cosine similarity is obtained as 1 - distance.cosine(u, v).

Python3

from scipy.spatial import distance

# Test sentence to compare against
test = ["I liked the movie very much"]
print('Test Sentence:', test)
test_vec = embed(test)

# Sample sentences
sentences = [["The movie is awesome and It was a good thriller"],
             ["We are learning NLP throughg w3wiki"],
             ["The baby learned to walk in the 5th month itself"]]

for sent in sentences:
    # Cosine similarity = 1 - cosine distance
    similarity_score = 1 - distance.cosine(test_vec[0, :], embed(sent)[0, :])
    print(f'\nFor {sent}\nSimilarity Score = {similarity_score} ')

Output

Test Sentence: ['I liked the movie very much']

For ['The movie is awesome and It was a good thriller']
Similarity Score = 0.6519516706466675

For ['We are learning NLP throughg w3wiki']
Similarity Score = 0.06988027691841125

For ['The baby learned to walk in the 5th month itself']
Similarity Score = -0.01121298223733902
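
Building on the embed function defined above (this assumes the TF Hub model from the earlier snippet is already loaded), a short optional extension scores every pair of sentences at once by unit-normalizing the embeddings and taking dot products, which is equivalent to cosine similarity.

Python3

import numpy as np

sample = ["I liked the movie very much",
          "The movie is awesome and It was a good thriller",
          "The baby learned to walk in the 5th month itself"]

emb = np.asarray(embed(sample))                          # shape: (3, 512)
emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)   # unit-normalize each row
similarity_matrix = emb @ emb.T                          # cosine similarity of every pair
print(np.round(similarity_matrix, 2))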

Different Techniques for Sentence Semantic Similarity in NLP

Semantic similarity is the similarity between two words, sentences, phrases, or texts. It measures how close or how different two pieces of text are in terms of their meaning and context.

In this article, we focus on how the semantic similarity between two sentences is derived. We cover the following widely used models.

  1. Doc2Vec – an extension of Word2Vec that learns fixed-size embeddings for whole documents or sentences.
  2. SBERT – a Transformer-based model in which the encoder part captures the meaning of the words in a sentence.
  3. InferSent – it uses a bi-directional LSTM to encode sentences and infer semantics.
  4. USE (Universal Sentence Encoder) – a model trained by Google that generates fixed-size sentence embeddings that can be used for many NLP tasks.
