Are BERT and RoBERTa Computationally Efficient for Semantic Textual Similarity?
Google's BERT and other transformer-based models have achieved state-of-the-art performance on numerous problems and opened new frontiers in natural language processing. Later, the Robustly Optimized BERT Pre-training Approach (RoBERTa) presented a replication study of BERT pre-training that measured the impact of key hyperparameters and training data size. Its experiments on the GLUE, RACE, and SQuAD benchmarks achieved state-of-the-art performance. Below are the hyperparameters for fine-tuning and pre-training RoBERTa.
BERT and RoBERTa have set state-of-the-art performance on sentence-pair regression tasks such as semantic textual similarity. However, both sentences must be fed into the network together, which creates massive computational overhead: finding the most similar pair in a collection of 10,000 sentences requires around 50 million inference computations with BERT. This construction makes the models unsuitable for semantic search as well as clustering. Later, Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks was presented at EMNLP 2019 by Nils Reimers and Iryna Gurevych. Like RoBERTa, Sentence-BERT fine-tunes a pre-trained BERT, but it uses siamese and triplet network structures and adds a pooling operation to BERT's output to derive fixed-size sentence embeddings that can be compared in vector space using the cosine similarity function.
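To see where the roughly 50 million figure comes from: comparing every unordered pair in a collection of n sentences requires n(n − 1)/2 forward passes through the cross-encoder. A quick sketch of the arithmetic:

```python
# Every unordered sentence pair must be passed through BERT once,
# so a collection of n sentences needs n * (n - 1) / 2 forward passes.
def pairwise_inferences(n: int) -> int:
    return n * (n - 1) // 2

print(pairwise_inferences(10_000))  # 49,995,000 -- roughly 50 million
```

With Sentence-BERT, by contrast, each sentence is encoded once (10,000 passes) and the pairwise comparison is reduced to cheap cosine similarities between vectors.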
Below is the Colab link for a basic semantic search implementation using Sentence-BERT.
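Independent of the notebook, the core search step can be sketched in plain NumPy, assuming the corpus and query embeddings have already been produced by a Sentence-BERT model (the 4-dimensional vectors below are toy stand-ins, not real model output):

```python
import numpy as np

def cosine_search(query_emb, corpus_embs, top_k=3):
    """Rank corpus embeddings by cosine similarity to the query."""
    # Normalize so the dot product equals cosine similarity.
    q = query_emb / np.linalg.norm(query_emb)
    c = corpus_embs / np.linalg.norm(corpus_embs, axis=1, keepdims=True)
    scores = c @ q
    top = np.argsort(-scores)[:top_k]
    return [(int(i), float(scores[i])) for i in top]

# Toy "embeddings" standing in for real Sentence-BERT output.
corpus = np.array([[1.0, 0.0, 0.0, 0.0],
                   [0.9, 0.1, 0.0, 0.0],
                   [0.0, 1.0, 0.0, 0.0]])
query = np.array([1.0, 0.05, 0.0, 0.0])
print(cosine_search(query, corpus, top_k=2))
```

Because the corpus embeddings are computed once and reused for every query, search cost grows linearly with corpus size instead of quadratically.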
The evaluation was done on common semantic textual similarity tasks. A regression objective operates pairwise, so as the number of sentences grows, the model scales poorly due to the combinatorial explosion of pairs. Using cosine similarity to compare two sentence embeddings is therefore preferable. The experiments were also performed with negative Manhattan and negative Euclidean distances as similarity measures, but the results for all approaches remained roughly the same.
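The three measures are straightforward to compute on a pair of embedding vectors; for the distance-based measures, the sign is flipped so that higher values mean more similar, matching the convention of cosine similarity (the vectors below are toy examples, not real SBERT output):

```python
import numpy as np

a = np.array([0.2, 0.8, 0.1])
b = np.array([0.25, 0.75, 0.05])

cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
neg_manhattan = -np.abs(a - b).sum()    # higher (closer to 0) = more similar
neg_euclidean = -np.linalg.norm(a - b)  # higher (closer to 0) = more similar

print(cosine, neg_manhattan, neg_euclidean)
```

All three induce similar rankings on normalized embeddings, which is consistent with the observation that the choice of measure barely changed the results.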
The following are the results and a detailed analysis of Sentence-BERT on semantic similarity tasks, including a comparison, an ablation study, and an assessment of model performance and efficiency.