The field of Natural Language Processing (NLP) has flourished in computer science over the past decades, largely due to the exponential growth of internet applications such as search engines, social network platforms, chatbots, and the Internet of Things (IoT). The robotics and human-computer interaction fields have also been closely connected to NLP development, exploring ways of human-robot or human-computer communication in natural language. In this work, we address the problem of semantic similarity between text passages, which arises in many NLP applications, including human-computer/robot communication through natural language text. More specifically, we developed three deep learning models for the task: two variations of the Siamese BiLSTM model and a variation of the Simple BiLSTM model. We used two different word embedding techniques: (a) classic token-to-vector embedding using GloVe, and (b) one implementing the encoder part of the BERT model. Finally, we train each model and compare their performance through experimental studies on two datasets, MRPC (MSRP) and Quora, and draw conclusions about the advantages and disadvantages of each. The Siamese BERT-BiLSTM model achieves an accuracy of 83.03% on the Quora dataset, which is comparable to the state of the art.
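To make the Siamese BiLSTM idea concrete, the following is a minimal PyTorch sketch of such an architecture: a single shared BiLSTM encoder is applied to both sentences of a pair, and similarity is scored between the two resulting representations. All names, dimensions, and the mean-pooling/cosine-similarity choices here are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn


class SiameseBiLSTM(nn.Module):
    # Hypothetical sketch: one BiLSTM encoder shared by both input branches
    # (the "Siamese" property); sizes are toy values for illustration.
    def __init__(self, vocab_size=1000, emb_dim=50, hidden=64):
        super().__init__()
        # In the paper, embeddings come from GloVe or a BERT encoder;
        # here a randomly initialized embedding stands in for them.
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)

    def encode(self, x):
        out, _ = self.lstm(self.emb(x))
        return out.mean(dim=1)  # mean-pool the BiLSTM outputs over time steps

    def forward(self, a, b):
        # Both sentences pass through the same weights.
        va, vb = self.encode(a), self.encode(b)
        return torch.cosine_similarity(va, vb, dim=1)


model = SiameseBiLSTM()
a = torch.randint(0, 1000, (2, 7))  # a toy batch of two sentences, 7 token ids each
b = torch.randint(0, 1000, (2, 7))
sim = model(a, b)  # one similarity score per sentence pair
```

In practice the cosine score (or a small classifier head on the pair of vectors) would be trained against the paraphrase labels of MRPC or Quora.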
*** Title, author list, and abstract as they appear in the camera-ready version of the paper provided to the Conference Committee. Small changes made during processing by Springer may not appear here.