19th AIAI 2023, 14 - 17 June 2023, León, Spain

Natural Language Processing for the Turkish Academic Texts in the Engineering Field: Key-term Extraction, Similarity Detection, Subject/Topic Assignment

Bora Kat

Abstract:

  The information retrieved from texts plays a crucial role in many domains. Although there have been significant efforts in natural language processing for various types of Turkish texts, none of them addresses academic texts. This study aims to extract precise key terms from Turkish academic texts in the field of engineering and to develop algorithms for similarity detection and automatic classification based on these key terms. In the first step, a library and customized templates that transform n-grams into structured forms are created, taking into account the features of engineering terminology and the grammar of the Turkish language. Next, a customized similarity detection algorithm is developed. Finally, a Naïve Bayes classifier is used to assign the documents to the appropriate engineering sub-fields. As a case study, the project proposals submitted to the Scientific and Technological Research Council of Turkey (TÜBİTAK) Academic Research Funding Program Directorate (ARDEB) are analyzed. The results indicate that the proposed similarity algorithm correctly detects almost all of the re-submitted proposals, while the accuracy of the classifier is 83.3% for the first prediction and reaches 96.4% within the first three predictions over a sample of 1255 proposals.
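The three-stage pipeline summarized in the abstract (key-term extraction from n-grams, similarity detection, Naïve Bayes assignment with top-3 predictions) can be illustrated with the minimal sketch below. It rests on several assumptions that do not come from the paper. The customized templates are replaced by a plain lookup against a hypothetical term library (`term_library`). The paper's customized similarity algorithm is not described in the abstract, so Jaccard overlap of key-term sets is swapped in as a stand-in. The classifier is a generic multinomial Naïve Bayes with Laplace smoothing. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def ngrams(tokens, n_max=3):
    """Yield all contiguous n-grams (n = 1..n_max) as space-joined strings."""
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def key_terms(text, term_library, n_max=3):
    """Keep only the n-grams found in a (hypothetical) engineering term library.

    Assumption: whitespace tokenization and str.lower(); real Turkish text
    needs locale-aware casing (dotted/dotless I) and morphological handling,
    which the paper's templates address and this sketch does not.
    """
    tokens = text.lower().split()
    return {g for g in ngrams(tokens, n_max) if g in term_library}


def jaccard(a, b):
    """Set overlap of two key-term sets; a stand-in, not the paper's measure."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace smoothing over key-term counts."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)
        self.vocab = set()
        for terms, label in zip(docs, labels):
            self.term_counts[label].update(terms)
            self.vocab.update(terms)
        self.total = {c: sum(self.term_counts[c].values()) for c in self.class_counts}
        return self

    def rank(self, terms, k=3):
        """Return the k most probable classes, best first (log-space scores)."""
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        scores = {}
        for c in self.class_counts:
            s = math.log(self.class_counts[c] / n_docs)
            for t in terms:
                s += math.log((self.term_counts[c][t] + 1) / (self.total[c] + v))
            scores[c] = s
        return sorted(scores, key=scores.get, reverse=True)[:k]
```

Returning a ranked list rather than a single label mirrors the abstract's distinction between accuracy on the first prediction and accuracy within the first three predictions: a document would count as correct at top-3 if its true sub-field appears anywhere in `rank(terms, k=3)`. A re-submission detector could flag pairs whose `jaccard` score exceeds some threshold; the threshold would have to be tuned on data and is not given in the abstract.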
