19th AIAI 2023, 14 - 17 June 2023, León, Spain

Extracting Knowledge from Recombinations of SMILES Representations

Christos Didachos, Andreas Kanavos

Abstract:

  Exploring all possible combinations of the non-common substructure of compounds, expressed as Simplified Molecular-Input Line-Entry System (SMILES) representations, is an essential part of accurate chemical information processing. SMILES is a widely used encoding that represents chemical compounds as strings of characters. In this paper, we present a novel approach that treats SMILES strings as sequences of letters, numbers and symbols in order to extract meaningful knowledge. The approach first identifies the common substructure between two given SMILES strings. For the non-common substructure, it then exhaustively searches all possible combinations of the string characters, at all possible lengths. Finally, of these character combinations, only those that are chemically correct are accepted. The approach thus suggests all possible substructures that could be formed for the non-common part of two compounds, using only the atoms that already exist in the initial non-common substructure. It can generate all possible fragments that could exist for a given non-common substructure while preserving the common substructure, and could be used in drug discovery and other chemical applications.
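
The pipeline summarized in the abstract (find the common part, enumerate recombinations of the remaining characters, keep the acceptable ones) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several points are assumptions made only for illustration: the common substructure is approximated by the longest common contiguous substring, "combinations" is read as ordered arrangements of the characters, and the function names `split_on_common`, `looks_well_formed` and `candidate_fragments` are hypothetical. In particular, `looks_well_formed` is only a crude syntactic placeholder, not a test of chemical correctness; in practice that check would likely be delegated to a cheminformatics toolkit.

```python
from difflib import SequenceMatcher
from itertools import permutations


def split_on_common(a, b):
    """Return (common, rest_a, rest_b) for two SMILES strings.

    Assumption: the common substructure is approximated by the longest
    common contiguous substring of the two character sequences.
    """
    m = SequenceMatcher(None, a, b, autojunk=False).find_longest_match(
        0, len(a), 0, len(b)
    )
    common = a[m.a:m.a + m.size]
    rest_a = a[:m.a] + a[m.a + m.size:]
    rest_b = b[:m.b] + b[m.b + m.size:]
    return common, rest_a, rest_b


def looks_well_formed(fragment):
    """Placeholder syntactic filter; NOT a chemical validity check."""
    depth = 0
    for ch in fragment:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # closing parenthesis without an opening one
                return False
    if depth != 0:  # unclosed branch
        return False
    # Every ring-closure digit must occur an even number of times.
    if any(fragment.count(d) % 2 for d in set(fragment) if d.isdigit()):
        return False
    # Must start with a letter and must not end on a bond symbol.
    return fragment[:1].isalpha() and fragment[-1] not in "=#-"


def candidate_fragments(non_common, is_valid=looks_well_formed):
    """Enumerate arrangements of the characters at every length and
    keep the distinct ones accepted by is_valid."""
    accepted = set()
    for length in range(1, len(non_common) + 1):
        for arrangement in permutations(non_common, length):
            fragment = "".join(arrangement)
            if is_valid(fragment):
                accepted.add(fragment)
    return sorted(accepted)


common, rest_a, rest_b = split_on_common("CC(=O)O", "CC(=O)N")
print(common, rest_a, rest_b)
print(candidate_fragments("C=O"))
```

The number of arrangements grows factorially with the length of the non-common part, so a brute-force enumeration like this is only practical for short strings. The `is_valid` parameter is kept pluggable so that the placeholder can be swapped for a real validity test.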

*** Title, author list and abstract as they appear in the Camera-Ready version of the paper that was provided to the Conference Committee. Small changes that may have occurred during processing by Springer may not appear in this window.