<!-- Paper: 130 -->
|Predicting health outcomes such as disease onset, recovery, or mortality is an important part of medical research. Classical survival analysis methods such as the Cox proportional hazards model have been applied successfully and have proved robust and easy to interpret. Recent advances in computational methods and the digitization of medical records have brought new tools to survival analysis that can handle large data sets with complex non-linear relationships. However, such methods often produce "black box" models that are hard to interpret. In this project we combine the Cox model with tree-based machine-learning algorithms to take advantage of the strengths of both approaches and to boost overall predictive performance. We also aim to preserve the interpretability of the results, quantify the contributions of linear, non-linear, and cross-term dependencies, and gain insight into potential non-linearity. We propose three methods. The first ensembles the Cox model with a survival random forest. The second uses a survival tree algorithm to cluster the data and then fits a separate Cox model in each cluster. The third uses the clusters obtained with a survival tree to identify interaction and non-linear terms and adds them as new terms to the Cox model. We tested the methods on simulated and real-life medical data and compared their internally validated discrimination and calibration. Our results show that classical models outperform the combined methods on data with predominantly linear relationships. The proposed methods were more effective at predicting survival outcomes with strong non-linear and interdependent relationships, and they provided insight into where the non-linearity lies.|
*** Title, author list and abstract as seen in the Camera-Ready version of the paper that was provided to the Conference Committee. Small changes that may have occurred during processing by Springer may not appear in this window.