Research on Cancer Risk and Drug Treatment Efficacy Based on Database Construction and IFPTML Prediction Model

  1. Ren, Shumin
Supervised by:
  1. Julián Dorado Director
  2. Aliuska Duardo Sánchez Co-director

Defence university: Universidade da Coruña

Defence date: 15 May 2024

Committee:
  1. Juan Manuel Ruso Beiras Chair
  2. A. Pazos Secretary
  3. Miren Josune Pérez Estrada Committee member

Type: Thesis

Abstract

In cancer research, databases and predictive models have become essential working tools. The hallmark of cancer is the uncontrolled proliferation of abnormal cells, and understanding its intricate initiation and progression mechanisms is necessary to guide therapeutic interventions. Addressing these challenges involves leveraging abundant data from multiple levels, including genomics, proteomics, clinical documentation, and imaging. These data can be collected and organized in dedicated databases, revealing biological relationships, patterns, and potential biomarkers. Meanwhile, with the advancement of machine learning models, combining different levels of biological and genetic information enhances the capacity to predict and evaluate cancer risk and prognosis more precisely. However, although some databases on cancer risk and pharmacogenomics currently exist, knowledge bases for comparing and evaluating cancer risk models, as well as databases of drug treatment effects based on multi-omic molecules, are still missing. To address this research gap, we established the Cancer Risk Prediction Model Knowledge Base (CRPMKB) and the Prostate Cancer Treatment Efficacy Knowledge Base (PCaTEKB). The creation of these two databases, accessible through public websites, has facilitated research on cancer risk prevention, personalized treatment, and cancer mechanisms. They also provide resources for the subsequent development of interpretable models. We also introduced the Information Fusion, Perturbation Theory, and Machine Learning (IFPTML) algorithm, which can establish predictive models that integrate various types of inputs. Using this algorithm, we developed an IFPTML model based on data from the ChEMBL database and PCaTEKB. This model is designed to predict therapeutic outcomes of drugs related to prostate cancer. Subsequently, we deployed this model on the PCaTEKB website.
Finally, we discussed the FAIR principles, which have been gaining ground as a paradigm for the ethical management of data in scientific research. We also discussed their evolution, incorporation into legislation, extension to research software, and practical application to our knowledge database.