Nonparametric inference for classification and association with high dimensional genetic data
- García-Magariños, Manuel
- Antonio Salas Ellacuriaga Director
- Wenceslao González Manteiga Director
- Ricardo Cao Abad Director
Defence university: Universidade de Santiago de Compostela
Defence date: 29 January 2010
- Ángel Carracedo Álvarez Chair
- Carmen María Cadarso Suárez Secretary
- Ignacio López de Ullibarri Galparsoro Committee member
- Vincent Macaulay Committee member
- Thore Egeland Committee member
Type: Thesis
Abstract
In recent years, advances in genetics have brought about a revolution that has spread beyond the field itself and is shaping the future of many other scientific areas. The boom in genetics has produced countless high-dimensional datasets containing DNA/RNA profiles, and statistics is the science required to deal with them. Not only do new tools need to be developed, but existing methods can also be adapted, and their performance evaluated, for application to genetic data. The term genetic data covers a wide variety of datasets whose only common feature is that they derive from DNA information: from SNPs (categorical data) to gene expression measures (continuous data). This DNA information could hold the answer to many common diseases with a complex basis (psychiatric disorders, cancer, diabetes, etc.), so the main aim of statistics is to provide proper, powerful techniques able to unravel the underlying nature of complex diseases. This thesis presents several statistical approaches to both gene expression data and SNP/STR data, including penalized regression, machine learning and tree-based methods. Although the emphasis lies on clinical genetics, statistical tools for population and forensic genetics are also explained.