Bridging the gap between SNPs and STRs: insertion deletion polymorphisms in forensic genetics, principles and applications

  1. Lebreiro Pereira, Rui Manuel
Supervised by:
  1. Leonor Gusmão Supervisor
  2. Ángel Carracedo Álvarez Supervisor

University of defence: Universidade de Santiago de Compostela

Date of defence: 20 December 2011

Examination committee:
  1. María Victoria Lareu Huidobro Chair
  2. Paula Sánchez Diz Secretary
  3. A. Amorim Member
  4. Lourdes Prieto Solla Member
  5. Cristian Capelli Member

Type: Thesis

Teseo: 315848 DIALNET

Abstract

Introduction

Genetic variation in the human genome

One of the main characteristics of human populations is the extensive genetic variation existing between individuals. This variability originates from mutation and is shaped by evolutionary forces such as selection, genetic drift and recombination. As a result, all individuals (with the exception of monozygotic twins) have a unique genome harbouring characteristic variants. The first draft of the human genome DNA sequence ten years ago (Lander et al. 2001; Venter et al. 2001) marked the beginning of a new era, bringing new insights into the content and arrangement of the DNA sequence in the cell. The human haploid genome contains about 3.2 × 10^9 bases. Only a small fraction of it (about 2%) consists of coding DNA, while the vast majority is non-coding DNA, including the intronic regions of genes, heterochromatin and a large number of repetitive sequences (Lander et al. 2001; Venter et al. 2001). The initial sequencing and analysis of the human genome provided comprehensive information on already known types of DNA polymorphisms, for example simple repeat sequences, including minisatellites and microsatellites or Short Tandem Repeats (STRs), and Single Nucleotide Polymorphisms (SNPs) (Lander et al. 2001; Sachidanandam et al. 2001; International HapMap Consortium 2003; 2005). Later, the availability of a high quality human reference sequence (International Human Genome Sequencing Consortium 2004) triggered research on other sources of genetic variation that are likely to be important factors underlying inherited traits and diseases in humans, namely the discovery and characterization of structural variants, comprising larger forms of genomic alteration involving segments of DNA longer than 1 kb (Feuk et al. 2006; Conrad & Hurles 2007).
Structural variation harbours a heterogeneous group of variants which include deletions, duplications and large-scale Copy Number Variants (CNVs) as well as insertions, inversions and translocations (Feuk et al. 2006; Conrad & Hurles 2007). In recent years several studies have dedicated particular attention to these types of structural variants, with special focus on CNVs (Redon et al. 2006; Conrad et al. 2010; Sudmant et al. 2010), and also to insertion deletion polymorphisms (indels) (Weber et al. 2002; Mills et al. 2006; 2011).

DNA polymorphisms as genetic markers

The general properties of a DNA polymorphism depend greatly on the mutational mechanism underlying its origin and on the rate at which such mutational events shape its evolution over time. Therefore, different types of genetic markers present specific characteristics that define their usefulness in different fields of genetic research.

- Short Tandem Repeats (STRs) are among the most variable loci in the genome and have been especially useful in population genetic studies for characterizing the structure and demographic history of human populations (e.g. Rosenberg et al. 2002; 2005; Tishkoff et al. 2009). Also, because STRs are highly polymorphic and predominantly multi-allelic, this type of genetic marker is very informative and widely used in forensic genetics (Ruitberg et al. 2001; Jobling & Gill 2004; Butler 2005; Carracedo & Sánchez-Diz 2005). Other features that make STRs desirable markers in forensics are their technical simplicity, due to rapid analysis through PCR and capillary electrophoresis with automated fluorescent detection, and their amenability to multiplexing, which allows the simultaneous study of a high number of loci in a single reaction.

- Single Nucleotide Polymorphisms are the simplest form of sequence variation, originating from single base substitutions.
Mutational events such as base substitutions occur with a very low frequency (2.3 × 10^-8; Nachman & Crowell 2000), so that it is highly unlikely that recurrent mutations occur over the time scale of modern human evolution. In practice, this class of mutations will show identity by descent rather than identity by state and, therefore, a higher geographical specificity. By their nature, these mutations create polymorphisms with two allelic states (ancestral and mutant) and are therefore called binary or biallelic polymorphisms. Since SNPs are more stable than STRs, they are useful for studying the evolutionary history of human populations on deeper time scales (e.g. Jakobsson et al. 2008; Li et al. 2008). In forensics, despite presenting lower genetic diversity than STRs, SNPs have several desirable features as genetic markers: they can be analyzed in very short fragments, which is very important for improving the amplification success of highly degraded DNA; they have very low mutation rates (Nachman & Crowell 2000), which is advantageous in kinship testing; and they are suitable for automation and analysis with high throughput technologies (Sobrino et al. 2005; Kim & Misra 2007; Perkel 2008; Ragoussis 2009).

- Insertion deletion polymorphisms (indels) are DNA length variations created by the insertion or deletion of one or more nucleotides in the genome sequence. After SNPs, indels are reported as the most abundant form of variation in the genome (Weber et al. 2002; Mills et al. 2006). In 2002, Weber et al., in one of the earliest genome-wide indel discovery efforts, identified and characterized 2000 biallelic indels distributed throughout the human genome, which varied greatly in length difference between alleles (Weber et al. 2002). Since then, a number of studies have been published using indels for a variety of purposes such as addressing the genetic structure of human populations (Rosenberg et al. 2005; Bastos-Rodrigues et al. 2006; Tishkoff et al.
2009), inferring individual and population ancestry proportions (Yang et al. 2005), or as useful genetic markers in the analysis of natural populations (Väli et al. 2008) and in species identification (Pereira et al. 2010). In 2006, Mills et al. conducted a milestone study to identify indels, and reported an initial map of insertion and deletion variation in the human genome containing more than 415000 unique polymorphisms (Mills et al. 2006). According to this comprehensive survey, indels represent ~16% of all human DNA polymorphisms and are widely spread throughout the entire genome, with an average density of one indel per 7.2 kb. About one third of the reported indels were identified within known genes, of which ~3.7% were located in exons and promoter regions (Mills et al. 2006). A special class of indels including insertions/deletions of apparently random DNA sequences represents about 41% of all indels and harbours polymorphisms with a wide range of allele length variation, from 2 bp up to about 10 kb. Interestingly, almost all of these indels (more than 99%) are under 100 bp in length (Mills et al. 2006). In more recent studies this particular class of indels containing random DNA sequences is often termed "small indels", as opposed to larger forms of insertions and deletions or multibase pair expansions of repeat units (Mullaney et al. 2010; Mills et al. 2011). Throughout this thesis, we likewise adopted the term "indel" or "small indel" in a stricter sense to refer to the particular class of indels that contain apparently random DNA sequences. Although a number of other reports also dedicated attention to the identification of indels (Bhangale et al. 2005; Korbel et al. 2007; Kidd et al. 2008), it was only very recently that a comprehensive follow-up study by Mills et al. (2011) reported almost two million small indels in the range of 1 bp to 10000 bp, from the analysis of 79 diverse humans.
The information on these polymorphisms was also included in dbSNP, contributing to enhance the still very scarce resources on indels. As an example, a similar query performed for indels with the same limits as shown for SNPs (i.e., human and validated by frequency, the HapMap or 1000 Genomes) returns only 34823 indels, with only 26515 of them having a reported heterozygosity higher than 0.30. This probably reflects the lower attention dedicated to discovering insertion deletion polymorphisms, along with the challenging task of accurate multiple alignment allowing for gaps. Accompanying population studies are also highly desirable (e.g. Weber et al. 2002) for guidance in future investigations. New hope comes from the 1000 Genomes Project (1000 Genomes Project Consortium 2010), as more data are released to the research community. Because the project comprises a high number of characterized individuals from different populations, it should become feasible to derive precise allele frequency estimates for such markers, which will be extremely useful in future studies. Because of their reduced length, small indels are amenable to analysis through simple PCR amplification and electrophoresis, making them practical genetic markers that are easily analyzed with resources commonly available in molecular genetics laboratories, but also with high throughput genotyping platforms similar to those developed for SNPs (Weber et al. 2002; Yang et al. 2005; Mills et al. 2006; Mullaney et al. 2010; Mills et al. 2011). From the above, it is possible to conclude that small indels have a huge potential which is still to be fully unveiled. Moreover, their interesting features as genetic markers are evident, particularly in forensics, and should be explored as extensively as those of other forms of natural genetic variation.
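The heterozygosity threshold mentioned above can be made concrete with a minimal sketch. It assumes Hardy-Weinberg proportions, under which a biallelic marker with allele frequency p has expected heterozygosity 2p(1 - p), so that no biallelic marker can exceed 0.5. The marker names and frequencies below are hypothetical illustration values, not data from the thesis or from dbSNP.

```python
def expected_heterozygosity(p: float) -> float:
    """Expected heterozygosity 2p(1-p) of a biallelic marker (Hardy-Weinberg assumed)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    return 2.0 * p * (1.0 - p)


def select_informative(freqs: dict, min_het: float = 0.30) -> list:
    """Names of markers whose expected heterozygosity exceeds min_het, sorted by name."""
    return sorted(
        name for name, p in freqs.items() if expected_heterozygosity(p) > min_het
    )


# Hypothetical insertion-allele frequencies for three made-up markers.
example_freqs = {"indelA": 0.50, "indelB": 0.20, "indelC": 0.10}
selected = select_informative(example_freqs)
```

With these made-up values only the first two markers pass the 0.30 cut-off, because a frequency of 0.10 gives an expected heterozygosity of only 0.18.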

Localization in the genome, mode of inheritance and applicability

The forms of genetic variation that we have detailed occur ubiquitously throughout the entire nuclear genome and some also in the mitochondrial genome. Due to the different modes of inheritance of autosomes, sex chromosomes and mitochondria (Figure 2), the localization of a genetic marker in the genome has an important impact on its characteristics and, consequently, on its applicability. The vast majority of the existing genetic variation occurs in autosomes, which represent about 93% of the whole human genome (International Human Genome Sequencing Consortium 2004). The genetic information on the autosomes is transmitted from parents to offspring in a typical Mendelian mode of inheritance, with equal contribution from the mother and the father. The shuffling of genetic information is guaranteed by recombination in every generation, leading to a progressive breakdown of linkage disequilibrium. Autosomes constitute the main source of genetic markers used in very diverse applications. For instance, in population genetics they constitute the only genetic information in which both male and female histories are equally represented. In forensics, autosomal markers are the only ones useful in any identification scenario, although other types of markers can provide valuable additional information in some specific situations. The X chromosome presents a peculiar sex-biased mode of inheritance: females carry two X chromosomes and show an inheritance mode similar to autosomes, while males have a single copy of the X chromosome that is transmitted almost unchanged to female descendants (only the pseudo-autosomal regions (PARs) maintain homology with the corresponding Y chromosome PARs and recombine during male meiosis).
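The consequence of this inheritance mode for kinship testing can be sketched as follows: a daughter must carry her biological father's single X allele at every X-linked marker outside the PARs. The sketch below is only an illustration under simplifying assumptions (no mutation, no genotyping error, markers outside the PARs); the locus names and the allele coding "S"/"L" for the short and long indel alleles are hypothetical.

```python
def inconsistent_loci(father: dict, daughter: dict) -> list:
    """Loci at which the daughter does not carry the alleged father's X allele.

    father:   locus name -> the single allele on his X chromosome
    daughter: locus name -> tuple with her two alleles
    Only loci typed in both individuals are compared.
    """
    return sorted(
        locus
        for locus, allele in father.items()
        if locus in daughter and allele not in daughter[locus]
    )


# Hypothetical genotypes: "S" = short (deletion) allele, "L" = long (insertion) allele.
father = {"X1": "S", "X2": "L", "X3": "L"}
daughter = {"X1": ("S", "L"), "X2": ("S", "S"), "X3": ("L", "L")}
print(inconsistent_loci(father, daughter))  # prints ['X2']
```

In real casework such an inconsistency would be weighed statistically rather than treated as an automatic exclusion.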
Of interest in forensics, the X chromosome shows higher efficiency than autosomes in specific kinship cases involving mainly female offspring, and it proves useful in reconstructing haplotypes in so-called deficient relationship cases (Szibor et al. 2003a; 2003b; Szibor 2007). Most importantly, X chromosome markers can complement the analysis with autosomal STRs whenever the latter proves insufficient or non-informative.

Objectives

Forensic genetic studies predominantly use STRs or SNPs depending on the particular characteristics of the investigation. STR multiplexes are the standard strategy in normal cases as they are very discriminating and easily analyzed. Nevertheless, STRs are difficult to genotype in highly degraded DNA given the need for longer intact fragments to serve as template for successful amplification. SNPs were primarily introduced in forensic practice with the aim of overcoming this limitation. They can be analyzed in much shorter fragments, thus enhancing the successful amplification of degraded DNA. In addition, SNPs have much lower mutation rates than STRs, which can be advantageous in kinship testing. However, in spite of their proven better performance with degraded DNA and their utility as a complementary tool in kinship investigations, the implementation and use of SNP assays in forensic laboratories is limited, mainly due to the laborious analytic workflows and/or the need to implement new genotyping technologies. Considering this context, our project aimed to develop a bridging approach between the application of SNPs and STRs by using small insertion deletion polymorphisms. Small indels have great potential as genetic markers in forensics since they can combine desirable characteristics of both SNPs (possibility of analysis in very short fragments and low mutation rates) and STRs (simple analysis by fragment size).
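To illustrate what "simple analysis by fragment size" means for a biallelic indel, the following minimal sketch calls a genotype from the fragment lengths detected after electrophoresis. It is not the genotyping software used in the thesis; the marker name, the expected amplicon sizes and the 0.5 bp sizing tolerance are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndelMarker:
    name: str
    short_bp: float  # expected amplicon size of the deletion (short) allele
    long_bp: float   # expected amplicon size of the insertion (long) allele


def call_genotype(marker: IndelMarker, peaks_bp: list, tol_bp: float = 0.5) -> str:
    """Call a biallelic indel genotype from observed fragment sizes (in bp)."""
    has_short = any(abs(p - marker.short_bp) <= tol_bp for p in peaks_bp)
    has_long = any(abs(p - marker.long_bp) <= tol_bp for p in peaks_bp)
    if has_short and has_long:
        return "S/L"  # heterozygote: both fragment sizes observed
    if has_short:
        return "S/S"
    if has_long:
        return "L/L"
    return "no call"  # no peak within tolerance of either expected size


# Hypothetical marker with a 4 bp length difference between alleles.
marker = IndelMarker("hypothetical_indel", short_bp=100.0, long_bp=104.0)
genotype = call_genotype(marker, [100.2, 103.9])
```

Because each marker has only two expected fragment sizes, several markers can share one dye channel as long as their size ranges do not overlap, which is what makes multiplexing in short amplicons practical.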
In brief, by using small indels it would be possible to develop genetic tools with the same potential as SNPs, while at the same time keeping the simple and widely established analytic methodology of STRs. Being eminently technical and methodological in character, the main goal of this work was to develop a set of simple genetic tools based on indel multiplexing, easily implementable and useful in a wide range of applications in the fields of population and forensic genetics. In pursuing this global aim, intermediate objectives were defined:

i) To evaluate the forensic informative power of SNPs in the Portuguese context by, for the first time, implementing a forensic SNP assay and creating a reference population database. At the beginning of this project, forensic SNP assays and population data did not exist in Portugal; therefore, before developing new techniques, our first objective was to meet that urgent need and promptly provide a forensic tool for the analysis of degraded samples;

ii) To select and optimize the multiplexed amplification of a set of autosomal indels highly informative for forensic applications (human identification and kinship testing). This objective involves the selection of small biallelic indels spread throughout the entire human genome and presenting high heterozygosity in the major human population groups, as well as the development of a multiplex reaction for their simultaneous genotyping, applying a short amplicon strategy suitable for the analysis of degraded DNA;

iii) To select and optimize the multiplexed amplification of a set of X chromosome specific indels.
This objective comprises the selection of small indels spread along the X chromosome and presenting high heterozygosity in the major human population groups, as well as the development of a multiplex reaction for their simultaneous genotyping, applying a short amplicon strategy suitable for the analysis of degraded DNA;

iv) To select and optimize the multiplexed amplification of a set of ancestry informative indels. This objective includes the selection of small indels showing high allele frequency differentials between populations of different continental origins, as well as the development of a multiplex reaction for their simultaneous genotyping;

v) To genetically characterize the selected indel markers in a large number of samples from different worldwide populations (such as Africa, Europe, East Asia and America) using the multiplex systems developed in the previous points. This objective entails a comprehensive collection of population data in order to evaluate the genetic diversity of the selected indels in major human population groups;

vi) To evaluate the forensic efficiency of the different indel panels. This objective implies a thorough analysis of the population data generated in the previous point, including the assessment of statistical parameters of forensic interest and the accuracy in the assignment of biogeographic origin;

vii) To test the applicability of small indels in defined real cases and evaluate the results obtained against those expected, considering the target purposes of the different indel sets.

Results and discussion

At the beginning of this project, the typing methodologies adopted by forensic laboratories included the well-established STRs and also SNPs, the latter still in a development phase. Only months before, in a collaborative effort involving five forensic institutions, the SNPforID project group (www.snpforid.org) had reported a multiplex assay comprising 52 autosomal SNPs for human identification (Sanchez et al. 2006).
With the SNPforID forensic assay available to the scientific community at the time, our immediate goal was to implement the method in our laboratory and genetically characterize the Portuguese population for the 52 SNPs (Pereira et al. 2008a, Article 1). This work constituted a first step in the implementation of SNP assays in Portuguese forensic genetics laboratories and established a national reference database, thus laying the foundations for an effective application of SNPs to forensic casework. Furthermore, since population data on autosomal SNP variation in Africa were very limited (only three African populations, from Somalia, Mozambique and Nigeria, were included in the SNPforID population frequency browser; http://bioinformatics.cesga.es/snpforid/), we were encouraged to study two more population samples from sub-Saharan Africa available in our laboratory: Angola and Uganda. The characterization of the 52 SNPs in these populations (Pereira et al. 2008b, Article 2) allowed us to expand the coverage of the African continent regarding autosomal SNP genetic diversity and provided an extra contribution to the SNPforID population frequency browser. The genetic characterization of the 52 SNPs revealed slightly higher levels of genetic diversity in the Portuguese population than in sub-Saharan Africans for this set of markers. Nonetheless, the global random match probabilities achieved with the 52 SNP set were highly informative in all populations, ranging in order of magnitude from 10^-16 to 10^-20, clearly proving the value of the SNP set for forensic applications in these populations (Pereira et al. 2008a; 2008b). In comparison with previous experience with STRs, the only criticism of these methodologies relates to the laborious and time-consuming analytic workflow that SNaPshot™ implies.
The protocol necessarily involves multiple steps (amplification, purification of amplified products, SBE reaction and final purification before capillary electrophoresis), consequently increasing manipulation and the number of possible variables affecting the end result. Also, the genotyping of alleles is significantly more complex than with STRs, mainly due to the SNaPshot™ chemistry used, which emits fluorescence signals of different intensity depending on the dye label attached to each dideoxynucleotide. In sum, although based on more complex techniques than those involved in the study of STRs or indels, the implementation of an SNP based tool proved to have an informative power that can be helpful in some forensic contexts. Subsequently, this thesis aimed to develop a bridging approach between the application of SNPs and STRs, taking advantage of small indels that equal the informative potential of SNPs while being easily analyzed by a single PCR and capillary electrophoresis. In order to cover the different needs of the forensic genetics field as much as possible, we focused on three core areas:

- highly discriminating autosomal markers;
- highly discriminating X chromosome specific markers;
- ancestry informative markers.

Highly discriminating autosomal markers

In this section of the thesis we managed to challenge the status quo by capturing the main advantages of SNPs for forensic applications in a simple assay that is sufficiently informative for most forensic cases and, at the same time, straightforward to implement in any laboratory (Pereira et al. 2009a, Article 3; Pereira & Gusmão in press, Book chapter).
By collecting the information available on indels showing short, well-defined allele length variation, as well as the population data available in the literature and online databases, we were able to screen a pool of genetic markers with desirable characteristics for forensic applications: small indels showing high heterozygosity levels in major population groups. During the execution of this project, it became evident that far less information is available for indels than for SNPs (see Introduction) or even STRs. A high number of STRs have been indicated for forensic practice (see for example Ruitberg et al. 2001; Butler 2005), and SNPs, for their part, underwent a revolution in the last few years, catching the wave of the International SNP Map Working Group and the International HapMap project, along with the extraordinary developments in genotyping technologies providing accurate data for several million SNPs throughout the genome (Sachidanandam et al. 2001; International HapMap Consortium 2003; 2005; Kim & Misra 2007; Perkel 2008; Ragoussis 2009). Still, based essentially on the pioneering work by Weber et al. (2002), we were able to select a set of small indels across the genome that are highly informative for forensic purposes, and subsequently to optimize the genotyping of all markers in a single PCR multiplex followed by electrophoresis. This work demonstrated that a proper assessment of the existing data, allied to a careful in silico design, could put into practice all the power that indels can offer in terms of short amplicon strategy and multiplexing capability in a single reaction (Pereira et al. 2009a). The set of 38 short biallelic markers has representative indels from all autosomes and achieves good levels of discrimination, adequate for most forensic practice.
Even though the accumulated Random Match Probabilities (acRMP) do not equal those of standard STR kits when full profiles are obtained, it is noteworthy that the acRMP of the 38 indelplex exceeds by about four orders of magnitude the efficiency parameters of its most "direct competitor", the mini-STR kit Minifiler (Applied Biosystems) (Mulero et al. 2008), which, like the small indel multiplex, is intended for the analysis of highly degraded samples. Moreover, indel systems are also starting to be viewed by biotechnology companies as promising tools to enter forensic practice in the near future. Namely, a commercial kit comprising 30 small indels distributed over 19 autosomes and analyzable in short amplicons became available to the scientific community in late 2009 (Mentype® DIPplex PCR Amplification Kit, Biotype, Dresden, Germany; Investigator DIPplex Kit, Qiagen, Hilden, Germany). The combined probability of identity values for the panel were reported only for Europeans, proving to be around 100 times less discriminating than the 38 autosomal indel set developed during this thesis (Pereira et al. 2009a) for the same population group (3.57 × 10^-15). Conversely, an attempt to commercialize a subset of 48 SNPs from the SNPforID assay (GenPlex™ HID System, Applied Biosystems; Phillips et al. 2007a) proved unsuccessful. Furthermore, during this phase, the performance of the indel multiplex was tested in several compromised samples from forensic casework (skeletal remains such as bones and dental pulp extracts) and also in challenging materials from clinical research (such as paraffin-embedded tissues). The short amplicon strategy employed with small indels yielded full profiles where STRs had failed to provide sufficient information, highlighting the usefulness of these markers in forensic samples or samples of limited quality/quantity (Pereira et al. 2009a; 2009b; Oliveira et al. 2009) (Articles 3, 4 and 5).
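The accumulated random match probability discussed above can be sketched for biallelic markers as follows. The sketch assumes Hardy-Weinberg proportions and independent loci, takes the per-locus match probability as the sum of squared genotype frequencies, and multiplies across loci; it is an illustration of the principle, not the exact procedure used in the cited articles, and the frequencies are hypothetical.

```python
import math


def locus_match_probability(p: float) -> float:
    """Probability that two random individuals share a genotype at one biallelic locus.

    Genotype frequencies under Hardy-Weinberg are p^2, 2pq and q^2; the match
    probability is the sum of their squares.
    """
    q = 1.0 - p
    return (p * p) ** 2 + (2.0 * p * q) ** 2 + (q * q) ** 2


def accumulated_rmp(freqs: list) -> float:
    """Product of per-locus match probabilities (independent loci assumed)."""
    return math.prod(locus_match_probability(p) for p in freqs)


# Hypothetical allele frequencies for a small four-marker panel.
panel = [0.50, 0.45, 0.60, 0.35]
rmp = accumulated_rmp(panel)
```

A biallelic locus is most informative at p = 0.5, where the per-locus match probability is 0.375; this is why panels of biallelic markers need many more loci than multi-allelic STR kits to reach comparable discrimination.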
The application of the indelplex clearly overcame the evident difficulties of STRs in extracts from paraffin-embedded tissues and, furthermore, served as a control for the possible occurrence of stochastic allele dropout events in this type of limited sample, which could otherwise bias the loss of heterozygosity (LOH) analysis also under investigation for the same samples and patients (Oliveira et al. 2009, Article 5). The usefulness of stable, low mutating indels in kinship investigations was also emphasized in Pereira et al. (2009b, Article 4) as an efficient complementary tool to STR analysis in more complex cases where microsatellites alone provide insufficient evidence, as is the case for SNPs. This section provides an important example of the application of indel markers in human identification, particularly in challenging DNA samples. The main advantage of the indel approach over alternative SNP assays can ultimately be summarized in a single word: simplicity. Importantly, in this case simplicity means ease of use and time and cost effectiveness; most decisively in forensics, it considerably reduces the steps involved in the genotyping of an informative set of biallelic genetic markers. The direct workflow minimizes manipulation and the risks of contamination or sample mix-ups, and reduces to a minimum the number of variables affecting the end results.

X chromosome indels (X-indels)

Following the same rationale of bridging the gap between SNPs and STRs, in this part of the project we explored the potential of small X-indels. In a first approach to the use of X-indels we were able to multiplex 13 markers in a simple PCR reaction and use this to ascertain the interethnic admixture proportions in an urban population from northeastern Brazilian Amazonas (Ribeiro-Rodrigues et al. 2009, Article 6).
Our results show an increased Native American and African ancestry proportion accompanied by a lower European influence in comparison with autosomal estimates for the same region (Santos & Guerreiro 1995), thus corroborating the common sex-biased mating pattern reported for Latin American admixed populations. Moreover, the results obtained for the Native American component as recorded by the X chromosome were in perfect agreement with the expected value based on previous mtDNA and Y-chromosome data from this ancestral contributor group (Santos et al. 1999). In a second approach, subsequent efforts were made to provide a more complete tool to the scientific community. During this thesis we were able to set up a general tool for population and forensic genetic analyses comprising 32 X-indels in a single multiplex reaction using a short amplicon strategy (Pereira et al. 2011, Article 7). The developed panel allows easy access to X chromosome information, especially useful in complex kinship investigations in which the commonly available tools provide limited information. The X-indel system implemented in our work combines a short amplicon strategy, simplicity of analysis and good multiplexing capacity in a single reaction, representing a step forward in relation to currently existing biallelic X marker sets (e.g. Edelmann et al. 2009; Freitas et al. 2010; Resque et al. 2010; Tomas et al. 2010).

Ancestry Informative indels (AIM-indels)

Ancestry Informative Markers (AIMs) show high allele frequency divergence between different ancestral or geographically distant populations and are especially useful to infer the ancestral origin of individuals and estimate ancestry proportions in admixed individuals and populations.
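A common way to quantify the allele frequency divergence that defines an AIM is the absolute frequency differential between two populations. The minimal sketch below ranks candidate markers by that differential; it is not the selection procedure of the thesis, and the marker names, population labels, frequencies and the 0.4 cut-off are hypothetical.

```python
def frequency_differential(p_a: float, p_b: float) -> float:
    """Absolute difference in the frequency of one allele between two populations."""
    return abs(p_a - p_b)


def rank_aims(freqs: dict, pop_a: str, pop_b: str, min_delta: float = 0.4) -> list:
    """Markers with differential >= min_delta between pop_a and pop_b, largest first.

    freqs maps marker name -> {population label -> allele frequency}.
    """
    scored = [
        (name, frequency_differential(by_pop[pop_a], by_pop[pop_b]))
        for name, by_pop in freqs.items()
    ]
    kept = [(name, d) for name, d in scored if d >= min_delta]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Hypothetical insertion-allele frequencies in two made-up reference populations.
candidates = {
    "m1": {"POP_A": 0.90, "POP_B": 0.10},
    "m2": {"POP_A": 0.50, "POP_B": 0.45},
    "m3": {"POP_A": 0.20, "POP_B": 0.70},
}
ranked_names = [name for name, _ in rank_aims(candidates, "POP_A", "POP_B")]
```

Here the marker "m2" is dropped because its frequencies are nearly identical in both populations, so it carries almost no ancestry information even though it would be highly heterozygous and therefore useful for identification.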
Clearly acknowledging the great value of AIMs in a wide range of applications, as demonstrated by several AIM-SNP panels in different studies, this last section of the thesis focused on the development of AIM sets that would be easy to use and implement in regular molecular or forensic genetics laboratories, without the need for new technologies or genotyping platforms. In a first study we developed a panel of 48 AIM-indels analyzed in three multiplex reactions, capable of efficiently inferring ancestry and measuring admixture proportions of the three main ancestral contributors to the formation of the Brazilian populations: Africans, Europeans and Native Americans (Santos et al. 2010, Article 8). The study involved a large collection of data from parental populations of the three origins, which were then used as a reference to test geographically and historically different Brazilian populations in order to evaluate the accuracy of the panel. Although distinct statistical approaches were used, the concordance of the results clearly confirmed the accuracy of the panel in assessing ancestry estimates in trihybrid populations, as is the case of the Brazilian and most South American populations. The genotyping of a large number of samples is cost and time-effective, and the ancestry estimates produced can be used to detect substructuring effects and correct for individual ancestry differences when calculating association parameters in case-control association studies (e.g. Ota et al. 2010; Da Silva et al. 2011; Tarazona-Santos et al. 2011). This efficiency is also evident in the assessment of ancestral membership proportions at an individual level, suggesting that the panel will also be useful in forensic investigations. In a recent study looking for signs of malaria selective pressure in the human pyruvate kinase (PK) encoding gene PKLR (Machado et al.
2010, Article 9) we studied groups of individuals with different malaria infection outcomes as well as non-infected controls from two sub-Saharan African populations (Angola and Mozambique), and also a control group and a pyruvate kinase-deficient group of Portuguese individuals used for comparison. In this study we applied a subset of 32 AIM-indels described in Santos et al. (2010, Article 8), comprising markers informative of African and European ancestries, with the aim of evaluating the structure of the African groups and investigating whether the Portuguese PK-deficient group could have a relevant African genetic component. The AIM analyses revealed that no substructuring existed between different malaria status groups within Mozambique or Angola and that no differentiation existed between control and PK-deficient Portuguese individuals. With these confirmations, safe conclusions could be drawn from the study of SNP and STR markers spanning the PKLR gene and adjacent regions (chromosome 1q21) in the different studied groups. The genetic distances obtained between African and Portuguese groups for this specific fragment were considerably higher than those that random neutral markers usually show between the two continental groups, indicating that this region of the genome may have been subject to selective pressure, of which malaria is the most probable driving force (Machado et al. 2010). During the final phase of our project we aimed to set up a more generic and comprehensive AIM-indel tool that would be capable of efficiently measuring population admixture proportions of four different origins (African, European, East Asian and Native American) in a single reaction. In pursuing our goal we managed to push the multiplexing potential of small indels further and succeeded in optimizing a new multiplex assay including 46 AIM-indels in a single reaction (Pereira et al. under revision, Article 10).
All AIMs are analyzed in short fragments under 230 bp in a single PCR followed by capillary electrophoresis. Because the technique is straightforward, it represents a valuable alternative to the more commonly available AIM-SNP typing methods, which depend on more complex protocols and/or the implementation of new genotyping technologies. Although a considerable number of AIM sets have been reported to infer the biogeographic ancestry of individuals based on DNA, very few studies present practical tools applicable in most molecular genetics laboratories or in specialized forensic laboratories dealing regularly with low quality/quantity samples (e.g. Phillips et al. 2007b; Lao et al. 2010; Santos et al. 2010). Only a few tools are available, and these involve a low number of markers, more laborious analytic workflows, or technologies not readily available in most laboratories. Conversely, we were able to establish a compromise solution harbouring a high number of AIMs while using a simple method and amplicon lengths suitable for the analysis of low quality DNA (Pereira et al. under revision). The 46 AIM-indel panel proved to be highly accurate in inferring the ancestry of individuals and estimating ancestry proportions at the population level, as revealed by thorough analyses involving HGDP-CEPH Diversity Panel samples from four continental groups (Africa, Europe, East Asia and Native America) and other populations from diverse geographic origins (Angola, Portugal, Taiwan and Brazilian Amazonas). Moreover, although the assay was primarily designed for studies considering only four major population groups, it showed an excellent capacity to distinguish Oceanians (Pereira et al. under revision).
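The idea of assigning an individual to a biogeographic group from AIM genotypes can be illustrated with a deliberately naive likelihood sketch. It is not the statistical approach used in the cited articles: it assumes unadmixed individuals, Hardy-Weinberg proportions, independent loci and known reference frequencies, and all population labels and frequencies below are hypothetical.

```python
import math


def genotype_log_likelihood(genotype: list, freqs: list, eps: float = 0.01) -> float:
    """Log-likelihood of a multilocus genotype given one population's allele frequencies.

    genotype: number of insertion alleles (0, 1 or 2) carried at each locus
    freqs:    insertion-allele frequency at each locus in the reference population
    eps:      frequencies are clamped to [eps, 1 - eps] so that log(0) never occurs
    """
    if len(genotype) != len(freqs):
        raise ValueError("genotype and freqs must cover the same loci")
    total = 0.0
    for g, p in zip(genotype, freqs):
        p = min(max(p, eps), 1.0 - eps)
        prob = {0: (1.0 - p) ** 2, 1: 2.0 * p * (1.0 - p), 2: p * p}[g]
        total += math.log(prob)
    return total


def assign_population(genotype: list, reference: dict) -> str:
    """Label of the reference population with the highest log-likelihood."""
    scores = {
        pop: genotype_log_likelihood(genotype, freqs)
        for pop, freqs in reference.items()
    }
    return max(scores, key=scores.get)


# Hypothetical reference frequencies for three markers in two made-up populations.
reference = {"POP_A": [0.90, 0.80, 0.10], "POP_B": [0.10, 0.20, 0.90]}
best = assign_population([2, 2, 0], reference)
```

Admixed individuals need a model that estimates ancestry proportions rather than a single best label, which is why dedicated admixture methods are used in practice.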
In summary, in this final phase of the thesis we were able to provide the scientific community with two practical and versatile tools that can be used as a whole or in smaller dedicated subsets, and which allow accurate individual or global estimates of ancestry from different continental origins to be obtained (Santos et al. 2010; Pereira et al. under revision). Given their high efficiency in the population assignment of individuals from diverse biogeographic origins, the AIM panels can be of great interest in forensic applications, particularly as an intelligence tool during investigations. Furthermore, ancestry estimates can be used to correct false positive results caused by population stratification between cases and controls in association studies, or as simple and inexpensive tools for an initial screening of individuals, so that cases and controls can be genetically matched before being used in genome-wide association studies.

Conclusions

Altogether, the results obtained during this thesis led to the following main conclusions:

  1. For the 52 SNPs included in the SNPforID multiplex assay, the Portuguese population is genetically homogeneous across the North, Central and South regions, allowing the use of a common database; the high power of discrimination obtained confirms the utility of the 52-plex in identification studies.
  2. The forensic efficiency parameters estimated for the 52 SNPs included in the SNPforID multiplex in two population samples from Angola and Uganda demonstrated the informative potential of this assay for identification purposes in Sub-Saharan African populations as well.
  3. A multiplex system using a short amplicon strategy and enabling the simultaneous genotyping of 38 highly polymorphic autosomal indels through a simple PCR and electrophoresis was optimized. It proved to be a valuable approach in human identification studies, especially in challenging DNA cases, as a more straightforward alternative to SNP typing.
  4. The 38plex of low mutating indels constitutes an efficient complementary tool to STR analysis in kinship investigations, particularly in cases where microsatellites alone provide insufficient evidence and extra information is needed (namely, in paternity tests presenting few genetic transmission inconsistencies with STRs).
  5. A 13 X-linked indel PCR multiplex assay was optimized and proved to be an accurate tool for assessing interethnic admixture in trihybrid populations with African, European and Native American ancestries.
  6. A simple and informative X-indel multiplex was optimized, allowing the genotyping of 32 indels distributed along the X chromosome in a single PCR and a single CE, using short PCR fragments suitable for the analysis of degraded samples.
  7. The 32 X-indels showed high heterozygosities in distinct population groups, indicating a high forensic efficiency in identification studies. Despite the biallelic nature of indels, the combined mean exclusion chance values are adequate for most forensic demands. This tool allows easy access to X chromosome information, which is especially useful in complex kinship analyses in which autosomal markers provide limited information.
  8. A panel of 48 AIM-indels, optimized in three multiplexes, proved to be a valuable tool for estimating individual and global ancestry proportions in populations with three main contributing ancestral origins: African, European and Native American. The ability to accurately infer interethnic admixture in Brazilian populations proves the usefulness of this marker set for assessing substructure in populations sharing trihybrid ancestry patterns.
  9. A new multiplex was optimized for the simultaneous genotyping of 46 autosomal AIM-indels through a simple PCR and electrophoresis, representing a more straightforward alternative to the commonly available methods, which depend on multi-step protocols and/or the implementation of new technologies.
  10. The 46 AIM-indels proved efficient in inferring the ancestral origin of individuals and estimating ancestry proportions of admixed individuals or populations from four different origins: African, European, East Asian and Native American. Furthermore, the assay clearly showed the capacity to distinguish population groups at the continental level (including Oceanians) and demonstrated high accuracy in assigning the probable biogeographic origin of individuals, thus confirming its utility in forensic investigations.
  11. A worldwide population database was established including HGDP-CEPH Diversity Panel genetic data for the 46 AIM-indels. It can be used in the future by the research community as a reference for five continental groups (Africa, Europe, East Asia, Native America and Oceania), namely as training sets to assess the probable biogeographic origin of unknown test individuals.
  12. As a whole, the results obtained during this thesis confirmed the potential of small indels as genetic markers in population and forensic genetic applications. By offering essentially the same benefits as SNPs while keeping the ease of analysis of STRs, small indels allowed the development of a series of powerful tools for forensic applications that have a straightforward analytic process and take advantage of well established methodologies and technologies already in place in most forensic laboratories.

(Please consult the main Thesis document for full References.)