Tutorial: Analysis of Complex Traits Using R: Case studies
Owing to recent advances in genotyping and sequencing technologies and the success of several international collaborative projects, there is considerable interest in the genetic analysis of complex traits, which include common diseases [1] and quantitative measurements [2]. The analysis customarily involves a large number of single-nucleotide polymorphisms (SNPs), the most abundant genetic variants in the human genome. Much work has been done using a variety of computer software, including R, but greater awareness of R, a synthesis of the current work, and contributions from a broader community are still required [3].
This tutorial gives an overview of approaches to the genetic analysis of complex traits, including heritability, segregation, linkage and association studies, with attention to the statistical models, some success stories and an indication of their limitations. A particular focus is the study design and analysis of genomewide association studies (GWAS), and the instructor’s own involvement in such analyses will be described [2,4]. A complementary part of the tutorial concerns computer software for these analyses, including R, which reflects but is not limited to the instructor’s own work [3,5-7]. Both parts are expected to expose statistical and computational challenges.
Topics associated with and motivated by the case studies range from fundamental concepts, such as measurement of risk and heritability, through analysis of genomic data, such as Hardy-Weinberg equilibrium and linkage equilibrium, to more sophisticated modelling, such as prospective and retrospective models of haplotypes, gene-environment interactions [8] and pathways [9]. Related aspects include haplotype analysis, imputation of genotypes and meta-analysis, and the merits of some frequentist and Bayesian methods as used in our GWAS of obesity.
The instructor is an investigator scientist in genetics. He obtained his degrees in medicine, statistics and genetics, and has worked on a broad range of problems in statistical genetics and genetic epidemiology over the last ten years. He and his colleagues have recently been involved in study design and analysis in several large epidemiological cohorts and in collaborative work such as the Genetic Investigation of Anthropometric Traits (GIANT) consortium. Besides the material covered in the tutorial, he has also worked on genetic data analysis with R [10,11] and written programs in other computing environments and languages such as C and SAS [12-14].
The intended attendees are researchers with basic knowledge of statistics and computing who wish to get involved with, or improve their understanding of, genetic data analysis. The tutorial will also be useful to professionals and researchers actively engaged in the analysis of genetic data and/or the development of computational tools in R or other environments. The course materials are expected to refresh and engage with attendees' views on the design and analysis of genomic data in humans, while also generating interest among researchers in plant and animal sciences.