MATH 7750: Statistical Theories Applicable to Genomics (fall 2010)

Instructor: Gene Hwang

Many statistical concepts are useful in genomics. One particular problem in genomics (e.g. microarray data analysis) is that the number of populations or genes is large, so there is a huge number of hypotheses. How can such hypotheses be tested simultaneously? We will discuss concepts such as the family-wise error rate, the false discovery rate (FDR) of Benjamini and Hochberg (1995, JRSS B), and Storey's papers on the pFDR (positive FDR).
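To make the FDR idea concrete, here is a minimal sketch of the Benjamini and Hochberg step-up procedure in Python. The function name, the level q = 0.05, and the p-values are illustrative assumptions, not material from the course; the sketch also assumes the usual setting in which the procedure is stated (independent p-values).

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans, True where the hypothesis is rejected.

    Step-up rule: sort the m p-values, find the largest rank k with
    p_(k) <= k * q / m, and reject the k hypotheses with the smallest p-values.
    """
    m = len(p_values)
    # Indices of the p-values in ascending order of p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank  # keep the largest rank that passes its threshold
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for ten genes (made up for illustration).
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(p_values, q=0.05)
print(rejected)
```

Because the rule is step-up, a p-value that fails its own threshold can still be rejected if a larger p-value passes a later threshold; this is what distinguishes it from a simple per-rank cutoff.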

What other statistical inferential techniques may be useful for a large number of populations or genes? The traditional approach of analyzing one population at a time, which assumes that all populations are different, is too inefficient. It is interesting and important to have techniques that combine the observations from all populations: when the populations are similar they "borrow strength" from each other, and when the populations are very different they go their separate ways. In fact, shrinkage (or empirical Bayes) techniques, or equivalently the BLUP (best linear unbiased predictor) in mixed models, can do this, so the course will spend some time discussing these techniques. We will discuss both point estimation and confidence interval construction. A new approach, called the selected mean approach, appears promising and will also be discussed.
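The "borrowing strength" behavior can be illustrated with a small sketch of a positive-part James-Stein estimator that shrinks each observed mean toward the grand mean. This is one standard textbook form of shrinkage, not necessarily the formulation used in the course; the function name, the known common variance `sigma2`, and the example data are assumptions made for illustration.

```python
def james_stein_toward_mean(x, sigma2=1.0):
    """Shrink observed means x toward their grand mean.

    Assumes each x[i] is normal with its own mean and known variance sigma2.
    Shrinkage factor: max(0, 1 - (k - 3) * sigma2 / S), where S is the sum
    of squared deviations of the x[i] from the grand mean.
    """
    k = len(x)
    if k < 4:
        raise ValueError("this form of the estimator needs at least 4 populations")
    grand_mean = sum(x) / k
    s = sum((xi - grand_mean) ** 2 for xi in x)
    if s == 0:
        return list(x)  # all observations identical, nothing to shrink
    factor = max(0.0, 1.0 - (k - 3) * sigma2 / s)
    return [grand_mean + factor * (xi - grand_mean) for xi in x]


# Similar populations: the estimates collapse to the grand mean.
similar = james_stein_toward_mean([0.1, -0.2, 0.3, -0.1, 0.0])
# Very different populations: the estimates stay close to the raw observations.
different = james_stein_toward_mean([-10.0, -5.0, 0.0, 5.0, 10.0])
print(similar)
print(different)
```

When the observations are close together relative to `sigma2`, the factor is 0 and every population receives the pooled estimate; when they are widely spread, the factor is near 1 and each population essentially keeps its own estimate.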

Other topics may include permutation tests and QTL identification if time allows. This course is mainly about (mathematical) statistical theory, and hence in many lectures the focus will be on proving theorems. It is recommended that you have taken some statistics courses such as ORIE 6700 or MATH 6740.