Investigation of the fine structure of European populations with applications to disease association studies.

Simon C Heath, Ivo G Gut, Paul Brennan, James D McKay, Vladimir Bencko, Eleonora Fabianova, Lenka Foretova, Michael Georges, Vladimir Janout, Michael Kabesch, Hans E Krokan, Maiken B Elvestad, Jolanta Lissowska, Dana Mates, Peter Rudnai, Frank Skorpen, Stefan Schreiber, José M Soria, Ann-Christine Syvänen, Pierre Meneton, Serge Herçberg, Pilar Galan, Neonilia Szeszenia-Dabrowska, David Zaridze, Emmanuel Génin, Lon R Cardon, Mark Lathrop
Journal title long: European Journal of Human Genetics
An investigation into fine-scale European population structure was carried out using high-density genetic variation data on nearly 6000 individuals originating from across Europe. The individuals were collected as control samples and were genotyped with more than 300 000 SNPs in genome-wide association studies using the Illumina Infinium platform. A major East-West gradient from Russian (Moscow) samples to Spanish samples was identified as the first principal component (PC) of the genetic diversity. The second PC identified a North-South gradient from Norway and Sweden to Romania and Spain. Variation in allele frequencies at markers in three separate genomic regions, surrounding LCT, HLA and HERC2, was strongly associated with this gradient. The next 18 PCs also accounted for a significant proportion of the genetic diversity observed in the sample. We present a method to predict the ethnic origin of samples by comparing the sample genotypes with those from a reference set of samples of known origin. These predictions can be performed using only summary information on the known samples; individual genotype data are not required. We discuss issues raised by these data and analyses for association studies, including the matching of case-only cohorts to appropriate pre-collected control samples for genome-wide association studies.
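The abstract does not describe how the origin prediction is computed, so the following is only an illustrative sketch of one way a prediction could rely on summary information alone. It assumes the summary information is a per-population allele frequency at each marker, and that markers are independent and in Hardy-Weinberg equilibrium. The function names, population labels and frequencies are all hypothetical and are not taken from the paper.

```python
import math

def log_likelihood(genotypes, freqs, eps=1e-6):
    """Log-likelihood of one sample's genotypes under one population.

    genotypes: copies of the counted allele per marker (0, 1, 2, or None if missing).
    freqs: that population's frequency of the counted allele per marker.
    Assumes Hardy-Weinberg proportions and independent markers (a simplification).
    """
    total = 0.0
    for g, p in zip(genotypes, freqs):
        if g is None:
            continue  # skip missing genotypes
        p = min(max(p, eps), 1.0 - eps)  # keep log() away from zero
        q = 1.0 - p
        if g == 0:
            prob = q * q
        elif g == 1:
            prob = 2.0 * p * q
        elif g == 2:
            prob = p * p
        else:
            raise ValueError(f"unexpected genotype code: {g!r}")
        total += math.log(prob)
    return total

def predict_origin(genotypes, reference_freqs):
    """Return the best-scoring population label and all per-population scores."""
    scores = {pop: log_likelihood(genotypes, f) for pop, f in reference_freqs.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Made-up summary data: allele frequencies at four markers in two populations.
reference_freqs = {
    "pop_A": [0.9, 0.8, 0.1, 0.2],
    "pop_B": [0.2, 0.3, 0.7, 0.9],
}
sample = [2, 2, 0, None]  # last marker is missing

best, scores = predict_origin(sample, reference_freqs)
print(best, scores)
```

Only the reference frequencies are needed at prediction time, which mirrors the point that individual reference genotypes are not required. With several hundred thousand SNPs, linkage disequilibrium would violate the independence assumption, so a practical version would likely need marker pruning or a different model.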