Construction and benchmarking of a multi-ethnic reference panel for the imputation of HLA class I and II alleles.

Frauke Degenhardt, Mareike Wendorff, Michael Wittig, Eva Ellinghaus, Lisa W Datta, John Schembri, Siew C Ng, Elisa Rosati, Matthias Hübenthal, David Ellinghaus, Eun S Jung, Wolfgang Lieb, Shifteh Abedian, Reza Malekzadeh, Jae H Cheon, Pierre Ellul, Ajit Sood, Vandana Midha, Thelma Bk, Sunny H Wong, Stefan Schreiber, Keiko Yamazaki, Michiaki Kubo, Gabrielle Boucher, John Rioux, Tobias L Lenz, Steven R Brant, Andre Franke
Journal title long: Human Molecular Genetics
Genotype imputation of the human leukocyte antigen (HLA) region is a cost-effective means to infer classical HLA alleles from inexpensive and dense SNP array data. In the research setting, imputation helps avoid the costs of wet lab-based HLA typing and thus renders association analyses of the HLA in large cohorts feasible. Yet most HLA imputation reference panels target Caucasian ethnicities, and multi-ethnic panels are scarce. We compiled a high-quality multi-ethnic reference panel based on genotypes measured with Illumina's Immunochip genotyping array and HLA types established using a high-resolution next-generation sequencing approach. Our reference panel includes more than 1,300 samples from Germany, Malta, China, India, Iran, Japan and Korea and samples of African American ancestry, covering all classical HLA class I and II alleles including HLA-DRB3/4/5. Applying extensive cross-validation, we benchmarked the imputation using the HLA imputation tool HIBAG, our multi-ethnic reference and an independent, previously published data set compiled from subpopulations of the 1000 Genomes Project. We achieved average imputation accuracies higher than 0.924 for the commonly studied HLA-A, -B, -C, -DQB1 and -DRB1 genes across all ethnicities. We investigated allele-specific imputation challenges with regard to the geographic origin of the samples using sensitivity and specificity measurements as well as allele frequencies, and identified HLA alleles that are challenging to impute for each of the populations separately. In conclusion, our new multi-ethnic reference data set allows for high-resolution HLA imputation of genotypes at all classical HLA class I and II genes, including the HLA-DRB3/4/5 loci, based on diverse ancestry populations.
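To make the benchmarking measures mentioned above concrete, the following is a minimal Python sketch of how overall allele-level accuracy and per-allele sensitivity and specificity could be computed from typed versus imputed HLA calls. It is an illustration under stated assumptions, not the authors' pipeline or HIBAG's own implementation: the function names, the counting scheme (each sample contributes two allele calls, compared as an unordered pair) and the toy genotypes are all hypothetical.

```python
from collections import Counter


def allele_accuracy(true_calls, imputed_calls):
    """Fraction of allele calls that match the typed alleles.

    Each sample is a pair of alleles; order within the pair is ignored.
    (Assumed definition for illustration only.)
    """
    matched = 0
    total = 0
    for truth, imputed in zip(true_calls, imputed_calls):
        # Multiset intersection counts how many of the two alleles agree.
        overlap = Counter(truth) & Counter(imputed)
        matched += sum(overlap.values())
        total += len(truth)
    return matched / total


def per_allele_metrics(true_calls, imputed_calls, allele):
    """Sensitivity and specificity for one allele, counted per chromosome."""
    tp = fp = fn = tn = 0
    for truth, imputed in zip(true_calls, imputed_calls):
        t = truth.count(allele)      # copies in the typed genotype
        i = imputed.count(allele)    # copies in the imputed genotype
        hit = min(t, i)
        tp += hit
        fn += t - hit
        fp += i - hit
        tn += len(truth) - max(t, i)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy genotypes (hypothetical, not data from the study).
typed = [("A*01:01", "A*02:01"), ("A*02:01", "A*02:01"), ("A*03:01", "A*24:02")]
imputed = [("A*02:01", "A*01:01"), ("A*02:01", "A*03:01"), ("A*03:01", "A*24:02")]

accuracy = allele_accuracy(typed, imputed)
sens_0201, spec_0201 = per_allele_metrics(typed, imputed, "A*02:01")
sens_0301, spec_0301 = per_allele_metrics(typed, imputed, "A*03:01")
```

In this toy example, one of the six allele calls is wrong, so a rare allele that is missed lowers its own sensitivity sharply while barely moving the overall accuracy. This is why per-allele measures are informative alongside an average accuracy.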