Reference-Based Haplotype Phasing with FPGAs

Lars Wienbrandt, Jan Christian Kässens, David Ellinghaus
Haplotype phasing of individual samples is commonly performed before genotype imputation to reduce the runtime complexity of the imputation step and to improve imputation accuracy. Phasing is time-consuming and generally takes hours, even on server-grade computing systems. Loh et al. recently introduced EAGLE2, a fast and effective reference-based haplotype phasing software whose runtime scales linearly with the number of reference samples and the number of variants to phase. We found that, of the several steps in the EAGLE2 phasing process, data preparation for the internally used HapHedge data structure alone consumes about half of the total runtime in typical use cases. We addressed this bottleneck with a new design for reconfigurable architectures that accelerates this part of the software by a factor of up to 29 on a Xilinx Kintex UltraScale FPGA. Compared with a server-grade computing system with two Intel Xeon CPUs, this yields a total speedup of the complete phasing process approaching 2, the theoretical limit according to Amdahl’s Law. As a result, we reduced the EAGLE2 runtime for genome-wide phasing of 520,000 variants in 2500 samples using the 1000 Genomes Project reference panel from 68 min to 39 min on our system while maintaining phasing quality.
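The speedup ceiling follows from Amdahl’s Law. If a fraction p of the runtime is accelerated by a factor s, the overall speedup is 1/((1 - p) + p/s), which can never exceed 1/(1 - p). The Python sketch below evaluates this under two assumptions: p is exactly 0.5 (the text says "about half") and the full reported factor of 29 applies. The function names are hypothetical and illustrative only. They are not part of EAGLE2 or of the authors' FPGA design.

```python
def amdahl_speedup(fraction: float, factor: float) -> float:
    """Overall speedup when `fraction` of the runtime is accelerated by `factor`."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    if factor <= 0:
        raise ValueError("factor must be positive")
    # Unaccelerated share runs as before; accelerated share shrinks by `factor`.
    return 1.0 / ((1.0 - fraction) + fraction / factor)


def amdahl_limit(fraction: float) -> float:
    """Upper bound on the overall speedup as `factor` grows without limit."""
    if fraction >= 1.0:
        return float("inf")
    return 1.0 / (1.0 - fraction)


if __name__ == "__main__":
    p = 0.5   # assumption: share of runtime spent on HapHedge data preparation
    s = 29.0  # reported maximum FPGA acceleration of that share
    print(f"predicted overall speedup:  {amdahl_speedup(p, s):.2f}")  # 1.93
    print(f"Amdahl limit:               {amdahl_limit(p):.2f}")       # 2.00
    print(f"measured (68 min / 39 min): {68 / 39:.2f}")               # 1.74
```

Under these assumptions the formula gives about 1.93, just below the limit of 2. Because 29 is the reported maximum factor, this is an upper estimate rather than a prediction for a particular dataset. The reported reduction from 68 min to 39 min corresponds to an end-to-end ratio of about 1.74.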