
Merging High-Throughput, Amplicon-Based Second and Third Generation Sequencing Data: An Integrative and Modular Data Analysis Framework for Haplotype Prediction and Output Evaluation.
Authors
Sylvia Mink, Christian Attenberger, Yannik Busch, Johanna Kiefer, Wolfgang Peter, Janne Cadamuro, Tim A Steiert, Andre Franke, Christoph Gassner
Year of publication
2025
Journal
INT J MOL SCI
Volume
26
Issue
7
Abstract
Despite providing highly accurate results, the short reads generated by second generation sequencing have major limitations in mapping complex genomic regions. Longer reads can resolve these issues and additionally phase distant variants. The third generation sequencing platform from Oxford Nanopore Technologies (ONT) currently achieves the longest sequencing reads but falls short in sequencing accuracy. Additionally, deriving phased haplotypes from amplicon-based NGS data remains a complex and time-consuming task that requires extensive bioinformatic expertise. We constructed an integrative, open-access modular data-analysis framework that allows for automated processing of high-throughput sequencing data from both second (Illumina) and third generation (ONT) sequencing platforms, combining the strengths of both technologies. Variant information is automatically evaluated and color-coded for discrepancies. Haplotypes are listed by frequency. All parts of the framework can be used independently. The framework’s performance was validated using synthetic data and tested with real-life data by analyzing partly homologous FUT1/2/3 sequencing data from 400 blood donors.
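
The two output steps named in the abstract, flagging discrepancies between platforms and listing haplotypes by frequency, can be illustrated with a minimal Python sketch. This is not the framework's actual implementation: the function names, the status labels, the position-to-allele dictionaries, and the toy haplotype strings are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def flag_discrepancies(illumina_calls, ont_calls):
    """Compare per-position variant calls from two platforms.

    Both arguments map a position (int) to a called allele (str).
    Returns a dict mapping each position to a status label
    (hypothetical labels; a report could map them to colors).
    """
    status = {}
    for pos in sorted(set(illumina_calls) | set(ont_calls)):
        short_read = illumina_calls.get(pos)
        long_read = ont_calls.get(pos)
        if short_read is None or long_read is None:
            # Called by only one of the two platforms
            status[pos] = "single-platform"
        elif short_read == long_read:
            status[pos] = "concordant"
        else:
            status[pos] = "discordant"
    return status


def rank_haplotypes(haplotypes):
    """Return (haplotype, count) pairs, most frequent first."""
    return Counter(haplotypes).most_common()


# Toy input data (invented for this example, not from the study)
illumina = {101: "A", 250: "G", 388: "T"}
ont = {101: "A", 250: "C", 512: "G"}
status = flag_discrepancies(illumina, ont)

observed = ["A-G-T", "A-C-T", "A-G-T", "A-G-T", "A-C-T", "G-G-T"]
ranked = rank_haplotypes(observed)
```

Here `status` marks position 250 as discordant and positions 388 and 512 as called by a single platform, while `ranked` lists the toy haplotypes from most to least frequent.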