A benchmark of hemoglobin blocking during library preparation for mRNA-Sequencing of human blood samples.

Florian Uellendahl-Werth, Markus Wolfien, Andre Franke, Olaf Wolkenhauer, David Ellinghaus
Year of publication:
Journal title abbreviated:
Journal title long: Scientific Reports
Impact factor:
RNA-Sequencing (RNA-Seq) of peripheral blood can be a valuable source of information for investigating the status and mechanisms of disease. However, 50-80% of the transcripts in blood are unwanted hemoglobin (Hb) transcripts. Lexogen's QuantSeq mRNA-Seq-Kit for Illumina RNA-Seq features a 'Globin Block' (GB) module that depletes Hb cDNAs during library preparation. Here, we assessed GB's effectiveness and checked for technical biases attributable to GB. Using whole blood total RNA samples of 91 healthy individuals, we sequenced 91 pairs of GB and non-blocked (noGB) samples on Illumina HiSeq2500 and 8 pairs of GB/noGB technical replicates on HiSeq4000. GB reduced the fraction of Hb transcripts from 43% (s.d. 14%) to 8.0% (s.d. 4.3%). In GB samples we detected 1,397 more expressed genes than in noGB samples at approximately 11 million reads per RNA isolate. Enrichment and differential expression analyses did not reveal significant differences between GB and noGB samples with respect to molecular function. In contrast to results from studies that have examined the performance of GB during RNA isolation, we were able to assign GB samples to their corresponding noGB samples (from multiple sequencing runs on HiSeq2500) with at least 89.8% accuracy from the complete correlation matrix of all GB/GB, noGB/noGB and GB/noGB pairs. However, the use of different sequencers (HiSeq2500 vs HiSeq4000) impaired the assignment of technical replicates, whereas the assignment of GB samples to their corresponding noGB samples worked perfectly when all samples were sequenced on one lane on HiSeq4000. Lexogen's GB RNA-Seq module is a valuable addition to mRNA-Seq library preparation that works even with low amounts of input total RNA (50 ng per sample). GB facilitated the detection of low-abundance transcripts and yielded more non-hemoglobin reads, while preserving biological information. We observed that differences in sequencing run and platform have a far greater effect on technical variation than the use of GB.
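The assignment of GB samples to their noGB counterparts from a correlation matrix can be illustrated with a small sketch. The abstract does not state the correlation measure, the expression normalization, or the exact assignment rule, so everything below is an assumption made for illustration: Pearson correlation on per-gene expression vectors, a simple "best-correlated noGB sample wins" rule applied to the GB/noGB block only, and toy values in place of real data. The function names `pearson`, `assign_pairs` and `assignment_accuracy` are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def assign_pairs(gb, nogb):
    """Map each GB sample ID to the ID of its best-correlated noGB sample.

    Assumption: a nearest-neighbour rule on the GB/noGB correlations;
    the rule used in the paper may differ.
    """
    return {
        gb_id: max(nogb, key=lambda nogb_id: pearson(gb_vec, nogb[nogb_id]))
        for gb_id, gb_vec in gb.items()
    }


def assignment_accuracy(assignment):
    """Fraction of GB samples assigned to the noGB sample of the same individual."""
    hits = sum(1 for gb_id, nogb_id in assignment.items() if gb_id == nogb_id)
    return hits / len(assignment)


# Toy expression values (hypothetical): keys are individuals, values are
# expression levels of the same four genes in each library.
gb = {
    "A": [5.0, 1.0, 3.0, 0.5],
    "B": [1.0, 4.0, 0.5, 6.0],
    "C": [2.0, 2.5, 7.0, 1.0],
}
nogb = {
    "A": [4.0, 1.2, 2.5, 0.4],
    "B": [0.8, 3.5, 0.6, 5.0],
    "C": [1.5, 2.0, 6.0, 1.2],
}

assignment = assign_pairs(gb, nogb)
accuracy = assignment_accuracy(assignment)
print(assignment, accuracy)
```

In this toy example every GB sample correlates best with the noGB library of the same individual. With real data, run and platform effects of the kind described above would be expected to lower some of these correlations and with them the assignment accuracy.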