Comparing genome versus proteome-based identification of clinical bacterial isolates.

Valentina Galata, Christina Backes, Cédric Christian Laczny, Georg Hemmrich-Stanisak, Howard Li, Laura Smoot, Andreas Emanuel Posch, Susanne Schmolke, Markus Bischoff, Lutz von Müller, Achim Plum, Andre Franke, Andreas Keller
Year of publication:
Journal title abbreviated:
Brief. Bioinformatics
Journal title long:
Briefings in bioinformatics
Impact factor:
Whole-genome sequencing (WGS) is gaining importance in the analysis of bacterial cultures derived from patients with infectious diseases. Existing computational tools for WGS-based identification have, however, been evaluated on previously defined data relying thereby unwarily on the available taxonomic information.Here, we newly sequenced 846 clinical gram-negative bacterial isolates representing multiple distinct genera and compared the performance of five tools (CLARK, Kaiju, Kraken, DIAMOND/MEGAN and TUIT). To establish a faithful 'gold standard', the expert-driven taxonomy was compared with identifications based on matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry (MS) analysis. Additionally, the tools were also evaluated using a data set of 200 Staphylococcus aureus isolates.CLARK and Kraken (with k =31) performed best with 626 (100%) and 193 (99.5%) correct species classifications for the gram-negative and S aureus isolates, respectively. Moreover, CLARK and Kraken demonstrated highest mean F-measure values (85.5/87.9% and 94.4/94.7% for the two data sets, respectively) in comparison with DIAMOND/MEGAN (71 and 85.3%), Kaiju (41.8 and 18.9%) and TUIT (34.5 and 86.5%). Finally, CLARK, Kaiju and Kraken outperformed the other tools by a factor of 30 to 170 fold in terms of runtime.We conclude that the application of nucleotide-based tools using k-mers-e.g. CLARK or Kraken-allows for accurate and fast taxonomic characterization of bacterial isolates from WGS data. Hence, our results suggest WGS-based genotyping to be a promising alternative to the MS-based biotyping in clinical settings. Moreover, we suggest that complementary information should be used for the evaluation of taxonomic classification tools, as public databases may suffer from suboptimal annotations.