Year of publication:
2017
Volume:
-
Issue:
-
Issn:
1367-4803
Journal title abbreviated:
BIOINFORMATICS
Journal title long:
Bioinformatics
Impact factor:
6.931
Pubmed:
Abstract:
While the amount of small non-coding RNA sequencing data is continuously increasing, it is still unclear to what extent small RNAs are represented in the human genome. In this study, we analyzed 303 billion sequencing reads from nearly 25,000 data sets to answer this question. We determined that 0.8% of the human genome is reliably covered by 874,123 regions with an average length of 31 nt. On the basis of these regions, we found that among the known small non-coding RNA classes, microRNAs (miRNAs) were the most prevalent. In subsequent steps, we characterized variations of miRNAs and performed a staged validation of 11,877 candidate miRNAs. Of these, many were actually expressed and significantly dysregulated in lung cancer. Selected candidates were finally validated by northern blots. While isolated miRNAs could still be present in the human genome, our presented set likely contains the largest fraction of human miRNAs. Contact: andreas.keller@ccb.uni-saarland.de. Supplementary information: Supplementary data are available at Bioinformatics online.