Identifying Crohn’s disease signal from variome analysis

Authors

Yanran Wang, Yuri Astrakhan, Britt-Sabina Petersen, Stefan Schreiber, Andre Franke, Yana Bromberg

Year of publication

2017

Journal

UKN

Volume

-

Issue

-

ISSN

-

Impact factor

-

Abstract

Background

After many years of concentrated research efforts, the exact cause of Crohn’s disease (CD) remains unknown. Its accurate diagnosis, however, helps in managing the disease and even in preventing its onset. Genome-wide association studies have identified 140 loci associated with CD, but these carry very small log odds ratios and are uninformative for diagnosis.

Results

Here we describe a machine learning method – AVA,Dx (Analysis of Variation for Association with Disease) – that uses whole exome sequencing data to predict CD status. Using person-specific variation in genes from a panel of only 111 individuals, we built disease-prediction models that are also informative of previously undiscovered disease genes. In this panel, our models differentiate CD patients from healthy controls with 71% precision and 73% recall at the default cutoff. By additionally accounting for batch effects, we are also able to predict individual CD status for previously unseen individuals from a separate CD study (84% precision, 73% recall).
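Precision is the fraction of individuals predicted to have CD who actually have it, TP / (TP + FP); recall is the fraction of actual CD patients that the model identifies, TP / (TP + FN). The following is a minimal Python sketch of how these two numbers are computed at a score cutoff. It assumes the model outputs a per-individual CD score and that the "default cutoff" is 0.5; the abstract does not state the threshold, and the function name `precision_recall` and the toy data are hypothetical, not taken from AVA,Dx.

```python
def precision_recall(y_true, scores, cutoff=0.5):
    """Precision and recall of binary predictions at a score cutoff.

    y_true: 1 for a CD patient, 0 for a healthy control.
    scores: model-assigned CD score per individual (hypothetical inputs).
    cutoff: assumed default threshold of 0.5 (an assumption, not from the paper).
    """
    tp = fp = fn = 0
    for label, score in zip(y_true, scores):
        predicted_cd = score >= cutoff
        if predicted_cd and label == 1:
            tp += 1  # CD patient correctly flagged
        elif predicted_cd and label == 0:
            fp += 1  # healthy control wrongly flagged
        elif not predicted_cd and label == 1:
            fn += 1  # CD patient missed
    # Guard against division by zero when nothing is predicted or no positives exist.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy, made-up labels and scores for illustration only (not study data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.6, 0.2, 0.1, 0.3]
p, r = precision_recall(y_true, scores)
print(f"precision={p:.2f}, recall={r:.2f}")  # precision=0.75, recall=0.75
```

Raising the cutoff typically trades recall for precision (here, a cutoff of 0.65 removes the false positive and gives precision 1.0 at the same recall), which is why results reported "at the default cutoff" depend on that chosen threshold.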

Conclusions

Larger training panels and additional features, including regulatory variants and environmental factors such as human-associated microbiota, are expected to improve model performance. However, the current results already position AVA,Dx both as an effective method for highlighting pathogenesis pathways and as a simple Crohn’s disease risk analysis tool that can improve clinical diagnostic time and accuracy.