Acute lymphoblastic leukemia (ALL) is a malignant disease of the white blood cells in which early lymphoid precursors proliferate and replace the normal hematopoietic tissue of the bone marrow. More than 60% of patients diagnosed with ALL are children below the age of 15. Worldwide, the annual incidence of childhood ALL varies between one and four new cases per 100,000 children younger than 15. Fortunately, the treatment of childhood ALL is quite successful, with overall cure rates of about 80% in developed countries. Research continues to focus on clinical and biological aspects of ALL in order to identify factors that are relevant for prognosis, so that therapy protocols can be adapted and improved for individual patients.
Our Institute and the Paediatric Clinic in Kiel have jointly coordinated a “Pan-Omic” project for the Bundesamt für Strahlenschutz (BfS, Federal German Office for Radiation Protection), in collaboration with the EMBL, the MPI for Molecular Genetics, the Paediatric Clinics of Düsseldorf and the Charité in Berlin, and the University Children’s Hospital of Zurich. For this project, specimens were obtained from children with ALL, and genome, exome, transcriptome and epigenome sequencing and analyses were performed in order to identify potentially causative somatic mutations and structural variants.
Several different bioinformatic pipelines were used within the consortium for the leukemia project. The results were rigorously validated by Sanger sequencing and by data integration, which was led by the MPI for Molecular Genetics in Berlin. At our institute, the bioinformatics work comprised the calling, annotation and filtering of somatic SNVs in paired cancer samples (germline and tumor from the same patient) and the calling, annotation and filtering of methylation ratios from paired samples. The tools used at our institute for this project include pibase, samtools, GATK, VarScan2, snpActs, Annovar and Blat, as well as custom tools programmed in C++, Python and R. A new pibase version was developed for the high-throughput multi-omic validation of somatic genomic SNVs in exome or transcriptome BAM files.
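To illustrate the idea behind filtering somatic SNVs in paired samples and computing methylation ratios, here is a minimal Python sketch. It is not the consortium's pipeline and does not reproduce the behaviour of pibase, VarScan2 or any other tool named above. The class `AlleleCounts`, the functions `is_somatic_candidate` and `methylation_ratio`, and all threshold values are hypothetical and chosen for illustration only. The sketch assumes that read counts per allele have already been extracted from the BAM files.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlleleCounts:
    """Read counts at one genomic position in one sample (illustrative type)."""
    ref: int  # reads supporting the reference allele
    alt: int  # reads supporting the alternate allele

    @property
    def depth(self) -> int:
        return self.ref + self.alt

    @property
    def alt_fraction(self) -> float:
        # Fraction of reads carrying the alternate allele; 0.0 without coverage.
        return self.alt / self.depth if self.depth else 0.0


def is_somatic_candidate(
    tumor: AlleleCounts,
    germline: AlleleCounts,
    min_depth: int = 10,                   # assumed threshold
    min_tumor_alt_fraction: float = 0.10,  # assumed threshold
    max_germline_alt_reads: int = 0,       # assumed threshold
) -> bool:
    """Keep a site only if the variant is seen in the tumor but not in the germline."""
    # Both samples need enough coverage for the comparison to be meaningful.
    if tumor.depth < min_depth or germline.depth < min_depth:
        return False
    return (
        tumor.alt_fraction >= min_tumor_alt_fraction
        and germline.alt <= max_germline_alt_reads
    )


def methylation_ratio(methylated_reads: int, unmethylated_reads: int) -> Optional[float]:
    """Methylated reads divided by all informative reads; None without coverage."""
    total = methylated_reads + unmethylated_reads
    if total == 0:
        return None
    return methylated_reads / total


if __name__ == "__main__":
    tumor = AlleleCounts(ref=30, alt=12)
    germline = AlleleCounts(ref=40, alt=0)
    print(is_somatic_candidate(tumor, germline))
    print(methylation_ratio(15, 5))
```

A production workflow would typically replace the fixed thresholds with a statistical test on the paired counts and add further filters such as base quality, mapping quality and strand bias.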