The application of Nanopore sequencing for variant calling on the human mitochondrial DNA

  • Anton Shikov Genetics Laboratory, City Hospital No. 40, ul. Borisova, 9, Saint Petersburg, 197706, Russian Federation; All-Russia Research Institute for Agricultural Microbiology, Shosse Podbel'skogo, 3, Saint Petersburg, 190608, Russian Federation; Faculty of Medicine, Saint Petersburg State University, 21-ya liniya, 8a, Saint Petersburg, 199106, Russian Federation https://orcid.org/0000-0001-7084-0177
  • Viktoriya Tsay Genetics Laboratory, City Hospital No. 40, ul. Borisova, 9, Saint Petersburg, 197706, Russian Federation https://orcid.org/0000-0001-6488-8369
  • Mikhail Fedyakov Genetics Laboratory, City Hospital No. 40, ul. Borisova, 9, Saint Petersburg, 197706, Russian Federation https://orcid.org/0000-0002-3291-3811
  • Yuri Eismont Genetics Laboratory, City Hospital No. 40, ul. Borisova, 9, Saint Petersburg, 197706, Russian Federation https://orcid.org/0000-0002-4828-8053
  • Alena Rudnik Genetics Laboratory, City Hospital No. 40, ul. Borisova, 9, Saint Petersburg, 197706, Russian Federation https://orcid.org/0000-0001-9315-1040
  • Stanislav Urasov Genetics Laboratory, City Hospital No. 40, ul. Borisova, 9, Saint Petersburg, 197706, Russian Federation https://orcid.org/0000-0002-5441-2911
  • Sergey Sherbak Genetics Laboratory, City Hospital No. 40, ul. Borisova, 9, Saint Petersburg, 197706, Russian Federation; Faculty of Medicine, Saint Petersburg State University, 21-ya liniya, 8a, Saint Petersburg, 199106, Russian Federation https://orcid.org/0000-0001-5036-1259
  • Oleg Glotov Genetics Laboratory, City Hospital No. 40, ul. Borisova, 9, Saint Petersburg, 197706, Russian Federation; Department of Genomic Medicine, D.O. Ott Research Institute of Obstetrics, Gynecology and Reproductology, Mendeleyevskaya liniya, 3, Saint Petersburg, 199034, Russian Federation https://orcid.org/0000-0002-0091-2224

Abstract

The emergence of long-read sequencing technologies has been a revolutionary step in genome biology and medicine. However, long reads have a relatively high error rate, which hampers their use for variant calling in routine practice. We therefore evaluated several popular variant callers on long-read sequences of the human mitochondrial genome, a convenient target because of its small size and the ease of obtaining high coverage. Mitochondrial DNA from 8 patients was sequenced on the Illumina (MiSeq) and Oxford Nanopore (MinION) platforms, with the Illumina data serving as the gold standard for evaluating variant calling accuracy. We used a conventional GATK3-BWA-based pipeline for the paired-end reads, and the Guppy basecaller coupled with minimap2 for the MinION data. We then compared the outputs of Clairvoyante, Nanopolish, GATK3, Longshot, DeepVariant, and VarScan applied to the long-read alignments by analyzing false-positive and false-negative rates. For most callers, the raw calls largely represented false positives caused by homopolymer errors. Nanopolish, in contrast, demonstrated both high similarity to the Illumina data (Jaccard coefficient of 0.82) and a comparable number of calls (140 vs. 154), and also showed the best performance according to AUC (area under the ROC curve, 0.953). In sum, our results, although obtained from a small dataset, provide evidence that sufficient coverage coupled with an optimal pipeline could make long reads of mitochondrial DNA applicable for variant calling.
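The comparison described in the abstract, in which long-read calls are scored against an Illumina reference set, can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `compare_calls`, the representation of a variant as a (position, reference allele, alternate allele) tuple, and the toy call sets are all assumptions made for illustration. The Jaccard coefficient is computed in the standard way, as the size of the intersection of two call sets divided by the size of their union.

```python
from typing import NamedTuple, Set, Tuple

# Assumption: a variant is identified by (position, reference allele, alternate allele).
Variant = Tuple[int, str, str]


class Comparison(NamedTuple):
    true_positives: int   # calls present in both sets
    false_positives: int  # calls absent from the reference set
    false_negatives: int  # reference calls that were missed
    jaccard: float        # |intersection| / |union|


def compare_calls(truth: Set[Variant], calls: Set[Variant]) -> Comparison:
    """Score a call set against a reference ("gold standard") call set."""
    shared = truth & calls
    union = truth | calls
    # Two empty sets are treated as identical to avoid division by zero.
    jaccard = len(shared) / len(union) if union else 1.0
    return Comparison(len(shared), len(calls - truth), len(truth - calls), jaccard)


# Toy call sets for illustration only; these are not data from the study.
illumina = {(73, "A", "G"), (263, "A", "G"), (750, "A", "G"), (1438, "A", "G")}
nanopore = {(73, "A", "G"), (263, "A", "G"), (750, "A", "G"), (310, "T", "C")}

result = compare_calls(illumina, nanopore)
print(result)
```

In this toy case three calls are shared, one is a false positive, and one is a false negative, so the Jaccard coefficient is 3/5. Matching variants by exact position and alleles is a simplification; indels in homopolymer regions may need normalization before such a comparison.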

Keywords:

next-generation sequencing, Oxford Nanopore, Illumina, variant calling, mitochondrial DNA


Published
2021-06-30
How to Cite
Shikov, A., Tsay, V., Fedyakov, M., Eismont, Y., Rudnik, A., Urasov, S., Sherbak, S., & Glotov, O. (2021). The application of Nanopore sequencing for variant calling on the human mitochondrial DNA. Biological Communications, 66(2), 109–123. https://doi.org/10.21638/spbu03.2021.202
Section
Full communications