Comparative analysis of methods for batch correction in proteomics — a two-batch case

Authors

  • Katerina Danko Bioinformatics Institute, ul. Kantemirovskaya, 2, Saint Petersburg, 197342, Russian Federation https://orcid.org/0000-0003-3987-2175
  • Lavrentii Danilov Department of Genetics and Biotechnology, Faculty of Biology, Saint Petersburg State University, Universitetskaya nab., 7–9, Saint Petersburg, 199034, Russian Federation https://orcid.org/0000-0002-4479-3095
  • Anna Malashicheva Laboratory of Regenerative Biomedicine, Institute of Cytology, Russian Academy of Sciences, Tikhoretskiy pr., 4, Saint Petersburg, 194064, Russian Federation https://orcid.org/0000-0002-0820-2913
  • Arseniy Lobov Laboratory of Regenerative Biomedicine, Institute of Cytology, Russian Academy of Sciences, Tikhoretskiy pr., 4, Saint Petersburg, 194064, Russian Federation https://orcid.org/0000-0002-0930-1171

DOI:

https://doi.org/10.21638/spbu03.2023.106

Abstract

Proper study design is vital in the life sciences, and effects unrelated to those under study (batch effects) should be avoided. Still, it is not always possible to exclude all batch effects in a complex omics study. Here we discuss an appropriate way to analyze proteomics data with an enormous technical batch effect. We re-analyzed a published dataset (PXD032212) comprising two batches of samples analyzed in two different years; each batch includes control and differentiated cells. In the combined dataset, control and differentiated cells form separate clusters with 209 differentially expressed proteins (DEPs). Nevertheless, the differences between the batches were larger than those between the cell types, and analyzing either batch on its own gives 276 or 290 DEPs. We then compared the efficiency of five batch correction methods. ComBat was the most effective, and analysis of the ComBat-corrected dataset revealed 406 DEPs.
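
To make the idea of batch correction concrete, the sketch below applies the simplest possible adjustment to a single protein: each batch is shifted so that its mean matches the overall mean. This is not ComBat and not necessarily one of the five methods compared in the paper; it is a location-only illustration. The function name `center_batches`, the batch labels and the intensity values are hypothetical. The approach assumes that every batch contains both control and differentiated samples in the same proportion, as in the design described above; otherwise centering would also remove part of the biological signal.

```python
from collections import defaultdict
from statistics import mean


def center_batches(values, batches):
    """Remove a per-batch shift from one protein's log-intensities.

    Each value has its batch mean subtracted and the grand mean added back,
    so all batches end up with the same mean. This is a location-only
    adjustment, not ComBat's empirical Bayes model.
    """
    by_batch = defaultdict(list)
    for value, batch in zip(values, batches):
        by_batch[batch].append(value)
    batch_means = {b: mean(v) for b, v in by_batch.items()}
    grand_mean = mean(values)
    return [v - batch_means[b] + grand_mean for v, b in zip(values, batches)]


# Hypothetical log2 intensities for one protein: two batches (A, B), each
# with two control (C) and two differentiated (D) samples. Batch B is
# shifted upwards by 4 units relative to batch A.
values = [10.0, 10.2, 12.0, 12.2, 14.0, 14.2, 16.0, 16.2]
batches = ["A", "A", "A", "A", "B", "B", "B", "B"]
groups = ["C", "C", "D", "D", "C", "C", "D", "D"]

corrected = center_batches(values, batches)

# Difference between group means after correction.
control = [v for v, g in zip(corrected, groups) if g == "C"]
differentiated = [v for v, g in zip(corrected, groups) if g == "D"]
group_difference = mean(differentiated) - mean(control)
```

In this toy example the shift between batches disappears after centering, while the difference between differentiated and control samples within each batch is preserved. ComBat additionally models batch-specific variance and shares information across proteins, which this sketch does not attempt.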

Keywords:

batch effect, proteomics, bioinformatics, batch effect correction


Published

2023-05-02

How to Cite

Danko, K., Danilov, L., Malashicheva, A., & Lobov, A. (2023). Comparative analysis of methods for batch correction in proteomics — a two-batch case. Biological Communications, 68(1), 56–61. https://doi.org/10.21638/spbu03.2023.106

Section

Brief communications
