A benchmark of batch-effect correction methods for single-cell RNA sequencing data

Cited by: 617
Authors
Tran, Hoa Thi Nhu [1]
Ang, Kok Siong [1]
Chevrier, Marion [1]
Zhang, Xiaomeng [1]
Lee, Nicole Yee Shin [1]
Goh, Michelle [1]
Chen, Jinmiao [1]
Affiliations
[1] A*STAR, Singapore Immunology Network (SIgN), 8A Biomedical Grove, Immunos Building, Level 3, Singapore 138648, Singapore
Keywords
Single-cell RNA-seq; Batch correction; Batch effect; Integration; Differential gene expression; EXPRESSION; CLASSIFICATION; MAP
DOI
10.1186/s13059-019-1850-9
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification code
071005; 0836; 090102; 100705
Abstract
Background: Large-scale single-cell transcriptomic datasets generated using different technologies contain batch-specific systematic variations that present a challenge to batch-effect removal and data integration. With continued growth expected in scRNA-seq data, achieving effective batch integration with available computational resources is crucial. Here, we perform an in-depth benchmark study of available batch correction methods to determine the most suitable method for batch-effect removal.
Results: We compare 14 methods in terms of computational runtime, the ability to handle large datasets, and batch-effect correction efficacy while preserving cell type purity. Five scenarios are designed for the study: identical cell types with different technologies, non-identical cell types, multiple batches, big data, and simulated data. Performance is evaluated using four benchmarking metrics: kBET, LISI, ASW, and ARI. We also investigate the use of batch-corrected data to study differential gene expression.
Conclusion: Based on our results, Harmony, LIGER, and Seurat 3 are the recommended methods for batch integration. Due to its significantly shorter runtime, Harmony is recommended as the first method to try, with the other methods as viable alternatives.
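Two of the metrics named in the abstract, ARI (adjusted Rand index) and ASW (average silhouette width), can be illustrated with scikit-learn. The following is a minimal sketch on synthetic data, not the benchmark's own pipeline: the toy embedding, the group sizes, and the use of k-means for clustering are all assumptions made for illustration. kBET and LISI are omitted because scikit-learn has no ready-made equivalent for them.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score, silhouette_score

rng = np.random.default_rng(0)

# Hypothetical "batch-corrected" 2-D embedding: two well-separated cell types,
# each sampled in two batches that overlap completely (an idealized outcome).
centers = {"T": np.array([0.0, 0.0]), "B": np.array([8.0, 8.0])}
cell_type, batch, points = [], [], []
for ct, center in centers.items():
    for b in ("batch1", "batch2"):
        points.append(center + rng.normal(scale=0.5, size=(50, 2)))
        cell_type += [ct] * 50
        batch += [b] * 50
X = np.vstack(points)
cell_type = np.array(cell_type)
batch = np.array(batch)

# Cluster the embedding, then compare the clusters against each labeling.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# ARI: agreement between clusters and labels (1 = identical, ~0 = chance).
ari_cell_type = adjusted_rand_score(cell_type, clusters)
ari_batch = adjusted_rand_score(batch, clusters)

# ASW: mean silhouette of the labels in the embedding (range -1 to 1).
asw_cell_type = silhouette_score(X, cell_type)
asw_batch = silhouette_score(X, batch)

print(f"ARI cell type: {ari_cell_type:.3f}, ARI batch: {ari_batch:.3f}")
print(f"ASW cell type: {asw_cell_type:.3f}, ASW batch: {asw_batch:.3f}")
```

In this idealized setting, the cell-type scores should be high (cell type purity preserved) and the batch scores close to zero (batches well mixed). A well-corrected real dataset would be expected to show the same pattern.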
Pages: 32