Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors

Cited by: 1343
Authors
Haghverdi, Laleh [1, 2]
Lun, Aaron T. L. [3]
Morgan, Michael D. [4]
Marioni, John C. [1, 3, 4]
Affiliations
[1] EBI, EMBL, Cambridge, England
[2] Helmholtz Zentrum Munchen, Inst Computat Biol, Munich, Germany
[3] Univ Cambridge, Canc Res UK Cambridge Inst, Cambridge, England
[4] Wellcome Trust Sanger Inst, Cambridge, England
Funding
Wellcome Trust (UK);
Keywords
SEQ; STEM; MAP;
DOI
10.1038/nbt.4091
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Large-scale single-cell RNA sequencing (scRNA-seq) data sets that are produced in different laboratories and at different times contain batch effects that may compromise the integration and interpretation of the data. Existing scRNA-seq analysis methods incorrectly assume that the composition of cell populations is either known or identical across batches. We present a strategy for batch correction based on the detection of mutual nearest neighbors (MNNs) in the high-dimensional expression space. Our approach does not rely on predefined or equal population compositions across batches; instead, it requires only that a subset of the population be shared between batches. We demonstrate the superiority of our approach compared with existing methods by using both simulated and real scRNA-seq data sets. Using multiple droplet-based scRNA-seq data sets, we demonstrate that our MNN batch-effect-correction method can be scaled to large numbers of cells.
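The mutual-nearest-neighbor idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `find_mutual_nearest_neighbors`, the parameter `k`, the use of plain Euclidean distance, and the toy coordinates are all assumptions made for illustration. A cell in one batch and a cell in the other form an MNN pair when each is among the other's k nearest neighbors across batches.

```python
import numpy as np


def find_mutual_nearest_neighbors(batch_a, batch_b, k=2):
    """Return (i, j) pairs where cell i of batch_a and cell j of batch_b
    are each within the other's k nearest neighbors in the opposite batch.

    Hypothetical helper for illustration; uses brute-force Euclidean distance.
    """
    batch_a = np.asarray(batch_a, dtype=float)
    batch_b = np.asarray(batch_b, dtype=float)

    # Squared Euclidean distance between every cell in A and every cell in B.
    diff = batch_a[:, None, :] - batch_b[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)  # shape (n_a, n_b)

    # For each cell in A, indices of its k nearest cells in B.
    nn_a_to_b = np.argsort(dist, axis=1)[:, :k]
    # For each cell in B, indices of its k nearest cells in A.
    nn_b_to_a = np.argsort(dist, axis=0)[:k, :].T

    pairs = []
    for i in range(batch_a.shape[0]):
        for j in nn_a_to_b[i]:
            if i in nn_b_to_a[j]:  # the relationship must hold in both directions
                pairs.append((i, int(j)))
    return pairs


# Toy data (assumed): batch A holds two populations, batch B holds only the
# first one, shifted by a constant offset standing in for a batch effect.
batch_a = [[0, 0], [0, 1], [1, 0],      # shared population
           [10, 10], [10, 11]]          # population present only in A
batch_b = [[1.5, 1.5], [1.5, 2.5], [2.5, 1.5]]

pairs = find_mutual_nearest_neighbors(batch_a, batch_b, k=2)
print(sorted(pairs))
```

In this toy example the two cells that exist only in batch A (indices 3 and 4) never appear in an MNN pair: they have nearest neighbors in B, but no cell in B counts them among its own nearest neighbors. This mirrors the abstract's point that only a subset of the population needs to be shared between batches.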
Pages: 421 / +
Number of pages: 9