Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors
Cited by: 1343
Authors:
Haghverdi, Laleh [1,2]
Lun, Aaron T. L. [3]
Morgan, Michael D. [4]
Marioni, John C. [1,3,4]
Affiliations:
[1] EBI, EMBL, Cambridge, England
[2] Helmholtz Zentrum Munchen, Inst Computat Biol, Munich, Germany
[3] Univ Cambridge, Canc Res UK Cambridge Inst, Cambridge, England
[4] Wellcome Trust Sanger Inst, Cambridge, England
Large-scale single-cell RNA sequencing (scRNA-seq) data sets that are produced in different laboratories and at different times contain batch effects that may compromise the integration and interpretation of the data. Existing scRNA-seq analysis methods incorrectly assume that the composition of cell populations is either known or identical across batches. We present a strategy for batch correction based on the detection of mutual nearest neighbors (MNNs) in the high-dimensional expression space. Our approach does not rely on predefined or equal population compositions across batches; instead, it requires only that a subset of the population be shared between batches. Using both simulated and real scRNA-seq data sets, we show that our approach outperforms existing methods. Using multiple droplet-based scRNA-seq data sets, we demonstrate that our MNN batch-effect-correction method can be scaled to large numbers of cells.
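To make the MNN idea concrete, here is a minimal NumPy sketch, not the authors' released implementation. A cell pair (i, j) from batches A and B counts as mutual nearest neighbors when j is among the k nearest B cells of i and i is among the k nearest A cells of j. Several choices are assumptions made for illustration rather than details taken from the abstract: cosine normalization before distances are computed, brute-force neighbor search, and smoothing the pair-specific correction vectors with a Gaussian kernel of bandwidth `sigma`. The function names `cosine_normalize`, `knn_indices`, `find_mnn_pairs` and `mnn_correct` are hypothetical.

```python
import numpy as np


def cosine_normalize(x):
    """L2-normalize each cell (row) so Euclidean distance tracks cosine distance."""
    x = np.asarray(x, dtype=float)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.where(norms == 0, 1.0, norms)


def knn_indices(query, ref, k):
    """Brute-force indices of the k nearest rows of `ref` for each row of `query`."""
    d2 = ((query[:, None, :] - ref[None, :, :]) ** 2).sum(axis=2)
    return np.argsort(d2, axis=1)[:, :k]


def find_mnn_pairs(a, b, k):
    """Return (i, j) pairs where b[j] is among a[i]'s k nearest cells in b
    and a[i] is among b[j]'s k nearest cells in a (mutual nearest neighbors)."""
    nn_ab = knn_indices(a, b, k)
    nn_ba = knn_indices(b, a, k)
    pairs = []
    for i in range(a.shape[0]):
        for j in nn_ab[i]:
            if i in nn_ba[j]:
                pairs.append((i, int(j)))
    return pairs


def mnn_correct(a, b, k=5, sigma=1.0):
    """Hypothetical helper: move batch `b` onto reference batch `a`.

    Operates on cosine-normalized data and returns the corrected rows of `b`.
    """
    a_n, b_n = cosine_normalize(a), cosine_normalize(b)
    pairs = find_mnn_pairs(a_n, b_n, k)
    if not pairs:
        raise ValueError("no mutual nearest neighbors found")
    ia = np.array([i for i, _ in pairs])
    jb = np.array([j for _, j in pairs])
    # One correction vector per MNN pair, pointing from the B cell to its A partner.
    vecs = a_n[ia] - b_n[jb]
    # Gaussian-kernel weights from every B cell to the B cell of each pair, so each
    # cell receives a locally averaged batch vector. On unit-norm rows d2 <= 4,
    # so with sigma = 1 the weights cannot all underflow to zero.
    d2 = ((b_n[:, None, :] - b_n[jb][None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-d2 / sigma ** 2)
    w /= w.sum(axis=1, keepdims=True)
    return b_n + w @ vecs
```

Only cells that have a mutual partner contribute correction vectors, which reflects the abstract's point that just a shared subset of populations is needed; cells unique to one batch are corrected by borrowing vectors from nearby MNN cells. The brute-force distance arrays grow with the product of batch sizes, so a large data set would, as an assumption here, call for dimensionality reduction and an approximate neighbor search instead.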