scds: computational annotation of doublets in single-cell RNA sequencing data

Times cited: 148
Authors
Bais, Abha S. [1]
Kostka, Dennis [1, 2, 3]
Affiliations
[1] Univ Pittsburgh, Sch Med, Dept Dev Biol, Pittsburgh, PA 15201 USA
[2] Univ Pittsburgh, Sch Med, Dept Computat & Syst Biol, Pittsburgh, PA 15201 USA
[3] Univ Pittsburgh, Sch Med, Pittsburgh Ctr Evolutionary Biol & Med, Pittsburgh, PA 15201 USA
Funding
National Institutes of Health (NIH)
Keywords
TRANSCRIPTOMICS; CHALLENGES; REVEALS; PACKAGE
DOI
10.1093/bioinformatics/btz698
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
070307 [Chemical Biology]
Abstract
Motivation: Single-cell RNA sequencing (scRNA-seq) technologies enable the study of transcriptional heterogeneity at the resolution of individual cells and have an increasing impact on biomedical research. However, these methods are known to sometimes capture two or more cells as if they were a single cell, so the output of such experiments contains a number of so-called doublets. Treating doublets as single cells in downstream analyses can severely bias a study's conclusions. Computational strategies for identifying doublets are therefore needed.

Results: With scds, we propose two new approaches for in silico doublet identification: co-expression based doublet scoring (cxds) and binary classification based doublet scoring (bcds). The co-expression based approach, cxds, uses binarized (absence/presence) gene expression data. It employs a binomial model for the co-expression of pairs of genes and yields interpretable doublet annotations. bcds, on the other hand, uses a binary classification approach to discriminate artificial doublets from the original data. We apply our methods and existing computational doublet identification approaches to four datasets with experimental doublet annotations. Our methods perform at least as well as the state of the art, at comparatively low computational cost. We observe appreciable differences between methods and across datasets, and no approach dominates all others. In summary, scds presents a scalable, competitive approach that allows doublet annotation of datasets with thousands of cells in a matter of seconds.
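The co-expression idea behind cxds can be illustrated with a minimal Python sketch. The sketch makes the following assumptions:

- The input is a genes × cells count matrix given as nested lists.
- A gene counts as present in a cell when its count exceeds a threshold.
- For every gene pair, the sketch compares how often exactly one of the two genes is detected against a binomial expectation under independence.
- A cell's score is the sum of the pair scores over all gene pairs it co-expresses.

The function names `log_binom_sf` and `cxds_like_scores` are hypothetical, and this is not the scds R interface. Details of the published method, such as gene selection or the exact tail probability used, may differ.

```python
import math
from itertools import combinations


def log_binom_sf(k, n, p):
    """log P(X >= k) for X ~ Binomial(n, p), summed in log space for stability."""
    if k <= 0:
        return 0.0
    if k > n or p <= 0.0:
        return float("-inf")
    if p >= 1.0:
        return 0.0
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
        + j * math.log(p) + (n - j) * math.log1p(-p)
        for j in range(k, n + 1)
    ]
    top = max(log_terms)
    return top + math.log(sum(math.exp(t - top) for t in log_terms))


def cxds_like_scores(counts, threshold=0):
    """counts: genes x cells nested lists. Returns one doublet score per cell (hypothetical helper)."""
    # Binarize: a gene is "present" in a cell if its count exceeds the threshold.
    binary = [[1 if c > threshold else 0 for c in gene] for gene in counts]
    n_cells = len(binary[0])
    freq = [sum(gene) / n_cells for gene in binary]
    scores = [0.0] * n_cells
    for i, j in combinations(range(len(binary)), 2):
        # Under independence: probability that exactly one of the two genes is detected in a cell.
        p = freq[i] * (1 - freq[j]) + (1 - freq[i]) * freq[j]
        observed = sum(a != b for a, b in zip(binary[i], binary[j]))
        # Many more "exactly one" cells than expected means the genes look mutually exclusive,
        # so a small upper-tail probability turns into a large pair score.
        pair_score = -log_binom_sf(observed, n_cells, p)
        # Cells that co-express such a pair accumulate its score.
        for c in range(n_cells):
            if binary[i][c] and binary[j][c]:
                scores[c] += pair_score
    return scores


# Toy data: gene A marks one cell type, gene B another, gene C is detected everywhere.
counts = [
    [5, 3, 4, 2, 6, 0, 0, 0, 0, 4],  # gene A
    [0, 0, 0, 0, 0, 3, 5, 2, 4, 3],  # gene B
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],  # gene C
]
print(cxds_like_scores(counts))
```

In the toy data, the last cell is the only one expressing both otherwise mutually exclusive markers A and B, so it receives by far the largest score. Scoring every gene pair is quadratic in the number of genes, so a practical implementation would restrict the computation to a subset of informative genes.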
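The classification strategy behind bcds can likewise be sketched in Python. The abstract does not specify how artificial doublets are built or which classifier is used, so the sketch makes these assumptions:

- An artificial doublet is the sum of the count vectors of two randomly chosen cells.
- The features are log1p-transformed raw counts.
- A plain logistic regression, trained by batch gradient descent, stands in as the classifier.

Each original cell is then scored by its predicted probability of being an artificial doublet. All function names (`simulate_doublets`, `train_logistic`, `bcds_like_scores`) are hypothetical and not part of the scds package.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def simulate_doublets(cells, n_doublets, rng):
    """Assumption: an artificial doublet is the sum of two distinct, randomly chosen cells."""
    doublets = []
    for _ in range(n_doublets):
        a, b = rng.sample(range(len(cells)), 2)
        doublets.append([x + y for x, y in zip(cells[a], cells[b])])
    return doublets


def featurize(cell):
    # Assumption: log1p of raw counts, without library-size normalization,
    # so the larger total counts of doublets remain visible to the classifier.
    return [math.log1p(c) for c in cell]


def train_logistic(X, y, epochs=300, lr=0.1):
    """Stand-in classifier: logistic regression fitted by batch gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wk * xk for wk, xk in zip(w, xi)) + b) - yi
            for k, xk in enumerate(xi):
                grad_w[k] += err * xk
            grad_b += err
        w = [wk - lr * g / n for wk, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def bcds_like_scores(cells, n_doublets=None, seed=0):
    """cells: cells x genes nested lists. Returns one doublet score per original cell."""
    rng = random.Random(seed)
    if n_doublets is None:
        n_doublets = len(cells)
    doublets = simulate_doublets(cells, n_doublets, rng)
    # Label original cells 0 and artificial doublets 1, then train the classifier.
    X = [featurize(c) for c in cells + doublets]
    y = [0] * len(cells) + [1] * len(doublets)
    w, b = train_logistic(X, y)
    # Score each original cell by its predicted probability of being an artificial doublet.
    return [sigmoid(sum(wk * xk for wk, xk in zip(w, featurize(c))) + b) for c in cells]


# Toy data (genes: marker A, marker B, housekeeping C); the last cell mixes both cell types.
cells = [
    [5, 0, 2], [6, 0, 3], [4, 0, 2], [5, 0, 2],
    [0, 5, 3], [0, 4, 2], [0, 6, 2], [0, 5, 3],
    [5, 5, 5],
]
print(bcds_like_scores(cells, n_doublets=40, seed=0))
```

A more flexible classifier, such as gradient-boosted trees, can capture non-linear differences between singlets and doublets. The logistic regression here only keeps the example dependency-free.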
Pages: 1150-1158
Number of pages: 9