FusionQ: a novel approach for gene fusion detection and quantification from paired-end RNA-Seq

Cited by: 20
Authors
Liu, Chenglin [1 ,2 ,3 ]
Ma, Jinwen [2 ,3 ]
Chang, ChungChe [4 ]
Zhou, Xiaobo [1 ]
Affiliations
[1] Wake Forest Sch Med, Dept Diagnost Radiol, Winston Salem, NC 27157 USA
[2] Peking Univ, Sch Math Sci, Dept Informat Sci, Beijing 100871, Peoples R China
[3] Peking Univ, LMAM, Beijing 100871, Peoples R China
[4] Univ Cent Florida, Florida Hosp, Dept Pathol, Orlando, FL 32803 USA
Source
BMC BIOINFORMATICS | 2013 / Vol. 14
Funding
National Institutes of Health (NIH), USA;
Keywords
Fusion detection; chimerical transcripts quantification; EM algorithm; BREAST-CANCER; TRANSCRIPTS; GENOME; DISCOVERY; DNA;
DOI
10.1186/1471-2105-14-193
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Background: Gene fusions, which result from abnormal chromosome rearrangements, are a pathogenic factor in cancer development. The emerging RNA-Seq technology enables us to detect gene fusions and profile their features. Results: In this paper, we proposed a novel fusion detection tool, FusionQ, based on paired-end RNA-Seq data. This tool can detect gene fusions, construct the structures of chimeric transcripts, and estimate their abundances. To confirm the read alignment on both sides of a fusion point, we employed a new approach, "residual sequence extension", which extends the short segments of the reads by aggregating their overlapping reads. We also proposed a list of filters to control the false-positive rate. In addition, we estimated fusion abundance using the Expectation-Maximization algorithm with sparse optimization, and further used these estimates to improve the detection accuracy of the fusion transcripts. We ran simulations with FusionQ and two other state-of-the-art fusion detection tools. FusionQ outperformed both in sensitivity and specificity, especially in detecting low-coverage fusions. Using paired-end RNA-Seq data from breast cancer cell lines, FusionQ detected both previously reported and new fusions. FusionQ reported the structures of these fusions and provided their expression levels. Some highly expressed fusion genes detected by FusionQ are important biomarkers in breast cancer. On the cancer cell line data, FusionQ again showed better specificity and sensitivity than the other two tools. Conclusions: FusionQ is a novel tool for fusion detection and quantification based on RNA-Seq data. It performs well in both specificity and sensitivity. FusionQ is free and available at http://www.wakehealth.edu/CTSB/Software/Software.htm.
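The abstract names Expectation-Maximization as the basis for abundance estimation but gives no details. The following is a minimal sketch of a generic EM estimator for transcript proportions from ambiguous read assignments, not FusionQ's actual implementation. The function name `em_abundance`, the input layout `compat`, and the toy data are all hypothetical; the sketch also ignores transcript length and omits the sparse-optimization term mentioned in the abstract.

```python
def em_abundance(compat, n_transcripts, n_iter=200, tol=1e-9):
    """Estimate transcript proportions with plain EM.

    compat[i] lists the indices of the transcripts (normal or chimeric)
    that read i is compatible with. This layout is an assumption made
    for illustration only.
    """
    if not compat or any(len(c) == 0 for c in compat):
        raise ValueError("every read needs at least one candidate transcript")

    # Start from a uniform guess over all candidate transcripts.
    theta = [1.0 / n_transcripts] * n_transcripts

    for _ in range(n_iter):
        # E-step: split each read across its candidates in proportion
        # to the current abundance estimates.
        counts = [0.0] * n_transcripts
        for candidates in compat:
            total = sum(theta[t] for t in candidates)
            if total == 0.0:
                continue  # all candidates currently have zero abundance
            for t in candidates:
                counts[t] += theta[t] / total

        assigned = sum(counts)
        if assigned == 0.0:
            break  # no read could be assigned; keep the previous estimate

        # M-step: renormalize the expected counts into proportions.
        new_theta = [c / assigned for c in counts]

        delta = max(abs(a - b) for a, b in zip(theta, new_theta))
        theta = new_theta
        if delta < tol:
            break

    return theta


# Toy data: transcript 0 is a normal gene, transcript 1 a chimeric transcript.
# Six reads map only to 0, two only to 1, and two are ambiguous.
reads = [[0]] * 6 + [[1]] * 2 + [[0, 1]] * 2
theta = em_abundance(reads, n_transcripts=2)
print(theta)
```

On this toy input the estimates settle near 0.75 and 0.25: the two ambiguous reads are shared between the transcripts in proportion to the evidence from the uniquely mapped reads.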
Pages: 11