Optimizing de novo transcriptome assembly from short-read RNA-Seq data: a comparative study

Cited by: 391
Authors
Zhao, Qiong-Yi [1]
Wang, Yi [2]
Kong, Yi-Meng [1]
Luo, Da [3]
Li, Xuan [1]
Hao, Pei [4]
Affiliations
[1] Chinese Acad Sci, Shanghai Inst Biol Sci, Inst Plant Physiol & Ecol, Key Lab Synthet Biol, Shanghai 200032, Peoples R China
[2] E China Normal Univ, Inst Software Engn, Inst Mass Comp, Shanghai 200062, Peoples R China
[3] Sun Yat Sen Univ, State Key Lab Biocontrol, Guangzhou 510275, Guangdong, Peoples R China
[4] Shanghai Ctr Bioinformat Technol, Shanghai 200235, Peoples R China
Source
BMC BIOINFORMATICS | 2011 / Vol. 12
Keywords
ALIGNMENT; ULTRAFAST; RESOURCE; CANCER; GENOME; TOOL;
DOI
10.1186/1471-2105-12-S14-S2
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification code
071010 ; 081704 ;
Abstract
Background: With the rapid advances in next-generation sequencing technology, high-throughput RNA sequencing has emerged as a powerful and cost-effective approach to transcriptome study. De novo assembly of transcripts provides an important solution for transcriptome analysis in organisms with no reference genome. However, there was little understanding of how different variables affect assembly outcomes, and no consensus on how to approach an optimal solution by selecting a software tool and a suitable strategy based on the properties of the RNA-Seq data. Results: To reveal the performance of different programs for transcriptome assembly, this work analyzed several important factors, including k-mer values, genome complexity, coverage depth and directional reads, among others. Seven program conditions were tested: four single k-mer assemblers (SK: SOAPdenovo, ABySS, Oases and Trinity) and three multiple k-mer methods (MK: SOAPdenovo-MK, trans-ABySS and Oases-MK). While small and large k-mer values performed better for reconstructing lowly and highly expressed transcripts, respectively, the MK strategy worked well across almost all expression quintiles. Among the SK tools, Trinity performed well across various conditions but had the longest running time. Oases consumed the most memory, whereas SOAPdenovo had the shortest runtime but performed poorly at reconstructing full-length CDS. ABySS showed a good balance between resource usage and assembly quality. Conclusions: Our work compared the performance of publicly available transcriptome assemblers and analyzed important factors affecting de novo assembly. Practical guidelines for transcript reconstruction from short-read RNA-Seq data were proposed. De novo assembly of the C. sinensis transcriptome was greatly improved using optimized methods.
Pages: 12
Related papers
23 in total
[1]  
[Anonymous], 2011, NAT BIOTECHNOL
[2]   De novo transcriptome assembly with ABySS [J].
Birol, Inanc ;
Jackman, Shaun D. ;
Nielsen, Cydney B. ;
Qian, Jenny Q. ;
Varhol, Richard ;
Stazyk, Greg ;
Morin, Ryan D. ;
Zhao, Yongjun ;
Hirst, Martin ;
Schein, Jacqueline E. ;
Horsman, Doug E. ;
Connors, Joseph M. ;
Gascoyne, Randy D. ;
Marra, Marco A. ;
Jones, Steven J. M. .
BIOINFORMATICS, 2009, 25 (21) :2872-2877
[3]   De Novo Assembly of Chickpea Transcriptome Using Short Reads for Gene Discovery and Marker Identification [J].
Garg, Rohini ;
Patel, Ravi K. ;
Tyagi, Akhilesh K. ;
Jain, Mukesh .
DNA RESEARCH, 2011, 18 (01) :53-63
[4]   The developmental transcriptome of Drosophila melanogaster [J].
Graveley, Brenton R. ;
Brooks, Angela N. ;
Carlson, Joseph W. ;
Duff, Michael O. ;
Landolin, Jane M. ;
Yang, Li ;
Artieri, Carlo G. ;
van Baren, Marijke J. ;
Boley, Nathan ;
Booth, Benjamin W. ;
Brown, James B. ;
Cherbas, Lucy ;
Davis, Carrie A. ;
Dobin, Alex ;
Li, Renhua ;
Lin, Wei ;
Malone, John H. ;
Mattiuzzo, Nicolas R. ;
Miller, David ;
Sturgill, David ;
Tuch, Brian B. ;
Zaleski, Chris ;
Zhang, Dayu ;
Blanchette, Marco ;
Dudoit, Sandrine ;
Eads, Brian ;
Green, Richard E. ;
Hammonds, Ann ;
Jiang, Lichun ;
Kapranov, Phil ;
Langton, Laura ;
Perrimon, Norbert ;
Sandler, Jeremy E. ;
Wan, Kenneth H. ;
Willingham, Aarron ;
Zhang, Yu ;
Zou, Yi ;
Andrews, Justen ;
Bickel, Peter J. ;
Brenner, Steven E. ;
Brent, Michael R. ;
Cherbas, Peter ;
Gingeras, Thomas R. ;
Hoskins, Roger A. ;
Kaufman, Thomas C. ;
Oliver, Brian ;
Celniker, Susan E. .
NATURE, 2011, 471 (7339) :473-479
[5]   The KEGG resource for deciphering the genome [J].
Kanehisa, M ;
Goto, S ;
Kawashima, S ;
Okuno, Y ;
Hattori, M .
NUCLEIC ACIDS RESEARCH, 2004, 32 :D277-D280
[6]   Recurrent chimeric RNAs enriched in human prostate cancer identified by deep sequencing [J].
Kannan, Kalpana ;
Wang, Liguo ;
Wang, Jianghua ;
Ittmann, Michael M. ;
Li, Wei ;
Yen, Laising .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2011, 108 (22) :9172-9177
[7]  
Kent WJ, 2002, GENOME RES, V12, P656, DOI 10.1101/gr.229202
[8]   Ultrafast and memory-efficient alignment of short DNA sequences to the human genome [J].
Langmead, Ben ;
Trapnell, Cole ;
Pop, Mihai ;
Salzberg, Steven L. .
GENOME BIOLOGY, 2009, 10 (03)
[9]   The developmental dynamics of the maize leaf transcriptome [J].
Li, Pinghua ;
Ponnala, Lalit ;
Gandotra, Neeru ;
Wang, Lin ;
Si, Yaqing ;
Tausta, S. Lori ;
Kebrom, Tesfamichael H. ;
Provart, Nicholas ;
Patel, Rohan ;
Myers, Christopher R. ;
Reidel, Edwin J. ;
Turgeon, Robert ;
Liu, Peng ;
Sun, Qi ;
Nelson, Timothy ;
Brutnell, Thomas P. .
NATURE GENETICS, 2010, 42 (12) :1060-U51
[10]   SOAP2: an improved ultrafast tool for short read alignment [J].
Li, Ruiqiang ;
Yu, Chang ;
Li, Yingrui ;
Lam, Tak-Wah ;
Yiu, Siu-Ming ;
Kristiansen, Karsten ;
Wang, Jun .
BIOINFORMATICS, 2009, 25 (15) :1966-1967