Comparing de novo assemblers for 454 transcriptome data

Cited by: 200
Authors
Kumar, Sujai [1]
Blaxter, Mark L. [1]
Affiliations
[1] Univ Edinburgh, Inst Evolutionary Biol, Edinburgh EH9 3JT, Midlothian, Scotland
Keywords
SEQUENCE; GENERATION; ALIGNMENT; GENOME; BLAST;
DOI
10.1186/1471-2164-11-571
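Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification code
071005 [Microbiology]; 090105 [Crop Production Systems and Ecological Engineering];
Abstract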
Background: Roche 454 pyrosequencing has become a method of choice for generating transcriptome data from non-model organisms. Once the tens to hundreds of thousands of short (250-450 base) reads have been produced, it is important to correctly assemble these to estimate the sequence of all the transcripts. Most transcriptome assembly projects use only one program for assembling 454 pyrosequencing reads, but there is no evidence that the programs used to date are optimal. We have carried out a systematic comparison of five assemblers (CAP3, MIRA, Newbler, SeqMan and CLC) to establish best practices for transcriptome assemblies, using a new dataset from the parasitic nematode Litomosoides sigmodontis.
Results: Although no single assembler performed best on all our criteria, Newbler 2.5 gave longer contigs, better alignments to some reference sequences, and was fast and easy to use. SeqMan assemblies performed best on the criterion of recapitulating known transcripts, and had more novel sequence than the other assemblers, but generated an excess of small, redundant contigs. The remaining assemblers all performed almost as well, with the exception of Newbler 2.3 (the version currently used by most assembly projects), which generated assemblies that had significantly lower total length. As different assemblers use different underlying algorithms to generate contigs, we also explored merging of assemblies and found that the merged datasets not only aligned better to reference sequences than individual assemblies, but were also more consistent in the number and size of contigs.
Conclusions: Transcriptome assemblies are smaller than genome assemblies and thus should be more computationally tractable, but are often harder because individual contigs can have highly variable read coverage. Comparing single assemblers, Newbler 2.5 performed best on our trial data set, but other assemblers were closely comparable. Combining differently optimal assemblies from different programs however gave a more credible final product, and this strategy is recommended.
Pages: 12