ngs_backbone: a pipeline for read cleaning, mapping and SNP calling using Next Generation Sequence

Times cited: 49

Authors:

Blanca, Jose M. [1]

Pascual, Laura [1]

Ziarsolo, Peio [1]

Nuez, Fernando [1]

Canizares, Joaquin [1]

Affiliation:

[1] Univ Politecn Valencia, Inst Conservac & Mejora Agrodiversidad Valenciana, Valencia 46022, Spain

Source:

BMC GENOMICS | 2011 / Vol. 12

Keywords:

TRANSCRIPTOME; DISCOVERY; FRAMEWORK

DOI:

10.1186/1471-2164-12-285

Chinese Library Classification (CLC) numbers:

Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]

Subject classification codes:

071005; 0836; 090102; 100705

Abstract:

Background: The possibilities offered by next generation sequencing (NGS) platforms are revolutionizing biotechnological laboratories. Moreover, the combination of NGS and affordable high-throughput genotyping technologies is facilitating the rapid discovery and use of SNPs in non-model species. However, this abundance of sequences and polymorphisms creates new software needs. To fulfill these needs, we have developed a powerful yet easy-to-use application.

Results: The ngs_backbone software is a parallel pipeline capable of analyzing Sanger, 454, Illumina and SOLiD (Sequencing by Oligonucleotide Ligation and Detection) sequence reads. Its main supported analyses are read cleaning, transcriptome assembly and annotation, read mapping, and single nucleotide polymorphism (SNP) calling and selection. In order to build a truly useful tool, the software development was paired with a laboratory experiment. All public tomato Sanger EST reads plus 14.2 million Illumina reads were employed to test the tool and predict polymorphisms in tomato. The cleaned reads were mapped to the SGN tomato transcriptome, obtaining a coverage of 4.2 for Sanger and 8.5 for Illumina. In total, 23,360 single nucleotide variations (SNVs) were predicted. A total of 76 SNVs were selected for experimental validation, and 85% of them were found to be real.

Conclusions: ngs_backbone is a new software package capable of analyzing sequences produced by NGS technologies and predicting SNVs with great accuracy. In our tomato example, we created a highly polymorphic collection of SNVs that will be a useful resource for tomato researchers and breeders. The software, along with its documentation, is freely available under the AGPL license and can be downloaded from http://bioinf.comav.upv.es/ngs_backbone/ or http://github.com/JoseBlanca/franklin.

Pages: 8
