PET-Tool: a software suite for comprehensive processing and managing of Paired-End diTag (PET) sequence data

Cited by: 18
Authors
Chiu, Kuo Ping
Wong, Chee-Hong
Chen, Qiongyu
Ariyaratne, Pramila
Ooi, Hong Sain
Wei, Chia-Lin
Sung, Wing-Kin Ken
Ruan, Yijun
Affiliations
[1] Genome Inst Singapore, Genome Technol & Biol Grp, Singapore 138672, Singapore
[2] Natl Univ Singapore, Dept Comp Sci, Singapore 117543, Singapore
[3] Bioinformat Inst, Singapore 138671, Singapore
[4] Genome Inst Singapore, Informat & Math Sci Grp, Singapore 138672, Singapore
DOI
10.1186/1471-2105-7-390
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: We recently developed the Paired-End diTag (PET) strategy for efficient characterization of mammalian transcriptomes and genomes. The paired-end nature of short PET sequences derived from long DNA fragments raised a new set of bioinformatics challenges, including how to extract PETs from raw sequence reads and how to map PETs to reference genome sequences correctly yet efficiently. To accommodate and streamline analysis of the large volume of PET sequences generated by each PET experiment, an automated PET data-processing pipeline is desirable.
Results: We designed an integrated software package, PET-Tool, to automatically process PET sequences and map them to genome sequences. The tool was implemented as a web-based application composed of four modules: the Extractor module for PET extraction; the Examiner module for analytic evaluation of PET sequence quality; the Mapper module for locating PET sequences in the genome sequences; and the ProjectManager module for data organization. The performance of PET-Tool was evaluated by analyzing 2.7 million PET sequences. PET-Tool proved accurate and efficient in extracting PET sequences and removing artifacts from large-volume datasets. Using optimized mapping criteria, over 70% of quality PET sequences were mapped specifically to the genome sequences. On a 2.4 GHz Linux machine, it takes approximately six hours to process one million PETs from extraction to mapping.
Conclusion: Its speed, accuracy, and comprehensiveness show that PET-Tool is an important and useful component of PET experiments and that it can be extended to accommodate other related analyses of paired-end sequences. The tool also provides user-friendly functions for data quality checks and a system for multi-layer data management.
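The reported processing rate (approximately six hours per one million PETs, from extraction to mapping) can be turned into a rough runtime estimate for other dataset sizes. The sketch below is illustrative only: the function names are hypothetical, and it assumes that runtime scales linearly with the number of PETs, which the abstract does not state. The abstract also does not say how long the 2.7-million-PET evaluation actually took, so the figure computed here is an extrapolation, not a reported result.

```python
# Rough runtime extrapolation from the rate quoted in the abstract.
# Assumption (not from the source): runtime grows linearly with PET count.

HOURS_PER_MILLION_PETS = 6.0  # "approximately six hours" per the abstract


def estimate_hours(n_pets: int, hours_per_million: float = HOURS_PER_MILLION_PETS) -> float:
    """Estimate end-to-end processing time in hours for n_pets sequences."""
    if n_pets < 0:
        raise ValueError("n_pets must be non-negative")
    return n_pets / 1_000_000 * hours_per_million


def pets_per_second(hours_per_million: float = HOURS_PER_MILLION_PETS) -> float:
    """Convert the quoted rate into an approximate throughput in PETs per second."""
    return 1_000_000 / (hours_per_million * 3600)


if __name__ == "__main__":
    print(f"Estimated hours for 2.7 million PETs: {estimate_hours(2_700_000):.1f}")
    print(f"Approximate throughput: {pets_per_second():.1f} PETs/second")
```

Under these assumptions, the 2.7 million PETs used in the evaluation would correspond to roughly 16 hours on comparable hardware, or on the order of 46 PETs per second.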
Pages: 11