Slim-Filter: an interactive Windows-based application for Illumina Genome Analyzer data assessment and manipulation

Cited by: 7
Authors
Golovko, Georgiy [1,2]
Khanipov, Kamil [1]
Rojas, Mark [1,2]
Martinez-Alcantara, Antonio [1]
Howard, Jesse J. [1]
Ballesteros, Efren [1]
Gupta, Sharu [1]
Widger, William [1,3]
Fofanov, Yuriy [1,2,3]
Affiliations
[1] Univ Houston, Ctr BioMed & Environm Genom, Houston, TX 77204 USA
[2] Univ Houston, Dept Comp Sci, Houston, TX 77204 USA
[3] Univ Houston, Dept Biol & Biochem, Houston, TX 77204 USA
Source
BMC Bioinformatics | 2012 / Volume 13
Keywords
Data Manipulation; Illumina Genome Analyzer; Sequencing Instrument; Manipulation Capability; Scanner Calibration
DOI
10.1186/1471-2105-13-166
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: The emergence of Next Generation Sequencing technologies has made it possible for individual investigators to generate gigabases of sequencing data per week. Effective analysis and manipulation of these data are limited by large file sizes, so even simple tasks such as data filtration and quality assessment have to be performed in several steps. This requires (potentially problematic) interaction between the investigator and a bioinformatics/computational service provider. Furthermore, such services are often performed using specialized computational facilities.
Results: We present a Windows-based application, Slim-Filter, designed to interactively examine the statistical properties of sequencing reads produced by the Illumina Genome Analyzer and to perform a broad spectrum of data manipulation tasks, including: filtration of low-quality and low-complexity reads; filtration of reads containing undesired subsequences (such as parts of adapters and PCR primers used during the sample and sequencing library preparation steps); exclusion of duplicated reads (while keeping each read's copy number information in a specialized data format); and sorting of reads by copy number, allowing for easy access to and manual editing of the resulting files. Slim-Filter is organized as a sequence of windows summarizing the statistical properties of the reads. Each data manipulation step has roll-back abilities, allowing a return to previous steps of the data analysis process. Slim-Filter is written in C++ and is compatible with the fasta and fastq file formats and with the specialized AS file format presented in this manuscript. Setup files and a user's manual are available for download at the supplementary web site (https://www.bioinfo.uh.edu/Slim_Filter/).
Conclusion: The presented Windows-based application has been developed with the goal of providing individual investigators with integrated sequencing read analysis, curation, and manipulation capabilities.
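The manipulation tasks listed in the abstract (quality filtering, complexity filtering, removal of reads with undesired subsequences, duplicate collapsing with copy numbers, and sorting by copy number) can be illustrated with a minimal Python sketch. This is not Slim-Filter's implementation, which is a C++ Windows application. The function names, the thresholds, the Phred offset of 33, and the single-base-fraction complexity test are all assumptions chosen for illustration, and the output is a plain list of (sequence, copy number) pairs rather than the AS format, whose layout is not described in this record.

```python
from collections import Counter


def mean_phred(quality, offset=33):
    """Mean Phred score of a FASTQ quality string (offset 33 is an assumption)."""
    return sum(ord(c) - offset for c in quality) / len(quality)


def is_low_complexity(seq, max_fraction=0.8):
    """Crude, hypothetical complexity test: one base makes up most of the read."""
    most_common_count = Counter(seq).most_common(1)[0][1]
    return most_common_count / len(seq) > max_fraction


def filter_and_collapse(reads, min_quality=20.0, unwanted=()):
    """Filter (sequence, quality) pairs and collapse duplicates.

    Returns a list of (sequence, copy_number) pairs sorted by descending
    copy number, with ties broken alphabetically by sequence.
    """
    counts = Counter()
    for seq, qual in reads:
        if not seq or len(seq) != len(qual):
            continue  # skip malformed records
        if mean_phred(qual) < min_quality:
            continue  # low-quality read
        if is_low_complexity(seq):
            continue  # low-complexity read
        if any(sub in seq for sub in unwanted):
            continue  # contains an adapter or primer fragment
        counts[seq] += 1  # identical reads share one entry with a copy number
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

Calling `filter_and_collapse(reads, unwanted=("AGGC",))` on a list of (sequence, quality) pairs returns the surviving unique reads with their copy numbers, most abundant first. Keeping the counts in a single dictionary is one simple way to preserve copy number information while discarding duplicates.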
Pages: 4