Querying the public databases for sequences using complex keywords contained in the feature lines

Times cited: 5
Authors
Croce, O [1]
Lamarre, M
Christen, R
Affiliations
[1] CNRS, UMR 6543, Lab Biol Virtuelle, F-06108 Nice, France
[2] Univ Nice, Ctr Biochim, F-06108 Nice, France
DOI
10.1186/1471-2105-7-45
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: High-throughput technologies often require the retrieval of large sets of sequences. Retrieving EMBL or GenBank entries by keyword is easy with tools such as ACNUC, Entrez or SRS, but these tools have limitations, in particular when the query involves complex keywords. Results: We show that Entrez has severe limitations with respect to retrieving subsequences. SRS works well with simple keywords but not with keywords composed of several terms, and it has problems with complex queries. ACNUC works well but does not allow precise queries on Feature qualifiers. We developed specific Perl scripts to retrieve precisely the subsequences defined by complex descriptors in the Feature qualifiers of EMBL entries. We improved parts of the BioPerl library to allow parsing of large data files, and we embedded these scripts in a user-friendly, OS-independent interface. Conclusion: Although slower than the public tools that rely on prebuilt indexes, parsing the complete entries with a script is often necessary to retrieve exactly the data sought. Embedding the scripts in a user-friendly interface allows biologists to use them, and bioinformaticians can easily modify them if unforeseen needs arise.
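The abstract's core idea, scanning complete EMBL entries instead of relying on a prebuilt keyword index, can be illustrated with a small sketch. This is not the authors' Perl/BioPerl code: it is a dependency-free Python approximation, and the helper names `parse_embl_entry`, `extract_location` and `find_subsequences`, as well as the toy entry, are hypothetical. It assumes the standard EMBL layout (feature lines start with `FT`, feature keys begin in column 6, qualifiers begin with `/`) and handles only simple locations (ranges, single positions, `complement(...)` and `join(...)` of simple ranges). A feature matches when the requested qualifier contains the whole multi-word phrase, in order, regardless of how the value was wrapped across lines.

```python
import re

# Complement table for nucleotide letters; other IUPAC codes pass through
# unchanged in this sketch.
_COMPLEMENT = str.maketrans("acgtnACGTN", "tgcanTGCAN")


def parse_embl_entry(text):
    """Return (features, sequence) for one EMBL flat-file entry (sketch).

    Each feature is a dict with "key", "location" and "qualifiers"
    (qualifier name -> list of values, because qualifiers may repeat).
    """
    features, seq_parts, in_seq = [], [], False
    for line in text.splitlines():
        if line.startswith("//"):  # end of entry
            break
        if in_seq:  # sequence lines: keep letters, drop spaces and counts
            seq_parts.append("".join(c for c in line if c.isalpha()))
        elif line.startswith("SQ"):
            in_seq = True
        elif line.startswith("FT") and line[5:].strip():
            content, body = line[5:], line[5:].strip()
            if not content[0].isspace():
                # New feature: key starts in column 6, location follows it.
                key, _, location = body.partition(" ")
                features.append({"key": key, "location": location.strip(), "raw": []})
            elif features:
                raw = features[-1]["raw"]
                # An odd number of quotes means a quoted value is still open.
                inside_quotes = bool(raw) and raw[-1].count('"') % 2 == 1
                if body.startswith("/") and not inside_quotes:
                    raw.append(body)  # start of a new qualifier
                elif raw:
                    raw[-1] += " " + body  # wrapped qualifier value
                else:
                    features[-1]["location"] += body  # wrapped location
    for feat in features:
        quals = {}
        for raw in feat.pop("raw"):
            name, sep, value = raw[1:].partition("=")
            value = value.strip()
            if len(value) >= 2 and value[0] == value[-1] == '"':
                value = value[1:-1].replace('""', '"')  # EMBL escapes " as ""
            quals.setdefault(name, []).append(value if sep else "")
        feat["qualifiers"] = quals
    return features, "".join(seq_parts)


def extract_location(location, seq):
    """Subsequence for a simple location (1-based, inclusive); else ValueError."""
    loc = location.replace("<", "").replace(">", "").replace(" ", "")
    if loc.startswith("complement(") and loc.endswith(")"):
        inner = extract_location(loc[len("complement("):-1], seq)
        return inner.translate(_COMPLEMENT)[::-1]  # reverse complement
    if loc.startswith("join(") and loc.endswith(")"):
        parts = loc[len("join("):-1].split(",")
        return "".join(extract_location(p, seq) for p in parts)
    m = re.fullmatch(r"(\d+)(?:\.\.(\d+))?", loc)
    if not m:
        raise ValueError(f"unsupported location: {location}")
    start = int(m.group(1))
    end = int(m.group(2) or start)
    return seq[start - 1:end]


def find_subsequences(entry_text, feature_key, qualifier, phrase):
    """Hypothetical helper: (location, subsequence) for each `feature_key`
    feature whose `qualifier` contains `phrase` as a whole multi-word phrase."""
    # Words must appear in order and as whole words; any whitespace run is
    # accepted between them because wrapped values are joined with spaces.
    words = r"\s+".join(re.escape(w) for w in phrase.split())
    pattern = re.compile(rf"(?<!\w){words}(?!\w)", re.IGNORECASE)
    features, seq = parse_embl_entry(entry_text)
    hits = []
    for feat in features:
        values = feat["qualifiers"].get(qualifier, [])
        if feat["key"] == feature_key and any(pattern.search(v) for v in values):
            hits.append((feat["location"], extract_location(feat["location"], seq)))
    return hits


# Toy entry with made-up annotations, for illustration only.
SAMPLE = """\
ID   TOY01; SV 1; linear; genomic DNA; STD; FUN; 40 BP.
FT   source          1..40
FT                   /organism="Hypothetical organism"
FT   misc_RNA        5..14
FT                   /product="internal transcribed
FT                   spacer 2"
FT   misc_feature    complement(21..30)
FT                   /note="putative binding site"
SQ   Sequence 40 BP;
     aaaacccccg ggggttttta acccgggttt aaaaaaaaaa        40
//
"""

if __name__ == "__main__":
    print(find_subsequences(SAMPLE, "misc_RNA", "product", "internal transcribed spacer 2"))
    # prints [('5..14', 'cccccggggg')]
```

Because the whole entry is parsed, the phrase still matches even though the qualifier value is split across two `FT` lines, while a partial word such as `bind` does not match `binding`. A real tool would stream a multi-entry file one entry at a time (splitting on `//`) and would need fuller location handling (`order(...)`, remote accessions).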
Pages: 6