IDENTIFICATION AND ANALYSIS OF MULTIGENE FAMILIES BY COMPARISON OF EXON FINGERPRINTS

Times cited: 19
Authors
BROWN, NP [1]
WHITTAKER, AJ [1]
NEWELL, WR [1]
RAWLINGS, CJ [1]
BECK, S [1]
Affiliation
[1] IMPERIAL CANC RES FUND, DNA SEQUENCING LAB, LONDON WC2A 3PX, ENGLAND
Keywords
GENE ORGANIZATION; SEQUENCE ALIGNMENT; SEQUENCE COMPARISON; DYNAMIC PROGRAMMING; HOMOLOGY SEARCHING
DOI
10.1006/jmbi.1995.0301
Chinese Library Classification (CLC)
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline code
071010; 081704
Abstract
Gene families are often recognised by sequence homology, using similarity searching to find relationships. However, genomic sequence data provide gene architectural information that conventional search methods do not use. In particular, intron positions and phases are expected to be relatively conserved features, because mis-splicing and reading-frame shifts should be selected against. A fast search technique for comparing spliceosomal genes and gene fragments is presented, capable of detecting weak sequence homologies that are apparent at the intron/exon level of gene organization. FINEX compares strings of exons delimited by intron/exon boundary positions and intron phases (an exon fingerprint) using a global dynamic programming algorithm with a score that combines intron phase identity and exon size dissimilarity. Exon fingerprints are typically two orders of magnitude smaller than the corresponding nucleic acid sequences, which gives fast search times: a ranked search of a typical three-exon fingerprint against a library of 6755 fingerprints completes in under 30 seconds on an ordinary workstation, and a worst-case search with the largest fingerprint, of 52 exons, completes in just over one minute. The short "sequence" length of exon fingerprints is compensated for by a large exon alphabet, compounded of intron phase types and a wide range of exon sizes; the exon sizes contribute the most information to alignments. In some searches FINEX performs better than conventional methods, finding matches with similar exon organization but low sequence homology. A search using human serum albumin places all members of the multigene family in the FINEX database at the top of the search ranking, despite very low amino acid percentage identities between family members. The method should complement conventional sequence searching and alignment techniques, offering a means of identifying otherwise hard-to-detect homologies where genomic data are available.
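The abstract describes the FINEX comparison only at a high level, so the sketch below illustrates the general idea of globally aligning exon fingerprints by dynamic programming (a Needleman-Wunsch recurrence). It is not the FINEX implementation. The `Exon` record, the helpers `fingerprint`, `exon_score`, `align` and `ranked_search`, and the parameters `phase_bonus`, `size_weight` and `gap` are all hypothetical choices made for illustration; the real FINEX scoring scheme and gap treatment may differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Exon:
    """One exon in a fingerprint: its length in bp and the phases (0, 1 or 2)
    of the introns bounding it. None marks a gene end with no flanking intron."""
    length: int
    left_phase: Optional[int]
    right_phase: Optional[int]


def fingerprint(exon_lengths: List[int], intron_phases: List[int]) -> List[Exon]:
    """Build a fingerprint from n exon lengths and the n-1 intron phases between them."""
    if len(intron_phases) != len(exon_lengths) - 1:
        raise ValueError("need exactly one intron phase between each pair of exons")
    bounds = [None] + list(intron_phases) + [None]
    return [Exon(length, bounds[i], bounds[i + 1]) for i, length in enumerate(exon_lengths)]


def exon_score(a: Exon, b: Exon, phase_bonus: float = 1.0, size_weight: float = 2.0) -> float:
    """Hypothetical pair score: a bonus for each identical flanking phase, minus a
    penalty proportional to the relative exon size difference. Two terminal sides
    (None == None) also count as a phase agreement."""
    pairs = ((a.left_phase, b.left_phase), (a.right_phase, b.right_phase))
    phase = sum(phase_bonus for x, y in pairs if x == y)
    size = size_weight * abs(a.length - b.length) / max(a.length, b.length, 1)
    return phase - size


def align(fp1: List[Exon], fp2: List[Exon], gap: float = 1.0) -> float:
    """Global (Needleman-Wunsch) alignment score with a linear gap penalty."""
    n, m = len(fp1), len(fp2)
    # dp[i][j] holds the best score for aligning fp1[:i] with fp2[:j].
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = -gap * i
    for j in range(1, m + 1):
        dp[0][j] = -gap * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + exon_score(fp1[i - 1], fp2[j - 1]),  # pair two exons
                dp[i - 1][j] - gap,  # exon of fp1 left unpaired
                dp[i][j - 1] - gap,  # exon of fp2 left unpaired
            )
    return dp[n][m]


def ranked_search(query: List[Exon], library: Dict[str, List[Exon]],
                  gap: float = 1.0) -> List[Tuple[str, float]]:
    """Align the query against every library fingerprint and rank best first."""
    hits = [(name, align(query, fp, gap)) for name, fp in library.items()]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    # Made-up fingerprints for illustration only.
    query = fingerprint([120, 95, 210], [0, 1])
    library = {
        "similar": fingerprint([118, 97, 205], [0, 1]),
        "different": fingerprint([40, 300, 60, 500], [2, 2, 0]),
    }
    for name, score in ranked_search(query, library):
        print(f"{name}\t{score:.3f}")  # "similar" ranks first
```

Because each fingerprint has only a handful of exons, the quadratic table stays tiny, which is consistent with the abstract's point that fingerprints are far shorter than the underlying sequences. A linear gap penalty keeps the sketch short; an affine gap model would be an equally plausible assumption.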
Pages: 342-359
Page count: 18