Systematic recovery and analysis of full-ORF human cDNA clones

Cited by: 20
Authors
Baross, A [1 ]
Butterfield, YSN [1 ]
Coughlin, SM [1 ]
Zeng, T [1 ]
Griffith, M [1 ]
Griffith, OL [1 ]
Petrescu, AS [1 ]
Smailus, DE [1 ]
Khattra, J [1 ]
McDonald, HL [1 ]
McKay, SJ [1 ]
Moksa, M [1 ]
Holt, RA [1 ]
Marra, MA [1 ]
Affiliations
[1] British Columbia Canc Agcy, Genome Sci Ctr, Vancouver, BC V5Z 4E6, Canada
DOI
10.1101/gr.2473704
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
The Mammalian Gene Collection (MGC) consortium (http://mgc.nci.nih.gov) seeks to establish publicly available collections of full-ORF cDNAs for several organisms of significance to biomedical research, including human. To date, over 15,200 human cDNA clones containing full-length open reading frames (ORFs) have been identified via systematic expressed sequence tag (EST) analysis of a diverse set of cDNA libraries; however, further systematic EST analysis is no longer an efficient method for identifying new cDNAs. As part of our involvement in the MGC program, we have developed a scalable method for targeted recovery of cDNA clones to facilitate recovery of genes absent from the MGC collection. First, cDNA is synthesized from various RNAs, followed by polymerase chain reaction (PCR) amplification of transcripts in 96-well plates using gene-specific primer pairs flanking the ORFs. Amplicons are cloned into a sequencing vector, and full-length sequences are obtained. Sequences are processed and assembled using Phred and Phrap, and analyzed using Consed and a number of bioinformatics methods we have developed. Sequences are compared with the Reference Sequence (RefSeq) database, and validation of sequence discrepancies is attempted using other sequence databases, including dbEST and dbSNP. Clones with sequence identical to RefSeq, or containing only validated changes, will become part of the MGC human gene collection. Clones containing novel splice variants or polymorphisms have also been identified. Our approach to clone recovery, applied at large scale, has the potential to recover many and possibly most of the genes absent from the MGC collection.
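The acceptance rule stated in the abstract (a clone qualifies if it is identical to RefSeq or differs only by validated changes) can be illustrated with a minimal sketch. This is not the authors' pipeline: the names Discrepancy, find_discrepancies and is_acceptable are hypothetical, the comparison assumes two already-aligned ORF sequences of equal length, and the set of validated positions stands in for whatever lookup against dbEST or dbSNP the real analysis performs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Discrepancy:
    """One base difference between a clone and its reference (hypothetical type)."""
    position: int     # 1-based position within the ORF (assumed convention)
    ref_base: str     # base in the RefSeq sequence
    clone_base: str   # base observed in the clone
    validated: bool   # True if supported by another database, e.g. dbEST or dbSNP


def find_discrepancies(refseq_orf, clone_orf, validated_positions):
    """List substitutions between two pre-aligned, equal-length ORF sequences.

    validated_positions is an assumed input: the 1-based positions at which
    a difference has independent support in another sequence database.
    """
    if len(refseq_orf) != len(clone_orf):
        raise ValueError("this sketch only handles equal-length, aligned sequences")
    return [
        Discrepancy(i, ref, obs, i in validated_positions)
        for i, (ref, obs) in enumerate(zip(refseq_orf, clone_orf), start=1)
        if ref != obs
    ]


def is_acceptable(discrepancies):
    """A clone qualifies if it has no differences or only validated ones."""
    return all(d.validated for d in discrepancies)


if __name__ == "__main__":
    diffs = find_discrepancies("ATGGCCTAA", "ATGACCTAA", validated_positions={4})
    print(diffs, is_acceptable(diffs))
```

Insertions, deletions and splice variants, which the abstract also mentions, would need a real alignment step and are deliberately left out of this sketch.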
Pages: 2083-2092
Page count: 10
Related papers
23 in total
[1]   BASIC LOCAL ALIGNMENT SEARCH TOOL [J].
ALTSCHUL, SF ;
GISH, W ;
MILLER, W ;
MYERS, EW ;
LIPMAN, DJ .
JOURNAL OF MOLECULAR BIOLOGY, 1990, 215 (03) :403-410
[2]   PCR AMPLIFICATION OF UP TO 35-KB DNA WITH HIGH-FIDELITY AND HIGH-YIELD FROM LAMBDA-BACTERIOPHAGE TEMPLATES [J].
BARNES, WM .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1994, 91 (06) :2216-2220
[3]   Ensembl 2004 [J].
Birney, E ;
Andrews, D ;
Bevan, P ;
Caccamo, M ;
Cameron, G ;
Chen, Y ;
Clarke, L ;
Coates, G ;
Cox, T ;
Cuff, J ;
Curwen, V ;
Cutts, T ;
Down, T ;
Durbin, R ;
Eyras, E ;
Fernandez-Suarez, XM ;
Gane, P ;
Gibbins, B ;
Gilbert, J ;
Hammond, M ;
Hotz, H ;
Iyer, V ;
Kahari, A ;
Jekosch, K ;
Kasprzyk, A ;
Keefe, D ;
Keenan, S ;
Lehvaslaiho, H ;
McVicker, G ;
Melsopp, C ;
Meidl, P ;
Mongin, E ;
Pettett, R ;
Potter, S ;
Proctor, G ;
Rae, M ;
Searle, S ;
Slater, G ;
Smedley, D ;
Smith, J ;
Spooner, W ;
Stabenau, A ;
Stalker, J ;
Storey, R ;
Ureta-Vidal, A ;
Woodwark, C ;
Clamp, M ;
Hubbard, T .
NUCLEIC ACIDS RESEARCH, 2004, 32 :D468-D470
[4]   DBEST - DATABASE FOR EXPRESSED SEQUENCE TAGS [J].
BOGUSKI, MS ;
LOWE, TMJ ;
TOLSTOSHEV, CM .
NATURE GENETICS, 1993, 4 (04) :332-333
[5]   An efficient strategy for large-scale high-throughput transposon-mediated sequencing of cDNA clones [J].
Butterfield, YSN ;
Marra, MA ;
Asano, JK ;
Chan, SY ;
Guin, R ;
Krzywinski, MI ;
Lee, SS ;
MacDonald, KWK ;
Mathewson, CA ;
Olson, TE ;
Pandoh, PK ;
Prabhu, AL ;
Schnerch, A ;
Skalska, U ;
Smailus, DE ;
Stott, JM ;
Tsai, MI ;
Yang, GS ;
Zuyderduyn, SD ;
Schein, JE ;
Jones, SJM .
NUCLEIC ACIDS RESEARCH, 2002, 30 (11) :2460-2468
[6]   EFFECTIVE AMPLIFICATION OF LONG TARGETS FROM CLONED INSERTS AND HUMAN GENOMIC DNA [J].
CHENG, S ;
FOCKLER, C ;
BARNES, WM ;
HIGUCHI, R .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1994, 91 (12) :5695-5699
[7]   Base-calling of automated sequencer traces using phred. II. Error probabilities [J].
Ewing, B ;
Green, P .
GENOME RESEARCH, 1998, 8 (03) :186-194
[8]   Base-calling of automated sequencer traces using phred. I. Accuracy assessment [J].
Ewing, B ;
Hillier, L ;
Wendl, MC ;
Green, P .
GENOME RESEARCH, 1998, 8 (03) :175-185
[9]   Software for automated analysis of DNA fingerprinting gels [J].
Fuhrmann, DR ;
Krzywinski, MI ;
Chiu, R ;
Saeedi, P ;
Schein, JE ;
Bosdet, IE ;
Chinwalla, A ;
Hillier, LW ;
Waterston, RH ;
McPherson, JD ;
Jones, SJM ;
Marra, MA .
GENOME RESEARCH, 2003, 13 (05) :940-953
[10]   Consed: A graphical tool for sequence finishing [J].
Gordon, D ;
Abajian, C ;
Green, P .
GENOME RESEARCH, 1998, 8 (03) :195-202