LGL: Creating a map of protein function with an algorithm for visualizing very large biological networks

Cited by: 144

Authors:

Adai, AT

Date, SV

Wieland, S

Marcotte, EM

Affiliations:

[1] Univ Texas, Ctr Syst & Synth Biol, Austin, TX 78712 USA

[2] Univ Texas, Inst Cellular & Mol Biol, Dept Chem & Biochem, Austin, TX 78712 USA

Source:

JOURNAL OF MOLECULAR BIOLOGY | 2004 / Vol. 340 / Issue 01

Funding:

National Science Foundation (USA);

Keywords:

network; visualization; protein function; protein map; bioinformatics;

DOI:

10.1016/j.jmb.2004.04.047

Chinese Library Classification:

Q5 [Biochemistry]; Q7 [Molecular Biology];

Discipline classification codes:

071010 ; 081704 ;

Abstract:

Networks are proving to be central to the study of gene function, protein-protein interaction, and biochemical pathway data. Visualization of networks is important for their study, but visualization tools are often inadequate for working with very large biological networks. Here, we present an algorithm, called large graph layout (LGL), which can be used to dynamically visualize large networks on the order of hundreds of thousands of vertices and millions of edges. LGL applies a force-directed layout guided by a minimal spanning tree of the network in order to generate coordinates for the vertices in two or three dimensions, which are subsequently visualized and interactively navigated with companion programs. We demonstrate the use of LGL in visualizing an extensive protein map summarizing the results of approximately 21 billion sequence comparisons between 145,579 proteins from 50 genomes. Proteins are positioned in the map according to sequence homology and gene fusions, with the map ultimately serving as a theoretical framework that integrates inferences about gene function derived from sequence homology, remote homology, gene fusions, and higher-order fusions. We confirm that protein neighbors in the resulting map are functionally related, and that distinct map regions correspond to distinct cellular systems, enabling a computational strategy for discovering proteins' functions on the basis of the proteins' map positions. Using the map produced by LGL, we infer general functions for 23 uncharacterized protein families. LGL is freely available (at http://bioinformatics.icmb.utexas.edu/lgl). © 2004 Elsevier Ltd. All rights reserved.
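The abstract describes the layout only at a high level: a force-directed layout guided by a minimal spanning tree. The following is a minimal toy sketch of that general idea, not the LGL implementation. Everything in it is an assumption made for illustration: the function names, the edge weights (treated here as a hypothetical dissimilarity score), the level-by-level placement order, the specific force terms, and the step size. It handles only a connected graph in two dimensions and uses an all-pairs repulsion that would not scale to the network sizes reported above.

```python
import math
import random
from collections import defaultdict


def minimum_spanning_tree(n, edges):
    """Kruskal's algorithm; edges is a list of (u, v, weight) tuples."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:  # edge joins two components, so keep it
            parent[ru] = rv
            tree.append((u, v, w))
    return tree


def relax(pos, graph_adj, iterations, step=0.05):
    """Simple force-directed relaxation over the vertices placed so far."""
    nodes = list(pos)
    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                force = 1.0 / dist          # every pair repels
                if b in graph_adj[a]:
                    force -= dist * dist    # connected pairs also attract
                fx, fy = force * dx / dist, force * dy / dist
                disp[a][0] += fx
                disp[a][1] += fy
                disp[b][0] -= fx
                disp[b][1] -= fy
        for v in nodes:
            d = math.hypot(disp[v][0], disp[v][1])
            if d > 0:
                scale = min(d, 1.0) * step / d  # cap the move per iteration
                pos[v][0] += disp[v][0] * scale
                pos[v][1] += disp[v][1] * scale


def mst_guided_layout(n, edges, root=0, iterations=50, seed=0):
    """Place vertices one tree level at a time, relaxing after each level.

    Vertices not reachable from root are left unplaced.
    """
    rng = random.Random(seed)
    tree_adj = defaultdict(list)
    for u, v, _ in minimum_spanning_tree(n, edges):
        tree_adj[u].append(v)
        tree_adj[v].append(u)
    graph_adj = defaultdict(set)
    for u, v, _ in edges:
        graph_adj[u].add(v)
        graph_adj[v].add(u)

    pos = {root: [0.0, 0.0]}
    frontier = [root]
    while frontier:
        next_frontier = []
        for tree_parent in frontier:
            for child in tree_adj[tree_parent]:
                if child not in pos:
                    # Start each child at unit distance from its tree parent.
                    angle = rng.uniform(0.0, 2.0 * math.pi)
                    pos[child] = [pos[tree_parent][0] + math.cos(angle),
                                  pos[tree_parent][1] + math.sin(angle)]
                    next_frontier.append(child)
        relax(pos, graph_adj, iterations)
        frontier = next_frontier
    return pos
```

Calling `mst_guided_layout(5, edges)` on a small connected edge list returns a dictionary mapping each vertex index to an `[x, y]` pair. Using the tree to decide placement order means each new vertex starts next to an already-settled neighbor, which is one plausible reading of "guided by" in the abstract.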

References

Pages: 179-190

Page count: 12

44 entries in total

[1] Clustering of proximal sequence space for the identification of protein families
Abascal, F
Valencia, A
[J]. BIOINFORMATICS, 2002, 18 (07) : 908 - 921
[2] Gapped BLAST and PSI-BLAST: a new generation of protein database search programs
Altschul, SF
Madden, TL
Schaffer, AA
Zhang, JH
Zhang, Z
Miller, W
Lipman, DJ
[J]. NUCLEIC ACIDS RESEARCH, 1997, 25 (17) : 3389 - 3402
[3] The InterPro database, an integrated documentation resource for protein families, domains and functional sites
Apweiler, R
Attwood, TK
Bairoch, A
Bateman, A
Birney, E
Biswas, M
Bucher, P
Cerutti, T
Corpet, F
Croning, MDR
Durbin, R
Falquet, L
Fleischmann, W
Gouzy, J
Hermjakob, H
Hulo, N
Jonassen, I
Kahn, D
Kanapin, A
Karavidopoulou, Y
Lopez, R
Marx, B
Mulder, NJ
Oinn, TM
Pagni, M
Servant, F
Sigrist, CJA
Zdobnov, EM
[J]. NUCLEIC ACIDS RESEARCH, 2001, 29 (01) : 37 - 40
[4] The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000
Bairoch, A
Apweiler, R
[J]. NUCLEIC ACIDS RESEARCH, 2000, 28 (01) : 45 - 48
[5] Batagelj V., 1998, Connections, V21, P47
[6] Bateman A, 2004, NUCLEIC ACIDS RES, V32, pD138, DOI 10.1093/nar/gkh121
[7] Comparison of the complete protein sets of worm and yeast: Orthology and divergence
Chervitz, SA
Aravind, L
Sherlock, G
Ball, CA
Koonin, EV
Dwight, SS
Harris, MA
Dolinski, K
Mohr, S
Smith, T
Weng, S
Cherry, JM
Botstein, D
[J]. SCIENCE, 1998, 282 (5396) : 2022 - 2028
[8] CHESWICK B, 2000, P US ANN TECHN C JUN
[9] GeneRAGE: a robust algorithm for sequence clustering and domain detection
Enright, AJ
Ouzounis, CA
[J]. BIOINFORMATICS, 2000, 16 (05) : 451 - 457
[10] An efficient algorithm for large-scale detection of protein families
Enright, AJ
Van Dongen, S
Ouzounis, CA
[J]. NUCLEIC ACIDS RESEARCH, 2002, 30 (07) : 1575 - 1584
