LGL: Creating a map of protein function with an algorithm for visualizing very large biological networks

Cited by: 144
Authors
Adai, AT
Date, SV
Wieland, S
Marcotte, EM
Affiliations
[1] Univ Texas, Ctr Syst & Synth Biol, Austin, TX 78712 USA
[2] Univ Texas, Inst Cellular & Mol Biol, Dept Chem & Biochem, Austin, TX 78712 USA
Funding
National Science Foundation (USA);
Keywords
network; visualization; protein function; protein map; bioinformatics;
DOI
10.1016/j.jmb.2004.04.047
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
Networks are proving to be central to the study of gene function, protein-protein interaction, and biochemical pathway data. Visualization of networks is important for their study, but visualization tools are often inadequate for working with very large biological networks. Here, we present an algorithm, called large graph layout (LGL), which can be used to dynamically visualize large networks on the order of hundreds of thousands of vertices and millions of edges. LGL applies a force-directed layout guided by a minimal spanning tree of the network in order to generate coordinates for the vertices in two or three dimensions, which are subsequently visualized and interactively navigated with companion programs. We demonstrate the use of LGL in visualizing an extensive protein map summarizing the results of ~21 billion sequence comparisons between 145,579 proteins from 50 genomes. Proteins are positioned in the map according to sequence homology and gene fusions, with the map ultimately serving as a theoretical framework that integrates inferences about gene function derived from sequence homology, remote homology, gene fusions, and higher-order fusions. We confirm that protein neighbors in the resulting map are functionally related, and that distinct map regions correspond to distinct cellular systems, enabling a computational strategy for discovering proteins' functions on the basis of the proteins' map positions. Using the map produced by LGL, we infer general functions for 23 uncharacterized protein families. LGL is freely available (at http://bioinformatics.icmb.utexas.edu/lgl). © 2004 Elsevier Ltd. All rights reserved.
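The core idea in the abstract, a force-directed layout guided by a minimal spanning tree, can be illustrated with a toy sketch. This is not the LGL implementation. It assumes a small connected graph with weighted edges, places vertices one at a time in the order Prim's algorithm adds them to the tree (a simplification chosen here), and uses a naive all-pairs repulsion that would not scale to the network sizes the abstract describes. The names `minimum_spanning_tree`, `relax`, and `mst_guided_layout` are hypothetical.

```python
import heapq
import math
import random
from collections import defaultdict


def minimum_spanning_tree(n, edges):
    """Prim's algorithm from vertex 0; returns (parent, child) pairs in the
    order they join the tree. Assumes the graph is connected."""
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((w, v))
        adj[v].append((w, u))
    visited = {0}
    heap = [(w, 0, v) for w, v in adj[0]]
    heapq.heapify(heap)
    tree = []
    while heap and len(visited) < n:
        _w, u, v = heapq.heappop(heap)
        if v in visited:
            continue
        visited.add(v)
        tree.append((u, v))
        for w2, x in adj[v]:
            if x not in visited:
                heapq.heappush(heap, (w2, v, x))
    return tree


def relax(pos, edges, iterations=30, rest=1.0, step=0.05, max_move=0.5):
    """Spring attraction along edges plus pairwise repulsion, applied only
    to the vertices placed so far. Updates pos in place."""
    nodes = list(pos)
    for _ in range(iterations):
        force = {v: [0.0, 0.0] for v in nodes}
        # Repulsion between every pair of placed vertices (magnitude ~ 1/d).
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                d2 = dx * dx + dy * dy + 1e-9
                force[a][0] += dx / d2
                force[a][1] += dy / d2
                force[b][0] -= dx / d2
                force[b][1] -= dy / d2
        # Springs pull edge endpoints toward the rest length.
        for u, v, _w in edges:
            if u in pos and v in pos:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                d = math.sqrt(dx * dx + dy * dy) + 1e-9
                pull = (d - rest) / d
                force[u][0] -= pull * dx
                force[u][1] -= pull * dy
                force[v][0] += pull * dx
                force[v][1] += pull * dy
        # Move each vertex, capping the displacement for stability.
        for v in nodes:
            fx, fy = force[v]
            mag = math.hypot(fx, fy)
            scale = step if mag * step <= max_move else max_move / mag
            pos[v] = (pos[v][0] + fx * scale, pos[v][1] + fy * scale)


def mst_guided_layout(n, edges, seed=0):
    """Place vertices in spanning-tree order, relaxing after each placement."""
    rng = random.Random(seed)
    pos = {0: (0.0, 0.0)}
    for parent, child in minimum_spanning_tree(n, edges):
        angle = rng.uniform(0.0, 2.0 * math.pi)
        px, py = pos[parent]
        # Start the new vertex one unit from its tree parent.
        pos[child] = (px + math.cos(angle), py + math.sin(angle))
        relax(pos, edges)
    return pos
```

Calling `mst_guided_layout(5, [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 0, 2.0), (3, 4, 1.0)])` returns a dictionary that maps each vertex to a 2D coordinate. Letting the tree decide the placement order means each new vertex starts next to an already settled neighbor, which is one plausible reading of how a spanning tree can guide the layout.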
Pages: 179-190
Page count: 12