Gene recognition based on nucleotide distribution of ORFs in a hyper-thermophilic crenarchaeon, Aeropyrum pernix K1
Cited by: 16
Authors:
Guo, FB [1]
Wang, J [1]
Zhang, CT [1]
Affiliation:
[1] Tianjin Univ, Dept Phys, Tianjin 300072, Peoples R China
Keywords:
gene recognition;
protein-coding genes;
Aeropyrum pernix K1;
nucleotide composition;
three clusters;
DOI:
10.1093/dnares/11.6.361
CLC number (Chinese Library Classification):
Q3 [Genetics];
Discipline codes:
071007 ;
090102 ;
Abstract:
The 2694 ORFs originally annotated as potential genes in the genome of Aeropyrum pernix can be categorized into three clusters (A, B, C) according to their nucleotide composition at the three codon positions. Coding potential was found to account for the appearance of three clusters in a 9-dimensional space derived from the nucleotide composition of the ORFs: ORFs assigned to cluster A are coding, whereas those assigned to clusters B and C are non-coding. Based on a clustering method, a "codingness" index called the AZ score is defined for recognizing protein-coding genes in the A. pernix genome. An ORF is classified as coding if AZ > 0 and as non-coding if AZ < 0. Consequently, 620 of the 632 ORFs with putative functions in the original annotation fall into cluster A and have positive AZ scores. In addition, all 29 ORFs encoding putative or conserved proteins newly added in the RefSeq annotation also have positive AZ scores. Accordingly, the number of re-recognized protein-coding genes in the A. pernix genome is 1610, significantly fewer than the 2694 in the original annotation and also well below the 1841 in the RefSeq annotation curated by NCBI staff.
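The abstract does not spell out the 9 dimensions or the AZ formula. One common way to obtain 9 composition dimensions from the three codon positions is the Z-curve representation: at each position, the A, C, G, T frequencies are reduced to three components (purine/pyrimidine, amino/keto, weak/strong hydrogen bonding). The minimal sketch below assumes that representation, which may differ from the paper's exact definition. It applies only the sign rule stated in the abstract to an AZ score computed elsewhere and does not reproduce the authors' AZ score. The function names `z_curve_9` and `classify` are hypothetical.

```python
from collections import Counter


def z_curve_9(orf: str) -> list[float]:
    """Return 9 composition parameters (x, y, z at codon positions 1, 2, 3).

    Assumption: Z-curve style components; not necessarily the paper's definition.
    Only complete codons are counted; a trailing partial codon is ignored.
    """
    seq = orf.upper()
    n_codons = len(seq) // 3
    if n_codons == 0:
        raise ValueError("ORF must contain at least one complete codon")
    params: list[float] = []
    for pos in range(3):
        # Bases found at this codon position across all complete codons.
        counts = Counter(seq[pos : n_codons * 3 : 3])
        a, c, g, t = (counts[base] / n_codons for base in "ACGT")
        params.extend([
            (a + g) - (c + t),  # x: purines vs pyrimidines
            (a + c) - (g + t),  # y: amino vs keto bases
            (a + t) - (g + c),  # z: weak vs strong hydrogen bonding
        ])
    return params


def classify(az: float) -> str:
    """Sign rule from the abstract: AZ > 0 coding, AZ < 0 non-coding."""
    if az > 0:
        return "coding"
    if az < 0:
        return "non-coding"
    # The abstract does not say how AZ == 0 is handled, so flag it.
    return "undetermined"


if __name__ == "__main__":
    print(z_curve_9("ATGAAATAG"))
    print(classify(0.5), classify(-0.2))
```

For scale, 620 of 632 annotated-function ORFs is about 98.1%, and the 1610 re-recognized genes are 1084 fewer than the original 2694 and 231 fewer than RefSeq's 1841.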
Pages: 361-370
Page count: 10