Documentation;
Information management;
Project management;
Classification;
INFORMATION-SYSTEMS;
MANAGEMENT;
METHODOLOGY;
DOI:
10.1061/(ASCE)CP.1943-5487.0000338
Chinese Library Classification number:
TP39 [Computer applications];
Discipline classification code:
081203;
0835;
Abstract:
Organizing construction project documents based on semantic similarities offers several advantages over traditional metadata criteria, including easier document retrieval and improved knowledge reuse. This study evaluates the use of text classifiers for automatically assigning each document to its corresponding group of semantically related documents. Supporting documents of claims were used as representations of document discourses. The evaluation was performed under varying general conditions (such as dimensionality level and weighting method), to assess the effect of these conditions on performance, and under varying classifier-specific parameters. The highest classification accuracy was achieved by a Rocchio classifier and a kNN classifier, both with dimensionality reduction applied and with the tf-idf weighting method. A combined classifier approach was also evaluated, in which the classification outcome is decided by a majority vote among the outcomes of three different classifiers. The evaluation demonstrated that the classification accuracy of standard text classifiers can be improved by applying an appropriate level of dimensionality reduction to the training and testing sets and by combining the results of several classifiers. Accordingly, these measures enable effective use of standard text classifiers for automatically organizing project documents based on their text content.
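
The combination of tf-idf weighting, a Rocchio classifier, a kNN classifier, and a majority vote described in the abstract can be illustrated with a minimal sketch. It is not the study's implementation: the toy documents, the labels, the function names, the particular idf formula, the choice of the three voters (Rocchio, 1-NN, 3-NN), and the tie-breaking rule are all assumptions made for illustration. The Rocchio step is simplified to positive class centroids only, and dimensionality reduction is omitted for brevity.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(docs):
    """Fit idf on docs; return (vectorize function, tf-idf dicts for docs)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc.split()))
    # Assumed idf variant: log(N / df) + 1, so no term gets zero weight.
    idf = {t: math.log(n / d) + 1.0 for t, d in df.items()}

    def vectorize(text):
        # Terms unseen during fitting are ignored.
        tf = Counter(t for t in text.split() if t in idf)
        return {t: c * idf[t] for t, c in tf.items()}

    return vectorize, [vectorize(d) for d in docs]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rocchio_predict(vec, train_vecs, labels):
    """Simplified Rocchio: pick the class whose centroid is most similar."""
    sums = defaultdict(lambda: defaultdict(float))
    for v, y in zip(train_vecs, labels):
        for t, w in v.items():
            sums[y][t] += w
    # Cosine is scale-invariant, so the summed vector stands in for the mean.
    return max(sums, key=lambda y: cosine(vec, sums[y]))


def knn_predict(vec, train_vecs, labels, k=3):
    """kNN: most frequent label among the k most similar training documents."""
    ranked = sorted(zip(train_vecs, labels),
                    key=lambda pair: cosine(vec, pair[0]), reverse=True)
    return Counter(y for _, y in ranked[:k]).most_common(1)[0][0]


def majority_vote(predictions):
    """Return the strict-majority label; otherwise fall back to the first
    classifier's prediction (an assumed tie-breaking rule)."""
    label, count = Counter(predictions).most_common(1)[0]
    return label if count > len(predictions) / 2 else predictions[0]


# Hypothetical toy corpus standing in for claim supporting documents.
train_docs = [
    "delay schedule extension weather",
    "delay schedule late delivery",
    "payment invoice overdue amount",
    "payment invoice dispute amount",
]
labels = ["delay_claim", "delay_claim", "payment_claim", "payment_claim"]

vectorize, train_vecs = tfidf_vectors(train_docs)
query = vectorize("invoice amount overdue")

predictions = [
    rocchio_predict(query, train_vecs, labels),
    knn_predict(query, train_vecs, labels, k=1),
    knn_predict(query, train_vecs, labels, k=3),
]
final_label = majority_vote(predictions)
print(predictions, final_label)
```

Representing vectors as dictionaries keeps the sketch dependency-free. A realistic setup would more likely use a sparse-matrix library and would apply dimensionality reduction to the term space before classification, as the abstract reports.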