Ensemble of multiple kNN classifiers for societal risk classification |
| |
Authors: | Jindong Chen, Xijin Tang |
| |
Affiliation: | 1. Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China; 2. China Aerospace Academy of Systems Science and Engineering, Beijing, China |
| |
Abstract: | Societal risk classification is a fundamental yet complex task in societal risk perception. To perform this classification, posts from the Tianya Forum are used as the data source, and four representations of BBS posts are applied: string representation, term-frequency representation, TF-IDF representation, and distributed representation. Using edit distance or cosine similarity as the distance metric, four k-Nearest Neighbor (kNN) classifiers based on the different representations are developed and compared. Because the neural network model Paragraph Vector captures word order and semantic information, kNN based on the distributed representation generated by Paragraph Vector (kNN-PV) proves effective for societal risk classification. To further improve performance, kNN-PV is combined with the other three kNN classifiers in a weighted ensemble model, and the optimal weights for the individual classifiers are determined by brute-force grid search. Experimental results show that the Macro-F of the ensemble method is significantly higher than that of kNN-PV alone. |
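
The weighted ensemble and grid search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the neighbour-share voting rule, the weight grid (steps of 0.2), and the reading of Macro-F as macro-averaged F1 are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product
import math


def edit_distance(a, b):
    """Levenshtein distance between two strings (for the string representation)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def cosine_distance(u, v):
    """1 - cosine similarity (for TF, TF-IDF, or distributed vectors)."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return 1.0 if nu == 0 or nv == 0 else 1.0 - dot / (nu * nv)


def knn_votes(query, train_items, train_labels, distance, k):
    """Share of the k nearest neighbours that carry each label."""
    order = sorted(range(len(train_items)),
                   key=lambda i: distance(query, train_items[i]))
    nearest = order[:k]
    counts = Counter(train_labels[i] for i in nearest)
    return {label: n / len(nearest) for label, n in counts.items()}


def ensemble_predict(vote_dicts, weights):
    """Combine per-classifier vote shares with one weight per classifier."""
    scores = Counter()
    for votes, w in zip(vote_dicts, weights):
        for label, share in votes.items():
            scores[label] += w * share
    # Sorting first makes tie-breaking deterministic (alphabetical).
    return max(sorted(scores), key=scores.get)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes present in y_true."""
    labels = sorted(set(y_true))
    total = 0.0
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        total += 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return total / len(labels)


def grid_search_weights(val_votes, val_labels, steps=5):
    """Brute-force search over a weight grid, maximising macro-F1.

    val_votes[i] holds one vote dict per classifier for validation item i.
    The grid resolution (steps) is an assumed, hypothetical setting.
    """
    n_classifiers = len(val_votes[0])
    grid = [s / steps for s in range(steps + 1)]
    best_w, best_f = None, -1.0
    for w in product(grid, repeat=n_classifiers):
        if sum(w) == 0:
            continue  # an all-zero weighting cannot rank the labels
        preds = [ensemble_predict(votes, w) for votes in val_votes]
        f = macro_f1(val_labels, preds)
        if f > best_f:
            best_w, best_f = w, f
    return best_w, best_f
```

In this sketch each base classifier is reduced to a dictionary of neighbour-label shares, so the same `ensemble_predict` step works whether the underlying distance is edit distance or cosine distance. The search cost grows exponentially with the number of classifiers, which stays manageable for the four classifiers considered here.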
| |
Keywords: | |
This article is indexed in SpringerLink and other databases. |
|