Abstract: | We propose two models in this paper. The association model captures the co-occurrence relationships
among keywords in documents, and the hierarchical Hamming clustering model reduces the dimensionality of the
category feature vector space, addressing the extremely high dimensionality of the document feature space.
Experimental results indicate that the association model obtains the co-occurrence relations among keywords in the documents, which
effectively improves the recall of the classification system. The hierarchical Hamming clustering model efficiently reduces the
dimensionality of the category feature vector: the resulting vector space is only about 10% of the original dimensionality.
Foundation item: Supported by the National 863 Project of China (2001AA142160, 2002AA145090)
Biography: Su Gui-yang (1974-), male, Ph.D. candidate, research direction: information filtering and text classification. |