A New Word Clustering Algorithm Based on Word Similarity

YUAN Lichi

doi:10.1049/cje.2017.09.016

YUAN Lichi. A New Word Clustering Algorithm Based on Word Similarity[J]. Chinese Journal of Electronics, 2017, 26(6): 1221-1226. DOI: 10.1049/cje.2017.09.016


Abstract


Category-based statistical language models are an important way to address data sparsity in statistical language modeling. However, they face two bottlenecks: 1) word clustering, since it is hard to find a clustering method that performs well without a large amount of computation; and 2) prediction ability, since class-based methods always lose some of it when adapting to text from different domains. To address these problems, a novel definition of word similarity based on mutual information is presented. Building on word similarity, a definition of word-set similarity is given and a bottom-up hierarchical clustering algorithm is proposed. Experimental results show that the word clustering algorithm based on word similarity outperforms the conventional greedy clustering method in both speed and performance, reducing perplexity from 283 to 207.8.
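The abstract does not give the paper's formulas, so the following is only a minimal sketch of the general idea of bottom-up clustering driven by a word-set similarity. Every definition in it is an assumption made for illustration: word similarity is taken to be the cosine between positive pointwise mutual information (PMI) vectors over right-hand neighbours, word-set similarity is taken to be the average pairwise word similarity, and the names `pmi_vectors`, `word_similarity`, `set_similarity`, and `cluster_words` are hypothetical. It is not the author's algorithm.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_vectors(sentences):
    """Map each word to a sparse vector of positive PMI with its right neighbour."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        unigrams.update(sent)
        bigrams.update(zip(sent, sent[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    vectors = {w: {} for w in unigrams}
    for (w1, w2), count in bigrams.items():
        p_joint = count / n_bi
        p_indep = (unigrams[w1] / n_uni) * (unigrams[w2] / n_uni)
        pmi = math.log(p_joint / p_indep)
        if pmi > 0:  # keep only positive associations
            vectors[w1][w2] = pmi
    return vectors


def word_similarity(u, v):
    """Cosine similarity of two sparse PMI vectors (assumed stand-in definition)."""
    dot = sum(x * v[k] for k, x in u.items() if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def set_similarity(a, b, vectors):
    """Average pairwise word similarity between two word sets (assumed definition)."""
    total = sum(word_similarity(vectors[x], vectors[y]) for x in a for y in b)
    return total / (len(a) * len(b))


def cluster_words(sentences, num_classes):
    """Bottom-up clustering: start from singletons, merge the most similar pair."""
    vectors = pmi_vectors(sentences)
    clusters = [frozenset([w]) for w in sorted(vectors)]
    while len(clusters) > num_classes:
        a, b = max(
            combinations(clusters, 2),
            key=lambda pair: set_similarity(pair[0], pair[1], vectors),
        )
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return clusters
```

Calling `cluster_words(sentences, num_classes)` on a list of tokenized sentences returns a list of `frozenset` word classes. The exhaustive pair search in each merge step is kept for readability; it is far too slow for a realistic vocabulary and says nothing about the speed reported in the abstract.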
