WAF-Based Chinese Character Recognition for Spam Image Filtering

LI Siyuan; LI Ruiguang; XU Yuan; ZHOU Hao; YAN Hanbing; XU Bin; ZHANG Honggang

doi:10.1049/cje.2018.06.014

LI Siyuan, LI Ruiguang, XU Yuan, ZHOU Hao, YAN Hanbing, XU Bin, ZHANG Honggang. WAF-Based Chinese Character Recognition for Spam Image Filtering[J]. Chinese Journal of Electronics, 2018, 27(5): 1050-1055. DOI: 10.1049/cje.2018.06.014

Abstract

We address the problem of filtering image spam, a rapidly spreading kind of spam in which the text is embedded in images to defeat text-based spam filters. In particular, we focus on image spam whose embedded text is Chinese, which is a more challenging task. A popular way to detect image spam is to use an Optical character recognition (OCR) system, which detects and recognizes the embedded text, followed by a text classifier that discriminates spam from ham. However, spammers have started to obscure the image text to prevent OCR systems from discovering the spam text. To compensate for this shortcoming of OCR systems, we propose a novel method that is essentially a keyword reconstruction algorithm based on the Word activation force (WAF) model. It is effective at discovering keywords, and therefore benefits the later classification stage and notably improves the performance of image spam filtering. Experimental results on a personal data set of spam images (publicly available) validate the effectiveness of our approach, which outperforms the original OCR system in practical use on spam images with complex backgrounds.
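
The abstract does not spell out the reconstruction algorithm, so the following is only a minimal sketch of how a WAF score could help repair an uncertain OCR character. It assumes the WAF definition commonly given in the WAF literature, waf(a, b) = (f_ab / f_a) * (f_ab / f_b) / d_ab^2, where f_a and f_b are the frequencies of a and b, f_ab counts how often b follows a within a window, and d_ab is their mean distance. The function names `build_waf` and `pick_candidate`, the window size, the toy corpus, and the idea of choosing among OCR candidate characters are all hypothetical illustrations, not the authors' method.

```python
from collections import Counter, defaultdict


def build_waf(corpus, window=3):
    """Build a waf(a, b) scorer from a list of strings.

    Each character is treated as one "word" (an assumption made for
    Chinese text; the paper may segment differently).
    """
    freq = Counter()             # character -> occurrence count
    co_count = Counter()         # (a, b) -> times b follows a within the window
    co_dist = defaultdict(int)   # (a, b) -> summed forward distance
    for text in corpus:
        freq.update(text)
        for i, a in enumerate(text):
            for d in range(1, window + 1):
                if i + d < len(text):
                    pair = (a, text[i + d])
                    co_count[pair] += 1
                    co_dist[pair] += d

    def waf(a, b):
        f_ab = co_count[(a, b)]
        if f_ab == 0:
            return 0.0
        mean_d = co_dist[(a, b)] / f_ab
        # Assumed WAF formula: (f_ab / f_a) * (f_ab / f_b) / d_ab^2
        return (f_ab / freq[a]) * (f_ab / freq[b]) / mean_d ** 2

    return waf


def pick_candidate(waf, previous, candidates):
    """Choose the candidate character most strongly activated by `previous`."""
    return max(candidates, key=lambda c: waf(previous, c))


# Toy corpus of spam-like phrases (hypothetical data, for illustration only).
corpus = ["代开发票", "正规发票", "发票代开", "开发软件"]
waf = build_waf(corpus)

# Suppose OCR read "发" reliably but is unsure about the next character.
best = pick_candidate(waf, "发", ["票", "漂", "软"])
print(best)
```

In this sketch the candidate with the highest activation from the preceding character is kept, so an obscured character is resolved toward a keyword seen in the corpus; a real system would also need OCR confidence scores and a proper training corpus.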
