REN Yafeng, JI Donghong, YIN Lan, et al., “Finding Deceptive Opinion Spam by Correcting the Mislabeled Instances,” Chinese Journal of Electronics, vol. 24, no. 1, pp. 52-57, 2015.

Finding Deceptive Opinion Spam by Correcting the Mislabeled Instances

Funds:  This work is supported by the State Key Program of National Natural Science Foundation of China (No.61133012), the National Natural Science Foundation of China (No.61173062, No.61070082), and the National Philosophy Social Science Major Bidding Project of China (No.11&zd189).
More Information
  • Corresponding author: JI Donghong was born in 1967. He is a professor in the School of Computer at Wuhan University. His research interests include natural language processing and data mining. (Email: dhji@whu.edu.cn)
  • Received Date: 2013-01-01
  • Revision Received Date: 2014-05-01
  • Publish Date: 2015-01-10
  • Assessing the trustworthiness of reviews is a key problem in natural language processing and computational linguistics. Previous work relies mainly on heuristic strategies or simple supervised learning methods, which limits performance on this task. This paper presents a new approach to finding deceptive opinion spam, from the viewpoint of correcting mislabeled instances. The method partitions a dataset into several subsets, constructs a classifier set for each subset, and selects the best classifier from each set to evaluate the whole dataset. Error variables are defined to compute the probability that an instance has been mislabeled. The mislabeled instances are then corrected under one of two threshold schemes, majority and non-objection. The results show significant improvements of our method over the existing baselines.
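
The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy 1-D features, and the two stand-in classifiers (nearest centroid and 1-nearest-neighbor) are all hypothetical, and the abstract does not say how the best classifier is selected or how the error variables are defined. Here the best classifier is taken to be the one with the highest agreement with the given labels on the whole dataset, and the error value of an instance is taken to be the fraction of selected classifiers that disagree with its label.

```python
from collections import Counter

def partition(n_items, k):
    """Split indices 0..n_items-1 into k round-robin subsets (assumed scheme)."""
    return [list(range(start, n_items, k)) for start in range(k)]

def train_centroid(X, y):
    """Stand-in classifier: nearest class centroid on 1-D features."""
    sums, counts = Counter(), Counter()
    for x, label in zip(X, y):
        sums[label] += x
        counts[label] += 1
    centroids = {label: sums[label] / counts[label] for label in counts}
    return lambda x: min(centroids, key=lambda c: abs(x - centroids[c]))

def train_1nn(X, y):
    """Stand-in classifier: label of the nearest training point."""
    pairs = list(zip(X, y))
    return lambda x: min(pairs, key=lambda p: abs(x - p[0]))[1]

def accuracy(predict, X, y):
    """Fraction of instances whose prediction matches the given label."""
    return sum(predict(x) == label for x, label in zip(X, y)) / len(y)

def error_fractions(X, y, k, trainers):
    """For each instance, the fraction of selected classifiers that
    disagree with its given label (assumed form of the error variable)."""
    votes = []
    for subset in partition(len(X), k):
        Xs = [X[i] for i in subset]
        ys = [y[i] for i in subset]
        # Build the classifier set for this subset, then keep the best one.
        candidates = [train(Xs, ys) for train in trainers]
        best = max(candidates, key=lambda clf: accuracy(clf, X, y))
        votes.append([best(x) for x in X])
    return [sum(v[i] != y[i] for v in votes) / k for i in range(len(X))]

def correct_labels(y, errors, scheme):
    """Flip binary labels judged mislabeled under the chosen scheme."""
    if scheme == "majority":
        is_mislabeled = lambda e: e > 0.5      # more than half disagree
    elif scheme == "non-objection":
        is_mislabeled = lambda e: e == 1.0     # every classifier disagrees
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return [1 - label if is_mislabeled(e) else label
            for label, e in zip(y, errors)]

# Toy data: 0 = truthful, 1 = deceptive; the instance at index 7 is mislabeled.
X = [1.0, 2.0, 3.0, 1.5, 2.5, 3.5, 9.0, 10.0, 11.0, 9.5, 10.5, 11.5]
clean = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
noisy = list(clean)
noisy[7] = 0

errors = error_fractions(X, noisy, k=3, trainers=[train_1nn])
by_majority = correct_labels(noisy, errors, "majority")
by_non_objection = correct_labels(noisy, errors, "non-objection")
print(errors[7], by_majority == clean, by_non_objection == clean)
```

In this toy run the 1-nearest-neighbor classifier trained on the subset that contains the mislabeled instance reproduces its wrong label, so only two of the three classifiers object. The majority scheme therefore flips the label while the stricter non-objection scheme leaves it alone, which illustrates the trade-off between the two thresholds: non-objection makes fewer corrections but is less likely to flip a correct label.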
