HAN Zhongyuan, YANG Muyun, KONG Leilei, QI Haoliang, LI Sheng. A Hybrid Model for Microblog Real-Time Filtering[J]. Chinese Journal of Electronics, 2016, 25(3): 432-440. doi: 10.1049/cje.2016.05.007

A Hybrid Model for Microblog Real-Time Filtering

Funds: This work was supported by the National Natural Science Foundation of China (No. 61370170, No. 61402134, No. 61173074) and the Youth National Social Science Foundation of China (No. 14CTQ032).
  • Corresponding author: YANG Muyun was born in 1971 and received his Ph.D. in computer science and engineering from Harbin Institute of Technology (HIT). He is an associate professor at HIT, and his research interests include machine translation and information retrieval. He is also a senior member of CCF and a member of IEEE/ACM/ACL and CIPSC. (Email: ymy@mtlab.hit.edu.cn)
  • Received Date: 2014-11-14
  • Revision Received Date: 2015-09-30
  • Publish Date: 2016-05-10
  • Abstract: The task of real-time microblog filtering is to decide whether subsequently posted tweets are relevant to a given query that represents a specific information need. Filters based on a retrieval model or a text classification model are the main solutions for this task. To exploit the strengths of both, a hybrid model is proposed that uses the retrieval model as prior knowledge to rectify the classification hyperplane. The hybrid filtering model combines the language model with the logistic regression model. Evaluated on the Text REtrieval Conference (TREC) 2012 microblog real-time filtering track dataset, the proposed model performs significantly better than either the logistic regression model or the language model alone. In particular, it outperforms the best method of the TREC 2012 microblog real-time filtering track.