KE Jie, DONG Hongbin, TAN Chengyu, LIANG Yiwen. PBWA: A Provenance-Based What-If Analysis Approach for Data Mining Processes[J]. Chinese Journal of Electronics, 2017, 26(5): 986-992. doi: 10.1049/cje.2017.06.003

PBWA: A Provenance-Based What-If Analysis Approach for Data Mining Processes

doi: 10.1049/cje.2017.06.003
Funds: This work is supported by the National Natural Science Foundation of China (No. 61170306).
More Information
  • Corresponding author: DONG Hongbin was born in Xiantao, Hubei Province, in 1964. She received the Ph.D. degree in computer software and theory from Wuhan University in 2000. She is now a professor at the International School of Software, Wuhan University. Her research interests include data provenance, data mining, and evolutionary computation. (Email: hbdong@whu.edu.cn)
  • Received Date: 2015-11-18
  • Revision Received Date: 2016-01-27
  • Publish Date: 2017-09-10
  • This paper presents a provenance-based what-if analysis approach (PBWA) for data mining processes, which lets decision makers examine the latest mining result under hypothetical business contexts. It addresses a gap in conventional data mining, which reveals only the past status of an enterprise from historical data. Provenance information is a kind of metadata about data mining processes. PBWA uses it to identify the relevant operation path and the intermediate results that are affected by a hypothetical business context, and then refreshes the mining result by rerunning only the affected portions. Unlike previous studies, which handle only relational operations, PBWA can take more general operations into account, and it covers the whole mining process. Experiments demonstrate that PBWA achieves better time performance when the affected ratio is less than 74% and 87% in different contexts.
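
The partial-rerun idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes provenance is available as a simple mapping from each step to the inputs it reads, and every name here (`affected_steps`, `refresh`, the toy pipeline and its data) is hypothetical.

```python
from collections import deque


def affected_steps(deps, changed):
    """Return every step that is downstream of the changed inputs.

    deps maps each step name to the list of names it reads from
    (a stand-in for recorded provenance; assumed, not from the paper).
    """
    # Invert the provenance edges: producer -> list of consumers.
    consumers = {}
    for step, inputs in deps.items():
        for src in inputs:
            consumers.setdefault(src, []).append(step)

    affected, queue = set(), deque(changed)
    while queue:
        node = queue.popleft()
        for nxt in consumers.get(node, []):
            if nxt not in affected:
                affected.add(nxt)
                queue.append(nxt)
    return affected


def refresh(order, deps, ops, cache, changed):
    """Rerun only the affected steps; all other results stay cached.

    order must list the steps in topological order.
    """
    dirty = affected_steps(deps, changed)
    rerun = []
    for step in order:
        if step in dirty:
            cache[step] = ops[step](*[cache[s] for s in deps[step]])
            rerun.append(step)
    return rerun


# Toy pipeline (hypothetical): two sources feeding five steps.
deps = {
    "clean_sales": ["sales"],
    "clean_customers": ["customers"],
    "join": ["clean_sales", "clean_customers"],
    "summary": ["join"],
    "customer_count": ["clean_customers"],
}
order = ["clean_sales", "clean_customers", "join", "summary", "customer_count"]
ops = {
    "clean_sales": lambda rows: [r for r in rows if r[1] > 0],
    "clean_customers": lambda names: sorted(set(names)),
    "join": lambda sales, custs: [r for r in sales if r[0] in custs],
    "summary": lambda rows: sum(amount for _, amount in rows),
    "customer_count": lambda custs: len(custs),
}

# Initial full run: treating both sources as changed touches every step.
cache = {
    "sales": [("ann", 10), ("bob", -5), ("cat", 7)],
    "customers": ["ann", "cat", "ann"],
}
full = refresh(order, deps, ops, cache, ["sales", "customers"])

# What-if run: change only the sales data, working on a copy so the
# real results are left untouched.
what_if = dict(cache)
what_if["sales"] = [("ann", 20), ("bob", -5), ("cat", 7)]
rerun = refresh(order, deps, ops, what_if, ["sales"])
```

In this toy case only `clean_sales`, `join`, and `summary` are recomputed for the hypothetical sales data, while `clean_customers` and `customer_count` are reused from the cache. A shallow copy is enough for the what-if run because each step replaces its cached value instead of mutating it.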