Volume 30 Issue 2
Apr.  2021
Turn off MathJax
Article Contents
BİLGİN Turgay Tugay and OĞUZ Murat, “Novel Approach to Minimize the Memory Requirements of Frequent Subgraph Mining Techniques,” Chinese Journal of Electronics, vol. 30, no. 2, pp. 258-267, 2021, doi: 10.1049/cje.2021.01.003
Citation: BİLGİN Turgay Tugay and OĞUZ Murat, “Novel Approach to Minimize the Memory Requirements of Frequent Subgraph Mining Techniques,” Chinese Journal of Electronics, vol. 30, no. 2, pp. 258-267, 2021, doi: 10.1049/cje.2021.01.003

Novel Approach to Minimize the Memory Requirements of Frequent Subgraph Mining Techniques

doi: 10.1049/cje.2021.01.003
More Information
  • Author Bio:

    OĞUZ Murat   was born in Sinop, Turkey, in 1977. He is now an Engineer in Microsoft Turkey Branch. His current research interests include data mining, big data analysis, high performance computing. (Email: murat.oguz@microsoft.com)

  • Corresponding author: BİLGİN Turgay Tugay   (corresponding author) was born in Kirklareli, Turkey, in 1977. He is now an Associate Professor in Bursa Technical University. His current research interests include data mining, big data analysis, high performance computing. (Email: turgay.bilgin@btu.edu.tr)
  • Received Date: 2017-05-09
  • Accepted Date: 2019-12-21
  • Publish Date: 2021-03-01
  • Frequent subgraph mining (FSM) is a subset of the graph mining domain that is extensively used for graph classification and clustering. Over the past decade, many efficient FSM algorithms have been developed with improvements generally focused on reducing the time complexity by changing the algorithm structure or using parallel programming techniques. FSM algorithms also require high memory consumption, which is another problem that should be solved. In this paper, we propose a new approach called Predictive dynamic sized structure packing (PDSSP) to minimize the memory needs of FSM algorithms. Our approach redesigns the internal data structures of FSM algorithms without making algorithmic modifications. PDSSP offers two contributions. The first is the Dynamic Sized Integer Type, a newly designed unsigned integer data type, and the second is a data structure packing technique to change the behavior of the compiler. We examined the effectiveness and efficiency of the PDSSP approach by experimentally embedding it into two state-of-the-art algorithms, gSpan and Gaston. We compared our implementations to the performance of the originals. Nearly all results show that our proposed implementation consumes less memory at each support level, suggesting that PDSSP extensions could save memory, with peak memory usage decreasing up to 38% depending on the dataset.
  • loading
  • [1]
    A. Thusoo, Z. Shao, S. Anthony, et al., "Data warehousing and analytics infrastructure at Facebook", ACM SIGMOD/PODS 2010 Conference, Indianapolis, Indiana, USA, pp. 1013–1020, 2010.
    [2]
    L. Wang, Y. Xiao, B. Shao, et al., "How to partition a billion-node graph", 2014 IEEE International Conference on Data Engineering, Chicago, IL, USA, pp. 568–579, 2014
    [3]
    A. S. Muttipati and P. Padmaja, "Analysis of large graph partitioning and frequent subgraph mining on graph data", International Journal of Advanced Research in Computer Science, Vol. 6, No. 7, pp. 29–40, 2015. http://www.researchgate.net/publication/283210911_Analysis_of_Large_Graph_Partitioning_and_Frequent_Subgraph_Mining_on_Graph_Data
    [4]
    G. V. Carlos and M. Esteban, "Comparative analysis of de Bruijn graph parallel genome assemblers", 2018 IEEE International Work Conference on Bioinspired Intelligence (IWOBI), IEEE, pp. 1–8, 2018.
    [5]
    N. Talukder and M. J. Zaki, "A distributed approach for graph mining in massive networks", Data Mining and Knowledge Discovery, Vol. 30, No. 5, pp. 1024–1052, 2016. doi: 10.1007/s10618-016-0466-x
    [6]
    G. Di Fatta and M. R. Berthold, "High performance subgraph mining in molecular compounds", International Conference on High Performance Computing and Communications, Springer, Berlin, Heidelberg, pp. 866–877, 2005.
    [7]
    A. Stratikopoulos, G. Chrysos, I. Papaefstathiou, et al., "HPC-gSpan: An FPGA-based parallel system for frequent subgraph mining", IEEE 2014 24th International Conference on Field Programmable Logic and Applications (FPL), Munich, Germany, pp. 1–4, 2014.
    [8]
    C. C. Aggarwal, M. A. Bhuiyan and M. Al Hasan, "Frequent pattern mining algorithms: A survey", in: C. C. Aggarwal, J. Han (editors), Frequent Pattern Mining, Switzerland: Springer International Publishing, pp. 19–64, 2014.
    [9]
    D. C. Anastasiu, J. Iverson, S. Smith, et al., "Big data frequent pattern mining", in: C. C. Aggarwal, J. Han (editors), Frequent Pattern Mining, Springer International Publishing Switzerland, pp. 225–259, 2014.
    [10]
    X. Yan and J. Han, "gSpan: Graph-based substructure pattern mining", in: IEEE 2003 IEEE International Conference on Data Mining, Melbourne, Florida, USA, pp. 721–724, 2003.
    [11]
    S. Nijssen and J. N. Kok, "The Gaston tool for frequent subgraph mining", Electronic Notes in Theoretical Computer Science, Vol. 127, No. 1, pp. 77–87, 2005. doi: 10.1016/j.entcs.2004.12.039
    [12]
    X. Yan and J. Han, "CloseGraph: Mining closed frequent graph patterns", in: ACM SIGKDD 2003 International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, pp. 286–295, 2003.
    [13]
    K. Lakshmi and D. T. Meyyappan, "A comparative study of frequent subgraph mining algorithms", International Journal of Information Technology Convergence and Services, Vol. 2, No. 2, pp. 23–39, 2012. doi: 10.5121/ijitcs.2012.2203
    [14]
    C. Borgelt and M. R. Berthold, "Mining molecular fragments: Finding relevant substructures of molecules", IEEE 2003 International Conference on Data Mining, Melbourne, Florida, USA, pp. 51–58, 2003.
    [15]
    B. Guan, X. Z. Zan, B. Y. Xiao, et al., "Detecting dense subgraphs in complex networks based on edge density coefficient", Chinese Journal of Electronics, Vol. 22, No. 3, pp. 517–520, 2013. http://www.ejournal.org.cn/Jweb_cje/EN/abstract/abstract7822.shtml
    [16]
    M. Kuramochi and G. Karypis, "Frequent subgraph discovery", IEEE 2001 International Conference on Data Mining, San Jose, California, USA, pp. 313–320, 2001.
    [17]
    N. Vanetik, E. Gudes and S. E. Shimony, "Computing frequent graph patterns from semistructured data", IEEE 2002 International Conference on Data Mining, Maebashi City, Japan, pp. 458–465, 2002.
    [18]
    S. U. Rehman, S. Asghar and S. J. Fong, "Optimized and frequent subgraphs: How are they related?", IEEE Access, DOI: 10.1109/ACCESS.2018.2846604, pp.37237–37249, 2018.
    [19]
    S. Yang, R. Guo, R. Liu, et al., "cmFSM: A scalable CPU-MIC coordinated drug-finding tool by frequent subgraph mining", BMC Bioinformatics, Vol. 19, Article No. 98, 2018.
    [20]
    I. Horton, "Working with fundamental data types", in: S. Anglin (lead editor), Beginning C++, NY, USA, pp. 55–77, 2014.
    [21]
    D. A. Bader, H. Meyerhenke, P. Sanders, et al., "Benchmarking for graph clustering and partitioning", Encyclopedia of Social Network Analysis and Mining, pp. 73–84, 2012. doi: 10.1007/978-1-4614-6170-8_23
    [22]
    Randal E. Bryant and David R. O'Hallaron, Computer Systems: A programmer's Perspective, 3rd edition, Pearson, 2016.
    [23]
    Microsoft, "Working with packing structures", https://docs.microsoft.com/en-us/previousversions/ms253935, 2017-12-8.
    [24]
    Valgrind and Massif Visualizer tools, http://valgrind.org/info/tools.htm, 2017-12-8.
    [25]
    Nikil Wale, Ian A. Watson and George Karypis, "Comparison of descriptor spaces for chemical compound retrieval and classification", Knowledge and Information Systems, Vol. 14, pp. 347–375, 2008. doi: 10.1007/s10115-007-0103-5
    [26]
    P. D. Dobson and A. J. Doig, "Distinguishing enzyme structures from non-enzymes without alignments", Journal of Molecular Biology, Vol. 330, No. 4, pp. 771–783, 2003. doi: 10.1016/S0022-2836(03)00628-4
    [27]
    D. Wajdi, A. Sabeur and N. E. Mephu, "MR-SimLab: Scalable subgraph selection with label similarity for big data", Information Systems, Elsevier, Vol. 69, pp. 155–163, 2017.
    [28]
    Thoma M, Cheng H, Gretton A, et al., "Discriminative frequent subgraph mining with optimality guarantees", Statistical Analysis and Data Mining, The ASA Data Science Journal, Vol. 3, pp. 302–18, 2010.
    [29]
    [30]
    Wörlein M., Meinl T., Fischer I., et al., "A quantitative comparison of the subgraph miners MoFa, gSpan, FFSM, and Gaston", European Conference on Principles of Data Mining and Knowledge Discovery, Springer, Berlin, Heidelberg, pp. 392–403, 2005.
  • 加载中

Catalog

    通讯作者: 陈斌, bchen63@163.com
    • 1. 

      沈阳化工大学材料科学与工程学院 沈阳 110142

    1. 本站搜索
    2. 百度学术搜索
    3. 万方数据库搜索
    4. CNKI搜索

    Figures(6)  / Tables(13)

    Article Metrics

    Article views (539) PDF downloads(13) Cited by()
    Proportional views
    Related

    /

    DownLoad:  Full-Size Img  PowerPoint
    Return
    Return