SONG Yan, YAO Shuang, YU Donghua, SHEN Yan, HU Yuzhen. A New K-Ary Crisp Decision Tree Induction with Continuous Valued Attributes[J]. Chinese Journal of Electronics, 2017, 26(5): 999-1007. doi: 10.1049/cje.2017.07.015

A New K-Ary Crisp Decision Tree Induction with Continuous Valued Attributes

doi: 10.1049/cje.2017.07.015
Funds:  This work is supported by the National Natural Science Foundation of China (No.51409065, No.71101034), the Heilongjiang Provincial Natural Science Foundation (No.JJ2016QN0048), the Heilongjiang Provincial Young Science Foundation (No.JJ2016QN0645), and the Heilongjiang Provincial Postdoctoral Fund (No.LBH-Z15047).
More Information
  • Corresponding author: YAO Shuang was born in 1988. She received the B.S. degree from the College of Computer Science and Technology, Northeast Forestry University. She is now a Ph.D. candidate in the School of Economics and Management, Harbin Engineering University. Her research interests include data mining and decision analysis. (Email: alloniam@163.com)
  • Received Date: 2016-08-30
  • Revised Date: 2017-04-25
  • Publish Date: 2017-09-10
  • The simplicity and interpretability of decision tree induction make it one of the more widely used machine learning methods for data classification. However, for continuous valued (real and integer) attribute data, there is room for further improvement in classification accuracy, complexity, and tree scale. We propose a new K-ary partition discretization method with no more than K-1 cut points, based on Gaussian membership functions and the expected class number. We also propose a new K-ary crisp decision tree induction algorithm for continuous valued attributes that combines this discretization method with the Gini index. Experimental results and non-parametric statistical tests on 19 real-world datasets show that the proposed algorithm outperforms four conventional approaches in terms of classification accuracy, tree scale, and particularly tree depth. Considering the number of nodes, the decision tree built by the proposed method tends to be more balanced than those built by the other four methods. The complexity of the proposed algorithm is relatively low.
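As an illustration of how a K-ary split on a continuous attribute can be scored with the Gini index, the following is a minimal sketch and not the authors' algorithm. It assumes the cut points are already given; the paper's Gaussian-membership discretization that chooses them is not reproduced here. The function names `gini` and `kary_split_gini`, and the convention that a value equal to a cut point falls in the interval to its right, are assumptions made for this example.

```python
from bisect import bisect_right
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum(p_c^2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def kary_split_gini(values, labels, cut_points):
    """Weighted Gini index of the partition induced by the cut points.

    K-1 cut points yield at most K intervals. A value equal to a cut
    point is placed in the interval to its right (assumed convention).
    """
    cuts = sorted(cut_points)
    bins = [[] for _ in range(len(cuts) + 1)]
    for v, y in zip(values, labels):
        bins[bisect_right(cuts, v)].append(y)
    n = len(labels)
    # Each interval contributes its impurity weighted by its share of samples.
    return sum(len(b) / n * gini(b) for b in bins)


if __name__ == "__main__":
    values = [1, 2, 3, 4, 5, 6]
    labels = ["a", "a", "b", "b", "c", "c"]
    print(kary_split_gini(values, labels, [3.5]))       # one cut, binary split
    print(kary_split_gini(values, labels, [2.5, 4.5]))  # two cuts, ternary split
```

On this toy data the ternary split separates the three classes completely, so its weighted Gini index is lower than that of the binary split. This is one way a single multiway node can do the work of several binary ones, in line with the reduction in tree depth the abstract reports.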