Citation: ZHANG Jingxiang and WANG Shitong, “A Novel Single-Feature and Synergetic-Features Selection Method by Using ISE-Based KDE and Random Permutation”, Chinese Journal of Electronics, Vol.25, No.1, pp.114-120, 2016, doi: 10.1049/cje.2016.01.018.
C. Lee and D.A. Landgrebe, “Feature extraction based on decision boundaries”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol.15, No.4, pp.388-400, 1993.
W.H. Hsu, “Genetic wrappers for feature selection in decision tree induction and variable ordering in Bayesian network structure learning”, Information Sciences, Vol.163, No.17, pp.103-122, 2004.
K. Michalak, et al., “Correlation-based feature selection strategy in classification problems”, International Journal of Applied Mathematics and Computer Science, Vol.16, No.4, pp.503-511, 2006.
J. Zhong and Q.G. Sun, “A novel feature selection method based on probability latent semantic analysis for Chinese text classification”, Chinese Journal of Electronics, Vol.20, No.2, pp.228-232, 2011.
X.F. He, D. Cai and P. Niyogi, “Laplacian score for feature selection”, Proc. of the Neural Information Processing Systems, MIT Press, pp.507-514, 2005.
W.J. Hu, K.S. Choi, Y.G. Gu, et al., “Minimum-maximum local structure information for feature selection”, Pattern Recognition Letters, Vol.34, pp.527-535, 2013.
K. Kira and L.A. Rendell, “A practical approach to feature selection”, Proc. of the 9th International Workshop on Machine Learning, San Francisco, USA, pp.249-256, 1992.
I. Kononenko, “Estimating attributes: Analysis and extensions of RELIEF”, Proc. of ECML, Catania, Italy, pp.171-182, 1994.
Z.H. Deng, F.L. Chung and S.T. Wang, “Robust relief-feature weighting, margin maximization, and fuzzy optimization”, IEEE Transactions on Fuzzy Systems, Vol.18, No.4, pp.726-744, 2010.
H. Peng, F. Long and C. Ding, “Feature selection based on mutual information: Criteria of max-dependency, max-relevance, and min-redundancy”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol.27, No.8, pp.1226-1238, 2005.
M. Girolami and C. He, “Probability density estimation from optimally condensed data samples”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol.25, No.10, pp.1253-1264, 2003.
N. Kwak and C.H. Choi, “Input feature selection by mutual information based on Parzen window”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol.24, No.12, pp.1667-1671, 2002.
D.W. Scott, “Parametric statistical modeling by minimum integrated square error”, Technometrics, Vol.43, No.3, pp.274-285, 2001.
X.M. Wang and S.T. Wang, “Feature ranking by weighting and the ISE criterion of nonparametric estimation”, Journal of Applied Sciences, Vol.9, No.6, pp.1014-1024, 2009.
J. Kim and C.D. Scott, “L2 kernel classification”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol.32, No.10, pp.1822-1831, 2010.
K.Q. Shen, C.J. Ong, X.P. Li, et al., “Feature selection via sensitivity analysis of SVM probabilistic outputs”, Machine Learning, Vol.70, No.1, pp.1-20, 2008.
J.B. Yang and C.J. Ong, “An effective feature selection method via mutual information estimation”, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, Vol.42, No.6, pp.1550-1559, 2012.
J.B. Yang, K.Q. Shen, C.J. Ong, et al., “Feature selection for MLP neural network: The use of random permutation of probabilistic outputs”, IEEE Transactions on Neural Networks, Vol.20, No.12, pp.1911-1922, 2009.
Z.C. Lu, Q. Zheng and Q. Jin, “Constructing rough set based unbalanced binary tree for feature selection”, Chinese Journal of Electronics, Vol.23, No.3, pp.474-479, 2014.
E. Parzen, “On estimation of a probability density function and mode”, Annals of Mathematical Statistics, Vol.33, No.3, pp.1065-1076, 1962.
M. Di Marzio and C.C. Taylor, “Kernel density classification and boosting: An L2 analysis”, Statistics and Computing, Vol.15, pp.113-123, 2005.
M. Rosenblatt, “Global measures of deviation for kernel and nearest neighbor density estimates”, in Smoothing Techniques for Curve Estimation, Springer, Berlin, Heidelberg, pp.181-190, 1979.
K. Pelckmans, et al., “A risk minimization principle for a class of Parzen estimators”, Proc. of the Advances in Neural Information Processing Systems, pp.123-130, 2007.
P. Meinicke, T. Twellmann and H. Ritter, “Discriminative densities from maximum contrast estimation”, Proc. of the Neural Information Processing Systems 15, Vancouver, Canada, pp.985-992, 2002.
A.K. Jain, R.P.W. Duin and J.C. Mao, “Statistical pattern recognition: A review”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol.22, No.1, pp.4-37, 2000.
S.T. Wang, J. Wang and F.L. Chung, “Kernel density estimation, kernel methods, and fast learning in large data sets”, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, Vol.44, No.1, pp.1-20, 2014.
L. Breiman, “Random forests”, Machine Learning, Vol.45, No.1, pp.5-32, 2001.
A. Asuncion and D.J. Newman, UCI Machine Learning Repository, Univ. California, Irvine, CA, 2007. Available: http://www.ics.uci.edu/~mlearn/MLRepository.html, accessed 2014-9-8.