ZHANG Shangli, ZHANG Lili, QIU Kuanmin, LU Ying, CAI Baigen. Variable Selection in Logistic Regression Model[J]. Chinese Journal of Electronics, 2015, 24(4): 813-817. doi: 10.1049/cje.2015.10.025

Variable Selection in Logistic Regression Model

doi: 10.1049/cje.2015.10.025
Funds: This work is supported by the National Natural Science Foundation of China (No. 61070236, No. U1334211) and the Project of State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University (No. RCS2012ZT004).
• Corresponding author: ZHANG Lili (corresponding author) was born in Inner Mongolia. She received the B.E. degree in Mathematics from Inner Mongolia University. She is now a Ph.D. candidate at Chonnam National University. Her research interests include pattern recognition and biostatistics. (Email: l1lzhang@126.com)
• Rev Recd Date: 2014-04-08
• Publish Date: 2015-10-10
• Variable selection is one of the most important problems in pattern recognition. For the linear regression model, many methods address this problem, such as the least absolute shrinkage and selection operator (LASSO) and its many improved variants, but few variable selection methods exist for generalized linear models. We study the variable selection problem in the logistic regression model. We propose a new variable selection method, the logistic elastic net, and prove that it has the grouping effect, meaning that strongly correlated predictors tend to enter or leave the model together. The logistic elastic net is particularly useful when the number of predictors (p) is much larger than the number of observations (n). By contrast, the LASSO is not a very satisfactory variable selection method when p is much larger than n. The advantages and effectiveness of the method are demonstrated on real leukemia data and in a simulation study.
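The grouping effect mentioned in the abstract can be illustrated with a small sketch. The paper's exact objective and algorithm are not given on this page, so the code below assumes the usual elastic net form: the mean logistic loss plus lam1 times the L1 norm plus lam2 times the squared L2 norm of the coefficients, fitted by plain proximal gradient descent without an intercept. The function names, penalty values, and toy data are hypothetical choices for illustration, not the authors' implementation.

```python
import math
import random


def soft_threshold(z, t):
    """Proximal operator of t * |z|: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def sigmoid(eta):
    """Numerically stable logistic function."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def logistic_elastic_net(X, y, lam1, lam2, n_iter=500):
    """Minimize mean logistic loss + lam1*||b||_1 + lam2*||b||_2^2 (assumed form)."""
    n, p = len(X), len(X[0])
    # Conservative step: 1 / (upper bound on the Lipschitz constant of the
    # gradient of the smooth part), using trace(X'X) >= largest eigenvalue.
    sum_sq = sum(v * v for row in X for v in row)
    step = 1.0 / (sum_sq / (4.0 * n) + 2.0 * lam2)
    beta = [0.0] * p
    for _ in range(n_iter):
        # Residuals p_i - y_i of the current fit.
        resid = [sigmoid(sum(b * v for b, v in zip(beta, row))) - yi
                 for row, yi in zip(X, y)]
        # Gradient of the smooth part: logistic loss plus ridge term.
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n
                + 2.0 * lam2 * beta[j] for j in range(p)]
        # Gradient step followed by soft-thresholding for the L1 term.
        beta = [soft_threshold(b - step * g, step * lam1)
                for b, g in zip(beta, grad)]
    return beta


# Toy data (hypothetical): predictors 0 and 1 are identical copies,
# predictors 2 and 3 are pure noise.
random.seed(0)
X, y = [], []
for _ in range(30):
    x1 = random.gauss(0.0, 1.0)
    noise = [random.gauss(0.0, 1.0) for _ in range(2)]
    X.append([x1, x1] + noise)
    y.append(1 if x1 + 0.5 * random.gauss(0.0, 1.0) > 0 else 0)

beta = logistic_elastic_net(X, y, lam1=0.05, lam2=0.05)
print(beta)
```

In this sketch the two identical predictors receive the same coefficient. With lam2 > 0 the penalized objective is strictly convex, so the equal split is the unique minimizer; with lam2 = 0 (a pure LASSO penalty) any same-sign split of the same total weight between the two copies gives the same objective value, which is one way to see why the LASSO alone does not group correlated predictors.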
