LIAN Zifeng, JING Xiaojun, WANG Xiaohan, HUANG Hai, TAN Youheng, CUI Yuanhao. DropConnect Regularization Method with Sparsity Constraint for Neural Networks[J]. Chinese Journal of Electronics, 2016, 25(1): 152-158. doi: 10.1049/cje.2016.01.023

DropConnect Regularization Method with Sparsity Constraint for Neural Networks

doi: 10.1049/cje.2016.01.023
Funds: This work is supported by the National Natural Science Foundation of China (No.61143008, No.61471066) and the National High Technology Research and Development Program of China (No.2011AA01A204).
  • Received Date: 2015-05-11
  • Revision Received Date: 2015-06-23
  • Publish Date: 2016-01-10
  • DropConnect is a recently introduced algorithm for preventing the co-adaptation of feature detectors. Compared with Dropout, DropConnect achieves state-of-the-art results on several image recognition benchmarks. Motivated by this success, we extend the algorithm with the ability to perform sparse feature selection. In the DropConnect algorithm, the dropping masks of the weights are generated using Bernoulli gating variables that are independent of the weights and activations. We introduce a new strategy that generates masks depending on the outputs of the previous layer. With this method, neurons that are likely to produce sparser features are assigned a higher probability of staying active in the forward and backward propagations. We then evaluate this sparsity-constrained DropConnect on the MNIST and CIFAR datasets in comparison with ordinary DropConnect and Dropout. The results show that the new method significantly improves the sparsity of features without degrading precision.
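
The contrast drawn in the abstract, between masks that ignore the activations and masks that depend on the previous layer's outputs, can be illustrated with a minimal NumPy sketch. The Bernoulli mask follows the standard DropConnect description. The function `activation_dependent_keep_probs`, its `base_keep` and `spread` parameters, and the rule that maps activation magnitude to a keep probability are hypothetical choices made only for illustration; the abstract does not state the authors' actual rule, so this should not be read as their method. Only the training-time forward pass of a single fully connected ReLU layer is covered.

```python
import numpy as np


def activation_dependent_keep_probs(prev_out, base_keep=0.5, spread=0.4):
    """HYPOTHETICAL rule (not taken from the paper): map the magnitude of each
    previous-layer output linearly onto [base_keep - spread, base_keep + spread],
    so that more strongly activated units get a higher keep probability."""
    mag = np.abs(prev_out)
    span = mag.max() - mag.min()
    if span == 0:
        # All activations equal: fall back to a uniform keep probability.
        return np.full(mag.shape, base_keep)
    score = (mag - mag.min()) / span  # rescaled to [0, 1]
    return np.clip(base_keep + spread * (2.0 * score - 1.0), 0.0, 1.0)


def dropconnect_layer(prev_out, W, keep_probs, rng):
    """Training-time forward pass of one fully connected ReLU layer.

    keep_probs is either a scalar (ordinary DropConnect: every weight is kept
    independently with the same probability) or a vector of shape (d_in,)
    giving one keep probability per input unit, shared by that unit's
    outgoing weights (the assumed activation-dependent variant).
    """
    probs = np.asarray(keep_probs, dtype=float).reshape(-1, 1)
    probs = np.broadcast_to(probs, W.shape)
    mask = rng.random(W.shape) < probs           # Bernoulli gating per weight
    out = np.maximum(0.0, prev_out @ (W * mask))  # masked weights, then ReLU
    return out, mask


rng = np.random.default_rng(0)
prev_out = np.array([0.0, 0.5, 2.0])  # outputs of the previous layer
W = rng.normal(size=(3, 4))           # weights of the current layer

# Ordinary DropConnect: mask independent of weights and activations.
out_plain, mask_plain = dropconnect_layer(prev_out, W, 0.5, rng)

# Assumed variant: keep probabilities derived from the previous layer's outputs.
probs = activation_dependent_keep_probs(prev_out)
out_dep, mask_dep = dropconnect_layer(prev_out, W, probs, rng)
```

Passing a scalar reproduces ordinary DropConnect, while passing the per-unit vector changes only how the mask is sampled; the masked matrix product is the same in both cases, which keeps the two variants directly comparable.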