LI Fei, GAO Xiaoguang, WAN Kaifang. Training Restricted Boltzmann Machine Using Gradient Fixing Based Algorithm[J]. Chinese Journal of Electronics, 2018, 27(4): 694-703. doi: 10.1049/cje.2018.05.007

Training Restricted Boltzmann Machine Using Gradient Fixing Based Algorithm

doi: 10.1049/cje.2018.05.007
Funds:  This work is supported by the National Natural Science Foundation of China (No.61573285, No.61305133) and the Fundamental Research Funds for the Central Universities (No.3102015BJ(II)GH01, No.3102016CG002).
More Information
  • Corresponding author: GAO Xiaoguang (corresponding author) was born in 1957. She received the Ph.D. degree in electronic engineering from Northwestern Polytechnical University, China. She is a professor of systems engineering. Her research interests include system simulation and machine learning. (Email: CXG2012@nwpu.edu.cn)
  • Received Date: 2016-08-30
  • Rev Recd Date: 2017-04-14
  • Publish Date: 2018-07-10
  • Most algorithms for training restricted Boltzmann machines (RBMs) are based on Gibbs sampling. When sampling is used to estimate the gradient, the sampled gradient is only an approximation of the true gradient, and the large error between the two seriously degrades network training. To address this problem, this paper analyses the numerical error and the orientation error between the approximate gradient and the true gradient, and shows how each affects training performance. A gradient fixing model is then established to adjust the numerical value and orientation of the approximate gradient and thereby reduce the error. Based on this model, we design a gradient fixing based Gibbs sampling training algorithm (GFGS) and a gradient fixing based parallel tempering algorithm (GFPT), and compare them experimentally with existing algorithms. The results demonstrate that the new algorithms effectively tackle the gradient error and achieve higher training accuracy at a reasonable expense of computational runtime.
