Volume 29 Issue 6
Dec.  2020
YUAN Yong, CHEN Chen, HU Xiyuan, et al., “CNQ: Compressor-Based Non-uniform Quantization of Deep Neural Networks,” Chinese Journal of Electronics, vol. 29, no. 6, pp. 1126-1133, 2020, doi: 10.1049/cje.2020.09.014

CNQ: Compressor-Based Non-uniform Quantization of Deep Neural Networks

doi: 10.1049/cje.2020.09.014
Funds: This work is supported by the National Natural Science Foundation of China (No. 61906194, No. 61571438).
  • Corresponding author: CHEN Chen received the M.S. and Ph.D. degrees in computer science from the University of Copenhagen, Denmark, in 2011 and 2013, respectively. She is currently an assistant professor at the Institute of Automation, Chinese Academy of Sciences, Beijing, China. Her research interests focus on pattern recognition and machine learning. (Email: chen.chen@ia.ac.cn)
  • Received Date: 2019-10-31
  • Publish Date: 2020-12-25
  • Deep neural networks (DNNs) have achieved state-of-the-art performance in a number of domains but suffer from high computational complexity. Network quantization can effectively reduce computation and memory costs without changing the network structure, facilitating the deployment of DNNs on mobile devices. While existing methods can obtain good performance, low-bit quantization without time-consuming training or access to the full dataset is still a challenging problem. In this paper, we develop a method named Compressor-based Non-uniform Quantization (CNQ) to achieve non-uniform quantization of DNNs with few unlabeled samples. First, we present a compressor-based fast non-uniform quantization method, which accomplishes non-uniform quantization without iterations. Second, we propose aligning the feature maps of the quantized model with those of the pre-trained model for accuracy recovery. Considering the differences in properties among activation channels, we use the weighted entropy of each channel to optimize the alignment loss. In the experiments, we evaluate the proposed method on image classification and object detection. Our results outperform existing post-training quantization methods, which demonstrates the effectiveness of the proposed method.
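The idea of compressor-based (companding) non-uniform quantization mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which compressor function CNQ uses, so the classical μ-law compressor is swapped in here purely as an assumption; the function names `compress`, `expand`, and `companding_quantize`, the default `mu=255.0`, and the symmetric level layout are all hypothetical and not taken from the paper. The sketch compresses normalized weights, quantizes them uniformly in the compressed domain, and expands them back, which yields non-uniform levels in a single pass without iterations.

```python
import numpy as np


def compress(x, mu=255.0):
    """mu-law compressor on [-1, 1] (assumed; not the paper's stated function)."""
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)


def expand(y, mu=255.0):
    """Inverse of the mu-law compressor on [-1, 1]."""
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(mu)) / mu


def companding_quantize(w, bits=4, mu=255.0):
    """Non-uniform quantization: compress, quantize uniformly, expand.

    Hypothetical helper: symmetric signed levels, per-tensor scaling.
    """
    w = np.asarray(w, dtype=np.float64)
    scale = np.max(np.abs(w))
    if scale == 0.0:
        return w.copy()  # nothing to quantize in an all-zero tensor
    y = compress(w / scale, mu)            # map weights into the compressed domain
    levels = 2 ** (bits - 1) - 1           # e.g. 7 positive levels for 4 bits
    y_q = np.round(y * levels) / levels    # uniform grid in the compressed domain
    return expand(y_q, mu) * scale         # non-uniform grid in the weight domain


rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.05, size=1000)
quantized = companding_quantize(weights, bits=4)
print("distinct levels:", len(np.unique(quantized)))
```

Because the compressor is steep near zero, the resulting levels are denser for small-magnitude weights, where bell-shaped weight distributions concentrate most of their mass.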