Volume 30 Issue 6
Nov.  2021
WANG Liyuan, ZHANG Jing, YAO Jiacheng, et al., “Porn Streamer Recognition in Live Video Based on Multimodal Knowledge Distillation,” Chinese Journal of Electronics, vol. 30, no. 6, pp. 1096-1102, 2021, doi: 10.1049/cje.2021.07.027

Porn Streamer Recognition in Live Video Based on Multimodal Knowledge Distillation

doi: 10.1049/cje.2021.07.027
Funds:

This work is supported by the National Natural Science Foundation of China (No.61971016, No.61531006) and the Beijing Education Committee Cooperation Beijing Natural Science Foundation (No.KZ201910005007).

  • Received Date: 2020-07-08
  • Rev Recd Date: 2020-09-20
  • Available Online: 2021-09-23
  • Publish Date: 2021-11-05
  • Although deep learning has achieved high accuracy in video content analysis, it does not satisfy the practical demands of porn streamer recognition in live video because deep network models have many parameters and complex structures. To improve the efficiency of porn streamer recognition in live video, a deep network model compression method based on multimodal knowledge distillation is proposed. First, a teacher model is trained with a visual-speech deep network to obtain the corresponding porn-video prediction scores. Second, a lightweight student model built on MobileNetV2 and Xception transfers knowledge from the teacher model using a multimodal knowledge distillation strategy. Finally, porn streamers in live video are recognized by combining the lightweight visual-speech student network with a bullet-screen text recognition network. Experimental results demonstrate that the proposed method effectively reduces computation cost and improves recognition speed while maintaining acceptable accuracy.
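  • The knowledge transfer step described in the abstract follows the standard distillation formulation of Hinton et al. (soft teacher targets at a raised temperature, mixed with the hard-label loss). The sketch below is illustrative only, assuming a classification head and scalar logits; the temperature T, weight alpha, and function names are hypothetical, not taken from the paper.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; higher T yields softer distributions."""
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(teacher_logits, student_logits, y_true, T=4.0, alpha=0.7):
    """Weighted sum of a soft-target KL term (scaled by T^2, as in
    Hinton et al.) and the hard-label cross-entropy of the student."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    kl = np.sum(p_teacher * (np.log(p_teacher + 1e-12)
                             - np.log(p_student + 1e-12)))
    ce = -np.log(softmax(student_logits)[y_true] + 1e-12)
    return alpha * (T ** 2) * kl + (1 - alpha) * ce
```

    When the student matches the teacher, the KL term vanishes and only the hard-label term remains; mismatched logits are penalized more, which is what drives the lightweight student toward the teacher's predictions.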