Volume 30 Issue 6
Nov.  2021
WANG Liyuan, ZHANG Jing, YAO Jiacheng, et al., “Porn Streamer Recognition in Live Video Based on Multimodal Knowledge Distillation,” Chinese Journal of Electronics, vol. 30, no. 6, pp. 1096-1102, 2021, doi: 10.1049/cje.2021.07.027

Porn Streamer Recognition in Live Video Based on Multimodal Knowledge Distillation

doi: 10.1049/cje.2021.07.027
Funds:

This work is supported by the National Natural Science Foundation of China (No.61971016, No.61531006) and the Beijing Education Committee Cooperation Beijing Natural Science Foundation (No.KZ201910005007).

  • Received Date: 2020-07-08
  • Rev Recd Date: 2020-09-20
  • Available Online: 2021-09-23
  • Publish Date: 2021-11-05
  • Although deep learning has achieved high accuracy in video content analysis, it does not satisfy the practical demands of porn streamer recognition in live video because deep network models have many parameters and complex structures. To improve the efficiency of porn streamer recognition in live video, a deep network model compression method based on multimodal knowledge distillation is proposed. First, a teacher model is trained with a visual-speech deep network to obtain the corresponding porn-video prediction scores. Second, a lightweight student model built on MobileNetV2 and Xception transfers knowledge from the teacher model using a multimodal knowledge distillation strategy. Finally, porn streamers in live video are recognized by combining the lightweight visual-speech student network with a bullet-screen text recognition network. Experimental results demonstrate that the proposed method effectively reduces computation cost and improves recognition speed while maintaining acceptable accuracy.
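  • The knowledge transfer step described in the abstract follows the standard distillation formulation of Hinton et al. (soft teacher targets at a raised temperature, mixed with the hard-label loss). The sketch below is illustrative only, assuming a classification head and scalar logits; the temperature T, weight alpha, and function names are hypothetical, not taken from the paper.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; higher T yields softer distributions."""
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(teacher_logits, student_logits, y_true, T=4.0, alpha=0.7):
    """Weighted sum of a soft-target KL term (scaled by T^2, as in
    Hinton et al.) and the hard-label cross-entropy of the student."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    kl = np.sum(p_teacher * (np.log(p_teacher + 1e-12)
                             - np.log(p_student + 1e-12)))
    ce = -np.log(softmax(student_logits)[y_true] + 1e-12)
    return alpha * (T ** 2) * kl + (1 - alpha) * ce
```

    When the student matches the teacher, the KL term vanishes and only the hard-label term remains; mismatched logits are penalized more, which is what drives the lightweight student toward the teacher's predictions.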