Citation: LI Yanshan, GUO Tianyu, LIU Xing, et al., “Action Status Based Novel Relative Feature Representations for Interaction Recognition,” Chinese Journal of Electronics, vol.31, no.1, pp.168–180, 2022. doi: 10.1049/cje.2020.00.088
[1] S. S. Rautaray and A. Agrawal, “Vision based hand gesture recognition for human computer interaction: A survey,” Artificial Intelligence Review, vol.43, no.1, pp.1–54, 2015. doi: 10.1007/s10462-012-9356-9

[2] Z. Xu, C. Hu, and L. Mei, “Video structured description technology based intelligence analysis of surveillance videos for public security applications,” Multimedia Tools and Applications, vol.75, no.19, pp.12155–12172, 2016. doi: 10.1007/s11042-015-3112-5

[3] S. Yan, Y. Xiong, and D. Lin, “Social interaction discovery by statistical analysis of f-formations,” in Proc. of British Machine Vision Conference, Dundee, Scotland, pp.1–12, 2011.
[4] Y. Tian, L. Cao, Z. Liu, et al., “Hierarchical filtered motion for action recognition in crowded videos,” IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol.42, no.3, pp.313–323, 2012. doi: 10.1109/TSMCC.2011.2149519
[5] S. Yi, X. Wang, C. Lu, et al., “L0 regularized stationary time estimation for crowd group analysis,” in Proc. of IEEE Conference on Computer Vision and Pattern Recognition, Columbus, Ohio, pp.2211–2218, 2014.
[6] O. Aran and D. Gatica-Perez, “One of a kind: Inferring personality impressions in meetings,” in Proc. of ACM International Conference on Multimodal Interaction, Sydney, pp.11–18, 2013.
[7] G. Liu, J. Yang, and Z. Li, “Content-based image retrieval using computational visual attention model,” Pattern Recognition, vol.48, no.8, pp.2554–2566, 2015. doi: 10.1016/j.patcog.2015.02.005

[8] S. Sempena, N. U. Maulidevi, and P. R. Aryan, “Human action recognition using dynamic time warping,” in Proc. of International Conference on Electrical Engineering and Informatics, Bandung, pp.1–5, 2011.

[9] A. Manzi, L. Fiorini, R. Limosani, et al., “Two-person activity recognition using skeleton data,” IET Computer Vision, vol.12, no.1, pp.27–35, 2018. doi: 10.1049/iet-cvi.2017.0118
[10] Q. Huang, F. Pan, W. Li, et al., “Differential diagnosis of atypical hepatocellular carcinoma in contrast-enhanced ultrasound using spatio-temporal diagnostic semantics,” IEEE Journal of Biomedical and Health Informatics, vol.24, no.10, pp.2860–2869, 2020.
[11] A. Patron-Perez, M. Marszalek, A. Zisserman, et al., “High five: Recognising human interactions in TV shows,” in Proc. of British Machine Vision Conference, Aberystwyth, Wales, pp.1–11, 2010.
[12] C. van Gemeren, R. Poppe, and R. C. Veltkamp, “Hands-on: Deformable pose and motion models for spatiotemporal localization of fine-grained dyadic interactions,” EURASIP Journal on Image and Video Processing, vol.2018, no.1, article no.16, 2018.
[13] Y. Kong, Y. Jia, and Y. Fu, “Interactive phrases: Semantic descriptions for human interaction recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.36, no.9, pp.1775–1788, 2014. doi: 10.1109/TPAMI.2014.2303090

[14] J. Aggarwal and L. Xia, “Human activity recognition from 3D data: A review,” Pattern Recognition Letters, vol.48, pp.70–80, 2014. doi: 10.1016/j.patrec.2014.04.011
[15] F. Han, B. Reily, W. Hoff, et al., “Space-time representation of people based on 3D skeletal data: A review,” Computer Vision and Image Understanding, vol.158, pp.85–105, 2017.
[16] Z. Zhang, “Microsoft kinect sensor and its effect,” IEEE Multimedia, vol.19, no.2, pp.4–10, 2012. doi: 10.1109/MMUL.2012.24

[17] G. Papandreou, T. Zhu, N. Kanazawa, et al., “Towards accurate multi-person pose estimation in the wild,” in Proc. of IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Hawaii, pp.3711–3719, 2017.

[18] A. Kanazawa, M. J. Black, D. W. Jacobs, et al., “End-to-end recovery of human shape and pose,” in Proc. of IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, Utah, pp.7122–7131, 2018.

[19] H. Rhodin, M. Salzmann, and P. Fua, “Unsupervised geometry-aware representation for 3D human pose estimation,” in Proc. of the European Conference on Computer Vision, Munich, pp.750–767, 2018.

[20] G. Varol, D. Ceylan, B. Russell, et al., “Bodynet: Volumetric inference of 3D human body shapes,” in Proc. of the European Conference on Computer Vision, Munich, pp.20–36, 2018.

[21] A. Stergiou and R. Poppe, “Analyzing human-human interactions: A survey,” Computer Vision and Image Understanding, vol.188, article no.102799, 2019.
[22] K. Yun, J. Honorio, D. Chattopadhyay, et al., “Two-person interaction detection using body-pose features and multiple instance learning,” in Proc. of IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Providence, RI, pp.28–35, 2012.
[23] Y. Ji, G. Ye, and H. Cheng, “Interactive body part contrast mining for human interaction recognition,” in Proc. of International Conference on Multimedia and Expo, Chengdu, pp.1–6, 2014.

[24] Y. Ji, H. Cheng, Y. Zheng, et al., “Learning contrastive feature distribution model for interaction recognition,” Journal of Visual Communication and Image Representation, vol.33, pp.340–349, 2015. doi: 10.1016/j.jvcir.2015.10.001

[25] N. Xu, A. Liu, W. Nie, et al., “Multi-modal and multi-view and interactive benchmark dataset for human action recognition,” in Proc. of ACM Multimedia, Sydney, pp.1195–1198, 2015.

[26] M. Li and H. Leung, “Multiview skeletal interaction recognition using active joint interaction graph,” IEEE Transactions on Multimedia, vol.18, no.11, pp.2293–2302, 2016. doi: 10.1109/TMM.2016.2614228

[27] H. Wu, J. Shao, X. Xu, et al., “Recognition and detection of two-person interactive actions using automatically selected skeleton features,” IEEE Transactions on Human-Machine Systems, vol.48, no.3, pp.304–310, 2018. doi: 10.1109/THMS.2017.2776211

[28] A. Shahroudy, J. Liu, T. Ng, et al., “NTU RGB+D: A large scale dataset for 3D human activity analysis,” in Proc. of IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Nevada, pp.1010–1019, 2016.

[29] J. Liu, A. Shahroudy, M. Perez, et al., “NTU RGB+D 120: A large-scale benchmark for 3D human activity understanding,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.42, no.10, pp.2684–2701, 2020.

[30] Y. Du, W. Wang, and L. Wang, “Hierarchical recurrent neural network for skeleton based action recognition,” in Proc. of IEEE Conference on Computer Vision and Pattern Recognition, Boston, Massachusetts, pp.1110–1118, 2015.
[31] P. Zhang, C. Lan, J. Xing, et al., “View adaptive recurrent neural networks for high performance human action recognition from skeleton data,” in Proc. of IEEE International Conference on Computer Vision, Venice, pp.2117–2126, 2017.
[32] W. Zhu, C. Lan, J. Xing, et al., “Co-occurrence feature learning for skeleton based action recognition using regularized deep LSTM networks,” in Proc. of AAAI Conference on Artificial Intelligence, Phoenix, Arizona, pp.3697–3703, 2016.

[33] S. Song, C. Lan, J. Xing, et al., “An end-to-end spatio-temporal attention model for human action recognition from skeleton data,” in Proc. of AAAI Conference on Artificial Intelligence, San Francisco, California, pp.4263–4270, 2017.

[34] Y. Du, Y. Fu, and L. Wang, “Skeleton based action recognition with convolutional neural network,” in Proc. of Asian Conference on Pattern Recognition, Kuala Lumpur, pp.579–583, 2015.

[35] M. Liu, H. Liu, and C. Chen, “Enhanced skeleton visualization for view invariant human action recognition,” Pattern Recognition, vol.68, pp.346–362, 2017. doi: 10.1016/j.patcog.2017.02.030

[36] R. Trabelsi, J. Varadarajan, L. Zhang, et al., “Understanding the dynamics of social interactions: A multi-modal multi-view approach,” ACM Transactions on Multimedia Computing Communications and Applications, vol.15, no.1, pp.1–16, 2019.

[37] J. Serra, Image Analysis and Mathematical Morphology, London: Academic Press, pp.184–185, 1982.

[38] K. Simonyan and A. Zisserman, “Two-stream convolutional networks for action recognition in videos,” in Proc. of Neural Information Processing Systems, Montreal, pp.568–576, 2014.

[39] K. He, X. Zhang, S. Ren, et al., “Deep residual learning for image recognition,” in Proc. of IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Nevada, pp.770–778, 2016.
[40] M. Li and H. Leung, “Multi-view depth-based pairwise feature learning for person-person interaction recognition,” Multimedia Tools and Applications, vol.78, no.5, pp.5731–5749, 2019.
[41] R. Vemulapalli, F. Arrate, and R. Chellappa, “Human action recognition by representing 3D skeletons as points in a lie group,” in Proc. of IEEE Conference on Computer Vision and Pattern Recognition, Columbus, Ohio, pp.588–595, 2014.

[42] J. Hu, W. Zheng, J. Lai, et al., “Jointly learning heterogeneous features for RGB-D activity recognition,” in Proc. of IEEE Conference on Computer Vision and Pattern Recognition, Boston, Massachusetts, pp.5344–5352, 2015.