LIU Wu and MA Huadong, “Hybrid Semantic Concept Temporal Pooling for Large-Scale Video Event Analysis,” Chinese Journal of Electronics, vol. 26, no. 6, pp. 1125-1131, 2017, doi: 10.1049/cje.2017.09.010

Hybrid Semantic Concept Temporal Pooling for Large-Scale Video Event Analysis

doi: 10.1049/cje.2017.09.010
Funds:  This work is supported by the National Natural Science Foundation of China (No.61602049), the National Key Research and Development Plan (No.2016YFC0801005), the Funds for Creative Research Groups of China (No.61421061), the Beijing Training Project for the Leading Talents in S&T (No.ljrc 201502), and the CCF-Tencent Open Research Fund (No.AGR20160113).
  • Received Date: 2017-02-06
  • Rev Recd Date: 2017-07-28
  • Publish Date: 2017-11-10
  • To detect and recount events in videos with limited training examples, we propose a novel two-stage hybrid concept temporal pooling approach that is aware of potential concept drift in the video stream. We first partition videos into temporal pyramids consisting of keyframes. Semantic concepts are detected in the keyframes, which enables us to derive aggregated detection scores for each temporal pyramid using average-pooling and ultimately for the entire video via max-pooling. Owing to this refined hybrid pooling, our method yields more discriminative semantic representations with respect to the event query. We also develop an effective filtering strategy to cope with noisy concept detectors, which makes the textual description generation in recounting more robust. Experiments on the large-scale TRECVID MEDTest dataset demonstrate that our method improves accuracy over state-of-the-art methods for both event detection and recounting.
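The two-stage pooling described in the abstract (average-pooling within each temporal segment, then max-pooling across segments) can be illustrated with a minimal sketch. The function names `temporal_segments` and `hybrid_pool`, the power-of-two segment layout, and the `levels` parameter are illustrative assumptions, not details taken from the paper; the input is assumed to be a list of per-keyframe concept detection scores.

```python
from typing import List


def temporal_segments(num_frames: int, levels: int) -> List[range]:
    """Split keyframe indices into 1, 2, 4, ... contiguous segments per level.

    The power-of-two layout is an assumption made for this sketch.
    """
    segments = []
    for level in range(levels):
        parts = 2 ** level
        for p in range(parts):
            start = p * num_frames // parts
            end = (p + 1) * num_frames // parts
            if end > start:  # skip empty segments for very short videos
                segments.append(range(start, end))
    return segments


def hybrid_pool(scores: List[List[float]], levels: int = 2) -> List[float]:
    """Average-pool concept scores inside each segment, then max-pool across segments.

    scores[t][c] is the detection score of concept c in keyframe t.
    Returns one video-level score per concept.
    """
    if not scores:
        raise ValueError("need at least one keyframe")
    num_concepts = len(scores[0])
    segment_means = []
    for seg in temporal_segments(len(scores), levels):
        # Stage 1: average-pooling over the keyframes of one segment.
        segment_means.append(
            [sum(scores[t][c] for t in seg) / len(seg) for c in range(num_concepts)]
        )
    # Stage 2: max-pooling over all segments for the whole video.
    return [max(mean[c] for mean in segment_means) for c in range(num_concepts)]


# Four keyframes, two concepts: concept 0 fires early, concept 1 fires late.
example_scores = [[1.0, 0.0], [0.5, 0.0], [0.0, 0.5], [0.0, 0.5]]
video_scores = hybrid_pool(example_scores, levels=2)
```

In this toy input, averaging over the whole video would dilute both concepts, while the segment-level averages keep the half of the video in which each concept is active; the final max-pooling then selects that segment's score.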
