ZHANG Pengyuan, CHEN Hangting, BAI Haichuan, et al., “Deep Scattering Spectra with Deep Neural Networks for Acoustic Scene Classification Tasks,” Chinese Journal of Electronics, vol. 28, no. 6, pp. 1177-1183, 2019, doi: 10.1049/cje.2019.07.006

Deep Scattering Spectra with Deep Neural Networks for Acoustic Scene Classification Tasks

doi: 10.1049/cje.2019.07.006
Funds: This work is supported by the National Natural Science Foundation of China (No.11590774, No.11590770), the Key Science and Technology Project of the Xinjiang Uygur Autonomous Region (No.2016A03007-1), and the Pre-research Project for Equipment of General Information System (No.JZX2017-0994/Y306).
  • Received Date: 2018-01-15
  • Rev Recd Date: 2018-08-02
  • Publish Date: 2019-11-10
  • Although Mel-frequency cepstral coefficients (MFCCs) are among the most commonly used features, they are less discriminative at high frequencies. A newer technique, the deep scattering spectrum (DSS), addresses this issue by preserving more of the signal's detail. DSS features have shown promise on both classification and recognition tasks. In this paper, we extend the use of DSS features to the acoustic scene classification task. Results on the Detection and Classification of Acoustic Scenes and Events (DCASE) 2016 and 2017 challenge datasets show that, within a state-of-the-art time delay neural network (TDNN) framework, DSS features provide relative accuracy improvements of 4.8% and 17.4%, respectively, over MFCC features.
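The DSS feature described in the abstract can be illustrated with a minimal NumPy sketch of first- and second-order scattering coefficients in the spirit of Andén and Mallat. The signal is filtered by a constant-Q bank of band-pass filters, the modulus is taken, and the resulting sub-band envelopes are low-pass averaged over roughly T samples; these first-order coefficients play a role similar to mel filterbank energies. Applying the same filter, modulus, and average cascade to each envelope yields second-order coefficients that retain part of the temporal detail the first averaging removes. The Gaussian filter shapes, the function names (`gabor_filterbank`, `lowpass`, `scattering`), and all parameter values are illustrative assumptions rather than this paper's configuration, and the sketch omits refinements used in practice, such as pruning second-order paths and normalizing or log-compressing the coefficients.

```python
import numpy as np

# NOTE: filter shapes and default parameters below are illustrative assumptions,
# not the settings used in the paper.


def gabor_filterbank(n, num_filters, q, xi_max=0.35):
    """Constant-Q bank of analytic Gaussian (Gabor-like) band-pass filters.

    Returns an array of shape (num_filters, n) with each filter's frequency
    response on the np.fft.fftfreq(n) grid (cycles/sample). Center frequencies
    decrease geometrically from xi_max, with q filters per octave.
    """
    omega = np.fft.fftfreq(n)
    j = np.arange(num_filters)[:, None]
    xi = xi_max * 2.0 ** (-j / q)                  # center frequencies
    sigma = xi * (1.0 - 2.0 ** (-1.0 / q)) / 2.0   # bandwidth proportional to center (constant Q)
    return np.exp(-0.5 * ((omega[None, :] - xi) / sigma) ** 2)


def lowpass(n, T):
    """Gaussian low-pass filter phi with frequency std 1/(2T), i.e. averaging over ~T samples."""
    omega = np.fft.fftfreq(n)
    return np.exp(-0.5 * (omega * 2.0 * T) ** 2)


def scattering(x, T=256, q1=8, j1=40, q2=1, j2=8):
    """First- and second-order scattering coefficients of a 1-D signal x.

    S1[l1, t]     = (|x * psi_l1| * phi)(t*T)
    S2[l1, l2, t] = (||x * psi_l1| * psi_l2| * phi)(t*T)
    All convolutions are circular (computed with the FFT).
    """
    n = len(x)
    psi1 = gabor_filterbank(n, j1, q1)
    psi2 = gabor_filterbank(n, j2, q2)
    phi = lowpass(n, T)

    # First order: wavelet modulus (sub-band envelopes), then time averaging
    # and subsampling by T.
    U1 = np.abs(np.fft.ifft(np.fft.fft(x)[None, :] * psi1, axis=-1))
    U1_hat = np.fft.fft(U1, axis=-1)
    S1 = np.real(np.fft.ifft(U1_hat * phi, axis=-1))[:, ::T]

    # Second order: decompose each envelope again to recover the amplitude
    # modulations that the averaging in S1 smooths away.
    U2 = np.abs(np.fft.ifft(U1_hat[:, None, :] * psi2[None, :, :], axis=-1))
    S2 = np.real(np.fft.ifft(np.fft.fft(U2, axis=-1) * phi, axis=-1))[:, :, ::T]
    return S1, S2
```

For a 16 kHz recording, `T=256` corresponds to a 16 ms frame hop; in a pipeline like the one described in the abstract, the per-frame columns of `S1` and the flattened `S2` would be stacked (typically after log compression) as the input features for an acoustic model such as a TDNN.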