LI Ying, HUANG Hongkeng, WU Zhibin. Animal Sound Recognition Based on Double Feature of Spectrogram[J]. Chinese Journal of Electronics, 2019, 28(4): 667-673. doi: 10.1049/cje.2019.04.005

Animal Sound Recognition Based on Double Feature of Spectrogram

doi: 10.1049/cje.2019.04.005
Funds:  This work is supported by the Natural Science Foundation of Fujian Province (No.2018J01793) and the National Natural Science Foundation of China (No.61075022).
  • Received Date: 2016-04-21
  • Rev Recd Date: 2019-04-02
  • Publish Date: 2019-07-10
  • Because of varied environments and noise, existing methods struggle to maintain recognition accuracy for animal sounds under low Signal-to-noise ratio (SNR) conditions. To address this problem, we propose a double feature, consisting of a projection feature and a Local binary pattern variance (LBPV) feature, combined with Random forest (RF) for animal sound recognition. In feature extraction, the spectrogram is projected to generate the projection feature. Meanwhile, the LBPV feature is generated by accumulating, for every Uniform local binary pattern (ULBP) in the spectrogram, the variances of all pixels that exhibit that pattern. A short-time spectral estimation algorithm is used to enhance sound signals in severely mismatched noise conditions. In the experiments, we classify 40 kinds of common animal sounds under different SNRs with rain noise, traffic noise, and wind noise. The experimental results show that the proposed framework, consisting of short-time spectral estimation, the double feature, and RF, can recognize a wide range of animal sounds and maintains a recognition rate above 80% even at 0 dB SNR.
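
The LBPV accumulation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes 8 neighbours at radius 1 taken from the square 3x3 neighbourhood (no circular interpolation), the population variance of those 8 neighbours as the local variance, and 59 bins (58 uniform patterns plus one shared bin for all non-uniform patterns). The function names `uniform_bin_table` and `lbpv_histogram` are hypothetical.

```python
import numpy as np

# Offsets (row, col) of the 8 neighbours at radius 1, in circular order.
# Assumption: square 3x3 neighbourhood, no bilinear interpolation.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]


def uniform_bin_table(p=8):
    """Map each p-bit LBP code to a histogram bin.

    Codes with at most two circular 0/1 transitions (uniform patterns)
    each get their own bin; all other codes share the last bin.
    """
    table = np.zeros(2 ** p, dtype=np.int64)
    next_bin = 0
    nonuniform = []
    for code in range(2 ** p):
        bits = [(code >> i) & 1 for i in range(p)]
        transitions = sum(bits[i] != bits[(i + 1) % p] for i in range(p))
        if transitions <= 2:
            table[code] = next_bin
            next_bin += 1
        else:
            nonuniform.append(code)
    table[nonuniform] = next_bin
    return table, next_bin + 1


def lbpv_histogram(spec):
    """Variance-weighted histogram of uniform LBP codes of a 2-D array."""
    spec = np.asarray(spec, dtype=np.float64)
    h, w = spec.shape
    center = spec[1:-1, 1:-1]
    # Stack the 8 shifted copies of the image: shape (8, h-2, w-2).
    neigh = np.stack([spec[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
                      for dy, dx in OFFSETS])
    # Build the 8-bit LBP code: bit i is set when neighbour i >= centre.
    codes = np.zeros(center.shape, dtype=np.int64)
    for i in range(8):
        codes |= (neigh[i] >= center).astype(np.int64) << i
    # Local variance of the 8 neighbours around each interior pixel.
    var = neigh.var(axis=0)
    table, n_bins = uniform_bin_table(8)
    # Each pixel adds its local variance to the bin of its pattern.
    return np.bincount(table[codes].ravel(), weights=var.ravel(),
                       minlength=n_bins)
```

Calling `lbpv_histogram` on a spectrogram magnitude array returns a 59-element vector. In a plain LBP histogram each pixel would add 1 to its bin; here it adds its local variance, so high-contrast regions of the spectrogram weigh more than flat background.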