Animal Sound Recognition Based on Double Feature of Spectrogram

LI Ying; HUANG Hongkeng; WU Zhibin

doi:10.1049/cje.2019.04.005

Volume 28 Issue 4

LI Ying, HUANG Hongkeng, WU Zhibin, “Animal Sound Recognition Based on Double Feature of Spectrogram,” Chinese Journal of Electronics, vol. 28, no. 4, pp. 667-673, 2019, doi: 10.1049/cje.2019.04.005

1. College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350116, China
2. Key Laboratory of Information Security of Network Systems (Fuzhou University), Fuzhou 350116, China

Funds: This work is supported by the Natural Science Foundation of Fujian Province (No.2018J01793) and the National Natural Science Foundation of China (No.61075022).

Received Date: 2016-04-21
Revision Received Date: 2019-04-02
Publish Date: 2019-07-10

Abstract

Because recording environments and noise types vary, existing methods struggle to maintain animal sound recognition accuracy under low signal-to-noise ratio (SNR) conditions. To address this problem, we propose a double feature, consisting of a projection feature and a local binary pattern variance (LBPV) feature, combined with a random forest (RF) classifier for animal sound recognition. In feature extraction, the spectrogram is projected to generate the projection feature. The LBPV feature is generated by accumulating, for every uniform local binary pattern (ULBP) in the spectrogram, the variances of all pixels that share that pattern. A short-time spectral estimation algorithm is used to enhance sound signals under severely mismatched noise conditions. In the experiments, we classify 40 kinds of common animal sounds under different SNRs with rain, traffic, and wind noise. The experimental results show that the proposed framework, consisting of short-time spectral estimation, the double feature, and RF, can recognize a wide range of animal sounds and maintains a recognition rate above 80% even at 0 dB SNR.
Key words:
- Animal sound recognition
- Local binary pattern variance (LBPV)
- Projection feature
- Random forest (RF)
- Sound enhancement
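
The two features named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: the projection is taken as normalized row and column sums of a non-negative, fixed-size spectrogram; LBPV uses the 8 immediate neighbours of each pixel (a square 3x3 neighbourhood rather than an interpolated circle); and all non-uniform patterns share one bin, giving 59 bins. The names `projection_feature`, `lbpv_histogram`, and `double_feature` are hypothetical.

```python
import numpy as np

# Offsets of the 8 neighbours, walking around the centre pixel in circular order.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def _uniform_lookup():
    """Map each 8-bit LBP code to a histogram bin.

    Codes with at most two 0/1 transitions around the circle are "uniform"
    and get their own bins (0..57); every other code falls into bin 58.
    """
    lookup = np.full(256, 58, dtype=np.int64)
    next_bin = 0
    for code in range(256):
        bits = [(code >> i) & 1 for i in range(8)]
        transitions = sum(bits[i] != bits[(i + 1) % 8] for i in range(8))
        if transitions <= 2:
            lookup[code] = next_bin
            next_bin += 1
    return lookup


UNIFORM_LOOKUP = _uniform_lookup()


def projection_feature(spec):
    """Assumed projection: spectrogram summed along time and along frequency."""
    spec = np.asarray(spec, dtype=np.float64)
    total = spec.sum()
    if total == 0:
        total = 1.0  # avoid division by zero for an all-zero spectrogram
    freq_profile = spec.sum(axis=1) / total  # one value per frequency row
    time_profile = spec.sum(axis=0) / total  # one value per time column
    return np.concatenate([freq_profile, time_profile])


def lbpv_histogram(spec):
    """Accumulate the local variance of each pixel into the bin of its LBP pattern."""
    spec = np.asarray(spec, dtype=np.float64)
    h, w = spec.shape
    center = spec[1:-1, 1:-1]
    # Stack the 8 shifted views so that neighbours[k] holds the k-th neighbour
    # of every interior pixel.
    neighbours = np.stack(
        [spec[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx] for dy, dx in OFFSETS]
    )
    bits = (neighbours >= center).astype(np.int64)
    weights = (2 ** np.arange(8)).reshape(8, 1, 1)
    codes = (bits * weights).sum(axis=0)
    # Local variance over the 8 neighbours (population variance, ddof=0).
    variance = neighbours.var(axis=0)
    bins = UNIFORM_LOOKUP[codes]
    return np.bincount(bins.ravel(), weights=variance.ravel(), minlength=59)


def double_feature(spec):
    """Concatenate the projection feature and the LBPV histogram."""
    return np.concatenate([projection_feature(spec), lbpv_histogram(spec)])
```

For a spectrogram with F frequency rows and T time columns, `double_feature` returns a vector of length F + T + 59. A vector of this kind could then be passed to any random forest implementation, for example scikit-learn's `RandomForestClassifier`, though the classifier configuration used in the paper is not stated in the abstract.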
