WAN Yulong, WANG Xianliang, ZHOU Ruohua, et al., “Automatic Piano Music Transcription Using Audio-Visual Features,” Chinese Journal of Electronics, vol. 24, no. 3, pp. 596-603, 2015, doi: 10.1049/cje.2015.07.027

Automatic Piano Music Transcription Using Audio-Visual Features

doi: 10.1049/cje.2015.07.027
Funds:  This work is partially supported by the National Natural Science Foundation of China (No.10925419, No.90920302, No.61072124, No.11074275, No.11161140319, No.91120001, No.61271426), the Strategic Priority Research Program of the Chinese Academy of Sciences (No.XDA06030100, No.XDA06030500), the National 863 Program (No.2012AA012503), and the CAS Priority Deployment Project (No.KGZD-EW-103-2).
  • Received Date: 2013-11-14
  • Revision Received Date: 2014-02-21
  • Publish Date: 2015-07-10
  • The performance of automatic music transcription appears to have plateaued over the last decade, and a promising direction for improvement is to incorporate instrument-specific parameters. We propose a novel piano-specific transcription system that, for the first time, uses both audio and visual features. The paper makes two main contributions. First, a new onset detection method is proposed that applies a specific spectrum envelope matched filter to multiple frequency bands. Second, a computer-vision method is proposed to enhance audio-only piano music transcription by tracking the positions of the pianist's hands on the piano keyboard. Using the MIDI Aligned Piano Sounds (MAPS) database and a self-recorded video database, we carried out comparative experiments for audio-only onset detection and for the overall system, respectively. Performance was compared with the best piano transcription system in the Music Information Retrieval Evaluation eXchange (MIREX), and the results showed that the proposed system substantially outperforms the state-of-the-art method.