Automatic Piano Music Transcription Using Audio-Visual Features
Abstract
The performance of automatic music transcription appears to have plateaued over the last decade, and a promising direction for improvement is to incorporate instrument-specific parameters. We propose a novel piano-specific transcription system that is the first to use both audio and visual features. The paper makes two main contributions: (1) a new onset detection method that applies a specific spectrum envelope matched filter on multiple frequency bands, and (2) a computer-vision method that enhances audio-only piano music transcription by tracking the positions of the pianist's hands on the piano keyboard. Using the MIDI Aligned Piano Sounds (MAPS) database and a self-recorded video database, we carried out comparative experiments for audio-only onset detection and for the overall system, respectively. We compared the performance with that of the best piano transcription system in the Music Information Retrieval Evaluation eXchange (MIREX), and the results show that the proposed system substantially outperforms this state-of-the-art method.