Detailed Information


Comparison of audio input representations on piano transcription using neural networks

Full metadata record
| DC Field | Value | Language |
| --- | --- | --- |
| dc.contributor.author | 한혜민 | - |
| dc.contributor.author | 정윤서 | - |
| dc.date.accessioned | 2022-03-06T07:40:19Z | - |
| dc.date.available | 2022-03-06T07:40:19Z | - |
| dc.date.created | 2022-02-10 | - |
| dc.date.issued | 2021 | - |
| dc.identifier.issn | 1598-9402 | - |
| dc.identifier.uri | https://scholar.korea.ac.kr/handle/2021.sw.korea/137957 | - |
| dc.description.abstract | We compare the effect of multiple input representations on neural-network-based polyphonic piano music transcription. We explore Onsets and Frames, a state-of-the-art neural network model for piano transcription. We first provide detailed background on piano transcription and input representations for readers who are unfamiliar with this area. To compare their effects, we consider four spectrograms with various hyperparameters: the Mel-spectrogram, the linear spectrogram, the log spectrogram, and the constant-Q transform. We also examine the effects of the number of frequency bins, the short-time Fourier transform (STFT) window size, and the hop length on the four spectrograms. Our results show that a Mel-spectrogram with an STFT window size of 2,048, 512 frequency bins, and a hop length of 256 yields the highest accuracy. We show that the Mel-spectrogram is one of the most satisfactory input representations in general. In our experiments, the Mel-spectrogram dominates the other spectrograms and maintains relatively high transcription accuracy even at low resolutions. | - |
| dc.language | English | - |
| dc.language.iso | en | - |
| dc.publisher | 한국데이터정보과학회 (Korean Data and Information Science Society) | - |
| dc.title | Comparison of audio input representations on piano transcription using neural networks | - |
| dc.title.alternative | Comparison of audio input representations on piano transcription using neural networks | - |
| dc.type | Article | - |
| dc.contributor.affiliatedAuthor | 정윤서 | - |
| dc.identifier.bibliographicCitation | 한국데이터정보과학회지 (Journal of the Korean Data and Information Science Society), v.32, no.2, pp.439 - 453 | - |
| dc.relation.isPartOf | 한국데이터정보과학회지 (Journal of the Korean Data and Information Science Society) | - |
| dc.citation.title | 한국데이터정보과학회지 (Journal of the Korean Data and Information Science Society) | - |
| dc.citation.volume | 32 | - |
| dc.citation.number | 2 | - |
| dc.citation.startPage | 439 | - |
| dc.citation.endPage | 453 | - |
| dc.type.rims | ART | - |
| dc.identifier.kciid | ART002701713 | - |
| dc.description.journalClass | 2 | - |
| dc.description.journalRegisteredClass | kci | - |
| dc.subject.keywordAuthor | Audio input representation | - |
| dc.subject.keywordAuthor | automatic music transcription | - |
| dc.subject.keywordAuthor | neural network | - |
| dc.subject.keywordAuthor | spectrogram | - |
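
To make the settings named in the abstract more concrete, the following is a minimal sketch of what an STFT window size of 2,048, a hop length of 256, and 512 Mel bins imply for time and frequency resolution. It is not the authors' code. The 16 kHz sample rate, the HTK-style Mel formula, the 0 Hz to Nyquist frequency range, the unpadded framing, and all function names are assumptions made here for illustration; the record itself does not state them.

```python
import math

# Settings taken from the abstract.
N_FFT = 2048       # STFT window size (samples)
HOP_LENGTH = 256   # hop length (samples)
N_MELS = 512       # number of Mel frequency bins

# Assumption: the record does not state the audio sample rate.
SAMPLE_RATE = 16000


def hz_to_mel(f_hz):
    """Convert Hz to Mel (HTK-style formula, assumed here)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(n_mels, f_min, f_max):
    """Center frequencies (Hz) of n_mels bins spaced evenly on the Mel scale."""
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / (n_mels + 1)
    return [mel_to_hz(m_min + step * (i + 1)) for i in range(n_mels)]


def frame_count(n_samples, n_fft, hop_length):
    """Number of full STFT frames when the signal is not padded."""
    if n_samples < n_fft:
        return 0
    return 1 + (n_samples - n_fft) // hop_length


centers = mel_center_frequencies(N_MELS, 0.0, SAMPLE_RATE / 2)
frames_per_second = frame_count(SAMPLE_RATE, N_FFT, HOP_LENGTH)

print("linear STFT bin spacing (Hz):", SAMPLE_RATE / N_FFT)
print("hop duration (ms):", 1000.0 * HOP_LENGTH / SAMPLE_RATE)
print("full frames in one second of audio:", frames_per_second)
print("lowest / highest Mel bin center (Hz):", centers[0], centers[-1])
```

Under these assumptions, a larger window narrows the spacing between linear frequency bins, while a smaller hop length increases the number of frames per second. This is the resolution trade-off that the abstract's comparison of window sizes, hop lengths, and bin counts explores.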
Files in This Item
There are no files associated with this item.
Appears in Collections
College of Political Science & Economics > Department of Statistics > 1. Journal Articles


