Comparison of audio input representations on piano transcription using neural networks
DC Field | Value | Language |
---|---|---|
dc.contributor.author | 한혜민 | - |
dc.contributor.author | 정윤서 | - |
dc.date.accessioned | 2022-03-06T07:40:19Z | - |
dc.date.available | 2022-03-06T07:40:19Z | - |
dc.date.created | 2022-02-10 | - |
dc.date.issued | 2021 | - |
dc.identifier.issn | 1598-9402 | - |
dc.identifier.uri | https://scholar.korea.ac.kr/handle/2021.sw.korea/137957 | - |
dc.description.abstract | We compare the effects of multiple input representations on neural-network-based polyphonic piano music transcription. A state-of-the-art piano transcription model, Onsets and Frames, is used throughout. We first provide detailed background on piano transcription and input representations for readers unfamiliar with this area. To compare their effects, we consider four spectrograms: the Mel-spectrogram, the linear spectrogram, the log spectrogram, and the constant-Q transform, each with various hyperparameters. The effects of the number of frequency bins, the short-time Fourier transform (STFT) window size, and the hop length on the four spectrograms are also examined. Our results show that a Mel-spectrogram with a 2,048-sample STFT window, 512 frequency bins, and a hop length of 256 yields the highest accuracy. We conclude that the Mel-spectrogram is among the most satisfactory input representations in general: it dominates the other spectrograms and retains relatively high transcription accuracy even at low resolutions in our experiments. | - |
dc.language | English | - |
dc.language.iso | en | - |
dc.publisher | 한국데이터정보과학회 (Korean Data and Information Science Society) | - |
dc.title | Comparison of audio input representations on piano transcription using neural networks | - |
dc.title.alternative | Comparison of audio input representations on piano transcription using neural networks | - |
dc.type | Article | - |
dc.contributor.affiliatedAuthor | 정윤서 | - |
dc.identifier.bibliographicCitation | 한국데이터정보과학회지 (Journal of the Korean Data and Information Science Society), v.32, no.2, pp.439-453 | - |
dc.relation.isPartOf | 한국데이터정보과학회지 | - |
dc.citation.title | 한국데이터정보과학회지 | - |
dc.citation.volume | 32 | - |
dc.citation.number | 2 | - |
dc.citation.startPage | 439 | - |
dc.citation.endPage | 453 | - |
dc.type.rims | ART | - |
dc.identifier.kciid | ART002701713 | - |
dc.description.journalClass | 2 | - |
dc.description.journalRegisteredClass | kci | - |
dc.subject.keywordAuthor | Audio input representation | - |
dc.subject.keywordAuthor | automatic music transcription | - |
dc.subject.keywordAuthor | neural network | - |
dc.subject.keywordAuthor | spectrogram | - |
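The abstract's best-performing configuration (a Mel-spectrogram with a 2,048-sample STFT window, hop length 256, and 512 frequency bins) can be sketched with NumPy and SciPy. This is a minimal illustration, not the paper's implementation: the 16 kHz sample rate, the 440 Hz test tone, and the hand-rolled triangular mel filterbank are assumptions for the sketch.

```python
import numpy as np
from scipy.signal import stft

def hz_to_mel(f):
    """Convert frequency in Hz to the mel scale (O'Shaughnessy formula)."""
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters with centers evenly spaced on the mel scale."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bin_pts = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bin_pts[i - 1], bin_pts[i], bin_pts[i + 1]
        for k in range(left, center):          # rising slope of triangle i
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):         # falling slope of triangle i
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

# Assumed test signal: one second of A4 (440 Hz) at a 16 kHz sample rate.
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440.0 * t)

# Best-performing settings reported in the abstract:
# 2,048-sample STFT window, hop length 256, 512 frequency (mel) bins.
n_fft, hop, n_mels = 2048, 256, 512
_, _, Z = stft(x, fs=sr, nperseg=n_fft, noverlap=n_fft - hop)
linear_spec = np.abs(Z)                        # linear spectrogram
mel_spec = mel_filterbank(sr, n_fft, n_mels) @ linear_spec
log_mel_spec = np.log1p(mel_spec)              # log compression

print(linear_spec.shape, log_mel_spec.shape)
```

The linear spectrogram keeps all `n_fft // 2 + 1` STFT bins per frame, while the mel version compresses them to the chosen number of perceptually spaced bands; the log spectrogram and constant-Q transform compared in the paper are analogous alternative warpings of the same STFT output.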