Text-to-speech with linear spectrogram prediction for quality and speed improvement
DC Field | Value | Language |
---|---|---|
dc.contributor.author | 윤혜빈 | - |
dc.contributor.author | 남호성 | - |
dc.date.accessioned | 2022-03-08T08:41:58Z | - |
dc.date.available | 2022-03-08T08:41:58Z | - |
dc.date.created | 2022-02-10 | - |
dc.date.issued | 2021 | - |
dc.identifier.issn | 2005-8063 | - |
dc.identifier.uri | https://scholar.korea.ac.kr/handle/2021.sw.korea/138207 | - |
dc.description.abstract | Most neural-network-based speech synthesis models use neural vocoders to convert mel-scaled spectrograms into high-quality, human-like speech. However, neural vocoders combined with mel-scaled spectrogram prediction models demand considerable memory and time during training, and their inference is slow in environments without a GPU. Linear spectrogram prediction models avoid this problem because they do not use neural vocoders, but they suffer from low voice quality. As a solution, this paper proposes a linear spectrogram prediction model based on Tacotron 2 and Transformer that produces high-quality speech without a neural vocoder. Experiments suggest that this model can serve as the foundation of a high-quality text-to-speech model with fast inference speed. | - |
dc.language | Korean | - |
dc.language.iso | ko | - |
dc.publisher | 한국음성학회 | - |
dc.title | 음질 및 속도 향상을 위한 선형 스펙트로그램 활용 Text-to-speech | - |
dc.title.alternative | Text-to-speech with linear spectrogram prediction for quality and speed improvement | - |
dc.type | Article | - |
dc.contributor.affiliatedAuthor | 남호성 | - |
dc.identifier.doi | 10.13064/KSSS.2021.13.3.071 | - |
dc.identifier.bibliographicCitation | 말소리와 음성과학, v.13, no.3, pp.71 - 78 | - |
dc.relation.isPartOf | 말소리와 음성과학 | - |
dc.citation.title | 말소리와 음성과학 | - |
dc.citation.volume | 13 | - |
dc.citation.number | 3 | - |
dc.citation.startPage | 71 | - |
dc.citation.endPage | 78 | - |
dc.type.rims | ART | - |
dc.identifier.kciid | ART002763124 | - |
dc.description.journalClass | 2 | - |
dc.description.journalRegisteredClass | kci | - |
dc.subject.keywordAuthor | artificial intelligence | - |
dc.subject.keywordAuthor | machine learning | - |
dc.subject.keywordAuthor | speech synthesis | - |
dc.subject.keywordAuthor | text-to-speech (TTS) | - |