Text-to-speech with linear spectrogram prediction for quality and speed improvement

윤혜빈; 남호성

doi:10.13064/KSSS.2021.13.3.071

Full metadata record

DC Field	Value	Language
dc.contributor.author	윤혜빈	-
dc.contributor.author	남호성	-
dc.date.accessioned	2022-03-08T08:41:58Z	-
dc.date.available	2022-03-08T08:41:58Z	-
dc.date.created	2022-02-10	-
dc.date.issued	2021	-
dc.identifier.issn	2005-8063	-
dc.identifier.uri	https://scholar.korea.ac.kr/handle/2021.sw.korea/138207	-
dc.description.abstract	Most neural-network-based speech synthesis models use neural vocoders to convert mel-scaled spectrograms into high-quality, human-like speech. However, neural vocoders combined with mel-scaled spectrogram prediction models demand considerable memory and time during training and suffer from slow inference when no GPU is available. Linear spectrogram prediction models avoid this problem because they do not use neural vocoders, but they suffer from low voice quality. As a solution, this paper proposes a Tacotron 2 and Transformer-based linear spectrogram prediction model that produces high-quality speech without a neural vocoder. Experiments suggest that this model can serve as the foundation of a high-quality text-to-speech model with fast inference speed.	-
dc.language	Korean	-
dc.language.iso	ko	-
dc.publisher	한국음성학회	-
dc.title	음질 및 속도 향상을 위한 선형 스펙트로그램 활용 Text-to-speech	-
dc.title.alternative	Text-to-speech with linear spectrogram prediction for quality and speed improvement	-
dc.type	Article	-
dc.contributor.affiliatedAuthor	남호성	-
dc.identifier.doi	10.13064/KSSS.2021.13.3.071	-
dc.identifier.bibliographicCitation	말소리와 음성과학, v.13, no.3, pp.71 - 78	-
dc.relation.isPartOf	말소리와 음성과학	-
dc.citation.title	말소리와 음성과학	-
dc.citation.volume	13	-
dc.citation.number	3	-
dc.citation.startPage	71	-
dc.citation.endPage	78	-
dc.type.rims	ART	-
dc.identifier.kciid	ART002763124	-
dc.description.journalClass	2	-
dc.description.journalRegisteredClass	kci	-
dc.subject.keywordAuthor	artificial intelligence	-
dc.subject.keywordAuthor	machine learning	-
dc.subject.keywordAuthor	speech synthesis	-
dc.subject.keywordAuthor	text-to-speech (TTS)	-

Files in This Item: There are no files associated with this item.

Appears in Collections: College of Liberal Arts > Department of English Language and Literature > 1. Journal Articles

Related Researcher

Nam, Ho sung: College of Liberal Arts (Department of English Language and Literature)
