Detailed Information

Text-to-speech with linear spectrogram prediction for quality and speed improvement (original Korean title: 음질 및 속도 향상을 위한 선형 스펙트로그램 활용 Text-to-speech)

Authors
Yoon, Hyebin (윤혜빈); Nam, Ho sung (남호성)
Issue Date
2021
Publisher
The Korean Society of Speech Sciences (한국음성학회)
Keywords
speech synthesis; machine learning; artificial intelligence; text-to-speech (TTS)
Citation
Phonetics and Speech Sciences (말소리와 음성과학), v.13, no.3, pp. 71-78
Pages
8
Indexed
KCI
Journal Title
Phonetics and Speech Sciences (말소리와 음성과학)
Volume
13
Number
3
Start Page
71
End Page
78
URI
https://scholar.korea.ac.kr/handle/2021.sw.korea/138207
DOI
10.13064/KSSS.2021.13.3.071
ISSN
2005-8063
Abstract
Most neural speech synthesis models rely on neural vocoders to convert mel-scaled spectrograms into high-quality, human-like speech. However, pairing a neural vocoder with a mel-scaled spectrogram prediction model demands considerable memory and time during training, and inference is slow in environments without a GPU. Linear spectrogram prediction models avoid this problem because they do not use neural vocoders, but they suffer from low voice quality. As a solution, this paper proposes a linear spectrogram prediction model based on Tacotron 2 and Transformer that produces high-quality speech without a neural vocoder. Experiments suggest that this model can serve as the foundation of a high-quality text-to-speech model with fast inference speed.
Files in This Item
There are no files associated with this item.
Appears in Collections
College of Liberal Arts > Department of English Language and Literature > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Nam, Ho sung
College of Liberal Arts (Department of English Language and Literature)