Duration Controllable Voice Conversion via Phoneme-Based Information Bottleneck
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Lee, S. | - |
dc.contributor.author | Noh, H. | - |
dc.contributor.author | Nam, W. | - |
dc.contributor.author | Lee, S. | - |
dc.date.accessioned | 2022-04-12T22:42:07Z | - |
dc.date.available | 2022-04-12T22:42:07Z | - |
dc.date.created | 2022-04-12 | - |
dc.date.issued | 2022 | - |
dc.identifier.issn | 2329-9290 | - |
dc.identifier.uri | https://scholar.korea.ac.kr/handle/2021.sw.korea/140172 | - |
dc.description.abstract | Several voice conversion (VC) methods based on a simple autoencoder with a carefully designed information bottleneck have recently been studied. In general, they extract content information from a given speech signal through the information bottleneck between the encoder and the decoder, and provide it to the decoder together with the target speaker information to generate the converted speech. However, their performance depends heavily on the downsampling factor of the information bottleneck. In addition, such frame-by-frame conversion methods cannot convert speaking styles associated with the length of the utterance, such as duration. In this paper, we propose a novel duration controllable voice conversion (DCVC) model that can transfer speaking style and control the speed of the converted speech through a phoneme-based information bottleneck. The proposed information bottleneck does not require finding an appropriate downsampling factor, achieving better audio quality and VC performance. In our experiments, DCVC outperformed the baseline models with a 3.78 MOS and a 3.83 similarity score. It can also smoothly control the speech duration while achieving a 39.35x inference speedup over a Seq2seq-based VC model. | - |
dc.language | English | - |
dc.language.iso | en | - |
dc.publisher | Institute of Electrical and Electronics Engineers Inc. | - |
dc.title | Duration Controllable Voice Conversion via Phoneme-Based Information Bottleneck | - |
dc.type | Article | - |
dc.contributor.affiliatedAuthor | Lee, S. | - |
dc.identifier.doi | 10.1109/TASLP.2022.3156757 | - |
dc.identifier.scopusid | 2-s2.0-85126301672 | - |
dc.identifier.wosid | 000776210900002 | - |
dc.identifier.bibliographicCitation | IEEE/ACM Transactions on Audio Speech and Language Processing, v.30, pp.1173 - 1183 | - |
dc.relation.isPartOf | IEEE/ACM Transactions on Audio Speech and Language Processing | - |
dc.citation.title | IEEE/ACM Transactions on Audio Speech and Language Processing | - |
dc.citation.volume | 30 | - |
dc.citation.startPage | 1173 | - |
dc.citation.endPage | 1183 | - |
dc.type.rims | ART | - |
dc.type.docType | Article | - |
dc.description.journalClass | 1 | - |
dc.description.journalRegisteredClass | scie | - |
dc.description.journalRegisteredClass | scopus | - |
dc.relation.journalResearchArea | Acoustics | - |
dc.relation.journalResearchArea | Engineering | - |
dc.relation.journalWebOfScienceCategory | Acoustics | - |
dc.relation.journalWebOfScienceCategory | Engineering, Electrical & Electronic | - |
dc.subject.keywordAuthor | Decoding | - |
dc.subject.keywordAuthor | Generative adversarial networks | - |
dc.subject.keywordAuthor | information bottleneck | - |
dc.subject.keywordAuthor | Licenses | - |
dc.subject.keywordAuthor | non-autoregressive model | - |
dc.subject.keywordAuthor | Speech | - |
dc.subject.keywordAuthor | Speech processing | - |
dc.subject.keywordAuthor | Timbre | - |
dc.subject.keywordAuthor | Training | - |
dc.subject.keywordAuthor | voice conversion | - |
dc.subject.keywordAuthor | voice style transfer | - |
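The abstract describes replacing a fixed downsampling bottleneck with a phoneme-based one: encoder frames are pooled per phoneme segment, then re-expanded to arbitrary target durations, which is what enables speed control. A minimal sketch of that idea in NumPy (the function names, the use of average pooling, and the repeat-based expansion are illustrative assumptions, not the paper's actual implementation):

```python
import numpy as np

def phoneme_bottleneck(frames, seg_lengths):
    """Pool encoder frames to one content vector per phoneme segment.

    frames: (T, D) array of frame-level encoder outputs.
    seg_lengths: per-phoneme frame counts, summing to T.
    Returns an (N_phonemes, D) array -- the information bottleneck.
    """
    vecs, start = [], 0
    for n in seg_lengths:
        # Average pooling within a segment discards frame-rate detail
        # (timbre, exact timing) while keeping phonetic content.
        vecs.append(frames[start:start + n].mean(axis=0))
        start += n
    return np.stack(vecs)

def expand_durations(phoneme_vecs, new_lengths):
    """Re-expand phoneme vectors to target per-phoneme durations.

    Choosing new_lengths different from the source segment lengths
    changes the speed of the converted speech.
    """
    return np.repeat(phoneme_vecs, new_lengths, axis=0)
```

For example, pooling a 10-frame utterance with phoneme segments of 4, 3, and 3 frames yields 3 content vectors; expanding them with durations of 2 frames each produces a 6-frame content sequence, i.e. the same phonemes spoken faster.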
Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.
(02841) 145 Anam-ro, Seongbuk-gu, Seoul, Republic of Korea · Tel: 02-3290-1114
COPYRIGHT © 2021 Korea University. All Rights Reserved.
Certain data included herein are derived from the © Web of Science of Clarivate Analytics. All rights reserved.
You may not copy or re-distribute this material in whole or in part without the prior written consent of Clarivate Analytics.