Duration Controllable Voice Conversion via Phoneme-Based Information Bottleneck
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Lee, S. | - |
dc.contributor.author | Noh, H. | - |
dc.contributor.author | Nam, W. | - |
dc.contributor.author | Lee, S. | - |
dc.date.accessioned | 2022-04-12T22:42:07Z | - |
dc.date.available | 2022-04-12T22:42:07Z | - |
dc.date.created | 2022-04-12 | - |
dc.date.issued | 2022 | - |
dc.identifier.issn | 2329-9290 | - |
dc.identifier.uri | https://scholar.korea.ac.kr/handle/2021.sw.korea/140172 | - |
dc.description.abstract | Several voice conversion (VC) methods based on a simple autoencoder with a carefully designed information bottleneck have recently been studied. In general, they extract content information from a given speech signal through the information bottleneck between the encoder and the decoder, and provide it to the decoder together with the target speaker information to generate the converted speech. However, their performance depends heavily on the downsampling factor of the information bottleneck. In addition, such frame-by-frame conversion methods cannot convert speaking styles associated with the length of the utterance, such as duration. In this paper, we propose a novel duration controllable voice conversion (DCVC) model that can transfer speaking style and control the speed of the converted speech through a phoneme-based information bottleneck. The proposed information bottleneck does not require finding an appropriate downsampling factor, achieving better audio quality and VC performance. In our experiments, DCVC outperformed the baseline models with a 3.78 MOS and a 3.83 similarity score. It can also smoothly control the speech duration while achieving a 39.35x inference speedup over a Seq2seq-based VC model. | - |
dc.language | English | - |
dc.language.iso | en | - |
dc.publisher | Institute of Electrical and Electronics Engineers Inc. | - |
dc.title | Duration Controllable Voice Conversion via Phoneme-Based Information Bottleneck | - |
dc.type | Article | - |
dc.contributor.affiliatedAuthor | Lee, S. | - |
dc.identifier.doi | 10.1109/TASLP.2022.3156757 | - |
dc.identifier.scopusid | 2-s2.0-85126301672 | - |
dc.identifier.wosid | 000776210900002 | - |
dc.identifier.bibliographicCitation | IEEE/ACM Transactions on Audio Speech and Language Processing, v.30, pp.1173 - 1183 | - |
dc.relation.isPartOf | IEEE/ACM Transactions on Audio Speech and Language Processing | - |
dc.citation.title | IEEE/ACM Transactions on Audio Speech and Language Processing | - |
dc.citation.volume | 30 | - |
dc.citation.startPage | 1173 | - |
dc.citation.endPage | 1183 | - |
dc.type.rims | ART | - |
dc.type.docType | Article | - |
dc.description.journalClass | 1 | - |
dc.description.journalRegisteredClass | scie | - |
dc.description.journalRegisteredClass | scopus | - |
dc.relation.journalResearchArea | Acoustics | - |
dc.relation.journalResearchArea | Engineering | - |
dc.relation.journalWebOfScienceCategory | Acoustics | - |
dc.relation.journalWebOfScienceCategory | Engineering, Electrical & Electronic | - |
dc.subject.keywordAuthor | Decoding | - |
dc.subject.keywordAuthor | Generative adversarial networks | - |
dc.subject.keywordAuthor | information bottleneck | - |
dc.subject.keywordAuthor | Licenses | - |
dc.subject.keywordAuthor | non-autoregressive model | - |
dc.subject.keywordAuthor | Speech | - |
dc.subject.keywordAuthor | Speech processing | - |
dc.subject.keywordAuthor | Timbre | - |
dc.subject.keywordAuthor | Training | - |
dc.subject.keywordAuthor | voice conversion | - |
dc.subject.keywordAuthor | voice style transfer | - |
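The abstract describes replacing a fixed downsampling bottleneck with a phoneme-based one: encoder frames are pooled per phoneme segment, then re-expanded to arbitrary target durations, which is what enables speed control. A minimal sketch of that idea in NumPy (the function names, the use of average pooling, and the repeat-based expansion are illustrative assumptions, not the paper's actual implementation):

```python
import numpy as np

def phoneme_bottleneck(frames, seg_lengths):
    """Pool encoder frames to one content vector per phoneme segment.

    frames: (T, D) array of frame-level encoder outputs.
    seg_lengths: per-phoneme frame counts, summing to T.
    Returns an (N_phonemes, D) array -- the information bottleneck.
    """
    vecs, start = [], 0
    for n in seg_lengths:
        # Average pooling within a segment discards frame-rate detail
        # (timbre, exact timing) while keeping phonetic content.
        vecs.append(frames[start:start + n].mean(axis=0))
        start += n
    return np.stack(vecs)

def expand_durations(phoneme_vecs, new_lengths):
    """Re-expand phoneme vectors to target per-phoneme durations.

    Choosing new_lengths different from the source segment lengths
    changes the speed of the converted speech.
    """
    return np.repeat(phoneme_vecs, new_lengths, axis=0)
```

For example, pooling a 10-frame utterance with phoneme segments of 4, 3, and 3 frames yields 3 content vectors; expanding them with durations of 2 frames each produces a 6-frame content sequence, i.e. the same phonemes spoken faster.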
Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.
(02841) 145 Anam-ro, Seongbuk-gu, Seoul, Republic of Korea · Tel: 02-3290-1114
COPYRIGHT © 2021 Korea University. All Rights Reserved.
Certain data included herein are derived from the © Web of Science of Clarivate Analytics. All rights reserved.
You may not copy or re-distribute this material in whole or in part without the prior written consent of Clarivate Analytics.