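Research on Subword Tokenization of Korean Neural Machine Translation and Proposal for Tokenization Method to Separate Jongsung from Syllables (한국어 인공신경망 기계번역의 서브 워드 분절 연구 및 음절 기반 종성 분리 토큰화 제안)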

어수경; 박찬준; 문현석; 임희석

doi:10.15207/JKCS.2021.12.3.001

Other Titles: Research on Subword Tokenization of Korean Neural Machine Translation and Proposal for Tokenization Method to Separate Jongsung from Syllables

Authors: 어수경; 박찬준; 문현석; 임희석

Issue Date: 2021

Publisher: Korea Convergence Society (한국융합학회)

Keywords: Machine Translation; Preprocessing; Subword Tokenization; Subword; Deep Learning; Convergence

Citation: Journal of the Korea Convergence Society (한국융합학회논문지), v.12, no.3, pp.1-7

Indexed: KCI

Journal Title: Journal of the Korea Convergence Society (한국융합학회논문지)

Volume: 12

Number: 3

Start Page: 1

End Page: 7

URI: https://scholar.korea.ac.kr/handle/2021.sw.korea/129868

DOI: 10.15207/JKCS.2021.12.3.001

ISSN: 2233-4890

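Abstract: Neural Machine Translation (NMT) translates with a vocabulary of limited size, so words that are not registered in the vocabulary may appear in the input. Subword tokenization was devised to mitigate this Out of Vocabulary (OOV) problem: it splits a sentence into subword units smaller than words and composes words from those units. This paper surveys the general subword tokenization algorithms and, to build a vocabulary that handles the unbounded conjugation of Korean predicates well, proposes a new method that separates the jongsung (final consonant) from Korean syllables before learning the subword tokenization. In the experiments, the proposed method outperformed existing subword tokenization methods.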
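The jongsung separation described in the abstract can be illustrated with a minimal sketch. The abstract does not give the authors' exact procedure, so this is an assumption: each precomposed Hangul syllable that carries a final consonant is split into the same syllable without the final consonant, followed by the final consonant as a standalone conjoining jamo. The function name `separate_jongsung` and the constant names are hypothetical. A subword learner would then be trained on the output, which is not shown here.

```python
import unicodedata

HANGUL_BASE = 0xAC00   # first precomposed Hangul syllable
HANGUL_LAST = 0xD7A3   # last precomposed Hangul syllable
JONG_COUNT = 28        # 27 final consonants plus the "no final" slot
JONG_BASE = 0x11A7     # conjoining final jamo start at JONG_BASE + 1


def separate_jongsung(text: str) -> str:
    """Split every Hangul syllable with a final consonant into
    (syllable without final) + (final consonant jamo).

    Hypothetical illustration; not the paper's published code.
    """
    out = []
    for ch in text:
        code = ord(ch)
        if HANGUL_BASE <= code <= HANGUL_LAST:
            # Index of the final consonant within the syllable block (0 = none).
            jong = (code - HANGUL_BASE) % JONG_COUNT
            if jong:
                out.append(chr(code - jong))       # syllable minus its final
                out.append(chr(JONG_BASE + jong))  # detached final consonant
                continue
        out.append(ch)  # non-Hangul characters and open syllables pass through
    return "".join(out)


# Usage: the past-tense marker becomes its own symbol after the stem syllable.
separated = separate_jongsung("\uac14\ub2e4")  # the word "갔다"
# NFC normalization recombines the pieces, so the split is reversible.
restored = unicodedata.normalize("NFC", separated)
```

Because the detached consonant is a conjoining jamo, Unicode NFC normalization restores the original text, so translated output can be recomposed after decoding. Splitting off the final consonant exposes endings that attach to the stem syllable as separate symbols, which is one plausible reason such a scheme helps with conjugated forms.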

Appears in Collections: Graduate School > Department of Computer Science and Engineering > 1. Journal Articles
