Empirical Analysis of Parallel Corpora and In-Depth Analysis Using LIWC

Full metadata record
DC Field | Value | Language
dc.contributor.author | Park, Chanjun | -
dc.contributor.author | Shim, Midan | -
dc.contributor.author | Eo, Sugyeong | -
dc.contributor.author | Lee, Seolhwa | -
dc.contributor.author | Seo, Jaehyung | -
dc.contributor.author | Moon, Hyeonseok | -
dc.contributor.author | Lim, Heuiseok | -
dc.date.accessioned | 2022-08-13T11:40:32Z | -
dc.date.available | 2022-08-13T11:40:32Z | -
dc.date.created | 2022-08-12 | -
dc.date.issued | 2022-06 | -
dc.identifier.issn | 2076-3417 | -
dc.identifier.uri | https://scholar.korea.ac.kr/handle/2021.sw.korea/143025 | -
dc.description.abstract | Machine translation (MT) systems aim to translate a source language into a target language. Recent studies on MT systems mainly focus on neural machine translation (NMT). One factor that significantly affects the performance of NMT is the availability of high-quality parallel corpora. However, high-quality parallel corpora for Korean are relatively scarce compared to those for high-resource languages such as German or Italian. To address this problem, AI Hub recently released seven types of parallel corpora for Korean. In this study, we conduct an in-depth verification of the quality of these parallel corpora through Linguistic Inquiry and Word Count (LIWC) and several relevant experiments. LIWC is a word-counting software program that can analyze corpora in multiple ways and extract linguistic features based on a dictionary. To the best of our knowledge, this study is the first to use LIWC to analyze parallel corpora in the field of NMT. Our findings, drawn from a correlation analysis between LIWC features and NMT performance, suggest directions for further research toward obtaining higher-quality parallel corpora. | -
dc.language | English | -
dc.language.iso | en | -
dc.publisher | MDPI | -
dc.title | Empirical Analysis of Parallel Corpora and In-Depth Analysis Using LIWC | -
dc.type | Article | -
dc.contributor.affiliatedAuthor | Lim, Heuiseok | -
dc.identifier.doi | 10.3390/app12115545 | -
dc.identifier.scopusid | 2-s2.0-85134717356 | -
dc.identifier.wosid | 000808913600001 | -
dc.identifier.bibliographicCitation | APPLIED SCIENCES-BASEL, v.12, no.11 | -
dc.relation.isPartOf | APPLIED SCIENCES-BASEL | -
dc.citation.title | APPLIED SCIENCES-BASEL | -
dc.citation.volume | 12 | -
dc.citation.number | 11 | -
dc.type.rims | ART | -
dc.type.docType | Article | -
dc.description.journalClass | 1 | -
dc.description.isOpenAccess | Y | -
dc.description.journalRegisteredClass | scie | -
dc.description.journalRegisteredClass | scopus | -
dc.relation.journalResearchArea | Chemistry | -
dc.relation.journalResearchArea | Engineering | -
dc.relation.journalResearchArea | Materials Science | -
dc.relation.journalResearchArea | Physics | -
dc.relation.journalWebOfScienceCategory | Chemistry, Multidisciplinary | -
dc.relation.journalWebOfScienceCategory | Engineering, Multidisciplinary | -
dc.relation.journalWebOfScienceCategory | Materials Science, Multidisciplinary | -
dc.relation.journalWebOfScienceCategory | Physics, Applied | -
dc.subject.keywordAuthor | neural machine translation | -
dc.subject.keywordAuthor | Korean-English neural machine translation | -
dc.subject.keywordAuthor | transformer | -
dc.subject.keywordAuthor | parallel corpus | -
dc.subject.keywordAuthor | AI Hub | -
Files in This Item
There are no files associated with this item.
Appears in Collections
Graduate School > Department of Computer Science and Engineering > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.
