Neural spelling correction: translating incorrect sentences to correct sentences for multimedia
- Authors
- Park, Chanjun; Kim, Kuekyeng; Yang, YeongWook; Kang, Minho; Lim, Heuiseok
- Issue Date
- 2021
- Publisher
- SPRINGER
- Keywords
- Korean spelling correction; Automatic noise generation; Neural machine translation; Transformer; Copy mechanism; Overcorrection
- Citation
- MULTIMEDIA TOOLS AND APPLICATIONS, v.80, no.26-27, pp.34591 - 34608
- Indexed
- SCIE; SCOPUS
- Journal Title
- MULTIMEDIA TOOLS AND APPLICATIONS
- Volume
- 80
- Number
- 26-27
- Start Page
- 34591
- End Page
- 34608
- URI
- https://scholar.korea.ac.kr/handle/2021.sw.korea/139627
- DOI
- 10.1007/s11042-020-09148-2
- ISSN
- 1380-7501
- Abstract
- The aim of a spelling correction task is to detect spelling errors and automatically correct them. In this paper, we approach the Korean spelling correction task from a machine translation perspective, which allows it to overcome limitations of cost, time, and data. Based on a sequence-to-sequence model, the approach uses an error-filled sentence as the source and its correct counterpart as the target, thus 'translating' the erroneous sentence into a correct sentence. We also propose three new data generation methods that allow multiple spelling correction parallel corpora to be created from a single monolingual corpus. Additionally, we found that applying the Copy Mechanism not only mitigates the problem of overcorrection but also improves the model's correction performance. We evaluate our model on two aspects: performance compared with other models, and overcorrection. The results show that the proposed model even outperforms systems currently in commercial use.
- Appears in Collections
- Graduate School > Department of Computer Science and Engineering > 1. Journal Articles