BERT Learns More than Word Frequency Information: A Case Study of Do-Be Constructions
- Authors
- Unsub Shin (신운섭); Sanghoun Song (송상헌)
- Issue Date
- 2022
- Publisher
- The Linguistic Society of Korea (한국언어학회)
- Keywords
- Do-Be construction; agreement attraction; neural language model; synonym substitution; web corpora
- Citation
- 언어 (Korean Journal of Linguistics), v.47, no.3, pp.467-489
- Indexed
- KCI
- Journal Title
- 언어 (Korean Journal of Linguistics)
- Volume
- 47
- Number
- 3
- Start Page
- 467
- End Page
- 489
- URI
- https://scholar.korea.ac.kr/handle/2021.sw.korea/144123
- DOI
- 10.18855/lisoko.2022.47.3.004
- ISSN
- 1229-4039
- Abstract
- This study examines BERT’s linguistic ability using naturally occurring data. In particular, it collects marginal language data such as "what we do is create Frankenstein", an instance of the Do-Be construction (DBC) (Flickinger & Wasow, 2013). Using web corpora, the study first collected 17,737 instances of the DBC across text genres and English dialects. The corpus analysis supports the idea that the DBC is a computationally challenging phenomenon for data-driven language systems because of its statistical sparsity and linguistic complexity. With manual annotations of DBCs, the study designed two computational prediction tasks, subject-verb agreement and synonym substitution, based on the introspective judgments of linguists. The study found that BERT is highly sensitive to the linguistic acceptability of grammatical forms and to felicitous word choices in the prediction tasks, even though the target phenomenon is rarely observed in corpus data. These results show that the neural language model BERT can learn abstract linguistic properties beyond surface frequency information.