BERT Learns More than Word Frequency Information: A Case Study of Do-Be Constructions

Full metadata record
DC Field | Value | Language
dc.contributor.author | 신운섭 | -
dc.contributor.author | 송상헌 | -
dc.date.accessioned | 2022-10-06T13:41:57Z | -
dc.date.available | 2022-10-06T13:41:57Z | -
dc.date.created | 2022-10-06 | -
dc.date.issued | 2022 | -
dc.identifier.issn | 1229-4039 | -
dc.identifier.uri | https://scholar.korea.ac.kr/handle/2021.sw.korea/144123 | -
dc.description.abstract | This study aims to understand BERT's linguistic ability using naturally occurring data. In particular, the study collected marginal language data, such as "what we do is create Frankenstein", which is referred to as a Do-Be construction (DBC) (Flickinger & Wasow, 2013). Using web corpora, the study first collected 17,737 instances of the DBC across text genres and English dialects. The corpus analysis supports the idea that DBC is a computationally challenging phenomenon for data-driven language systems due to its statistical sparsity and linguistic complexity. With manual annotations of DBCs, the study designed two computational prediction tasks: subject-verb agreement and synonym substitution tasks, based on the introspective judgment of linguists. The study found that BERT is hugely sensitive to linguistic acceptability of grammatical forms and felicitous words in the prediction tasks, even though the target phenomenon is rarely observed in corpus data. These results show that the neural language model, BERT, can learn abstract linguistic properties beyond surface frequency information. | -
dc.language | English | -
dc.language.iso | en | -
dc.publisher | 한국언어학회 | -
dc.title | BERT Learns More than Word Frequency Information: A Case Study of Do-Be Constructions | -
dc.title.alternative | BERT Learns More than Word Frequency Information: A Case Study of Do-Be Constructions | -
dc.type | Article | -
dc.contributor.affiliatedAuthor | 송상헌 | -
dc.identifier.doi | 10.18855/lisoko.2022.47.3.004 | -
dc.identifier.bibliographicCitation | 언어, v.47, no.3, pp.467 - 489 | -
dc.relation.isPartOf | 언어 | -
dc.citation.title | 언어 | -
dc.citation.volume | 47 | -
dc.citation.number | 3 | -
dc.citation.startPage | 467 | -
dc.citation.endPage | 489 | -
dc.type.rims | ART | -
dc.identifier.kciid | ART002882708 | -
dc.description.journalClass | 2 | -
dc.description.journalRegisteredClass | kci | -
dc.subject.keywordAuthor | Do-Be construction | -
dc.subject.keywordAuthor | agreement attraction | -
dc.subject.keywordAuthor | neural language model | -
dc.subject.keywordAuthor | synonym substitution | -
dc.subject.keywordAuthor | web corpora | -

Appears in Collections
College of Liberal Arts > Department of Linguistics > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.
