Detailed Information


EARSHOT: A Minimal Neural Network Model of Incremental Human Speech Recognition

Full metadata record
dc.contributor.author: Magnuson, James S.
dc.contributor.author: You, Heejo
dc.contributor.author: Luthra, Sahil
dc.contributor.author: Li, Monica
dc.contributor.author: Nam, Hosung
dc.contributor.author: Escabi, Monty
dc.contributor.author: Brown, Kevin
dc.contributor.author: Allopenna, Paul D.
dc.contributor.author: Theodore, Rachel M.
dc.contributor.author: Monto, Nicholas
dc.contributor.author: Rueckl, Jay G.
dc.date.accessioned: 2021-08-31T04:52:00Z
dc.date.available: 2021-08-31T04:52:00Z
dc.date.created: 2021-06-18
dc.date.issued: 2020-04
dc.identifier.issn: 0364-0213
dc.identifier.uri: https://scholar.korea.ac.kr/handle/2021.sw.korea/56805
dc.description.abstract: Despite the lack of invariance problem (the many-to-many mapping between acoustics and percepts), human listeners experience phonetic constancy and typically perceive what a speaker intends. Most models of human speech recognition (HSR) have side-stepped this problem, working with abstract, idealized inputs and deferring the challenge of working with real speech. In contrast, carefully engineered deep learning networks allow robust, real-world automatic speech recognition (ASR). However, the complexities of deep learning architectures and training regimens make it difficult to use them to provide direct insights into mechanisms that may support HSR. In this brief article, we report preliminary results from a two-layer network that borrows one element from ASR, long short-term memory nodes, which provide dynamic memory for a range of temporal spans. This allows the model to learn to map real speech from multiple talkers to semantic targets with high accuracy, with human-like timecourse of lexical access and phonological competition. Internal representations emerge that resemble phonetically organized responses in human superior temporal gyrus, suggesting that the model develops a distributed phonological code despite no explicit training on phonetic or phonemic targets. The ability to work with real speech is a major advance for cognitive models of HSR.
dc.language: English
dc.language.iso: en
dc.publisher: WILEY
dc.subject: PERCEPTION
dc.title: EARSHOT: A Minimal Neural Network Model of Incremental Human Speech Recognition
dc.type: Article
dc.contributor.affiliatedAuthor: Nam, Hosung
dc.identifier.doi: 10.1111/cogs.12823
dc.identifier.scopusid: 2-s2.0-85083180780
dc.identifier.wosid: 000529264000001
dc.identifier.bibliographicCitation: COGNITIVE SCIENCE, v.44, no.4
dc.relation.isPartOf: COGNITIVE SCIENCE
dc.citation.title: COGNITIVE SCIENCE
dc.citation.volume: 44
dc.citation.number: 4
dc.type.rims: ART
dc.type.docType: Article
dc.description.journalClass: 1
dc.description.journalRegisteredClass: ssci
dc.description.journalRegisteredClass: scopus
dc.relation.journalResearchArea: Psychology
dc.relation.journalWebOfScienceCategory: Psychology, Experimental
dc.subject.keywordPlus: PERCEPTION
dc.subject.keywordAuthor: Human speech recognition
dc.subject.keywordAuthor: Computational modeling
dc.subject.keywordAuthor: Neurobiology of language
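
The abstract above describes a deliberately minimal architecture: real speech from multiple talkers is mapped through a single recurrent (LSTM) hidden layer onto semantic target vectors, with no phonetic or phonemic training targets. As a rough illustration only, the sketch below shows what such a mapping could look like in PyTorch; the layer sizes, the sparse binary semantic targets, and the training details are assumptions for illustration and are not taken from this record or the authors' released implementation.

import torch
import torch.nn as nn

class EarshotLikeModel(nn.Module):
    """Minimal sketch: acoustic frames -> LSTM -> semantic output per frame."""

    def __init__(self, n_acoustic=256, n_hidden=512, n_semantic=300):
        super().__init__()
        # Single LSTM hidden layer providing dynamic memory over time
        self.lstm = nn.LSTM(input_size=n_acoustic, hidden_size=n_hidden,
                            batch_first=True)
        # Linear readout to a semantic target vector at every time step
        self.readout = nn.Linear(n_hidden, n_semantic)

    def forward(self, acoustics):
        # acoustics: (batch, time, n_acoustic) spectrogram-like frames
        hidden, _ = self.lstm(acoustics)
        # Sigmoid outputs approximate a sparse binary semantic pattern
        return torch.sigmoid(self.readout(hidden))

# Example: one utterance of 100 frames; the (hypothetical) target is the
# word's sparse semantic vector repeated across time, trained with
# binary cross-entropy.
model = EarshotLikeModel()
speech = torch.randn(1, 100, 256)
target = (torch.rand(1, 1, 300) < 0.05).float().expand(1, 100, 300)
loss = nn.functional.binary_cross_entropy(model(speech), target)
loss.backward()

In a setup like this, incremental recognition can be probed by reading the output at every frame and correlating it with the semantic vectors of candidate words, which is one way the timecourse of lexical access and phonological competition described in the abstract could be measured.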
Files in This Item
There are no files associated with this item.
Appears in Collections
College of Liberal Arts > Department of English Language and Literature > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Nam, Hosung
College of Liberal Arts (Department of English Language and Literature)
