Automatic Pharyngeal Phase Recognition in Untrimmed Videofluoroscopic Swallowing Study Using Transfer Learning with Deep Convolutional Neural Networks

Lee, Ki-Sun; Lee, Eunyoung; Choi, Bareun; Pyun, Sung-Bom

doi:10.3390/diagnostics11020300

Authors: Lee, Ki-Sun; Lee, Eunyoung; Choi, Bareun; Pyun, Sung-Bom

Issue Date: Feb-2021

Publisher: MDPI

Keywords: videofluoroscopic swallowing study; action recognition; deep learning; convolutional neural network; transfer learning

Citation: DIAGNOSTICS, v.11, no.2

Indexed: SCIE; SCOPUS

Journal Title: DIAGNOSTICS

Volume: 11

Number: 2

URI: https://scholar.korea.ac.kr/handle/2021.sw.korea/49365

DOI: 10.3390/diagnostics11020300

ISSN: 2075-4418

Abstract: Background: The videofluoroscopic swallowing study (VFSS) is considered the gold standard diagnostic tool for evaluating dysphagia. However, it is time consuming and labor intensive for the clinician to manually search long recorded videos frame by frame to identify instantaneous swallowing abnormalities in VFSS images. Therefore, this study presents a deep learning-based approach using transfer learning with a convolutional neural network (CNN) that automatically annotates pharyngeal phase frames in untrimmed VFSS videos so that frames need not be searched manually. Methods: To determine whether an image frame in the VFSS video is in the pharyngeal phase, a single-frame baseline architecture based on a deep CNN framework is used, and a transfer learning technique with fine-tuning is applied. Results: Among all experimental CNN models, the VGG-16 model fine-tuned with two blocks (VGG16-FT5) achieved the highest performance in recognizing pharyngeal phase frames, with an accuracy of 93.20 (+/- 1.25)%, sensitivity of 84.57 (+/- 5.19)%, specificity of 94.36 (+/- 1.21)%, AUC of 0.8947 (+/- 0.0269), and Kappa of 0.7093 (+/- 0.0488). Conclusions: Using appropriate fine-tuning techniques and explainable deep learning techniques such as Grad-CAM, this study shows that the proposed deep CNN framework based on a single-frame baseline architecture can yield high performance in the full automation of VFSS video analysis.
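
The abstract reports per-frame accuracy, sensitivity, specificity, and Kappa for a binary decision (pharyngeal phase or not). As a minimal sketch of how such figures can be derived from frame-level predictions, the Python below computes them from a confusion matrix. The function name `frame_metrics`, the label encoding (1 = pharyngeal phase, 0 = other), and the toy labels are assumptions made for illustration and are not taken from the paper; AUC is omitted because it requires continuous scores rather than hard labels.

```python
from dataclasses import dataclass


@dataclass
class FrameMetrics:
    accuracy: float
    sensitivity: float
    specificity: float
    kappa: float


def frame_metrics(y_true, y_pred):
    """Binary per-frame metrics; label 1 = pharyngeal phase (assumed encoding)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    # Confusion-matrix counts over all frames.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    n = tp + tn + fp + fn

    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall on phase frames
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # recall on other frames

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (accuracy - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0

    return FrameMetrics(accuracy, sensitivity, specificity, kappa)


# Toy example: ten frames, three of which are truly in the pharyngeal phase.
y_true = [0, 0, 0, 1, 1, 1, 0, 0, 0, 0]
y_pred = [0, 0, 1, 1, 1, 0, 0, 0, 0, 0]
m = frame_metrics(y_true, y_pred)
print(f"acc={m.accuracy:.3f} sens={m.sensitivity:.3f} "
      f"spec={m.specificity:.3f} kappa={m.kappa:.3f}")
```

Because pharyngeal phase frames are a small fraction of an untrimmed video, accuracy alone can look high even when many phase frames are missed, which is one reason sensitivity and Kappa are worth reading alongside it.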

Appears in Collections: Graduate School > Department of Biomedical Sciences > 1. Journal Articles
