External validation of deep learning-based bone-age software: a preliminary study with real world data
- Authors
- Lea, Winnah Wu-in; Hong, Suk-Joo; Nam, Hyo-Kyoung; Kang, Woo-Young; Yang, Ze-Pa; Noh, Eun-Jin
- Issue Date
- 24-Jan-2022
- Publisher
- NATURE PORTFOLIO
- Citation
- SCIENTIFIC REPORTS, v.12, no.1
- Indexed
- SCIE, SCOPUS
- Journal Title
- SCIENTIFIC REPORTS
- Volume
- 12
- Number
- 1
- URI
- https://scholar.korea.ac.kr/handle/2021.sw.korea/136518
- DOI
- 10.1038/s41598-022-05282-z
- ISSN
- 2045-2322
- Abstract
- Artificial intelligence (AI) is increasingly used in bone-age (BA) assessment because manual assessment is complicated and time-consuming. We aimed to evaluate the clinical performance of commercially available deep learning (DL)-based software for BA assessment using real-world data. From Nov. 2018 to Feb. 2019, 474 children (35 boys, 439 girls, age 4-17 years) were enrolled. We compared the BA estimated by the DL software (DL-BA) with the BA independently estimated by 3 reviewers (R1: musculoskeletal radiologist, R2: radiology resident, R3: pediatric endocrinologist) using the traditional Greulich-Pyle atlas, and then with each child's chronological age (CA). A paired t-test, Pearson's correlation coefficient, Bland-Altman plots, mean absolute error (MAE) and root mean square error (RMSE) were used for the statistical analysis. The intraclass correlation coefficient (ICC) was used to assess inter-rater variation. There were significant differences between DL-BA and each reviewer's BA (P < 0.025), but the estimates correlated well with one another (r = 0.983, P < 0.025). RMSE (MAE) values between DL-BA and the BA of R1, R2 and R3 were 10.09 (7.21), 10.76 (7.88) and 13.06 (10.06) months, respectively. Compared with the CA, RMSE (MAE) values were 13.54 (11.06), 15.18 (12.11), 16.19 (12.78) and 19.53 (17.71) months for DL-BA, R1, R2 and R3 BA, respectively. Bland-Altman plots revealed a general tendency of both the software and the reviewers to overestimate the BA. ICC values between the 3 reviewers were 0.97, 0.85 and 0.86, and the overall ICC value was 0.93. When compared with the chronological age in a real-world clinic, the BA estimated by the DL-based software performed statistically similarly to, or even better than, the reviewers' estimates.
- Appears in Collections
- College of Medicine > Department of Medical Science > 1. Journal Articles
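The abstract reports agreement between bone-age estimates as MAE, RMSE and Bland-Altman plots. The following is a minimal sketch of how such metrics are commonly computed for two paired series of estimates in months. It is not the authors' analysis code: the function names and the toy values are hypothetical, and the Bland-Altman limits assume the conventional bias ± 1.96 SD definition.

```python
import math
import statistics


def mae(a, b):
    """Mean absolute error between two paired series (same units, e.g. months)."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def rmse(a, b):
    """Root mean square error between two paired series."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def bland_altman(a, b):
    """Return (bias, lower limit, upper limit) of agreement for a - b.

    Assumes the conventional limits: bias +/- 1.96 * sample SD of the differences.
    """
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    diffs = [x - y for x, y in zip(a, b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical toy values in months, for illustration only (not study data).
dl_ba = [100.0, 120.0, 140.0, 160.0]
reviewer_ba = [98.0, 123.0, 139.0, 164.0]

print("MAE  (months):", mae(dl_ba, reviewer_ba))
print("RMSE (months):", rmse(dl_ba, reviewer_ba))
print("Bland-Altman (bias, lower, upper):", bland_altman(dl_ba, reviewer_ba))
```

Because RMSE squares each difference before averaging, it is never smaller than MAE for the same data and is more sensitive to large individual disagreements. This is consistent with every RMSE (MAE) pair reported in the abstract.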