A Study on Hierarchical Deep Learning-Based Automated Diagnosis Model for Dysphagia Using VFSS Images

doi:10.22716/sckt.2026.14.1.008

2026 Vol.14, Issue 1

Research Article

Jin Gyeong Lee¹ and Hee-Kyung Moon²
¹Researcher, Sarcopenia Total Solution Center, Wonkwang University
²Research Professor, Office of Educational Innovation, Wonkwang University

31 March 2026. pp. 87-98

Abstract

The videofluoroscopic swallowing study (VFSS) is the gold standard for evaluating dysphagia, but it depends on subjective factors: interpretation is slow and ratings vary between raters. To address these problems, recent studies in Korea and abroad have applied artificial intelligence (AI) to automate the analysis and enable quantitative, standardized interpretation of VFSS images. In this study, we propose a hierarchical deep learning model that automatically classifies penetration and aspiration, the primary findings observed in VFSS images. High-quality VFSS videos with adequate image quality and visibility were selected for training and evaluation. The proposed model is a sequential pipeline of three stages: object detection (YOLOv8), segmentation (U-Net), and image classification (ResNet18). Each stage was trained and evaluated independently, and these evaluations were used to verify the performance and practical efficacy of the overall pipeline. Our results demonstrate the potential of combining stage-wise deep learning modules as a diagnostic support tool for dysphagia. If further refinement of the object detection module leads to a fully automated diagnostic system, we expect it to substantially improve diagnostic efficiency and accuracy in clinical settings.
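The sequential structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how the stages hand data to one another, so the sketch assumes the detector returns one bounding box, the segmenter receives the cropped region, and the classifier receives the crop together with its mask. All names (`run_pipeline`, `toy_detect`, `toy_segment`, `toy_classify`, `PipelineResult`) are hypothetical, and the three toy stages are simple threshold rules standing in for YOLOv8, U-Net, and ResNet18. The labels `class_a` and `class_b` are neutral placeholders with no clinical meaning.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Frame = List[List[float]]        # grayscale frame as nested lists (illustrative only)
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), end-exclusive


@dataclass
class PipelineResult:
    box: Optional[Box]      # stage 1 output: region of interest
    mask: Optional[Frame]   # stage 2 output: binary mask inside the region
    label: Optional[str]    # stage 3 output: predicted class


def crop(frame: Frame, box: Box) -> Frame:
    """Cut the region of interest out of the frame."""
    top, left, bottom, right = box
    return [row[left:right] for row in frame[top:bottom]]


def run_pipeline(
    frame: Frame,
    detect: Callable[[Frame], Optional[Box]],
    segment: Callable[[Frame], Frame],
    classify: Callable[[Frame, Frame], str],
) -> PipelineResult:
    """Run detection, segmentation, and classification in sequence."""
    box = detect(frame)
    if box is None:
        # Later stages depend on the detector, so stop early when it finds nothing.
        return PipelineResult(None, None, None)
    roi = crop(frame, box)
    mask = segment(roi)
    label = classify(roi, mask)
    return PipelineResult(box, mask, label)


# Toy stand-ins (assumptions): threshold rules in place of trained networks.
def toy_detect(frame: Frame, threshold: float = 0.1) -> Optional[Box]:
    """Return the tight box around all pixels brighter than the threshold."""
    hits = [(r, c) for r, row in enumerate(frame)
            for c, value in enumerate(row) if value > threshold]
    if not hits:
        return None
    rows = [r for r, _ in hits]
    cols = [c for _, c in hits]
    return (min(rows), min(cols), max(rows) + 1, max(cols) + 1)


def toy_segment(roi: Frame, threshold: float = 0.5) -> Frame:
    """Binarize the cropped region."""
    return [[1.0 if value > threshold else 0.0 for value in row] for row in roi]


def toy_classify(roi: Frame, mask: Frame) -> str:
    """Placeholder rule on mask coverage; it has no clinical meaning."""
    total = sum(len(row) for row in mask)
    filled = sum(sum(row) for row in mask)
    return "class_a" if filled / total >= 0.5 else "class_b"


if __name__ == "__main__":
    frame = [
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 0.9, 0.8, 0.0],
        [0.0, 0.7, 0.2, 0.0],
        [0.0, 0.0, 0.0, 0.0],
    ]
    print(run_pipeline(frame, toy_detect, toy_segment, toy_classify))
```

Passing the stages in as callables mirrors the abstract's point that each module is trained and evaluated independently: any stage can be replaced or tested alone without changing the others. The early return also shows why the detector matters for full automation, since a missed detection leaves the later stages with nothing to process.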

Keywords

Dysphagia

Videofluoroscopic Swallowing Study (VFSS)

Deep Learning

Penetration

Aspiration

Information

Publisher: The Society of Convergence Knowledge
Publisher (Ko): 융복합지식학회
Journal Title: The Society of Convergence Knowledge Transactions
Journal Title (Ko): 융복합지식학회논문지
Volume: 14
No: 1
Pages: 87-98
DOI: https://doi.org/10.22716/sckt.2026.14.1.008

The Society of Convergence Knowledge Transactions ISSN:2287-8920(Print) 융복합지식학회논문지
