Development of an Ozone Concentration Prediction Model Using Machine Learning

doi:10.22716/sckt.2026.14.1.004


2026 Vol.14, Issue 1

Research Article

Inkyu Song¹ and Jaehyun Kim²

¹ Associate Professor, School of Future Convergence, Seokyeong University
² Professor, School of Future Convergence, Seokyeong University

31 March 2026. pp. 33-44


Abstract

Since the transition to an industrialized society, public concern about air pollutants has grown. Air pollution harms both human health and the environment, and major pollutants such as fine particulate matter (PM) and ozone (O₃) pose significant risks to human health. These risks are especially severe for vulnerable groups such as children and the elderly, so reducing air pollutant concentrations is directly linked to protecting public health. If changes in ozone concentration can be predicted accurately in advance, the associated risks can be reduced through early preparation. In this study, we used three years of AirKorea measurement data (January 2022 to December 2024) from three districts of Seoul to forecast ozone concentrations with tree-based machine learning algorithms (CatBoost, XGBoost, and Random Forest) and an LSTM deep learning model. The input variables included ozone, SO₂, CO, NO₂, PM2.5, temperature, and solar radiation, together with derived features capturing ozone lag characteristics and the 24-hour periodic variation of ozone. CatBoost achieved superior predictive performance in short-, medium-, and long-term forecasting. The analysis also identified the influence of key variables such as NO₂, PM2.5, temperature, and solar radiation, pointing to the main drivers of ozone variation. These findings support the view that variable selection and model architecture play critical roles in improving the accuracy of ozone forecasts, and they can serve as baseline data for developing prediction models tailored to different regions and time periods. In practice, the proposed models could support localized ozone prediction services and early warnings, enabling proactive responses and contributing to tailored, small-area air quality management policies.
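The abstract does not specify how the lag and 24-hour periodic features were constructed or how the short-, medium-, and long-term horizons were defined. The following is a minimal sketch under stated assumptions, not the authors' pipeline. Hour of day is encoded as a sine/cosine pair. The lag set (1, 2, 3, and 24 hours) and the horizon values are hypothetical. The helper names `hour_cycle_features`, `build_supervised_rows`, and `chronological_split` are illustrative and do not come from the paper. The sketch turns an hourly ozone series, plus optional covariates such as NO₂ or temperature, into supervised (features, target) rows of the kind any of the regressors named above could consume.

```python
import math
from typing import Dict, List, Mapping, Optional, Sequence, Tuple

Row = Dict[str, float]


def hour_cycle_features(hour: int) -> Tuple[float, float]:
    """Map hour of day (0-23) onto the unit circle so 23:00 and 00:00 are neighbours."""
    angle = 2.0 * math.pi * (hour % 24) / 24.0
    return math.sin(angle), math.cos(angle)


def build_supervised_rows(
    o3: Sequence[float],
    hours: Sequence[int],
    covariates: Optional[Mapping[str, Sequence[float]]] = None,
    lags: Sequence[int] = (1, 2, 3, 24),  # hypothetical lag set, in hours
    horizon: int = 1,  # hypothetical: how many hours ahead to predict
) -> Tuple[List[Row], List[float]]:
    """Turn an hourly ozone series into (feature row, target) pairs.

    Row t holds ozone at t, ozone at t - lag for every lag, the covariate
    values at t, and the hour-of-day encoding. Its target is ozone at
    t + horizon. Time steps without full lag history or without a future
    target are skipped.
    """
    covariates = dict(covariates or {})
    n = len(o3)
    if len(hours) != n or any(len(s) != n for s in covariates.values()):
        raise ValueError("all input series must have the same length")
    if horizon < 1 or min(lags) < 1:
        raise ValueError("horizon and lags must be positive")

    X: List[Row] = []
    y: List[float] = []
    for t in range(max(lags), n - horizon):
        row: Row = {"o3": o3[t]}
        for lag in lags:
            row[f"o3_lag{lag}"] = o3[t - lag]
        for name, series in covariates.items():
            row[name] = series[t]
        row["hour_sin"], row["hour_cos"] = hour_cycle_features(hours[t])
        X.append(row)
        y.append(o3[t + horizon])
    return X, y


def chronological_split(X, y, train_frac=0.8):
    """Split without shuffling so every test hour comes after every training hour."""
    cut = int(len(X) * train_frac)
    return X[:cut], X[cut:], y[:cut], y[cut:]


if __name__ == "__main__":
    # Synthetic two-day hourly series for illustration only; not AirKorea data.
    hours = [h % 24 for h in range(48)]
    o3 = [0.03 + 0.02 * math.sin(2 * math.pi * (h - 9) / 24) for h in range(48)]
    temp = [15.0 + 5.0 * math.sin(2 * math.pi * (h - 9) / 24) for h in range(48)]
    X, y = build_supervised_rows(o3, hours, covariates={"temp": temp}, lags=(1, 24), horizon=3)
    X_tr, X_te, y_tr, y_te = chronological_split(X, y)
    print(len(X), len(X_tr), len(X_te))  # → 21 16 5
```

Calling the builder with different `horizon` values (for example 1, 6, or 24 hours, all hypothetical) produces a separate dataset for each of the short-, medium-, and long-term models. The chronological split keeps future hours out of training, which a random shuffle of lagged rows would not.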

Keywords

Ozone Concentration

Machine Learning

Deep Learning

Prediction

Policy

Information

Publisher: The Society of Convergence Knowledge
Publisher (Korean): 융복합지식학회
Journal Title: The Society of Convergence Knowledge Transactions
Journal Title (Korean): 융복합지식학회논문지
Volume: 14
No.: 1
Pages: 33-44
DOI: https://doi.org/10.22716/sckt.2026.14.1.004

The Society of Convergence Knowledge Transactions (융복합지식학회논문지), ISSN: 2287-8920 (Print)
