2026 Vol. 14, Issue 1

Research Article

31 March 2026. pp. 53-68
Abstract
This study builds a machine learning model that predicts team win rate from Major League Baseball (MLB) team-level performance data, and interprets the model's predictions to derive decision-support implications for training strategy. Using season data from 2015 to 2024, linear regression and a rule-based model were set as baselines, and Random Forest and XGBoost models were applied and compared on predictive performance. The machine learning models outperformed the baselines overall, and XGBoost showed the highest explanatory power and stability. In a variable importance analysis that excluded aggregate performance indicators, pitching metrics for on-base suppression and run prevention ranked higher than offensive metrics. This suggests that team win rate is explained more stably by pitching performance than by offensive productivity alone. The findings indicate that machine learning models can serve not only as prediction tools but also as data-driven decision support tools for team training strategy and resource allocation.
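As a rough illustration of what a rule-based baseline and its evaluation might look like, the sketch below scores the well-known Pythagorean expectation (win rate estimated from runs scored and runs allowed) with RMSE and R². The abstract does not say which rule, features, or metrics the study used, so the choice of rule, the function names, and the season figures here are all assumptions made for illustration only.

```python
import math


def pythagorean_win_rate(runs_scored, runs_allowed, exponent=2.0):
    """Estimate win rate from season run totals (assumed rule-based baseline)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Made-up team seasons: (runs scored, runs allowed, actual win rate).
# These are NOT real MLB figures.
seasons = [
    (800, 650, 0.600),
    (700, 700, 0.500),
    (620, 760, 0.410),
]

actual = [win_rate for _, _, win_rate in seasons]
predicted = [pythagorean_win_rate(rs, ra) for rs, ra, _ in seasons]

print("RMSE:", rmse(actual, predicted))
print("R^2: ", r_squared(actual, predicted))
```

A learned model such as Random Forest or XGBoost would be scored on the same held-out seasons with the same two functions, which is what makes a comparison against the baseline meaningful.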
Information
  • Publisher: The Society of Convergence Knowledge
  • Publisher (Ko): 융복합지식학회
  • Journal Title: The Society of Convergence Knowledge Transactions
  • Journal Title (Ko): 융복합지식학회논문지
  • Volume: 14
  • No: 1
  • Pages: 53-68