Data-Driven Obesity Classification Integrating Genetic and Lifestyle Determinants Using Naive Bayes
DOI:
https://doi.org/10.66053/aieds.v1i2.21
Keywords:
Classification, Obesity, Genetic factors, Lifestyle, Naive Bayes
Abstract
Purpose – This study aims to develop a data-driven obesity classification framework that integrates genetic predisposition and lifestyle determinants using the Naive Bayes algorithm, while empirically evaluating optimal training–testing data proportions for health decision support systems.
Methods – A systematic computational workflow was applied to a public obesity dataset of 2,112 records, which preprocessing reduced to 1,259 valid instances. Genetic indicators and lifestyle-related variables were encoded and used to classify each record into one of four weight-status classes: normal weight, obesity type I, obesity type II, and obesity type III. The Naive Bayes model was evaluated under three training–testing partition ratios (75:25, 80:20, and 85:15). Model performance was assessed using six metrics: Area Under the Curve (AUC), classification accuracy, F1-score, precision, recall, and Matthews Correlation Coefficient (MCC).
Findings – The 80:20 and 85:15 data partitions achieved the highest performance, with an accuracy of 0.878 and an AUC of 0.979. The model showed excellent sensitivity in identifying severe obesity cases, while moderate misclassification occurred between obesity type I and type II, which the study attributes to phenotypic overlap in lifestyle patterns.
Research limitations – This study relies on a single public dataset and lacks population-specific genetic calibration, which may limit generalizability to diverse regional contexts.
Originality – This study provides empirical validation of a probabilistic obesity classification framework that integrates genetic and lifestyle factors, offering an interpretable and computationally efficient approach to support data-driven health decision making.
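The evaluation protocol described under Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name a toolkit, a Naive Bayes variant, or the encoded feature names, so the code below assumes a categorical Naive Bayes with Laplace smoothing, uses a small synthetic dataset with three hypothetical features (`family_history`, `high_cal_food`, `activity`), and reports only two of the six metrics (accuracy and multiclass MCC). All function names are illustrative.

```python
import math
import random
from collections import Counter, defaultdict

CLASSES = ["normal", "obesity_I", "obesity_II", "obesity_III"]

def make_toy_data(n=400, seed=0):
    """Synthetic stand-in for the encoded dataset (NOT the study's data)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.randrange(len(CLASSES))
        # Hypothetical categorical features loosely tied to the class label.
        family_history = int(rng.random() < 0.2 + 0.2 * label)
        high_cal_food = int(rng.random() < 0.3 + 0.2 * label)
        activity = max(0, min(3, 3 - label + rng.choice([-1, 0, 0, 1])))
        rows.append(((family_history, high_cal_food, activity), label))
    return rows

def train_nb(rows, alpha=1.0):
    """Count class and per-feature value frequencies for categorical Naive Bayes."""
    class_counts = Counter(y for _, y in rows)
    feat_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    values = defaultdict(set)           # feature index -> values seen in training
    for x, y in rows:
        for j, v in enumerate(x):
            feat_counts[(j, y)][v] += 1
            values[j].add(v)
    return class_counts, feat_counts, values, alpha, len(rows)

def predict_nb(model, x):
    """Return the class with the highest log-posterior under the independence assumption."""
    class_counts, feat_counts, values, alpha, n = model
    best, best_score = None, -math.inf
    for c, nc in class_counts.items():
        score = math.log(nc / n)  # log prior
        for j, v in enumerate(x):
            # Laplace-smoothed conditional probability P(x_j = v | class = c).
            num = feat_counts[(j, c)][v] + alpha
            den = nc + alpha * len(values[j])
            score += math.log(num / den)
        if score > best_score:
            best, best_score = c, score
    return best

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def mcc(y_true, y_pred):
    """Multiclass Matthews Correlation Coefficient computed from label counts."""
    s = len(y_true)
    c = sum(t == p for t, p in zip(y_true, y_pred))
    t_k, p_k = Counter(y_true), Counter(y_pred)
    cov = c * s - sum(t_k[k] * p_k[k] for k in set(t_k) | set(p_k))
    denom = math.sqrt((s * s - sum(v * v for v in p_k.values())) *
                      (s * s - sum(v * v for v in t_k.values())))
    return cov / denom if denom else 0.0

def evaluate_splits(rows, ratios=(0.75, 0.80, 0.85), seed=42):
    """Shuffle once, then train and test at each training proportion."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    results = {}
    for r in ratios:
        cut = round(len(shuffled) * r)
        train, test = shuffled[:cut], shuffled[cut:]
        model = train_nb(train)
        y_true = [y for _, y in test]
        y_pred = [predict_nb(model, x) for x, _ in test]
        results[r] = {"n_test": len(test),
                      "accuracy": accuracy(y_true, y_pred),
                      "mcc": mcc(y_true, y_pred)}
    return results

if __name__ == "__main__":
    for ratio, m in evaluate_splits(make_toy_data()).items():
        print(f"train={ratio:.2f} n_test={m['n_test']} "
              f"acc={m['accuracy']:.3f} mcc={m['mcc']:.3f}")
```

Scores from this toy data say nothing about the reported results; the sketch only shows how a single shuffled dataset can be partitioned at 75:25, 80:20, and 85:15 and scored with the same metrics at each ratio. A simple random split is used here for brevity; whether the study stratified its partitions by class is not stated.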
License
Copyright (c) 2026 Yusion Gandjang, Amaliah Safitri K, Nabila Dwi Anugra, Iyang Yuyung S, Akhmad Affandi

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.