Data-Driven Obesity Classification Integrating Genetic and Lifestyle Determinants Using Naive Bayes

Authors

  • Yusion Gandjang, Universitas Negeri Makassar
  • Amaliah Safitri K, Universitas Negeri Makassar
  • Nabila Dwi Anugra, Universitas Negeri Makassar
  • Iyang Yuyung S, Universitas Negeri Makassar
  • Akhmad Affandi, Dresden University

DOI:

https://doi.org/10.66053/aieds.v1i2.21

Keywords:

Classification, Obesity, Genetic factors, Lifestyle, Naive Bayes

Abstract

Purpose – This study aims to develop a data-driven obesity classification framework that integrates genetic predisposition and lifestyle determinants using the Naive Bayes algorithm, while empirically evaluating optimal training–testing data proportions for health decision support systems.
Methods – A systematic computational workflow was applied to a public obesity dataset comprising 2,112 records, which was reduced to 1,259 valid instances after preprocessing. Genetic indicators and lifestyle-related variables were encoded, and each record was classified into one of four weight-status categories: normal weight, obesity type I, obesity type II, and obesity type III. The Naive Bayes model was evaluated using three training–testing data partition ratios (75:25, 80:20, and 85:15). Model performance was assessed using six metrics: Area Under the Curve (AUC), classification accuracy, F1-score, precision, recall, and Matthews Correlation Coefficient (MCC).
Findings – The results demonstrate that the 80:20 and 85:15 data partitions achieved the highest performance, with an accuracy of 0.878 and an AUC of 0.979. The model showed excellent sensitivity in identifying severe obesity cases, while moderate misclassification occurred between obesity type I and type II due to phenotypic overlap in lifestyle patterns.
Research limitations – This study relies on a single public dataset and lacks population-specific genetic calibration, which may limit generalizability to diverse regional contexts.
Originality – This study provides empirical validation of a probabilistic obesity classification framework that integrates genetic and lifestyle factors, offering an interpretable and computationally efficient approach to support data-driven health decision making.
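
The evaluation protocol described under Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the software, the Naive Bayes variant, or the exact features, so the categorical Naive Bayes with Laplace smoothing, the three feature names, and the synthetic data generator below are all assumptions made for illustration. Only the 1,259-record sample size, the four classes, and the three split ratios come from the abstract. For brevity the sketch reports accuracy, macro F1, and multiclass MCC, and omits AUC, precision, and recall.

```python
import math
import random
from collections import Counter, defaultdict

def make_synthetic(n, seed=0):
    """Hypothetical stand-in data, NOT the study's dataset.
    Features: (family_history, activity_level, snacking); label 0..3
    stands for normal weight, obesity type I, II, III."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.randrange(4)
        family = 1 if rng.random() < 0.2 + 0.2 * label else 0
        activity = min(3, max(0, 3 - label + rng.choice([-1, 0, 0, 1])))
        snacking = min(3, max(0, label + rng.choice([-1, 0, 0, 1])))
        rows.append(((family, activity, snacking), label))
    return rows

def fit(train, alpha=1.0):
    """Count class frequencies and per-class feature-value frequencies."""
    priors = Counter(y for _, y in train)
    counts = defaultdict(Counter)   # (feature index, class) -> value counts
    values = defaultdict(set)       # feature index -> values seen in training
    for x, y in train:
        for j, v in enumerate(x):
            counts[(j, y)][v] += 1
            values[j].add(v)
    return priors, counts, values, alpha, len(train)

def predict(model, x):
    """Return the class with the highest log-posterior under the
    conditional-independence assumption (Laplace-smoothed likelihoods)."""
    priors, counts, values, alpha, n = model
    best, best_lp = None, -math.inf
    for y, ny in priors.items():
        lp = math.log(ny / n)
        for j, v in enumerate(x):
            lp += math.log((counts[(j, y)][v] + alpha)
                           / (ny + alpha * len(values[j])))
        if lp > best_lp:
            best, best_lp = y, lp
    return best

def evaluate(y_true, y_pred):
    """Accuracy, macro-averaged F1, and multiclass MCC."""
    s = len(y_true)
    c = sum(t == p for t, p in zip(y_true, y_pred))
    t_k, p_k = Counter(y_true), Counter(y_pred)
    labels = set(t_k) | set(p_k)
    f1s = []
    for k in labels:
        tp = sum(t == p == k for t, p in zip(y_true, y_pred))
        denom = t_k[k] + p_k[k]     # equals 2*TP + FP + FN
        f1s.append(2 * tp / denom if denom else 0.0)
    num = c * s - sum(p_k[k] * t_k[k] for k in labels)
    den = math.sqrt((s * s - sum(v * v for v in p_k.values()))
                    * (s * s - sum(v * v for v in t_k.values())))
    return {"accuracy": c / s,
            "macro_f1": sum(f1s) / len(f1s),
            "mcc": num / den if den else 0.0}

def run_split(rows, train_frac, seed=42):
    """Shuffle once, hold out the tail as the test set, then fit and score."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    train, test = shuffled[:cut], shuffled[cut:]
    model = fit(train)
    y_true = [y for _, y in test]
    y_pred = [predict(model, x) for x, _ in test]
    return len(test), evaluate(y_true, y_pred)

data = make_synthetic(1259)         # same size as the cleaned dataset
results = {frac: run_split(data, frac) for frac in (0.75, 0.80, 0.85)}
for frac, (n_test, metrics) in results.items():
    print(f"train={frac:.2f} test_n={n_test} "
          + " ".join(f"{k}={v:.3f}" for k, v in metrics.items()))
```

Because the data here are synthetic, the printed scores say nothing about the reported 0.878 accuracy or 0.979 AUC; the sketch only shows how a single model family can be scored under several partition ratios with a fixed shuffle seed so that the comparisons differ only in the split.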

Published

2026-02-07