TY - JOUR
TI - Heart failure risk prediction using azure data lake architecture with automated machine learning and machine learning approaches
AU - Alghamdi Ahmed Mohammed
AU - Al Shehri Waleed
AU - Almalki Jameel
AU - Jannah Najlaa
AU - Bahaddad Adel A
AU - Bokhary Abdullah M
JN - Thermal Science
PY - 2024
VL - 28
IS - 6
SP - 5059
EP - 5069
PT - Article
AB - Cardiovascular disease is a chronic disease and a leading cause of death through heart failure and stroke. The WHO records 17.9 million deaths yearly due to heart-related diseases. Heart failure occurs worldwide and has an especially significant impact in low- and middle-income countries. Early diagnosis of heart disease is needed because a patient can face serious complications if the disease is detected only in its later stages. In addition, heart disease that is identified early is likely to be cured. Identifying the symptoms of heart failure is likewise necessary for an accurate and optimal solution. This paper proposes a model for the early diagnosis of heart disease. First, data analysis is performed, and pre-processing approaches are applied to prepare the dataset for model training. The raw data contain noise and missing values, which are treated before the data are passed to the model. Second, two types of algorithms are trained for the proposed solution. Traditional machine learning algorithms (support vector machine, k-nearest neighbors, logistic regression, random forest, artificial neural network, decision tree, XGBoost, and CatBoost) are used to train and test the model. In parallel, automated machine learning (AutoML) on an Azure Machine Learning cloud instance is used for model training and testing. Azure Data Lake cloud storage is used for model training and for running the AutoML process.
Finally, the performance of the models was evaluated using an open-source University of California Irvine (UCI) machine learning dataset for heart failure diagnosis. AutoML outperformed the traditional algorithms. The best traditional machine learning algorithm, XGBoost, reached an accuracy of 82.22%, whereas AutoML reached an accuracy of 88%. Given its performance and the approach applied, the proposed model can be used for clinical purposes.
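
The pre-processing and evaluation steps named in the abstract can be illustrated with a minimal sketch. It rests on assumptions the abstract does not confirm: median imputation stands in for the unspecified missing-value treatment, the column names and all values are hypothetical toy data, and the helper names `impute_median` and `accuracy` are illustrative rather than taken from the paper.

```python
from statistics import median


def impute_median(rows):
    """Replace None entries in each column with that column's median.

    Assumption: median imputation is only one plausible treatment;
    the abstract does not state which method the authors applied.
    """
    columns = list(zip(*rows))
    fills = [median(v for v in col if v is not None) for col in columns]
    return [
        [fills[j] if value is None else value for j, value in enumerate(row)]
        for row in rows
    ]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical toy records: [age, ejection_fraction]; None marks a missing value.
records = [[60, 38], [None, 25], [72, None], [55, 45]]
cleaned = impute_median(records)

# Hypothetical labels and predictions, purely illustrative.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0]

print(cleaned)
print(f"accuracy = {accuracy(y_true, y_pred):.2%}")
```

Accuracy here is the share of correctly classified test records, the same kind of quantity as the 82.22% and 88% figures reported above; the toy numbers in the sketch are unrelated to those results.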