| Abstract: |
Road traffic accidents remain a persistent global public health crisis that disproportionately affects developing nations, including India, where more than 461,000 accidents were recorded in 2022, claiming 168,491 lives. This empirical study applies machine learning algorithms to road accident data for severity classification, risk factor identification, and predictive modeling, using a compiled dataset of 52,814 accident records drawn from Indian national highway and urban road networks and spanning 2018 to 2022. Seven machine learning algorithms were evaluated: logistic regression, decision trees, random forests, support vector machines, Naive Bayes, XGBoost, and long short-term memory (LSTM) neural networks, alongside an ensemble stacking model. Experimental results show that XGBoost achieved the highest individual accuracy, 91.3 percent, with an AUC-ROC of 0.953, while the ensemble stacking model achieved 93.7 percent accuracy and an F1-score of 0.911, a statistically significant improvement over traditional statistical classifiers. Feature importance analysis identified vehicle speed, road surface condition, light condition, junction type, and alcohol impairment as the five most predictive variables for fatal accident outcomes. Severity analysis across road types showed that rural roads carry the highest fatality rate, 11.3 percent, despite lower overall accident volumes, underscoring the need for targeted rural safety interventions. Compared with analogous international studies, the ensemble model performs competitively, ranking among the top three accuracy values reported in the published literature. These findings provide data-driven evidence for deploying machine learning within India's intelligent transportation infrastructure and offer a replicable empirical framework for road safety analysis in data-constrained environments. |
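
For readers less familiar with the reported metrics, the following minimal sketch shows how accuracy and an F1-score can be computed for a single positive class (here, fatal versus non-fatal). It is an illustration only: the function name `binary_metrics`, the binary framing, and the toy labels are assumptions, and the abstract does not state whether the reported F1-score is binary, macro-averaged, or weighted.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for one positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")

    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts with respect to the positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels invented for illustration (1 = fatal, 0 = non-fatal).
# These are NOT taken from the study's dataset.
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Because fatal outcomes are typically a minority class, accuracy alone can look strong even when fatal cases are missed, which is one reason to report an F1-score alongside it.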