Analytical formulation of synthetic minority oversampling technique (SMOTE) for imbalanced learning

Firuz Kamalov
Salah Eddine Choutri
Amir F. Atiya

Abstract

Imbalanced data affects many applications in machine learning and data science. The synthetic minority oversampling technique (SMOTE) is a common method for artificially balancing such data. Despite the popularity of SMOTE, little is known about its analytical properties. In this paper, we develop a precise theoretical formulation of the sampling distribution of SMOTE in several important cases. We also show that the distribution of the SMOTE-generated variable Z converges in mean to that of the true underlying variable X. The results provide a better understanding of SMOTE and other sampling algorithms. In addition, we uncover surprising connections to information theory, Euler's constant, and compound distributions.
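
As background, the sampling step of standard SMOTE can be sketched as below: pick a minority point, pick one of its k nearest minority neighbors, and draw a point uniformly on the segment between them. This is a minimal illustration of the commonly described algorithm, not code from the article; the function name `smote_sample` and its parameters are hypothetical, and the brute-force neighbor search assumes a small minority set with at least two points.

```python
import numpy as np

def smote_sample(X_min, n_new, k=5, rng=None):
    """Draw n_new synthetic points from minority samples X_min (illustrative sketch)."""
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    n = X_min.shape[0]
    k = min(k, n - 1)  # a point cannot be its own neighbor

    # Brute-force pairwise squared Euclidean distances between minority points.
    diff = X_min[:, None, :] - X_min[None, :, :]
    d2 = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(d2, np.inf)  # exclude self-matches

    # Indices of the k nearest neighbors of each minority point.
    nn = np.argsort(d2, axis=1)[:, :k]

    # Choose a base point and one of its neighbors uniformly at random.
    base = rng.integers(0, n, size=n_new)
    nbr = nn[base, rng.integers(0, k, size=n_new)]

    # Interpolate: Z = X_base + U * (X_nbr - X_base), with U ~ Uniform(0, 1).
    u = rng.random((n_new, 1))
    return X_min[base] + u * (X_min[nbr] - X_min[base])
```

Each synthetic point is a convex combination of two existing minority points, so the generated sample always lies inside the convex hull of the observed minority data. That is one reason its distribution can differ from the true underlying distribution.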

Article Details

How to Cite
Kamalov, F., Choutri, S. E., & Atiya, A. F. (2025). Analytical formulation of synthetic minority oversampling technique (SMOTE) for imbalanced learning. Gulf Journal of Mathematics, 19(1), 400-415. https://doi.org/10.56947/gjom.v19i1.2639
Section
Articles