Asymptotic behavior of SMOTE-generated samples using order statistics

Firuz Kamalov

Abstract

Imbalanced datasets often lead to biased machine learning models that underperform on minority classes. The Synthetic Minority Over-sampling Technique (SMOTE) addresses this by generating synthetic samples for the minority class. Despite its empirical success, the theoretical properties of SMOTE remain underexplored. This paper investigates the asymptotic behavior of SMOTE-generated random variables using order statistics. We establish that, as the sample size n increases, the SMOTE-generated variable Z conditioned on the k-th order statistic X(k) converges in mean to X(k). Additionally, the expected value of Z converges to the expected value of the original random variable X as n approaches infinity. The results are derived under the assumption of left-bounded support for X. Our findings provide a theoretical foundation for SMOTE, demonstrating that it preserves key statistical properties of the original data distribution in large samples.
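
The two convergence claims in the abstract can be checked informally by simulation. The sketch below is not the paper's construction; it is a minimal one-dimensional illustration under stated assumptions: X is exponential with scale 1 (so its support is left-bounded at 0 and E[X] = 1), each synthetic point Z interpolates between a randomly chosen order statistic X(k) and its single nearest neighbor, and the interpolation weight is uniform on [0, 1]. The function name `smote_1d` and all parameters are hypothetical.

```python
import numpy as np


def smote_1d(sample, n_synthetic, rng):
    """Illustrative 1-D SMOTE: interpolate between a base point and its
    nearest neighbor. Assumes 1-nearest-neighbor and len(sample) >= 2."""
    xs = np.sort(sample)                      # order statistics X(1) <= ... <= X(n)
    n = xs.size
    k = rng.integers(0, n, size=n_synthetic)  # 0-based index of the base point X(k)

    left = xs[np.maximum(k - 1, 0)]           # left neighbor (clamped at the boundary)
    right = xs[np.minimum(k + 1, n - 1)]      # right neighbor (clamped at the boundary)

    # Boundary points have only one neighbor, so the missing side gets an infinite gap.
    left_gap = np.where(k > 0, xs[k] - left, np.inf)
    right_gap = np.where(k < n - 1, right - xs[k], np.inf)

    # In one dimension the nearest neighbor is always an adjacent order statistic.
    neighbor = np.where(left_gap < right_gap, left, right)

    u = rng.uniform(0.0, 1.0, size=n_synthetic)   # interpolation weight
    z = xs[k] + u * (neighbor - xs[k])            # synthetic point on the segment
    return xs[k], z


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for n in (10, 100, 1000, 10000):
        sample = rng.exponential(scale=1.0, size=n)   # assumed distribution, E[X] = 1
        base, z = smote_1d(sample, 20000, rng)
        # Mean absolute deviation of Z from its base point, and the mean of Z.
        print(n, np.mean(np.abs(z - base)), z.mean())
```

Under these assumptions, the first printed quantity should shrink as n grows, in line with Z concentrating around X(k), and the second should approach 1, the mean of the assumed distribution. For small n the mean of Z tracks the mean of that particular sample rather than E[X], so some scatter is expected.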

Article Details

How to Cite
Kamalov, F. (2024). Asymptotic behavior of SMOTE-generated samples using order statistics. Gulf Journal of Mathematics, 17(2), 327-336. https://doi.org/10.56947/gjom.v17i2.2343
Section
Articles