Asymptotic behavior of SMOTE-generated samples using order statistics
Abstract
Imbalanced datasets often lead to biased machine learning models that underperform on minority classes. The Synthetic Minority Over-sampling Technique (SMOTE) addresses this by generating synthetic samples for the minority class. Despite its empirical success, the theoretical properties of SMOTE remain underexplored. This paper investigates the asymptotic behavior of SMOTE-generated random variables using order statistics. We establish that, as the sample size n increases, the SMOTE-generated variable Z, conditioned on the k-th order statistic X(k), converges in mean to X(k). Additionally, the expected value of Z converges to the expected value of the original random variable X as n approaches infinity. Both results are derived under the assumption that X has left-bounded support. These findings provide a theoretical foundation for SMOTE, showing that in large samples it preserves key statistical properties of the original data distribution, in particular its expected value.
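To make the mechanism concrete, here is a minimal one-dimensional simulation sketch. It is not the paper's construction or proof. It assumes the standard SMOTE rule Z = X + U(X_nn − X), with X drawn uniformly from the sample, X_nn drawn uniformly from the K nearest neighbors of X, and U ~ Uniform(0, 1). The Exponential(1) distribution is used purely as an illustrative choice because its support [0, ∞) is left-bounded and E[X] = 1. The function name `smote_1d` and the parameter `k_neighbors` are hypothetical.

```python
import random
import statistics


def smote_1d(sample, n_synthetic, k_neighbors=5, rng=None):
    """Generate synthetic points with a 1-D SMOTE-style interpolation rule.

    Assumption (standard SMOTE, not necessarily the paper's exact setup):
    each synthetic point is z = x + u * (x_nn - x), where x is a sample point
    chosen uniformly at random, x_nn is chosen uniformly from the k nearest
    neighbors of x, and u ~ Uniform(0, 1).
    """
    rng = rng or random.Random()
    xs = sorted(sample)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two sample points")
    k = min(k_neighbors, n - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(n)
        x = xs[i]
        # In sorted 1-D data, the k nearest neighbors of xs[i] lie within xs[i-k .. i+k].
        lo, hi = max(0, i - k), min(n, i + k + 1)
        candidates = sorted((j for j in range(lo, hi) if j != i),
                            key=lambda j: abs(xs[j] - x))
        x_nn = xs[rng.choice(candidates[:k])]
        u = rng.random()  # interpolation weight drawn from Uniform[0, 1)
        synthetic.append(x + u * (x_nn - x))
    return synthetic


# Illustration only: Exponential(1) has left-bounded support [0, inf) and E[X] = 1.
rng = random.Random(0)
sample = [rng.expovariate(1.0) for _ in range(2000)]
synthetic = smote_1d(sample, n_synthetic=20000, k_neighbors=5, rng=rng)

print(f"sample mean:    {statistics.fmean(sample):.3f}")
print(f"synthetic mean: {statistics.fmean(synthetic):.3f}")
```

Because each synthetic point is interpolated between a sample point and one of its close neighbors, the shift it introduces shrinks as neighbor spacing shrinks with larger n. Under these assumptions, the empirical mean of the synthetic points should therefore sit close to E[X] = 1. This is a simulation check, not a substitute for the asymptotic argument.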