Faculty of Economics and Business Administration Publications Database

Shedding Light on the Role of Sample Sizes and Splitting Proportions in Out-of-Sample Tests: A Monte Carlo Cross-Validation Approach

Pages: 245-259
Year: 2017
Keywords: Out-of-Sample Testing; Monte Carlo Cross-Validation; 2/3 Rule-of-thumb; Logistic Regression; Decision Tree Algorithm
Abstract: We examine whether the popular 2/3 rule-of-thumb splitting criterion used in out-of-sample evaluation of predictive econometric and machine learning models is justified. We conduct simulations of the predictive performance of logistic regression and the decision tree algorithm across varying splitting points and sample sizes. Our non-exhaustive repeated random sub-sampling simulation approach, known as Monte Carlo cross-validation, indicates that while the 2/3 rule-of-thumb works, there is a spectrum of different splitting proportions that yield equally compelling results. Furthermore, our results indicate that the size of the complete sample has little impact on the applicability of the 2/3 rule-of-thumb. However, our analysis reveals that when the training sample is relatively small or relatively large in relation to the complete sample, the variation in predictive accuracy can lead to misleading results. Our results are especially important for IS researchers considering out-of-sample methods for evaluating their predictive models.
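
Illustration: the repeated random sub-sampling procedure named in the abstract can be sketched roughly as follows. This is a minimal sketch and not the authors' simulation design. The synthetic one-feature data, the sample size of 300, the 200 repeats, the grid of training fractions, and all function names are assumptions made for illustration. To keep the example dependency-free, a simple nearest-centroid classifier stands in for the logistic regression and decision tree models studied in the paper.

```python
import random
import statistics


def make_data(n, rng):
    """Synthetic binary data (assumed): one feature, class means at -1 and +1."""
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = rng.gauss(1.0 if y == 1 else -1.0, 1.0)
        data.append((x, y))
    return data


def fit_centroids(train):
    """Stand-in model: the per-class mean of the single feature."""
    means = {}
    for label in (0, 1):
        xs = [x for x, y in train if y == label]
        # Fall back to 0.0 if a class is absent from a very small training set.
        means[label] = statistics.fmean(xs) if xs else 0.0
    return means


def accuracy(means, test):
    """Share of test points whose nearest class centroid matches the label."""
    hits = 0
    for x, y in test:
        pred = min(means, key=lambda label: abs(x - means[label]))
        hits += int(pred == y)
    return hits / len(test)


def monte_carlo_cv(data, train_fraction, repeats, rng):
    """Repeated random train/test splits at a fixed training fraction.

    Returns the mean and standard deviation of out-of-sample accuracy.
    """
    n_train = int(round(train_fraction * len(data)))
    n_train = max(1, min(len(data) - 1, n_train))  # keep both parts non-empty
    scores = []
    for _ in range(repeats):
        shuffled = data[:]
        rng.shuffle(shuffled)
        train, test = shuffled[:n_train], shuffled[n_train:]
        scores.append(accuracy(fit_centroids(train), test))
    return statistics.fmean(scores), statistics.stdev(scores)


rng = random.Random(0)
data = make_data(300, rng)
results = {}
for frac in (0.1, 0.3, 0.5, 2 / 3, 0.9):
    results[frac] = monte_carlo_cv(data, frac, repeats=200, rng=rng)
    mean_acc, sd_acc = results[frac]
    print(f"train fraction {frac:.2f}: mean acc {mean_acc:.3f}, sd {sd_acc:.3f}")
```

Comparing the reported standard deviations across training fractions is one way to look at the variation in predictive accuracy that the abstract mentions. The numbers from this toy setup say nothing about the paper's actual results.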