Nonsense Correlation and Biased Correlation due to Heterogeneous Samples

Thadewald, Thorsten
Volume: 52
Pages: 367 - 379
Year: 2003
Keywords: Hidden qualitative variable; Heterogeneous groups; Pooled estimation
Abstract: The case of two variables is considered, where the sample consists of two heterogeneous groups. The behaviour of the pooled sample correlation coefficient is studied. The heterogeneity of the two groups may be interpreted as a hidden qualitative variable. It is shown that, even if the correlation is the same within both groups, the pooled correlation coefficient may be severely biased owing to heterogeneity of other group-specific parameters. In the case of uncorrelatedness, nonsensical correlation may arise from pooled estimation. These and further results are obtained and can be quantified or forecast from an asymptotic formula for the pooled sample correlation coefficient, which is well reproduced in finite sample computer experiments and illustrated with empirical examples.
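
The effect described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's asymptotic formula; it assumes bivariate normal groups and uses the standard mixture-moment identity (law of total covariance) for the population correlation of a two-group mixture. The function names, group means, sample sizes, and seed are all hypothetical choices made for illustration.

```python
import numpy as np


def mixture_correlation(p, mean1, mean2, cov1, cov2):
    """Population correlation of a two-group mixture.

    p is the share of group 1. Uses the law of total covariance:
    Cov = p*Cov1 + (1-p)*Cov2 + p*(1-p) * (mean1-mean2)(mean1-mean2)^T.
    This is a textbook identity, assumed here for illustration only.
    """
    mean1, mean2 = np.asarray(mean1, float), np.asarray(mean2, float)
    cov1, cov2 = np.asarray(cov1, float), np.asarray(cov2, float)
    diff = mean1 - mean2
    pooled_cov = p * cov1 + (1 - p) * cov2 + p * (1 - p) * np.outer(diff, diff)
    return pooled_cov[0, 1] / np.sqrt(pooled_cov[0, 0] * pooled_cov[1, 1])


def simulate_pooled_correlation(n1, n2, mean1, mean2, cov1, cov2, seed=0):
    """Draw both groups, then return (r_group1, r_group2, r_pooled)."""
    rng = np.random.default_rng(seed)
    g1 = rng.multivariate_normal(mean1, cov1, size=n1)
    g2 = rng.multivariate_normal(mean2, cov2, size=n2)
    pooled = np.vstack([g1, g2])  # group label is dropped: the "hidden" variable
    r1 = np.corrcoef(g1[:, 0], g1[:, 1])[0, 1]
    r2 = np.corrcoef(g2[:, 0], g2[:, 1])[0, 1]
    r_pooled = np.corrcoef(pooled[:, 0], pooled[:, 1])[0, 1]
    return r1, r2, r_pooled


# Hypothetical setup: both groups are uncorrelated internally (identity
# covariance), but their means differ in both variables.
identity = np.eye(2)
r1, r2, r_pooled = simulate_pooled_correlation(
    5000, 5000, [0.0, 0.0], [3.0, 3.0], identity, identity
)
r_theory = mixture_correlation(0.5, [0.0, 0.0], [3.0, 3.0], identity, identity)

print(f"within group 1: {r1:.3f}")
print(f"within group 2: {r2:.3f}")
print(f"pooled sample:  {r_pooled:.3f}")
print(f"mixture theory: {r_theory:.3f}")
```

With these assumed parameters the within-group correlations are close to zero, while the mixture identity gives a pooled correlation of 2.25 / 3.25, roughly 0.69, purely because the group means differ in both variables. Setting the two mean vectors equal removes the effect in this setup.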