Troglodyte 27 wrote:
Random errors have a Normal distribution. A skewed data report would more likely resemble a Weibull distribution (a long tail on one side). It wouldn't take 800 of 1500 to reveal a bias or flaw. It is easy to run a non-parametric analysis (the Kolmogorov-Smirnov test) on the cumulative distribution function (CDF) of the data and compare it with the Normal CDF ± the Kolmogorov band. In some of my research, I've had to work back to the sample size needed to achieve a given sensitivity, i.e. the ability to discriminate between Normal and Weibull. I have found that if all the samples follow a Weibull distribution, it takes fewer than 25 samples to discriminate between the two distributions. However, we cannot assume that every sample has been manipulated and would thus follow a Weibull distribution.
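A minimal sketch of that comparison, using only the Python standard library. It computes the K-S statistic D between the data's empirical CDF and a Normal CDF fitted from the sample mean and standard deviation, then flags the data if D exceeds the large-sample critical value d(α, n) ≈ √(−ln(α/2)/2)/√n. The function names, the Weibull shape of 1.0 in the demo and the random seed are illustrative assumptions, not the setup used in the research mentioned above. Because the Normal parameters are estimated from the same data, the standard K-S critical values are conservative here; the Lilliefors variant of the test corrects for that.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def ks_statistic_vs_normal(sample):
    """One-sample Kolmogorov-Smirnov statistic D between the sample's
    empirical CDF and a Normal CDF fitted by the sample mean and std dev."""
    xs = sorted(sample)
    n = len(xs)
    fitted = NormalDist(fmean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs):
        f = fitted.cdf(x)
        # The empirical CDF steps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def ks_critical(alpha, n):
    """Large-sample approximation of the two-sided K-S critical value d(alpha, n)."""
    return math.sqrt(-0.5 * math.log(alpha / 2)) / math.sqrt(n)


def looks_non_normal(sample, alpha=0.05):
    """True if the empirical CDF leaves the Normal +/- d(alpha, n) band somewhere."""
    return ks_statistic_vs_normal(sample) > ks_critical(alpha, len(sample))


if __name__ == "__main__":
    rng = random.Random(1)  # hypothetical seed, for repeatability only
    n = 25
    normal_sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Weibull with scale 1.0 and shape 1.0 (an exponential): an assumed,
    # strongly skewed stand-in for "manipulated" data.
    weibull_sample = [rng.weibullvariate(1.0, 1.0) for _ in range(n)]
    print("Normal sample flagged: ", looks_non_normal(normal_sample))
    print("Weibull sample flagged:", looks_non_normal(weibull_sample))
```

With only 25 points a single run can go either way; repeating it over many seeds and counting how often the Weibull sample is flagged gives a rough estimate of the test's power at that sample size.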
So, since the coefficient c(α) in the Kolmogorov band width d(α, n) ≈ c(α)/√n is already essentially at its large-sample value by 120 data points, we could probably get a decent indication using as few as 120 random samples nationwide.
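To see how the band width scales with sample size, here is a short sketch using the standard large-sample approximation d(α, n) ≈ c(α)/√n, where c(α) = √(−ln(α/2)/2) ≈ 1.36 at α = 0.05. Using the approximation (rather than exact small-sample tables) is an assumption, and the sample sizes in the loop are only examples. Note that d(α, n) depends only on n and α, not on the size of the population the random sample is drawn from.

```python
import math


def ks_critical(alpha, n):
    """Large-sample approximation of the two-sided K-S critical value:
    d(alpha, n) ~= c(alpha) / sqrt(n), with c(alpha) = sqrt(-ln(alpha/2) / 2)."""
    return math.sqrt(-0.5 * math.log(alpha / 2)) / math.sqrt(n)


if __name__ == "__main__":
    for n in (25, 120, 1500):
        print(f"n = {n:5d}: d(0.05, n) ~= {ks_critical(0.05, n):.3f}")
```

This prints about 0.272, 0.124 and 0.035 for n = 25, 120 and 1500, showing the band narrowing like 1/√n.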
The analysis could be done by any competent statistician, or you could just plug the data into off-the-shelf software (e.g., Minitab).