Not all Numbers are Mean

In [1] D. Stapel mentioned a test with 32 students with two attributes: secure / insecure and chose meat / chose vegetarian dish. He gave the following table:

60% || 40%
-----------
20% || 80%

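The number theorist in me quickly noticed something funny: these percentages are not possible. For integers n1 + n2 = 32, 60% * n1 and 20% * n2 are never close enough to integers at the same time. Assuming the percentages were rounded to whole numbers, each of them requires a group size divisible by 5 when the groups are this small, and 32 is not a multiple of 5. I pointed this out to the authors, but got no reply. A week later Stapel’s fraud was made public.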
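The claim can be checked by brute force. The following is a minimal sketch, assuming that the two rows of the table are the two groups (sizes n1 and n2), that every group has at least one student, and that the percentages were rounded to whole numbers. The helper name `feasible` is mine, not part of the original note.

```r
# Can a group of n students produce the reported whole-number percentage?
# Assumption: percentages were rounded to the nearest whole percent.
feasible <- function(n, pct) {
  counts <- 0:n                          # possible numbers of students in a cell
  any(round(100 * counts / n) == pct)
}

n1 <- 1:31        # size of the first group
n2 <- 32 - n1     # size of the second group
ok <- mapply(function(a, b) feasible(a, 60) && feasible(b, 20), n1, n2)

sum(ok)                                   # number of consistent splits: 0
which(sapply(1:31, feasible, pct = 60))   # 5 10 15 20 25 30
```

Group sizes up to 31 can show 60% (or 20%) only if they are multiples of 5, so no split of 32 students works.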

This is a simple example of a much more general phenomenon for discrete distributions. Not all values are possible for the mean, the standard deviation, etc. I will give some examples using R code.

Take N samples with replacement from a probability distribution on the numbers 1 to k. The sum of these samples ranges from N to k*N, so there are 1+(k-1)*N possible sums, and hence at most that many possible means. On the other hand, the sample mean is a number between 1 and k, and if we write the mean with 2 decimals, there are 1+(k-1)*100 different numbers. If N is small (N < 100), only about N% of all these 2-decimal numbers are possible. In other words, for an integer n < k, the interval [n, n+1) contains precisely N possible values. To test a reported mean m, round m*N to the nearest integer sum, divide by N, and check whether the result rounds back to m:

> m <- 4.13; N <- 16;
> round(round(m*N, 0)/N, 2)
[1] 4.12
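The counting argument can also be verified by listing every attainable mean. This is only a sketch; N = 16 and k = 7 are illustrative values, and the variable names are my own.

```r
N <- 16   # sample size (illustrative)
k <- 7    # scores run from 1 to k (illustrative)

sums  <- N:(k * N)                     # every attainable sum
means <- unique(round(sums / N, 2))    # attainable means, two decimals
grid  <- 1 + (k - 1) * 100             # all two-decimal values in [1, k]

length(means)                  # 1 + (k - 1) * N = 97
length(means) / grid           # about 0.16, i.e. roughly N%
sum(means >= 4 & means < 5)    # N = 16 values in [4, 5)
```

For these values, only 97 of the 601 two-decimal numbers between 1 and 7 can occur as a mean.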

Let’s try the situation of 16 students that score some items on a scale of 1 to 7 (a Likert scale, very popular in circles of social psychology).

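# Draw N scores from 1..k with replacement (uniform unless prob = ... is passed on)
v <- function(N, k, ...){sample(1:k, N, replace = TRUE, ...)}
# Distinct means found in 10000 simulated samples of 16 scores on a 1-7 scale
u <- unique(sapply(1:10000, function(i){mean(v(16,7))}))
round(sort(u), 2)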

This will give a row of numbers starting with 2.31 2.38 2.44 2.50 2.56 2.62 2.69 2.75 2.81 2.88 2.94 3.00 or similar. These are the means that can actually occur. Using the test above we can make a list of the small N’s that can result in a given mean value such as 4.13. Sixteen (16) is not among them:

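> b <- list()  # lookup table: entry i holds the sample sizes N <= 100 consistent with mean i/100
> b[100:700] <- sapply(100:700, function(i){m <- round(i/100, 2); return((1:100)[m == round(round(m*(1:100), 0)/(1:100), 2)]) })
> fN <- function(m){b[round(100*m)]}  # round() guards against 100*m falling just below an integer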
> fN(4.13)
[[1]]
[1] 15 23 30 31 38 39 45 46 47 52 53 54 55 60 61
[16] 62 63 67 68 69 70 71 75 76 77 78 79 82 83 84
[31] 85 86 87 89 90 91 92 93 94 95 97 98 99 100
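The same check can be written for a single mean without a lookup table. The sketch below is one possible version; the name `sizes_for_mean` and its `half_up` argument are hypothetical and not from the original note. It also exposes a subtlety: 66/16 = 4.125 is an exact tie, which R's `round()` resolves to the even digit (4.12). Software that rounds ties upward would report 4.13, so the verdict on N = 16 depends on the rounding convention that was used.

```r
# Hypothetical helper: sample sizes N <= max_n for which some integer sum
# gives a mean that is reported as m (two decimals).
# half_up = FALSE: only sizes that yield m under any tie-breaking rule.
# half_up = TRUE : also sizes where the mean is exactly m - 0.005,
#                  which a round-half-up convention would report as m.
sizes_for_mean <- function(m, max_n = 100, half_up = FALSE) {
  n <- 1:max_n
  d <- round(m * n) / n - m               # signed gap: nearest attainable mean minus m
  eps <- 1e-9                             # slack for floating-point noise
  inside <- abs(d) < 0.005 - eps          # strictly inside the rounding interval
  lower_tie <- abs(d + 0.005) <= eps      # exactly on the lower boundary
  if (half_up) n[inside | lower_tie] else n[inside]
}

head(sizes_for_mean(4.13))                       # 15 23 30 31 38 39
16 %in% sizes_for_mean(4.13)                     # FALSE
16 %in% sizes_for_mean(4.13, half_up = TRUE)     # TRUE
```

With `half_up = FALSE` the result for 4.13 agrees with the list above. With `half_up = TRUE` the multiples of 8 below 100 are added, because each of them can produce a mean of exactly 4.125.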

This is one of the “errors” Stapel made when he cheated: he gave values that were impossible given the small samples.

— o —

[1] R. Vonk 2011 circulated an unpublished note about “selfish” meat eaters by D. Stapel (e.a.) to the press (in Dutch) that contained faked numbers.