In [1], D. Stapel described a test with 32 students classified by two attributes: secure / insecure, and chose meat / chose a vegetarian dish. He gave the following table:

`60% || 40%`

`-----------`

`20% || 80%`

The number theorist in me quickly noticed something funny: these percentages are not possible. For integers *n1 + n2 = 32*, the products `60% * n1` and `20% * n2` are never close enough to integers at the same time. I pointed this out to the authors, but got no reply. A week later Stapel’s fraud was published.
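The claim about the two percentages can be checked by brute force. The following is a minimal sketch, assuming the table entries are percentages rounded to whole numbers and that the two rows refer to groups of sizes *n1* and *n2*; the helper name `achievable` is hypothetical and not part of the original analysis.

```r
# Hypothetical helper: TRUE if some count k out of n rounds to pct percent
# (assumption: the reported percentages were rounded to whole numbers).
achievable <- function(pct, n) {
  k <- 0:n
  any(abs(100 * k / n - pct) < 0.5)
}

total <- 32
n1 <- 1:(total - 1)   # size of the first group; the second has total - n1

# Keep the splits where 60% is attainable in one group and 20% in the other
ok <- sapply(n1, function(a) achievable(60, a) && achievable(20, total - a))
possible_splits <- n1[ok]
possible_splits       # integer(0): no split of 32 students fits both rows
```

For groups this small, 60% and 20% are only attainable when the group size is a multiple of 5, and two multiples of 5 cannot add up to 32.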

This is a simple example of a much more general phenomenon for discrete distributions: not all values are possible for the mean, the standard deviation, etc. I will give some examples using R code.

Take *N* samples with replacement from a probability distribution on the numbers *1* to *k*. The sum of these samples ranges from *N* to `k*N`, so there are `1 + (k-1)*N` possible sums. On the other hand, the sample mean is a number between *1* and *k*, and if we write the mean with *2* decimals, there are `1 + (k-1)*100` different numbers. If *N* is small (below 100), only about *N%* of all these *2*-decimal numbers are possible. In other words, for an integer *n < k*, precisely *N* of the values in the interval *[n, n+1)* are possible. Example test:

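`> m <- 4.13; N <- 16`

`> round(round(m*N, 0)/N, 2)`

`[1] 4.12`

The test rounds `m*N` to the nearest possible integer sum and converts it back to a mean. The result is 4.12, not 4.13, so a mean of 4.13 does not come out of 16 integer scores.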
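The counting argument can also be verified by enumerating every possible sum. This is a small sketch; the values `N = 16` and `k = 7` are chosen here purely for illustration.

```r
N <- 16   # sample size (illustrative choice)
k <- 7    # largest value on the scale (illustrative choice)

sums  <- N:(k * N)                    # every possible sum of N scores
means <- unique(round(sums / N, 2))   # the corresponding 2-decimal means

n_possible <- length(means)           # 1 + (k - 1) * N   = 97
n_written  <- 1 + (k - 1) * 100       # 1 + (k - 1) * 100 = 601
n_possible / n_written                # roughly N%, here about 0.16

in_4_5 <- sum(means >= 4 & means < 5) # possible means in [4, 5): exactly N = 16
in_4_5
```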

Let’s try the situation of *16* students who score some items on a scale of *1* to *7* (a **Likert scale**, very popular in social psychology circles).

`v <- function(N, k, ...){sample(1:k, N, replace = TRUE, ...)}`

`u <- unique(sapply(1:10000, function(i){mean(v(16,7))}))`

`round(sort(u), 2)`

This will give a row of numbers starting with **2.31 2.38 2.44 2.50 2.56 2.62 2.69 2.75 2.81 2.88 2.94 3.00** or similar. These are the means that can actually occur. Using the test above we can make a list of the small values of *N* that can result in a given mean value, here 4.13. Sixteen (16) is not among them:

`> b <- list()`

`> b[100:700] <- sapply(100:700, function(i){m <- round(i/100, 2); return((1:100)[m == round(round(m*(1:100), 0)/(1:100), 2)]) })`

`> fN <- function(m){b[round(100*m)]}`

`> fN(4.13)`

`[[1]]`

`[1] 15 23 30 31 38 39 45 46 47 52 53 54 55 60 61`

`[16] 62 63 67 68 69 70 71 75 76 77 78 79 82 83 84`

`[31] 85 86 87 89 90 91 92 93 94 95 97 98 99 100`
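The same lookup can be written as a standalone function that needs no precomputed table. This is a sketch only: the name `possible_N` and its parameters `max_N` and `digits` are hypothetical, and it applies the same rounding test as above.

```r
# Hypothetical helper: sample sizes N (up to max_N) for which a mean of m,
# reported with `digits` decimals, can arise from integer scores.
possible_N <- function(m, max_N = 100, digits = 2) {
  N <- 1:max_N
  total <- round(m * N)   # nearest integer sum for each N
  N[round(total / N, digits) == round(m, digits)]
}

res <- possible_N(4.13)
head(res)      # the smallest sample sizes compatible with a mean of 4.13
16 %in% res    # FALSE, in line with the list above
```

Borderline cases depend on the rounding convention. For *N = 16* the nearest sum is 66, and 66/16 = 4.125 is an exact tie, which R's `round` resolves to 4.12; under a round-half-up convention the same value would be reported as 4.13.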

This is one of the “errors” Stapel made when he cheated: he gave values that were impossible given the small samples.

— o —

[1] In 2011, R. Vonk circulated to the press an unpublished note (in Dutch) by D. Stapel et al. about “selfish” meat eaters; the note contained faked numbers.