I have a fairly simple SQL query (MySQL):
SELECT foo FROM bar ORDER BY rank, RAND()
I notice that when I refresh the results, the randomness is suspiciously weak.
In the sample data at the moment there are six results with equal rank (integer zero). There are lots of tests for randomness, but here is a simple one to do by hand: when the query is run twice, the first result should be the same in both runs about one sixth of the time. This is certainly not happening: the leading result is the same at least a third of the time.
I want a uniform distribution over the permutations. I'm not an expert statistician, but I'm pretty sure ORDER BY RAND() should achieve this. What am I missing?
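As a baseline for the hand test described above, here is a minimal Python sketch (not part of the original query; the function name and parameters are hypothetical) that simulates two independent, uniformly random orderings of six tied rows and estimates how often the first result matches. It only models what an ideal ORDER BY RAND() should do; it does not reproduce MySQL's behaviour.

```python
import random


def first_result_match_rate(n_rows=6, trials=20000, seed=0):
    """Estimate how often two independent uniform shuffles share the same first row."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    rows = list(range(n_rows))
    matches = 0
    for _ in range(trials):
        run_a = rows[:]
        run_b = rows[:]
        rng.shuffle(run_a)  # models one execution of the query
        rng.shuffle(run_b)  # models the refresh
        if run_a[0] == run_b[0]:
            matches += 1
    return matches / trials


if __name__ == "__main__":
    # For six tied rows the estimate should land close to 1/6.
    print(first_result_match_rate())
```

If the observed match rate on the real table stays near one third over many refreshes, that is well outside what this baseline predicts for six rows.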
With MySQL, SELECT rand(), rand() shows two different numbers, so I don't buy the "once per query" explanation.
In SQL Server, RAND() is only evaluated once per query, so every row gets the same value. You can verify this by looking at the result set.
If you're trying to get a randomized order, you should be using either NEWID() or CHECKSUM(NEWID()).
WITH T AS ( -- example using RAND()
SELECT 'Me' Name UNION SELECT 'You' UNION SELECT 'Another'
)
SELECT Name, RAND()
FROM T;
WITH T AS ( -- example using just NEWID()
SELECT 'Me' Name UNION SELECT 'You' UNION SELECT 'Another'
)
SELECT Name, NEWID()
FROM T;
WITH T AS ( -- example getting the CHECKSUM() of NEWID()
SELECT 'Me' Name UNION SELECT 'You' UNION SELECT 'Another'
)
SELECT Name, CHECKSUM(NEWID())
FROM T;
Don't use NEWID as a source of randomness. NEWID is for generating unique values. Unique is not the same as random! There's no guarantee that it's random (it doesn't have to be; it could just be sequentially allocated and still be unique). There's no guarantee that CHECKSUM(NEWID) is random either. No, no, no, a thousand times no - jason 2012-04-04 18:21
CHECKSUM() which produces a pseudo-random integer. It may not be cryptographically strong, but it certainly results in a unique ordering each time the query is run - Yuck 2012-04-04 18:22
CHECKSUM is for balancing hash tables. There is no guarantee it's random or pseudo-random. No, no, no, a million times no - jason 2012-04-04 18:23
CRYPT_GEN_RANDOM is available but assume previous versions - Martin Smith 2012-04-04 18:29
RAND() is not re-evaluated for each row. A possible solution might be:
SELECT foo FROM bar ORDER BY rank, CHECKSUM(NEWID())
NEWID is for unique values; unique is not the same as random. CHECKSUM is for balancing hash tables, not for generating random values. There is no guarantee that any of this is random. No, no, no - jason 2012-04-04 18:24