In 32-bit mode, the test scans the entire 32-bit number space and compares
each function against a reference implementation that processes one byte at
a time. In 64-bit mode, it picks 2^32 random 64-bit numbers and checks that
the 64-bit functions all produce the expected results for those numbers.
It optionally takes an initial offset and a step so that the work can be
split across multiple cores (or even machines), though the test is already
reasonably fast on modern machines, at around 10s per core.
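
The offset/step scheme described above can be sketched as follows. This is
only an illustration under stated assumptions, not the actual test program:
the text does not name the functions being checked here, so a population
count is used as a hypothetical stand-in, and the names reference(),
candidate(), scan32() and sample64() are all invented for the example.

```python
import random

# Hypothetical stand-in for the functions under test: a population count.
# The real test checks other functions; only the harness shape matters here.
BYTE_BITS = [bin(b).count("1") for b in range(256)]


def reference(x, width):
    """Byte-at-a-time reference: one table lookup per byte of the input."""
    total = 0
    for _ in range(width // 8):
        total += BYTE_BITS[x & 0xFF]
        x >>= 8
    return total


def candidate(x, width):
    """Function being checked against the reference (assumed for this sketch)."""
    return bin(x & ((1 << width) - 1)).count("1")


def scan32(offset=0, step=1, limit=1 << 32):
    """Check offset, offset+step, ... below limit; return mismatching values."""
    return [x for x in range(offset, limit, step)
            if candidate(x, 32) != reference(x, 32)]


def sample64(count, seed=0):
    """Check `count` random 64-bit values; return mismatching values."""
    rng = random.Random(seed)
    bad = []
    for _ in range(count):
        x = rng.getrandbits(64)
        if candidate(x, 64) != reference(x, 64):
            bad.append(x)
    return bad
```

With N workers, worker k would call scan32(offset=k, step=N), so the workers
together cover every value exactly once. The limit parameter exists only so
the sketch can be tried on a small range; a full 32-bit scan in pure Python
would be far slower than the figure quoted above.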