OPTIM: tools: improve hash distribution using a better prime seed

During tests it was noticed that the current hash does not distribute
that well on 4- and 5-bit hashes. About 7.5% of all the 32-bit primes
were tested as candidates for the hash function, by feeding each of them
128 arrangements of N pointers among 40k extracted from haproxy's pools,
then measuring and comparing the average fill rates for 1- to 12-bit
hashes. It was clear that some values do not provide great hashes while
others are way more resistant.
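
The measurement described above can be sketched roughly as follows. This
is only an illustration and not the tool that was used for the commit:
hash_with_seed(), measure(), make_ptrs() and print_fill_report() are
hypothetical names, only the 64-bit path is modelled, the final shift and
mask are assumptions since the diff does not show them, and the addresses
are synthetic rather than taken from haproxy's pools, so the figures it
prints are not expected to match the percentages quoted in this message.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for ptr_hash() taking the multiplier as an argument.
 * Only the 64-bit path is modelled; the shift and the mask are assumptions.
 */
static unsigned int hash_with_seed(uint64_t ptr, uint32_t seed, int bits)
{
	uint64_t x = ptr;

	if (!bits)
		return 0;
	x *= seed;                 /* mix the bits via the candidate prime */
	x >>= 33 - bits / 2;       /* keep the middle bits */
	return (unsigned int)x & (bits < 32 ? (1U << bits) - 1 : ~0U);
}

struct fill {
	unsigned int distinct;     /* buckets hit at least once */
	unsigned int singles;      /* buckets hit exactly once (non-colliding) */
};

/* Hash <n> pointers on <bits> bits and count how the buckets were filled. */
static struct fill measure(const uint64_t *ptrs, size_t n, uint32_t seed, int bits)
{
	struct fill f = { 0, 0 };
	size_t nb = (size_t)1 << bits;
	unsigned int *count;
	size_t i;

	assert(bits >= 0 && bits <= 24);
	count = calloc(nb, sizeof(*count));
	if (!count)
		return f;
	for (i = 0; i < n; i++)
		count[hash_with_seed(ptrs[i], seed, bits)]++;
	for (i = 0; i < nb; i++) {
		f.distinct += count[i] != 0;
		f.singles  += count[i] == 1;
	}
	free(count);
	return f;
}

/* Fill <out> with <n> synthetic 16-byte aligned addresses (assumption: a
 * simple LCG replaces the real pointers extracted from the pools).
 */
static void make_ptrs(uint64_t *out, size_t n)
{
	uint64_t state = 12345;
	size_t i;

	for (i = 0; i < n; i++) {
		state = state * 6364136223846793005ULL + 1442695040888963407ULL;
		out[i] = 0x7f0000000000ULL + ((state >> 40) & 0xfffff) * 16;
	}
}

/* Compare the old and new multipliers on a few output sizes. */
void print_fill_report(void)
{
	static const uint32_t seeds[] = { 0xc1da9653U, 0xacd1be85U };
	static const int widths[] = { 2, 3, 4, 5, 10 };
	uint64_t ptrs[1024];
	size_t s, w;

	for (s = 0; s < sizeof(seeds) / sizeof(seeds[0]); s++) {
		for (w = 0; w < sizeof(widths) / sizeof(widths[0]); w++) {
			int bits = widths[w];
			size_t n = (size_t)1 << bits;
			struct fill f;

			make_ptrs(ptrs, n);
			f = measure(ptrs, n, seeds[s], bits);
			printf("seed=0x%08x bits=%2d distinct=%u non-colliding=%u of %zu\n",
			       (unsigned int)seeds[s], bits, f.distinct, f.singles, n);
		}
	}
}
```

Calling print_fill_report() from a small main() prints one line per
multiplier and output size. A real comparison would average the counts
over many arrangements of real pointers, as described above, instead of
relying on a single synthetic set.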

The current value is not bad at all but delivers 42.6% unique 2-bit
outputs, 41.6% 3-bit, 38.0% 4-bit, 38.2% 5-bit and 37.1% 10-bit. Some
values performed noticeably better, among them 0xacd1be85, which
delivers 43.2% 2-bit, 42.5% 3-bit, 42.2% 4-bit, 39.2% 5-bit and 37.3%
10-bit.

The reverse value used in ptr2_hash() was really underperforming and
was replaced with 0x9d28e4e9, which does 49.6%, 40.4%, 42.6%, 39.1% and
37.2% respectively.

This should slightly improve the accuracy of the task and memory
profiling, and will be useful for pools.
Willy Tarreau 2023-08-12 17:28:13 +02:00
parent 58946d44f8
commit f0d188f6ed
1 changed file with 9 additions and 10 deletions

@@ -1085,13 +1085,12 @@ static inline uint statistical_prng_range(uint range)
* to compute statistic buckets, in that it's fast and reasonably distributed
* thanks to mixing the bits via a multiplication by a prime number and using
* the middle bits on 64-bit platforms or remixing the topmost with lowest ones
-* on 32-bit. It provides ~2491 unique values (~1354 non-colliding) for 2^12
-* valid pointers at 12 bits, ~1454 (~941 non-colliding) for 2^11 valid ptrs
-* at 11 bits, ~707 (~434 non-colliding) for 2^10 valid ptrs at 10 bits, ~347
-* (212 non colliding) for 2^9 valid ptrs at 9 bits, and 165/99 for 2^8 ptrs
-* at 8 bits, hence 1-1/e and 1/e respectively. It must be inlined so that
-* <bits> is always a compile-time constant. It supports output sizes from 0
-* to 32 bits.
+* on 32-bit. The distribution is smooth enough for the hash to provide on
+* average 1/e non-colliding entries per input, and use on average 1-1/e
+* entries total. Thus for example hashing 1024 random valid pointers will
+* result on average in ~647 distinct keys, 377 of which are unique. It was
+* carefully selected to deliver optimal distribution for low bit counts so
+* that hashing on 2,3,4 or 5 bits delivers good results.
*/
static forceinline uint ptr_hash(const void *p, const int bits)
{
@@ -1100,7 +1099,7 @@ static forceinline uint ptr_hash(const void *p, const int bits)
if (!bits)
return 0;
-x *= 0xc1da9653U;
+x *= 0xacd1be85U;
if (sizeof(long) == 4)
x ^= x >> 32;
else
@@ -1119,8 +1118,8 @@ static forceinline uint ptr2_hash(const void *p1, const void *p2, const int bits
if (!bits)
return 0;
-x *= 0xc1da9653U;
-y *= 0x96531cadU;
+x *= 0xacd1be85U;
+y *= 0x9d28e4e9U;
x ^= y;
if (sizeof(long) == 4)
x ^= x >> 32;