Use a better hash function for libsepol's avtab

This function, based on murmurhash3, has much better distribution than
the original. Using the current default of 4096 buckets, there are many
fewer collisions:

Before:
2893000 entries and 4096/4096 buckets used, longest chain length 1649
After:
2732000 entries and 4096/4096 buckets used, longest chain length 764

The difference becomes much more significant when buckets are increased.
A naive attempt to expand the current function to larger outputs doesn't
yield any significant improvement; so this function is a prerequisite
for increasing the bucket size.

Signed-off-by: John Brooks <john.brooks@jolla.com>
This commit is contained in:
John Brooks 2015-01-14 17:24:02 -07:00 committed by Stephen Smalley
parent 76ea9f8987
commit 798950e055
1 changed files with 36 additions and 3 deletions

View File

@ -49,10 +49,43 @@
#include "debug.h"
#include "private.h"
static inline int avtab_hash(struct avtab_key *keyp, uint16_t mask)
/* Based on MurmurHash3, written by Austin Appleby and placed in the
* public domain.
*/
static inline int avtab_hash(struct avtab_key *keyp, uint32_t mask)
{
return ((keyp->target_class + (keyp->target_type << 2) +
(keyp->source_type << 9)) & mask);
static const uint32_t c1 = 0xcc9e2d51;
static const uint32_t c2 = 0x1b873593;
static const uint32_t r1 = 15;
static const uint32_t r2 = 13;
static const uint32_t m = 5;
static const uint32_t n = 0xe6546b64;
uint32_t hash = 0;
#define mix(input) { \
uint32_t v = input; \
v *= c1; \
v = (v << r1) | (v >> (32 - r1)); \
v *= c2; \
hash ^= v; \
hash = (hash << r2) | (hash >> (32 - r2)); \
hash = hash * m + n; \
}
mix(keyp->target_class);
mix(keyp->target_type);
mix(keyp->source_type);
#undef mix
hash ^= hash >> 16;
hash *= 0x85ebca6b;
hash ^= hash >> 13;
hash *= 0xc2b2ae35;
hash ^= hash >> 16;
return hash & mask;
}
static avtab_ptr_t