haproxy/doc/internals/hashing.txt

84 lines
4.4 KiB
Plaintext
Raw Normal View History

2013/11/20 - How hashing works internally in haproxy - maddalab@gmail.com
This document describes how Haproxy implements hashing both map-based and
consistent hashing, both prior to versions 1.5 and the motivation and tests
[RELEASE] Released version 2.2-dev6 Released version 2.2-dev6 with the following main changes : - BUG/MINOR: ssl: memory leak when find_chain is NULL - CLEANUP: ssl: rename ssl_get_issuer_chain to ssl_get0_issuer_chain - MINOR: ssl: rework add cert chain to CTX to be libssl independent - BUG/MINOR: peers: init bind_proc to 1 if it wasn't initialized - BUG/MINOR: peers: avoid an infinite loop with peers_fe is NULL - BUG/MINOR: peers: Use after free of "peers" section. - CI: github actions: add weekly h2spec test - BUG/MEDIUM: mux_h1: Process a new request if we already received it. - MINOR: build: Fix build in mux_h1 - CLEANUP: remove obsolete comments - BUG/MEDIUM: dns: improper parsing of aditional records - MINOR: ssl: skip self issued CA in cert chain for ssl_ctx - MINOR: listener: add so_name sample fetch - MEDIUM: stream: support use-server rules with dynamic names - MINOR: servers: Add a counter for the number of currently used connections. - MEDIUM: connections: Revamp the way idle connections are killed - MINOR: cli: add a general purpose pointer in the CLI struct - MINOR: ssl: add a list of bind_conf in struct crtlist - REORG: ssl: move SETCERT enum to ssl_sock.h - BUG/MINOR: ssl: ckch_inst wrongly inserted in crtlist_entry - REORG: ssl: move some functions above crtlist_load_cert_dir() - MINOR: ssl: use crtlist_free() upon error in directory loading - MINOR: ssl: add a list of crtlist_entry in ckch_store - MINOR: ssl: store a ptr to crtlist in crtlist_entry - MINOR: ssl/cli: update pointer to store in 'commit ssl cert' - MEDIUM: ssl/cli: 'add ssl crt-list' command - REGTEST: ssl/cli: test the 'add ssl crt-list' command - BUG/MINOR: ssl: entry->ckch_inst not initialized - REGTEST: ssl/cli: change test type to devel - REGTEST: make the PROXY TLV validation depend on version 2.2 - CLEANUP: assorted typo fixes in the code and comments - BUG/MINOR: stats: Fix color of draining servers on stats page - DOC: internals: Fix spelling errors in filters.txt - MINOR: connections: Don't mark conn flags 0x00000001 and 0x00000002 as unused. - REGTEST: make the unique-id test depend on version 2.0 - BUG/MEDIUM: dns: Consider the fact that dns answers are case-insensitive - MINOR: ssl: split the line parsing of the crt-list - MINOR: ssl/cli: support filters and options in add ssl crt-list - MINOR: ssl: add a comment above the ssl_bind_conf keywords - REGTEST: ssl/cli: tests options and filters w/ add ssl crt-list - REGTEST: ssl: pollute the crt-list file - BUG/CRITICAL: hpack: never index a header into the headroom after wrapping - BUG/MINOR: protocol_buffer: Wrong maximum shifting. - CLEANUP: src/fd.c: mask setsockopt with DISGUISE - BUG/MINOR: ssl/cli: initialize fcount int crtlist_entry - REGTEST: ssl/cli: add other cases of 'add ssl crt-list' - CLEANUP: assorted typo fixes in the code and comments - DOC: management: add the new crt-list CLI commands - BUG/MINOR: ssl/cli: fix spaces in 'show ssl crt-list' - MINOR: ssl/cli: 'del ssl crt-list' delete an entry - MINOR: ssl/cli: replace dump/show ssl crt-list by '-n' option - CI: use better SSL library definition - CI: travis-ci: enable DEBUG_STRICT=1 for CI builds - CI: travis-ci: upgrade openssl to 1.1.1f - MINOR: ssl: improve the errors when a crt can't be open - CI: cirrus-ci: rename openssl package after it is renamed in FreeBSD - CI: adopt openssl download script to download all versions - BUG/MINOR: ssl/cli: lock the ckch structures during crt-list delete - MINOR: ssl/cli: improve error for bundle in add/del ssl crt-list - MINOR: ssl/cli: 'del ssl cert' deletes a certificate - BUG/MINOR: ssl: trailing slashes in directory names wrongly cached - BUG/MINOR: ssl/cli: memory leak in 'set ssl cert' - CLEANUP: ssl: use the refcount for the SSL_CTX' - CLEANUP: ssl/cli: use the list of filters in the crtlist_entry - BUG/MINOR: ssl: memleak of the struct cert_key_and_chain - CLEANUP: ssl: remove a commentary in struct ckch_inst - MINOR: ssl: initialize all list in ckch_inst_new() - MINOR: ssl: free instances and SNIs with ckch_inst_free() - MINOR: ssl: replace ckchs_free() by ckch_store_free() - BUG/MEDIUM: ssl/cli: trying to access to free'd memory - MINOR: ssl: ckch_store_new() alloc and init a ckch_store - MINOR: ssl: crtlist_new() alloc and initialize a struct crtlist - REORG: ssl: move some free/new functions - MINOR: ssl: crtlist_entry_{new, free} - BUG/MINOR: ssl: ssl_conf always set to NULL on crt-list parsing - MINOR: ssl: don't alloc ssl_conf if no option found - BUG/MINOR: connection: always send address-less LOCAL PROXY connections - BUG/MINOR: peers: Incomplete peers sections should be validated. - MINOR: init: report in "haproxy -c" whether there were warnings or not - MINOR: init: add -dW and "zero-warning" to reject configs with warnings - MINOR: init: report the compiler version in haproxy -vv - CLEANUP: assorted typo fixes in the code and comments - MINOR: init: report the haproxy version and executable path once on errors - DOC: Make how "option redispatch" works more explicit - BUILD: Makefile: add linux-musl to TARGET - CLEANUP: assorted typo fixes in the code and comments - CLEANUP: http: Fixed small typo in parse_http_return - DOC: hashing: update link to hashing functions
2020-04-17 12:19:38 +00:00
that were done when providing additional options starting in version 2.2
A note on hashing in general, hash functions strive to have little
correlation between input and output. The heart of a hash function is its
mixing step. The behavior of the mixing step largely determines whether the
hash function is collision-resistant. Hash functions that are collision
resistant are more likely to have an even distribution of load.
The purpose of the mixing function is to spread the effect of each message
bit throughout all the bits of the internal state. Ideally every bit in the
hash state is affected by every bit in the message. And we want to do that
as quickly as possible simply for the sake of program performance. A
function is said to satisfy the strict avalanche criterion if, whenever a
single input bit is complemented (toggled between 0 and 1), each of the
output bits should change with a probability of one half for an arbitrary
selection of the remaining input bits.
To guard against a combination of hash function and input that results in
high rate of collisions, haproxy implements an avalanche algorithm on the
result of the hashing function. In all versions 1.4 and prior avalanche is
always applied when using the consistent hashing directive. It is intended
to provide quite a good distribution for little input variations. The result
is quite suited to fit over a 32-bit space with enough variations so that
a randomly picked number falls equally before any server position, which is
ideal for consistently hashed backends, a common use case for caches.
In all versions 1.4 and prior Haproxy implements the SDBM hashing function.
However tests show that alternatives to SDBM have a better cache
distribution on different hashing criteria. Additional tests involving
alternatives for hash input and an option to trigger avalanche, we found
different algorithms perform better on different criteria. DJB2 performs
well when hashing ascii text and is a good choice when hashing on host
header. Other alternatives perform better on numbers and are a good choice
when using source ip. The results also vary by use of the avalanche flag.
The results of the testing can be found under the tests folder. Here is
a summary of the discussion on the results on 1 input criteria and the
methodology used to generate the results.
A note of the setup when validating the results independently, one
would want to avoid backend server counts that may skew the results. As
an example with DJB2 avoid 33 servers. Please see the implementations of
the hashing function, which can be found in the links under references.
The following was the set up used
(a) hash-type consistent/map-based
(b) avalanche on/off
(c) balanche host(hdr)
(d) 3 criteria for inputs
- ~ 10K requests, including duplicates
- ~ 46K requests, unique requests from 1 MM requests were obtained
- ~ 250K requests, including duplicates
(e) 17 servers in backend, all servers were assigned the same weight
Result of the hashing were obtained across the server via monitoring log
files for haproxy. Population Standard deviation was used to evaluate the
efficacy of the hashing algorithm. Lower standard deviation, indicates
a better distribution of load across the backends.
On 10K requests, when using consistent hashing with avalanche on host
headers, DJB2 significantly out performs SDBM. Std dev on SDBM was 48.95
and DJB2 was 26.29. This relationship is inverted with avalanche disabled,
however DJB2 with avalanche enabled out performs SDBM with avalanche
disabled.
On map-based hashing SDBM out performs DJB2 irrespective of the avalanche
option. SDBM without avalanche is marginally better than with avalanche.
DJB2 performs significantly worse with avalanche enabled.
Summary: The results of the testing indicate that there isn't a hashing
algorithm that can be applied across all input criteria. It is necessary
to support alternatives to SDBM, which is generally the best option, with
algorithms that are better for different inputs. Avalanche is not always
applicable and may result in less smooth distribution.
References:
Mixing Functions/Avalanche: https://papa.bretmulvey.com/post/124027987928/hash-functions
Hash Functions: http://www.cse.yorku.ca/~oz/hash.html