MINOR: intops: add a function to return a valid bit position from a mask

Sometimes we need to be able to signal one thread among a mask, without caring much about which bit will be picked. At the moment we use ffsl() for this, but this sometimes results in imbalance at certain specific places, because the first thread of a set is always the one that gets selected. Another approach would consist in using the rank finding function, but it requires a popcount and a setup phase, and possibly a modulo operation depending on the popcount, which starts to be very expensive.

Here we take a different approach. The idea is that an input bit value is passed, from 0 to LONGBITS-1, and that as much as possible we try to pick the bit matching it if it is set. Otherwise we look at a mirror position based on a decreasing power of two, and jump to the side that still has bits left. In 6 iterations it ends up spotting one bit among 64, and the operations are very cheap and optimizable.

This method has the benefit that we don't care where the holes are located in the mask, thus it shows a good distribution of output bits based on the input ones. A long-running test shows an average of 16 cycles, or ~4ns per lookup at 3.8 GHz, which is about twice as fast as using the rank finding function.

Just like for that one, the code was stored into tools.c since we don't have a C file for intops.
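The lookup described above can be sketched as follows. This is only an illustration under stated assumptions, not the code added by this commit: the name pick_bit_sketch() is hypothetical, the word is fixed to 64 bits via uint64_t instead of unsigned long, and the per-half mask is computed inline rather than read from a table.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of the commit): returns the position of one
 * bit set in <v>, preferring position <bit> (0..63), or -1 if <v> is zero.
 */
static int pick_bit_sketch(uint64_t v, int bit)
{
	int scale;

	assert(bit >= 0 && bit < 64);

	if (!v)
		return -1;

	/* Invariant: the block of 2^(scale+1) bits containing <bit> always
	 * holds at least one set bit (initially the whole 64-bit word).
	 */
	for (scale = 5; scale >= 0; scale--) {
		int width  = 1 << scale;   /* size of each half: 32, 16, ..., 1 */
		int mirror = bit ^ width;  /* same offset in the other half */
		int base   = bit & ~(width - 1); /* first bit of our half */

		/* exact position first, then its mirror */
		if (v & (1ULL << bit))
			return bit;
		if (v & (1ULL << mirror))
			return mirror;

		/* nothing left in our half: jump to the other one, which
		 * is guaranteed to be non-empty by the invariant.
		 */
		if (!((v >> base) & ((1ULL << width) - 1)))
			bit = mirror;
	}
	/* not reached for non-zero <v>: at scale 0 either <bit> or its
	 * mirror is set.
	 */
	return bit;
}
```

A caller wanting to spread wakeups could feed it a rolling counter, for example pick_bit_sketch(mask, counter++ & 63), where both mask and counter are placeholders for whatever the caller maintains.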
2022-06-21 20:19:54 +02:00
parent 0a012aa16b
commit c7a8a3c7bd
2 changed files with 55 additions and 0 deletions
--- a/include/haproxy/intops.h
+++ b/include/haproxy/intops.h
@@ -47,6 +47,7 @@ unsigned int mask_find_rank_bit_fast(unsigned int r, unsigned long m,
 void mask_prep_rank_map(unsigned long m,
                        unsigned long *a, unsigned long *b,
                        unsigned long *c, unsigned long *d);
+int one_among_mask(unsigned long v, int bit);


 /* Multiply the two 32-bit operands and shift the 64-bit result right 32 bits.
--- a/src/tools.c
+++ b/src/tools.c
@@ -3156,6 +3156,60 @@ void mask_prep_rank_map(unsigned long m,
 	*d = (*c + (*c >> 8)) & ~0UL/0x101;
 }

+/* Returns the position of one bit set in <v>, starting at position <bit>, and
+ * searching in other halves if not found. This is intended to be used to
+ * report the position of one bit set among several based on a counter or a
+ * random generator while preserving a relatively good distribution so that
+ * values made of holes in the middle do not see one of the bits around the
+ * hole being returned much more often than the other one. It can be seen as a
+ * disturbed ffsl() where the initial search starts at bit <bit>. The look up
+ * is performed in O(logN) time for N bit words, yielding a bit among 64 in
+ * about 16 cycles. Its usage differs from the rank find function in that the
+ * bit passed doesn't need to be limited to the value's popcount, making the
+ * function easier to use for random picking, and twice as fast. Passing value
+ * 0 for <v> makes no sense and -1 is returned in this case.
+ */
+int one_among_mask(unsigned long v, int bit)
+{
+	/* note, these masks may be produced by ~0UL/((1UL<<(1<<scale))+1)
+	 * but that's more expensive.
+	 */
+	static const unsigned long halves[] = {
+		(unsigned long)0x5555555555555555ULL,
+		(unsigned long)0x3333333333333333ULL,
+		(unsigned long)0x0F0F0F0F0F0F0F0FULL,
+		(unsigned long)0x00FF00FF00FF00FFULL,
+		(unsigned long)0x0000FFFF0000FFFFULL,
+		(unsigned long)0x00000000FFFFFFFFULL
+	};
+	unsigned long halfword = ~0UL;
+	int scope = 0;
+	int mirror;
+	int scale;
+
+	if (!v)
+		return -1;
+
+	/* we check if the exact bit is set or if it's present in a mirror
+	 * position based on the current scale we're checking, in which case
+	 * it's returned with its current (or mirrored) value. Otherwise we'll
+	 * make sure there's at least one bit in the half we're in, and will
+	 * scale down to a smaller scope and try again, until we find the
+	 * closest bit.
+	 */
+	for (scale = (sizeof(long) > 4) ? 5 : 4; scale >= 0; scale--) {
+		halfword >>= (1UL << scale);
+		scope |= (1UL << scale);
+		mirror = bit ^ (1UL << scale);
+		if (v & ((1UL << bit) | (1UL << mirror)))
+			return (v & (1UL << bit)) ? bit : mirror;
+
+		if (!((v >> (bit & scope)) & halves[scale] & halfword))
+			bit = mirror;
+	}
+	return bit;
+}
+
 /* Return non-zero if IPv4 address is part of the network,
 * otherwise zero. Note that <addr> may not necessarily be aligned
 * while the two other ones must.