OPTIM: tools: optimize my_ffsl() for x86_64

This call is now used quite a bit: in the fd cache to decide which cache
to add the fd to or remove it from, when waking up a task for a single
thread in __task_wakeup(), in fd_cant_recv() and in
fd_process_cached_events(). On x86_64 it can be replaced with a single
instruction, removing ~30 instructions and ~80 bytes from the inner loop
of some of these functions.

In addition, the test for a zero value was replaced with a comment
stating that calling the function with a==0 is illegal and leads to
undefined behaviour. No caller relies on this case today.
Willy Tarreau 2018-10-10 19:05:56 +02:00
parent 2325d8af93
commit 27346b01aa

@@ -802,13 +802,16 @@ static inline unsigned int my_popcountl(unsigned long a)
 }
 
 /* Simple ffs implementation. It returns the position of the lowest bit set to
- * one. */
+ * one. It is illegal to call it with a==0 (undefined result).
+ */
 static inline unsigned int my_ffsl(unsigned long a)
 {
-	unsigned int cnt;
+	unsigned long cnt;
 
-	if (!a)
-		return 0;
+#if defined(__x86_64__)
+	__asm__("bsf %1,%0\n" : "=r" (cnt) : "rm" (a));
+	cnt++;
+#else
 
 	cnt = 1;
 #if LONG_MAX > 0x7FFFFFFFL /* 64bits */
@@ -837,6 +840,7 @@ static inline unsigned int my_ffsl(unsigned long a)
 		a >>= 1;
 		cnt += 1;
 	}
+#endif /* x86_64 */
 
 	return cnt;
 }
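
For illustration only, here is a minimal standalone sketch of the idea behind this change. The names ffsl_ref() and ffsl_fast() are hypothetical and do not exist in the tree. The __builtin_ctzl() fallback is an assumption added for non-x86_64 GCC/clang builds and is not what the patch itself does. Both fast paths share the patch's contract: the result is undefined when a == 0.

```c
#include <limits.h>

/* Portable reference: returns the 1-based position of the lowest bit set
 * to one, or 0 when no bit is set. Hypothetical helper for comparison.
 */
static unsigned int ffsl_ref(unsigned long a)
{
	unsigned int cnt;

	for (cnt = 1; cnt <= sizeof(a) * CHAR_BIT; cnt++) {
		if (a & (1UL << (cnt - 1)))
			return cnt;
	}
	return 0; /* a == 0: no bit set */
}

/* Fast variant: same 1-based result for any non-zero <a>.
 * Calling it with a == 0 is illegal (undefined result).
 */
static unsigned int ffsl_fast(unsigned long a)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
	unsigned long cnt;

	/* bsf stores the 0-based index of the lowest set bit */
	__asm__("bsf %1,%0\n" : "=r" (cnt) : "rm" (a));
	return (unsigned int)cnt + 1;
#elif defined(__GNUC__) || defined(__clang__)
	/* assumption: compiler builtin, also undefined for a == 0 */
	return (unsigned int)__builtin_ctzl(a) + 1;
#else
	return ffsl_ref(a);
#endif
}
```

With these definitions, ffsl_fast(0x50UL) yields 5, since bit 4 is the lowest one set. A caller that may see a zero mask has to test it before the call, because the fast paths no longer do.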