I'm pretty sure this was giving inconsistent results across archs,
because bits would get shifted into the high 32 bits and then back
again on x86_64, but simply fall off the end on x86_32.
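
A minimal sketch of the failure mode (made-up values, not the original
code): the intermediate type is 64 bits wide on x86_64 but 32 bits wide
on x86_32, so the round trip is lossless on one arch and lossy on the
other.

    #include <stdio.h>

    int main(void)
    {
            /* unsigned long is 64 bits on x86_64, 32 bits on x86_32 */
            unsigned long v = 0xffffffffUL;
            unsigned long r = (v << 8) >> 8;
            printf("%lx\n", r);  /* x86_64: ffffffff   x86_32: ffffff */
            return 0;
    }

Doing the arithmetic in an explicitly sized type (u32 or u64) makes
every arch truncate the same way.
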
When a replica finds itself fully up to date (last_complete ==
last_update) it tells the primary. The primary checks the same.
If the primary finds that min_last_complete_ondisk has changed,
it sends out a trim command.
This will let us drop huge pg logs out of memory after a recovery
without waiting for IO and the usual piggybacked trimming logic
to kick in.
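
A compact sketch of the handshake, with assumed types and helper names
(the real Ceph structures and messages differ):

    #include <stddef.h>
    #include <stdbool.h>

    typedef struct { unsigned epoch, version; } eversion_t;

    static bool ev_eq(eversion_t a, eversion_t b)
    {
            return a.epoch == b.epoch && a.version == b.version;
    }

    static bool ev_lt(eversion_t a, eversion_t b)
    {
            return a.epoch < b.epoch ||
                   (a.epoch == b.epoch && a.version < b.version);
    }

    struct peer_info { eversion_t last_complete_ondisk; };

    /* replica side: report in once recovery has caught us up */
    static bool replica_is_complete(eversion_t last_complete,
                                    eversion_t last_update)
    {
            return ev_eq(last_complete, last_update);
    }

    /* primary side: recompute the min over self + replicas; if it
     * advanced, a trim command up to *min_out goes out */
    static bool min_ondisk_advanced(eversion_t *min_out, eversion_t self,
                                    const struct peer_info *peers,
                                    size_t n, eversion_t prev_min)
    {
            eversion_t min = self;
            for (size_t i = 0; i < n; i++)
                    if (ev_lt(peers[i].last_complete_ondisk, min))
                            min = peers[i].last_complete_ondisk;
            *min_out = min;
            return ev_lt(prev_min, min);
    }
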
Okay, do not rely on the MDS to provide dentry positioning
information, since it is all relative to the start _string_ we
provide, and a dentry's position in the directory can change without
notice.
Simplify readdir a bit wrt seeks. A seek to 0, to a new frag, or to a
point prior to the current chunk resets the buffered state.
For each frag, we walk through chunks, always in order. We set
dentry positions/offsets based on the frag and position within our
sweep across the frag. Successive chunks are grabbed from the MDS
relative to a filename (not offset), so concurrent
insertions/removals don't bother us (although we will not see
insertions lexicographically prior to our position).
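
To make the offset scheme concrete, here's a sketch (the kernel client
has a make-fpos helper along these lines; the names here are
illustrative): the high bits carry the frag, the low bits carry our
position within the sweep, so offsets stay stable within a sweep even
though chunks are fetched by filename.

    #include <stdint.h>

    /* encode (frag, position-in-sweep) as a readdir offset */
    static inline int64_t make_fpos(uint32_t frag, uint32_t off)
    {
            return ((int64_t)frag << 32) | (int64_t)off;
    }

    /* chunk_start is the offset of the first buffered entry */
    static inline int need_reset(int64_t new_pos, int64_t chunk_start)
    {
            return new_pos == 0 ||                           /* rewind */
                   (new_pos >> 32) != (chunk_start >> 32) || /* new frag */
                   new_pos < chunk_start;          /* before our chunk */
    }
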
The final placement seed needs to factor in the pool, but the pool
can't be fed into stable_mod or you get weird results (for example,
1.ff and 1.adff won't necessarily map to the same thing because of the
stable_mod). Add the pool to the stable_mod result instead. The seed
itself doesn't need to be bounded; it's just an input for CRUSH, so
long as there are a limited number of such inputs for a given pool.
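
For reference, stable_mod is a masked mod that stays stable as the bin
count grows (this matches ceph_stable_mod; the wrapper below is
illustrative). With pg_num = 0x100, both 0xff and 0xadff mask down to
0xff, so 1.ff and 1.adff get the same seed once the pool is added
afterwards.

    /* b is the number of bins (pg_num); bmask is the containing power
     * of two minus 1, e.g. b=12 -> bmask=15 */
    static inline int ceph_stable_mod(int x, int b, int bmask)
    {
            if ((x & bmask) < b)
                    return x & bmask;
            else
                    return x & (bmask >> 1);
    }

    /* seed fed into CRUSH: bounded mod result, plus the pool */
    static int placement_seed(int pool, int ps, int pg_num, int pg_num_mask)
    {
            return ceph_stable_mod(ps, pg_num, pg_num_mask) + pool;
    }
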
The offset calculation needs to factor in frag_is_leftmost to account
for . and .., just like the fi->offset calculation in
readdir_prepopulate. Fixes the problem where an ls on a large dir
returns duplicate entries.
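
The shape of the fix, as a sketch (the leftmost test mirrors the
client's frag helpers; names are illustrative): only the leftmost frag
holds the synthetic . and .. entries, so only there do real dentries
start at offset 2.

    #include <stdint.h>

    /* leftmost frag <=> all value bits zero (low 24 bits of the frag) */
    static inline int frag_is_leftmost(uint32_t f)
    {
            return (f & 0xffffffu) == 0;
    }

    /* . and .. occupy offsets 0 and 1 in the leftmost frag only */
    static inline unsigned frag_start_off(uint32_t f)
    {
            return frag_is_leftmost(f) ? 2 : 0;
    }

An offset that is off by two means a resumed getdents starts two
entries early, which is presumably where the duplicates came from.
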
This is mainly just because /bin/ls will use the size, blocks, or
blksize to decide how big a buffer to allocate for getdents, and the
default of 4MB is unreasonably big. 64k seems like an okay number, I
guess.
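
The knob lives on the filesystem side: getattr can advertise whatever
blksize it likes. A sketch using the VFS kstat field names (exactly
where the real patch sets this is an assumption):

    /* in the fs getattr handler: directories report a 64k blksize so
     * ls-style readers size their getdents buffer sanely */
    if (S_ISDIR(inode->i_mode))
            stat->blksize = 65536;
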
In certain cases we would get incorrect results if we calculated the
same mapping twice in a row. Der. Also, the permutation calculation
was basically just wrong.
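
The permutation here is a lazily generated Fisher-Yates shuffle cached
against the input; this toy sketch (not the actual crush code) shows
the idea and why the repeated-input path needs care:

    #include <stdint.h>

    struct perm_state {
            uint32_t x;       /* input the cached shuffle belongs to */
            unsigned n;       /* slots decided so far */
            unsigned size;    /* number of items, <= 8 here */
            unsigned perm[8]; /* toy fixed capacity */
    };

    /* stand-in for crush's hash; any deterministic mix works here */
    static unsigned toy_hash(uint32_t x, unsigned i)
    {
            x ^= i * 2654435761u;
            x ^= x >> 16;
            return x * 2246822519u;
    }

    /* pr-th element of a deterministic permutation of [0, size),
     * extending the cached shuffle only as far as needed */
    static unsigned perm_choose(struct perm_state *ps, uint32_t x,
                                unsigned r)
    {
            unsigned pr = r % ps->size;
            unsigned i, t;

            /* restart only when the input changes (n == 0 covers a
             * fresh zeroed state); a second call with the same x must
             * extend the existing shuffle, not recompute it */
            if (ps->x != x || ps->n == 0) {
                    ps->x = x;
                    ps->n = 0;
                    for (i = 0; i < ps->size; i++)
                            ps->perm[i] = i;
            }
            while (ps->n <= pr) {   /* Fisher-Yates up to slot pr */
                    i = ps->n + toy_hash(x, ps->n) % (ps->size - ps->n);
                    t = ps->perm[ps->n];
                    ps->perm[ps->n] = ps->perm[i];
                    ps->perm[i] = t;
                    ps->n++;
            }
            return ps->perm[pr];
    }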