We were calling the reaper from the wait() loop. The problem is that
the OSD has two messengers, and only the first was in wait(); the second
wait() was only called after the first terminated (i.e., when the OSD was
shutting down).
Instead, launch a separate reaper thread when we bind, and close it out
on shutdown right after the accepter.
...even when the op came from another OSD. Not that that should happen
anyway, since we don't forward messages currently. (And can't, since the
OSD doesn't initiate connections to the client!)
If we take too big a bite of data to write in a single writev(2), we can
end up making performance worse, because everyone waits for the full write
to complete. Bigger writes mean better throughput but higher latency.
So, balance the two by placing some upper limit.
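
A minimal sketch of the idea, not the messenger code itself: cap how many
bytes are handed to any single writev(2) call. The names take_batch,
write_all and MAX_WRITE_BYTES are hypothetical, and the 64 KB value is an
assumption (the message above does not name a limit).

```python
import os

MAX_WRITE_BYTES = 64 * 1024  # assumed cap; the commit message names no value


def take_batch(buffers, limit=MAX_WRITE_BYTES):
    """Return leading slices of `buffers` totalling at most `limit` bytes."""
    batch, remaining = [], limit
    for buf in buffers:
        if remaining == 0:
            break
        piece = buf[:remaining]
        batch.append(piece)
        remaining -= len(piece)
    return batch


def write_all(fd, buffers, limit=MAX_WRITE_BYTES):
    """Write all buffers to fd, passing at most `limit` bytes per writev().

    Returns the number of writev() calls that were made.
    """
    pending = [memoryview(b) for b in buffers if len(b)]
    calls = 0
    while pending:
        written = os.writev(fd, take_batch(pending, limit))
        calls += 1
        # Drop fully written buffers and trim a partially written one.
        while written:
            if written >= len(pending[0]):
                written -= len(pending[0])
                pending.pop(0)
            else:
                pending[0] = pending[0][written:]
                written = 0
    return calls
```

A smaller limit means more system calls but a shorter wait per call, which
is the throughput/latency trade described above.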
Hi
I ran into a problem where mkcephfs computes the wrong "maxosd" when
ceph.conf lists the OSD ids in random order, like:
[osd2]
...
[osd0]
...
[osd1]
...
In this case, you will get "2" for the "maxosd", instead of 3.
After adding a sort, the problem seems solved.
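
To illustrate the expected result (this is not the actual mkcephfs change),
here is a small sketch that derives maxosd from the section names. The
function max_osd is hypothetical, and it takes the numeric maximum rather
than sorting, so section order does not matter and osd10 ranks above osd9.

```python
import re

# Matches section names of the form shown above, e.g. "osd2".
OSD_SECTION = re.compile(r"^osd(\d+)$")


def max_osd(section_names):
    """Return the highest OSD id plus one, or 0 if there are no OSD sections."""
    ids = []
    for name in section_names:
        match = OSD_SECTION.match(name)
        if match:
            ids.append(int(match.group(1)))
    return max(ids) + 1 if ids else 0
```

With the sections listed above, max_osd(["osd2", "osd0", "osd1"]) gives 3.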
Cheers,
CC Lien
Signed-off-by: CC Lien <cc_lien@tcloudcomputing.com>
Signed-off-by: Sage Weil <sage@newdream.net>
This was broken by bd4188a02a. @pos needs to be advanced (it is passed by
reference) or else we just overwrite the same bytes at the journal start
over and over again.
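
A rough sketch of the failure mode, using a bytearray as a stand-in for the
journal. write_at is a hypothetical helper; Python has no pass by reference,
so the advanced position is returned and the caller must keep it.

```python
def write_at(journal, pos, payload):
    """Copy payload into journal at pos and return the advanced position."""
    end = pos + len(payload)
    if end > len(journal):
        raise ValueError("write would run past the end of the journal")
    journal[pos:end] = payload
    return end


journal = bytearray(8)
pos = 0
pos = write_at(journal, pos, b"ab")  # pos is now 2
pos = write_at(journal, pos, b"cd")  # lands after "ab", not on top of it
```

If the caller dropped the return value and reused pos = 0, the second write
would overwrite "ab" instead of following it.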
Do msgr throttle after peer policy throttle. The msgr (dispatch) throttle
is short-lived and won't deadlock (unless dispatch blocks), so it's safe to
take last. In contrast, the policy throttle carries over the lifetime of
the message, and may block until replication completes or whatever else.
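
The ordering can be sketched with two semaphores. This is only an
illustration under assumed names (Message, receive, and the two throttle
arguments are hypothetical), not the messenger implementation.

```python
import threading


class Message:
    """Holds its policy throttle slot until release() is called."""

    def __init__(self, policy_throttle):
        self._policy_throttle = policy_throttle

    def release(self):
        self._policy_throttle.release()


def receive(policy_throttle, dispatch_throttle, handler):
    # Long-lived throttle first: it stays held for the life of the message.
    policy_throttle.acquire()
    msg = Message(policy_throttle)
    # Short-lived throttle last: it is only held while dispatching.
    dispatch_throttle.acquire()
    try:
        handler(msg)
    finally:
        dispatch_throttle.release()
    return msg
```

Taking the dispatch slot last means nothing waits on the policy throttle
while already holding a dispatch slot.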
crush_do_rule can return <0 in certain error cases (e.g., force-fed device
does not exist in crush map). We should take that to mean an empty []
result instead of crashing.
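
A minimal sketch of the intended handling. The do_rule argument is a
hypothetical stand-in that fills a result list and returns the number of
entries filled, or a negative value on error; it is not the real
crush_do_rule signature.

```python
def map_result(do_rule, result_max):
    """Run do_rule and return its results, or [] if it reports an error."""
    out = [0] * result_max
    n = do_rule(out)  # assumed: count of filled entries, or < 0 on error
    if n < 0:
        return []  # treat an error as an empty mapping
    return out[:n]
```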
Signed-off-by: Sage Weil <sage@newdream.net>
The client has a follows of 0 initially, which is correct (it does follow
0, and there are no prior snaps). But the inode has ->first of 2, which
is also fine. The follows here needs to be at least higher than the
inode first, though, or the caps cloning gets off...
In 551a12f52e we fixed a bug with cow_inode() where the
cap->client_follows didn't match last precisely. Instead, we compare
to first. But the == is too strict: a cap follows that is equal to _or_older_
than the clone's first should be copied to the clone inode.
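
As a sketch of the relaxed comparison, assuming that "older" means a
numerically smaller snap id (the helper name cap_goes_to_clone is
hypothetical):

```python
def cap_goes_to_clone(client_follows, clone_first):
    """True if the cap's follows is equal to or older than the clone's first."""
    # The stricter check was: client_follows == clone_first
    return client_follows <= clone_first
```

Under that assumption, a client with a follows of 0 and a first of 2 now
qualifies, where the == check rejected it.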
This fixes the simple test case
$ echo asdf > bar ; mkdir .snap/bar ; rm bar ; cat .snap/bar/bar
asdf
(Previously we would get nothing unless we waited for the cap to flush on
its own.)
This fixes pretty core behavior when doing recursion down the tree. I
suspect it was broken when changing the retry behavior.
Signed-off-by: Sage Weil <sage@newdream.net>
We may not want to recursively call crush_choose() if we start out with a
leaf. If that happens, we need to fill out the out2[] vector with
our result immediately.
Signed-off-by: Sage Weil <sage@newdream.net>
Fill in the out2 choose_leaf vector if it's defined. This is necessary
because we may not recursively call choose on out2 if the item we're on is
not a bucket (e.g., when chooseleaf is given the leaf type 0).
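
Roughly, the logic looks like the sketch below. The names record_choice,
is_bucket and choose_leaf are hypothetical stand-ins, not the crush code.

```python
def record_choice(item, is_bucket, out, out2, choose_leaf):
    """Record a chosen item, filling out2 (if given) with its leaf."""
    out.append(item)
    if out2 is None:
        return
    if is_bucket(item):
        # Recurse into the bucket to find a leaf beneath it.
        out2.append(choose_leaf(item))
    else:
        # Already a leaf: no recursive call, so fill out2 here.
        out2.append(item)
```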
Signed-off-by: Sage Weil <sage@newdream.net>