We were calling the reaper from the wait() loop. The problem is that
the OSD has two messengers, and only the first was in wait(); the second
wait() was only called after the first terminated (i.e., when the OSD was
shutting down).
Instead, launch a separate reaper thread when we bind, and close it out
on shutdown right after the accepter.
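
A minimal sketch of the dedicated-reaper idea, written in Python for brevity rather than in the messenger's C++. The Reaper class and its start(), queue_reap() and stop() methods are hypothetical names for illustration; they are not the messenger's actual interface.

```python
import queue
import threading


class Reaper:
    """Hypothetical stand-in: reaps closed connections on its own thread."""

    def __init__(self):
        self._pending = queue.Queue()
        self.reaped = []
        self._thread = threading.Thread(target=self._run)

    def start(self):
        # Would be called at bind time, so reaping does not depend on wait().
        self._thread.start()

    def queue_reap(self, conn):
        self._pending.put(conn)

    def stop(self):
        # Would be called on shutdown, after the accepter has stopped.
        self._pending.put(None)
        self._thread.join()

    def _run(self):
        while True:
            conn = self._pending.get()
            if conn is None:
                break
            self.reaped.append(conn)
```

Because each messenger would own its reaper thread, the second messenger's connections get reaped even while the first is still blocked in wait().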
...even when the op came from another OSD. Not that that should happen
anyway, since we don't forward messages currently. (And can't, since the
OSD doesn't initiate connections to the client!)
If we take too big a bite of data to write in a single writev(2), we can
end up making performance worse, because everyone waits for the full write
to complete. Bigger writes mean better throughput but higher latency.
So, balance the two by placing some upper limit.
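
As a rough illustration of capping the size of each writev(2), here is a Python sketch using os.writev. The function name write_capped and the max_bytes parameter are assumptions made for this example (max_bytes must be positive); the actual limit and where it is enforced are not stated in this message.

```python
import os


def write_capped(fd, buffers, max_bytes):
    """Write all buffers to fd, passing at most max_bytes to each writev call.

    Returns the number of writev calls made.
    """
    pending = [memoryview(b) for b in buffers if len(b)]
    calls = 0
    while pending:
        # Build one batch that stays within the byte budget.
        batch, budget = [], max_bytes
        for buf in pending:
            if budget == 0:
                break
            piece = buf[:budget]
            batch.append(piece)
            budget -= len(piece)
        written = os.writev(fd, batch)
        calls += 1
        # Drop whatever was actually written; this also covers short writes.
        while written:
            head = pending[0]
            if written >= len(head):
                written -= len(head)
                pending.pop(0)
            else:
                pending[0] = head[written:]
                written = 0
    return calls
```

A smaller max_bytes means more system calls but a shorter wait for each one, which is the latency versus throughput trade-off described above.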
Hi
I ran into a problem where mkcephfs computes the wrong "maxosd" when
ceph.conf lists the OSD ids in random order, like:
[osd2]
...
[osd0]
...
[osd1]
...
In this case, you will get "2" for "maxosd" instead of 3.
After adding a sort, the problem seems to be solved.
Cheers,
CC Lien
Signed-off-by: CC Lien <cc_lien@tcloudcomputing.com>
Signed-off-by: Sage Weil <sage@newdream.net>
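
To illustrate the maxosd problem described above, the following Python sketch computes the value independently of section order. It uses max() rather than the sort the patch added, and the function name max_osd and the section-name pattern are assumptions for this example; mkcephfs itself is a shell script.

```python
import re


def max_osd(section_names):
    """Return highest OSD id + 1 among names like 'osd2', in any order."""
    ids = []
    for name in section_names:
        match = re.fullmatch(r"osd\.?(\d+)", name)
        if match:
            ids.append(int(match.group(1)))
    return max(ids) + 1 if ids else 0
```

With the sections from the report, max_osd(["osd2", "osd0", "osd1"]) gives 3 regardless of the order in which they appear.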
This was broken by bd4188a02a. @pos needs to be advanced (it is passed
by reference) or else we just overwrite the same bytes at the journal
start over and over again.
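
A small Python sketch of why the position has to advance. Python has no pass-by-reference for integers, so this hypothetical write_entry() returns the new position instead; the names and the bytearray journal are illustrative only.

```python
def write_entry(journal, pos, entry):
    """Copy entry into journal at pos and return the advanced position."""
    end = pos + len(entry)
    journal[pos:end] = entry
    return end


journal = bytearray(8)
pos = 0
for entry in (b"aaaa", b"bbbb"):
    # Forgetting to update pos here would write both entries at offset 0.
    pos = write_entry(journal, pos, entry)
```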
Do msgr throttle after peer policy throttle. The msgr (dispatch) throttle
is short-lived and won't deadlock (unless dispatch blocks), so it's safe to
take last. In contrast, the policy throttle carries over the lifetime of
the message, and may block until replication completes or whatever else.
crush_do_rule can return <0 in certain error cases (e.g., a force-fed device
does not exist in the crush map). We should take that to mean an empty []
result instead of crashing.
Signed-off-by: Sage Weil <sage@newdream.net>
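
The handling described above could look roughly like this in Python. The helper osds_from_rule and its arguments are hypothetical: result stands for the output buffer and numrep for the value returned by crush_do_rule.

```python
def osds_from_rule(result, numrep):
    """Turn a rule's return value and output buffer into a list of OSDs."""
    if numrep < 0:
        # Error case: treat it as an empty mapping instead of crashing.
        return []
    return list(result[:numrep])
```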
The client has a follows of 0 initially, which is correct (it does follow
0, and there are no prior snaps). But the inode has ->first of 2, which
is also fine. The follows here needs to be at least higher than the
inode first, though, or the caps cloning gets off...
In 551a12f52e we fixed a bug with cow_inode() where the
cap->client_follows didn't match last precisely. Instead, we compare
to first. But the == is too strict: a cap follows that is equal to _or_older_
than the clone's first should be copied to the clone inode.
This fixes the simple test case
$ echo asdf > bar ; mkdir .snap/bar ; rm bar ; cat .snap/bar/bar
asdf
(Previously we would get nothing unless we waited for the cap to flush on
its own.)
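
The relaxed comparison can be sketched as below. This is a simplified Python model, not the cow_inode() code: the function caps_to_copy, the dictionary of per-client follows values, and the assumption that "older" means a smaller number are all made for illustration.

```python
def caps_to_copy(client_follows, clone_first):
    """Return the clients whose cap should be copied to the clone inode.

    client_follows maps a client name to its cap's follows value.
    """
    return [client
            for client, follows in sorted(client_follows.items())
            if follows <= clone_first]  # was effectively '==' before the fix
```

With a client whose follows is 0 and a clone whose first is 2, the cap is now copied; under the strict equality test it would have been skipped.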
This fixes pretty core behavior when doing recursion down the tree. I
suspect it was broken when changing the retry behavior.
Signed-off-by: Sage Weil <sage@newdream.net>
We may not want to recursively call crush_choose() if we start out with a
leaf. If that happens, we need to fill out the out2[] vector with
our result immediately.
Signed-off-by: Sage Weil <sage@newdream.net>
Fill in the out2 choose_leaf vector if it's defined. This is necessary
because we may not recursively call choose on out2 if the item we're on is
not a bucket (e.g., when chooseleaf is given the leaf type 0).
Signed-off-by: Sage Weil <sage@newdream.net>
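
A simplified Python sketch of the out2 handling described in the two messages above. It relies on the CRUSH convention that buckets have negative ids and devices have non-negative ids; the function place() and the choose_leaf_from callback are hypothetical stand-ins for the recursive crush_choose() call.

```python
def place(item, out, out2, choose_leaf_from):
    """Record a chosen item and, if a leaf vector is in use, its leaf."""
    out.append(item)
    if out2 is not None:
        if item < 0:
            # Bucket: descend to pick a leaf underneath it.
            out2.append(choose_leaf_from(item))
        else:
            # Already a leaf: no recursion, so fill out2 directly.
            out2.append(item)
```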
Inspired by the addition to
http://ceph.newdream.net/wiki/Snapshots about the snapdirname
option, I've created a patch for the mount.ceph manpage.
- Thomas
Signed-off-by: Sage Weil <sage@newdream.net>
This should avoid the following assertion failure:
#0 0x00007f41b1a18a75 in raise () from /lib/libc.so.6
#1 0x00007f41b1a1c5c0 in abort () from /lib/libc.so.6
#2 0x00007f41b22cd8e5 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/libstdc++.so.6
#3 0x00007f41b22cbd16 in ?? () from /usr/lib/libstdc++.so.6
#4 0x00007f41b22cbd43 in std::terminate() () from /usr/lib/libstdc++.so.6
#5 0x00007f41b22cbe3e in __cxa_throw () from /usr/lib/libstdc++.so.6
#6 0x00000000005b39f8 in ceph::__ceph_assert_fail (assertion=0x5ec3b2 "seq >= last_committed_seq", file=<value optimized out>, line=711, func=<value optimized out>) at common/assert.cc:30
#7 0x00000000005649e1 in FileJournal::committed_thru (this=0x1116310, seq=0) at os/FileJournal.cc:711
#8 0x000000000055d265 in JournalingObjectStore::commit_finish (this=0x1125740) at os/JournalingObjectStore.cc:186
#9 0x00000000005543f3 in FileStore::sync_entry (this=0x1125740) at os/FileStore.cc:1714
#10 0x00000000004ef93d in FileStore::SyncThread::entry() ()
#11 0x0000000000469a4a in Thread::_entry_func (arg=0x6315) at ./common/Thread.h:39
#12 0x00007f41b28ab9ca in start_thread () from /lib/libpthread.so.0
#13 0x00007f41b1acb6cd in clone () from /lib/libc.so.6
#14 0x0000000000000000 in ?? ()
Signed-off-by: Sage Weil <sage@newdream.net>