We may get EAGAIN if the osd happens to be down, for example due to
thrashing. Try a few times and then give up.
Note that the other place we try to scrub we don't even check the return
value as we are poking ever pg in the pool. And the scrub commands get
lost due to any peering event, etc.
Signed-off-by: Sage Weil <sage@inktank.com>
This was introduced in 4c99e978a7 and was
incorrect; boot_epoch is the previous epoch the osd booted in, not the
latest map epoch that the OSD currently has.
Signed-off-by: Sage Weil <sage@inktank.com>
If unspecified it is ruleset-root=default and will translate into
take default
when a ruleset is created for an erasure-code pool.
Signed-off-by: Loic Dachary <loic@dachary.org>
This reverts commit 957ac3cbe3.
It's important to assign these for all operations for cases where
g_lockdep isn't yet true when the constructor runs. This is true
for the HeartbeatMap rwlock, among other things, as that thread
is created during early startup before lockdep is enabled. All
of the lockdep hooks assume that they can assign ids on the fly
and not tracking them here breaks things.
Conflicts:
src/common/RWLock.h
test:
Add set_completion*PP() functions to cast arg to correct class
Add return_value checks
Add some reads with buffers larger than object size
Check buffer length on reads
librados:
Make sure *return_value() has bytes read in all cases
Signed-off-by: David Zafman <david.zafman@inktank.com>
Fix consistency of write, append, write_full, all return 0 on success
Include C (rados_*) variants, C++ ctx variants
and aio get_return_value() and rados_aio_get_return_value()
Signed-off-by: David Zafman <david.zafman@inktank.com>
It is possible we will have a dup OSDBoot message queued up in the mon
and will process it again after that osd was marked up and then down. If
that happens, we should ignore this message, not mark the osd back in with
the same address.
Fixes: #8062
Signed-off-by: Sage Weil <sage@inktank.com>
Prevoiusly we assumed that if we had snapset_obc set, !exists on the head
and if we got the snapdir lock we were good to take the head lock too.
This is no the case when:
- delete queued
- takes wr lock on both head and snapdir
- delete commits (but not yet applied)
- stat
- tries to take wr lock on head
- blocks, toggles w=1 state on *head only*
- copy-from
- tries to take wr lock on snapdir, succeeds
- tries to take wr lock on head, fails because w=1
- fails the assert(got)
The problem is that the read and write paths are taking different locks
and we are expecting them to operate in synchrony.
Fix this by using the same ordering for reads as well as write: if the
snapset_obc is defined, take the read lock on that too, just as we do with
a write.
Fixes: #8046
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
If the monitor has been marked as having been part of an existing quorum
and is no longer in the monmap, then it is safe to assume the monitor
was removed from the monmap. In that event, do not allow the monitor
to start, as it will try to find its way into the quorum again (and
someone clearly stated they don't really want them there), unless
'mon force quorum join' is specified.
Fixes: 6789
Backport: dumpling, emperor
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Current code allow importing non-auth caps when inode is being exported.
This can breaks message ordering because the corresponding cap import
messages are sent after the flush session messages. So they can arrive
at clients after clients have already received cap import messages from
new auth MDS of the inode.
The quick fix is ignore MExportCaps when inode is frozen.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
For across authority rename, the MDS first freezes the source inode's
authpin. It happens while the source dentry isn't locked. So when the
inode's authpin become frozen, the source dentry may have changed and
be linked to a different inode.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Use a new probe op to inform mons that they are missing features during
the earliest probe phase. This prevents them from getting as far as
the sync entirely if they are too old.
We still need to refuse to speak to them if they try to call an election,
which they could do based on their replies from other peers.
Note that old clients will assert on getting a message type string they
don't understand, so we need to be careful not to send the probe reply
to older clients. The feature bit we use is not precise in that it does
not cover recent dev releases, but it does work for dumpling and emperor.
Signed-off-by: Sage Weil <sage@inktank.com>
It's just used internally. Make it private in the subclasses since
there's just one level of inheritance.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
get_protocol(), build_request(), build_rotating_request(), and
build_authorizer() can all be declared const now.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
validate_tickets() updates internal state, as does
tickets.get_handler(). Move them into a new method called before
build_request() so build_request() can be declared const.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
This allows methods using RWLock for reading to be declared const.
There might be cases where we'd want to take a write lock in a const
method, but right now that's unnecessary, and I'd rather get a compile
error.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
This never does anything since lockdep_register() assigns an id >= 0
in the RWLock constructor. This also prevents methods from being
declared const.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
For cephx, build_authorizer reads a bunch of state (especially the
current session_key) which can be updated by the MonClient. With no
locks held, Pipe::connect() calls SimpleMessenger::get_authorizer()
which ends up calling RadosClient::get_authorizer() and then
AuthClientHandler::bulid_authorizer(). This unsafe usage can lead to
crashes like:
Program terminated with signal 11, Segmentation fault.
0x00007fa0d2ddb7cb in ceph::buffer::ptr::release (this=0x7f987a5e3070) at common/buffer.cc:370
370 common/buffer.cc: No such file or directory.
in common/buffer.cc
(gdb) bt
0x00007fa0d2ddb7cb in ceph::buffer::ptr::release (this=0x7f987a5e3070) at common/buffer.cc:370
0x00007fa0d2ddec00 in ~ptr (this=0x7f989c03b830) at ./include/buffer.h:171
ceph::buffer::list::rebuild (this=0x7f989c03b830) at common/buffer.cc:817
0x00007fa0d2ddecb9 in ceph::buffer::list::c_str (this=0x7f989c03b830) at common/buffer.cc:1045
0x00007fa0d2ea4dc2 in Pipe::connect (this=0x7fa0c4307340) at msg/Pipe.cc:907
0x00007fa0d2ea7d73 in Pipe::writer (this=0x7fa0c4307340) at msg/Pipe.cc:1518
0x00007fa0d2eb44dd in Pipe::Writer::entry (this=<value optimized out>) at msg/Pipe.h:59
0x00007fa0e0f5f9d1 in start_thread (arg=0x7f987a5e4700) at pthread_create.c:301
0x00007fa0de560b6d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
and
Error in `qemu-system-x86_64': invalid fastbin entry (free): 0x00007ff12887ff20
*** ======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x80a46)[0x7ff3dea1fa46]
/usr/lib/librados.so.2(+0x29eb03)[0x7ff3e3d43b03]
/usr/lib/librados.so.2(_ZNK9CryptoKey7encryptEP11CephContextRKN4ceph6buffer4listERS4_RSs+0x71)[0x7ff3e3d42661]
/usr/lib/librados.so.2(_Z21encode_encrypt_enc_blIN4ceph6buffer4listEEvP11CephContextRKT_RK9CryptoKeyRS2_RSs+0xfe)[0x7ff3e3d417de]
/usr/lib/librados.so.2(_Z14encode_encryptIN4ceph6buffer4listEEiP11CephContextRKT_RK9CryptoKeyRS2_RSs+0xa2)[0x7ff3e3d41912]
/usr/lib/librados.so.2(_ZN19CephxSessionHandler12sign_messageEP7Message+0x242)[0x7ff3e3d40de2]
/usr/lib/librados.so.2(_ZN4Pipe6writerEv+0x92b)[0x7ff3e3e61b2b]
/usr/lib/librados.so.2(_ZN4Pipe6Writer5entryEv+0xd)[0x7ff3e3e6c7fd]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7f8e)[0x7ff3ded6ff8e]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7ff3dea99a0d]
Fix this by adding an rwlock to AuthClientHandler. A simpler fix would
be to move RadosClient::get_authorizer() into the MonClient() under
the MonClient lock, but this would not catch all uses of other
Authorizer, e.g. for verify_authorizer() and it would serialize
independent connection attempts.
This mainly matters for cephx, but none and unknown can have the
global_id reset as well.
Partially-fixes: #6480
Backport: dumpling, emperor
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>