We have a request that is queued before its pool exists, there is one
epoch where the pool exists, and then it disappears again. The two map
epochs are processed at the same time. For the first, we set
need_resend, map the op to an osd, and remove it from the homeless
session. For the second, where the pool dne (does not exist), we set
the osd back to -1 and send a map check request. Finally,
handle_osd_map scans need_resend, sees that the pool dne, and removes
the op from need_resend. The difference from the "usual" case is that
we end up neither on the need_resend list nor in the homeless session.
Fix this by concluding immediately that the pool existed (briefly) and
then no longer exists.
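Roughly the shape of that check (field names here are assumptions in
the spirit of the Objecter code, not the exact fix):

    // in _check_op_pool_dne(): if the op's target pool was ever seen
    // to exist and the current map no longer has it, the pool was
    // deleted; bound the dne check at the current epoch right away
    // instead of waiting for a map check round trip.
    if (op->target.pool_ever_existed &&
        !osdmap->have_pg_pool(op->target.base_oloc.pool)) {
      op->map_dne_bound = osdmap->get_epoch();
    }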
Fixes: http://tracker.ceph.com/issues/19552
Signed-off-by: Sage Weil <sage@redhat.com>
If the PG moves, we will reconnect and fail to time out. Wait longer
so that we mask the effects of osd thrashing.
Fixes: http://tracker.ceph.com/issues/19433
Signed-off-by: Sage Weil <sage@redhat.com>
Stop the MgrClient callbacks that report PG stats at the start of
shutdown() so that we don't get a callback while, or after, we shut
down.
Protect the callback update with the MgrClient's lock so that we don't
race with MgrClient::send_report() itself.
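Roughly (member names are assumptions, not the exact code):

    void MgrClient::shutdown()
    {
      {
        Mutex::Locker l(lock);
        // send_report() takes the same lock, so it either already ran
        // or will see the cleared callback and report nothing
        pgstats_cb = nullptr;
      }
      // ... rest of shutdown
    }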
Fixes: http://tracker.ceph.com/issues/19638
Signed-off-by: Sage Weil <sage@redhat.com>
I get the following error when compiling with gcc-6 on armhf:
ceph/src/crush/CrushCompiler.cc: In member function 'int CrushCompiler::decompile(std::ostream&)':
ceph/src/crush/CrushCompiler.cc:462:45: error: invalid initialization of non-const reference of type 'std::pair<const long unsigned int, crush_choose_arg_map>&' from an rvalue of type 'std::pair<const long unsigned int, crush_choose_arg_map>'
   int ret = decompile_choose_args(i, out);
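The fix is to take the map element by const reference. A standalone
sketch of the shape (the real function lives in CrushCompiler and
does real work; the stand-in struct below is just for illustration):

    #include <cstdint>
    #include <iostream>
    #include <map>

    struct crush_choose_arg_map {};  // stand-in for the real C struct

    // Taking the element by const reference avoids binding a non-const
    // reference to a temporary, which is what gcc-6 rejects above.
    static int decompile_choose_args(
      const std::pair<const std::uint64_t, crush_choose_arg_map>& i,
      std::ostream& out)
    {
      out << i.first << "\n";
      return 0;
    }

    int main()
    {
      std::map<std::uint64_t, crush_choose_arg_map> choose_args;
      choose_args[0] = {};
      for (auto& i : choose_args) {
        int ret = decompile_choose_args(i, std::cout);
        if (ret < 0)
          return 1;
      }
      return 0;
    }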
Signed-off-by: Kefu Chai <kchai@redhat.com>
Otherwise the bluestore tests will fail with failures like:
qa/workunits/cephtool/test.sh:1343: test_mon_osd_pool: ceph osd pool set ec_test allow_ec_overwrites true
Error EINVAL: pool must only be stored on bluestore for scrubbing to work: osd.1 uses filestore
qa/workunits/cephtool/test.sh:1343: test_mon_osd_pool: return 1
Signed-off-by: Kefu Chai <kchai@redhat.com>
Previously at startup we saw contradictory status output: a
"no active mgr" health message followed by a line that said
"active: x".
Mitigate that a bit by indicating that an active daemon which is not
yet available (!available) is still starting up.
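One way to render that (field names follow MgrMap; the exact output
string is an assumption):

    if (active_gid != 0) {
      out << "active: " << active_name;
      if (!available)
        out << " (starting)";   // registered, but not yet available
    } else {
      out << "no active mgr";
    }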
Signed-off-by: John Spray <john.spray@redhat.com>
Let's be consistent and push the m->put() down into the
handle_(M<class>* m) functions, the way we do in the rest of the
codebase.
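A toy model of the convention (not the actual Message class): the
handler consumes the reference, so the dispatcher hands the message
off and never touches it again.

    #include <atomic>
    #include <iostream>

    // toy refcounted message, standing in for Ceph's Message
    struct Message {
      std::atomic<int> nref{1};
      virtual ~Message() = default;
      void put() {
        if (--nref == 0)
          delete this;
      }
    };

    struct MExample : Message {};

    static void handle_example(MExample *m)
    {
      std::cout << "handled\n";
      m->put();  // release here, not in the caller
    }

    int main()
    {
      handle_example(new MExample);
      return 0;
    }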
Signed-off-by: John Spray <john.spray@redhat.com>
I think we got away with this when Client::init was re-initialising
the MonClient, but that (bogus) re-initialisation isn't happening any
more, as Client no longer inits/tears down the monc/objecter itself.
Signed-off-by: John Spray <john.spray@redhat.com>
Run the python module calls in a finisher so that they don't block
the DaemonServer lock, and so that they can call back into mgr state
if they need to.
Fix passing commands through to python modules; this was giving
EINVAL because only commands with a MgrCommand entry were being let
in.
Also fix get_command_descriptions, which was not
including the output of the formatter in the response.
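Roughly, assuming the Finisher/FunctionContext API of this era (the
handler name and captured command context are made up for
illustration):

    // queue the module call instead of running it under the
    // DaemonServer lock; the finisher thread runs it later, where it
    // is free to take mgr locks and call back in
    finisher.queue(new FunctionContext([this, cmdctx](int r) {
      py_modules.handle_command(cmdctx);   // hypothetical handler
    }));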
Signed-off-by: John Spray <john.spray@redhat.com>
This was causing mons to send far more digest messages than they
should have. It may have been responsible for reports of very high
CPU consumption on the mgr daemon.
Fixes: http://tracker.ceph.com/issues/18994
Signed-off-by: John Spray <john.spray@redhat.com>
Two Finishers should not be considered equivalent for lockdep
purposes: for example, in mgr we have our in-mgr Finisher, plus any
finishers that might live in libraries called by modules.
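Sketch of the idea (the constructor shape here is assumed): derive
each lock's lockdep name from the finisher's own name.

    Finisher(CephContext *cct_, std::string name, std::string tn)
      : cct(cct_),
        // lockdep keys lock identity off the name string, so two
        // finishers with distinct names register as distinct locks
        finisher_lock(std::string("Finisher::") + name +
                      "::finisher_lock"),
        finisher_thread(this) {}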
Signed-off-by: John Spray <john.spray@redhat.com>