If we short-circuit and bootstrap, cancel our timer. Otherwise it will
go off some time later when we are in who knows what state.
Backport: cuttlefish
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
If we are probing and get (say) an election timeout that calls reset(),
cancel the timer. Otherwise, we assert later with a splat like
2013-06-24 01:09:33.675882 7fb9627e7700 4 mon.b@0(leader) e1 probe_timeout 0x307a520
2013-06-24 01:09:33.676956 7fb9627e7700 -1 mon/Monitor.cc: In function 'void Monitor::probe_timeout(int)' thread 7fb9627e7700 time 2013-06-24 01:09:43.675904
mon/Monitor.cc: 1888: FAILED assert(is_probing() || is_synchronizing())
ceph version 0.64-613-g134d08a (134d08a965)
1: (Monitor::probe_timeout(int)+0x161) [0x56f5c1]
2: (Context::complete(int)+0xa) [0x574a2a]
3: (SafeTimer::timer_thread()+0x425) [0x7059a5]
4: (SafeTimerThread::entry()+0xd) [0x7065dd]
5: (()+0x7e9a) [0x7fb966f62e9a]
6: (clone()+0x6d) [0x7fb9652f9ccd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Fixes: #5438
Backport: cuttlefish
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
The create_initial() method may get called multiple times; make sure it
will unconditionally generate new/initial rotating keys. Move the block
up so that we can easily assert as much.
Broken by commit cd98eb0c65.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
ceph-mon recently started using Preforker to working around forking issues.
As a result, internal_safe_to_start_threads got set sooner and calls to
pick_addresses() which try to set string config values now fail because
there are no config observers for them.
Work around this by observing the change while we adjust the value. We
assume pick_addresses() callers are smart enough to realize that their
result will be reflected by cct->_conf and not magically handled elsewhere.
Fixes: #5195, #5205
Backport: cuttlefish
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
I think I assumed no_reply() was releasing the references, but it is
not. Which is better, since send_reply() doesn't either. Fix the leaks
by dropping the message ref explicitly.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
We need to discard/cancel/free the failure report messages before we
cancel a report out. Assert in the dtor to ensure we didn't forget.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
Previously we would just dump the command argument to our local log client
and reply immediately, which could lose the message if we then restarted.
Instead, commit directly and wait before replying.
Also, log as the actual client, not as the monitor processing the message.
Fixes: #5409
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
We can get random messages to stderror from socket reconnects and such;
discard those if we are looking at stderr in the test.
Signed-off-by: Sage Weil <sage@inktank.com>
When you get device names like sdaa you do not want to mistakenly conclude that
sdaa is a partition of sda. Use /sys/block/$device/$partition existence
instead.
Fixes: #5211
Backport: cuttlefish
Signed-off-by: Alexandre Maragone <alexandre.maragone@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
fadvise(DONTNEED) on XFS can break writeback ordering and zeroing; see
http://oss.sgi.com/archives/xfs/2013-06/msg00066.html
If we detect XFS, turn this option off.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
First of all, we must find a monmap to backup. The newest version.
Secondly, we must make sure we back it up before clearing the store.
Finally, we must make sure that we don't remove said backup while
clearing the store; otherwise, we would be out of a backup monmap if the
sync happened to fail (and if the monitor happened to be killed before a
new sync had finished).
This patch makes sure these conditions are met.
Fixes: #5256 (partially)
Backport: cuttlefish
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Always use the highest version amongst all the typically available
monmaps: whatever we have in memory, whatever we have under the
MonmapMonitor's store, and whatever we have backed up from a previous
sync. This ensures we always use the newest version we came across
with.
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Otherwise, we will end up losing the monmap we backed up when we started
the sync, and the monitor may be unable to start if it is killed or
crashes in-between the sync abort and finishing a new sync.
Fixes: #5256 (partially)
Backport: cuttlefish
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
The %ghost %dir ... line will make this get cleaned up but won't install
it.
Reported-by: Derek Yarnell <derek@umiacs.umd.edu>
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Gary Lowell <gary.lowell@inktank.com>
common/Preforker.h: In member function ‘int Preforker::signal_exit(int)’:
warning: common/Preforker.h:82:45: ignoring return value of ‘ssize_t safe_write(int, const void*, size_t)’, declared with attribute warn_unused_result [-Wunused-result]
This is harder than it should be to fix. :(
http://stackoverflow.com/questions/3614691/casting-to-void-doesnt-remove-warn-unused-result-error
Whatever, I guess we can do something useful with this return value.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
client/Client.cc: In member function 'virtual void Client::ms_handle_remote_reset(Connection*)':
warning: client/Client.cc:7892:9: enumeration value 'STATE_NEW' not handled in switch [-Wswitch]
warning: client/Client.cc:7892:9: enumeration value 'STATE_OPEN' not handled in switch [-Wswitch]
warning: client/Client.cc:7892:9: enumeration value 'STATE_CLOSED' not handled in switch [-Wswitch]
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
If we get a reset during our attempt to open an MDS session, close out the
Connection* and retry to open the session, moving the waiters over.
Fixes: #5379
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
The optional epoch argument was missing from the command spec.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>