If someone is syncing from us and there is an election, they currently get
reset and have to restart their sync. This can lead to situations where
they can never finish, e.g., when the load from them syncing makes us time
out commits and call elections.
There is nothing that changes during bootstrap that would prevent a sync
from proceeding. The only time we need to stop providing is when we
ourselves decide to sync from someone else; modify that reset call to
reset provider state. All other resets become requester resets.
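The split between the two kinds of reset can be sketched as follows.
This is a minimal illustration, not the monitor code itself: the
MonSyncState struct and all of its member names are hypothetical.

```cpp
#include <set>
#include <string>

// Hypothetical, simplified model of a monitor's sync state.
struct MonSyncState {
  std::set<std::string> provider_peers;  // peers currently syncing from us
  std::string requesting_from;           // peer we sync from; empty if none

  // Drop only the state of a sync we requested from someone else.
  void reset_requester() { requesting_from.clear(); }

  // Drop only the state of syncs we are providing to others.
  void reset_provider() { provider_peers.clear(); }

  // An election no longer interrupts peers that are syncing from us.
  void on_election() { reset_requester(); }

  // Deciding to sync from someone else is the one case in which we must
  // stop providing.
  void start_sync_from(const std::string& peer) {
    reset_provider();
    requesting_from = peer;
  }
};
```

In this sketch a peer registered in provider_peers survives
on_election() and is dropped only by start_sync_from().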
Signed-off-by: Sage Weil <sage@inktank.com>
Fixes: #5793
Beforehand all remote copies were going to the master region
which was awfully wrong.
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Fixes: #5789
This was fixed before; however, the fix might have been lost in the
recent refactoring + merge.
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Need to do the action through the bucket instance handler
and not through the bucket handler, otherwise it's wrongly
recorded (and wrongly replayed, ouch).
Fixes: #5791
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
We now keep the bucket instance oid in rgw_bucket. The reason
we need it is that the bucket might have been created before
the entrypoint / bucket instance separation.
Fixes: #5790
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Use the correct type for the dumpcontents arg. Fixes the dump_pools_json
output and avoids these errors:
2013-07-29 13:09:14.089188 7fa0c5d21700 -1 0x7fa0c5d1e7a8
2013-07-29 13:09:16.306560 7fa0c5d21700 -1 bad boost::get: key dumpcontents is not type std::vector<std::string, std::allocator<std::string> >
2013-07-29 13:09:16.317104 7fa0c5d21700 -1 0x7fa0c5d1e7a8
2013-07-29 13:09:16.317136 7fa0c5d21700 -1 bad boost::get: key dumpcontents is not type std::vector<std::string, std::allocator<std::string> >
Fixes: #5786
Signed-off-by: Sage Weil <sage@inktank.com>
When trying to establish if the old acting set is either empty or
smaller than the min_size of the osdmap,
pg_interval_t::check_new_interval compares with the min_size of the
new osdmap. Since the goal is to try to determine if the previous
interval may have been writeable, it should not enter the if when
there were not enough osds in the acting set (i.e. < min_size). But
it may enter it anyway if min_size was decremented in the new osdmap.
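The intended comparison can be sketched as follows. This is a
simplified illustration rather than the body of
pg_interval_t::check_new_interval: the function name and the reduced
parameter list are assumptions made for the example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the check made inside check_new_interval.
// The past interval may have been writeable only if the old acting set
// was non-empty and held at least as many OSDs as the min_size in
// effect at that time, i.e. the min_size of the OLD osdmap.
bool interval_maybe_went_rw(const std::vector<int>& old_acting,
                            std::size_t old_min_size) {
  return !old_acting.empty() && old_acting.size() >= old_min_size;
}
```

With an old acting set of one OSD and an old min_size of 2, the sketch
reports that the interval could not have been writeable, even if the
new osdmap lowers min_size to 1.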
A complete set of unit tests was added to cover the logic of
check_new_interval. The parameters are prepared to describe a
situation where the function returns false (i.e. no new
interval). Each case is described in a separate block that introduces
the minimal changes to demonstrate the intended test case.
Because a number of cases have the same output while implementing a
different logic, the debug output is parsed to differentiate between them.
A test case demonstrating the problem (check_new_interval must
compare old acting with old osdmap) is added, with a link to the bug
number for future reference. The problem is fixed. The text of two
debug messages is slightly changed to make the maintenance of the
tests that match them easier.
http://tracker.ceph.com/issues/5780 refs #5780
Signed-off-by: Loic Dachary <loic@dachary.org>
Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
If N op_tp threads are configured, and recovery_max_active
is set to a sufficiently large number, all N op_tp threads
might grab a MOSDPGPush op off of the queue for the same PG.
The last thread to get the lock will have waited
N*time_to_handle_push before completing its item and pinging
the heartbeat timeout. If that time exceeds the timeout
and there are enough ops waiting, each thread subsequently
will end up exceeding the timeout before completing an
item, preventing the OSD from heartbeating indefinitely.
We prevent this by suspending the timeout while we try to
get the PG lock. Even if we do block for an excessive
period of time attempting to get the lock, hopefully,
the thread holding the lock will cause the threadpool
to time out.
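The idea can be sketched as follows. This is a minimal illustration
under stated assumptions, not the OSD code: HeartbeatHandle, its
members, and handle_op are hypothetical names, and a plain std::mutex
stands in for the PG lock.

```cpp
#include <chrono>
#include <mutex>

using Clock = std::chrono::steady_clock;

// Hypothetical per-thread heartbeat handle. A watchdog treats the
// thread as unhealthy once an armed deadline has passed.
struct HeartbeatHandle {
  bool armed = false;
  Clock::time_point deadline{};

  void reset_timeout(std::chrono::seconds grace) {
    armed = true;
    deadline = Clock::now() + grace;
  }
  void suspend_timeout() { armed = false; }
  bool is_healthy(Clock::time_point now) const {
    return !armed || now < deadline;
  }
};

// Take the PG lock with the timeout suspended and re-arm it before
// doing the work, so that time spent queued behind other threads on
// the same PG is not counted against this thread.
template <typename Work>
void handle_op(HeartbeatHandle& hb, std::mutex& pg_lock,
               std::chrono::seconds grace, Work&& work) {
  hb.suspend_timeout();
  std::lock_guard<std::mutex> guard(pg_lock);
  hb.reset_timeout(grace);
  work();
}
```

The thread that holds the lock still runs with an armed deadline, so a
genuinely stuck holder can still trip the watchdog.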
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
If there is a gap in our map history, get the full range of maps that
the mon has. Make sure the first one is a full map.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
If we have map 250, and the monitor's first is 251 but it sends 260,
we can request the intervening range.
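The decision can be sketched as follows. This is a simplified
illustration, not the OSD's map handling code: the MapRequest struct,
the plan_map_request function and its parameter names are all
hypothetical.

```cpp
#include <cstdint>

using Epoch = std::uint32_t;

// Hypothetical result type: what, if anything, to ask the monitor for.
struct MapRequest {
  bool needed;              // do we need to ask for more maps?
  Epoch start;              // first epoch to ask for (valid if needed)
  bool first_must_be_full;  // the first map we apply cannot be an incremental
};

// have:           newest epoch we already hold
// mon_first:      oldest epoch the monitor still has
// received_first: first epoch contained in the message just received
MapRequest plan_map_request(Epoch have, Epoch mon_first,
                            Epoch received_first) {
  MapRequest r{false, 0, false};
  if (received_first <= have + 1)
    return r;  // contiguous with what we hold, nothing to request
  if (mon_first <= have + 1) {
    // The monitor still has have + 1, so the skipped range can be fetched.
    r.needed = true;
    r.start = have + 1;
    return r;
  }
  // Real gap in our history: take everything the monitor has, and the
  // first map we apply has to be a full map.
  r.first_must_be_full = true;
  if (mon_first < received_first) {
    r.needed = true;
    r.start = mon_first;
  }
  return r;
}
```

For the case described above, plan_map_request(250, 251, 260) asks for
maps starting at 251. Comparing mon_first against have rather than
have + 1 would treat this case as a gap.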
Fixes: #5784
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
Added implemented but not listed commands to the help/usage text:
* -g shortcut for --gen-key
* -a shortcut for --add-key
* -u/--set-uid to set auid
* --gen-print-key
* --import-keyring
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
A single counter (waiting) accurately reflects the number of
waiters, regardless of which method is waiting. It is enough to allow
unit tests to synthesize all situations, including:
T1: x = lookup_or_create(0)
T1: release x part 1 (weak_ptrs now fail to lock)
T2: y = lookup_or_create(0)
T2: block in lookup_or_create (waiting == 1)
T1: z = lookup_or_create(1) (does not block because the key is different)
while holding the lock it waiting++ and waiting == 2
and before returning it waiting-- and waiting is back to == 1
T1: complete release x
T2: complete lookup_or_create(0) (waiting == 0)
The unit tests are modified to add a lookup on an unrelated key to
demonstrate that it does not reset the waiting counter.
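The counting scheme can be sketched as follows. This is a reduced
illustration built on the standard library, not the sharedptr_registry
implementation: the Registry class, its members and the waiters()
accessor are hypothetical, and the registry is assumed to outlive every
pointer it hands out.

```cpp
#include <condition_variable>
#include <map>
#include <memory>
#include <mutex>

template <typename K, typename V>
class Registry {
  std::mutex lock;
  std::condition_variable cond;
  std::map<K, std::weak_ptr<V>> contents;
  int waiting = 0;  // callers currently inside lookup_or_create

  // Runs when the last shared_ptr goes away. By then weak_ptrs already
  // fail to lock; the entry is erased here and waiters are woken up.
  struct OnRemoval {
    Registry* parent;
    K key;
    void operator()(V* p) const {
      {
        std::lock_guard<std::mutex> l(parent->lock);
        parent->contents.erase(key);
      }
      parent->cond.notify_all();
      delete p;
    }
  };

public:
  std::shared_ptr<V> lookup_or_create(const K& key) {
    std::unique_lock<std::mutex> l(lock);
    ++waiting;  // counted for the whole call, whether or not we block
    for (;;) {
      auto it = contents.find(key);
      if (it == contents.end())
        break;  // no entry, create one below
      if (auto live = it->second.lock()) {
        --waiting;
        return live;
      }
      cond.wait(l);  // entry is being released, wait for its removal
    }
    --waiting;
    std::shared_ptr<V> created(new V(), OnRemoval{this, key});
    contents[key] = created;
    return created;
  }

  int waiters() {
    std::lock_guard<std::mutex> l(lock);
    return waiting;
  }
};
```

Because every call increments and decrements the same counter, a call
on an unrelated key that returns immediately leaves the count of
blocked callers unchanged.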
http://tracker.ceph.com/issues/5527 refs #5527
Signed-off-by: Loic Dachary <loic@dachary.org>
Covers 100% of the LOC and all the expected behavior, including thread
safety.
The sharedptr_registry is made a friend of the test class so that it
can synthesize race conditions. The lookup and lookup_or_create methods
set the new in_method data member before calling cond.Wait() so that
the caller knows it is waiting.
http://tracker.ceph.com/issues/5527 refs #5527
Signed-off-by: Loic Dachary <loic@dachary.org>
OMG libfcgi is annoying with shutdown and signals. You need to close
the fd *and* resend a signal to ensure that you kick the accept loop
hard enough to make it shut down.
Document this, and switch to the async signal handlers. Put them
tightly around the runtime loop as we do with other daemons.
Signed-off-by: Sage Weil <sage@inktank.com>
Hidden commands have no help string. Make this consistent: the m_help
entry is always there, but may be empty.
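The convention can be sketched as follows. This is a minimal
illustration, not the actual command registration code: the
CommandTable class and its methods are hypothetical, and only the
m_help name is taken from the description above.

```cpp
#include <map>
#include <string>
#include <vector>

class CommandTable {
  // command -> help string; an empty string marks a hidden command
  std::map<std::string, std::string> m_help;

public:
  // Every command gets an m_help entry, hidden ones an empty string.
  void register_command(const std::string& cmd,
                        const std::string& help = "") {
    m_help[cmd] = help;
  }

  bool is_registered(const std::string& cmd) const {
    return m_help.count(cmd) > 0;
  }

  // Help output lists only commands with a non-empty help string.
  std::vector<std::string> help_lines() const {
    std::vector<std::string> out;
    for (const auto& kv : m_help)
      if (!kv.second.empty())
        out.push_back(kv.first + ": " + kv.second);
    return out;
  }
};
```

Code that walks the command list can then look up m_help for any
registered command without first checking whether the entry exists.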
Signed-off-by: Sage Weil <sage@inktank.com>