The lockfile class relies on file system trickery to get safe mutual
exclusion. However, the unix syscalls do this for us. More
importantly, the unix locks go away when the owning process dies, which
is behavior that we want here.
Fixes: #5387
Backport: cuttlefish
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
When you get device names like sdaa you do not want to mistakenly conclude that
sdaa is a partition of sda. Use /sys/block/$device/$partition existence
instead.
Fixes: #5211
Backport: cuttlefish
Signed-off-by: Alexandre Maragone <alexandre.maragone@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
We take the fdcache_lock while holding onto index objects
elsewhere in the code.
Fixes: #5389
Reviewed-by: David Zafman <david.zafman@inktank.com>
Signed-off-by: Samuel Just <sam.just@inktank.com>
PGLog::rewind_divergent_log is dereferencing iterator "p" though it is
already past the end of its container. When entering the loop for the
first time, p is log.log.end() and must not be dereferenced.
mark_dirty_from must only be called after p--. It
will not rewind past begin() because of the
if (p == log.log.begin())
test above.
http://tracker.ceph.com/issues/5398fixes#5398
Signed-off-by: Loic Dachary <loic@dachary.org>
The tests covers 100% of the LOC of proc_replica_log. It is broken down
in 7 cases to enumerate all the situations it must address. Each case
is isolated in a independant code block where the conditions are
reproduced.
All tests are done on omissing and oinfo because they are the only
data structures that can be modified by proc_replica_log.
The first case is a noop and checks that only last_complete gets
updated when there are no logs.
The following case includes entries that are supposed to be ignored (
x7, x8 and xa ), however this is not an actual proof that the code
ignoring them is actually run : it only shows in the code coverage
report.
The log entry (1,3) modifies the object x9 but the olog entry
(2,3) deletes it : log is authoritative and the object is added
to missing. x7 is divergent and ignored. x8 has a more recent
version in the log and the olog entry is ignored. xa is past
last_backfill and ignored.
The other cases are a variation of the first case with minimal changes
to make them easier to understand and adapt. For instance most of them
start with a tail that is the same ( object with hash x5 and both at
version 1,1 ).
http://tracker.ceph.com/issues/5213 refs #5213
Signed-off-by: Loic Dachary <loic@dachary.org>
The function is made const by replacing a single call to log.objects[]
with log.objects.find. The olog argument is also a const and does not
require any change.
http://tracker.ceph.com/issues/5213 refs #5213
Signed-off-by: Loic Dachary <loic@dachary.org>
Treat this as an extension of the recovery process, e.g.
RECOVERING -> ACTIVE
or
RECOVERING -> UPDATING_PREVIOUS -> ACTIVE
and we are not active until we get to "the end" in both cases.
Signed-off-by: Sage Weil <sage@inktank.com>
- make states mutually exclusive (an enum)
- rename locked -> updating_previous
- set state prior to begin() to simplify things a bit
Signed-off-by: Sage Weil <sage@inktank.com>
If we are re-proposing a previously accepted value from a previous quorum,
we should not consider it readable, because it is possible it was exposed
to clients as committed (2/3 accepted) but not recored to be committed, and
we do not want to expose old state as readable when new state was
previously readable.
Signed-off-by: Sage Weil <sage@inktank.com>
The update_from_paxos() methods occasionally like to trigger new activity.
As long as they check is_readable() and is_writeable(), they will defer
until we go active and that activity will happen in the normal callbacks.
This fixes the problem where we active but is_writeable() is still false,
triggered by PGMonitor::check_osd_map().
Signed-off-by: Sage Weil <sage@inktank.com>
Do the paxos refresh inside finish_proposal, ordered *after* the leader
assertion so that MonmapMonitor::update_from_paxos() calling bootstrap()
does not kill us.
Also, remove unnecessary finish_queued_proposal() and move the logic inline
where the bad leader assertion is obvious.
Signed-off-by: Sage Weil <sage@inktank.com>
The refresh() will do this when the state changes; no need to
opportunistically call this method all of the time.
Signed-off-by: Sage Weil <sage@inktank.com>
Instead of opportunistically calling each service's update_from_paxos(),
instead explicitly refresh all in-memory state whenever we know the
paxos state may have changed. This is simpler and less fragile.
Signed-off-by: Sage Weil <sage@inktank.com>
fadvise(DONTNEED) on XFS can break writeback ordering and zeroing; see
http://oss.sgi.com/archives/xfs/2013-06/msg00066.html
If we detect XFS, turn this option off.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
client/Client.cc: In member function 'int Client::_read_sync(Fh*, uint64_t, uint64_t, ceph::bufferlist*)':
warning: client/Client.cc:5831:13: comparison between signed and unsigned integer expressions [-Wsign-compare]
Signed-off-by: Sage Weil <sage@inktank.com>
First of all, we must find a monmap to backup. The newest version.
Secondly, we must make sure we back it up before clearing the store.
Finally, we must make sure that we don't remove said backup while
clearing the store; otherwise, we would be out of a backup monmap if the
sync happened to fail (and if the monitor happened to be killed before a
new sync had finished).
This patch makes sure these conditions are met.
Fixes: #5256 (partially)
Backport: cuttlefish
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Always use the highest version amongst all the typically available
monmaps: whatever we have in memory, whatever we have under the
MonmapMonitor's store, and whatever we have backed up from a previous
sync. This ensures we always use the newest version we came across
with.
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Otherwise, we will end up losing the monmap we backed up when we started
the sync, and the monitor may be unable to start if it is killed or
crashes in-between the sync abort and finishing a new sync.
Fixes: #5256 (partially)
Backport: cuttlefish
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
The %ghost %dir ... line will make this get cleaned up but won't install
it.
Reported-by: Derek Yarnell <derek@umiacs.umd.edu>
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Gary Lowell <gary.lowell@inktank.com>
lfn_open() is called with indexes locked, so we cannot lock
and index under fdcache_lock.
Fixes: #5389
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>