Use blkid to give us the GPT partition type. This lets us distinguish
between dmcrypt and non-dmcrypt partitions. If blkid doesn't give us
what we want, fake it by falling back to sgdisk. This isn't perfect
(sgdisk can't distinguish dmcrypt from non-dmcrypt), but such is life,
and we are better off than before.
Signed-off-by: Sage Weil <sage@redhat.com>
This reverts commit 673394702b.
The reverted change appears to break things when the journal and data disks are *not* the same.
And I can't seem to reproduce the original failure...
Signed-off-by: Sage Weil <sage@redhat.com>
We only need to verify that partitions aren't in use when we want to
consume the whole device (osd data), not when we want to create an
additional partition for ourselves (osd journal).
Backport: firefly
Signed-off-by: Sage Weil <sage@redhat.com>
Take the self-aliveness checks out of require_same_or_newer_map() and use
the new function for that and for require_up_osd_peer().
Signed-off-by: Greg Farnum <greg@inktank.com>
(cherry picked from commit e179e9227b)
This checks both that a Message originates from an OSD, and that the OSD
is up in the given map epoch.
We use it in handle_replica_op so that we don't inadvertently add operations
from down peers, who might or might not know that they are down.
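A rough sketch of the check this adds (names and signature approximate, not
necessarily the actual code):

  // Sketch: reject messages whose source is not an OSD, or whose source
  // OSD is not up in the map we are checking against.
  bool OSD::require_up_osd_peer(Message *m, OSDMapRef map)
  {
    if (!m->get_source().is_osd() ||
        !map->is_up(m->get_source().num())) {
      dout(0) << "ignoring message from down/non-OSD peer "
              << m->get_source_inst() << dendl;
      return false;
    }
    return true;
  }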
Signed-off-by: Greg Farnum <greg@inktank.com>
(cherry picked from commit ccd0eec501)
If you call erase() with a value on a multiset it will delete all instances
of that value; we only want to delete one of them. Fix this by passing an
iterator instead.
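For illustration (generic C++, not the actual Ceph code):

  #include <cassert>
  #include <set>

  int main() {
    std::multiset<int> s = {1, 2, 2, 3};
    s.erase(2);                    // erase-by-value removes BOTH 2s
    assert(s.count(2) == 0);

    std::multiset<int> t = {1, 2, 2, 3};
    std::multiset<int>::iterator it = t.find(2);
    if (it != t.end())
      t.erase(it);                 // erase-by-iterator removes exactly one 2
    assert(t.count(2) == 1);
    return 0;
  }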
Backport: firefly
Signed-off-by: Sage Weil <sage@redhat.com>
The dout_prefix for OSDService uses get_osdmap() to grab a shared_ptr for
the epoch printout. The OSD one does not, and is not safe to run in all
thread contexts.
In particular, update_osd_stat() is run by the heartbeat thread and can
race with the shared_ptr itself being updated with a new map.
Ironically, if this were simply an OSDMap*, there would be no race since
the pointer is a single word and updates atomically.
Fix this, and any similar issues, by moving the OSDService methods up in
OSD.cc so that they use the safe dout macro.
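For background, the underlying C++ hazard looks roughly like this (generic
illustration, not Ceph code):

  #include <memory>

  struct Map { int epoch; };
  std::shared_ptr<Map> current_map;        // shared, mutable state

  // thread A (map update): replaces the shared_ptr
  void install(std::shared_ptr<Map> m) {
    current_map = m;                       // writes pointer + refcount
  }

  // thread B (e.g. heartbeat-driven logging): reads it without a lock
  int epoch_for_log() {
    return current_map ? current_map->epoch : 0;  // races with install()
  }

A bare Map* read would only touch a single machine word, which is the irony
noted above; the shared_ptr read also touches the refcount, so it needs the
same synchronization as the writer.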
Fixes: #8998
Backport: firefly (in a minimal form, I think!)
Signed-off-by: Sage Weil <sage@redhat.com>
need_to_wait wasn't passed into processor->throttle_data(). This was
broken by the fix for #8937.
CID 1229541: (PW.PARAM_SET_BUT_NOT_USED)
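The shape of the fix, very roughly (hypothetical sketch; the real
throttle_data() signature and call site differ):

  // hypothetical sketch: the flag was computed but never forwarded,
  // so the throttle path never saw it
  bool need_to_wait = (pending_size > max_chunk_size);      // example condition
  int r = processor->throttle_data(handle, need_to_wait);   // now passed through
  if (r < 0)
    return r;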
Backport: firefly
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
(cherry picked from commit e93818df33)
Fixes: #8937
Following the fix for #8928 we end up accumulating pending data that
needs to be written. Beforehand it was working fine because we were
feeding it the exact number of bytes we were writing.
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
(cherry picked from commit 0553890e79)
This makes it deterministic whether we output
2014-08-03 20:59:45.482614 4036c80 -1 did not load config file, using default settings.
or not, and will make the unit tests stop intermittently failing.
Signed-off-by: Sage Weil <sage@redhat.com>
Avoid this deadlock:
- a fault
- delay thread entry gets a fast dispatch message
  - drops delay_lock
  - calls into fast_dispatch
- reaper tries to reap the pipe
  - pipe->join()
    - delay_thread->join()
    - blocks waiting for delay_thread to exit
- delay thread / fast dispatch blocks on msgr->lock trying to mark_down
The solution is to drop the msgr lock while joining the thread. This will
allow the join() to complete. Adjust the reaper thread to recheck the
exit condition since the lock may have been dropped. The other two callers
do not care.
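Roughly, the reaper side becomes (approximate sketch; member names elided):

  // approximate sketch: drop the messenger lock around the blocking join
  // so the delay thread can take it and make progress
  lock.Unlock();
  p->join();          // may wait for delay_thread, which needs msgr->lock
  lock.Lock();
  // the lock was released above, so re-check the reaper's exit condition
  // before touching the pipe list again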
Fixes: #8891
Signed-off-by: Sage Weil <sage@redhat.com>
The get_priv() call returns a ref; make sure we drop it if it exists.
This leak doesn't happen on every run because the priv is usually NULL and
we take the other path; it's only after the OSD has been marked down that we
reach the second path.
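The pattern, approximately:

  // approximate sketch: get_priv() returns a referenced object (or NULL),
  // so the path that takes it must put() it when done
  Session *session = static_cast<Session*>(con->get_priv());
  if (session) {
    // ... the post-mark-down path ...
    session->put();   // drop the ref taken by get_priv()
  }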
Signed-off-by: Sage Weil <sage@redhat.com>
Give users a clue when cache pools are enabled but the hit_set is not
configured. Note that technically this will work, but not well, so for
now let's just steer them away.
Signed-off-by: Sage Weil <sage@redhat.com>
If there is no hit set for a PG, blindly evict objects. This avoids an
assert(hit_set) in agent_estimate_atime_temp().
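Sketch of the idea (not the exact agent code):

  // sketch: only consult the hit set when we actually have one
  if (!hit_set) {
    // nothing to estimate atime/temperature from; evict blindly
    evict = true;
  } else {
    agent_estimate_atime_temp(soid, &atime, &temp);
    // ... decide based on atime/temp and the agent's effort level ...
  }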
Fixes: #8982
Signed-off-by: Sage Weil <sage@redhat.com>
It is probably not a good idea to try to run the tiering agent without a
hit_set to inform its actions, but it is technically possible. For
example, one could simply blindly evict when we reach the full point.
However, this doesn't work because the agent mode is guarded by a hit_set
check, even though agent_setup() is not. Fix that.
Signed-off-by: Sage Weil <sage@redhat.com>
If the client is old and doesn't understand tiering, don't let it use a
tiered pool. Reply with EOPNOTSUPP.
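The guard is along these lines (sketch; exact feature bit and call site
approximate):

  // sketch: bounce ops from clients that lack the cache-pool feature
  // when the pool they are talking to is tiered
  if (pool.info.has_tiers() &&
      !(m->get_connection()->get_features() & CEPH_FEATURE_OSD_CACHEPOOL)) {
    osd->reply_op_error(op, -EOPNOTSUPP);
    return;
  }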
Fixes: #8714
Backport: firefly
Signed-off-by: Sage Weil <sage@redhat.com>
Newer versions of json.tool remove the trailing ' ' after the comma. Add
it back in with sed so that the .t works on both old and new versions, and
so that we don't have to remove the trailing spaces from all of the test
cases.
Backport: firefly
Fixes: #8920
Signed-off-by: Sage Weil <sage@redhat.com>
Fixes: #8586
This fixes error handling, in accordance with commit 6af5a537, which fixed
the same issue for the S3 case.
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
To get back to the reformatting procedure that otherwise
occurs during MDLog::open, introduce an MDLog::reopen call
that the MDS can use in the standbyreplay->standby transition
for the special case where the journal is old.
Fixes: #8869
Signed-off-by: John Spray <john.spray@redhat.com>
* Make boot_start private.
* Define the boot stages in an enum, replacing the raw int with the new
  type (see the sketch below).
* Merge steps 0 and 1, since 0 always fell through to 1.
* starting_done was only ever reached by a fall-through from the previous
  step, so call it directly from there.
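The resulting shape, approximately (names may differ from the actual code):

  // approximate sketch of the boot-stage enum replacing raw ints
  class MDS {
    // ...
  private:
    enum BootStep {
      MDS_BOOT_INITIAL = 0,     // former steps 0 and 1, merged
      MDS_BOOT_OPEN_ROOT,
      MDS_BOOT_PREPARE_LOG,
      MDS_BOOT_REPLAY_DONE,
    };
    void boot_start(BootStep step = MDS_BOOT_INITIAL, int r = 0);
    void starting_done();       // called directly at the end of the last step
  };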
Signed-off-by: John Spray <john.spray@redhat.com>
Refactor to:
* have somewhere to put some logic for doing background recovery in the
  future.
* trim a few lines from the oversized MDCache.cc wherever we can.
Signed-off-by: John Spray <john.spray@redhat.com>