- fix the wait check for osds to come back up
- make sure they get marked back in, too
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
mon: fix check for primary-affinity feature bit, and fix a race in similar checks
Reviewed-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
For erasure pools, these may not match.
In the case of #7652, this caused pg_create messages to be sent
indefinitely. register_pg() added it to the list for acting_primary, and
when we got the (non-creating) pg stat update we removed it from the list
for acting[0].
Fixes: #7652
Signed-off-by: Sage Weil <sage@inktank.com>
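
The mismatch between the two lists can be illustrated with a small model.
This is only a sketch under stated assumptions, not the monitor's code: the
class CreatingPGs and the method pg_stat_update are hypothetical stand-ins,
the OSD ids are made-up values, and the flag assumes the fix is to remove
the entry using the same key (acting_primary) that register_pg() used.

```python
from collections import defaultdict


class CreatingPGs:
    """Toy model of the per-OSD lists of PGs still awaiting creation."""

    def __init__(self, remove_by_acting_primary):
        self.by_osd = defaultdict(set)
        self.remove_by_acting_primary = remove_by_acting_primary

    def register_pg(self, pgid, acting, acting_primary):
        # Registration files the PG under acting_primary.
        self.by_osd[acting_primary].add(pgid)

    def pg_stat_update(self, pgid, acting, acting_primary):
        # A non-creating stat update should drop the PG from the list again.
        # The old behaviour looked it up under acting[0] instead.
        osd = acting_primary if self.remove_by_acting_primary else acting[0]
        self.by_osd[osd].discard(pgid)

    def pending(self):
        # PGs that would still be sent pg_create messages.
        return sorted(pg for pgs in self.by_osd.values() for pg in pgs)


# Illustrative erasure-pool mapping where acting[0] != acting_primary.
acting, acting_primary = [3, 1, 2], 1

old = CreatingPGs(remove_by_acting_primary=False)
old.register_pg("1.0", acting, acting_primary)
old.pg_stat_update("1.0", acting, acting_primary)

fixed = CreatingPGs(remove_by_acting_primary=True)
fixed.register_pg("1.0", acting, acting_primary)
fixed.pg_stat_update("1.0", acting, acting_primary)
```

In this model the old variant leaves "1.0" pending forever, while the
variant that uses one key for both operations ends with nothing pending.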
The check for OSD features may race with the boot of an OSD that does not
have the necessary features. Check the pending info too, and if there is
a missing feature, return -EAGAIN. In the callers, wait on -EAGAIN.
Signed-off-by: Sage Weil <sage@inktank.com>
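
A minimal sketch of the committed-plus-pending check described above. The
function names, the dict-of-bitmasks layout, and the use of ENOTSUP for a
miss in the committed state are all assumptions made for illustration; only
the -EAGAIN result for a miss in the pending info and the waiting caller
come from the description.

```python
import errno


def check_osd_features(committed, pending, required):
    """Return 0 if every known OSD has `required`, else a negative errno.

    `committed` and `pending` map OSD id -> feature bitmask (toy layout).
    """
    for features in committed.values():
        if features & required != required:
            # Assumption: a miss in the committed state is a hard error.
            return -errno.ENOTSUP
    for features in pending.values():
        if features & required != required:
            # An OSD that is still booting lacks the feature: ask to retry.
            return -errno.EAGAIN
    return 0


def try_enable(committed, pending, required):
    """Hypothetical caller showing how each result would be handled."""
    r = check_osd_features(committed, pending, required)
    if r == -errno.EAGAIN:
        return "wait"      # wait for the pending state to commit, then retry
    if r < 0:
        return "reject"
    return "proceed"
```

For example, with every committed OSD advertising bit 0b10 but a booting
OSD in the pending info advertising only 0b01, try_enable() answers "wait"
rather than racing ahead.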
Make sure all running OSDs support the feature before we start using it
(even if the config option is on!).
Fixes: #7642
Signed-off-by: Sage Weil <sage@inktank.com>
'rados cppool' copies the contents but that doesn't make the destination
pool an unmanaged snaps pool. Therefore, we must get an ENOTSUP when
we try to remove an unmanaged snap from a not-unmanaged pool.
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Although we should allow creating unmanaged snaps on not-unmanaged pools,
as long as those pools don't have any managed snapshots in them, we cannot
allow removal -- because the pool will not have any unmanaged snapshots.
Fixes: #7210
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
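
The create/remove asymmetry for unmanaged snaps can be sketched as a toy
model. The Pool class and its fields are hypothetical and do not mirror
Ceph's pool structures; the EINVAL returned for a pool that already has
managed snapshots is an assumed error code. Only the ENOTSUP result for
removal from a not-unmanaged pool is taken from the description.

```python
import errno


class Pool:
    """Toy pool that tracks which snapshot mode it is in."""

    def __init__(self):
        self.managed_snaps = set()    # pool-level (managed) snapshot names
        self.unmanaged_mode = False   # set once an unmanaged snap is created
        self.snap_seq = 0

    def create_unmanaged_snap(self):
        if self.managed_snaps:
            # Assumed error code: the pool already uses managed snapshots.
            return -errno.EINVAL
        # Allowed on a pool with no managed snapshots; the pool becomes
        # an unmanaged snaps pool from here on.
        self.unmanaged_mode = True
        self.snap_seq += 1
        return self.snap_seq

    def remove_unmanaged_snap(self, snapid):
        if not self.unmanaged_mode:
            # The pool cannot contain any unmanaged snapshot to remove.
            return -errno.ENOTSUP
        return 0
```

A freshly created (or freshly copied) pool therefore rejects removal with
ENOTSUP until an unmanaged snap has actually been created in it.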
We had an old invariant that agent_queue would have at least 1 entry in
it to simplify some other code paths, but it turns out that it is simpler
not to do that.
In particular, this was triggering a failed assertion on shutdown when we
assert that the queue is empty.
Dump offending items on shutdown if they are there, though, to catch any
future bugs.
Fixes: #7637
Signed-off-by: Sage Weil <sage@inktank.com>
Each upstart/*-all-starter.conf uses the same script to find the list of
daemons and their ids. Copy it over to the corresponding logrotate.conf
script instead of using a less reliable script based on initctl list
output.
If logrotate fails to run initctl reload on a daemon, that daemon will keep
writing to the rotated log file, even after it is deleted and until it
fills the disk. By using the exact same shell snippet as the upstart
scripts used to start the daemon, all of them will be sent the HUP
signal and reopen the log file that was just rotated.
http://tracker.ceph.com/issues/7072 fixes #7072
Signed-off-by: Loic Dachary <loic@dachary.org>
Otherwise, two objects with different namespaces but
the same object_t will end up clobbering each other's
contexts.
Fixes: #7634
Signed-off-by: Samuel Just <sam.just@inktank.com>
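
The clobbering is easy to reproduce in a toy lookup table. The sketch below
assumes the fix is to make the namespace part of the key; index_contexts
and the tuple layout are hypothetical names chosen for illustration.

```python
def index_contexts(entries, use_namespace):
    """Build a context table from (namespace, object name, context) tuples."""
    table = {}
    for nspace, oid, ctx in entries:
        # Keying by the bare object name lets namespaces collide.
        key = (nspace, oid) if use_namespace else oid
        table[key] = ctx
    return table


entries = [
    ("ns-a", "foo", "context for ns-a/foo"),
    ("ns-b", "foo", "context for ns-b/foo"),
]

by_name = index_contexts(entries, use_namespace=False)
by_ns_and_name = index_contexts(entries, use_namespace=True)
```

Keyed by name alone, the second entry silently replaces the first; keyed by
(namespace, name), both contexts survive.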
This wreaks havoc on our QA because it marks osds up and down and then
immediately after that we try to scrub and some osds are still down.
Adjust the CLI test to wait for all OSDs to come back up after thrashing.
Signed-off-by: Sage Weil <sage@inktank.com>
Previously, a _delete_head() followed by a recreation on an object in
the same transaction would result in num_dirty being decremented in
_delete_head() without the flag being cleared. make_writeable() would
then see exists and was_dirty and therefore not increment num_dirty
resulting in a mismatch. Rather than trying to maintain the num_dirty
number in _delete_head(), rollback_to(), and make_writeable(), it seems
simpler to do the adjustment once in make_writeable based on undirty,
ctx->obc->obs.oi, and ctx->new_obs->oi.
Fixes: #7393
Signed-off-by: Samuel Just <sam.just@inktank.com>
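
The idea of adjusting num_dirty once, from the state before and after the
transaction, can be sketched as a pure function. This is a simplified model
and not the code in make_writeable(): the parameter names are hypothetical,
and it assumes that any surviving object written by the transaction ends up
dirty unless undirty was requested.

```python
def num_dirty_delta(old_exists, old_dirty, new_exists, undirty):
    """Change to num_dirty implied by the before/after object state."""
    was_dirty = old_exists and old_dirty
    # Assumption: a written object is dirty afterwards unless undirty is set.
    is_dirty = new_exists and not undirty
    return int(is_dirty) - int(was_dirty)
```

Because the delta depends only on the two end states, a delete followed by
a recreation inside one transaction (old object dirty, new object present)
yields 0 instead of the earlier off-by-one.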
Otherwise, our attempt to sanitize object_size bytes of
data.object_contents will be doomed to memory corruption.
Fixes: #7610
Signed-off-by: Samuel Just <sam.just@inktank.com>
We only get EAGAIN if the object is missing. We also need the
clone to be readable if we are reading it.
The other find_object_context callers already require !degraded.
Fixes: #7624
Signed-off-by: Samuel Just <sam.just@inktank.com>
Unify the pool deletion safety checks into a single set of functions.
Make sure we check the committed state and error out if there is a problem.
Also check the pending state, if any, and delay+retry if there is a
problem there.
This ensures that we correctly verify that a pool is not in use when it
is deleted (by another tier or by cephfs). These checks are also now
applied to librados calls.
Fixes: #7590
Signed-off-by: Sage Weil <sage@inktank.com>
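
A rough sketch of one shared deletion check consulted by every entry point.
The state layout, the function names, and the EBUSY error code are
assumptions for illustration; the description only says that a problem in
the committed state is an error and a problem in the pending state means
delay and retry.

```python
import errno


def pool_in_use(state, pool):
    """True if `pool` is used as a tier or by cephfs in this toy state."""
    return pool in state["tiers"] or pool in state["cephfs_pools"]


def check_pool_delete(committed, pending, pool):
    """Single safety check shared by all deletion paths (hypothetical)."""
    if pool_in_use(committed, pool):
        return -errno.EBUSY       # assumed error code for "still in use"
    if pending is not None and pool_in_use(pending, pool):
        return -errno.EAGAIN      # delay and retry once pending commits
    return 0


committed = {"tiers": set(), "cephfs_pools": {"data"}}
pending = {"tiers": {"cache"}, "cephfs_pools": {"data"}}
```

Routing both the command path and the librados path through the same
function is what keeps their answers consistent in this model.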
This reverts commit 3cd751b0a2.
A rados_release_read_op() has already been performed, but coverity
didn't recognize that as releasing memory.
Fixes: #7621
Signed-off-by: David Zafman <david.zafman@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>