Recent failures were caused by a HEALTH_WARN check unrelated to the
script. This fix works around the issue by setting
osd pool default size: 1 in ceph.conf and by checking
for HEALTH_OK instead of HEALTH_WARN.
It also adds meta information to the tasks, describing the test.
Signed-off-by: Vasu Kulkarni <vasu@redhat.com>
The existing logic runs ceph-deploy osd create --zap-disk, which
zaps the data device before preparing it. However, it does not zap the
journal device (see http://tracker.ceph.com/issues/13291).
If ceph-deploy osd create fails, a fallback zaps both the data
device and the journal and tries to prepare again. This could work if
device preparation and activation were synchronous and caught all
errors that could be caused by an unclean journal device. However,
activation is asynchronous, and it is entirely possible for a device
to be prepared successfully and then fail to activate in the background.
With this change, the data and journal devices are always zapped before
calling ceph-deploy osd create. The logic is simpler and the overhead is low.
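
The ordering described above can be sketched as follows. This is a minimal
illustration, not the task's actual code: the helper name
osd_create_commands and its arguments are hypothetical, and it assumes
the HOST:DISK[:JOURNAL] argument form of ceph-deploy.

```python
def osd_create_commands(host, data_dev, journal_dev=None):
    """Return the ceph-deploy command lines for one OSD, in run order.

    Hypothetical helper: every device involved is zapped up front,
    so 'osd create' never sees an unclean journal.
    """
    commands = []
    for dev in (data_dev, journal_dev):
        if dev is not None:
            commands.append(
                ['ceph-deploy', 'disk', 'zap', '%s:%s' % (host, dev)])
    # Build HOST:DISK or HOST:DISK:JOURNAL depending on what was given.
    target = ':'.join(
        part for part in (host, data_dev, journal_dev) if part)
    commands.append(['ceph-deploy', 'osd', 'create', target])
    return commands
```

Because the zap happens unconditionally, no fallback path is needed when
activation later fails in the background.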
http://tracker.ceph.com/issues/13000
Fixes: #13000
Signed-off-by: Loic Dachary <loic@dachary.org>
The config parameter of download_ceph_deploy does not have a ceph-deploy
item, so the ceph-deploy-branch parameter is always assumed to be
master.
Signed-off-by: Loic Dachary <loic@dachary.org>
When keep_running is true, do not shut down the cluster; leave it as it
is for other workunits or tasks to use. This effectively allows the
ceph-deploy task to be used as a helper to deploy clusters.
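
A rough sketch of the idea, assuming the task is written as a context
manager. Only the keep_running key comes from this change; the
deploy_cluster name and the setup/teardown callables are hypothetical
stand-ins.

```python
import contextlib


@contextlib.contextmanager
def deploy_cluster(config, setup, teardown):
    """Deploy a cluster and skip teardown when keep_running is set."""
    setup(config)
    try:
        yield
    finally:
        # With keep_running, the cluster is left up for later tasks.
        if not config.get('keep_running'):
            teardown(config)
```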
The call to build_ceph_cluster is simplified by giving it the whole
configuration dictionary instead of re-building one with selected arguments.
Signed-off-by: Loic Dachary <loic@dachary.org>
When ceph-deploy fails, run ceph report to get more information about
the state of the cluster at the time of the failure.
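
One way to picture this, as a sketch only: run_with_report and the
injected run callable are hypothetical, and the report is treated as
best-effort so that it never hides the original failure.

```python
def run_with_report(run, deploy_args):
    """Run a ceph-deploy command; on failure, collect 'ceph report'."""
    try:
        run(deploy_args)
    except Exception:
        try:
            run(['ceph', 'report'])
        except Exception:
            # The cluster may be too broken to report; ignore that.
            pass
        # Re-raise the original ceph-deploy failure.
        raise
```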
Signed-off-by: Loic Dachary <loic@dachary.org>
Looks like Sandon's and Sage's changes raced and there are now two
sites where we fetch overrides. One should be enough.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Now that service IDs are modified during the run, we have
to avoid repeatedly evaluating first_mon to decide where
to run ceph_deploy, as the answer will change.
Fixes: #11495
Signed-off-by: John Spray <john.spray@redhat.com>
This test apparently had not been touched since
"fs new" was added. In addition to calling
Filesystem.create:
* modify the get_nodes_using_role
function to modify ctx.cluster.remotes so that the
service IDs match what ceph-deploy will set
* log exceptions during ceph_deploy setup, as otherwise
they can get lost if another exception occurs during
teardown (so that it's all easier to debug).
* default to passing --dev=master during install, so
that we don't error out horribly when run without
an explicit branch set (e.g. when run outside a
scheduled suite)
Fixes: #11316
Signed-off-by: John Spray <john.spray@redhat.com>
But don't error if it fails, as this would mean that the monitors
are just taking longer to form quorum. Go and try the next block which will
wait up to 15 minutes for a successful gatherkeys to happen (that only works
if monitors have formed quorum).
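
The two-step behaviour can be sketched like this. Everything except the
15 minute limit is an assumption made for illustration: the gather_keys
name, the try_gatherkeys callable (returning True on success), and the
10 second polling interval.

```python
import time


def gather_keys(try_gatherkeys, timeout=15 * 60, interval=10,
                sleep=time.sleep, clock=time.monotonic):
    """Try gatherkeys once, then keep retrying until the timeout."""
    # First attempt: a failure here is not an error, the monitors
    # may simply not have formed quorum yet.
    if try_gatherkeys():
        return True
    deadline = clock() + timeout
    while clock() < deadline:
        if try_gatherkeys():
            return True
        sleep(interval)
    return False
```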
Signed-off-by: Alfredo Deza <alfredo.deza@inktank.com>