This makes our behavior similar to kube: if you kill a pod, the operator
or controller will come along and create a new one (probably somewhere
else).
Signed-off-by: Sage Weil <sage@redhat.com>
Fixes: https://tracker.ceph.com/issues/44205
This does a couple of things:
* Change the way apply_$service() works:
Instead of triggering the deployment mechanism directly, it now
transforms the passed ServiceSpec into a json representation and
saves it in a persistent mon_store section:
`mgr/cephadm/service_spec/$service|daemon_type/service_name`
These locations are periodically checked in the serve() thread,
which works because all the apply_$service_type functions are
idempotent (see the sketch below).
* Allow saving a config-like specification in the mon_store.
`ceph orch apply -i <service_spec_file.yaml>`
reads the specified services and saves them in the mon_store
section mentioned above; the same serve() mechanism then handles
their deployment.
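As a rough sketch of the mechanism (Python; `SpecStore`, `kv_store` and
`apply_fn` are illustrative names, not the actual cephadm classes, and the
key layout is simplified):

    # Illustrative only: persist specs as JSON under a well-known prefix
    # and re-apply them from a periodic serve() loop.
    import json
    import time

    class SpecStore:
        PREFIX = 'mgr/cephadm/service_spec/'   # simplified key layout

        def __init__(self, kv_store):
            self.kv = kv_store                 # dict-like mon_store wrapper

        def save(self, spec):
            self.kv[self.PREFIX + spec['service_name']] = json.dumps(spec)

        def all_specs(self):
            for key, raw in self.kv.items():
                if key.startswith(self.PREFIX):
                    yield json.loads(raw)

    def serve(store, apply_fn, interval=60):
        # safe to re-apply on every pass because apply_fn is idempotent
        while True:
            for spec in store.all_specs():
                apply_fn(spec)
            time.sleep(interval)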
Signed-off-by: Joshua Schmid <jschmid@suse.de>
In cases where we normally use a pid for a nonce, fall back to a random
value when the pid == 1 (i.e., we're in a container). For the cases where
we use a random value, use the helper.
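Conceptually, the selection looks like this (a Python sketch; the real code
is in the C++ messenger and the helper's name here is made up):

    import os
    import random

    def get_random_nonce():
        # helper used wherever a random nonce is wanted
        return random.getrandbits(32)

    def pick_nonce():
        pid = os.getpid()
        # pid 1 means we are likely in a container, where every daemon
        # would otherwise end up with the same nonce
        return pid if pid != 1 else get_random_nonce()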
Signed-off-by: Sage Weil <sage@redhat.com>
Midway through the octopus cycle, we made the stateless_server policy more
stateless in the sense that it no longer registers incoming client
connections. As a consequence, it also stopped enforcing that client
connections come from unique addresses; previously it would close an
existing connection from the same addr when a new connection was accepted.
This turned out to cause out-of-order OSD ops because the OSD relied on
that behavior; see https://tracker.ceph.com/issues/42328. We fixed that in
507d213cc4 by reverting to the old behavior everywhere except monitor
connections, which still need the new one.
This, in turn, breaks most OSD <-> OSD communication (and probably lots
of other things) with cephadm, because we make entity_addr_t unique with
a nonce populated by getpid()... and the containerized daemons all have
pid 1. Once the follow-on fixes for the change above finally merged,
cephadm OSDs could no longer ping each other.
In my view, the 'anon' connection handling is a good idea in the general
case. So, let's adjust our fix for #42328 so that it is only the OSD
client-side interface that registers client connections and makes them
unique.
Fixes: https://tracker.ceph.com/issues/44358
Signed-off-by: Sage Weil <sage@redhat.com>
This was effectively a no-op, since the default policy was *also*
stateless_server.
This line originates from v0.24 (2010) when we added the cluster msgr.
Signed-off-by: Sage Weil <sage@redhat.com>
We construct a TCP connection to transport the IB sync msg. If the remote
node is shut down (e.g., shut down by accident), net.connect() blocks until
the timeout is reached, which blocks the event center.
This bug may cause mon probe timeouts, OSDs failing to reply, and so on.
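The idea of the fix, sketched in Python (the actual change is in the C++
RDMA/msgr code; names here are illustrative): issue the connect
non-blocking and let the event center finish it on writability instead of
blocking inside connect().

    import errno
    import os
    import socket

    def connect_nonblocking(addr):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.setblocking(False)
        err = s.connect_ex(addr)           # returns immediately
        if err not in (0, errno.EINPROGRESS):
            s.close()
            raise OSError(err, os.strerror(err))
        # register s with the event loop and complete the handshake when
        # the socket becomes writable, so other events keep being processed
        return s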
Signed-off-by: Peng Liu <liupeng37@baidu.com>
* refs/pull/33634/head:
qa/workunits/cephadm/test_cephadm.sh: dump logs on exit
qa/workunits/cephadm/test_cephadm.sh: add `cleanup` function
Reviewed-by: Sage Weil <sage@redhat.com>
We need to define the module options and their defaults so that
_configure_logging can succeed.
Broken by 8ec3b3d3cc
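Roughly the shape of what is needed (a hedged sketch; the option names here
follow the common mgr logging options and may not match the actual commit):

    from mgr_module import MgrModule

    class Module(MgrModule):
        # _configure_logging() needs logging-related module options, so the
        # module has to declare them (with defaults) up front.
        MODULE_OPTIONS = [
            {'name': 'log_level', 'type': 'str', 'default': '',
             'runtime': True},
            {'name': 'log_to_file', 'type': 'bool', 'default': False,
             'runtime': True},
        ]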
Signed-off-by: Sage Weil <sage@redhat.com>
Allow setting default configuration via a per-frontend config option. This
allows having defaults for some of the frontend config options while
setting others explicitly.
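For illustration (a hedged ceph.conf-style example; the option name and
values are assumptions, not taken from the commit): the defaults supply
some frontend settings while rgw_frontends sets the rest.

    [client.rgw.foo]
    # assumed option name; supplies defaults for the configured frontend
    rgw_frontend_defaults = "beast ssl_certificate=/etc/pki/rgw/rgw.crt"
    rgw_frontends = "beast port=80 ssl_port=443"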
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
Add the following variables:
$realm, $realm_id, $zonegroup, $zonegroup_id, $zone, $zone_id
Variables can also be passed in with curly braces, e.g., ${realm}
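For example (a hypothetical frontend line; the certificate path is just a
placeholder):

    rgw_frontends = "beast ssl_port=443 ssl_certificate=/etc/pki/rgw/${realm}/${zone}.crt"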
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
Also, don't exit immediately if the ssl cert can't be initialized; only
exit if the cert is actually needed (i.e., if ssl_port/ssl_endpoint is
configured).
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
when we detect a duplicate common prefix, we need to loop until we get
the next unique candidate. we must add a new candidate for each shard,
or we won't visit it again and would miss later entries
Fixes: https://tracker.ceph.com/issues/44353
Signed-off-by: Casey Bodley <cbodley@redhat.com>
we may see the same common prefix from more than one shard. when we
detect a duplicate, we need to advance past it. otherwise, we may make
the wrong decision about is_truncated because the shards with
duplicates won't be at_end()
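A simplified Python model of the listing merge (not the actual rgw code;
the data layout here is made up) that shows why each shard has to be
advanced past a duplicate common prefix:

    def merge_shards(shards):
        # shards: list of sorted lists of (name, is_common_prefix) entries
        pos = [0] * len(shards)
        seen_prefixes = set()
        out = []
        while True:
            candidates = [(shards[i][pos[i]], i)
                          for i in range(len(shards))
                          if pos[i] < len(shards[i])]
            if not candidates:
                break                      # every shard is at_end()
            (name, is_prefix), i = min(candidates)
            pos[i] += 1                    # always advance past the entry
            if is_prefix and name in seen_prefixes:
                continue                   # duplicate prefix: skip output,
                                           # but the shard was advanced
            if is_prefix:
                seen_prefixes.add(name)
            out.append(name)
        return out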
Fixes: https://tracker.ceph.com/issues/44353
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Using aio_wait is unnecessary, since all the async read
submission and completion happen in the same thread.
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
This is an attempt to bring the current state of the documentation more
into line with the current state of the cephadm code.
However, when I try to grab logs from a daemon on a host other than the
one where the daemon is running, I get an empty log...
References: https://tracker.ceph.com/issues/44354
Signed-off-by: Nathan Cutler <ncutler@suse.com>
Without this, tests written after test_mount_root() in the source
were failing with
cephfs.LibCephFSStateError: You cannot perform that operation on a
CephFS object in state initialized.
... since that test fiddles with the default mount, which is root.
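A hypothetical sketch of the kind of fix (the actual test change may
differ; `fs` stands in for the test module's global LibCephFS handle):

    import cephfs as libcephfs

    def restore_root_mount(fs):
        # once a test has unmounted or remounted the default mount, put the
        # root mount back so later tests start from a mounted filesystem
        try:
            fs.unmount()
        except libcephfs.LibCephFSStateError:
            pass            # not mounted; object is merely 'initialized'
        fs.mount()          # back to the default root mount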
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Synchronize ownership, permissions and inode timestamps (access and
modification times) for all supported inode types. Note that inode
timestamps are synchronized up to seconds granularity.
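As a conceptual Python sketch of the metadata copy described (not the
actual change; plain os calls stand in for whatever file API the code
uses):

    import os

    def sync_metadata(src_path, dst_path):
        st = os.stat(src_path, follow_symlinks=False)
        os.chown(dst_path, st.st_uid, st.st_gid, follow_symlinks=False)
        if not os.path.islink(dst_path):
            # mode bits are skipped for symlinks
            os.chmod(dst_path, st.st_mode & 0o7777)
        # timestamps are carried over at seconds granularity only
        os.utime(dst_path, (int(st.st_atime), int(st.st_mtime)),
                 follow_symlinks=False)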
Signed-off-by: Venky Shankar <vshankar@redhat.com>