`find_program()` raises `ValueError` when the executable hasn't been
found. This means we need to catch the `ValueError` exception in
`command_check_host()` and raise `Error` instead of `RuntimeError`, since
only `Error` is caught at the end.
Typical failure:
```
INFO:cephadm:/usr/bin/ceph:stderr Error ENOENT: New host mon1 failed check: ['INFO:cephadm:podman|docker (/bin/podman) is present', 'INFO:cephadm:systemctl is present', 'Traceback (most recent call last):', ' File "<stdin>", line 2820, in <module>', ' File "<stdin>", line 2434, in command_check_host', ' File "<stdin>", line 796, in find_program', 'ValueError: lvcreate not found']
```
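A minimal sketch of the intended handling, with `Error` and `find_program`
used here as stand-ins for cephadm's own definitions:
```
# Sketch only (not cephadm's actual code): catch ValueError from
# find_program() and re-raise it as Error, so that the top-level
# handler, which only catches Error, reports the failure cleanly.

class Error(Exception):
    pass

def find_program(name):
    # placeholder: the real helper searches PATH and raises ValueError
    raise ValueError('%s not found' % name)

def command_check_host():
    try:
        find_program('lvcreate')
    except ValueError as e:
        # raise Error instead of RuntimeError so the caller's
        # `except Error` catches it
        raise Error('host check failed: %s' % e)
```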
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This allows for evaluation of more complex use cases where IgnorePublicACLs and
the like are set, which need to be evaluated for GET/HEAD requests as well.
Signed-off-by: Abhishek Lekshmanan <abhishek@suse.com>
This API returns whether the Bucket Policies/ACLs are public. There are a couple
of caveats (a client-side usage sketch follows below):
- AWS currently returns a PolicyNotFound error in case a bucket policy doesn't
exist, though a non-existent bucket policy would mean the default ACLs apply,
under which the bucket is private, so returning an error here seems wrong
- the API spec mentions TRUE and FALSE as the response IsPublic element value,
however in practice both boto/aws clients and AWS S3 return/expect a lowercase
response.
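A minimal client-side sketch exercising GetBucketPolicyStatus via boto3 against
an RGW endpoint; the endpoint URL, credentials, and bucket name are placeholders:
```
import boto3

# Placeholder endpoint, credentials and bucket; adjust for your RGW setup.
s3 = boto3.client(
    's3',
    endpoint_url='http://rgw.example.com:8000',
    aws_access_key_id='ACCESS_KEY',
    aws_secret_access_key='SECRET_KEY',
)

resp = s3.get_bucket_policy_status(Bucket='mybucket')
# boto3 exposes IsPublic as a boolean, matching the lowercase true/false
# behaviour noted above rather than the TRUE/FALSE wording in the spec.
print(resp['PolicyStatus']['IsPublic'])
```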
Signed-off-by: Abhishek Lekshmanan <abhishek@suse.com>
Conflicts:
  src/rgw/rgw_rest_s3.h:
    merge conflict after zipper rework, dropped a spurious newline in
    rgw_rest_s3.h after get_obj_op decl.
  src/rgw/rgw_common.h
  src/rgw/rgw_rest_s3.cc
  src/rgw/rgw_rest_s3.h:
    merge conflict after bucket replication merge, trivial conflicts
Drop the unused RGWAccessControlPolicy::get_group_perm, and make the ACL
get_group_perm a const member function.
Signed-off-by: Abhishek Lekshmanan <abhishek@suse.com>
When playing with cephadm, I've hit the max number of attempts in
`is_available()` multiple times.
Increasing `retry_max` helps to avoid failures like the following:
```
INFO:cephadm:mgr not available, waiting (1/5)...
INFO:cephadm:mgr not available, waiting (2/5)...
INFO:cephadm:mgr not available, waiting (3/5)...
INFO:cephadm:mgr not available, waiting (4/5)...
INFO:cephadm:mgr not available, waiting (5/5)...
ERROR: mgr not available after 5 tries
```
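For reference, a simplified sketch of this kind of bounded retry loop; the
names `retry_max`, `sleep_secs`, and `check` are illustrative only, not
cephadm's actual implementation:
```
import time

def wait_until_available(what, check, retry_max=10, sleep_secs=1):
    # Simplified sketch: retry a readiness check up to retry_max times
    # before giving up, mirroring the messages above.
    for attempt in range(1, retry_max + 1):
        if check():
            return
        print('%s not available, waiting (%d/%d)...' % (what, attempt, retry_max))
        time.sleep(sleep_secs)
    raise RuntimeError('%s not available after %d tries' % (what, retry_max))
```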
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
This is friendlier to a human operator since they can immediately see
where an instance is located, as with the legacy scheme, while still
keeping the unique random suffix. Use a '.' as the separator so that we
can set per-host options.
Signed-off-by: Sage Weil <sage@redhat.com>
In 114c65fc I posted a workaround to fix a heartbeat split-brain case,
but it now looks to me like I am missing some other cases where an
immediate attempt to rejoin is bad, e.g. when the network actually
isn't working properly rather than being predictably manipulated by an
admin.
This patch instead slows down the unconditional rejoin attempt,
especially making sure that we don't try to immediately rejoin the
cluster when an osd has just been marked down by the mon.
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
so we don't have to do it in multiple places. Note that
we can't do it in the tick_without_osd_lock thread instead,
because we cannot access it safely without the protection
of osd_lock.
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
mgr/dashboard: Enable compiler options used by Angular --strict flag
Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Stephan Müller <smueller@suse.com>
We want to avoid a situation like:
- host.A consists of OSDs from 0 to 10
- host.A's network is cut off from the rest of the cluster
- osd.1 is marked down once enough votes have been
collected by the mon
- osd.1 re-selects osd.0,2,3,..., plus two extra
osds from two different hosts, as heartbeat peers
- osd.1 sees more than 1/3 of its heartbeat peers as pingable,
e.g., because they belong to the same host.A, and will
try to mark itself up again
which as a result may cause longer client op latency.
Fix by (always) trying to select heartbeat peers from as many
different subtrees as possible instead.
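A rough sketch of the idea, with candidate peers grouped by host and picked
round-robin so the selection spans as many subtrees as possible; this is
illustrative only, not the OSD's actual C++ peer-selection code:
```
from collections import defaultdict

def select_heartbeat_peers(candidates, want):
    """candidates: list of (osd_id, host) pairs; want: number of peers."""
    by_host = defaultdict(list)
    for osd_id, host in candidates:
        by_host[host].append(osd_id)
    # Round-robin across hosts so no single host (subtree) can dominate
    # the heartbeat peer set.
    picked = []
    pools = [iter(osds) for osds in by_host.values()]
    while len(picked) < want and pools:
        for pool in list(pools):
            try:
                picked.append(next(pool))
            except StopIteration:
                pools.remove(pool)
            if len(picked) >= want:
                break
    return picked
```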
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>