Read balancing may now be managed automatically via the balancer
manager module. Users may choose between two new modes: ``upmap-read``, which
offers upmap and read optimization simultaneously, or ``read``, which may be used
to only optimize reads. Existing balancer commands have also been updated to
include more information about read balancing.
Run the following commands to test the new automatic behavior:
`ceph balancer on` (on by default)
`ceph balancer mode <read|upmap-read>`
`ceph balancer status`
Run the following commands to test the new supervised behavior:
`ceph balancer off`
`ceph balancer mode <read|upmap-read>`
`ceph balancer eval` | `ceph balancer eval <pool-name>`
`ceph balancer eval-verbose` | `ceph balancer eval-verbose <pool-name>`
`ceph balancer optimize <plan-name>`
`ceph balancer show <plan-name>`
`ceph balancer eval <plan-name>`
`ceph balancer execute <plan-name>`
In the balancer module, there is also a new "self_test" function which tests
the module's basic functionality. This test can be triggered with the following
commands:
`ceph mgr module enable selftest`
`ceph mgr self-test module balancer`
Related Trello: https://trello.com/c/sWoKctzL/859-add-read-balancer-support-inside-the-balancer-module
Signed-off-by: Laura Flores <lflores@ibm.com>
The scrub async reserver is not yet used. All requests are treated as
'legacy' requests, i.e. requests that expect an immediate grant/deny
reply.
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
Up-to-date primaries will set this flag when sending a reservation
request. The replica OSD, if too busy to handle the request immediately, will queue
it until such time that the number of concurrent reservations is below the
configured limit. The queued requests are honored in FIFO order.
Old primaries will not set this flag, and will receive the expected
grant or deny reply immediately.
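
The queueing behavior described above can be sketched roughly as follows. This is an illustration only: the real reserver is part of the OSD's C++ code, and the class name `ReplicaReserver`, the `queueable` argument (standing in for the flag) and the `max_concurrent` parameter are assumptions made for this sketch.

```python
from collections import deque


class ReplicaReserver:
    """Toy model of a replica handling scrub reservation requests."""

    def __init__(self, max_concurrent):
        self.max_concurrent = max_concurrent  # hypothetical config limit
        self.granted = set()      # primaries currently holding a reservation
        self.waiting = deque()    # queued requests, oldest first (FIFO)

    def request(self, primary, queueable):
        """Return 'granted', 'queued' or 'denied' for a reservation request."""
        if len(self.granted) < self.max_concurrent:
            self.granted.add(primary)
            return "granted"
        if queueable:
            # Up-to-date primary: hold the request until a slot frees up.
            self.waiting.append(primary)
            return "queued"
        # Old primary: it expects an immediate answer, so deny right away.
        return "denied"

    def release(self, primary):
        """Free a slot and grant the oldest queued request, if any."""
        self.granted.discard(primary)
        if self.waiting and len(self.granted) < self.max_concurrent:
            nxt = self.waiting.popleft()
            self.granted.add(nxt)
            return nxt
        return None
```

With `max_concurrent=1`, a second request that sets the flag is queued, while a second request without it is denied at once.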
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
To be used when handling replica reservation requests from "old"
primaries that expect an immediate grant/deny reply.
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
Fix a tricky verb disagreement and rewrite a few sentences for what I
hope is greater clarity.
Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
Changes how the upmap balancer compares min_mon_release
to account for release names eventually wrapping around the alphabet.
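
One way to make such a comparison robust is to compare release ordinals instead of names. The sketch below assumes that approach; it is not taken from the patch itself. The ordinals for quincy, reef and squid are the real release numbers, while `aardvark` is a made-up name standing in for a future release that wraps past 'z', and `release_at_least` is a hypothetical helper.

```python
# Release ordinals for a few real Ceph releases, plus one made-up entry
# ("aardvark") standing in for a future name that wraps past 'z'.
RELEASE_ORDINALS = {
    "quincy": 17,
    "reef": 18,
    "squid": 19,
    "aardvark": 27,  # hypothetical, for illustration only
}


def release_at_least(release, minimum):
    """Compare two releases by ordinal rather than by name."""
    return RELEASE_ORDINALS[release] >= RELEASE_ORDINALS[minimum]
```

Here `release_at_least("aardvark", "reef")` holds, whereas the plain string comparison `"aardvark" >= "reef"` does not.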
Signed-off-by: Laura Flores <lflores@ibm.com>
Cloudflare engineers ran some tests and found that using
workqueues with encryption on flash devices hurts performance.
See [1] for details.
With this patch, ceph-volume calls cryptsetup with the
`--perf-no_read_workqueue` and `--perf-no_write_workqueue` options
when the device is not rotational.
[1] https://blog.cloudflare.com/speeding-up-linux-disk-encryption/
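
A minimal sketch of the idea, assuming the caller already knows whether the device is rotational (for example from `/sys/block/<dev>/queue/rotational`). The function name `luks_open_command` and its parameters are hypothetical and do not reflect the actual ceph-volume code.

```python
def luks_open_command(device, mapping, rotational):
    """Build a `cryptsetup luksOpen` argument list for the given device."""
    cmd = ["cryptsetup"]
    if not rotational:
        # Flash device: bypass the dm-crypt read and write workqueues.
        cmd += ["--perf-no_read_workqueue", "--perf-no_write_workqueue"]
    cmd += ["luksOpen", device, mapping]
    return cmd
```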
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
Co-Authored-by: Stefan Kooman <stefan@kooman.org>
add the `in_query=true` argument to `url_decode()` to replace '+' with ' '
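
The difference between the two decoding modes can be illustrated with Python's standard library. This is an analogy only, not the C++ `url_decode()` changed here: `unquote()` leaves '+' alone, while `unquote_plus()` applies query-string rules.

```python
from urllib.parse import unquote, unquote_plus

raw = "a+b%20c"

# Path-style decoding: '+' is left alone, only %XX escapes are decoded.
path_decoded = unquote(raw)

# Query-style decoding: '+' is also turned into a space.
query_decoded = unquote_plus(raw)
```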
Fixes: https://tracker.ceph.com/issues/64189
Signed-off-by: Casey Bodley <cbodley@redhat.com>
crimson/osd: drop a foreign-copy to shard-0 for every pg operation
Reviewed-by: Samuel Just <sjust@redhat.com>
Reviewed-by: Chunmei Liu <chunmei.liu@intel.com>
Reviewed-by: Matan Breizman <mbreizma@redhat.com>
RHEL8 is no longer supported in Squid. RHEL9 is not yet available in FOG.
Fixes: https://tracker.ceph.com/issues/64085
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Mapping RBD images to nbd devices using the ioctl interface is not
robust. It was discovered that the device size or the md5 checksum
of the nbd device was incorrect immediately after mapping using
the ioctl method. When using the nbd netlink interface to map RBD images,
the issue was not encountered. Switch to using the nbd netlink interface
for mapping.
Fixes: https://tracker.ceph.com/issues/64063
Signed-off-by: Ramana Raja <rraja@redhat.com>
This makes node-proxy collect the `LocationIndicatorActive`
property for storage components.
This may be needed for the Blinkenlight feature.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
The following messages are logged very frequently, although they
are not very useful in a normal situation:
```
2024-01-12 09:09:40,604 - reporter - INFO - data ready to be sent to the mgr.
2024-01-12 09:09:40,604 - reporter - INFO - no diff, not sending data to the mgr.
2024-01-12 09:10:15,022 - reporter - INFO - data ready to be sent to the mgr.
2024-01-12 09:10:15,022 - reporter - INFO - no diff, not sending data to the mgr.
...
```
This commit changes the log level to DEBUG.
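
The effect can be sketched with the standard `logging` module. The function `report` and the "sending data" message are assumptions for illustration; only the two routine messages come from the log excerpt above.

```python
import logging

logger = logging.getLogger("reporter")


def report(has_diff):
    """Log routine 'nothing to do' messages at DEBUG instead of INFO."""
    logger.debug("data ready to be sent to the mgr.")
    if not has_diff:
        logger.debug("no diff, not sending data to the mgr.")
        return False
    # Hypothetical message for the case where data is actually sent.
    logger.info("sending data to the mgr.")
    return True
```

A handler configured at INFO level no longer sees the two routine messages.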
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
This `sleep(5)` should be called *after* the lock is released.
Otherwise, the reporter loop may never be able to acquire
the lock.
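
A minimal sketch of the pattern, assuming a plain `threading.Lock`. The function `poll_loop` and its parameters are hypothetical; the `sleep` argument exists only so the pause can be replaced in tests.

```python
import threading
import time


def poll_loop(lock, update, cycles, pause=5, sleep=time.sleep):
    """Run `update` under `lock`, then sleep only after releasing it."""
    for _ in range(cycles):
        with lock:
            update()
        # The lock is free here, so another thread (such as the reporter
        # loop) gets a chance to acquire it during the pause.
        sleep(pause)
```

Had the `sleep(pause)` call been placed inside the `with lock:` block, the lock would be held for nearly the whole cycle.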
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
While checking the logs, I noticed the following message:
```
2024-01-12 09:08:03,751 - reporter - INFO - Reporter url set to https:10.10.10.11:7150/node-proxy/data
```
Although this is only a cosmetic issue as this variable
is only used for logging messages, let's fix it.
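
For illustration, a correctly formed URL could be built as below. The helper `reporter_url` and its parameter names are hypothetical; the host, port and endpoint are taken from the log line above.

```python
def reporter_url(host, port, endpoint="/node-proxy/data", scheme="https"):
    """Build the reporter URL, including the '//' after the scheme."""
    return f"{scheme}://{host}:{port}{endpoint}"
```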
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
The current implementation requires the inclusion of all the recent
modifications in the cephadm binary, which won't be backported.
Since we need the node-proxy code backported to reef, let's move the
code and make it a separate daemon.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
Co-authored-by: Adam King <adking@redhat.com>
This renames the mgr's NodeProxyCache attribute from
`self.node_proxy` to `self.node_proxy_cache` and the
class `NodeProxy` in agent.py to `NodeProxyEndpoint`,
to make the names clearer and avoid confusion.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
This commit updates the debug log messages in the BaseRedfishSystem
and Reporter classes. The messages now identify which lock was
acquired or released and in what context, which makes the
control flow during locking operations in these components
easier to follow.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
The current logic using `setattr()` makes mypy complain:
"NodeProxy" has no attribute "xxx"
Using `self.__dict__['xxx']` addresses this mypy error, but the
downside is that the code becomes less clear and less readable.
Explicitly setting the different attributes makes the code clearer
and more readable.
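
The contrast can be sketched as follows. Both classes and their attribute names (`username`, `password`, `port`) are hypothetical and are not the actual `NodeProxy` attributes.

```python
from typing import Any


class DynamicConfig:
    """Attributes created with setattr(): invisible to mypy."""

    def __init__(self, **kw: Any) -> None:
        for name, value in kw.items():
            setattr(self, name, value)


class ExplicitConfig:
    """Attributes declared one by one: mypy knows each name and type."""

    def __init__(self, **kw: Any) -> None:
        self.username: str = kw.get("username", "")
        self.password: str = kw.get("password", "")
        self.port: int = kw.get("port", 443)
```

Both classes behave the same at runtime, but only the second lets a type checker verify accesses such as `cfg.port`.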
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
node-proxy requires this dependency, so it needs to be added as a
dependency for tox testing.
Typical failure:
```
ImportError while importing test module '/root/ceph/src/cephadm/tests/test_agent.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib64/python3.9/importlib/__init__.py:127: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/test_agent.py:10: in <module>
_cephadm = import_cephadm()
tests/fixtures.py:14: in import_cephadm
import cephadm as _cephadm
cephadm.py:32: in <module>
from cephadmlib.node_proxy.main import NodeProxy
cephadmlib/node_proxy/main.py:2: in <module>
from .redfishdellsystem import RedfishDellSystem
cephadmlib/node_proxy/redfishdellsystem.py:2: in <module>
from .baseredfishsystem import BaseRedfishSystem
cephadmlib/node_proxy/baseredfishsystem.py:2: in <module>
from .basesystem import BaseSystem
cephadmlib/node_proxy/basesystem.py:2: in <module>
from .util import Config
cephadmlib/node_proxy/util.py:2: in <module>
import yaml
E ModuleNotFoundError: No module named 'yaml'
```
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
Note that this won't be true out-of-band management.
If the host hangs, this won't work. The OOB
management interface should be reached directly, but most of the time
the OOB network is isolated. The idea is to send queries to
the TCP server exposed by the cephadm agent (MgrListener) so that it
can itself send queries to the Redfish API using the IP address
exposed on the OS.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
In order to address the following error:
```
cephadmlib/node_proxy/util.py:2: error: Library stubs not installed for "yaml" (or incompatible with Python 3.9)
cephadmlib/node_proxy/util.py:2: note: Hint: "python3 -m pip install types-PyYAML"
cephadmlib/node_proxy/util.py:2: note: (or run "mypy --install-types" to install all missing stub packages)
cephadmlib/node_proxy/util.py:2: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
```
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
This addresses a lot of flake8 errors in node-proxy tests:
E121 continuation line under-indented for hanging indent
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
Implementing this in the cephadm module doesn't follow the general idea
of the orchestrator interface. Output formatting should be done in the
orchestrator module, so let's move the logic there.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
The current logic supports both str and bytes types for the parameter
`data`. This doesn't make sense, so let's drop this logic.
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>