#############################################
 Notes and Thoughts on Cephadm's scalability
#############################################

*********************
 About this document
*********************

This document does NOT define a specific proposal or any planned future
work. Instead, it merely lists a few thoughts that MIGHT be relevant for
future cephadm enhancements.

*******
 Intro
*******

Current situation:

Cephadm manages all registered hosts. This means that it periodically
scrapes data from each host to identify changes on the host, such as:

- disk added/removed
- daemon added/removed
- host network/firewall etc. has changed

Currently, cephadm scrapes each host (up to 10 in parallel) every 6
minutes, unless a refresh is forced manually.

Refreshes for disks (ceph-volume), daemons (podman/docker), etc., happen
in sequence.

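The current behaviour can be pictured roughly as follows. This is only a
minimal sketch, not the actual ``mgr/cephadm`` code: the ``refresh_*``
functions, ``scrape_host`` and ``scrape_all`` are hypothetical names, and
the placeholder return values stand in for real ceph-volume and
podman/docker output. Hosts are handled by a pool of at most 10 workers,
while the refresh steps for a single host run one after another.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


# Hypothetical stand-ins for the per-host refresh steps.
def refresh_disks(host: str) -> List[str]:
    return [f"{host}: disk inventory placeholder"]


def refresh_daemons(host: str) -> List[str]:
    return [f"{host}: daemon list placeholder"]


def refresh_network(host: str) -> List[str]:
    return [f"{host}: network facts placeholder"]


# Order matters: the steps for one host are executed in sequence.
REFRESH_STEPS: List[Tuple[str, Callable[[str], List[str]]]] = [
    ("disks", refresh_disks),
    ("daemons", refresh_daemons),
    ("network", refresh_network),
]


def scrape_host(host: str) -> Dict[str, List[str]]:
    """Run every refresh step for a single host, one after another."""
    return {name: step(host) for name, step in REFRESH_STEPS}


def scrape_all(hosts: List[str], max_workers: int = 10) -> Dict[str, Dict[str, List[str]]]:
    """Scrape all hosts, with at most ``max_workers`` hosts in flight."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the input order, so zip pairs hosts correctly.
        return dict(zip(hosts, pool.map(scrape_host, hosts)))
```

In this picture, the time for one full pass is dominated by the slowest
refresh steps multiplied by the number of batches of hosts.
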
With the cephadm-exporter, we have now reduced the time to scan hosts
considerably, but the question remains:

Is the cephadm-exporter sufficient to solve all future scalability
issues?

***********************************************
 Considerations of cephadm-exporter's REST API
***********************************************

The cephadm-exporter serves each host's metadata over an HTTP endpoint.
We MIGHT encounter some issues with this approach, which need to be
mitigated at some point.

- With the cephadm-exporter we use SSH and HTTP to connect to each
  host. Having two distinct transport layers feels odd, and we might
  want to consider reducing them to a single protocol.

- The current approach of delivering ``bin/cephadm`` to the host doesn't
  allow the use of external dependencies. This means that we're stuck
  with the built-in HTTP server lib, which isn't great for providing a
  good developer experience. ``bin/cephadm`` needs to be packaged and
  distributed (one way or the other) for us to make use of a better
  HTTP server library.

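To illustrate what being limited to the standard library means in
practice, here is a minimal sketch of a metadata endpoint built only on
Python's ``http.server`` module. It is NOT the cephadm-exporter's actual
implementation: ``make_server``, the ``/metadata`` path and the payload
are assumptions made up for this example.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Any, Dict


def make_server(metadata: Dict[str, Any],
                host: str = "127.0.0.1",
                port: int = 0) -> ThreadingHTTPServer:
    """Build a stdlib-only HTTP server exposing ``metadata`` as JSON.

    The ``/metadata`` path is a hypothetical endpoint name. Port 0 lets
    the operating system pick a free port.
    """

    class MetadataHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            # Routing has to be done by hand: there is no router.
            if self.path != "/metadata":
                self.send_error(404, "unknown endpoint")
                return
            body = json.dumps(metadata).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format: str, *args: Any) -> None:
            # Silence the default per-request logging to stderr.
            pass

    return ThreadingHTTPServer((host, port), MetadataHandler)
```

Routing, request validation and authentication all have to be written by
hand on top of this, which is where a third-party HTTP library would
help.
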
************************
 MON's config-key store
************************

After ``mgr/cephadm`` has queried metadata from each host, cephadm
stores the data in the mon's key-value store.

If each host were allowed to write its own metadata to the store,
``mgr/cephadm`` would no longer be required to gather the data.

Some questions arise:

- ``mgr/cephadm`` would then need to query data from the config-key
  store, instead of relying on cached data.

- cephadm knows three different types of data: (1) Data that is
  critical and needs to be stored in the config-key store. (2) Data
  that can be kept in memory only. (3) Data that can be stored in
  a RADOS pool. How can we apply this idea to those different types of
  data?

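The idea of hosts publishing their own metadata could look roughly like
the sketch below. It is only an illustration under several assumptions:
a plain ``dict`` stands in for the mon's config-key store, and the
``host-metadata/`` key prefix as well as both function names are
hypothetical and do not reflect the keys cephadm actually uses.

```python
import json
from typing import Any, Dict

# Hypothetical key layout; not the layout used by mgr/cephadm.
KEY_PREFIX = "host-metadata/"


def publish_host_metadata(store: Dict[str, str],
                          hostname: str,
                          metadata: Dict[str, Any]) -> None:
    """Host side: write this host's metadata under its own key."""
    store[KEY_PREFIX + hostname] = json.dumps(metadata, sort_keys=True)


def load_all_host_metadata(store: Dict[str, str]) -> Dict[str, Dict[str, Any]]:
    """Manager side: read what the hosts published instead of scraping."""
    return {
        key[len(KEY_PREFIX):]: json.loads(value)
        for key, value in store.items()
        if key.startswith(KEY_PREFIX)
    }
```

Note that the manager side now reads from the store on every call, which
is exactly the first question raised above.
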
*******************************
 Increase the worker pool size
*******************************

``mgr/cephadm`` is currently able to scrape 10 nodes at the same time.

Increasing the pool size would not change the time it takes to scrape an
individual host. We'd just reduce the overall execution time.

At best we can reach O(hosts) + O(daemons).

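A rough back-of-the-envelope model makes the effect of the pool size
visible. It assumes that every host takes the same time to scrape and
that workers are always fully utilized, which is a simplification. The
function name and the numbers in the usage note are illustrative
assumptions, not measurements.

```python
import math


def estimated_refresh_seconds(hosts: int,
                              workers: int,
                              seconds_per_host: float) -> float:
    """Estimate the duration of one full refresh pass.

    Hosts are processed in batches of ``workers``; each batch takes
    ``seconds_per_host`` because the per-host time is unchanged.
    """
    if hosts < 0 or workers < 1 or seconds_per_host < 0:
        raise ValueError("hosts and seconds_per_host must be >= 0, workers >= 1")
    batches = math.ceil(hosts / workers)
    return batches * seconds_per_host
```

With an assumed 30 seconds per host, 1000 hosts and 10 workers would
need about 3000 seconds for one full pass, well above the 6-minute
refresh interval. 100 workers would bring this down to about 300
seconds, while the per-host time stays the same.
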
*************************
 Backwards compatibility
*************************

Any changes need to be backwards compatible or completely isolated from
any existing functionality. There are running cephadm clusters out there
that require an upgrade path.