608 Commits

Author SHA1 Message Date
Pere Diaz Bou
8e07fbd2ea
Merge pull request #48843 from rhcs-dashboard/expose_slow_ops
mgr/prometheus: expose daemon health metrics

Reviewed-by: Anthony D Atri <anthony.datri@gmail.com>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
2022-12-20 12:25:32 +01:00
Pere Diaz Bou
5a2b7c25b6 mgr/prometheus: expose daemon health metrics
Until now daemon health metrics were stored without being used. One of
the most helpful metrics there is SLOW_OPS with respect to OSDs and MONs
which this commit tries to expose to bring fine grained metrics to find
troublesome OSDs instead of having a lone healthcheck of slow ops in the
whole cluster.

Signed-off-by: Pere Diaz Bou <pdiazbou@redhat.com>
2022-12-20 09:44:49 +01:00
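A minimal sketch of how the per-daemon metrics described in 5a2b7c25b6 might be consumed to find the OSDs and MONs that report slow ops. The metric name `ceph_daemon_health_metrics`, its `type` and `ceph_daemon` labels, and the sample values are assumptions made for illustration; none of them are stated in the commit message.

```python
import re

# Hypothetical Prometheus exposition text; metric name and labels are
# assumptions made for this sketch, not taken from the commit.
SAMPLE = '''\
ceph_daemon_health_metrics{type="SLOW_OPS",ceph_daemon="osd.0"} 0
ceph_daemon_health_metrics{type="SLOW_OPS",ceph_daemon="osd.3"} 12
ceph_daemon_health_metrics{type="SLOW_OPS",ceph_daemon="mon.a"} 1
'''

LINE_RE = re.compile(r'^(?P<name>\w+)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def daemons_with_slow_ops(text, metric="ceph_daemon_health_metrics"):
    """Return {daemon: count} for daemons reporting at least one slow op."""
    result = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match or match.group("name") != metric:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels")))
        value = float(match.group("value"))
        if labels.get("type") == "SLOW_OPS" and value > 0:
            result[labels.get("ceph_daemon", "unknown")] = value
    return result


slow = daemons_with_slow_ops(SAMPLE)
```

With per-daemon samples like these, a troublesome OSD can be singled out directly instead of inferring it from one cluster-wide health check.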
yaarith
3a8f2dcc51
Merge pull request #48214 from ljflores/wip-telemetry-bluestore-compression-mode
mgr/telemetry: add `basic_pool_options_bluestore` collection

Reviewed-by: Yaarit Hatuka <yaarit@redhat.com>
2022-12-13 14:34:45 -05:00
Adam King
08718ae283
Merge pull request #47708 from rkachach/fix_issue_57160
mgr/rgw: adding support for rgw multisite

Reviewed-by: Adam King <adking@redhat.com>
Reviewed-by: Ali Maredia <amaredia@redhat.com>
2022-12-07 10:43:44 -05:00
Redouane Kachach
dcde3df939
Addressing comments from the last review
Signed-off-by: Redouane Kachach <rkachach@redhat.com>
2022-10-19 10:42:20 +02:00
Redouane Kachach
5b6e99de1e
Removing docs for unused commands + fixing style issues
Adding logic to modify the master zonegroup endpoints
Do not call pull realm when modifying zone
Only update the endpoints if the modified zone is master
Adding support to set custom endpoints when creating realm or zone

Signed-off-by: Redouane Kachach <rkachach@redhat.com>
2022-10-13 18:17:59 +02:00
Redouane Kachach
83b0ef406d
Addressing some of Ali's comments
Signed-off-by: Redouane Kachach <rkachach@redhat.com>
2022-10-11 12:06:21 +02:00
Redouane Kachach
b94c2b685a
Addressing adking review comments
Signed-off-by: Redouane Kachach <rkachach@redhat.com>
2022-10-11 10:10:16 +02:00
Redouane Kachach
7fdb145f1a
Addressing review comments
Signed-off-by: Redouane Kachach <rkachach@redhat.com>
2022-10-06 11:08:14 +02:00
John Mulligan
764ccf998b doc/mgr/nfs: document --sectype option for export create commands
Add documentation for the option to specify the sectype (for enabling kerberos)
when creating a new export.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
2022-10-05 10:25:06 -04:00
zdover23
9cfd351ab2
Merge pull request #44564 from zdover23/wip-doc-2022-01-13-44150-cleanup-grafana-data-source-name
doc/mgr: name data source in "Man Install & Config"

Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Laura Flores <lflores@redhat.com>
2022-10-05 19:51:03 +10:00
Laura Flores
b0650129e0 mgr/telemetry: add basic_pool_options_bluestore collection
Collects per-pool bluestore options, such as `bluestore_compression_mode`.

Signed-off-by: Laura Flores <lflores@redhat.com>
2022-10-04 18:15:22 +00:00
Redouane Kachach
d15a5dcfe2
mgr/rgw: Adding rgw multisite support
Fixes: https://tracker.ceph.com/issues/57160

Signed-off-by: Redouane Kachach <rkachach@redhat.com>
2022-09-28 11:08:42 +02:00
Nizamudeen A
1acdb44108
Merge pull request #47265 from s0nea/wip-dashboard-redirect-fqdn
mgr/dashboard: add option to resolve ip addr

Reviewed-by: Pegonzal <NOT@FOUND>
Reviewed-by: Anthony D Atri <anthony.datri@gmail.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
2022-09-23 10:59:24 +05:30
Tatjana Dehler
2e15f0f0d2
mgr/dashboard: add option to resolve ip addr
Add the option `redirect_resolve_ip_addr` to the dashboard module.
If the option is set to `True`, try to resolve the IP address before
redirecting from the passive to the active mgr instance.
If the option is set to `False`, follow the already known behavior.

Fixes: https://tracker.ceph.com/issues/56699
Signed-off-by: Tatjana Dehler <tdehler@suse.com>
2022-09-07 11:18:55 +02:00
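The behavior that 2e15f0f0d2 describes for `redirect_resolve_ip_addr` can be sketched roughly as follows. This is not the dashboard module's code: the function name `redirect_url`, the injectable `resolver` parameter, and the use of `socket.getfqdn` as the default lookup are assumptions made for illustration.

```python
import socket
from urllib.parse import urlsplit, urlunsplit


def redirect_url(active_url, resolve_ip_addr=False, resolver=socket.getfqdn):
    """Return the URL a passive mgr would redirect to (hypothetical helper).

    `resolver` is injectable so the sketch can be exercised without DNS.
    """
    if not resolve_ip_addr:
        # Option set to False: keep the already known behavior.
        return active_url
    parts = urlsplit(active_url)
    host = parts.hostname
    if host is None:
        return active_url
    name = resolver(host)
    if ":" in name:
        # An unresolved IPv6 literal needs brackets inside a URL.
        name = f"[{name}]"
    netloc = name if parts.port is None else f"{name}:{parts.port}"
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )
```

Passing a fake resolver, for example `redirect_url("https://192.0.2.10:8443/", True, resolver=lambda ip: "mgr-a.example.com")`, shows the rewrite without touching the network.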
Adam King
812be8465f
Merge pull request #47763 from phlogistonjohn/jjm-object-format-fixes
pybind/mgr: object_format.py decorator updates & docs

Reviewed-by: Adam King <adking@redhat.com>
Reviewed-by: Ramana Raja <rraja@redhat.com>
Reviewed-by: Redouane Kachach <rkachach@redhat.com>
2022-09-01 13:54:13 -04:00
Yuri Weinstein
4b0182efda
Merge pull request #47184 from ljflores/wip-telemetry-memory-stats
mgr/telemetry: add `perf_memory_metrics` collection to telemetry

Reviewed-by: Yaarit Hatuka <yaarithatuka@gmail.com>
Reviewed-by: Kamoltat (Junior) Sirivadhna <ksirivad@redhat.com>
2022-09-01 08:29:25 -07:00
Zac Dover
fc70ccde75 doc/mgr: update prompts in dboard.rst includes
This PR adds unselectable prompts to three files that are
transcluded in the doc/mgr/dashboard.rst file. These three
files are:

 1. debug.inc.rst
 2. feature_toggles.inc.rst
 3. motd.inc.rst

The addition of unselectable prompts to these three files
completes the work begun in PR#47810 (d8064b4), which sought
to bring dashboard.rst into line with the unselectable prompt
standard introduced by Kefu Chai in 2020.

Signed-off-by: Zac Dover <zac.dover@gmail.com>
2022-08-29 10:39:51 +10:00
Zac Dover
d8064b4681 doc/mgr: add prompt directives to dashboard.rst
This commit adds prompt directives (.. prompt:: bash $) to
the commands in dashboard.rst.

There are several ".. include::" directives in the dashboard.rst
file, which means that part of this page is sourced from elsewhere
than the dashboard.rst file. Because I have not yet added prompt
directives to those files, there is an inconsistency in the rendering
of this file. Most of the commands on this page have unselectable
prompts (unselectable prompts are the prompts that don't get added to
the buffer when you copy them to one of the clipboards). But the
commands on this page that come from those ".. include::" directives
do not yet have unselectable prompts.

This file is over 1600 lines long. It was perhaps not optimally wise
of me to have edited all of it in one fell swoop. It took many hours,
and carefully checking it will probably take at least one hour. I
suggest that whoever reviews this should not spend much time on it,
but should instead make a quick pass over the page and make sure that
it looks passable.

The English syntax on this page (and throughout the Dashboard
documentation) will be tightened to remove ambiguity and to improve
readability in the near future, so hold all English-language-related
comments for a future pull request.

Signed-off-by: Zac Dover <zac.dover@gmail.com>
2022-08-26 01:56:41 +10:00
Laura Flores
138eb5db67 doc/mgr: add perf_memory_metrics to the telemetry documentation
Signed-off-by: Laura Flores <lflores@redhat.com>
2022-08-24 22:07:14 +00:00
Nizamudeen A
79bbaa5553 docs: fix doc link pointing to master in dashboard.rst
Signed-off-by: Nizamudeen A <nia@redhat.com>
2022-08-24 16:11:00 +05:30
Zac Dover
2172b7ec98 doc/mgr: edit orchestrator.rst
This PR improves the English language in the "Orchestrator CLI"
section of the MGR documentation. It adds a couple of section
headers in order to signpost the information in the document
a bit more than had already been done, but it makes no major
structural changes to the presentation of the information here.

This PR was motivated by feedback from the 2022 Ceph User Survey
in which one of the respondents wrote "better ceph orch
documentation".

The final section on this page, "Current Implementation Status",
must be verified by someone who is familiar with the current state
of "ceph orch" and a date stamp should be applied to the top of
the section so that the word "current" has a meaningful referent.

Signed-off-by: Zac Dover <zac.dover@gmail.com>
2022-08-24 13:21:25 +10:00
John Mulligan
2a2d044247 doc/mgr: add a tutorial-esque section on object_format python module
It doesn't cover everything but should get most use cases started.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
2022-08-23 13:01:45 -04:00
John Mulligan
30d3e5bab5 doc/mgr: fix quoting error in python example
Found by vim syntax highlighting. Thanks vim!

Signed-off-by: John Mulligan <jmulligan@redhat.com>
2022-08-23 13:01:45 -04:00
John Mulligan
481776becf doc/mgr: use subsections for two approaches to exposing commands
This makes the content for each approach clearer and prepares
for a future sub-section.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
2022-08-23 13:01:45 -04:00
Anthony D'Atri
f1235a8ee0 doc/mgr: Fix capitalization in orchestrator.rst
Signed-off-by: Anthony D'Atri <anthonyeleven@users.noreply.github.com>
2022-07-05 19:23:36 -07:00
Anthony D'Atri
cf1415a2b2
Merge pull request #46919 from jsoref/spelling-docs
doc: Fix many spelling errors
2022-07-03 19:54:55 -07:00
Anthony D'Atri
fe2200527e
Merge pull request #46087 from rhcs-dashboard/update-centralized-logging-docs
doc: update docs for centralized logging
2022-07-03 16:32:59 -07:00
Josh Soref
8abce157f1 doc: Fix many spelling errors
* administrators
* allocated
* allowed
* approximate
* authenticate
* availability
* average
* behavior
* binaries
* bootstrap
* bootstrapping
* capacity
* cephadm
* clients
* combining
* command
* committed
* comparison
* compiled
* consequences
* continues
* convenience
* cookie
* crypto
* dashboard
* deduplication
* defaults
* delivered
* deployment
* describe
* directory
* documentation
* dynamic
* elimination
* entries
* expectancy
* explicit
* explicitly
* exporter
* github
* hard
* healthcheck
* heartbeat
* heavily
* http
* indices
* infrastructure
* inherit
* layout
* lexically
* likelihood
* logarithmic
* manually
* metadata
* minimization
* minimize
* object
* of
* operation
* opportunities
* overwrite
* prioritized
* recipe
* records
* requirements
* restructured
* running
* scalability
* second
* select
* significant
* specify
* subscription
* supported
* synonym
* throttle
* unpinning
* upgraded
* value
* version
* which
* with

Plus some line wrapping and additional edits...

Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>
2022-07-02 23:38:18 -04:00
Aashish Sharma
4ac2a3e5f7 doc: update docs for centralized logging
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
2022-06-28 16:22:24 +05:30
Redouane Kachach
df1aaacb7d
doc/cephadm: enhancing daemon operations documentation
Fixes: https://tracker.ceph.com/issues/54399

Signed-off-by: Redouane Kachach <rkachach@redhat.com>
2022-06-20 11:44:49 +02:00
Konstantin Shalygin
4512270736 doc/mgr: Document wildcard to expose Prometheus metrics for all RBD pools and namespaces
Fixes: https://tracker.ceph.com/issues/47537

Signed-off-by: Konstantin Shalygin <k0ste@k0ste.ru>
2022-06-02 14:23:38 +07:00
Ramana Raja
3adb70a24d doc/mgr/nfs: Add commands to check the statuses
.. of NFS and ingress services after creating/deleting a NFS cluster.
The `nfs cluster info` command is not sufficient to show that the
NFS cluster is created/deleted as expected.

Signed-off-by: Ramana Raja <rraja@redhat.com>
2022-04-27 12:12:36 -04:00
Neha Ojha
ab9546fd17
Merge pull request #44666 from s0nea/correct_metric_name
doc/mgr/prometheus: correct metric name

Reviewed-by: Patrick Seidensal <pseidensal@suse.com>
Reviewed-by: Paul Cuzner <pcuzner@redhat.com>
2022-04-07 13:57:52 -07:00
wangxinyu
1c326651d0 doc/mgr/prometheus.rst: fix spelling error
fix spelling error

Signed-off-by: wangxinyu <wangxinyu@inspur.com>
2022-03-22 18:51:57 +08:00
John Mulligan
b5b3e0bcb5 doc/mgr/nfs: document that nfs exports related mgr call requirements
A recent change in the mgr/nfs module should enable the functioning
of export management commands/API calls as long as the rados namespaces
and objects have been already established. Document this fact, noting
that now only the `ceph nfs cluster ...` calls *require* an
orchestration module.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
2022-02-23 16:33:48 -05:00
Yuri Weinstein
ddeec8d88a
Merge pull request #44781 from ljflores/wip-basic-channel-additions
mgr/telemetry: add `basic_pool_usage` and `basic_usage_by_class` collections to the telemetry module

Reviewed-by: Yaarit Hatuka <yaarit@redhat.com>
2022-02-14 09:06:00 -08:00
Yuri Weinstein
2624f51a72
Merge pull request #44588 from kamoltat/wip-ksirivad-disable-progress-by-default
pybind/mgr/progress: disable pg recovery event by default

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2022-02-11 14:49:17 -08:00
Laura Flores
f69cec5b70 mgr/telemetry: separate device class usage statistics into their own collection
The new collection is called `basic_usage_by_class`. This info should be separate
from `basic_pool_usage` since it doesn't involve pool statistics.

Signed-off-by: Laura Flores <lflores@redhat.com>
2022-02-08 00:45:02 +00:00
Laura Flores
c71a54ec1a mgr/telemetry: update basic_pool_usage collection desc
- Added the word "default" since we are only collecting
default pool applications

- Removed the word "data" since we are actually collecting
usage *statistics*

Signed-off-by: Laura Flores <lflores@redhat.com>
2022-02-08 00:42:37 +00:00
Nizamudeen A
27592b7561 cephadm: change shared_folder directory for prometheus and grafana
After https://github.com/ceph/ceph/pull/44059 the monitoring/prometheus
and monitoring/grafana/dashboards directories are changed to
monitoring/ceph-mixins. That broke the shared_folders in the cephadm
bootstrap script.

Changed all the instances of monitoring/prometheus and
monitoring/grafana/dashboards to monitoring/ceph-mixins

Also, renaming all the instances of prometheus_alerts.yaml to
prometheus_alerts.yml.

Fixes: https://tracker.ceph.com/issues/54176
Signed-off-by: Nizamudeen A <nia@redhat.com>
2022-02-07 16:34:37 +05:30
Kamoltat
f06da20dff pybind/mgr/progress: disable pg recovery event by default
The progress module now disables the pg recovery event by default
since the event is expensive and has interrupted other services
when OSDs are being marked in/out of the cluster.

To turn the event on manually:

ceph config set mgr mgr/progress/allow_pg_recovery_event true

Updated qa/tasks/mgr/test_progress.py to enable
the pg recovery event when testing the progress module.

Signed-off-by: Kamoltat <ksirivad@redhat.com>
2022-02-03 17:51:42 +00:00
Laura Flores
4a2b54c1f2 doc/mgr: update telemetry doc to reflect basic_pool_usage collection
Signed-off-by: Laura Flores <lflores@redhat.com>
2022-02-02 23:08:53 +00:00
Tatjana Dehler
eefcb0aeed
doc/mgr/prometheus: correct metric name
Replace the metric name `node_disk_bytes_written` by
`node_disk_written_bytes_total` to reflect changes made in node exporter
version 0.16.0
https://github.com/prometheus/node_exporter/releases/tag/v0.16.0 /
https://github.com/prometheus/node_exporter/blob/v0.16.0/docs/example-16-compatibility-rules.yml .

Fixes: https://tracker.ceph.com/issues/53932
Signed-off-by: Tatjana Dehler <tdehler@suse.com>
2022-01-19 15:20:41 +01:00
Yuri Weinstein
f5b4f3f4d9
Merge pull request #44251 from yaarith/telemetry-opt-in
mgr/telemetry: introduce new design for varying report data

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2022-01-14 09:06:11 -08:00
Yaarit Hatuka
2d1550cf05 mgr/telemetry: add enable / disable channel all
Enable or disable all telemetry channels at once with:
    ceph telemetry enable channel all
    ceph telemetry disable channel all

Signed-off-by: Yaarit Hatuka <yaarit@redhat.com>
2022-01-13 21:54:07 +00:00
Yaarit Hatuka
77a032526d mgr/telemetry: improve output of ceph telemetry collection ls
STATUS column now indicates whether a collection is being reported, and
the reasons why it's not (either the user is not opted-in to this
collection, or its channel is off).

Also, removed the ENROLLED and DEFAULT columns due to potential
confusion they may cause.

In case a user is not opted-in to certain collections, a message will
appear above the table with the missing collections:

    New collections are available:
    ['basic_base', 'basic_mds_metadata', 'crash_base', 'device_base',
    'ident_base', 'perf_perf']
    Run `ceph telemetry on` to opt-in to these collections.

Signed-off-by: Yaarit Hatuka <yaarit@redhat.com>
2022-01-13 21:54:07 +00:00
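The STATUS logic described in 77a032526d can be sketched as below. The function names and the exact status strings are hypothetical; only the two reasons for not reporting (the user has not opted in, or the channel is off) and the idea of listing the missing collections come from the commit message.

```python
def collection_status(opted_in, channel_on):
    """Return a STATUS string; the exact wording here is hypothetical."""
    if opted_in and channel_on:
        return "REPORTING"
    reasons = []
    if not opted_in:
        reasons.append("NOT OPTED-IN")
    if not channel_on:
        reasons.append("CHANNEL IS OFF")
    return "NOT REPORTING: " + ", ".join(reasons)


def missing_collections(available, opted_in):
    """Collections the user has not opted in to, sorted for stable output."""
    return sorted(set(available) - set(opted_in))


missing = missing_collections(
    ["basic_base", "crash_base", "perf_perf"],
    ["basic_base"],
)
```

A non-empty `missing` list corresponds to the case where the "New collections are available" message would be shown above the table.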
Yaarit Hatuka
4c110ed2a5 doc/mgr/telemetry: document new commands
New commands:

  ceph telemetry enable channel <channel_name>
  ceph telemetry disable channel <channel_name>
  ceph telemetry channel ls
  ceph telemetry collection ls
  ceph telemetry collection diff
  ceph telemetry preview
  ceph telemetry preview-device
  ceph telemetry preview-all

Signed-off-by: Yaarit Hatuka <yaarit@redhat.com>
2022-01-13 21:53:47 +00:00
Patrick Seidensal
18d3a71618 mgr/prometheus: Fix regression with OSD/host details/overview dashboards
Fix issues with PromQL expressions and vector matching with the
`ceph_disk_occupation` metric.

As it turns out, `ceph_disk_occupation` cannot simply be used as
expected, as there seem to be some edge cases for users that have
several OSDs on a single disk.  This leads to issues which cannot be
approached by PromQL alone (many-to-many PromQL errors).  The data we
have expected is simply different in some rare cases.

I have not found a sole PromQL solution to this issue. What we basically
need is the following.

1. Match on labels `host` and `instance` to get one or more OSD names
   from a metadata metric (`ceph_disk_occupation`) to let a user know
   about which OSDs belong to which disk.

2. Match on labels `ceph_daemon` of the `ceph_disk_occupation` metric,
   in which case the value of `ceph_daemon` must not refer to more than
   a single OSD. The exact opposite to requirement 1.

As both operations are currently performed on a single metric, and there
is no way to satisfy both requirements on a single metric, the intention
of this commit is to extend the metric by providing a similar metric
that satisfies one of the requirements. This enables the queries to
differentiate between a vector matching operation to show a string to
the user (where `ceph_daemon` could possibly be `osd.1` or
`osd.1+osd.2`) and to match a vector by having a single `ceph_daemon` in
the condition for the matching.

Although the `ceph_daemon` label is used on a variety of daemons, only
OSDs seem to be affected by this issue (only if more than one OSD is run
on a single disk).  This means that only the `ceph_disk_occupation`
metadata metric seems to need to be extended and provided as two
metrics.

`ceph_disk_occupation` is supposed to be used for matching the
`ceph_daemon` label value.

    foo * on(ceph_daemon) group_left ceph_disk_occupation

`ceph_disk_occupation_human` is supposed to be used for anything where
the resulting data is displayed to be consumed by humans (graphs, alert
messages, etc).

    foo * on(device,instance)
    group_left(ceph_daemon) ceph_disk_occupation_human

Fixes: https://tracker.ceph.com/issues/52974

Signed-off-by: Patrick Seidensal <pseidensal@suse.com>
2022-01-13 13:27:55 +01:00
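To make the two requirements in 18d3a71618 concrete, here is a small sketch that derives both kinds of label sets from one inventory. It is not the mgr/prometheus implementation: the inventory layout and the function name are assumptions. It only illustrates why one series per OSD suits matching on `ceph_daemon`, while one series per disk with joined OSD names suits matching on `device` and `instance`.

```python
from collections import defaultdict

# Hypothetical inventory: (instance, device, OSD name). Two OSDs share sdb.
OSD_DEVICES = [
    ("node1", "sda", "osd.0"),
    ("node1", "sdb", "osd.1"),
    ("node1", "sdb", "osd.2"),
]


def occupation_series(osd_devices):
    """Build label sets for the per-daemon and the human-readable metric."""
    # One series per OSD: each ceph_daemon value names exactly one daemon.
    per_daemon = [
        {"ceph_daemon": osd, "device": dev, "instance": inst}
        for inst, dev, osd in osd_devices
    ]
    # One series per disk: OSDs sharing a disk are joined with "+".
    by_disk = defaultdict(list)
    for inst, dev, osd in osd_devices:
        by_disk[(inst, dev)].append(osd)
    human = [
        {"ceph_daemon": "+".join(sorted(osds)), "device": dev, "instance": inst}
        for (inst, dev), osds in sorted(by_disk.items())
    ]
    return per_daemon, human


per_daemon, human = occupation_series(OSD_DEVICES)
```

In this sketch `per_daemon` plays the role of `ceph_disk_occupation` and `human` the role of `ceph_disk_occupation_human`: the first never repeats a `ceph_daemon` value, and the second never repeats a `(device, instance)` pair, so neither join becomes many-to-many.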
Zac Dover
987713da33 doc/mgr: name data source in "Man Install & Config"
This PR specifies that the data source must be set to
be "Dashboard1" when you configure Grafana and Prometheus
manually.

This is a fixup of another PR which was created by Dr
Jake Grimmett. This is that PR:

https://github.com/ceph/ceph/pull/44150/

Credit goes to Dr Jake Grimmett of Cambridge.

Signed-off-by: Zac Dover <zac.dover@gmail.com>
2022-01-13 08:46:20 +10:00