Data is certainly unavailable, and may also be
degraded in the sense that we can't peer. Unavailable
is the more severe of the two, though, so
put it there.
Signed-off-by: Sage Weil <sage@redhat.com>
First paragraph: explain what the error means.
Second or later paragraph: describe steps to fix or mitigate.
Signed-off-by: Sage Weil <sage@redhat.com>
- s/cephfs_data/cephfs_data_a
- s/cephfs_metadata/cephfs_metadata_a
- s#./rados df#bin/rados df
- update the 'bin/rados df' output
- remove the rbd pool; it is no longer created by default
Signed-off-by: Zhu Shangzhong <zhu.shangzhong@zte.com.cn>
... in cluster log messages. Replaces the mixture
of "mds.foo", "mds daemon 'foo'", etc., with a standard
"daemon mds.foo".
Signed-off-by: John Spray <john.spray@redhat.com>
No longer output MDS_* versions as well as FS_* versions,
because it was noisy and the important message is about
the availability (or not) of the filesystem.
Revise the _FAILED check to only raise the message if
there are no suitable replacements available for failed
ranks. This avoids a spurious health check failure when
a rank has been failed (e.g. by the admin) but will
be replaced at the next tick().
After this change, doing a "ceph mds fail" when a standby
is available just gives you a single FS_DEGRADED health
check from the point of the "fail" to when the replacement
is active.
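
A minimal sketch of the revised rule, not the actual MDSMonitor code.
The function name, its parameters, and the "_FAILED" placeholder string
are assumptions made for illustration; only FS_DEGRADED is named above.

```python
# Placeholder for the check this commit refers to as "_FAILED".
FAILED_CHECK = "_FAILED"


def fs_health_checks(failed_ranks, standbys_available):
    """Return the health check codes for one filesystem (hypothetical helper).

    failed_ranks: number of ranks currently marked failed
    standbys_available: number of standby daemons able to take over a rank
    """
    checks = []
    if failed_ranks > 0:
        # Any failed rank leaves the filesystem degraded until it is replaced.
        checks.append("FS_DEGRADED")
    if failed_ranks > standbys_available:
        # Only raise the failure check when some rank has no replacement.
        checks.append(FAILED_CHECK)
    return checks
```

With one failed rank and one standby this yields only FS_DEGRADED; with
no standby it yields both checks.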
Signed-off-by: John Spray <john.spray@redhat.com>
Add explicit messages, and demote the addr+state prints
to DEBUG level. At INFO level we now see
just a message when we decide to assign a rank, and
a message when the daemon is active, rather than messages
for each state the daemon progresses through.
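
A rough sketch of the resulting log levels, using Python's logging module
in place of the cluster log. The function names and message wording are
hypothetical, not taken from the monitor code.

```python
import logging

log = logging.getLogger("mds_sketch")


def on_rank_assigned(name, rank):
    # Explicit INFO message when a rank is assigned to a daemon.
    log.info("assigning rank %d to daemon mds.%s", rank, name)


def on_state_change(name, addr, state):
    # The addr+state print is demoted to DEBUG.
    log.debug("daemon mds.%s %s state %s", name, addr, state)
    if state == "active":
        # Explicit INFO message once the daemon is active.
        log.info("daemon mds.%s is now active", name)
```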
Signed-off-by: John Spray <john.spray@redhat.com>
msg/async: fix the bug of inaccurate calculation of l_msgr_send_bytes
Reviewed-by: Pan Liu <wanjun.lp@alibaba-inc.com>
Reviewed-by: Jos Collin <jcollin@redhat.com>
Reviewed-by: Haomai Wang <haomai@xsky.com>
- radosgw/s3/bucketops.rst: fix Malformed table.
- operations/health-checks.rst: Title underline too short
- rbd/rados-rbd-cmds.rst: Title underline too short
- rados/operations/index.rst: include health-checks in toc
Signed-off-by: Kefu Chai <kchai@redhat.com>
This does several little things that add up to big concurrency and safety
improvements:
* Switch to passing around PGRefs instead of raw pointers, which is
generally a good idea
* drop the pg_map_lock once we're done looking up the PGRefs, since
we don't need it; keeping the raw PG pointer alive was the only
thing that might previously have made it necessary
* don't hold the recovery_lock since we don't need any OSD-level
synchronization
* make sure the PG is not being deleted before we do a force-change of its
state
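
The locking pattern in the list above can be sketched roughly as follows.
This is an illustration in Python, where ordinary object references play
the role of PGRefs; the PG class, its fields, and force_state() are
hypothetical and do not mirror the OSD code.

```python
import threading


class PG:
    """Stand-in for a placement group (hypothetical)."""

    def __init__(self, pgid):
        self.pgid = pgid
        self.lock = threading.Lock()
        self.deleting = False
        self.forced = False


def force_state(pg_map, pg_map_lock, pgids):
    # Hold the map lock only while collecting references; the references
    # themselves keep the PG objects alive afterwards.
    with pg_map_lock:
        pgs = [pg_map[p] for p in pgids if p in pg_map]

    changed = []
    for pg in pgs:
        # Per-PG lock only; no OSD-level lock is taken here.
        with pg.lock:
            if pg.deleting:
                # Skip PGs that are being deleted.
                continue
            pg.forced = True
            changed.append(pg.pgid)
    return changed
```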
Fixes: http://tracker.ceph.com/issues/20808
Signed-off-by: Greg Farnum <gfarnum@redhat.com>
This avoids crashing when older monitors do not support it.
Fixes: http://tracker.ceph.com/issues/20850
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
The cluster is expected to become degraded during reboot.
Fixes: http://tracker.ceph.com/issues/20731
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Many people are initially unclear about how the current
implementations of the mclock op queues work, so document them to
avoid confusion.
Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
Get rid of the undefined behavior of destroying condition variables
while they're being waited on.
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
From: Zheng Yan <zyan@redhat.com>
Kick discover messages when transitioning from STATE_STARTING
to STATE_ACTIVE.
Fixes: http://tracker.ceph.com/issues/20799
Signed-off-by: Douglas Fuller <dfuller@redhat.com>
The summary field can be extended with the fields
used to construct the message (e.g. including
the down osd count in the message about osds
being down).
The detail entries, similarly, can be extended
with machine-readable fields like the PG ID
for a damaged PG.
For the moment the internal representation is still
plain strings, but we change the output format now
so that adding fields later does not break
consumers.
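
A sketch of the kind of extensible shape described above. The key names
("severity", "summary", "detail", "message") and the extra num_down field
are assumptions for illustration, not a statement of the exact output.

```python
import json


def make_check(severity, message, detail, **fields):
    """Build one health check entry (hypothetical helper)."""
    # The summary is an object rather than a bare string, so extra
    # machine-readable fields can sit next to the message later.
    summary = {"message": message}
    summary.update(fields)
    return {
        "severity": severity,
        "summary": summary,
        # Each detail entry is an object for the same reason.
        "detail": [{"message": d} for d in detail],
    }


check = make_check("HEALTH_WARN", "1 osds down", ["osd.0 is down"], num_down=1)
print(json.dumps(check, indent=2))
```

A consumer that reads only summary["message"] keeps working when fields
such as num_down are added.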
Signed-off-by: John Spray <john.spray@redhat.com>
This was working for setting values, but failing to call
the config observers, so some values didn't take effect.
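
The failure mode can be illustrated with a small sketch. The Config class
and its methods are hypothetical and are not the real config API; the
point is only that set() must notify observers after storing the value.

```python
class Config:
    """Toy config store with per-key observers (hypothetical)."""

    def __init__(self):
        self._values = {}
        self._observers = {}

    def add_observer(self, key, callback):
        self._observers.setdefault(key, []).append(callback)

    def get(self, key):
        return self._values[key]

    def set(self, key, value):
        self._values[key] = value
        # Storing the value alone is not enough: without this loop the
        # value reads back correctly but never takes effect.
        for callback in self._observers.get(key, []):
            callback(key, value)
```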
Fixes: http://tracker.ceph.com/issues/20803
Signed-off-by: John Spray <john.spray@redhat.com>