From 7d954cefb1139011e90cd33e999aa357cbcdabb1 Mon Sep 17 00:00:00 2001
From: Dhairya Parmar
Date: Thu, 18 Apr 2024 20:07:36 +0530
Subject: [PATCH] qa: add a YAML to ignore MGR_DOWN warning

RCA showed that the NFS code did not lead to the warning, since the
warning occurred before the test cases started to execute. Later, after
some discussion with Venky and Greg, it was found that some recent clog
changes cause this warning to be added to the clog.

Digging further, it was found that the warning is generated when
`mgr fail` is run while no mgr is available. The mgr becomes
unavailable because, when `setup_mgrs()` in class `MgrTestCase` stops
the mgr daemons, the mgr sometimes just crashes
(`mgr handle_mgr_signal *** Got signal Terminated ***`). After that,
`mgr fail` (also part of `setup_mgrs()`) is run and the `MGR_DOWN`
warning is generated. The warning shows up only in nfs because it is
the only fs suite that makes use of class `MgrTestCase`.

To support this analysis, I ran about eight jobs in teuthology and
could not reproduce the warning. Since it does not harm the execution
of the NFS test cases, and the logs show that the mgr daemon did get
restarted
(`INFO:tasks.cephadm.mgr.x:Restarting mgr.x (starting--it wasn't running)...`),
it is reasonable to conclude that ignoring this warning is the simplest
solution.

Fixes: https://tracker.ceph.com/issues/65265
Signed-off-by: Dhairya Parmar
---
 qa/suites/fs/nfs/overrides/ignore_mgr_down.yaml | 9 +++++++++
 1 file changed, 9 insertions(+)
 create mode 100644 qa/suites/fs/nfs/overrides/ignore_mgr_down.yaml

diff --git a/qa/suites/fs/nfs/overrides/ignore_mgr_down.yaml b/qa/suites/fs/nfs/overrides/ignore_mgr_down.yaml
new file mode 100644
index 00000000000..fb407420562
--- /dev/null
+++ b/qa/suites/fs/nfs/overrides/ignore_mgr_down.yaml
@@ -0,0 +1,9 @@
+# When the NFS test class is constructed, the `MgrTestCase.setup_mgrs` invokes
+# `mgr fail` to restart the MGR which sometimes crashes the daemon and the
+# warning `MGR_DOWN` is generated. This is an intermittent failure which is
+# irrelevant to the NFS suite, and therefore should be ignored.
+
+overrides:
+  ceph:
+    log-ignorelist:
+      - MGR_DOWN
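The `log-ignorelist` override in this patch works by suppressing matching cluster-log lines at the end of a run. The following is a minimal sketch of that filtering, assuming the qa `ceph` task scans the cluster log for `[ERR]`, `[WRN]` and `[SEC]` lines and treats each `log-ignorelist` entry as a regular expression that excludes matching lines. The helper name `unexpected_warnings` and the sample log lines are hypothetical and only illustrate why adding `MGR_DOWN` stops the job from being flagged.

```python
import re
from typing import Iterable, List

# Assumption: the severity tags the ceph task looks for in the cluster log.
SEVERITY = re.compile(r"\[ERR\]|\[WRN\]|\[SEC\]")


def unexpected_warnings(log_lines: Iterable[str], ignorelist: List[str]) -> List[str]:
    """Return ERR/WRN/SEC lines that no ignorelist regex matches (hypothetical helper)."""
    ignore = [re.compile(pattern) for pattern in ignorelist]
    flagged = []
    for line in log_lines:
        if not SEVERITY.search(line):
            continue  # not a warning or error, so it never fails the job
        if any(p.search(line) for p in ignore):
            continue  # suppressed by a log-ignorelist entry
        flagged.append(line)
    return flagged


if __name__ == "__main__":
    # Hypothetical cluster-log excerpt; the exact format is illustrative only.
    sample = [
        "cluster [WRN] Health check failed: no active mgr (MGR_DOWN)",
        "cluster [INF] Activating manager daemon x",
    ]
    print(unexpected_warnings(sample, []))            # the MGR_DOWN line is flagged
    print(unexpected_warnings(sample, ["MGR_DOWN"]))  # prints []
```

Matching on the health-check code rather than the full message keeps the entry short and robust to wording changes in the warning text, at the cost of hiding every `MGR_DOWN` occurrence in the suite, not only the one triggered by `setup_mgrs()`.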