Saw it in a teuthology run:
-5645> 2020-07-20 04:34:32.067 7f351e329700 5 osd.5 pg_epoch: 667 ... exit Started/Primary/Active/Backfilling
-5642> 2020-07-20 04:34:32.067 7f351e329700 5 osd.5 pg_epoch: 667 ... enter Started/Primary/Active/Recovered
-5633> 2020-07-20 04:34:32.067 7f351e329700 20 osd.5 pg_epoch: 667 ... _update_calc_stats shard 5 primary objects 0 missing 0
-5632> 2020-07-20 04:34:32.067 7f351e329700 20 osd.5 pg_epoch: 667 ... _update_calc_stats shard 3 objects -1 missing 1
-5631> 2020-07-20 04:34:32.067 7f351e329700 20 osd.5 pg_epoch: 667 ... _update_calc_stats shard 6 objects 0 missing 0
This crashes the choose_acting() procedure, which mistakenly
concludes that peer 3 should continue to perform asynchronous recovery
(because num_objects_missing = 1) even though it has already been
fully recovered by backfill.
While I did not dig into the root cause, there are a couple of
possible explanations for how num_objects can be off. If a
roll-forward or log replay could delete something twice, there
might be an undercount. Or it could be something as simple as
corruption.
Since _update_calc_stats() is going to fix num_objects_missing
for that peer anyway, let's make sure it always starts with a
clean state.
Fixes: https://tracker.ceph.com/issues/46705
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
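A minimal Python sketch of why the reset matters, under a simplified model: a hypothetical `PeerStats` record and a hypothetical `update_calc_stats()` that only re-estimates `num_objects_missing` for peers that are still backfilling. This is not Ceph's `_update_calc_stats()`; it only shows how a stale count survives when the field is not cleared first.

```python
from dataclasses import dataclass


@dataclass
class PeerStats:
    # Hypothetical per-shard record, loosely modeled on the log fields above.
    num_objects: int
    num_objects_missing: int
    backfill_complete: bool


def update_calc_stats(primary_objects, peers, reset=True):
    """Recompute num_objects_missing for every peer (simplified model)."""
    for stats in peers.values():
        if reset:
            # Start from a clean state so a stale value cannot leak through.
            stats.num_objects_missing = 0
        if not stats.backfill_complete:
            # Still backfilling: estimate the missing count from the object gap.
            stats.num_objects_missing = max(0, primary_objects - stats.num_objects)
    return peers


def wants_async_recovery(stats):
    # Hypothetical stand-in for the choose_acting() decision.
    return stats.num_objects_missing > 0


# Shard 3 from the log: fully backfilled, but carrying a stale missing count.
stale = {3: PeerStats(num_objects=-1, num_objects_missing=1, backfill_complete=True)}
print(wants_async_recovery(update_calc_stats(0, stale, reset=False)[3]))  # True
fresh = {3: PeerStats(num_objects=-1, num_objects_missing=1, backfill_complete=True)}
print(wants_async_recovery(update_calc_stats(0, fresh)[3]))  # False
```

Without the reset, the fully backfilled shard keeps its stale count and still looks like an async-recovery candidate.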
This change adds a new scrubber.req_scrub to track user-requested
scrubs, deep scrubs, or repairs.
Fixes: https://tracker.ceph.com/issues/46275
Signed-off-by: David Zafman <dzafman@redhat.com>
because we don't bind both v1 and v2 addresses, when the monitor returns a
v1 peer address, crimson-osd, as the client side, just drops the
connection. but this failed attempt to learn myaddr resets
`need_addr`, which prevents crimson-osd from learning the v2 address
returned by the monitor.
in this change, we reset need_addr only after the address is learned
from the peer.
Signed-off-by: Kefu Chai <kchai@redhat.com>
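The fix can be modeled as a small state machine. This Python sketch assumes, for illustration, that crimson-osd binds only a v2 address; `AddrLearner` and `on_peer_reports_addr()` are hypothetical names, not crimson's messenger API, and the addresses are placeholders.

```python
class AddrLearner:
    """Hypothetical model of learning myaddr from the peer's reply."""

    def __init__(self):
        self.need_addr = True
        self.my_addr = None

    def on_peer_reports_addr(self, addr_type, addr):
        if not self.need_addr:
            return
        if addr_type != "v2":
            # Assumed: only v2 is bound, so a v1 address is unusable and the
            # connection is dropped. need_addr stays set for the next attempt.
            return
        self.my_addr = addr
        # Clear need_addr only once the address has actually been learned.
        self.need_addr = False


learner = AddrLearner()
learner.on_peer_reports_addr("v1", "192.0.2.10:6789")
learner.on_peer_reports_addr("v2", "192.0.2.10:3300")
print(learner.my_addr, learner.need_addr)  # 192.0.2.10:3300 False
```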
* add a --flavor option, which defaults to "default", so one can, for
example, pass "--flavor crimson" to ceph-debug-docker
* extract $repo_url to avoid repeating the bits shared between the
centos and debian-derivative envs.
Signed-off-by: Kefu Chai <kchai@redhat.com>
* refs/pull/36136/head:
qa/tasks/nfs:Add test for relative and just '/' pseudo path
mgr/nfs: Check if pseudo path is absolute path or just '/'
Reviewed-by: Michael Fritch <mfritch@suse.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Nathan Cutler <ncutler@suse.com>
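The merged check can be sketched in Python, the language of mgr/nfs. `validate_pseudo_path()` is a hypothetical helper written for illustration, not the function added by the PR; it rejects relative paths and a path that is just '/'. Treating repeated slashes such as "//" as the root is an assumption of this sketch.

```python
def validate_pseudo_path(path: str) -> str:
    """Hypothetical check: the NFS pseudo path must be absolute and not the root."""
    if not path.startswith("/"):
        raise ValueError(f"pseudo path {path!r} must be an absolute path")
    if path.strip("/") == "":
        # Covers "/" as well as "//", "///", ...
        raise ValueError("pseudo path cannot be just '/'")
    return path


for candidate in ["/cephfs", "cephfs", "/"]:
    try:
        print(validate_pseudo_path(candidate), "ok")
    except ValueError as err:
        print("rejected:", err)
```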
seastar uses setjmp() and longjmp() to implement coroutines, but
longjmp() is redirected to ____longjmp_chk() by glibc if _FORTIFY_SOURCE
is defined. ____longjmp_chk() bails out with an error message unless
the dest stack pointer is higher than the src stack pointer or the dest
stack pointer is in the sigaltstack. in seastar's case, the dest %sp is
not necessarily higher than the src stack pointer, and the thread context
switch does not happen while handling a signal. that's why we get
the "longjmp causes uninitialized stack frame" error when running
crimson-osd on RHEL/CentOS 8 using the prebuilt rpm packages.
the optflags rpm macro adds -D_FORTIFY_SOURCE=2 to CFLAGS and CXXFLAGS,
so even though seastar tries to pass -U_FORTIFY_SOURCE to GCC, cmake may
append CXXFLAGS after it in the option list passed to GCC, which
renders seastar's attempt to undefine _FORTIFY_SOURCE useless.
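GCC processes -D and -U options in the order they appear on the command line, so whichever comes last wins. The Python sketch below replays that rule over a flag list to show why the ordering matters; `final_macros()` is a hypothetical helper, and it only handles the joined `-DNAME[=VALUE]` and `-UNAME` forms.

```python
def final_macros(args):
    """Replay -D/-U options left to right and return the surviving macros."""
    macros = {}
    for arg in args:
        if arg.startswith("-D"):
            name, sep, value = arg[2:].partition("=")
            # "-DNAME" defines NAME as 1; "-DNAME=VALUE" uses VALUE verbatim.
            macros[name] = value if sep else "1"
        elif arg.startswith("-U"):
            macros.pop(arg[2:], None)
    return macros


seastar_flags = ["-U_FORTIFY_SOURCE"]
rpm_cxxflags = ["-O2", "-D_FORTIFY_SOURCE=2"]  # excerpt of what optflags adds
print(final_macros(seastar_flags + rpm_cxxflags))  # {'_FORTIFY_SOURCE': '2'}
print(final_macros(rpm_cxxflags + seastar_flags))  # {}
```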
another way to address this issue is to undefine this macro in
seastar:src/core/thread.cc. but since seastar tries to neutralize the
macro in its cmake script instead of in the source file, i assume they
have their reasons. let's drop it in the rpm recipe instead.
Signed-off-by: Kefu Chai <kchai@redhat.com>
* refs/pull/36203/head:
client: cleanup the fuse client code
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Zheng Yan <zyan@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
`--smp` and `--cpuset` are already passed to crimson-osd by vstart.sh, so
there is no need to pass them when launching vstart.sh
Signed-off-by: Kefu Chai <kchai@redhat.com>
the example for deploying multiple specs via yaml was missing the service_id
Fixes: https://tracker.ceph.com/issues/46377
Signed-off-by: Michael Fritch <mfritch@suse.com>
service_id is required for iscsi, mds, nfs, osd, and rgw.
any other service_type (mon, mgr, etc.) should not contain a service_id.
Fixes: https://tracker.ceph.com/issues/46175
Signed-off-by: Michael Fritch <mfritch@suse.com>
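The service_id rule can be sketched as a check over a list of spec dicts, such as the documents in a multi-spec yaml file. `validate_spec()` is a hypothetical helper, not cephadm's ServiceSpec validation, and the spec values are illustrative.

```python
REQUIRES_SERVICE_ID = {"iscsi", "mds", "nfs", "osd", "rgw"}


def validate_spec(spec: dict) -> None:
    """Raise ValueError if service_id presence does not match service_type."""
    service_type = spec.get("service_type")
    has_id = bool(spec.get("service_id"))
    if service_type in REQUIRES_SERVICE_ID and not has_id:
        raise ValueError(f"service_id is required for service_type {service_type!r}")
    if service_type not in REQUIRES_SERVICE_ID and has_id:
        raise ValueError(f"service_type {service_type!r} should not have a service_id")


specs = [
    {"service_type": "mon"},
    {"service_type": "rgw", "service_id": "myrealm.myzone"},
]
for spec in specs:
    validate_spec(spec)  # both pass
```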
Normally IO is tracked via the AioCompletion's async_op, but the
scheduler may "complete" writes while the IO is still executing.
Therefore, before shutting down this dispatch layer, we need to wait
for all IO to complete.
Fixes: https://tracker.ceph.com/issues/46668
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
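A minimal sketch of "wait for all IO before shutting down", assuming a simple in-flight counter guarded by a condition variable. `DispatchLayer`, `start_io()`, `finish_io()`, and `shut_down()` are hypothetical names; the real dispatch layer is C++ and is not reproduced here.

```python
import threading


class DispatchLayer:
    """Hypothetical model: count in-flight IO and drain it before shutdown."""

    def __init__(self):
        self._cond = threading.Condition()
        self._in_flight = 0

    def start_io(self):
        with self._cond:
            self._in_flight += 1

    def finish_io(self):
        with self._cond:
            self._in_flight -= 1
            if self._in_flight == 0:
                self._cond.notify_all()

    def shut_down(self, timeout=None):
        # The caller may already have seen the write "complete", so wait on the
        # real in-flight count instead. Returns False if the timeout expires.
        with self._cond:
            return self._cond.wait_for(lambda: self._in_flight == 0, timeout)


layer = DispatchLayer()
layer.start_io()
threading.Timer(0.05, layer.finish_io).start()  # IO finishes after "completion"
print(layer.shut_down(timeout=5))  # True
```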
* default to centos:8, as we've moved to centos:8 now
* do not assume that the base image is centos:7; use centos:8 if it is
specified.
* install python3-* packages for centos:8 and python36-* packages for
centos:7, since el8 is now a python3 distro and centos:7 now has
python36.
* s/screen/tmux/, because screen is now offered by EPEL, while tmux
is in BaseOS.
Signed-off-by: Kefu Chai <kchai@redhat.com>
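The per-image choices in the list above could be captured in a small lookup. This Python sketch is illustrative only: `package_plan()` is a hypothetical helper, and the script itself is not reproduced here.

```python
def package_plan(base_image: str = "centos:8") -> dict:
    """Hypothetical mapping from base image to packages, per the bullets above."""
    python_prefixes = {
        "centos:8": "python3-",   # el8 is a python3 distro
        "centos:7": "python36-",  # centos:7 now has python36
    }
    if base_image not in python_prefixes:
        raise ValueError(f"unsupported base image: {base_image}")
    return {
        "base_image": base_image,
        "python_prefix": python_prefixes[base_image],
        # tmux replaces screen: screen comes from EPEL, tmux is in BaseOS.
        "multiplexer": "tmux",
    }


print(package_plan()["python_prefix"])  # python3-
```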
mgr/dashboard: increase API test coverage in API controllers
Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Laura Paduano <lpaduano@suse.com>
Reviewed-by: Volker Theile <vtheile@suse.com>