We were using the ASCII-armored version of the GPG key, which
apt-get flagged as an unsupported file type, so apt-get did not
make use of the repo source we were adding.
Additionally, added something to make sure we update the
package list after adding the source and key.
Fixes: https://tracker.ceph.com/issues/44972
Fixes: https://tracker.ceph.com/issues/45009
Signed-off-by: Adam King <adking@redhat.com>
Agreed in #ceph-devel on 6/10. The current controlling
rationale is that the default value should be sufficient to
marshal a SHA-512 checksum.
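For context, a quick standalone sketch of the sizing math (not the actual RGW code; the constant names are illustrative): a SHA-512 digest is 64 bytes, so its hex representation needs 128 characters, which is the size the default has to accommodate.
```
// sketch only: the hex form of a SHA-512 checksum needs 128 characters
#include <cstddef>
#include <iostream>

int main() {
  constexpr std::size_t sha512_digest_bytes = 512 / 8;               // 64 bytes
  constexpr std::size_t sha512_hex_chars = sha512_digest_bytes * 2;  // 128 chars
  static_assert(sha512_hex_chars == 128, "hex SHA-512 fits in 128 chars");
  std::cout << "hex SHA-512 length: " << sha512_hex_chars << "\n";
  return 0;
}
```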
Fixes: https://tracker.ceph.com/issues/51166
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
mon_tick_interval is 5 seconds by default. monitors update their
rotating keys every mon_tick_interval. before monitors form a
quorum, the auth requests from clients are put into the wait list.
these requests are re-enqueued once the monitors form a quorum. but
there is a small window of mon_tick_interval before they are able
to serve the auth requests, even after they claim to be able to
serve requests. if these re-enqueued requests happen to be served
in this window, and if cephx is enabled, they will be greeted with
errors like
handle_auth_bad_method server allowed_methods [2] but i only support [2]
in the case of ceph cli, the error would look like:
[errno 13] RADOS permission denied (error connecting to the cluster)
so, to address this issue, the EACCES error is ignored when waiting
for a quorum.
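A minimal sketch of the idea (illustrative only, not the actual fix; try_authenticate() is a stand-in for the real auth/connect call): treat EACCES as transient and retry while the quorum is still warming up.
```
// a minimal sketch of the idea, not the actual fix: try_authenticate() is a
// stand-in for the real call, which can transiently return -EACCES right
// after the monitors claim a quorum but before the rotating keys are ready
#include <cerrno>
#include <chrono>
#include <iostream>
#include <thread>

int try_authenticate(int attempt) {
  return attempt < 3 ? -EACCES : 0;   // pretend the 4th attempt succeeds
}

int main() {
  using namespace std::chrono_literals;
  for (int attempt = 0; ; ++attempt) {
    int r = try_authenticate(attempt);
    if (r == 0) {
      std::cout << "authenticated after " << attempt << " retries\n";
      return 0;
    }
    if (r == -EACCES) {
      // transient while waiting for a quorum: ignore and retry
      std::this_thread::sleep_for(1s);
      continue;
    }
    std::cerr << "fatal auth error: " << r << "\n";
    return 1;
  }
}
```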
Signed-off-by: Kefu Chai <kchai@redhat.com>
doc/dev/cephadm: cephadm bootstrap --shared_ceph_folder
Reviewed-by: Adam King <adking@redhat.com>
Reviewed-by: Juan Miguel Olmo Martínez <jolmomar@redhat.com>
so the rebuilt paxos transaction won't be overwritten by the ones
created before recovery completes.
when the quorum is recovering, the leader will collect the paxos
transactions from peons. if the quorum accepts the proposal for setting
the fingerprint, the peon will update the monitor with a paxos
transaction whose "last_committed" is newer than the one created using
update_paxos() in ceph_monstore_tool.cc. the latter "last_committed" is
always 0.
so, to avoid this extra paxos proposal obsoleting the "rebuilt" paxos
transaction, we use a large enough number for {first,last}_committed.
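a toy sketch of the workaround (the map and constant below are illustrative stand-ins, not the ceph_monstore_tool.cc API): stamp the rebuilt paxos keys with a version high enough that a proposal made during quorum recovery cannot supersede them.
```
// toy sketch, not the ceph_monstore_tool.cc API: a std::map stands in for
// the monitor's key/value store; the point is only the choice of version
#include <cstdint>
#include <iostream>
#include <map>
#include <string>

int main() {
  std::map<std::string, uint64_t> paxos_keys;

  // instead of 0, use a value large enough that a proposal made while the
  // quorum recovers cannot obsolete the rebuilt transaction
  constexpr uint64_t rebuilt_version = 1'000'000;
  paxos_keys["first_committed"] = rebuilt_version;
  paxos_keys["last_committed"] = rebuilt_version;

  std::cout << "last_committed=" << paxos_keys["last_committed"] << "\n";
  return 0;
}
```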
Fixes: http://tracker.ceph.com/issues/38219
Signed-off-by: Kefu Chai <kchai@redhat.com>
*: stop using <experimental/filesystem> as an alternative
Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Adam C. Emerson <aemerson@redhat.com>
Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
qa/standalone: Use osd op queue = wpq in activate_osd() within ceph-helpers.sh.
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
<charconv> is available since GCC-8, see https://gcc.gnu.org/onlinedocs/libstdc++/manual/status.html#status.iso.2017
> Elementary string conversions P0067R5 11.1 (integral types supported since 8.1) __has_include(<charconv>), __cpp_lib_to_chars >= 201611
since we always have access to GCC-8.1 and up, there is no need to
detect the existence of <charconv> anymore.
also, because GCC-11 introduced support for floating-point types,
update the comment to reflect the change.
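a minimal standalone sketch (not Ceph code) of what unconditional <charconv> buys us: locale-independent, allocation-free integral conversions; the floating-point overloads only arrived with GCC 11, hence the comment update.
```
// standalone sketch, not Ceph code: integral to_chars/from_chars via <charconv>
#include <charconv>
#include <cstdio>
#include <string_view>
#include <system_error>

int main() {
  char buf[16];
  auto [ptr, ec] = std::to_chars(buf, buf + sizeof(buf), 4096);
  if (ec == std::errc()) {
    std::printf("to_chars   -> %.*s\n", static_cast<int>(ptr - buf), buf);
  }

  int value = 0;
  std::string_view in = "12345";
  auto res = std::from_chars(in.data(), in.data() + in.size(), value);
  if (res.ec == std::errc()) {
    std::printf("from_chars -> %d\n", value);
  }
  return 0;
}
```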
Signed-off-by: Kefu Chai <kchai@redhat.com>
since there is no need to be compatible with GCC older than GCC-8,
there is no need to use <experimental/filesystem> as an alternative to
<filesystem> anymore.
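a minimal standalone sketch (not a file from the tree) of relying on <filesystem> directly, with no __has_include(<experimental/filesystem>) fallback; note that with GCC 8 this still needs -lstdc++fs at link time.
```
// sketch: <filesystem> used unconditionally now that GCC >= 8 is guaranteed
#include <filesystem>
#include <iostream>
#include <system_error>

namespace fs = std::filesystem;

int main() {
  std::error_code ec;
  fs::path p = fs::temp_directory_path(ec) / "fs-demo";
  fs::create_directories(p, ec);
  std::cout << p << (ec ? " (failed)" : " (created)") << "\n";
  return 0;
}
```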
Signed-off-by: Kefu Chai <kchai@redhat.com>
reverts 9dedabde52
since there is no need to be compatible with GCC older than GCC-8,
there is no need to use boost::filesystem as an alternative to
std::filesystem anymore.
Signed-off-by: Kefu Chai <kchai@redhat.com>
since we've dropped support for GCC older than v8.0, there is no need
to detect <experimental/filesystem>
Signed-off-by: Kefu Chai <kchai@redhat.com>
for better C++17 support, for instance better std::filesystem
support.
the reason why 8.1 is required is that ubuntu focal provides GCC-8.1,
and RHEL/CentOS 8 provides GCC-8.4.1. so far we only test the build with
GCC-8.1 and up.
Signed-off-by: Kefu Chai <kchai@redhat.com>
The global recovery event progress calculation only
takes into account PGs with `reported_epoch < start_epoch_of_event`,
but sometimes the PGs don't get moved before or after the creation
of the global recovery event. This might result in a bug
where the global event gets stuck forever, unless there is another
event that specifically makes the stuck PGs move and update
their `reported_epoch`.
Therefore, we decided to disregard PGs that are in active+clean state
but have `reported_epoch < start_epoch_of_event`.
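A hypothetical sketch of the new predicate (the real logic lives in the Python mgr progress module; PGInfo and blocks_recovery_event are illustrative names): a PG that is already active+clean should not keep the global recovery event open just because its reported_epoch predates the event.
```
// hypothetical sketch; names are illustrative, not the mgr progress module
#include <cstdint>
#include <iostream>
#include <vector>

struct PGInfo {
  uint64_t reported_epoch;
  bool active_clean;
};

// an active+clean PG no longer blocks the event, regardless of reported_epoch
bool blocks_recovery_event(const PGInfo& pg, uint64_t start_epoch_of_event) {
  if (pg.active_clean) {
    return false;
  }
  return pg.reported_epoch < start_epoch_of_event;
}

int main() {
  const uint64_t start_epoch_of_event = 100;
  std::vector<PGInfo> pgs = {{90, true}, {90, false}, {120, false}};
  for (const auto& pg : pgs) {
    std::cout << "reported_epoch=" << pg.reported_epoch
              << " active+clean=" << pg.active_clean
              << " blocks_event=" << blocks_recovery_event(pg, start_epoch_of_event)
              << "\n";
  }
  return 0;
}
```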
Fixes: https://tracker.ceph.com/issues/49988
Signed-off-by: Kamoltat <ksirivad@redhat.com>
It's yet another race that happens when auth request
handling is performed during the `active_con` reset sequence.
It caused the following `nullptr` dereference at Sepia:
```
DEBUG 2021-06-09 10:27:24,059 [shard 0] ms - [osd.6(client) v2:172.21.15.170:6809/33397 >> client.? -@39840] GOT AuthRequestFrame: method=2, preferred_modes={2, 1}, payload_len=174
/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-4977-g65cb255e/rpm/el8/BUILD/ceph-17.0.0-4977-g65cb255e/src/crimson/mon/MonClient.cc:595:26: runtime error: member call on null pointer of type 'struct Connection'
/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-4977-g65cb255e/rpm/el8/BUILD/ceph-17.0.0-4977-g65cb255e/src/crimson/mon/MonClient.cc:178:11: runtime error: member access within null pointer of type 'struct Connection'
Segmentation fault on shard 0.
Backtrace:
0# 0x0000563F9C00395F in ceph-osd
1# FatalSignal::signaled(int, siginfo_t const*) in ceph-osd
2# FatalSignal::install_oneshot_signal_handler<11>()::{lambda(int, siginfo_t*, void*)#1}::_FUN(int, siginfo_t*, void*) in ceph-osd
3# 0x00007F4A064D0B20 in /lib64/libpthread.so.0
4# crimson::mon::Connection::get_keys() in ceph-osd
5# crimson::mon::Client::handle_auth_request(seastar::shared_ptr<crimson::net::Connection>, seastar::lw_shared_ptr<AuthConnectionMeta>, bool, unsigned int, ceph::buffer::v15_2_0::list const&, ceph::buffer::v15_2_0::list*) in ceph-osd
6# crimson::net::ProtocolV2::_handle_auth_request(ceph::buffer::v15_2_0::list&, bool) in ceph-osd
7# 0x0000563F9D007B39 in ceph-osd
8# 0x0000563F9D008C45 in ceph-osd
9# 0x0000563F95FF8D70 in ceph-osd
10# 0x0000563FA1A560BF in ceph-osd
11# 0x0000563FA1A5B600 in ceph-osd
12# 0x0000563FA1C0D66B in ceph-osd
13# 0x0000563FA176B0EA in ceph-osd
14# 0x0000563FA177520E in ceph-osd
15# main in ceph-osd
16# __libc_start_main in /lib64/libc.so.6
17# _start in ceph-osd
Fault at location: 0xb0
```
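A hypothetical sketch of the guard (illustrative names only, not the actual crimson::mon::Client code): an auth request that races with the `active_con` reset has to bail out instead of dereferencing a connection that is being torn down.
```
// hypothetical sketch with illustrative names, not crimson::mon::Client
#include <iostream>
#include <memory>

struct Connection {
  int get_keys() { return 42; }   // stand-in for the rotating-keys lookup
};

struct Client {
  std::shared_ptr<Connection> active_con;

  int handle_auth_request() {
    if (!active_con) {
      // active_con is being reset: bail out (e.g. ask the peer to retry)
      // instead of dereferencing a null connection
      return -1;
    }
    return active_con->get_keys();
  }
};

int main() {
  Client client;
  std::cout << client.handle_auth_request() << "\n";  // -1: reset in flight
  client.active_con = std::make_shared<Connection>();
  std::cout << client.handle_auth_request() << "\n";  // 42
  return 0;
}
```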
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
The documentation specifies this in [1] and yet we were using (I
believe) an older syntax:
ceph tell mds.foo:0 scrub start / recursive force
instead of
ceph tell mds.foo:0 scrub start / recursive,force
Oddly the former works at least as recently as in [2]:
2021-06-03T07:11:42.071 DEBUG:teuthology.orchestra.run.smithi025:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph tell mds.1:0 scrub start / recursive force
...
2021-06-03T07:11:42.268 INFO:teuthology.orchestra.run.smithi025.stdout:{
2021-06-03T07:11:42.268 INFO:teuthology.orchestra.run.smithi025.stdout: "return_code": 0,
2021-06-03T07:11:42.268 INFO:teuthology.orchestra.run.smithi025.stdout: "scrub_tag": "cf7a74b2-3eb2-4657-9274-ea504b1ebf8f",
2021-06-03T07:11:42.269 INFO:teuthology.orchestra.run.smithi025.stdout: "mode": "asynchronous"
2021-06-03T07:11:42.269 INFO:teuthology.orchestra.run.smithi025.stdout:}
[1] https://docs.ceph.com/en/latest/cephfs/scrub/
[2] /ceph/teuthology-archive/pdonnell-2021-06-03_03:40:33-fs-wip-pdonnell-testing-20210603.020013-distro-basic-smithi/6148097/teuthology.log
Fixes: https://tracker.ceph.com/issues/51146
See-also: https://tracker.ceph.com/issues/51145
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
* "mds dump"
"mds getmap",
"mds stop",
"mds set_max_mds",
"mds set",
"mds rmfailed"
"mds add_data_pool"
"mds rm_data_pool"
"mds remove_data_pool"
the commands above were marked "OBSOLETE" back in
a8fc92933b, which was included in v13.0.1,
* "mds tell" was marked obsolete in
e0d1127205, which was included in v12.0.2,
* "mds deactivate" was marked obsolete in
c7bd6f02c7, which was included in v13.1.0,
* "mds newfs" was marked obsolete in
072c41e349, which was included in v12.0.2
so according to our command retirement policy proposed by
https://ceph-users.ceph.narkive.com/iUh4e0nj/rfc-deprecating-ceph-tool-commands
> Once two major releases go by, the command will then enter the OBSOLETE
> period. This would be one major release, during which the command would
> no longer work although still acknowledged. A simple message down the
> lines of 'This command is now obsolete; please check the docs' would
> suffice to inform the user.
since the next release will be v17, it's been long enough to retire
these OBSOLETE commands.
Signed-off-by: Kefu Chai <kchai@redhat.com>
the "scrub" command was marked obsolete in
e9a5ce0897, which was included in
v15.1.0, but the next release will be v17, so it's been long enough to retire
this command.
Signed-off-by: Kefu Chai <kchai@redhat.com>
This change is a follow-up to commit
b6e9c0903d that set the scheduler to wpq in
run_osd() and run_osd_filestore(). In addition, activate_osd() also has to
set the scheduler type to 'wpq' in order to be consistent and avoid test
failures.
The above is a temporary measure until all the standalone tests are
modified to run well with the mclock_scheduler.
Fixes: https://tracker.ceph.com/issues/51074
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
When a read fails, ret can be taken as the data length in FillInVerifyExtent, which should be avoided.
It may cause errors in crc repair or retried reads because of the bogus data length. In my case, we use FillInVerifyExtent for EC reads;
when we meet -EIO, we will try crc repair, which needs to read data from other shards according to the data length.
And I hit an assert in ECBackend.cc (loc: line 2288, ceph_assert(range.first != range.second)), but it seems the master branch does not support EC crc repair.
In short, reusing the readop may cause unpredictable errors.
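A hypothetical sketch of the pitfall and the fix (illustrative names, not the actual FillInVerifyExtent code): a negative return code from the read must be propagated as an error and never reused as the number of bytes read.
```
// hypothetical sketch with illustrative names, not FillInVerifyExtent itself:
// a negative read return code is an error, never a byte count
#include <cerrno>
#include <cstdint>
#include <iostream>

struct ReadResult {
  int rc;           // 0 on success, negative errno on failure
  uint64_t length;  // valid only when rc == 0
};

ReadResult fill_in_extent(int ret_from_read) {
  if (ret_from_read < 0) {
    return {ret_from_read, 0};  // propagate the error, do not reuse as length
  }
  return {0, static_cast<uint64_t>(ret_from_read)};
}

int main() {
  const ReadResult ok = fill_in_extent(4096);
  const ReadResult bad = fill_in_extent(-EIO);
  std::cout << "ok:  rc=" << ok.rc << " len=" << ok.length << "\n";
  std::cout << "bad: rc=" << bad.rc << " len=" << bad.length << "\n";
  return 0;
}
```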
Fixes: https://tracker.ceph.com/issues/51115
Signed-off-by: yanqiang-ux <yanqiang_ux@163.com>
mgr/dashboard: Include Network address and labels on Host Creation form
Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
Reviewed-by: Pere Diaz Bou <pdiazbou@redhat.com>