Commit Graph

143552 Commits

Author SHA1 Message Date
luo rixin
0f660886dd CMakeLists: Modify CEPH_TEST_TIMEOUT from 3600s to 7200s
Some older Arm servers run quite slowly, so make check jobs such as
`check-generated.sh` are killed when the job times out.
Increase CEPH_TEST_TIMEOUT.

Signed-off-by: luo rixin <luorixin@huawei.com>
2024-02-20 10:48:56 +08:00
Patrick Donnelly
182f4c0f54
qa: test fuse/kclient for mds upgrade seq
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2024-02-19 21:20:08 -05:00
Patrick Donnelly
4bcaaa45eb
qa: ignore OSD_DOWN during cephadm upgrades
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2024-02-19 21:20:08 -05:00
Patrick Donnelly
75d76f97b0
qa: ignore warning "Replacing daemon"
This is expected for cephadm deployments where join_fs is configured, causing
affinity replacements.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2024-02-19 21:20:05 -05:00
Patrick Donnelly
560300f1c5
qa: ignore MDS_INSUFFICIENT_STANDBY
This is expected when bringing a volume and its mds up initially.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2024-02-19 21:20:04 -05:00
Patrick Donnelly
0e5e847f08
qa: remove ignorelist error parenthesis
Some messages are duplicated to the cluster log, looking like:

    2024-02-15T22:54:31.244 INFO:teuthology.orchestra.run.smithi033.stdout:2024-02-15T22:50:00.000263+0000 mon.smithi033 (mon.0) 558 : cluster 4 [ERR] MDS_ALL_DOWN: 1 filesystem is offline
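
A minimal sketch of why parentheses in an ignorelist entry matter for a line like the one above. The two patterns are hypothetical examples, not entries copied from the qa suite, and `is_ignored` is an illustrative helper rather than teuthology's actual matching code. The point is only that the health code appears in this line as `MDS_ALL_DOWN:` with no surrounding parentheses.

```python
import re

# Log line copied from the commit message above.
line = (
    "2024-02-15T22:54:31.244 INFO:teuthology.orchestra.run.smithi033.stdout:"
    "2024-02-15T22:50:00.000263+0000 mon.smithi033 (mon.0) 558 : "
    "cluster 4 [ERR] MDS_ALL_DOWN: 1 filesystem is offline"
)

# Hypothetical ignorelist entries (assumptions for illustration only).
with_parens = r"\(MDS_ALL_DOWN\)"
without_parens = r"MDS_ALL_DOWN"


def is_ignored(log_line, patterns):
    """Return True if any ignorelist regex matches the log line."""
    return any(re.search(p, log_line) for p in patterns)


# The parenthesized pattern misses this form of the message;
# the bare health code matches it.
matched_with_parens = is_ignored(line, [with_parens])
matched_without_parens = is_ignored(line, [without_parens])
```

Under these assumptions, only the entry without parentheses catches the duplicated message.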

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2024-02-19 21:20:04 -05:00
Patrick Donnelly
427ad7c0f9
mds: update comment on kclient decoding of MDSMap
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2024-02-19 21:20:04 -05:00
Patrick Donnelly
52c09aa1e5
qa: do upgrades from quincy and older reef minor releases
Fixes: https://tracker.ceph.com/issues/64441
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2024-02-19 21:20:04 -05:00
Patrick Donnelly
78abfeaff2
mds: reverse MDSMap encoding of max_xattr_size/bal_rank_mask
Commit e134c890 adds the bal_rank_mask with encoded (ev) version 17.  This was
merged into main Oct 2022 and made it into the reef release normally.

Commit 7b8def5c adds the max_xattr_size also with encoded (ev) version 17 but
places it before bal_rank_mask. This is problematic as there were no plans to
backport e134c890 to quincy or pacific so piggybacking on the ev 17 bump would
not work and otherwise would require the backports to be done as a set to
ensure consistency (including with the kernel client).

However, the real issue is that 7b8def5c was not merged until after reef was
already cut. This required 7b8def5c to be backported separately in [1] which
was not merged until after v18.2.1 (current reef HEAD as of this commit).
Ultimately, this means that there are reef versions (v18.2.[01]) in the wild
which expect bal_rank_mask to be encoded at ev17 and not (max_xattr_size,
bal_rank_mask). Adding to the complications, the kernel client has already
merged code [2] expecting max_xattr_size for ev17.

It was decided in a github discussion [3] to move bal_rank_mask to ev18 to
avoid updating the kernel client which was done in the main branch via 36ee8e7e
and update the reef max_xattr_size backport with the same change (d8cebd67).

Unfortunately, this breaks upgrades from v18.2.[01] to newer reef versions or to
main.  The reason is that monitors will encode v17 with bal_rank_mask
(max_xattr_size is not merged yet) and send that to upgraded mgrs (which are
upgraded first). The mgr will attempt to decode bal_rank_mask as a uint64_t
(max_xattr_size) but fail because an empty (by default) bal_rank_mask is simply
encoded as a signed 32-bit integer. Consequently, the mgr will fail decoding
with:

    failed to decode message of type 45 v1: End of buffer [buffer:2]

Of course the problem does not stop there: even if the mgr were able to handle
this, the monitors/mds/clients would fail in similar fashion.

So the only choice left is to fix max_xattr_size to be encoded at ev18.
Fortunately, v18.2.2 has not been released nor has any max_xattr_size backport
to quincy/pacific been merged. The main downside will be that kernels will
wrongly decode ev17 (which is already true for ceph clusters running
v18.2.[01]). A follow-up kernel fix will be required.
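
The decode failure described above can be reproduced with a deliberately simplified model. This sketch covers only the ev17 tail of the map, assumes little-endian encoding with a 4-byte length prefix for strings, and uses hypothetical helper names (`Reader`, `encode_old_ev17`, `decode_new_ev17`); it is not Ceph's actual MDSMap code.

```python
import struct


class EndOfBuffer(Exception):
    """Raised when a decoder reads past the end of the payload."""


class Reader:
    def __init__(self, data):
        self.data = data
        self.pos = 0

    def read(self, fmt):
        # Read one fixed-size value, failing if the buffer is too short.
        size = struct.calcsize(fmt)
        if self.pos + size > len(self.data):
            raise EndOfBuffer("End of buffer")
        (value,) = struct.unpack_from(fmt, self.data, self.pos)
        self.pos += size
        return value

    def read_string(self):
        # Length-prefixed string: 32-bit length, then the bytes.
        length = self.read("<I")
        if self.pos + length > len(self.data):
            raise EndOfBuffer("End of buffer")
        raw = self.data[self.pos:self.pos + length]
        self.pos += length
        return raw.decode()


def encode_old_ev17(bal_rank_mask=""):
    # Layout assumed for v18.2.[01]: ev17 carries only bal_rank_mask.
    raw = bal_rank_mask.encode()
    return struct.pack("<I", len(raw)) + raw


def decode_old_ev17(payload):
    # Decoder that agrees with the old layout.
    return Reader(payload).read_string()


def decode_new_ev17(payload):
    # Upgraded decoder: expects max_xattr_size (uint64) at ev17.
    return Reader(payload).read("<Q")


payload = encode_old_ev17()  # empty mask: 4 bytes of length prefix
try:
    decode_new_ev17(payload)
    failure = None
except EndOfBuffer as exc:
    failure = str(exc)
```

An empty `bal_rank_mask` occupies 4 bytes in this model, so a decoder expecting an 8-byte `max_xattr_size` at the same position runs off the end of the buffer.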

[1] https://tracker.ceph.com/issues/59405
[2] linux.git d93231a6bc8a452323d5fef16cca7107ce483a27
[3] https://github.com/ceph/ceph/pull/53340#discussion_r1399255031

Fixes: https://tracker.ceph.com/issues/64440
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2024-02-19 21:20:04 -05:00
Patrick Donnelly
bbaadacd48
client: log debug message when requesting unmount
Importantly: do this before any locks are to be acquired.

Fixes: https://tracker.ceph.com/issues/64503
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
2024-02-19 21:16:19 -05:00
Casey Bodley
c66cb5ee3b
Merge pull request #55522 from cbodley/wip-63373
rgw/datalog: RGWDataChangesLog::add_entry() uses null_yield

Reviewed-by: Adam Emerson <aemerson@redhat.com>
2024-02-19 17:58:31 +00:00
Casey Bodley
d9f41f9c8d
Merge pull request #54554 from clwluvw/s3select-usage
rgw: add s3select usage to log usage

Reviewed-by: Gal Salomon <gsalomon@redhat.com>
2024-02-19 17:56:40 +00:00
Casey Bodley
fff82ac806
Merge pull request #55236 from tobias-urdin/keystone-invalidate-admin-token
rgw: invalidate and retry keystone admin token

Reviewed-by: Jiffin Tony Thottan <jthottan@redhat.com>
Reviewed-by: Casey Bodley <cbodley@redhat.com>
2024-02-19 17:54:42 +00:00
Casey Bodley
888071f413
Merge pull request #55286 from BBoozmen/oozmen_decorate_lc_events
rgw/lc: decorating log events with more details

Reviewed-by: Matt Benjamin <mbenjamin@redhat.com>
2024-02-19 17:53:28 +00:00
Casey Bodley
8196d69659
Merge pull request #55451 from pritha-srivastava/wip-rgw-admin-ops-user-info
rgw: code to display the complete user id that includes tenant, names…

Reviewed-by: Nizamudeen A <nia@redhat.com>
Reviewed-by: Casey Bodley <cbodley@redhat.com>
2024-02-19 17:52:02 +00:00
Casey Bodley
7682fae85e
Merge pull request #55508 from cbodley/wip-59186
rgw/user: add 'active' flag to RGWAccessKey

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
Reviewed-by: Daniel Gryniewicz <dang@redhat.com>
2024-02-19 17:50:59 +00:00
Casey Bodley
3585245b8c
Merge pull request #55509 from ivancich/wip-display-manifest
rgw: add new `object manifest` sub-command

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
Reviewed-by: Casey Bodley <cbodley@redhat.com>
2024-02-19 17:49:13 +00:00
zdover23
91acea6410
Merge pull request #55637 from zdover23/wip-doc-2024-02-19-cephfs-add-remove-mds-warning-notes
doc/cephfs: edit add-remove-mds

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
2024-02-20 02:22:17 +10:00
Zac Dover
39ad6264aa doc/cephfs: edit add-remove-mds
Disambiguate a note in doc/cephfs/add-remove-mds.rst to help readers
distinguish between cases in which they might want to use an automated
tool such as cephadm to deploy MDSes and cases in which they might want
to manually deploy MDSes.

See: https://github.com/ceph/ceph/pull/45639

Tracker: https://tracker.ceph.com/issues/54551

Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
2024-02-20 01:56:37 +10:00
Ivo Almeida
8aaa0bca47 mgr/dashboard: fix subvolume group edit
Fixes: https://tracker.ceph.com/issues/64487
Signed-off-by: Ivo Almeida <ialmeida@redhat.com>
2024-02-19 12:57:04 +00:00
Matan Breizman
68400ff545 crimson/osd/main: enable multicore client msgr
Taken from: f78e99c059

Co-authored-by: Yingxin Cheng <yingxin.cheng@intel.com>
Co-authored-by: Chunmei Liu <chunmei.liu@intel.com>
Co-authored-by: Xinyu Huang <xinyu.huang@intel.com>
Signed-off-by: Matan Breizman <mbreizma@redhat.com>
2024-02-19 12:22:16 +00:00
luo rixin
450f269235 debian/control: add new protobuf dependency for crimson
PR https://github.com/ceph/ceph/pull/55444 updates the seastar version,
and the new seastar adds protobuf as a dependency.

Fixes: https://tracker.ceph.com/issues/64420

Signed-off-by: luo rixin <luorixin@huawei.com>
2024-02-19 17:15:58 +08:00
Venky Shankar
98242a71bf Merge PR #55471 into main
* refs/pull/55471/head:
	qa: verify labelled replication perf metrics
	qa: test per-client labelled perf counters
	mds: export per-client metrics as labelled perf counters
	cephfs_mirror: add labeled replication performance metrics
	cephfs-mirror: typo ending bracket

Reviewed-by: Robin H. Johnson <robbat2@orbis-terrarum.net>
2024-02-19 14:17:48 +05:30
zdover23
dfd63293ce
Merge pull request #55633 from zdover23/wip-doc-2024-02-18-man-ceph-objectstore-tool
doc/man: edit "manipulating the omap key"

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
2024-02-19 18:00:06 +10:00
Casey Bodley
0eebbc3f02 rgw: RGWSI_SysObj_Cache::remove() invalidates after successful delete
invalidating the cache before the librados delete means that a racing call
to `RGWSI_SysObj_Cache::read()` may succeed and repopulate the cache. in
that case, subsequent reads will continue to return cached data even after
the librados delete succeeds
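
A small sketch of the ordering problem, using a sequential interleaving in place of real threads. `Store` and `Cache` are hypothetical stand-ins for librados and `RGWSI_SysObj_Cache`, and the `racing_read` callback marks the point where another caller's read is assumed to run.

```python
class Store:
    """Stand-in for the backing object store: name -> data."""

    def __init__(self):
        self.objects = {}


class Cache:
    """Read-through cache in front of a Store."""

    def __init__(self, store):
        self.store = store
        self.entries = {}

    def read(self, key):
        if key in self.entries:
            return self.entries[key]
        value = self.store.objects.get(key)
        if value is not None:
            self.entries[key] = value  # populate on a successful read
        return value

    def invalidate(self, key):
        self.entries.pop(key, None)


def remove_invalidate_first(cache, key, racing_read):
    cache.invalidate(key)
    racing_read()                  # racing reader repopulates the cache
    del cache.store.objects[key]   # delete succeeds, cache is now stale


def remove_invalidate_last(cache, key, racing_read):
    racing_read()
    del cache.store.objects[key]
    cache.invalidate(key)          # invalidate only after the delete


def run(remove):
    store = Store()
    store.objects["obj"] = b"data"
    cache = Cache(store)
    cache.read("obj")              # warm the cache
    remove(cache, "obj", lambda: cache.read("obj"))
    return cache.read("obj")       # what a later reader sees


stale = run(remove_invalidate_first)
fresh = run(remove_invalidate_last)
```

In this model the invalidate-first ordering leaves `b"data"` readable after the delete, while invalidating after the delete returns `None`.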

Fixes: https://tracker.ceph.com/issues/64480

Signed-off-by: Casey Bodley <cbodley@redhat.com>
2024-02-18 10:30:00 -05:00
Zac Dover
44ec668d43 doc/man: edit "manipulating the omap key"
Edit the section "Manipulating the Object Map Key" in
doc/man/8/ceph-objectstore-tool.rst.

Signed-off-by: Zac Dover <zac.dover@proton.me>
2024-02-18 16:01:46 +10:00
Kefu Chai
34933f5120
Merge pull request #54586 from xxhdx1985126/wip-pglog-concatenate
tools/ceph_objectstore_tool: add op "expand-log"

Reviewed-by: Samuel Just <sjust@redhat.com>
2024-02-18 10:31:55 +08:00
zdover23
ad0692a163
Merge pull request #55626 from zdover23/wip-doc-2024-02-17-rados-operations-placement-groups-basic-definition
doc/rados: add PG definition

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
2024-02-18 12:28:08 +10:00
Zac Dover
39c809b33f doc/rados: add PG definition
Add a definition of Placement Groups to
doc/rados/operations/placement-groups.rst.

Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
2024-02-18 12:10:57 +10:00
Oguzhan Ozmen
b4b1868a00 rgw/http/client-side: disable curl path normalization
test_multi.py:test_object_sync is updated to reproduce the issue.
Without the fix, objects "." and ".." are not replicated and the test
fails (times out).
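
A rough illustration of why object keys "." and ".." are affected by client-side path normalization. `posixpath.normpath` is used here only as a stand-in for dot-segment removal and is not curl's algorithm; `request_path` and the bucket name are hypothetical. libcurl offers `CURLOPT_PATH_AS_IS` to turn its normalization off, though the commit message does not say whether that option is what the fix uses.

```python
import posixpath


def request_path(bucket, key, normalize):
    """Build the request path for an object, optionally normalizing it."""
    path = f"/{bucket}/{key}"
    return posixpath.normpath(path) if normalize else path


keys = (".", "..", "obj")

# With normalization, "." and ".." no longer name an object in the bucket.
normalized = {k: request_path("bucket", k, True) for k in keys}

# With the path sent as-is, every key keeps its own distinct path.
as_is = {k: request_path("bucket", k, False) for k in keys}
```

With normalization on, the request for "." collapses to the bucket path and the request for ".." escapes the bucket entirely, which is consistent with those two objects never being replicated.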

Fixes: https://tracker.ceph.com/issues/64366
Signed-off-by: Oguzhan Ozmen <oozmen@bloomberg.net>
2024-02-17 19:57:57 -05:00
Lucian Petrut
e1452649b8 common/tracer: fix decoding when jaeger tracing is disabled
We aren't currently using jaeger tracing on Windows. The issue is
that Windows hosts (or any other host that doesn't use jaeger)
are experiencing message decoding failures after a recent change [1].

This change updates the tracer encoding so that messages from
non-jaeger hosts may be decoded by services that use jaeger.

[1] https://github.com/ceph/ceph/pull/47457

Signed-off-by: Lucian Petrut <lpetrut@cloudbasesolutions.com>

This commit reintroduces 3701ffa673, which was
reverted due to an implicit dependency on another revert. Please
see https://github.com/ceph/ceph/pull/52114#issuecomment-1950288188.

Conflicts:
	src/common/tracer.h
	  formatting conflict with 7179ac0037
2024-02-17 19:31:06 +00:00
Kefu Chai
6850bc28ae
Merge pull request #54963 from DimStar77/cmake328
cmake: Ensure git exists before executing it

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
2024-02-17 23:18:23 +08:00
Venky Shankar
f29dd57cd0 qa: verify labelled replication perf metrics
Signed-off-by: Venky Shankar <vshankar@redhat.com>
2024-02-16 20:06:34 -05:00
Venky Shankar
36e24585d5 qa: test per-client labelled perf counters
Signed-off-by: Venky Shankar <vshankar@redhat.com>
2024-02-16 20:06:34 -05:00
Venky Shankar
164c547edc mds: export per-client metrics as labelled perf counters
Signed-off-by: Venky Shankar <vshankar@redhat.com>
2024-02-16 20:06:34 -05:00
Venky Shankar
658ee6c401 cephfs_mirror: add labeled replication performance metrics
Fixes: http://tracker.ceph.com/issues/63945
Signed-off-by: Jos Collin <jcollin@redhat.com>
Signed-off-by: Venky Shankar <vshankar@redhat.com>
2024-02-16 20:06:34 -05:00
Venky Shankar
4c14f143b5 cephfs-mirror: typo ending bracket
Signed-off-by: Venky Shankar <vshankar@redhat.com>
2024-02-16 20:06:34 -05:00
Samuel Just
e84518e638 unittest-seastar-socket: tolerate connection_reset in test_unexpected_down
Fixes: https://tracker.ceph.com/issues/64457
Signed-off-by: Samuel Just <sjust@redhat.com>
2024-02-16 21:40:46 +00:00
Ronen Friedman
0ed764e649
Merge pull request #55605 from ronen-fr/wip-rf-warns0224
osd: clean compiler warnings

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
2024-02-16 20:57:45 +02:00
Radoslaw Zarzynski
6e39aa0eff
Merge pull request #54922 from pponnuvel/disable_network_stats
mon, mgr: do not output network ping stats

Reviewed-by: Laura Flores <lflores@redhat.com>
Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
2024-02-16 17:54:02 +01:00
Adam Kupczyk
6de5b2b842
Merge pull request #53565 from pereman2/shared-blob-to-blob
os/bluestore: optional SharedBlob on Blob structure
2024-02-16 17:26:54 +01:00
Laura Flores
af013c6d75
Merge pull request #55570 from rzarzynski/wip-bug-64192
osd: always send returnvec-on-errors for client's retry
2024-02-16 10:23:21 -06:00
Yuri Weinstein
b9b81d2cf8
Merge pull request #55602 from ceph/wip-yuriw-add-squid-main
qa/tests: added squid option

Reviewed-by: Laura Flores <lflores@redhat.com>
2024-02-16 07:11:52 -08:00
Casey Bodley
737a3f98e4
Merge pull request #55582 from cbodley/wip-63642
rgw/putobj: RadosWriter uses part head object for multipart parts

Reviewed-by: Mark Kogan <mkogan@ibm.com>
Reviewed-by: J. Eric Ivancich <ivancich@redhat.com>
2024-02-16 14:59:31 +00:00
Ronen Friedman
7b3e330787
Merge pull request #55453 from ronen-fr/wip-rf-0741-logs
osd/scrub: improve scheduling decision logs

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
2024-02-16 16:24:13 +02:00
Casey Bodley
bf3a294f6f
Merge pull request #54856 from linuxbox2/wip-accept-new-awssigv4
rgw: cumulatively fix 6 AWS SigV4 request failure cases

Reviewed-by: Casey Bodley <cbodley@redhat.com>
2024-02-16 13:52:34 +00:00
Ponnuvel Palaniyappan
bede018636 mon, mgr: do not output network ping stats
When doing a PG dump using 'ceph pg dump --format json-pretty',
the output is so large that the command hangs, and the
ceph-mgr also hangs and eventually fails over.

The exact size depends on the number of OSDs in the cluster
and the number of peers for each OSD.

Tests identified the network ping times as the largest
component in terms of size, so they are now removed
from the output to limit the overall size.

Fixes: https://tracker.ceph.com/issues/57460

Signed-off-by: Ponnuvel Palaniyappan <pponnuvel@gmail.com>
2024-02-16 11:03:57 +00:00
Ilya Dryomov
fda8b5acbd
Merge pull request #55579 from idryomov/wip-64423
librbd: fix split() for SparseExtent and SparseBufferlistExtent

Reviewed-by: Mykola Golub <mgolub@suse.com>
2024-02-16 10:24:02 +01:00
Ilya Dryomov
aface5ab14
Merge pull request #55530 from trociny/wip-64376
tools/rbd: make 'children' command support --image-id

Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
2024-02-16 10:22:50 +01:00
Ronen Friedman
a1cf175ee1 test/osd: fix test_scrub_sched following scrubber changes
Replacing PgScrubber::determine_scrub_time() with a local copy,
as a stop-gap measure to keep the test running.
The scrub scheduling refactoring will remove the need for
this function, and the test will be updated accordingly.

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
2024-02-16 00:34:25 -06:00