Commit Graph

141726 Commits

Author SHA1 Message Date
Casey Bodley
07614d51c2
Merge pull request #54599 from cbodley/wip-crush-test-warnings
crush: remove unused variables

Reviewed-by: Ronen Friedman <rfriedma@redhat.com>
2023-11-22 13:20:25 +00:00
Ilya Dryomov
0e0b229496
Merge pull request #53291 from petrutlucian94/unicode
common: Windows Unicode support

Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
2023-11-22 12:03:59 +01:00
Lucian Petrut
2cef5b3ce4 test: fix Windows ::_creat
The Windows Universal C Runtime (ucrt) "_creat" function is no
longer POSIX compatible and requires Windows specific mode flags.

We got admin socket test failures after switching from msvcrt to
ucrt.

We'll address the issue with some platform checks.

Signed-off-by: Lucian Petrut <lpetrut@cloudbasesolutions.com>
2023-11-22 09:15:00 +00:00
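
The platform check mentioned in 2cef5b3ce4 could take roughly the following shape. This is a minimal sketch, not the code from the commit: the helper names `create_owner_rw` and `close_fd` are hypothetical, and the only assumption about the UCRT is that `_creat` takes the `_S_IREAD` / `_S_IWRITE` permission flags rather than POSIX mode bits.

```cpp
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>

#ifdef _WIN32
#include <io.h>
#else
#include <unistd.h>
#endif

// Hypothetical helper (not taken from the Ceph tree): create or truncate a
// file that its owner can read and write, and return the file descriptor.
inline int create_owner_rw(const char* path) {
#ifdef _WIN32
  // The UCRT _creat() expects the Windows-specific permission flags.
  return ::_creat(path, _S_IREAD | _S_IWRITE);
#else
  // POSIX creat() takes ordinary permission bits.
  return ::creat(path, S_IRUSR | S_IWUSR);
#endif
}

// Hypothetical helper: close a descriptor returned by create_owner_rw().
inline int close_fd(int fd) {
#ifdef _WIN32
  return ::_close(fd);
#else
  return ::close(fd);
#endif
}
```

Keeping the `#ifdef` inside one small helper means the test bodies themselves stay free of platform conditionals.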
Lucian Petrut
750a9483d1 common: Windows Unicode CLI support
Windows CLI arguments use either ANSI (main()) or UTF-16 (wmain()).
Meanwhile, Ceph libraries expect UTF-8 and raise exceptions when
trying to use Unicode CLI arguments or log Unicode output:

  rbd.exe create test_unicode_șțăâ --size=32M
  terminate called after throwing an instance of 'std::runtime_error'
    what():  invalid utf8

We'll use a Windows application manifest, setting the "activeCodePage"
property [1][2]. This enables the Windows UCRT UTF-8 mode so that
functions that receive char* arguments will expect UTF-8 instead of ANSI,
including main(). One exception is CreateProcess, which will need the
UTF-16 form (CreateProcessW).

Despite the locale being set to utf-8, we'll have to explicitly set
the console output to utf-8 using SetConsoleOutputCP(CP_UTF8).

In order to use the UTF-8 locale, we'll have to switch the mingw-llvm
runtime from msvcrt to ucrt.

This also fixes ceph-dokan crashes that currently occur when non-ANSI
paths are logged.

[1] https://learn.microsoft.com/en-us/windows/win32/sbscs/application-manifests#activecodepage
[2] https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page

Signed-off-by: Lucian Petrut <lpetrut@cloudbasesolutions.com>
2023-11-22 09:14:49 +00:00
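
The explicit console setup described in 750a9483d1 can be sketched as below. The wrapper name `enable_utf8_console_output` is hypothetical and the commit may structure this differently; `SetConsoleOutputCP` and `CP_UTF8` are the Win32 names the commit message itself mentions.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: switch the console output code page to UTF-8.
// Returns true on success. On non-Windows platforms there is nothing to do,
// so the function simply reports success.
inline bool enable_utf8_console_output() {
#ifdef _WIN32
  // SetConsoleOutputCP returns nonzero on success.
  return ::SetConsoleOutputCP(CP_UTF8) != 0;
#else
  return true;
#endif
}
```

A program would call this once, early in `main()`, before writing any Unicode output.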
Brad Hubbard
e877333f07
Merge pull request #54566 from badone/wip-python-version-fedora-39
do_cmake.sh: set python version for Fedora 39

Reviewed-by: Kefu Chai <tchaikov@gmail.com>
2023-11-22 09:44:22 +10:00
J. Eric Ivancich
eb4b542976
Merge pull request #54447 from ceph/wip-fix-flight-load-bucket
rgw: fix flight load_bucket call

Reviewed-by: Casey Bodley <cbodley@redhat.com>
2023-11-21 16:36:53 -05:00
J. Eric Ivancich
0af251b856
Merge pull request #47208 from 5cs/fix-lambda-capture-by-ref
rgwlc: lock_lambda overwrites ret val

Reviewed-by: Matt Benjamin <mbenjamin@redhat.com>
2023-11-21 16:36:27 -05:00
zdover23
0bd8b17bfa
Merge pull request #54598 from zdover23/wip-doc-2023-11-22-rados-troubleshooting-mon-recovering-broken-monmap
doc/rados: edit "recovering broken monmap"

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
2023-11-22 05:56:33 +10:00
Zac Dover
6ccb9f3ca1 doc/rados: edit "recovering broken monmap"
Edit the section "Recovering a monitor's broken monmap" in
doc/rados/troubleshooting/troubleshooting-mon.rst.

Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
2023-11-22 05:33:37 +10:00
Casey Bodley
c94166de2a rgw: fix RGWPeriod encoding after removing realm_name
Signed-off-by: Casey Bodley <cbodley@redhat.com>
2023-11-21 14:25:14 -05:00
Casey Bodley
9cab99bae8 crush: remove unused variables
[161/715] Building CXX object src/crush/CMakeFiles/crush_objs.dir/CrushTester.cc.o
ceph/src/crush/CrushTester.cc:478:7: warning: variable 'num_devices_active' set but not used [-Wunused-but-set-variable]
  int num_devices_active = 0;
      ^
1 warning generated.
[165/715] Building CXX object src/crush/CMakeFiles/crush_objs.dir/CrushWrapper.cc.o
ceph/src/crush/CrushWrapper.cc:1579:9: warning: variable 'local_changed' set but not used [-Wunused-but-set-variable]
    int local_changed = 0;
        ^

Signed-off-by: Casey Bodley <cbodley@redhat.com>
2023-11-21 13:44:41 -05:00
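
For readers unfamiliar with the warning quoted in 9cab99bae8, the pattern looks roughly like this. The functions below are hypothetical illustrations, not the CrushTester or CrushWrapper code; whether a given compiler version reports the first one depends on its warning settings.

```cpp
#include <vector>

// Before: 'num_active' is written but never read, which is the pattern that
// -Wunused-but-set-variable is meant to catch.
inline int count_devices_before(const std::vector<bool>& in_use) {
  int total = 0;
  int num_active = 0;  // set but not used
  for (bool used : in_use) {
    ++total;
    if (used) {
      ++num_active;
    }
  }
  return total;
}

// After: the dead variable and the code that updated it are removed.
// The observable behavior is unchanged.
inline int count_devices_after(const std::vector<bool>& in_use) {
  int total = 0;
  for (bool used : in_use) {
    (void)used;  // each entry is counted regardless of its state
    ++total;
  }
  return total;
}
```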
Casey Bodley
338440ae2c rgw: non-multipart uploads serve entire range on partNumber=1
and omit the x-amz-mp-parts-count response header

Signed-off-by: Casey Bodley <cbodley@redhat.com>
2023-11-21 11:09:23 -05:00
Casey Bodley
c02129eb8c ReleaseNotes: document support for partNumber
Signed-off-by: Casey Bodley <cbodley@redhat.com>
2023-11-21 11:09:23 -05:00
Casey Bodley
6fc57159ef rgw/rados: RadosReadOp::prepare only updates object instance
when called on a versioned object, prepare() may follow olh and look up
a different object instance

but when called on a multipart part, we should not overwrite the
original object name with the part's object name (of the form
mymultipart.2~_XLFNqOW0NuiALg7q4-Hi_7hdtAkZUH.1)

Signed-off-by: Casey Bodley <cbodley@redhat.com>
2023-11-21 11:09:23 -05:00
Casey Bodley
01d8b4c38b rgw/rados: part support for RGWRados::Object::Read
Signed-off-by: Casey Bodley <cbodley@redhat.com>
2023-11-21 11:09:23 -05:00
Casey Bodley
8ae61ca506 rgw/rados: add obj_find_part() to RGWObjManifest
Signed-off-by: Casey Bodley <cbodley@redhat.com>
2023-11-21 11:09:23 -05:00
Casey Bodley
a308e3a1d5 rgw/rados: add get_obj_state() overload for RGWObjStateManifest
add an overload to expose the manifest storage to callers of
get_obj_state(). the existing RGWObjState+RGWObjManifest overload
just calls the RGWObjStateManifest one

Signed-off-by: Casey Bodley <cbodley@redhat.com>
2023-11-21 11:09:21 -05:00
Casey Bodley
60dadf3c8d rgw/rados: remove get_obj_state() overload for follow_olh=true
and just add the follow_olh=true argument to callers

Signed-off-by: Casey Bodley <cbodley@redhat.com>
2023-11-21 10:24:31 -05:00
Casey Bodley
3354b67fbd rgw/s3: add part param and response to GetObj
Signed-off-by: Casey Bodley <cbodley@redhat.com>
2023-11-21 10:24:31 -05:00
Casey Bodley
b0b050d31c
Merge pull request #52813 from cbodley/wip-59424
qa/rgw: run s3tests against keystone ec2

Reviewed-by: Daniel Gryniewicz <dang@redhat.com>
2023-11-21 13:33:58 +00:00
Guillaume Abrioux
bc868d9a4f
Merge pull request #53798 from asm0deuz/track_58812
ceph-volume: fixes fallback to stat in is_device and is_partition
2023-11-21 12:25:25 +01:00
Samuel Just
1ea87baab6
Merge pull request #54513 from Matan-B/wip-matanb-crimson-snaptrimevent-lifetime
crimson/osd/osd_operations/snaptrim_event: lifetime fixes

Reviewed-by: Yingxin Cheng <yingxin.cheng@intel.com>
2023-11-20 19:16:37 -08:00
Brad Hubbard
838489f6b1 do_cmake.sh: set python version for Fedora 39
If do_cmake.sh is being executed on Fedora 39, set the Python version to 3.12.
Remove versions for anything earlier than Fedora 37.

Signed-off-by: Brad Hubbard <bhubbard@redhat.com>
2023-11-21 10:44:33 +10:00
John Mulligan
7d48e8aa25 cephadm: add a custom template not found exception with diagnostic info
Add a new exception based on jinja2's template not found exception for
the case where the template was not found in the zip(app). We've been
having sporadic failures with this in CI & testing and hopefully
the additional information will help pinpoint the cause.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
2023-11-20 16:50:25 -05:00
Ronen Friedman
fd4e52b042 tools: modify ceph_dedup_tool to maintain Clang 15 compatibility
Adding 'typename' in two instances, where version 15 of Clang
still requires it. P0634R3, which made those 'typename' redundant,
is only supported starting Clang 16.

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
2023-11-20 12:44:17 -06:00
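
The kind of declaration affected by fd4e52b042 can be illustrated as follows. This is a sketch with a hypothetical `first_element` template, not the code from ceph_dedup_tool; it shows two contexts in which P0634R3 makes `typename` optional, with the keyword written out so that compilers without P0634R3 support still accept the code.

```cpp
#include <vector>

// Hypothetical wrapper used only for illustration.
template <typename Container>
struct first_element {
  // Dependent type in an alias declaration. P0634R3 makes 'typename'
  // optional here, but writing it keeps older compilers happy.
  using value_type = typename Container::value_type;

  // Dependent return type of a member function: the same rule applies.
  static typename Container::value_type get(const Container& c) {
    return c.front();
  }
};
```

Writing the keyword explicitly is harmless on newer compilers, so it is a low-cost way to keep a wider range of toolchains working.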
zdover23
40f55c30e5
Merge pull request #54574 from zdover23/wip-doc-2023-11-21-rados-troubleshooting-mon-understanding-mon-status
doc/rados: edit "understanding mon_status"

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
2023-11-21 04:42:49 +10:00
zdover23
d1c85afcb9
Merge pull request #54565 from zdover23/wip-doc-2023-11-20-radso-troubleshooting-mon-admin-socket
doc/rados: edit "Using the Monitor's Admin Socket"

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
2023-11-21 04:36:21 +10:00
Zac Dover
08c16aa113 doc/rados: edit "understanding mon_status"
Edit the section "Understanding mon_status" in
doc/rados/troubleshooting/troubleshooting-mon.rst.

Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
2023-11-21 03:18:17 +10:00
Ilya Dryomov
7d4651e9d6
Merge pull request #54571 from lxbsz/wip-63586-debuglog
osd: log the number of extents for sparse read

Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
2023-11-20 16:36:59 +01:00
zdover23
df051c8917
Merge pull request #54561 from zdover23/wip-doc-2023-11-20-documenting-ceph-url
doc/start: update release names

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
2023-11-21 01:24:27 +10:00
Xiubo Li
1cf5ecb93f osd: add more debug logs for sparse read
These logs will help determine exactly what happened when a client
gets a very large number of extents.

URL: https://tracker.ceph.com/issues/63586
Signed-off-by: Xiubo Li <xiubli@redhat.com>
2023-11-20 22:04:30 +08:00
Tongliang Deng
c094f1a909 rgwlc: lock_lambda overwrites ret val
`lock_lambda` captures `ret` by reference, so it overwrites the
return value of `bucket_lc_process` when `wait_backoff` is called.

Fixes: c069eb7ff0.

Signed-off-by: Tongliang Deng <dengtongliang@gmail.com>
2023-11-20 08:54:02 -05:00
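
The bug class described in c094f1a909 can be reproduced in isolation. The sketch below is not the rgw lifecycle code: `wait_backoff_sketch`, `process_buggy` and `process_fixed` are hypothetical names, and the fix shown (a status variable local to the lambda) is only one possible remedy, not necessarily the one the commit uses.

```cpp
#include <functional>

// Hypothetical stand-in for a retry helper: invokes the lock attempt once.
inline void wait_backoff_sketch(const std::function<bool()>& try_lock) {
  try_lock();
}

// Buggy shape: the lambda writes the lock status into the outer 'ret',
// so the processing result stored there is lost.
inline int process_buggy(int process_result, int lock_result) {
  int ret = process_result;  // the value we intend to return
  auto lock_lambda = [&ret, lock_result]() {
    ret = lock_result;       // clobbers the outer value
    return ret == 0;
  };
  wait_backoff_sketch(lock_lambda);
  return ret;
}

// One possible fix: keep the lock status in a variable local to the lambda.
inline int process_fixed(int process_result, int lock_result) {
  int ret = process_result;
  auto lock_lambda = [lock_result]() {
    int lock_ret = lock_result;
    return lock_ret == 0;
  };
  wait_backoff_sketch(lock_lambda);
  return ret;
}
```

With a processing result of -5 and a successful lock (0), the buggy version returns 0 and hides the failure, while the fixed version returns -5.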
barakda
06b3d30814
Merge pull request #54564 from barakda/nvmeof_bump_latest_version
nvmeof bump latest version
2023-11-20 15:19:08 +02:00
Teoman ONAY
52ca4a61d5 ceph-volume: fixes fallback to stat in is_device and is_partition
os.stat (or lstat) cannot distinguish a block device from
a partition.

Fixes: https://tracker.ceph.com/issues/58812

Signed-off-by: Teoman ONAY <tonay@ibm.com>
2023-11-20 09:50:05 +01:00
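
The limitation named in 52ca4a61d5 can be seen from the POSIX side as well. This sketch is in C++ rather than the Python used by ceph-volume, and both helper names are hypothetical. The second helper relies on an assumption about Linux sysfs: that a `partition` attribute exists under `/sys/class/block/<name>/` only for partitions. It is not claimed to be the check ceph-volume performs.

```cpp
#include <string>
#include <sys/stat.h>

// True if 'path' refers to a block special file. This holds for whole disks
// and for partitions alike, so stat() alone cannot tell the two apart.
inline bool is_block_special(const std::string& path) {
  struct stat st{};
  return ::stat(path.c_str(), &st) == 0 && S_ISBLK(st.st_mode);
}

// Assumption (Linux only): sysfs exposes a 'partition' attribute for
// partitions and not for whole disks. 'dev_name' is a kernel name such as
// "sda1", without the "/dev/" prefix.
inline bool looks_like_partition(const std::string& dev_name) {
  struct stat st{};
  const std::string attr = "/sys/class/block/" + dev_name + "/partition";
  return ::stat(attr.c_str(), &st) == 0;
}
```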
Aashish Sharma
39fea8f71c
Merge pull request #51340 from Javlopez/feature/12087-upgrade-and-generate-grafana-dashboards
monitoring: add new dashboards

Fixes: https://tracker.ceph.com/issues/63592

Reviewed-by: Aashish Sharma <aasharma@redhat.com>
2023-11-20 11:33:07 +05:30
Yingxin Cheng
93ef23d90a crimson/osd: allow to send messages concurrently
The ordering is now guaranteed upon calling send(), so there is no
reason to couple the crosscore send future with the operation phases --
exclusive phases will limit the send concurrency, potentially causing
OSD starvation.

Decouple the crosscore send futures in the IO paths, mostly in
ClientRequest and OSDSingletonState::send_to_osd().

Issue-identified-by: Chunmei Liu <chunmei.liu@intel.com>
see PR53934.

Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
2023-11-20 10:55:36 +08:00
Yingxin Cheng
6ebf9cd367 crimson/net: preserve the ordering upon the calls to Connection::send()/keepalive()
Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
2023-11-20 10:44:53 +08:00
Yingxin Cheng
77e66ad098 crimson/common/smp_helpers: generalize crosscore_ordering_t
Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
2023-11-20 10:44:53 +08:00
Zac Dover
83ff8f2b67 doc/start: update release names
Update "Quincy" to "Reef" and "Pacific" to "Quincy" in the section
"Viewing Old Ceph Documentation" in /doc/start/documenting-ceph.rst.

Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
2023-11-20 06:44:46 +10:00
Zac Dover
f627445806 doc/rados: edit "Using the Monitor's Admin Socket"
Edit the section "Using the Monitor's Admin Socket" in
doc/rados/troubleshooting/troubleshooting-mon.rst.

Signed-off-by: Zac Dover <zac.dover@proton.me>
2023-11-20 06:32:09 +10:00
barakda
e2e18a5722 nvmeof bump latest version
Signed-off-by: barakda <barak.davidov@gmail.com>
2023-11-19 22:24:09 +02:00
Yuval Lifshitz
329e2a1e04
Merge pull request #54528 from yuvalif/wip-yuval-63532
rgw/notifications: cleanup all coroutines after sending the notification

reviewed-by: cbodley
2023-11-19 21:25:39 +02:00
zdover23
b9b9ec8b7d
Merge pull request #54545 from zdover23/wip-doc-2023-11-17-start-intro-osd-glossary
doc/start: explain "OSD"

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
2023-11-19 23:48:12 +10:00
Matan Breizman
abceb16522 crimson/osd/osd_operations/snaptrim_event: lifetime fixes
```
    // SnapTrimEvent is a background operation,
    // its lifetime is not guaranteed since the caller
    // returned future is being ignored. We should capture
    // a self reference throughout the entire execution
    // progress (not only on finally() continuations).
    // See: PG::on_active_actmap()
```

Sanitized backtrace:
```
DEBUG 2023-11-16 08:42:48,441 [shard 0] osd - snaptrim_event(id=21122, detail=SnapTrimEvent(pgid=3.1 snapid=3cb needs_pause=1)): interrupted crimson::common::actingset_changed (acting set changed

kernel callstack:
    #0 0x55e310e0ace7 in seastar::shared_mutex::unlock() (/usr/bin/ceph-osd+0x1edd0ce7)
    #1 0x55e313325d9c in auto seastar::futurize_invoke<crimson::OrderedConcurrentPhaseT<crimson::osd::SnapTrimEvent::WaitSubop>::ExitBarrier<crimson::OrderedConcurrentPhaseT<crimson::osd::SnapTrimEvent::WaitSubop>::BlockingEvent::Trigger<crimson::osd::SnapTrimEvent> >::exit()::{lambda()#1}&>(crimson::OrderedConcurrentPhaseT<crimson::osd::SnapTrimEvent::WaitSubop>::ExitBarrier<crimson::OrderedConcurrentPhaseT<crimson::osd::SnapTrimEvent::WaitSubop>::BlockingEvent::Trigger<crimson::osd::SnapTrimEvent> >::exit()::{lambda()#1}&) (/usr/bin/ceph-osd+0x212ebd9c)
    #2 0x55e3133260ef in _ZN7seastar20noncopyable_functionIFNS_6futureIvEEvEE17direct_vtable_forIZNS2_4thenIZN7crimson23OrderedConcurrentPhaseTINS7_3osd13SnapTrimEvent9WaitSubopEE11ExitBarrierINSC_13BlockingEvent7TriggerISA_EEE4exitEvEUlvE_S2_EET0_OT_EUlDpOT_E_E4callEPKS4_ (/usr/bin/ceph-osd+0x212ec0ef)
0x61500013365c is located 92 bytes inside of 472-byte region [0x615000133600,0x6150001337d8)
freed by thread T2 here:
    #0 0x7fb345ab73cf in operator delete(void*, unsigned long) (/lib64/libasan.so.6+0xb73cf)
    #1 0x55e313474863 in crimson::osd::SnapTrimEvent::~SnapTrimEvent() (/usr/bin/ceph-osd+0x2143a863)

previously allocated by thread T2 here:
    #0 0x7fb345ab6367 in operator new(unsigned long) (/lib64/libasan.so.6+0xb6367)
    #1 0x55e31183ac18 in auto crimson::OperationRegistryI::create_operation<crimson::osd::SnapTrimEvent, crimson::osd::PG*, SnapMapper&, snapid_t const&, bool const&>(crimson::osd::PG*&&, SnapMapper&, snapid_t const&, bool const&) (/usr/bin/ceph-osd+0x1f800c18)
SUMMARY: AddressSanitizer: heap-use-after-free (/usr/bin/ceph-osd+0x1edd0ce7) in seastar::shared_mutex::unlock()
```

Signed-off-by: Matan Breizman <mbreizma@redhat.com>
2023-11-19 12:43:06 +00:00
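
The lifetime rule quoted in abceb16522 (hold a self reference for the whole execution when the caller drops its handle) can be sketched without Seastar. Everything below is a stand-in: `BackgroundOp` and `drain` are hypothetical, and `std::shared_ptr` with a plain task queue replaces the reference counting and futures that crimson actually uses.

```cpp
#include <functional>
#include <memory>
#include <vector>

// Hypothetical background operation whose caller discards its handle.
class BackgroundOp : public std::enable_shared_from_this<BackgroundOp> {
public:
  explicit BackgroundOp(int* result) : result_(result) {}

  // Queue a continuation that captures a strong self reference, so the
  // operation stays alive for as long as the continuation exists.
  void start(std::vector<std::function<void()>>& queue) {
    auto self = shared_from_this();
    queue.push_back([self] { *self->result_ = 42; });
  }

private:
  int* result_;
};

// Run every queued continuation, then drop them (and their captures).
inline void drain(std::vector<std::function<void()>>& queue) {
  for (auto& task : queue) {
    task();
  }
  queue.clear();
}
```

If the continuation captured a raw `this` instead of `self`, the object could already be destroyed by the time the continuation runs, which is the heap-use-after-free pattern the backtrace above reports.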
Matan Breizman
3b162d38b2 crimson/osd: avoid refcount mutations
Signed-off-by: Matan Breizman <mbreizma@redhat.com>
2023-11-19 12:41:16 +00:00
Matan Breizman
45312902f2 crimson/osd/pg: introduce clear_log_entry_maps()
Signed-off-by: Matan Breizman <mbreizma@redhat.com>
2023-11-19 12:31:39 +00:00
Matan Breizman
1d98e8dab6 crimson/osd/pg: move submit_error_log to do_osd_ops_execute
Previously, submit_error_log was chained to the future returned
by failure_func.
Now submit_error_log is called from within do_osd_ops_execute.

Fixes: https://tracker.ceph.com/issues/61651

Signed-off-by: Matan Breizman <mbreizma@redhat.com>
2023-11-19 12:27:22 +00:00
Yuval Lifshitz
63e14893cc rgw/notifications: cleanup all coroutines after sending the notification
this is fixing a regression from: 6b6592f50b

Fixes: https://tracker.ceph.com/issues/63580

Signed-off-by: Yuval Lifshitz <ylifshit@redhat.com>
2023-11-19 11:19:16 +00:00
Matan Breizman
11d2c66757 crimson/osd/pg: add record_error bool to failure_func
```
submit_error_log records the result of an IO into the pg log so that we can return
the same error code if the client resends the request.
This should only be relevant for logical errors resulting from the target object state
-- for example, EEXIST returned on an exclusive create -- because there is application
logic built to rely on them.
In classic, the only such site is if the return value from do_osd_ops is negative
(or the transaction is empty) -- see PrimaryLogPG::prepare_transaction,
specifically where we set update_log_only to true.

We do not want to record space usage errors or errors specific to conditions on the primary
OSD such as IO errors -- submit_error_log isn't a catch-all error path.
```

Signed-off-by: Matan Breizman <mbreizma@redhat.com>
2023-11-19 09:58:50 +00:00
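
The policy described in 11d2c66757 (record logical errors such as EEXIST, skip space usage and IO errors) can be sketched with an explicit flag chosen at each failure site. The types and functions below are hypothetical and do not mirror the crimson signatures; the error codes are the ones the commit message uses as examples.

```cpp
#include <cerrno>
#include <vector>

// Hypothetical failure description: an error code plus the decision whether
// it should be written to the log for replay to a resending client.
struct OpFailure {
  int code;
  bool record_error;
};

// Logical error arising from the target object's state: recorded.
inline OpFailure exclusive_create_failed() { return {-EEXIST, true}; }

// Space usage error: not recorded.
inline OpFailure out_of_space() { return {-ENOSPC, false}; }

// Error specific to conditions on the primary: not recorded.
inline OpFailure primary_io_error() { return {-EIO, false}; }

// Hypothetical failure handler: only flagged errors reach the log.
inline void on_failure(const OpFailure& failure, std::vector<int>& error_log) {
  if (failure.record_error) {
    error_log.push_back(failure.code);
  }
}
```

Passing the decision as a flag keeps the handler from becoming a catch-all error path: each call site states whether its error is one that clients may rely on seeing again.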
Matan Breizman
74965cb4dd crimson/osd/pg: do_osd_ops_execute assert error type handling
Signed-off-by: Matan Breizman <mbreizma@redhat.com>
2023-11-19 09:47:29 +00:00