132184 Commits

Author SHA1 Message Date
Ernesto Puerta
bf5c7ff65e
Merge pull request #46433 from rhcs-dashboard/rbd-mirroring-replay
mgr/dashboard: move replaying images to Syncing tab

Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: nSedrickm <NOT@FOUND>
2022-06-17 17:33:49 +02:00
Josh Durgin
7af94d1160
Merge pull request #39980 from badone/wip-ceph_test_lazy_omap_stats-improve-scrubbing-calls-2
test/lazy-omap-stats: Various enhancements

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2022-06-17 07:53:40 -07:00
Ilya Dryomov
95a0ec7b42 mgr/rbd_support: avoid losing a schedule on load vs add race
If load_schedules() (i.e. periodic refresh) races with add_schedule()
invoked by the user for a fresh image, that image's schedule may get
lost until the next rebuild (not refresh!) of the queue:

1. periodic refresh invokes load_schedules()
2. load_schedules() creates a new Schedules instance and loads
   schedules from rbd_mirror_snapshot_schedule object
3. add_schedule() is invoked for a new image (an image that isn't
   present in self.images) by the user
4. before load_schedules() can grab self.lock, add_schedule() commits
   the new schedule to rbd_mirror_snapshot_schedule object and adds it
   to self.schedules
5. load_schedules() grabs self.lock and reassigns self.schedules with
   Schedules instance that is now stale
6. periodic refresh invokes load_pool_images() which discovers the new
   image; eventually it is added to self.images
7. periodic refresh invokes refresh_queue() which attempts to enqueue()
   the new image; this fails because a matching schedule isn't present

The next periodic refresh recovers the discarded schedule from
rbd_mirror_snapshot_schedule object but no attempt to enqueue() that
image is made since it is already "known" at that point.  Despite the
schedule being in place, no snapshots are created until the queue is
rebuilt from scratch or rbd_support module is reloaded.

To fix that, extend self.lock critical sections so that add_schedule()
and remove_schedule() can't get stepped on by load_schedules().
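
A minimal sketch of the idea, not the actual rbd_support code: the class
name, the `store` dict (standing in for the rbd_mirror_snapshot_schedule
object), and the method bodies are all assumptions made for illustration.
The point is only that the read and the reassignment happen under the same
lock that add_schedule() holds while it commits.

```python
import threading


class ScheduleHandler:
    """Toy stand-in for the schedule handler (hypothetical)."""

    def __init__(self, store):
        self.lock = threading.Lock()
        self.store = store             # mimics the persistent schedule object
        self.schedules = dict(store)   # in-memory copy

    def load_schedules(self):
        # Read and reassign inside one critical section, so that a
        # concurrent add_schedule() cannot commit between the read and
        # the swap and then be overwritten by a stale copy.
        with self.lock:
            fresh = dict(self.store)
            self.schedules = fresh

    def add_schedule(self, image, interval):
        # Persist and update the in-memory copy under the same lock.
        with self.lock:
            self.store[image] = interval
            self.schedules[image] = interval
```

With both methods serialized on self.lock, the in-memory copy and the
persistent copy agree whenever the lock is free.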

Fixes: https://tracker.ceph.com/issues/56090
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-06-17 16:20:26 +02:00
Ilya Dryomov
ef3edd399a mgr/rbd_support: refresh schedule queue immediately after delay elapses
The existing logic often leads to refresh_pools() and refresh_images()
being invoked after a 120 second delay instead of after an intended 60
second delay.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-06-17 16:20:26 +02:00
Ilya Dryomov
7d1e644b62 mgr/rbd_support: bail from refresh_pools() when there is no schedule
Make refresh_pools() behave the same as refresh_images().

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-06-17 16:20:26 +02:00
Ilya Dryomov
568345b475 mgr/rbd_support: add logs for when there is no schedule and for descheduling
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-06-17 16:20:26 +02:00
Ilya Dryomov
bd4af8201c mgr/rbd_support: disambiguate mirror snapshot and trash purge schedule logs
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-06-17 16:20:26 +02:00
Sarthak0702
2765e90333 mgr/dashboard: rbd force resync from front-end
Signed-off-by: Sarthak0702 <sarthak.dev.0702@gmail.com>
2022-06-17 19:07:25 +05:30
Sarthak0702
87230dbe42 mgr/dashboard: rbd force resync from front-end
Signed-off-by: Sarthak0702 <sarthak.dev.0702@gmail.com>
2022-06-17 19:05:49 +05:30
Radosław Zarzyński
a6fcb1ccbc msg: fix deadlock when handling existing but closed v2 connection
The deadlock is illustrated best by the following snippet
provided by jianwei zhang who also made the problem analysis
(many thanks!).

```
thread-35
AsyncMessenger::shutdown_connections    hold  AsyncMessenger::lock   std::lock_guard l{lock}
AsyncConnection::stop                   wait  AsyncConnection::lock  lock.lock()

thread-3
ProtocolV2::handle_existing_connection  hold  AsyncConnection::lock  std::lock_guard<std::mutex> l(existing->lock)
AsyncMessenger::accept_conn             wait  AsyncMessenger::lock   std::lock_guard l{lock}
```
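
The lock-order inversion in the snippet above can be reproduced in a
small Python sketch. This is only an illustration of the pattern, not the
messenger code: the two threading.Lock objects stand in for
AsyncMessenger::lock and AsyncConnection::lock, and a timeout is used so
the demonstration terminates instead of hanging.

```python
import threading


def demonstrate_inversion(timeout=0.2):
    messenger_lock = threading.Lock()    # stands in for AsyncMessenger::lock
    connection_lock = threading.Lock()   # stands in for AsyncConnection::lock
    barrier = threading.Barrier(2)
    got_second = {}

    def worker(name, first, second):
        with first:
            barrier.wait()               # both threads now hold their first lock
            ok = second.acquire(timeout=timeout)
            if ok:
                second.release()
            got_second[name] = ok
            barrier.wait()               # keep holding `first` until both attempts end

    threads = [
        threading.Thread(target=worker, args=(
            "shutdown_connections", messenger_lock, connection_lock)),
        threading.Thread(target=worker, args=(
            "handle_existing_connection", connection_lock, messenger_lock)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return got_second
```

Each thread holds one lock and waits for the other, so neither second
acquisition succeeds; without the timeout both threads would block forever.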

Fixes: https://tracker.ceph.com/issues/55355
Signed-off-by: Radosław Zarzyński <rzarzyns@redhat.com>
2022-06-17 14:27:50 +02:00
Guillaume Abrioux
ca1547c3d8
Merge pull request #46698 from guits/cv-hide-luks-key-in-log
ceph-volume: do not log sensitive details
2022-06-17 14:03:48 +02:00
Redouane Kachach
49b234ae74
doc/cephadm: Add post-upgrade section
Fixes: https://tracker.ceph.com/issues/54474

Signed-off-by: Redouane Kachach <rkachach@redhat.com>
2022-06-17 11:54:12 +02:00
Cory Snyder
a46a61fdf7 osd/scrubber/pg_scrubber.cc: fix bug where scrub machine gets stuck
Fixes a scenario where the scrub machine gets stuck if starting
a deep scrub while the noscrub flag is set. It was dropping a
scrub reschedule op, without clearing scrub state, leaving the FSM
stuck in ActiveScrubbing,PendingTimer state.

Fixes: https://tracker.ceph.com/issues/54172
Signed-off-by: Cory Snyder <csnyder@iland.com>
2022-06-17 04:29:06 -04:00
Nizamudeen A
4eab00efaa mgr/dashboard: Error page cleanup
Some error page cleanups

Signed-off-by: Nizamudeen A <nia@redhat.com>
2022-06-17 13:15:43 +05:30
Nizamudeen A
fab6f37052 mgr/dashboard: configure rbd mirroring
One-click button, in the case of an orch cluster, for configuring
rbd-mirroring when it's not properly set up. This button creates an
rbd-mirror service and an rbd-labelled pool (replicated, size 3) if
they do not already exist.

Fixes: https://tracker.ceph.com/issues/55646
Signed-off-by: Nizamudeen A <nia@redhat.com>
2022-06-17 13:14:53 +05:30
Josh Durgin
4e94f27cf5
Merge pull request #45284 from tobias-urdin/doc-memory-profiling-valgrind-massif
doc: Add alternative memory profiling to doc

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2022-06-16 15:17:53 -07:00
zdover23
9a85f9567a
Merge pull request #46712 from zdover23/wip-doc-2022-06-15--master-to-main-dev-guide-basic-workflow-title
doc/dev: s/master/main/ in title

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
2022-06-17 08:09:54 +10:00
Yuri Weinstein
125bce1697
Merge pull request #45690 from sleepinging/fix-fio-windows-crash
test/fio/fio_ceph_messenger: fix str_to_ptr() crash on Windows

Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
2022-06-16 14:24:49 -07:00
Guillaume Abrioux
4b9cc6b303 ceph-volume: do not print the secret of osd keyring
During OSD preparation, ceph-volume logs the secret of the OSD keyring to its log file:
```
[2022-06-15 12:31:17,466][ceph_volume.process][INFO  ] Running command: /usr/bin/ceph-authtool /var/lib/ceph/osd/ceph-0/keyring --create-keyring --name osd.0 --add-key AQAM0aliR5JvDRAAQBu0stWl9ZhZrcjijg2BIQ==
[2022-06-15 12:31:17,481][ceph_volume.process][INFO  ] stdout creating /var/lib/ceph/osd/ceph-0/keyring
added entity osd.0 auth(key=AQAM0aliR5JvDRAAQBu0stWl9ZhZrcjijg2BIQ==)
```

This should be neither logged nor printed to the terminal.
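
One way to keep such a value out of the log is to mask it before the
command line is written. The sketch below is an assumption about how that
could look, not the ceph-volume implementation: redact_command() and
SENSITIVE_FLAGS are hypothetical names, and only the --add-key flag seen
in the log above is treated as sensitive. It covers the logged command
line only; the key echoed on stdout would need separate handling.

```python
SENSITIVE_FLAGS = {"--add-key"}  # hypothetical list of flags whose value is secret


def redact_command(args, sensitive_flags=SENSITIVE_FLAGS, mask="*" * 8):
    """Return a copy of args with the value after each sensitive flag masked."""
    redacted = []
    hide_next = False
    for arg in args:
        if hide_next:
            # This argument is the value of a sensitive flag: mask it.
            redacted.append(mask)
            hide_next = False
            continue
        redacted.append(arg)
        if arg in sensitive_flags:
            hide_next = True
    return redacted
```

The original argument list is still passed to the process unchanged; only
the copy handed to the logger is masked.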

Fixes: https://tracker.ceph.com/issues/56071

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2022-06-16 22:59:47 +02:00
Guillaume Abrioux
0d97a93faa ceph-volume: do not print luks key encryption
During osd activation, ceph-volume logs the luks key to its log file.

```
[2022-06-15 12:50:35,180][ceph_volume.process][INFO  ] Running command: /usr/bin/ceph --cluster ceph --name client.osd-lockbox.51d0770d-403d-4f81-93e6-e99f627f246c --keyring /var/lib/ceph/osd/ceph-0/lockbox.keyring config-key get dm-crypt/osd/51d0770d-403d-4f81-93e6-e99f627f246c/luks
[2022-06-15 12:50:35,522][ceph_volume.process][INFO  ] stdout ut9NjMK6YtMh1BLMJZ/mE2A7zTNyrp9pW1kHV8F2ipfz1BIX9MkEWhdYB2Azm1JPZ1d7ahIjBMUbrC/Iqqr2jQhP3MIsDzUYj1enw+sw7LeVvGPf0qNUdKmEGu5tUmvtQ+5pbk4T/9PF36kT6vCHKfNML/3fL6nnY8FDySrI4LY=
[2022-06-15 12:50:35,522][ceph_volume.process][INFO  ] Running command: /usr/sbin/cryptsetup --key-size 512 --key-file - --allow-discards luksOpen /dev/ceph-83c307d3-710b-4197-8ecd-0484e17395e3/osd-block-51d0770d-403d-4f81-93e6-e99f627f246c a9HhDO-MiYD-DtYm-SKJf-nO1d-5O3u-FmcCrd
```

Fixes: https://tracker.ceph.com/issues/56066

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
2022-06-16 22:59:47 +02:00
Ronen Friedman
c344665ee2 scrub/osd: add clearer reminders that a scrub is blocked
Whenever a scrub session is waiting for an excessive length
of time for a locked object to be unlocked, the total
number of concurrent scrubs in the system is reduced.

The existing cluster warning issued on such occurrences is
easily overlooked. Here we add a constant reminder each time
the OSD tries to schedule scrubs.

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
2022-06-16 12:29:28 +00:00
Ernesto Puerta
42a0ef838d
Merge pull request #45926 from ceph/dependabot-npm_and_yarn-src-pybind-mgr-dashboard-frontend-moment-2.29.3
mgr/dashboard: bump moment from 2.29.1 to 2.29.3 in /src/pybind/mgr/dashboard/frontend

Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
2022-06-16 13:19:08 +02:00
Nizamudeen A
0695b6c67e mgr/dashboard: improve edit site name action in rbd-mirroring
Fixes: https://tracker.ceph.com/issues/55896
Signed-off-by: Nizamudeen A <nia@redhat.com>
2022-06-16 15:03:58 +05:30
Zac Dover
f5fd158bea doc/dev: s/master/main/ in title
This changes "master" to "main" in a title. If we lived in an
ideal world, this would have been a part of PR#46678.

Signed-off-by: Zac Dover <zac.dover@gmail.com>
2022-06-16 15:57:16 +10:00
zdover23
87e27c89f5
Merge pull request #46705 from zdover23/wip-doc-2022-06-15--master-to-main-dev-guide-merging
doc/dev_guide: s/master/main in merging.rst

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
2022-06-16 11:50:31 +10:00
zdover23
5122cf02a2
Merge pull request #46678 from zdover23/wip-doc-2022-06-14-dev-guide-basic-workflow-master-to-main
doc/dev: s/master/main/ in basic workflow

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
2022-06-16 11:13:05 +10:00
Adam C. Emerson
31252a7818 rgw: radosgw-admin includes current time in most status commands
Support folk have asked if we can have a timestamp on the output of
multisite status commands so they can see at a glance how they relate
to other events and changes.

As such, we now have the current time added to any output where it
doesn't disrupt things. In practice this means any output that
isn't a single JSON array.
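
The rule can be sketched as follows. This is an illustration only, not the
radosgw-admin code: with_current_time() and the "current_time" key are
hypothetical names chosen for the example.

```python
from datetime import datetime, timezone


def with_current_time(output, now=None):
    """Add a timestamp to object-shaped output; leave other output untouched."""
    if isinstance(output, dict):
        now = now or datetime.now(timezone.utc)
        stamped = dict(output)                     # do not mutate the caller's data
        stamped["current_time"] = now.isoformat()  # hypothetical key name
        return stamped
    # A bare JSON array (or a scalar) has no place for an extra key, so it
    # is returned as-is to avoid breaking existing parsers.
    return output
```

Consumers that expect a top-level array keep working, while object-shaped
status output gains one extra field.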

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
2022-06-15 19:39:59 -04:00
Anthony D'Atri
3770618384
Merge pull request #46632 from anthonyeleven/anthonyeleven/osd-activate-typos
src/ceph-volume/ceph_volume/activate: Improve usage message text
2022-06-15 15:58:13 -07:00
Zac Dover
52da71f0ab doc/dev_guide: s/master/main in merging.rst
This changes the branch name "master" to the branch name
"main" in merging.rst.

Signed-off-by: Zac Dover <zac.dover@gmail.com>
2022-06-16 07:54:31 +10:00
Laura Flores
f108f7de74
Merge pull request #44861 from Matan-B/wip-matanb-doc-gdb
doc/dev: Debugging with gdb
2022-06-15 16:14:30 -05:00
Zack Cerza
d9e7c1b797
Merge pull request #46582 from ceph/rhel86
qa: Default to RHEL8.6 instead of 8.5
2022-06-15 15:05:55 -06:00
Yuri Weinstein
3c0ff7fb68
Merge pull request #46029 from kamoltat/wip-ksirivad-fix-notify-rank-removed
mon/Elector: notify_rank_removed erase rank from both live_pinging and dead_pinging sets for highest ranked MON

Reviewed-by: Greg Farnum <gfarnum@redhat.com>
2022-06-15 13:27:13 -07:00
Yuri Weinstein
1ced6813ab
Merge pull request #45858 from ganeshmaharaj/ganeshma/gcc-12-libcephsqlite
libcephsqlite: ceph-mgr crashes when compiled with gcc12

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2022-06-15 13:26:45 -07:00
Yuri Weinstein
cac4b1256d
Merge pull request #45750 from m-ildefons/boost-b2-jobs
cmake: configure boost build with concurrent jobs

Reviewed-by: Kefu Chai <kchai@redhat.com>
2022-06-15 13:25:56 -07:00
David Galloway
b8c4488c55 qa: Default to RHEL8.6 instead of 8.5
Signed-off-by: David Galloway <dgallowa@redhat.com>
2022-06-15 14:13:35 -04:00
Adam King
beabb1fa11
Merge pull request #46364 from rkachach/fix_issue_54581
mgr/nfs: validate virtual_ip parameter

Reviewed-by: John Mulligan <jmulligan@redhat.com>
2022-06-15 13:47:28 -04:00
Matan Breizman
682b806efa doc/dev: Debugging with gdb
Signed-off-by: Matan Breizman <mbreizma@redhat.com>
2022-06-15 17:46:37 +00:00
Zack Cerza
33a73e6f2d
Merge pull request #46648 from zmc/cephadm-sysctl-noop
2022-06-15 10:55:14 -06:00
Kefu Chai
8d3ec1f14e
Merge pull request #46664 from tchaikov/wip-doc-gantt
doc: render release with mermaid gantt

Reviewed-by: Laura Flores <lflores@redhat.com>
2022-06-15 22:25:16 +08:00
Kefu Chai
4c90ad9804
Merge pull request #46669 from tchaikov/wip-crimson-deferred
crimson/osd: use seastar::deferred_stop()

Reviewed-by: Yingxin Cheng <yingxin.cheng@intel.com>
2022-06-15 22:24:36 +08:00
Neeraj Pratap Singh
4b26559858 qa: TestMDSMetrics.test_delayed_metrics failure
TestMDSMetrics.test_delayed_metrics is failing due to
the absence of omit_sudo parameter in the remote.run()
of set_inter_mds_block() in qa/tasks/cephfs/filesystem.py

Fixes: https://tracker.ceph.com/issues/56065
Signed-off-by: Neeraj Pratap Singh <neesingh@redhat.com>
2022-06-15 19:21:53 +05:30
Venky Shankar
228f0f48e7
Merge pull request #45614 from lxbsz/wip-54653
ceph-fuse: add dedicated snap stag map for each directory

Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Greg Farnum <gfarnum@redhat.com>
Reviewed-by: Milind Changire <mchangir@redhat.com>
Reviewed-by: Neeraj Pratap Singh <neesingh@redhat.com>
2022-06-15 12:33:06 +05:30
Kefu Chai
f481f86aa3 doc: render release with mermaid gantt
for better readability

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
2022-06-15 00:38:22 -04:00
Kefu Chai
d61bd5af65 admin/doc-requirements: bump sphinx to 4.5.0
also pin sphinx-autodoc-typehints to 1.18.3

to address the following error:

ERROR: sphinx-autodoc-typehints 1.18.3 has requirement Sphinx>=4.5, but you'll have sphinx 4.4.0 which is incompatible.

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
2022-06-15 00:38:22 -04:00
Xiubo Li
7e4424a821 ceph-fuse: add dedicated snap stag map for each directory
This fixes the fino collision bug, which occurs when the snapid is
larger than 0xffff.

From the MDS 'mds_max_snaps_per_dir' option, we can see that the
maximum number of snapshots per directory is 4K, while ceph-fuse has
around 64K stags (0xffff - 2) that can be used to make the fake fuse
inode numbers for each directory.
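
A rough sketch of a per-directory stag map, under stated assumptions: the
class and function names are hypothetical, the fake inode number is
assumed to pack a 16-bit stag above a 48-bit inode number, and stags 0
and 1 are assumed to be reserved so allocation starts at 2. It is not the
ceph-fuse implementation.

```python
INO_BITS = 48                    # assumed width of the real inode number
INO_MASK = (1 << INO_BITS) - 1


class DirStagMap:
    """Maps the snapids seen in one directory to small stag values."""

    FIRST_FREE_STAG = 2          # 0 and 1 assumed reserved
    MAX_STAG = 0xFFFF

    def __init__(self):
        self.snap_to_stag = {}
        self.next_stag = self.FIRST_FREE_STAG

    def stag_for(self, snapid):
        stag = self.snap_to_stag.get(snapid)
        if stag is None:
            if self.next_stag > self.MAX_STAG:
                raise OverflowError("no free stag left for this directory")
            stag = self.next_stag
            self.next_stag += 1
            self.snap_to_stag[snapid] = stag
        return stag


def make_fino(ino, stag):
    # Pack the stag into the bits above the inode number.
    return (stag << INO_BITS) | (ino & INO_MASK)


def split_fino(fino):
    # Recover (ino, stag) from a fake inode number.
    return fino & INO_MASK, fino >> INO_BITS
```

Because each directory hands out its own stags, a snapid above 0xffff
still maps to a small stag, and the per-directory snapshot limit stays far
below the number of available stags.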

Fixes: https://tracker.ceph.com/issues/54653
Signed-off-by: Xiubo Li <xiubli@redhat.com>
2022-06-15 10:29:38 +08:00
Xiubo Li
a6e83d8dec ceph-fuse: return EINVAL if get invalid fino instead of assert
All the snap ids of the finos returned to libfuse from libcephfs
are recorded in the 'stag_snap_map' map and are never erased before
unmounting. So if libfuse passes an invalid fino, ceph-fuse should
return EINVAL instead of crashing.

Fixes: https://tracker.ceph.com/issues/54653
Signed-off-by: Xiubo Li <xiubli@redhat.com>
2022-06-15 10:29:38 +08:00
Xiubo Li
2349083a9d ceph-fuse: reserve stag number 1 for snapdirs
Two stags are reserved: 0 for CEPH_NOSNAP and 1 for
CEPH_SNAPDIR.

This ensures that the nonsnap and snapdir inode numbers are
consistent across all ceph-fuse mounts.

Signed-off-by: Xiubo Li <xiubli@redhat.com>
2022-06-15 10:29:38 +08:00
Xiubo Li
28d17ff81a mds-client: make the fake inos option unchangeable in runtime
If the flags are empty, can_update_at_runtime() in option.h
returns true. That means this option could be changed at
runtime, which is buggy: if it is false, ceph-fuse will
use its own fake inos instead of libcephfs'. If it is changed
at runtime, we will hit assertions about inos that don't exist.

Fixes: https://tracker.ceph.com/issues/54653
Signed-off-by: Xiubo Li <xiubli@redhat.com>
2022-06-15 10:29:38 +08:00
Anthony D'Atri
bcc078c8bf
Merge pull request #46629 from anthonyeleven/anthonyeleven/46195_formatting
doc/man/8: Tweak formatting and wording in ceph.rst
2022-06-14 18:03:05 -07:00
Zac Dover
f5cfc22445 doc/dev: s/master/main/ in basic workflow
This PR changes "master" to "main" in the
basic_workflow.rst file. I have even changed
"master" to "main" in some terminal output from
several years ago. This isn't historically accurate,
of course, but my hope is that this change will
prevent someone in the future from being confused
about why an antiquated branch name is referred to.

Signed-off-by: Zac Dover <zac.dover@gmail.com>
2022-06-15 08:15:33 +10:00