Looks up the shard index of the corresponding bucket, and only
buckets in the corresponding shard are considered for processing.
This has a side effect of matching buckets by id, and also adds
support for --tenant.
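As a rough illustration of the shard selection described above, the sketch below maps a bucket to one lifecycle shard and keeps only the entries in that shard that match the bucket by id. The names `shard_index`, `buckets_to_process`, and `MAX_SHARDS`, the key format, and the use of SHA-1 are all assumptions made for illustration; this is not the hash or the data layout RGW actually uses.

```python
import hashlib

MAX_SHARDS = 32  # hypothetical shard count, for illustration only


def shard_index(tenant, name, bucket_id, max_shards=MAX_SHARDS):
    """Map a bucket to a shard; the key format and the hash are assumptions."""
    key = f"{tenant}:{name}:{bucket_id}".encode()
    digest = hashlib.sha1(key).digest()
    return int.from_bytes(digest[:4], "big") % max_shards


def buckets_to_process(entries, tenant, name, bucket_id):
    """Keep only entries in the target bucket's shard that match it by id."""
    target = shard_index(tenant, name, bucket_id)
    return [
        e for e in entries
        if shard_index(e["tenant"], e["name"], e["id"]) == target
        and e["id"] == bucket_id
    ]
```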
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
Permit a --bucket option to be passed to radosgw-admin lc process,
and propagate the bucket name to lifecycle processing, and process
only the named bucket if one is provided.
Fixes: https://tracker.ceph.com/issues/53430
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
Allow the user to specify a cookie of their choice when mapping a device:
$ rbd device attach rbd-pool/image --device /dev/nbd0 \
--cookie 6f85d970-10b2-456b-8baf-676aa4d782e4 --options try-netlink
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
It will support total_ops/total_size metrics for read and write, and
the cephfs-top tool will show total I/O sizes for read/write.
Fixes: https://tracker.ceph.com/issues/49811
Signed-off-by: Xiubo Li <xiubli@redhat.com>
The `ceph-volume lvm migrate/new-db/new-wal` commands don't support
running on non-systemd systems or within containers.
As with other ceph-volume commands (lvm activate/batch/zap or raw activate),
these commands also need to accept the --no-systemd flag.
Fixes: https://tracker.ceph.com/issues/51854
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
The man page did not make it clear that multiple objects could be
specified, nor did it describe use of "--force-full".
The information about "rm" displayed by `rados --help` was poorly formatted;
it has been reformatted and reworded.
Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
Currently BlueStore keeps its allocation info inside RocksDB.
BlueStore commits all allocation information (alloc/release) into RocksDB (column-family B) before the client write is performed, which delays the write path and adds significant load on CPU, memory, and disk.
Committing all state into RocksDB allows Ceph to survive failures without losing the allocation state.
The new code skips the RocksDB updates at allocation time and instead performs a full destage of the allocator object, with all the OSD allocation state, in a single step during umount().
This results in a 25% increase in IOPS and reduced latency in small random-write workloads, but exposes the system to losing allocation info in failure cases where umount() is not called.
For those failure cases, we added code that performs a full allocation-map rebuild from information stored inside the ONodes.
After a graceful shutdown there is no need for recovery; we simply read the allocation-map from the flat file where it was stored during umount(). In fact, this mode is faster and shaves a few seconds off boot time, since reading a flat file is faster than iterating over RocksDB.
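The flow described above can be sketched as a toy model: allocator updates stay in memory, umount() destages them to a flat file, and mount() either reads that file or rebuilds the map from per-object extents. The class `SketchStore`, the JSON file format, and the file name are hypothetical and only illustrate the idea; the point at which the file is invalidated is likewise a simplifying assumption.

```python
import json
import os
import tempfile


class SketchStore:
    """Toy model: allocations live in memory and are destaged only at umount()."""

    def __init__(self, directory):
        self.alloc_file = os.path.join(directory, "allocator.json")  # hypothetical name
        self.onodes = {}        # object name -> list of (offset, length) extents
        self.allocated = set()  # in-memory allocation map
        self.recovered = False  # True if the last mount had to rebuild the map

    def mount(self):
        if os.path.exists(self.alloc_file):
            # A graceful shutdown happened: read the flat file.
            with open(self.alloc_file) as f:
                self.allocated = {tuple(e) for e in json.load(f)}
            self.recovered = False
            # Invalidate the file so a crash after this point forces a rebuild.
            os.remove(self.alloc_file)
        else:
            # No file: rebuild the map from the extents recorded per object.
            self.allocated = {e for ext in self.onodes.values() for e in ext}
            self.recovered = True

    def write(self, name, extent):
        # Object metadata is still recorded; the allocator update is memory-only.
        self.onodes.setdefault(name, []).append(extent)
        self.allocated.add(extent)

    def umount(self):
        # Single-step destage of the whole allocation state.
        with open(self.alloc_file, "w") as f:
            json.dump(sorted(self.allocated), f)


store = SketchStore(tempfile.mkdtemp())
store.mount()
store.write("obj-a", (0, 4096))
store.umount()   # graceful shutdown
store.mount()    # reads the flat file, no rebuild needed
```

Calling mount() again without an intervening umount() models a crash: the file is gone, so the map is rebuilt from the recorded extents.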
Open Issues:
There is a bug in the src/stop.sh script: it kills Ceph without invoking umount(), which means anyone using it will always trigger the recovery path.
Adam Kupczyk is fixing this issue in a separate PR.
A simple workaround is to add a call to 'killall -15 ceph-osd' before calling src/stop.sh.
Fast-Shutdown and Ceph Suicide (done when the system underperforms) stop the system without a proper drain and a call to umount.
This will trigger a full recovery, which can be long (3 minutes in my testing, but your mileage may vary).
We plan on adding a follow up PR doing the following in Fast-Shutdown and Ceph Suicide:
Block the OSD queues from accepting any new request
Delete all queued items that have not started yet
Drain all in-flight tasks
Call umount (and destage the allocation-map)
If drain didn't complete within a predefined time-limit (say 3 minutes) -> kill the OSD
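The planned sequence above could look roughly like the following sketch. `SketchOSD` and its methods are hypothetical stand-ins, with in-flight work modelled as threads; this illustrates the ordering of the steps and is not the follow-up PR's implementation.

```python
import threading
import time


class SketchOSD:
    """Toy OSD used only to show the order of the shutdown steps."""

    def __init__(self):
        self.accepting = True
        self.pending = []      # queued requests that have not started
        self.in_flight = []    # threads running requests that have started
        self.state = "running"

    def submit(self, request):
        if not self.accepting:
            return False
        self.pending.append(request)
        return True

    def umount(self):
        self.state = "umounted"  # stands in for destaging the allocation-map

    def kill(self):
        self.state = "killed"

    def fast_shutdown(self, time_limit=180.0):
        self.accepting = False                    # block new requests
        self.pending.clear()                      # drop requests not yet started
        deadline = time.monotonic() + time_limit
        for task in self.in_flight:               # drain in-flight tasks
            task.join(max(0.0, deadline - time.monotonic()))
        if any(task.is_alive() for task in self.in_flight):
            self.kill()                           # drain missed the time limit
        else:
            self.umount()                         # clean path
        return self.state
```

The default `time_limit` of 180 seconds mirrors the "say 3 minutes" figure above and is just as tentative.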
Signed-off-by: Gabriel Benhanokh <gbenhano@redhat.com>
create allocator from on-disk onodes and BlueFS inodes
change allocator + add stat counters + report illegal physical-extents
compare allocator after rebuild from ONodes
prevent collection from being open twice
removed FSCK repo check for null-fm
Bug-Fix: don't add BlueFS allocation to shared allocator
add configuration option to commit to No-Column-B
Only invalidate allocation file after opening rocksdb in read-write mode
fix tests not to expect failure in cases inapplicable to null-allocator
accept a non-existent allocation file and don't fail the invalidation, as this can happen legitimately
don't commit to null-fm when db is opened in repair-mode
add a reverse mechanism from null_fm to real_fm (using RocksDB)
Using Ceph encode/decode, adding more info to header/trailer, add crc protection
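To illustrate what header/trailer information plus crc protection buys for a flat allocation file, here is a minimal sketch. The layout, the `MAGIC` marker, and the function names are hypothetical, and it uses `struct` and `zlib.crc32` from the Python standard library rather than Ceph's encode/decode or its checksum routines.

```python
import struct
import zlib

MAGIC = b"ALLOCMAP"  # hypothetical 8-byte marker


def encode_extents(extents):
    """Serialize (offset, length) pairs as header + body + trailer."""
    body = b"".join(struct.pack("<QQ", off, length) for off, length in extents)
    header = MAGIC + struct.pack("<I", len(extents))
    crc = zlib.crc32(header + body)
    trailer = struct.pack("<II", len(extents), crc)  # count repeated, then crc
    return header + body + trailer


def decode_extents(blob):
    """Validate header, length, trailer count and crc before returning extents."""
    if len(blob) < 20 or blob[:8] != MAGIC:
        raise ValueError("bad header")
    (count,) = struct.unpack_from("<I", blob, 8)
    body_end = 12 + 16 * count
    if len(blob) != body_end + 8:
        raise ValueError("bad length")
    trailer_count, crc = struct.unpack_from("<II", blob, body_end)
    if trailer_count != count or zlib.crc32(blob[:body_end]) != crc:
        raise ValueError("corrupt allocation file")
    return [struct.unpack_from("<QQ", blob, 12 + 16 * i) for i in range(count)]
```

A reader that hits any of these errors can fall back to the rebuild path instead of trusting a damaged file.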
Code cleanup
some changes requested by Adam (cleanup and style changes)
Signed-off-by: Gabriel Benhanokh <gbenhano@redhat.com>
Added option -i, which allows the tool to operate as a specific OSD.
It reads configuration options from the monitor or ceph.conf.
In addition, providing a configuration option not accepted by the OSD or ceph-bluestore-tool is now an error.
Signed-off-by: Adam Kupczyk <akupczyk@redhat.com>
so the user does not have to use the virtualenv Python package to create a
virtualenv; the "venv" module in Python 3 suffices.
see also https://docs.python.org/3/library/venv.html
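For reference, the standard-library module can also be driven from Python directly. The sketch below uses `venv.create()`; the helper name `make_env` and the temporary target directory are chosen here only for illustration.

```python
import os
import tempfile
import venv


def make_env(path):
    """Create a virtual environment with the standard-library venv module."""
    # with_pip=False keeps the sketch quick; pass True to bootstrap pip as well.
    venv.create(path, with_pip=False)
    return os.path.join(path, "pyvenv.cfg")


cfg = make_env(os.path.join(tempfile.mkdtemp(), "venv"))
print(os.path.exists(cfg))
```

The command-line equivalent is `python3 -m venv <dir>`.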
Signed-off-by: Kefu Chai <kchai@redhat.com>
as per
https://www.sphinx-doc.org/en/master/usage/restructuredtext/domains.html
> Like py:currentmodule, this directive produces no output. Instead, it
> serves to notify Sphinx that all following option directives document
> options for the program called name.
> ...
> The program name may contain spaces (in case you want to document
> subcommands like svn add and svn commit separately).
and to avoid the warnings like:
doc/man/8/ceph-volume.rst:424: WARNING: Duplicate explicit target name:
"cmdoption-ceph-volume-h".
we should specify a different "program" for each set of options.
Signed-off-by: Kefu Chai <kchai@redhat.com>
ceph-deploy is no longer actively maintained, and it has been replaced by
ceph-volume and other high-level tools,
so there is no point in packaging its manpage anymore.
Signed-off-by: Kefu Chai <kchai@redhat.com>
I found that the difference between "rbd cp" and "rbd deep cp",
i.e. what "deep" means in this context, is documented only in
the mailing list archive and in the Mimic release notes.
Let's make the difference explicit in the manpage and in rbd --help.
Signed-off-by: Jan "Yenya" Kasprzak <kas@fi.muni.cz>
This is a wrapper over ceph-bluestore-tool's bluefs-bdev-migrate command.
Primarily intended to introduce LVM tags manipulation which
ceph-bluestore-tool is lacking.
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
When more metrics are added, the top line becomes too long and may
wrap across several lines, which makes it hard to read.
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Add descriptions for the options --id and --client_fs to the ceph-fuse manual
and move the description of -d closer to -f, since both options are similar.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
This reverts commit f635555fe7, reversing
changes made to d4d3d17b23.
This PR seems to be (indirectly?) responsible for
https://tracker.ceph.com/issues/49237
Also, it was causing the rados.py task's follow-up step, which waits
for snap trimming, to fail: a 'ceph osd dump --format=json' command
would time out. :/
Signed-off-by: Sage Weil <sage@newdream.net>