Adds documentation on how to change the default rbd object size. With the
previous option name `--order` it was easy to guess the config name for the
default value; with the current option name `--object-size` that is hard
to guess.
Also extends the documentation for rbd_default_order to include
* how object-size is derived from the configured value
* allowed range of the value
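For example, since rbd_default_order is the base-2 logarithm of the object
size, a default object size of 8 MiB could be configured roughly like this
(a hypothetical illustration, not part of this change):
$ ceph config set client rbd_default_order 23   # 2^23 bytes = 8 MiB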
In the first version of this commit I also added a min and max for this
parameter (12/25, matching the object-size range in `man 8
rbd`/Striping/object-size), but this made some tests fail, since some
seem to set values outside this range (and are probably fine, since they have
been included for some time already). To keep this a doc-only change, I
removed the range.
Signed-off-by: Mara Sophie Grosch <littlefox@lf-net.org>
Add a new --fs argument to cephfs-shell, so we can use it to mount named
filesystems. Add a blurb to the manpage for it, and alphabetize the
command-line flags.
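For example (the filesystem name here is purely illustrative):
$ cephfs-shell --fs cephfs2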
Fixes: https://tracker.ceph.com/issues/50235
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Looks up the shard index of the corresponding bucket, and only
buckets in the corresponding shard are considered for processing.
This has a side effect of matching buckets by id, and also adds
support for --tenant.
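A hypothetical invocation with a tenant-qualified bucket (names are
illustrative only):
$ radosgw-admin lc process --bucket mybucket --tenant mytenant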
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
Permit a --bucket option to be passed to radosgw-admin lc process,
propagate the bucket name to lifecycle processing, and process
only the named bucket if one is provided.
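For example (the bucket name is illustrative):
$ radosgw-admin lc process --bucket mybucket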
Fixes: https://tracker.ceph.com/issues/53430
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
Allow the user to specify a cookie of their choice at the time of map:
$ rbd device attach rbd-pool/image --device /dev/nbd0 \
--cookie 6f85d970-10b2-456b-8baf-676aa4d782e4 --options try-netlink
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
It will support total_ops/total_size metrics for read and write, and
the cephfs-top tool will show total I/O sizes for read/write.
Fixes: https://tracker.ceph.com/issues/49811
Signed-off-by: Xiubo Li <xiubli@redhat.com>
The `ceph-volume lvm migrate/new-db/new-wal` commands don't support
running on non-systemd systems or within containers.
Like other ceph-volume commands (lvm activate/batch/zap or raw activate),
we also need to be able to use the --no-systemd flag.
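A hypothetical invocation on a system without systemd (IDs and LV names are
illustrative):
$ ceph-volume lvm new-db --osd-id 1 --osd-fsid $OSD_FSID --target vgname/new-db-lv --no-systemd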
Fixes: https://tracker.ceph.com/issues/51854
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
The man page did not make it clear that multiple objects could be
specified, nor did it describe use of "--force-full".
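For example (pool and object names are illustrative), several objects can be
removed at once, even on a full cluster:
$ rados -p mypool rm obj1 obj2 obj3 --force-full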
The info displayed about "rm" by `rados --help` was poorly formatted, and
its wording has been adjusted.
Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
Currently BlueStore keeps its allocation info inside RocksDB.
BlueStore commits all allocation information (alloc/release) into RocksDB (column-family B) before the client write is performed, causing a delay in the write path and adding significant load to the CPU/memory/disk.
Committing all state into RocksDB allows Ceph to survive failures without losing the allocation state.
The new code skips the RocksDB updates at allocation time and instead performs a full destage of the allocator object, with all the OSD allocation state, in a single step during umount().
This results in a 25% increase in IOPS and reduced latency in small random-write workloads, but exposes the system to losing allocation info in failure cases where we don't call umount.
We added code to perform a full allocation-map rebuild from the information stored inside the ONodes; this rebuild is used in failure cases.
When we perform a graceful shutdown there is no need for recovery, and we simply read the allocation-map from a flat file where it was stored during umount(). (In fact this mode is faster and shaves a few seconds off boot time, since reading a flat file is faster than iterating over RocksDB.)
Open Issues:
There is a bug in the src/stop.sh script: it kills ceph without invoking umount(), which means anyone using it will always go through the recovery path.
Adam Kupczyk is fixing this issue in a separate PR.
A simple workaround is to add a call to 'killall -15 ceph-osd' before calling src/stop.sh.
Fast-Shutdown and Ceph Suicide (done when the system underperforms) stop the system without a proper drain and without a call to umount.
This will trigger a full recovery, which can be long (3 minutes in my testing, but your mileage may vary).
We plan on adding a follow-up PR doing the following in Fast-Shutdown and Ceph Suicide:
Block the OSD queues from accepting any new request
Delete all items in the queue which we didn't start yet
Drain all in-flight tasks
Call umount (and destage the allocation-map)
If the drain didn't complete within a predefined time limit (say 3 minutes) -> kill the OSD
Signed-off-by: Gabriel Benhanokh <gbenhano@redhat.com>
create allocator from on-disk onodes and BlueFS inodes
change allocator + add stat counters + report illegal physical-extents
compare allocator after rebuild from ONodes
prevent collection from being open twice
removed FSCK repo check for null-fm
Bug-Fix: don't add BlueFS allocation to shared allocator
add configuration option to commit to No-Column-B
Only invalidate allocation file after opening rocksdb in read-write mode
fix tests not to expect failure in cases inapplicable to null-allocator
accept a non-existing allocation file and don't fail the invalidation, as this can happen legally
don't commit to null-fm when db is opened in repair-mode
add a reverse mechanism from null_fm to real_fm (using RocksDB)
Using Ceph encode/decode, adding more info to header/trailer, add crc protection
Code cleanup
some changes requested by Adam (cleanup and style changes)
Signed-off-by: Gabriel Benhanokh <gbenhano@redhat.com>
Added option -i that allows operating as a specific OSD.
It reads configuration options from the monitor or ceph.conf.
In addition, providing a configuration option not accepted by the OSD or ceph-bluestore-tool is now an error.
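A hypothetical invocation operating as osd.0 (the path and ID are illustrative):
$ ceph-bluestore-tool fsck -i 0 --path /var/lib/ceph/osd/ceph-0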
Signed-off-by: Adam Kupczyk <akupczyk@redhat.com>
so the user does not have to use the virtualenv Python package for creating a
virtualenv; the "venv" module in Python 3 suffices.
see also https://docs.python.org/3/library/venv.html
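For example, a virtualenv can be created with the standard library alone:
$ python3 -m venv venv
$ source venv/bin/activate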
Signed-off-by: Kefu Chai <kchai@redhat.com>
as per
https://www.sphinx-doc.org/en/master/usage/restructuredtext/domains.html
> Like py:currentmodule, this directive produces no output. Instead, it
> serves to notify Sphinx that all following option directives document
> options for the program called name.
> ...
> The program name may contain spaces (in case you want to document
> subcommands like svn add and svn commit separately).
and to avoid warnings like:
doc/man/8/ceph-volume.rst:424: WARNING: Duplicate explicit target name:
"cmdoption-ceph-volume-h".
we should specify a different "program" for each set of options.
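A minimal sketch of what this looks like in the .rst source (the option shown
is only illustrative):

.. program:: ceph-volume lvm activate

.. option:: --no-systemd

   Skip systemd interactions.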
Signed-off-by: Kefu Chai <kchai@redhat.com>
ceph-deploy is not actively maintained anymore, and it has been replaced by
ceph-volume and other high-level tools,
so there is no point in packaging its manpage anymore.
Signed-off-by: Kefu Chai <kchai@redhat.com>
I found that the difference between "rbd cp" and "rbd deep cp",
i.e. what "deep" means in this context, is documented only in
the mailing list archive and in the Mimic release notes.
Let's make the difference explicit in the manpage and in rbd --help.
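For example (pool and image names are illustrative), a deep copy also copies
the image's snapshots:
$ rbd deep cp mypool/src-image mypool/dst-image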
Signed-off-by: Jan "Yenya" Kasprzak <kas@fi.muni.cz>
This is a wrapper over ceph-bluestore-tool's bluefs-bdev-migrate command.
It is primarily intended to introduce the LVM tag manipulation which
ceph-bluestore-tool lacks.
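A hypothetical invocation migrating the DB volume to a new LV (IDs and LV
names are illustrative):
$ ceph-volume lvm migrate --osd-id 1 --osd-fsid $OSD_FSID --from db --target vgname/new-db-lv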
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
When adding more metrics the top line will become too long and may be
wrapped over several lines, which will make it hard to read.
Signed-off-by: Xiubo Li <xiubli@redhat.com>