Add additional explanation about stretch mode, as suggested by Anthony,
and rework the layout of the argument details a bit.
Signed-off-by: Eneko Lacunza <elacunza@binovo.es>
When setting the cache mode to proxy in order to remove a writeback cache,
as described in the official Ceph documentation, an error occurred:
[root@controller-1 root]# ceph osd tier cache-mode cachepool proxy
Invalid command: proxy not in writeback|readproxy|readonly|none
osd tier cache-mode writeback|readproxy|readonly|none [--yes-i-really-mean-it]:
specify the caching mode for cache tier
According to the description in the official documentation: since
a writeback cache may have modified data, you must take steps to ensure
that you do not lose any recent changes to objects in the cache before
you disable and remove it. Change the cache mode to proxy so that new and
modified objects will flush to the backing storage pool.
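With this change in place, the mode described in the documentation should be
accepted again, e.g. (reusing the pool from the example above):
[root@controller-1 root]# ceph osd tier cache-mode cachepool proxy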
Fixes: https://tracker.ceph.com/issues/54576
Signed-off-by: tan changzhi <544463199@qq.com>
Description: the exclamation mark (`!`) can be used to interact with the
local file system in addition to the Ceph File System. This PR documents
that behaviour.
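A short sketch of the behaviour being documented (the path is illustrative):
CephFS:~/>>> !ls -l /tmp
Here `!ls -l /tmp` runs against the local file system, while a plain `ls`
would run against the Ceph File System.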
Signed-off-by: Dhairya Parmar <dparmar@redhat.com>
Also, the sample cephfs-top image in the doc is outdated. Update that!
Fixes: http://tracker.ceph.com/issues/48619
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Adds documentation on how to change the default rbd object size. With the
previous option `--order` it was easy to guess the config name for the
default value; with the current option name `--object-size` that's hard
to guess.
Also extends the documentation for rbd_default_order to include
* how object-size is derived from the configured value
* allowed range of the value
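A sketch of both knobs (pool and image names are illustrative; the object
size is 2^rbd_default_order bytes, so an order of 22 means 4 MiB):
$ ceph config set client rbd_default_order 23
$ rbd create mypool/image1 --size 10G
$ rbd create mypool/image2 --size 10G --object-size 4M
The first image picks up the 8 MiB default derived from the order, while the
second overrides it per image.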
In the first version of this commit I also added a min and max for this
parameter (12/25, matching the object-size range in `man 8
rbd`/Striping/object-size), but this made some tests fail, since some
seem to set values outside this range (and are probably fine, since they
have been included for some time already). To keep this a doc-change only,
I removed the range.
Signed-off-by: Mara Sophie Grosch <littlefox@lf-net.org>
Add a new --fs argument to cephfs-shell, so we can use it to mount named
filesystems. Add a blurb to the manpage for it, and alphabetize the
command-line flags.
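For example (the filesystem name is a placeholder):
$ cephfs-shell --fs cephfs2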
Fixes: https://tracker.ceph.com/issues/50235
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Looks up the shard index of the corresponding bucket, so that only
buckets in that shard are considered for processing.
This has the side effect of matching buckets by id, and also adds
support for --tenant.
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
Permit a --bucket option to be passed to radosgw-admin lc process,
propagate the bucket name to lifecycle processing, and process
only the named bucket if one is provided.
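For example (bucket and tenant names are illustrative):
$ radosgw-admin lc process --bucket mybucket
$ radosgw-admin lc process --bucket mybucket --tenant acme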
Fixes: https://tracker.ceph.com/issues/53430
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
Allow the user to specify a cookie of their choice at map time:
$ rbd device attach rbd-pool/image --device /dev/nbd0 \
--cookie 6f85d970-10b2-456b-8baf-676aa4d782e4 --options try-netlink
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
This adds total_ops/total_size metrics for read and write, and
the cephfs-top tool will show total I/O sizes for read/write.
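For example, assuming the stats mgr module is enabled for cephfs-top:
$ ceph mgr module enable stats
$ cephfs-top
cephfs-top should then display the total I/O sizes for read/write described
above.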
Fixes: https://tracker.ceph.com/issues/49811
Signed-off-by: Xiubo Li <xiubli@redhat.com>
The `ceph-volume lvm migrate/new-db/new-wal` commands don't support
running on non-systemd systems or within containers.
Like other ceph-volume commands (lvm activate/batch/zap or raw activate),
we need to be able to use the --no-systemd flag here as well.
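A hypothetical invocation on a host without systemd (IDs and LV names are
placeholders):
$ ceph-volume lvm new-db --osd-id 1 --osd-fsid <osd-fsid> --target vg/new-db --no-systemd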
Fixes: https://tracker.ceph.com/issues/51854
Signed-off-by: Dimitri Savineau <dsavinea@redhat.com>
The man page did not make it clear that multiple objects can be
specified, nor did it describe the use of "--force-full".
The info displayed for "rm" by `rados --help` was poorly formatted, so
the formatting and wording were adjusted.
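For example (pool and object names are illustrative):
$ rados -p mypool rm obj-1 obj-2 obj-3 --force-full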
Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
Currently BlueStore keeps its allocation info inside RocksDB.
BlueStore commits all allocation information (alloc/release) into RocksDB (column-family B) before the client write is performed, causing a delay in the write path and adding significant load to the CPU, memory and disk.
Committing all state into RocksDB allows Ceph to survive failures without losing the allocation state.
The new code skips the RocksDB updates at allocation time and instead performs a full destage of the allocator object, with all the OSD allocation state written in a single step during umount().
This results in a 25% increase in IOPS and reduced latency for small random-write workloads, but exposes the system to losing allocation info in failure cases where umount is not called.
We added code to perform a full allocation-map rebuild from information stored inside the ONode which is used in failure cases.
When we perform a graceful shutdown there is no need for recovery; we simply read the allocation-map from the flat file where it was stored during umount(). In fact this mode is faster and shaves a few seconds off boot time, since reading a flat file is faster than iterating over RocksDB.
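For operators who want the previous behaviour back, a sketch of disabling the
new path (the option name bluestore_allocation_from_file is an assumption
based on the configuration option mentioned below):
$ ceph config set osd bluestore_allocation_from_file false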
Open Issues:
There is a bug in the src/stop.sh script: it kills ceph without invoking umount(), which means anyone using it will always take the recovery path.
Adam Kupczyk is fixing this issue in a separate PR.
A simple workaround is to add a call to 'killall -15 ceph-osd' before calling src/stop.sh
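For example, the workaround amounts to:
$ killall -15 ceph-osd
$ src/stop.sh
so that the OSDs shut down gracefully and destage the allocation-map before
the script runs.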
Fast-Shutdown and Ceph Suicide (done when the system underperforms) stop the system without a proper drain or a call to umount.
This will trigger a full recovery, which can be long (3 minutes in my testing, but your mileage may vary).
We plan on adding a follow-up PR doing the following in Fast-Shutdown and Ceph Suicide:
Block the OSD queues from accepting any new requests
Delete all items in the queue which we haven't started yet
Drain all in-flight tasks
Call umount() (and destage the allocation-map)
If the drain doesn't complete within a predefined time limit (say 3 minutes) -> kill the OSD
Signed-off-by: Gabriel Benhanokh <gbenhano@redhat.com>
create allocator from on-disk onodes and BlueFS inodes
change allocator + add stat counters + report illegal physical-extents
compare allocator after rebuild from ONodes
prevent collection from being open twice
removed FSCK repo check for null-fm
Bug-Fix: don't add BlueFS allocation to shared allocator
add configuration option to commit to No-Column-B
Only invalidate allocation file after opening rocksdb in read-write mode
fix tests not to expect failure in cases inapplicable to the null-allocator
accept a non-existing allocation file and don't fail the invalidation, as it can happen legitimately
don't commit to null-fm when db is opened in repair-mode
add a reverse mechanism from null_fm to real_fm (using RocksDB)
use Ceph encode/decode, add more info to header/trailer, add CRC protection
Code cleanup
some changes requested by Adam (cleanup and style changes)
Signed-off-by: Gabriel Benhanokh <gbenhano@redhat.com>