Commit Graph

16 Commits

Author SHA1 Message Date
Dimitri Papadopoulos
7677651618
doc,man: typos found by codespell
Signed-off-by: Dimitri Papadopoulos <3234522+DimitriPapadopoulos@users.noreply.github.com>
2021-12-15 12:04:36 +01:00
Gabriel BenHanokh
272160ab5e [BlueStore]: [Remove Allocations from RocksDB]
Currently BlueStore keeps its allocation info inside RocksDB.
BlueStore commits all allocation information (alloc/release) into RocksDB (column family B) before the client write is performed, which delays the write path and adds significant load to the CPU, memory and disk.
Committing all state into RocksDB allows Ceph to survive failures without losing the allocation state.

The new code skips the RocksDB updates at allocation time and instead performs a full destage of the allocator object, with all of the OSD allocation state, in a single step during umount().
This results in a 25% increase in IOPS and reduced latency in small random-write workloads, but exposes the system to losing allocation info in failure cases where umount() is not called.
We added code to perform a full allocation-map rebuild from information stored inside the ONodes, which is used in those failure cases.
When we perform a graceful shutdown there is no need for recovery, and we simply read the allocation map from the flat file it was stored in during umount(). (In fact this mode is faster and shaves a few seconds off boot time, since reading a flat file is faster than iterating over RocksDB.)
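The mount-time choice described above (read the flat file after a clean umount(), otherwise rebuild from on-disk metadata) can be modeled roughly as follows. This is a minimal Python sketch, not BlueStore's C++ code: `Extent`, `rebuild_free_list`, `open_allocator` and the loader callback are hypothetical names, and it assumes a single device on which both ONode extents and BlueFS inode extents consume space.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    offset: int
    length: int


def rebuild_free_list(device_size, onode_extents, bluefs_extents):
    """Hypothetical recovery path: start with the whole device free, then
    carve out every extent referenced by an ONode or a BlueFS inode."""
    used = sorted(onode_extents + bluefs_extents, key=lambda e: e.offset)
    free = []
    cursor = 0
    for e in used:
        if e.offset > cursor:                      # gap before this used extent is free
            free.append(Extent(cursor, e.offset - cursor))
        cursor = max(cursor, e.offset + e.length)  # tolerate overlapping extents
    if cursor < device_size:                       # tail of the device is free
        free.append(Extent(cursor, device_size - cursor))
    return free


def open_allocator(device_size, load_allocation_file, onode_extents, bluefs_extents):
    """Prefer the flat file written at umount(); fall back to a full rebuild.

    `load_allocation_file` returns a free list, or None when the file is
    missing or was invalidated (e.g. the previous run never reached umount())."""
    free = load_allocation_file()
    if free is not None:
        return free, "flat-file"
    return rebuild_free_list(device_size, onode_extents, bluefs_extents), "rebuild"
```

The flat-file path never touches the metadata, which is why a clean shutdown boots faster; the rebuild path walks every ONode, which is why it is slow on a large OSD.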

Open Issues:

There is a bug in the src/stop.sh script: it kills ceph without invoking umount(), which means anyone using it will always trigger the recovery path.
Adam Kupczyk is fixing this issue in a separate PR.
A simple workaround is to run 'killall -15 ceph-osd' before calling src/stop.sh.

Fast-Shutdown and Ceph Suicide (done when the system underperforms) stop the system without a proper drain and a call to umount().
This will trigger a full recovery, which can be long (3 minutes in my testing, but your mileage may vary).
We plan on adding a follow-up PR that does the following in Fast-Shutdown and Ceph Suicide:

Block the OSD queues from accepting any new requests
Delete all queued items that have not started yet
Drain all in-flight tasks
Call umount() (and destage the allocation map)
If the drain doesn't complete within a predefined time limit (say 3 minutes) -> kill the OSD
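The planned shutdown sequence can be sketched with a condition variable guarding the queue state. This is a hypothetical Python model, not the OSD's op queue: `DrainingOpQueue`, `submit`, `start_next`, `finish` and `shutdown` are illustrative names, and `umount` is any callable standing in for the real umount().

```python
import threading
from collections import deque


class DrainingOpQueue:
    """Hypothetical model of an OSD op queue with the planned shutdown sequence."""

    def __init__(self):
        self._cond = threading.Condition()
        self._accepting = True
        self._pending = deque()   # queued ops that have not started
        self._in_flight = 0       # ops currently executing

    def submit(self, op):
        """Queue an op; returns False once shutdown has started."""
        with self._cond:
            if not self._accepting:
                return False
            self._pending.append(op)
            return True

    def start_next(self):
        """Move the oldest pending op to in-flight and return it (or None)."""
        with self._cond:
            if not self._pending:
                return None
            self._in_flight += 1
            return self._pending.popleft()

    def finish(self):
        """Mark one in-flight op as complete and wake a waiting shutdown."""
        with self._cond:
            self._in_flight -= 1
            self._cond.notify_all()

    def shutdown(self, umount, timeout_s):
        """Run the drain sequence; returns "umount" or "kill"."""
        with self._cond:
            self._accepting = False                  # 1. block new requests
            self._pending.clear()                    # 2. drop ops not yet started
            drained = self._cond.wait_for(           # 3. wait for in-flight ops
                lambda: self._in_flight == 0, timeout=timeout_s)
        if drained:
            umount()                                 # 4. clean umount destages the allocation map
            return "umount"
        return "kill"                                # 5. time limit hit: kill the OSD
```

Dropping not-yet-started ops before waiting keeps the drain bounded by the in-flight work only; the timeout is the safety net for a stuck op, after which the next boot takes the rebuild path.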
Signed-off-by: Gabriel Benhanokh <gbenhano@redhat.com>

create allocator from on-disk onodes and BlueFS inodes
change allocator + add stat counters + report illegal physical-extents
compare allocator after rebuild from ONodes
prevent collection from being opened twice
removed FSCK repo check for null-fm
Bug-Fix: don't add BlueFS allocation to shared allocator
add configuration option to commit to No-Column-B
Only invalidate allocation file after opening rocksdb in read-write mode
fix tests not to expect failure in cases inapplicable to null-allocator
accept a non-existing allocation file and don't fail the invalidation, as this can legally happen
don't commit to null-fm when db is opened in repair-mode
add a reverse mechanism from null_fm to real_fm (using RocksDB)
Use Ceph encode/decode, add more info to header/trailer, add CRC protection
Code cleanup

some changes requested by Adam (cleanup and style changes)
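The allocation file is described above as having a header, a trailer and CRC protection. The sketch below shows the general idea only: the real code uses Ceph's encode/decode and its own CRC, whereas this stand-in uses Python's `struct` and `zlib.crc32`, and the magic value, field order and field widths are all hypothetical. A `None` result models the "file missing or invalid, rebuild from ONodes" case.

```python
import struct
import zlib

MAGIC = b"ALOC"                    # hypothetical file magic
VERSION = 1
HEADER = struct.Struct("<4sIQ")    # magic, format version, extent count
EXTENT = struct.Struct("<QQ")      # offset, length
TRAILER = struct.Struct("<I")      # CRC-32 over header + extents


def encode_allocation_map(extents):
    """Serialize (offset, length) pairs with a header and a CRC trailer."""
    body = HEADER.pack(MAGIC, VERSION, len(extents))
    body += b"".join(EXTENT.pack(off, length) for off, length in extents)
    return body + TRAILER.pack(zlib.crc32(body))


def decode_allocation_map(data):
    """Return the list of (offset, length), or None if the file must be ignored."""
    if len(data) < HEADER.size + TRAILER.size:
        return None
    body, trailer = data[:-TRAILER.size], data[-TRAILER.size:]
    (crc,) = TRAILER.unpack(trailer)
    if zlib.crc32(body) != crc:                    # torn or corrupted file
        return None
    magic, version, count = HEADER.unpack_from(body, 0)
    if magic != MAGIC or version != VERSION:
        return None
    if len(body) != HEADER.size + count * EXTENT.size:
        return None
    return [EXTENT.unpack_from(body, HEADER.size + i * EXTENT.size)
            for i in range(count)]
```

Checking the CRC before trusting the header means a partially written file is rejected as a whole instead of being half-loaded.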

Signed-off-by: Gabriel Benhanokh <gbenhano@redhat.com>
2021-08-11 16:53:09 +03:00
Adam Kupczyk
46603dab36 tools/ceph-bluestore-tool: Enable configuration options from monitor/ceph.conf
Added option -i that allows it to operate as a specific OSD.
It reads configuration options from the monitor or ceph.conf.
In addition, providing a configuration option not accepted by the OSD or ceph-bluestore-tool is now an error.

Signed-off-by: Adam Kupczyk <akupczyk@redhat.com>
2021-08-05 11:51:37 +02:00
Kefu Chai
5757c69b06 doc/man: replace http://ceph.com/docs with https://docs.ceph.com
the former brings us to a 404 page

Signed-off-by: Kefu Chai <kchai@redhat.com>
2021-03-22 01:41:53 +08:00
Adam Kupczyk
882714e0c9 tools/bluestore: Add command 'show-sharding' to ceph-bluestore-tool
Add command 'show-sharding' to ceph-bluestore-tool.

Signed-off-by: Adam Kupczyk <akupczyk@redhat.com>
2021-01-19 15:07:16 +01:00
Ponnuvel Palaniyappan
d99d520493 doc: Fixed a number of typos in documentation
Signed-off-by: Ponnuvel Palaniyappan <pponnuvel@gmail.com>
2020-09-18 18:17:15 +01:00
Adam Kupczyk
d7a49b0005 os/bluestore: Add documentation for large bluefs log recovery
Adds an additional paragraph to the ceph-bluestore-tool documentation,
describing how to use the *special* options --bluefs_replay_recovery
and --bluefs_replay_recovery_disable_compact to recover a large
bluefs log.

Fixes: https://tracker.ceph.com/issues/46552
Signed-off-by: Adam Kupczyk <akupczyk@redhat.com>
2020-07-15 19:19:35 +02:00
Adam Kupczyk
38ac896211 kv/RocksDBStore: Added resharding control
Added the ability to control the batch size and iterator refresh time of the resharding process.
Replaced getenv() with the new control in the resharding unit tests.
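What the two resharding controls do can be illustrated with a small in-memory model: keys are copied into their new shard, writes are committed in batches, and the source iterator is periodically recreated by re-seeking past the last key read. This is a hypothetical Python sketch, not RocksDBStore code; `reshard`, `shard_of`, `keys_per_batch` and `keys_per_iterator` are illustrative names, and it assumes both limits are counted in keys.

```python
import bisect


def _commit(dest, batch, stats):
    """Apply one batch of (shard, key, value) writes to the destination."""
    for shard, key, value in batch:
        dest.setdefault(shard, {})[key] = value
    batch.clear()
    stats["commits"] += 1


def reshard(src, shard_of, keys_per_batch, keys_per_iterator):
    """Copy every key of `src` into dest[shard_of(key)].

    A batch is committed every `keys_per_batch` keys, and a fresh iterator is
    created every `keys_per_iterator` keys by re-seeking past the last key read."""
    if keys_per_batch < 1 or keys_per_iterator < 1:
        raise ValueError("limits must be positive")
    dest, batch = {}, []
    stats = {"commits": 0, "iterators": 0}
    last_key = None
    while True:
        # (Re)create the iterator, positioned just after the last key processed.
        ordered = sorted(src)
        start = 0 if last_key is None else bisect.bisect_right(ordered, last_key)
        if start >= len(ordered):
            break
        stats["iterators"] += 1
        for key in ordered[start:start + keys_per_iterator]:
            batch.append((shard_of(key), key, src[key]))
            last_key = key
            if len(batch) == keys_per_batch:
                _commit(dest, batch, stats)
    if batch:                                      # flush the final partial batch
        _commit(dest, batch, stats)
    return dest, stats
```

Smaller batches bound the memory held by uncommitted writes, while refreshing the iterator keeps a long-running scan from pinning old data indefinitely; both trade some throughput for a smaller footprint.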

Signed-off-by: Adam Kupczyk <akupczyk@redhat.com>
2020-05-14 22:43:56 +02:00
Kefu Chai
c3a914a8c6 doc/man: improve bluefs-bdev-expand option
Signed-off-by: Kefu Chai <kchai@redhat.com>
2020-01-10 20:16:01 +08:00
Adam Kupczyk
e7f5e53cde tools/ceph-bluestore-tool: add commands free-dump and free-score
Signed-off-by: Adam Kupczyk <akupczyk@redhat.com>
2019-08-02 13:57:06 +02:00
Igor Fedotov
f5c12ee63f doc/ceph-bluestore-tool: add help for migrate and new DB/WAL commands.
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
2018-10-25 12:31:58 +03:00
Nathan Cutler
7c9229e787 doc: use :command: for subcommands in ceph-bluestore-tool manpage
Older versions of Sphinx, such as the one in CentOS 7, do not render
".. option::" lines correctly if the option contains a hyphen but does not
start with a hyphen. And ceph-bluestore-tool appears to be the only Ceph
manpage affected by this bug.

Fixes: http://tracker.ceph.com/issues/24800
Signed-off-by: Nathan Cutler <ncutler@suse.com>
2018-07-19 14:08:00 +02:00
Shengjing Zhu
2cbba835aa misc: fix various spelling errors
Signed-off-by: Shengjing Zhu <i@zhsj.me>
2018-03-10 23:39:20 +08:00
Xiaojun Liao
2b0afa7762 doc: remove duplicate line from ceph-authtool man page
Signed-off-by: Xiaojun Liao <xiaojunliao85@gmail.com>
2017-11-27 15:49:27 +08:00
Yao Zongyou
d418a04e9f ceph-bluestore-tool: the right action is prime-osd-dir not prime-osd-dev
Signed-off-by: Yao Zongyou <yaozongyou@vip.qq.com>
2017-10-28 18:22:27 +08:00
Sage Weil
7b91e50dbd doc/man/8/ceph-bluestore-tool: add man page
Signed-off-by: Sage Weil <sage@redhat.com>
2017-10-16 14:29:10 -05:00