Fix ceph-bluestore-tool reshard command.
Add conditions that allow resharding to continue after it was
previously terminated.
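A rough sketch of the resume condition (the marker mechanism and every
name below are hypothetical, not the actual ceph-bluestore-tool code):

```cpp
#include <iostream>
#include <optional>
#include <string>
#include <vector>

// hypothetical persisted progress marker: the last key already moved
static std::optional<std::string> g_marker;

void move_key(const std::string& key) {
  std::cout << "moving " << key << "\n";
  g_marker = key;  // record progress so an interrupted run is resumable
}

void reshard(const std::vector<std::string>& keys) {
  for (const auto& key : keys) {
    // the added condition: skip keys a previously terminated run
    // already moved instead of failing on the half-converted layout
    if (g_marker && key <= *g_marker)
      continue;
    move_key(key);
  }
}

int main() {
  g_marker = "b";                 // simulate a run killed after "b"
  reshard({"a", "b", "c", "d"});  // resumes at "c"
}
```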
Signed-off-by: Adam Kupczyk <akupczyk@redhat.com>
Also filter out client-ids starting with "mirror" when
cleaning up leftover auth-ids, since teuthology is
configured to create client.mirror and client.mirror_remote
clients before executing mirroring tests.
Signed-off-by: Venky Shankar <vshankar@redhat.com>
there is a chance that some monitor(s) are updated / upgraded in a single
monmap update without being removed from the cluster state's metadata first,
so, without this change, we would not update the metadata associated with
those monitors; hence the mgr modules which consume the metadata are not
updated accordingly and keep reporting stale information.
in this change, we always update the metadata associated with all monitors
included in the latest monmap. multiple "mon metadata" commands are sent
to the monitor for retrieving their updated metadata, instead of sending a
single one, so that we can reuse "MetadataUpdate" to update the metadata
of a given daemon. as the number of monitors in a typical cluster is
relatively small, and the frequency of monmap updates is low, this
overhead should be fine.
unlike other places where we ask the mon for metadata in the Mgr class,
the code sending the mon command for updated monitor metadata is located
outside of the `cluster_state.with_monmap()` block. the reason is that
`with_monmap()` is guarded by the monc_lock under the hood, while
`start_mon_command()` also needs to acquire the monc_lock, which is not a
recursive lock, so we have to do this outside of the `with_monmap()` block.
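a minimal sketch of the resulting shape (drastically simplified;
`send_command` stands in for `start_mon_command()`, and none of this is
the actual Mgr code):

```cpp
#include <iostream>
#include <mutex>
#include <string>
#include <vector>

std::mutex monc_lock;  // stands in for the MonClient lock; not recursive

void send_command(const std::string& cmd) {
  // like start_mon_command(): must take monc_lock itself
  std::lock_guard<std::mutex> l(monc_lock);
  std::cout << "sending: " << cmd << "\n";
}

template <typename F>
void with_monmap(F&& f) {
  // like cluster_state.with_monmap(): guarded by monc_lock under the hood
  std::lock_guard<std::mutex> l(monc_lock);
  f();
}

int main() {
  std::vector<std::string> names;
  with_monmap([&] {
    // only collect the mon names here; calling send_command() from
    // inside this block would deadlock on the non-recursive monc_lock
    names = {"a", "b", "c"};
  });
  // one "mon metadata" command per monitor, issued outside the block,
  // so each reply can drive a per-daemon MetadataUpdate completion
  for (const auto& name : names) {
    send_command("{\"prefix\": \"mon metadata\", \"id\": \"" + name + "\"}");
  }
}
```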
Fixes: https://tracker.ceph.com/issues/48905
Signed-off-by: Kefu Chai <kchai@redhat.com>
Overriding osd_async_recovery_min_cost as part of enabling a built-in
mclock profile has the undesirable side effect of peers not choosing
the correct async recovery targets if osds are using mixed schedulers
(this could happen during upgrades, or if "debug_random" is set for the
osd_op_queue config option). Due to the above, osds get into a
"choose_acting" loop during peering.
The solution is to remove the override of osd_async_recovery_min_cost.
Fixes: https://tracker.ceph.com/issues/48906
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
this will provide more detailed output, like:
```yaml
...snip...
service_type: node-exporter
service_name: node-exporter
placement:
  host_pattern: '*'
status:
  created: '2021-01-18T11:21:56.024810Z'
  last_refresh: '2021-01-18T11:23:24.477672Z'
  running: 0
  size: 1
events:
- "2021-01-18T11:23:09.602644Z service:node-exporter [ERROR] \"Failed while placing\
  \ node-exporter.ubuntuon ubuntu: cephadm exited with an error code: 1, stderr:Deploy\
  \ daemon node-exporter.ubuntu ...\nVerifying port 9100 ...\nTraceback (most recent\
  \ call last):\n File \"<stdin>\", line 7274, in <module>\n File \"<stdin>\", line\
  \ 1563, in _default_image\n File \"<stdin>\", line 3698, in command_deploy\n File\
  \ \"<stdin>\", line 2338, in deploy_daemon\n File \"<stdin>\", line 1961, in create_daemon_dirs\n\
  AssertionError\""
...snip...
```
Signed-off-by: Sebastian Wagner <sebastian.wagner@suse.com>
pybind/mgr: disallow_untyped_defs=True for mgr_util
Reviewed-by: Juan Miguel Olmo Martínez <jolmomar@redhat.com>
Reviewed-by: Patrick Seidensal <pseidensal@suse.com>
Set and disable relevant recovery-specific Ceph options for mclock
profiles to work as expected. Broadly (sketched below):
- Set a low value for the recovery min cost
- Set high values for max active recoveries and max backfills
- Disable recovery sleep{_hdd, _ssd, _hybrid}
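A minimal sketch of that shape, with a hypothetical config store; the
option names are the real Ceph ones, but the values here are purely
illustrative, not the actual profile defaults:

```cpp
#include <iostream>
#include <map>
#include <string>

// hypothetical stand-in for the OSD's config store
std::map<std::string, std::string> conf;

void set_opt(const std::string& key, const std::string& val) {
  conf[key] = val;
  std::cout << key << " = " << val << "\n";
}

void apply_builtin_profile() {
  // low value for the recovery min cost
  set_opt("osd_async_recovery_min_cost", "100");  // illustrative value
  // high values for max active recoveries and max backfills
  set_opt("osd_recovery_max_active_hdd", "10");   // illustrative value
  set_opt("osd_recovery_max_active_ssd", "20");   // illustrative value
  set_opt("osd_max_backfills", "10");             // illustrative value
  // disable recovery sleep across all device types
  set_opt("osd_recovery_sleep_hdd", "0");
  set_opt("osd_recovery_sleep_ssd", "0");
  set_opt("osd_recovery_sleep_hybrid", "0");
}

int main() { apply_builtin_profile(); }
```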
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Handle configuration changes to the mclock cost per io, the max
capacity options, and the mclock profile. Also handle the case where
the profile is changed from a built-in profile to the custom profile.
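A self-contained sketch of the config-observer shape this implies (the
handler below is illustrative, not the scheduler's actual code):

```cpp
#include <iostream>
#include <set>
#include <string>

// illustrative analogue of a Ceph config observer; not the actual code
struct MClockConfigHandler {
  std::string profile = "high_client_ops";

  void handle_conf_change(const std::set<std::string>& changed,
                          const std::string& new_profile) {
    if (changed.count("osd_mclock_cost_per_io_msec") ||
        changed.count("osd_mclock_max_capacity_iops")) {
      // re-derive the internal mclock parameters from the new values
      std::cout << "recomputing mclock cost/capacity parameters\n";
    }
    if (changed.count("osd_mclock_profile")) {
      profile = new_profile;
      if (profile == "custom") {
        // built-in -> custom: stop overriding the recovery-related
        // options and honor whatever the user sets from now on
        std::cout << "custom profile: releasing option overrides\n";
      } else {
        std::cout << "re-applying built-in profile settings\n";
      }
    }
  }
};

int main() {
  MClockConfigHandler h;
  h.handle_conf_change({"osd_mclock_profile"}, "custom");
}
```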
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Define config options to specify the cost per io for an osd (hdd & ssd):
- osd_mclock_cost_per_io_msec
- osd_mclock_cost_per_io_msec_hdd
- osd_mclock_cost_per_io_msec_ssd
Define config options to set the max osd capacity (hdd & ssd) to be
allocated among the clients of dmclock, namely:
- osd_mclock_max_capacity_iops
- osd_mclock_max_capacity_iops_hdd
- osd_mclock_max_capacity_iops_ssd
Define the config option "osd_mclock_profile" to specify the built-in
profile to enable.
Also, set the number of op shards being used in the osd within the
mclock scheduler. This is necessary to calculate the per-shard limits
within the mclock scheduler.
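A hedged sketch of why the shard count matters (the arithmetic is
illustrative, not the scheduler's exact math): the configured IOPS
capacity applies to the whole osd, while each op shard runs its own
mclock queue, so the capacity has to be divided across the shards.

```cpp
#include <iostream>

int main() {
  // illustrative numbers only
  const double max_osd_capacity_iops = 315.0;  // e.g. osd_mclock_max_capacity_iops_hdd
  const unsigned num_op_shards = 5;            // e.g. osd_op_num_shards_hdd

  // each op shard runs its own mclock queue, so the osd-wide capacity
  // must be split across the shards when deriving per-shard limits
  const double per_shard_limit = max_osd_capacity_iops / num_op_shards;
  std::cout << "per-shard IOPS limit: " << per_shard_limit << "\n";
}
```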
With the above information, enable the specified mclock profile by
calling the appropriate method to set the profile-specific mclock
parameters and Ceph options.
Prior to enqueueing an op, the scheduler performs a calculation to scale
the cost associated with the OpSchedulerItem up or down. This calculation
is based on the existing item cost, the max osd capacity provided, and an
additional cost factor based on the underlying device type (hdd/ssd).
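A hedged sketch of that scaling (the function shape and factor
derivation are illustrative, not the scheduler's exact arithmetic):

```cpp
#include <algorithm>
#include <iostream>

// illustrative only: scale an item's cost by a device-dependent factor
double calc_scaled_cost(double item_cost, bool is_rotational) {
  // stand-ins for osd_mclock_cost_per_io_msec_{hdd,ssd} and
  // osd_mclock_max_capacity_iops_{hdd,ssd}
  const double cost_per_io_msec = is_rotational ? 25.0 : 4.0;
  const double max_capacity_iops = is_rotational ? 315.0 : 21500.0;
  // fraction of the osd's capacity one such op is assumed to consume
  const double cost_factor = (cost_per_io_msec / 1000.0) * max_capacity_iops;
  return std::max(item_cost, 1.0) * cost_factor;
}

int main() {
  std::cout << calc_scaled_cost(/*item_cost=*/1.0, /*is_rotational=*/true)
            << "\n";
}
```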
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
The ceph parameters below are set automatically
when enabling the mclock scheduler with a built-in profile.
The user in this case will not be able to modify these
Ceph-specific config options at runtime:
a. osd_async_recovery_min_cost
b. osd_recovery_max_active{_hdd,_ssd}
c. osd_max_backfills
d. osd_recovery_sleep{_hdd,_ssd,_hybrid}
If the custom profile is enabled for the mclock scheduler,
the user can modify these parameters.
Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
* refs/pull/37297/head:
qa: add new client tests
test: add client tests
client: wire up alternate_name
mds: fix alternate_name durability
mds: add alternate_name feature support for dentries
mds: add static encode/decode helpers for remote in CDentry
mds: add "fscrypt" flag support for inode_t
mds: add new CDentry tags support
include: cleanup filepath cons
mds: add comment on linkage durability
client: constify some RWRef methods
common: add strescape helper
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
https://github.com/ceph/ceph/pull/30027 introduced a gap in osr
protection (in _deferred_queue()) which could cause an improper
deferred_pending value while processing the osr from _deferred_aio_finish().
As a result, either a segmentation fault in _deferred_aio_finish() or a
deadlock could occur.
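A drastically simplified sketch of the kind of protection involved (toy
types, not BlueStore code): the queueing path and the aio-finish path
must mutate the pending state under the same osr lock, or the finish
path can observe a half-updated value.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// toy stand-in for an OpSequencer (osr); not BlueStore code
struct Sequencer {
  std::mutex lock;                  // the protection the fix restores
  std::vector<int> deferred_pending;

  void deferred_queue(int txc) {
    // without taking the same lock as deferred_aio_finish(), that path
    // could observe deferred_pending mid-update: a segfault or a hang
    std::lock_guard<std::mutex> l(lock);
    deferred_pending.push_back(txc);
  }

  void deferred_aio_finish() {
    std::lock_guard<std::mutex> l(lock);
    deferred_pending.clear();  // consume the batch atomically
  }
};

int main() {
  Sequencer osr;
  std::thread a([&] { for (int i = 0; i < 100000; ++i) osr.deferred_queue(i); });
  std::thread b([&] { for (int i = 0; i < 100000; ++i) osr.deferred_aio_finish(); });
  a.join();
  b.join();
}
```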
Fixes: https://tracker.ceph.com/issues/48776
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
crimson/osd: make PGRecovery::start_primary_recovery_ops take objs recovered by UrgentRecovery into account
Reviewed-by: Samuel Just <sjust@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
Here we're exposing a public Client::walk (aka path_walk) so that the
user can inspect dentries (not something normally possible in POSIX).
We're going to skip exposing such an interface in libcephfs since
there's no reason to do that (who would use it?) except for testing.
Instead, a follow-up PR will add Client tests (for the first time, yay!)
that will exercise this code.
Ideally we'd also expose alternate_name via readdir results, but
that is a bit more complicated since dirents do not normally refer to
external memory. So, just rely on Client::walk for testing for now.
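A self-contained usage sketch of the idea (stub stand-ins for the real
types in Client.h; the actual Client::walk signature may differ):

```cpp
#include <cerrno>
#include <iostream>
#include <map>
#include <string>

// stub stand-in for the walk result type in Client.h
struct walk_dentry_result {
  std::string alternate_name;  // dentry metadata not visible via POSIX
};

struct Client {
  std::map<std::string, std::string> alt_names;  // path -> alternate_name

  int walk(const std::string& path, walk_dentry_result* wdr) {
    auto it = alt_names.find(path);
    if (it == alt_names.end())
      return -ENOENT;
    wdr->alternate_name = it->second;  // expose what POSIX calls cannot
    return 0;
  }
};

int main() {
  Client c;
  c.alt_names["/dir/file"] = "very-long-alternate-name...";
  walk_dentry_result wdr;
  if (c.walk("/dir/file", &wdr) == 0)
    std::cout << "alternate_name: " << wdr.alternate_name << "\n";
}
```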
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
This is a collection of fixes to Xiubo's prior work. Namely:
- Add a new mds_alternate_name_max option to limit the size of
alternate_name. Otherwise a Client could trick the MDS into creating
an alternate_name of any size! (See the sketch after this list.)
- Clean up how alternate_name is assigned to CDentry. In the general
case, this should be assigned as part of creating the dentry. We want
this value to be immutable for the life of the dentry, even for the
very special case of rename(2) where the destination dentry already
exists: we explicitly check (after discussion with Jeff) that the
target dentry's alternate_name already matches what the rename RPC is
giving.
- The MDS is now properly journaling the alternate_name.
- The MDS rejoin phase is properly transmitting each dentry's
alternate_name. I've discovered that this MMDSCacheRejoin message
actually wasn't versioned, which I've raised in a tracker [1]. In the
meantime, we'll just bump CEPH_MDS_PROTOCOL as usual.
[1] https://tracker.ceph.com/issues/48886
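A hedged sketch of the size guard mentioned in the first item (the
constant's default and the error code are illustrative, not the MDS's
actual choices):

```cpp
#include <cerrno>
#include <iostream>
#include <string>

// illustrative guard only: the real check lives in the MDS request
// path, and both the default limit and the error code may differ
constexpr std::size_t mds_alternate_name_max = 8192;

int check_alternate_name(const std::string& alternate_name) {
  if (alternate_name.size() > mds_alternate_name_max) {
    // reject: otherwise a client could make the MDS store (and journal,
    // and transmit during rejoin) an alternate_name of any size
    return -ENAMETOOLONG;
  }
  return 0;
}

int main() {
  std::cout << check_alternate_name(std::string(10000, 'x')) << "\n";
}
```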
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
This will add the "alternate_name" filename support, saving
an alternate name for each dentry. This alternate name is not used in
path lookup or for any other usual file system purpose. The name is
simply an added blob of metadata on the dentry that is distributed to
clients so that "long" file names may be supported for clients which
require them. In the case of an fscrypt-enhanced kernel mount driver,
the long name may be the ciphertext (exceeding FILENAME_MAX) of a long
file name.
Because this affects only files with long file names, the use of this
feature should be rare but could be common for some unusual
applications.
The client mount should first check the CEPHFS_FEATURE_ALTERNATE_NAME
feature bit to determine whether the MDS supports this feature.
The alternate_name is transmitted as part of the message payload in
MClientRequest when setting the alternate_name. The LeaseStat structure
in MClientReply contains the alternate_name.
When executing a metadata mutation RPC, the client will set the
alternate_name (if it exists) as part of the operation. The MDS will
pick that up and set it on the new or mutated dentry.
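A hedged sketch of the client-side flow just described (stand-in types;
the real feature-bit value and message structs live in the CephFS
protocol headers, and the bit number below is illustrative only):

```cpp
#include <bitset>
#include <iostream>
#include <string>

// illustrative bit number; the real value is in the protocol headers
constexpr std::size_t CEPHFS_FEATURE_ALTERNATE_NAME = 15;

// stand-in for the real message type
struct MClientRequest {
  std::string alternate_name;  // carried in the message payload
};

int main() {
  std::bitset<64> mds_features;
  mds_features.set(CEPHFS_FEATURE_ALTERNATE_NAME);  // MDS advertised it

  MClientRequest req;
  // the client checks the feature bit before sending alternate_name
  if (mds_features.test(CEPHFS_FEATURE_ALTERNATE_NAME)) {
    // e.g. the ciphertext of a long name, possibly exceeding FILENAME_MAX
    req.alternate_name = "base64-ciphertext-...";
  }
  std::cout << "alternate_name set: " << !req.alternate_name.empty() << "\n";
}
```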
Fixes: https://tracker.ceph.com/issues/47162
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
If the client has set the "encryption.ctx" attribute, the MDS side
will set the "fscrypt" flag to true; it is false by default.
Clients can use this flag to learn whether the corresponding
inodes are encrypted or not.
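A simplified sketch of the described behavior (stand-in types, not the
actual MDS code):

```cpp
#include <iostream>
#include <string>

// stand-in for the MDS-side inode state
struct Inode {
  bool fscrypt = false;  // false by default
};

void handle_setxattr(Inode& in, const std::string& name) {
  if (name == "encryption.ctx") {
    // the client stored an encryption context, so flag the inode as
    // encrypted for all clients to see
    in.fscrypt = true;
  }
}

int main() {
  Inode in;
  handle_setxattr(in, "encryption.ctx");
  std::cout << "fscrypt: " << std::boolalpha << in.fscrypt << "\n";
}
```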
Fixes: https://tracker.ceph.com/issues/47162
Signed-off-by: Xiubo Li <xiubli@redhat.com>
This will add a new tag 'i' for the inode and a new tag 'l' for the
remote link in the CDentry encoding. At the same time, it will add a
proper dentry version, which will be helpful for adding new
features/members to the CDentry in the future.
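A toy sketch of the tagging-plus-versioning idea (the real CDentry
encoder uses Ceph's versioned encoding macros, not this):

```cpp
#include <cstdint>
#include <iostream>
#include <vector>

// toy encoder; the real code uses Ceph's ENCODE_START/ENCODE_FINISH
void encode_dentry_linkage(std::vector<std::uint8_t>& bl, bool is_remote) {
  bl.push_back(1);                      // dentry version: room for new members
  bl.push_back(is_remote ? 'l' : 'i');  // 'l' = remote link, 'i' = inode
  // ... followed by the inode or remote-link payload
}

int main() {
  std::vector<std::uint8_t> bl;
  encode_dentry_linkage(bl, /*is_remote=*/true);
  std::cout << "version " << int(bl[0]) << ", tag '" << char(bl[1]) << "'\n";
}
```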
Fixes: https://tracker.ceph.com/issues/47162
Signed-off-by: Xiubo Li <xiubli@redhat.com>