Now that we send these to the cluster log, we must
whitelist them in the tests that exercise those
unhealthy states.
Fixes: http://tracker.ceph.com/issues/19551
Signed-off-by: John Spray <john.spray@redhat.com>
there could be some pg(s) still being created when we are upgrading to
luminous, and the pools holding them are not changed in the sense of
pg_pool_t::last_change after the upgrade and before we scan for
creating pgs. in that case, the existing update_pending_creatings()
will fail to collect the pgs being created before the upgrade.
with this change, the creating_pgs in pgmap are also used for updating
the OSDMonitor's creating_pgs if it's updated.
but we should stop updating the pgmap once the upgrade completes. i.e.
stop dispatching MSG_PGSTATS messages to PGMonitor if the quorum and all
osds are luminous.
Fixes: http://tracker.ceph.com/issues/19584
Signed-off-by: Kefu Chai <kchai@redhat.com>
Some of the finisher contexts would try to call into Objecter.
We mostly are protected from this by mds_lock+the stopping
flag, but at the Filer level there's no mds_lock, so in the
case of file size probing we have a problem.
Fixes: http://tracker.ceph.com/issues/19204
Signed-off-by: John Spray <john.spray@redhat.com>
This will just be whatever path we were looking
at at the point that damage was notified -- no
intention whatsoever of providing any up to date
path or resolution when there are multiple paths
to an inode.
Fixes: http://tracker.ceph.com/issues/18509
Signed-off-by: John Spray <john.spray@redhat.com>
Use this to get a nice human readable name
when available (also including the session id in
parentheses)
Signed-off-by: John Spray <john.spray@redhat.com>
The overhead of the whitespace is trivial and
makes the output somewhat human readable. Previously
I was always taking `damage ls` into a file and
parsing it out with python.
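
As an illustration of the trade-off, the sketch below serializes the same
records both compactly and with indentation using Python's json module.
The entries and their field names are made up for the example and are not
taken from real `damage ls` output.

```python
import json

# Hypothetical damage entries; the field names are illustrative assumptions.
entries = [
    {"id": 1, "damage_type": "dentry", "path": "/dir/file_a"},
    {"id": 2, "damage_type": "dir_frag", "path": "/dir"},
]

compact = json.dumps(entries)           # single line, hard to read by eye
pretty = json.dumps(entries, indent=4)  # one field per line

# Both forms decode to the same data, so scripts that parse the output
# are unaffected by the extra whitespace.
same = json.loads(compact) == json.loads(pretty)

# Size cost of the added whitespace, in characters.
overhead = len(pretty) - len(compact)
```

The indented form costs a few extra characters per field, which is small
next to the payload itself.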
Signed-off-by: John Spray <john.spray@redhat.com>
We get ENOENT when a pool doesn't exist. This can
happen because we don't prevent people deleting
former cephfs data pools whose files may not have
had their metadata flushed yet.
http://tracker.ceph.com/issues/19401
Signed-off-by: John Spray <john.spray@redhat.com>
Added '--cluster' to all necessary commands
(e.g. radosgw-admin, rados, ceph) and made sure
the necessary checks are in place so that clients
can be read with or without a cluster_name
preceding them
Made master_client defined in the config for
radosgw-admin task
Signed-off-by: Ali Maredia <amaredia@redhat.com>
use unique_ptr to manage the lifecycle of MgrPyModule and ServeThread,
it's easier and safer. without this change, we don't free allocated
MgrPyModule if it fails to load().
Fixes: http://tracker.ceph.com/issues/19590
Signed-off-by: Kefu Chai <kchai@redhat.com>
If a readdir expire event turns out to be older than last_readdir,
just reschedule it (but actually, we should just discard it, as
another expire event must be in the queue).
Fixes: http://tracker.ceph.com/issues/19625
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
Previously, when we got a beacon that updated the health
metrics for an MDS, the user would just see mysterious-looking
cluster log messages indicating a rising fsmap epoch number.
It would be good to do this for health messages in general at
some point, but for now just do it for the MDS ones.
Fixes: http://tracker.ceph.com/issues/19551
Signed-off-by: John Spray <john.spray@redhat.com>
We were previously only tearing MgrClient down when not
holding a rank, leading to it trying to continue
to run after monclient was shut down.
Fixes: http://tracker.ceph.com/issues/19566
Signed-off-by: John Spray <john.spray@redhat.com>
Adjust readdir callback path for new nfs-ganesha chunked readdir,
including changes to respect the result of callback to not
continue.
Pending introduction of offset name hint, our caller will just be
completely enumerating, so it is possible to remove the offset map
and just keep a last offset.
Fixes: http://tracker.ceph.com/issues/19624
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
The new type hints optimize object type deduction when
rgw_lookup is called from an rgw_readdir callback.
Fixes: http://tracker.ceph.com/issues/19623
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
Thanks to the previous patch [1], there is no need to access RDMA resources
before the fork. Initialize the Infiniband class only before a connection is
established or a listener is created. [1] makes sure that the call
to RDMAWorker::listen() is postponed until after the fork.
[1] - 7393db45644d ("msg/async: Postpone bind if network stack is not ready")
Issue: 995322
Change-Id: I8ea246b2e03c8c9533bc324b2b8d142eb3d1ed4d
Signed-off-by: Amir Vadai <amir@vadai.me>
RDMAStack shouldn't access hardware from the parent process.
The only reason it does so is that bind is called before the fork.
After this patch the bind is postponed until the NetworkStack reports
that it is ready to bind.
Every NetworkStack type always returns true from this check, except
RDMAStack, which returns true only after the fork (after
AsyncMessenger::ready() is called).
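
The deferral described above can be sketched roughly as follows. This is a
simplified model in Python, not the actual msg/async code: the class layout
and the method names is_ready(), bind() and pending_bind are assumptions
made for illustration; only the ready() hook is named in this message.

```python
class NetworkStack:
    """Hypothetical base stack: always ready to bind."""

    def ready(self):
        pass

    def is_ready(self):
        return True


class RDMAStack(NetworkStack):
    """Hypothetical RDMA stack: ready only once ready() has been called."""

    def __init__(self):
        self._ready = False

    def ready(self):
        self._ready = True

    def is_ready(self):
        return self._ready


class Messenger:
    """Simplified stand-in for the messenger that owns the stack."""

    def __init__(self, stack):
        self.stack = stack
        self.pending_bind = None  # address remembered while not ready
        self.bound = None         # address actually bound

    def bind(self, addr):
        if not self.stack.is_ready():
            self.pending_bind = addr  # postpone, touch no hardware yet
            return
        self.bound = addr

    def ready(self):
        # Called after the fork: mark the stack ready and run any
        # bind that was postponed earlier.
        self.stack.ready()
        if self.pending_bind is not None:
            self.bound = self.pending_bind
            self.pending_bind = None
```

With an RDMAStack, a bind() issued before ready() is only recorded; with
the base stack it takes effect immediately.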
This patch is based on a patch by Haomai Wang <haomai@xsky.com>
Issue: 995322
Change-Id: I1d0d0d52db0a339b9319680c18ee05cde87b2b64
Signed-off-by: Amir Vadai <amir@vadai.me>
If we don't set the luminous flag, we should not set the new luminous
fields or else we'll get a crc mismatch. (Funnily that happens in the
epoch where the flag is eventually set and the encoded map finally includes
the field we have set in memory.)
Signed-off-by: Sage Weil <sage@redhat.com>
This makes it tedious for teuthology health checks to proceed when we
deliberately run luminous osds without this flag.
Signed-off-by: Sage Weil <sage@redhat.com>
After I set about 400 64KB xattr key/value pairs on a file,
the mds crashed. Every time I tried to start the mds, it crashed again.
The root cause is that write_buf._len overflowed in
Journaler::append_entry().
This patch tries to fix the problem through the following changes:
1. limit file/dir's xattr size
2. throttle journal entry append operations
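
A minimal sketch of the first change, in Python for brevity. It only
models the size check; the cap value, the ENOSPC return code and the names
MAX_XATTR_BYTES, xattr_total() and set_xattr() are assumptions for
illustration and are not taken from the patch. The journal throttling in
the second change is not modelled here.

```python
import errno

# Assumed cap on the total xattr bytes per inode; the name and value are
# illustrative, not taken from the patch.
MAX_XATTR_BYTES = 64 * 1024


def xattr_total(xattrs):
    """Total size of all keys and values, in bytes."""
    return sum(len(k) + len(v) for k, v in xattrs.items())


def set_xattr(xattrs, key, value):
    """Store key=value unless the inode's xattrs would exceed the cap."""
    current = xattr_total(xattrs)
    if key in xattrs:
        # An entry being replaced must not be counted twice.
        current -= len(key) + len(xattrs[key])
    if current + len(key) + len(value) > MAX_XATTR_BYTES:
        return -errno.ENOSPC
    xattrs[key] = value
    return 0
```

Rejecting the request up front keeps any single update, and therefore the
journal entry that records it, bounded in size.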
Fixes: http://tracker.ceph.com/issues/19033
Signed-off-by: Yang Honggang <joseph.yang@xtaotech.com>
test: add explicit braces to avoid ambiguous ‘else’ and to silence warnings
Reviewed-by: Adam C. Emerson <aemerson@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
For the "session evict" admin socket command, return an error message when
we receive an invalid or missing client_id parameter rather than asserting.
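
The behaviour can be sketched as below, in Python rather than the MDS's
own C++. The handler name session_evict(), the argument dictionary, the
set of session ids and the EINVAL return code are all assumptions for
illustration; the point is only that bad input yields an error tuple
instead of an assertion failure.

```python
import errno


def session_evict(cmd_args, sessions):
    """Hypothetical handler: return (rc, message) instead of asserting."""
    raw = cmd_args.get("client_id")
    if raw is None:
        return -errno.EINVAL, "missing client_id parameter"
    try:
        client_id = int(raw)
    except (TypeError, ValueError):
        return -errno.EINVAL, "invalid client_id: %r" % (raw,)
    # Evict the session if present; sessions is a set of client ids.
    sessions.discard(client_id)
    return 0, ""
```

A caller can then report the message to the user and carry on running.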
Signed-off-by: Brad Hubbard <bhubbard@redhat.com>