Fragmenting a non-auth dirfrag results in several smaller dirfrags. Some
of the resulting dirfrags can be empty and are not used to connect to
the auth subtree.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
If the dirfrags are subtree roots, mark the dirfragtreelock as scattered
dirty; otherwise, journal the dirfragtree change.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
MDCache::handle_cache_expire() ignores mismatched dirfrags. This is
OK during normal operation because the MDS doesn't trim replica inodes
whose dirfrags are likely being fragmented (see commit 22535340).
During recovery, the recovering MDS can receive a survivor MDS's cache
expire message before it sends cache rejoin acks. In this case, there
can still be mismatched dirfrags, but nothing prevents the survivor MDS
from trimming the inodes of these mismatched dirfrags. As a result,
there can be unconnected dirfrags when the recovering MDS sends cache
rejoin acks.
The fix: when a mismatched dirfrag is encountered during recovery,
check whether the dirfrag's inode is still replicated to the sender
MDS. If it is not, remove the sender MDS from the replica maps of all
child dirfrags.
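The check can be sketched as follows. This is a minimal model, not the MDCache code. DirFrag, Inode and handle_mismatched_dirfrag() are hypothetical stand-ins, and each replica map is reduced to MDS rank -> nonce.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical, heavily simplified stand-ins for MDS cache objects.
struct DirFrag {
  std::map<int, unsigned> replica_map;  // replica MDS rank -> nonce
};

struct Inode {
  std::map<int, unsigned> replica_map;  // MDS ranks holding a replica of this inode
  std::vector<DirFrag*> dirfrags;       // current (possibly re-fragmented) dirfrags
};

// A cache expire from `from` names a dirfrag that no longer matches our
// fragmentation. During recovery, if the sender no longer replicates the
// inode, assume it cannot hold replicas of any of its dirfrags either.
void handle_mismatched_dirfrag(Inode& in, int from, bool recovering) {
  if (!recovering)
    return;  // normal operation: ignoring the mismatched expire is safe
  if (in.replica_map.count(from))
    return;  // inode still replicated to the sender; leave dirfrags alone
  for (DirFrag* df : in.dirfrags)
    df->replica_map.erase(from);  // drop sender from every child dirfrag
}
```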
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
For slave rename and rmdir events, the MDS needs to preserve the
non-auth dirfrag in which the renamed inode originally lived until the
slave commit event is encountered. The current method is to use
MDCache::uncommitted_slave_rename_olddir to track any non-auth dirfrag
that needs to be preserved. This method does not work well if a
preserved dirfrag gets fragmented by a log event (such as ESubtreeMap)
between the slave prepare event and the slave commit event.
The fix is to track the dirfrag's inode instead of directly tracking
the dirfrags that need to be preserved.
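The sketch below is a rough illustration of why keying on the inode survives fragmentation. It assumes a simple per-inode count of uncommitted slave events. Dir, Inode and PreservedDirs are hypothetical names, not the MDCache members.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical, simplified cache objects.
struct Dir {};  // a dirfrag
struct Inode {
  std::vector<Dir> dirfrags;
  // Fragmenting replaces the old dirfrag objects with new ones, so a
  // reference to an old dirfrag no longer identifies what must be kept.
  void fragment(int n) { dirfrags.assign(n, Dir{}); }
};

// Hypothetical tracker keyed by the directory inode: counts uncommitted
// slave rename/rmdir events whose source directory must stay in cache.
class PreservedDirs {
  std::map<const Inode*, int> pending_;

 public:
  void prepare(const Inode* dir) { ++pending_[dir]; }
  void commit(const Inode* dir) {
    auto it = pending_.find(dir);
    if (it != pending_.end() && --it->second == 0)
      pending_.erase(it);
  }
  // All current dirfrags of a tracked inode are preserved, however the
  // directory has been re-fragmented since the prepare event.
  bool must_preserve(const Inode* dir) const { return pending_.count(dir) != 0; }
};
```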
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
The OSDs need to support this feature before we allow users to turn it
on. This is similar to what the erasure pool support does.
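The message does not name the feature bit. The sketch below uses a hypothetical FEATURE_NEW_POOL_OPTION and a plain list of per-OSD feature masks to show the general gating idea: refuse to enable the option until every OSD advertises support.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical feature bit; the real name and value are not given here.
constexpr uint64_t FEATURE_NEW_POOL_OPTION = 1ull << 40;

// Allow the option only if every OSD's feature mask contains all
// required bits.
bool can_enable(const std::vector<uint64_t>& osd_features, uint64_t required) {
  for (uint64_t f : osd_features)
    if ((f & required) != required)
      return false;
  return true;
}
```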
Signed-off-by: Sage Weil <sage@inktank.com>
A few cases:
- As we are working through the list, if we see a clone that is lower than
the next one we were expecting, we should be able to skip them.
- If we see a head, we can skip all of the rest of the clones.
- If we get to the end and next_clone was set, we can ignore it.
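Here is a loose sketch of the three cases. It assumes each head is visited first and carries its expected clone list, highest snap first, and that its clones follow in descending snap order. Entry and walk() are hypothetical, and the function returns the expected clones it skipped so the behavior is visible.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scrub-map entry: a head (with its expected clones,
// highest snap first) or a clone identified by its snap id.
struct Entry {
  bool is_head;
  uint64_t snap;                 // clone snap id (unused for heads)
  std::vector<uint64_t> clones;  // expected clones (heads only)
};

std::vector<uint64_t> walk(const std::vector<Entry>& entries) {
  std::vector<uint64_t> skipped;
  std::vector<uint64_t> expected;  // clones expected for the current head
  std::size_t next = 0;            // index of next_clone in `expected`
  for (const Entry& e : entries) {
    if (e.is_head) {
      // A new head: skip all remaining clones of the previous head.
      for (; next < expected.size(); ++next)
        skipped.push_back(expected[next]);
      expected = e.clones;
      next = 0;
      continue;
    }
    // A clone lower than next_clone: skip the expected clones above it.
    while (next < expected.size() && e.snap < expected[next])
      skipped.push_back(expected[next++]);
    if (next < expected.size() && e.snap == expected[next])
      ++next;  // matched the expected clone
  }
  // Reaching the end with next_clone still set is ignored.
  return skipped;
}
```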
Signed-off-by: Sage Weil <sage@inktank.com>
- notice when we are missing a clone (that isn't at the end of the list)
- notice when we are missing a clone on the last object in the scrub map
- do not assert when we are missing a clone
There is still more we could do to improve this (like noticing one missing
clone but still checking the others), but we'll leave that aside for just
a moment...
Signed-off-by: Sage Weil <sage@inktank.com>
Trigger a scrub to verify that we can handle a cache tier that is missing
some clones. We rely on the test harness to notice the error, and we do
not confirm that the scrub happened. In practice this is plenty of time,
however.
Signed-off-by: Sage Weil <sage@inktank.com>
We don't need to worry about the pidfile because that is handled by the
fork functions, which ceph-conf doesn't call.
Signed-off-by: Sage Weil <sage@inktank.com>
If you are querying the conf for an osd and it has a log configured, we
should not generate any log activity.
This isn't super pretty, but it is much less intrusive than wiring a 'do
not log' flag down into CephContext and a zillion other places.
Fixes: #7849
Signed-off-by: Sage Weil <sage@inktank.com>
Fixes: #7876
Need to use the actual content length, not the pointer to the string.
This probably worked before because content_length > 0 correlates with
s->length being non-null.
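The two checks can be contrasted in a small illustration. ReqState is a hypothetical struct that only mirrors the two field names in the message. The case where they diverge is a header that is present but has a value of zero.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical request state; only the field names come from the message.
struct ReqState {
  const char* length;          // raw Content-Length header string, may be null
  std::size_t content_length;  // parsed length
};

// Buggy: tests whether the header string pointer is set.
bool has_body_buggy(const ReqState* s) { return s->length != nullptr; }

// Fixed: tests the actual content length.
bool has_body(const ReqState* s) { return s->content_length > 0; }
```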
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
If the machine running make check has the required CPU features
available, load the SSE optimized plugin and check that it can encode /
decode a simple payload. If the CPU features are not available, only
test the generic plugin and display an informative message about the
tests that were skipped.
Signed-off-by: Loic Dachary <loic@dachary.org>
Test the selection of the plugin depending on the CPU features. The
prefix of the plugin is "jerasure" by default (jerasure_generic,
jerasure_sse3, jerasure_sse4) and can be modified with the
"jerasure-name" parameter. A test plugin is created for each
variant (test_jerasure_generic, test_jerasure_sse3, test_jerasure_sse4).
The flags set by ceph_probe are modified by the test to check if the
expected plugin suffix is appended.
Signed-off-by: Loic Dachary <loic@dachary.org>
The jerasure plugin is compiled with three sets of flags:
* jerasure_generic with no SSE optimization
* jerasure_sse3 with SSE2, SSE3 and SSSE3 optimizations
* jerasure_sse4 with SSE2, SSE3, SSSE3, SSE41, SSE42 and PCLMUL optimizations
The jerasure plugin loads the appropriate plugin depending on the CPU
features detected at runtime.
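The selection rule above can be sketched roughly as follows. It assumes the CPU probe result arrives as a struct of booleans. CpuFeatures and select_plugin() are hypothetical names, while the variant suffixes and required feature sets come from the list above.

```cpp
#include <cassert>
#include <string>

// Hypothetical CPU feature flags, as a runtime probe might report them.
struct CpuFeatures {
  bool sse2 = false, sse3 = false, ssse3 = false;
  bool sse41 = false, sse42 = false, pclmul = false;
};

// Pick the most optimized variant whose required features are all
// present, and append its suffix to the configurable name prefix.
std::string select_plugin(const std::string& prefix, const CpuFeatures& f) {
  bool sse3_ok = f.sse2 && f.sse3 && f.ssse3;
  if (sse3_ok && f.sse41 && f.sse42 && f.pclmul)
    return prefix + "_sse4";
  if (sse3_ok)
    return prefix + "_sse3";
  return prefix + "_generic";
}
```

As with the test described earlier, a test can toggle the flags and check which suffix gets appended.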
http://tracker.ceph.com/issues/7826
Fixes: #7826
Signed-off-by: Loic Dachary <loic@dachary.org>
Rename SIMD to INTEL for clarity.
Instead of aggregating all flags in INTEL_FLAGS, create individual flags
for each feature (INTEL_SSE2_FLAGS etc.) for finer control in the
makefiles.
Signed-off-by: Loic Dachary <loic@dachary.org>
To avoid confusion, the jerasure v1 branch that contains commits pending
review upstream is named v2-ceph and the gf-complete v2 branch is named
v2-ceph.
Signed-off-by: Loic Dachary <loic@dachary.org>
The Mutex scope is restricted to protect only the load() method, not
the factory() method. This allows a plugin to load another plugin from
within the factory() method.
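The following sketch is loosely modeled on the load()/factory() split, but Registry, Plugin and their signatures are hypothetical. Only load() takes the mutex, so a plugin's factory can call back into the registry. If factory() also held the same non-recursive std::mutex, that nested call would relock it on the same thread, which is undefined behavior and typically deadlocks.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

class Registry;

struct Plugin {  // hypothetical plugin interface
  virtual ~Plugin() = default;
  // May call back into the registry, e.g. to load another plugin.
  virtual std::string factory(Registry& r) = 0;
};

class Registry {
  std::mutex lock_;
  std::map<std::string, std::function<std::unique_ptr<Plugin>()>> makers_;
  std::map<std::string, std::unique_ptr<Plugin>> loaded_;

  // Only the load step holds the mutex.
  Plugin* load(const std::string& name) {
    std::lock_guard<std::mutex> l(lock_);
    auto it = loaded_.find(name);
    if (it == loaded_.end())
      it = loaded_.emplace(name, makers_.at(name)()).first;
    return it->second.get();
  }

 public:
  void add(const std::string& name, std::function<std::unique_ptr<Plugin>()> make) {
    std::lock_guard<std::mutex> l(lock_);
    makers_[name] = std::move(make);
  }
  // Runs the plugin's factory without holding the mutex.
  std::string factory(const std::string& name) {
    Plugin* p = load(name);
    return p->factory(*this);
  }
};
```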
Reviewed-By: Christophe Courtaut <christophe.courtaut@gmail.com>
Signed-off-by: Loic Dachary <loic@dachary.org>
If we enable HitSet tracking, the OSD needs to know this, but clients do
not care. Setting the compat version is too heavyweight, as it locks out
older kernels (currently, *any* existing kernel) that are unaffected by
the new fields.
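A minimal model of the general version/compat scheme shows why bumping compat is heavyweight. Each encoding records its own version and the oldest decoder version that can still read it. EncHeader and can_decode() are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical versioned-encoding header: `version` is the encoding the
// writer used; `compat` is the oldest decoder version that can read it.
struct EncHeader {
  uint8_t version;
  uint8_t compat;
};

// A decoder that understands encodings up to `supported` can read a blob
// only if the writer's compat version does not exceed that.
bool can_decode(const EncHeader& h, uint8_t supported) {
  return h.compat <= supported;
}
```

Appending fields while leaving compat unchanged keeps older decoders working; they are assumed to skip the trailing fields they do not understand. Raising compat locks them out.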
Signed-off-by: Sage Weil <sage@inktank.com>