If we fail to cancel the tick_event, we rely on tick() itself to clear
tick_event. I'm not quite sure how we got this wrong in the previous
commit, but this boils down to two cases:
1) shutdown() successfully cancels the event and clears tick_event. tick()
never runs. tick_event == NULL when we finish.
2) shutdown() fails to cancel the event because it has already started. In
this case tick() itself is blocking (or about to block) waiting on the
rlock. When it does run it will clear tick_event itself, then see
initialized == 0 and exit without rescheduling.
Fixes: #9873
Signed-off-by: Sage Weil <sage@redhat.com>
If we have safe_callbacks==false, the stopping flag may have changed while
we were doing our callback. Recheck it and exit to avoid a deadlock on
shutdown.
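A minimal sketch of the recheck described above, assuming a simplified timer loop. ToyTimer and its members are hypothetical names for illustration and do not reproduce the actual timer code; the point is only that the stopping flag is tested again after the lock has been dropped for a callback.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Toy model of a timer thread loop; ToyTimer is hypothetical.
class ToyTimer {
  std::mutex lock_;
  std::condition_variable cond_;
  bool stopping_ = false;
  std::deque<std::function<void()>> pending_;

public:
  void add(std::function<void()> cb) {
    std::lock_guard<std::mutex> l(lock_);
    pending_.push_back(std::move(cb));
    cond_.notify_all();
  }

  void shutdown() {
    std::lock_guard<std::mutex> l(lock_);
    stopping_ = true;
    cond_.notify_all();
  }

  // Runs until shutdown() is observed; returns the number of callbacks run.
  int run() {
    int ran = 0;
    std::unique_lock<std::mutex> l(lock_);
    while (!stopping_) {
      while (!stopping_ && !pending_.empty()) {
        std::function<void()> cb = std::move(pending_.front());
        pending_.pop_front();
        l.unlock();  // "unsafe" mode: the callback runs without the lock
        cb();
        l.lock();
        ++ran;
      }
      // The flag may have flipped while the lock was dropped. Without this
      // recheck we would wait for a wakeup that has already been delivered.
      if (stopping_)
        break;
      cond_.wait(l);
    }
    return ran;
  }
};
```

In this sketch a callback that calls shutdown() causes run() to return right after that callback instead of blocking in wait().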
Signed-off-by: Sage Weil <sage@redhat.com>
A stat filling helper function was added, but only stat and lstat were
updated to use it. This patch makes fstat use it as well. Crucially,
fstat wasn't updating the mode flags.
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Watch/notify ops need to be resent after a pg split occurs, as well as
a few other circumstances that the existing objecter checks did not
catch.
Refactor the check the OSD uses for this to add a version taking the
more basic types instead of the whole OSD map, and stash the needed
info when an op is sent.
Fixes: #9806
Backport: giant, firefly, dumpling
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
We no longer require that a lock on the FD be held for the duration of an
operation, only while accessing the actual index. We cannot, therefore, assume
that a racing read during lfn_unlink (backfill or scrub) does not still have a
reference to the fd. We want to remove the fd from the cache to prevent
subsequent operations from finding it while allowing such a racing read to
complete with its existing fd.
Fixes: #9480
Signed-off-by: Samuel Just <sam.just@inktank.com>
purge detaches the shared_ptr currently associated with the key from
the lru, even if there are still references.
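A minimal sketch of the purge semantics, assuming a toy cache that maps keys to shared_ptr values. ToySharedCache is a hypothetical stand-in and omits the LRU ordering entirely; it only shows that purging removes the cache's entry while an existing holder keeps a valid object.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Toy cache, hypothetical: models only add, lookup and purge.
template <typename K, typename V>
class ToySharedCache {
  std::map<K, std::shared_ptr<V>> entries_;

public:
  std::shared_ptr<V> add(const K& key, V value) {
    std::shared_ptr<V> p = std::make_shared<V>(std::move(value));
    entries_[key] = p;
    return p;
  }

  // Returns an empty pointer when the key is not cached.
  std::shared_ptr<V> lookup(const K& key) const {
    auto it = entries_.find(key);
    if (it == entries_.end())
      return std::shared_ptr<V>();
    return it->second;
  }

  // Drops the cache's reference regardless of outstanding references.
  void purge(const K& key) { entries_.erase(key); }
};
```

After purge, later lookups miss, while a caller that obtained the pointer earlier can finish using it; the object is destroyed when that last reference goes away.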
Signed-off-by: Samuel Just <sam.just@inktank.com>
The method contract specifies that we do not want to delete
value if we are not inserting it, so do not initialize val
at the top of the function to take over value. No current
users appear to trip over this problem (FDCache and
map_cache).
Signed-off-by: Samuel Just <sam.just@inktank.com>
We are dropping the requirement for MON_CAP_R for MMonGetMap.
Reason is simple enough: clients may need to contact the monitors and
obtain the latest monmap before authenticating. This happens, for
instance, when a client calls MonClient::get_monmap_privately(). The
osd uses this function during mkfs, prior to initializing a keyring or
even so much as existing.
Fixes: #9859
Signed-off-by: Joao Eduardo Luis <joao@redhat.com>
If we have a change in the prior set, but not in the up/acting set, we go back
through Reset in order to reset peering state. Previously, we would reset
last_peering_reset in the Reset constructor. This did not, however, reset the
flush_interval, which caused the eventual flush event to be ignored and the
peering messages to not be sent.
Instead, we will always reset_interval_flush if we are actually changing the
last_peering_reset value.
Fixes: #9821
Backport: firefly
Signed-off-by: Samuel Just <sam.just@inktank.com>
Re-create and describe the situation that is fixed by
91a7e18f60bbc9acab3045baaa1b6505474ec4a9 which reworks the buffer
preparation function provided by ErasureCode::encode.
http://tracker.ceph.com/issues/9408 Refs: #9408
Signed-off-by: Loic Dachary <loic-201408@dachary.org>
Added for test purposes; it will also be useful for plugins that must
ensure the chunk size is a multiple of SIMD_ALIGN.
Signed-off-by: Loic Dachary <loic-201408@dachary.org>
The benchmark is supposed to measure the encoding/decoding speed and
not the overhead of buffer realignments.
Signed-off-by: Janne Grunau <j@jannau.net>
Requiring page aligned buffers and realigning the input if necessary
creates measurable overhead. ceph_erasure_code_benchmark is 10-20%
faster, depending on the workload.
Also prevents a misaligned buffer when bufferlist::c_str(bufferlist)
has to allocate a new buffer to provide a contiguous one. See bug #9408
Signed-off-by: Janne Grunau <j@jannau.net>
SIMD optimized erasure code computation needs aligned memory. Buffers
aligned to a page boundary are not needed though. The buffers used
for the erasure code computation are typically smaller than a page.
The typical alignment requirements of SIMD operations are 16 bytes for
SSE2 and NEON and 32 bytes for AVX/AVX2.
Add new prototypes with an align argument, similar to the one enforcing
page alignment. The implementation is exactly the same, except for the
align parameter. The page alignment methods are then implemented as calls
to the more generic methods.
The align parameter is an unsigned (same type as CEPH_PAGE_SIZE). The
CEPH_PAGE_MASK value ( ~(CEPH_PAGE_SIZE - 1) ) was only used as
~CEPH_PAGE_MASK, i.e. the equivalent of (CEPH_PAGE_SIZE - 1) once the double
~~ is reduced. These occurrences are replaced with (align - 1). The type
of CEPH_PAGE_MASK is an unsigned long, probably because it was
~(CEPH_PAGE_SIZE). When using (align - 1) as a mask for both
CEPH_PAGE_SIZE and SIMD alignment there is no need to use an unsigned
long because there is no risk of overflowing the unsigned value.
The CYGWIN specific code is also modified but not tested.
Unit tests are added for the new methods.
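A minimal sketch of the (align - 1) mask arithmetic, assuming align is a non-zero power of two. The helper names are hypothetical and are not the prototypes added by this change.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helpers illustrating the (align - 1) mask.
inline bool is_power_of_two(unsigned align) {
  return align != 0 && (align & (align - 1)) == 0;
}

// True when addr is a multiple of align.
inline bool is_aligned_to(std::uintptr_t addr, unsigned align) {
  assert(is_power_of_two(align));
  return (addr & (align - 1)) == 0;
}

// Round len up to the next multiple of align.
inline std::size_t round_up_to(std::size_t len, unsigned align) {
  assert(is_power_of_two(align));
  const std::size_t mask = align - 1;  // widened before it is inverted
  return (len + mask) & ~mask;
}
```

With align = 32 (the AVX case mentioned above), a length of 33 rounds up to 64, and an address of 48 passes the 16-byte check but fails the 32-byte one.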
Signed-off-by: Janne Grunau <j@jannau.net>
Signed-off-by: Loic Dachary <loic-201408@dachary.org>
The regression was introduced in commit 4fc9fffc49. The problem is that the
cache thinks it's full (when it's not), so it defers the read. This change
frees up cache space if necessary and only defers the read if enough
space cannot be freed.
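A minimal sketch of the "free space first, defer only if that fails" policy, assuming a toy accounting model. ToyCache and its fields are hypothetical and do not reflect the real cache's bookkeeping; the sketch assumes evictable <= used <= capacity.

```cpp
#include <cassert>
#include <cstddef>

// Toy byte accounting for a cache, hypothetical.
struct ToyCache {
  std::size_t capacity;   // total bytes the cache may hold
  std::size_t used;       // bytes currently held
  std::size_t evictable;  // part of `used` that can be freed right now

  // Returns true if the read can proceed now, false if it must be deferred.
  bool try_reserve(std::size_t need) {
    const std::size_t free_now = capacity - used;
    if (free_now < need) {
      const std::size_t shortfall = need - free_now;
      if (shortfall > evictable)
        return false;  // not enough space can be freed: defer the read
      used -= shortfall;  // evict just enough to make room
      evictable -= shortfall;
    }
    used += need;  // space for the in-flight read is not evictable
    return true;
  }
};
```

A deferred request leaves the counters unchanged, so it can be retried once more space becomes evictable.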
Fixes: #9513
Signed-off-by: Adam Crume <adamcrume@gmail.com>
(cherry picked from commit 82175ec94a)
Suites run with CEPH_TEST_CLI_DUP_COMMAND=1, which will send a duplicate
command for every command issued with the 'ceph' tool. Behavior is to
get a reply from the command and then send a duplicate, looking for the
same outcome (guaranteeing idempotency of the operations). However, it
so happens that if you remove the entity's own key from the keyring and
you happen to be unlucky enough so that the client's connection gets
failed (we also run tests with connection failure injections), the
'ceph' tool won't be able to reconnect to the cluster to send the
duplicate command (as its entity no longer exists in the cluster's
keyring).
We rewrite the test instead of resorting to ugly hacks to work around
this behavior, simply having a new 'role-definer' added by the existing
'role-definer' (which we weren't testing anyway, so bonus points for
that) and then have one removing the other (to test the procedure) and
finally using 'client.admin' to remove the last 'role-definer'.
Fixes: #9820
Signed-off-by: Joao Eduardo Luis <joao@redhat.com>
interval_set::insert takes arguments start and len, not end.
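A minimal sketch of the (start, len) versus (start, end) mix-up, assuming a toy container. ToyIntervalSet is a hypothetical stand-in, not the real interval_set: it does not merge or reject overlapping ranges and only mirrors the argument order.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Toy interval container keyed by start offset, hypothetical.
struct ToyIntervalSet {
  std::map<std::uint64_t, std::uint64_t> m;  // start -> len

  // Same argument order as described above: start and len, not end.
  void insert(std::uint64_t start, std::uint64_t len) { m[start] = len; }

  bool contains(std::uint64_t off) const {
    auto it = m.upper_bound(off);
    if (it == m.begin())
      return false;
    --it;  // last interval starting at or before off
    return off - it->first < it->second;
  }
};

// Converts a half-open range [start, end) into the (start, len) form.
inline void insert_range(ToyIntervalSet& s, std::uint64_t start,
                         std::uint64_t end) {
  assert(end >= start);
  s.insert(start, end - start);
}
```

Passing an end offset where a length is expected silently covers too much: insert(100, 150) spans [100, 250) rather than [100, 150).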
Signed-off-by: Henry C Chang <henry.cy.chang@gmail.com>
(cherry picked from commit c95bb59434)
Using || instead of && caused it to always be installed. That was fixed
in EPEL already.
http://tracker.ceph.com/issues/9747 Fixes: #9747
Signed-off-by: Loic Dachary <loic-201408@dachary.org>
(cherry picked from commit 5ff4a850a0)