The objecter messenger is only used as a client to initiate client-side
connections to other OSDs. It doesn't need to bind to a port.
This was added in 558d9fc956 to push client
traffic to the cluster interface. This doesn't actually help/work because
we are still connecting to our peers' client-facing addresses.
Signed-off-by: Sage Weil <sage@redhat.com>
Keep the osd thrash test to ensure it is a valid command but make it a
noop by giving it a zero argument (meaning thrash 0 OSD maps).
Remove the loops that were added after the command in an attempt to wait
for the cluster to recover and not pollute the rest of the tests. Actual
testing of osd thrash would require a dedicated cluster because the
side effects are random and it is unnecessarily difficult to ensure they
are finished.
http://tracker.ceph.com/issues/9620 Fixes: #9620
Signed-off-by: Loic Dachary <loic-201408@dachary.org>
Scientific Linux is a RHEL clone and needs to use partx.
Signed-off-by: Dan van der Ster <daniel.vanderster@cern.ch>
(cherry picked from commit 5ca7ea5b53)
If we post an rx buffer and there is a timeout, the revocation can happen
while the reader has consumed the buffers but before it has decoded and
constructed the message. In particular, we calculate a crc32c over the
data portion of the message after we've taken the buffers and dropped the
lock.
Instead of fixing this race (for example, by reverifying rx_buffers under
the lock while calculating the crc.. bleh), just skip the rx buffer
optimization entirely when a timeout is present.
Note that this doesn't cover the op_cancel() paths, but none of those users
provide static buffers to read into.
Fixes: #9582
Backport: firefly, dumpling
Signed-off-by: Sage Weil <sage@redhat.com>
Create an encode_decode() helper method to be called from the
encode_decode test function with various object size arguments. The
helper method is a copy/paste of the previous test that was using a
single object of a fixed size. The test is slightly adapted to
accommodate different object sizes but the logic is not modified.
The object sizes being tested are chosen to be under the size of the
required size alignment or on multiple pages, size aligned or not.
Signed-off-by: Loic Dachary <loic-201408@dachary.org>
Asserting on reaper_stop only made sense if the
messenger had ever been started: as it stood,
one couldn't create and destroy a messenger
without also starting and stopping it.
Signed-off-by: John Spray <john.spray@redhat.com>
The encode tests depend on the alignment constraints, which are now
enforced on a per chunk basis instead of computing a more expensive
object alignment constraint. The test function is modified to take the
change into account but the logic is otherwise unmodified.
Signed-off-by: Loic Dachary <loic-201408@dachary.org>
Because running valgrind with no libtool does not test the binary but
the enclosing shell script.
Signed-off-by: Loic Dachary <loic-201408@dachary.org>
Copy code from the jerasure plugin to enforce alignment constraints per
chunk instead of using the total object size. It is simpler and reduces
the size of the chunks. See c7daaaf5e6 for more information.
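
As an illustration of why per chunk alignment reduces the chunk size,
here is a minimal Python sketch. It is not the plugin code: the function
names and the alignment values (16 bytes per chunk, 256 bytes per
object) are assumptions chosen for the example.

```python
def chunk_size_per_chunk(object_size, k, chunk_alignment):
    """Split the object into k chunks, then round each chunk up."""
    chunk_size = -(-object_size // k)  # ceiling division
    remainder = chunk_size % chunk_alignment
    if remainder:
        chunk_size += chunk_alignment - remainder
    return chunk_size


def chunk_size_per_object(object_size, k, object_alignment):
    """Pad the whole object up to object_alignment, then split it.

    object_alignment is assumed to be a multiple of k.
    """
    remainder = object_size % object_alignment
    padded = object_size
    if remainder:
        padded += object_alignment - remainder
    return padded // k


# Hypothetical values: 100 byte object, k=4 data chunks.
per_chunk = chunk_size_per_chunk(100, 4, 16)
per_object = chunk_size_per_object(100, 4, 256)
print(per_chunk, per_object)
```

With these assumed values the per chunk variant pads 25 bytes up to 32,
while the per object variant pads the object to 256 bytes and yields 64
byte chunks.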
Signed-off-by: Loic Dachary <loic-201408@dachary.org>
Also, explicitly maintain a max number of concurrently trimming
objects.
Fixes: #9113
Backport: dumpling, firefly, giant
Signed-off-by: Samuel Just <sam.just@inktank.com>
Otherwise, we might queue 30 pgs for backfill at 0.80 fullness
and then never check again, filling the osd after pg 11.
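
The failure mode can be sketched with a toy simulation in Python. This
is only an illustration under assumed numbers (fullness in whole
percent, each pg adding 2 points, an 85 percent limit); the function
name and parameters are hypothetical and do not come from the OSD code.

```python
def run_backfills(fullness, pg_cost, queued_pgs, full_limit, recheck):
    """Return (pgs_started, final_fullness), both in whole percent.

    fullness is checked once when the pgs are queued. With recheck=True
    it is checked again before each pg actually starts backfilling.
    """
    started = 0
    if fullness >= full_limit:
        return started, fullness
    for _ in range(queued_pgs):
        if recheck and fullness >= full_limit:
            break
        fullness += pg_cost  # this pg's data lands on the osd
        started += 1
    return started, fullness


# Hypothetical numbers: 30 pgs queued at 80 percent fullness.
print(run_backfills(80, 2, 30, 85, recheck=False))
print(run_backfills(80, 2, 30, 85, recheck=True))
```

Without the repeated check all 30 pgs start and the simulated fullness
goes far past the limit (a value above 100 simply means the disk would
have run out of space). With it, backfill stops after the third pg.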
Fixes: #9574
Backport: dumpling, firefly, giant
Signed-off-by: Samuel Just <sam.just@inktank.com>
Otherwise statfs may fail if mkfs hasn't been run yet or if the monitor
data directory does not exist. There are checks to account for the mon
data dir not existing and we should wait for them to clear before we go
ahead and check the fs stats.
Signed-off-by: Joao Eduardo Luis <joao@redhat.com>
There are two new plugins (isa and lrc). When upgrading a cluster, there
must be a protection against the following scenario:
* the mons are upgraded but not the osds
* a new pool is created using plugin isa
* the osds fail to load the isa plugin because they have not been
upgraded
A feature bit is added: PLUGINS_V2. The monitor will only agree to
create an erasure code profile for the isa or lrc plugin if all OSDs
support PLUGINS_V2. Once such an erasure code profile is stored in the
OSDMap, an OSD can only boot if it supports the PLUGINS_V2 feature,
which means it is able to load the isa and lrc plugins.
The monitors will only activate the PLUGINS_V2 feature if all monitors
in the quorum support it. It protects against the following scenario:
* the leader is upgraded, the peons are not upgraded
* the leader creates a pool with plugin=lrc because all OSD have
the PLUGINS_V2 feature
* the leader goes down and a non upgraded peon becomes the leader
* an old OSD tries to join the cluster
* the new leader will let the OSD boot because it does not contain
the logic that would exclude it
* the old OSD will fail when required to load the plugin lrc
This is going to be needed each time new plugins are added, which is
impractical. A more generic plugin upgrade support should be added
instead, as described in http://tracker.ceph.com/issues/7291.
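
The gating rules above can be summarized with a small Python sketch. It
is one reading of this description, not the monitor code: the bit value
and all function names are hypothetical.

```python
# Hypothetical bit value, for illustration only; the real feature bit
# is defined in the Ceph sources.
PLUGINS_V2 = 1 << 0
V2_PLUGINS = {"isa", "lrc"}


def all_support(feature, feature_sets):
    """True if every daemon's feature bitmask has the given bit set."""
    return all(bool(bits & feature) for bits in feature_sets)


def quorum_enables_v2(mon_features):
    """The feature is only activated if all quorum monitors support it."""
    return all_support(PLUGINS_V2, mon_features)


def may_create_profile(plugin, osd_features, quorum_v2):
    """Profiles for the new plugins need the feature everywhere."""
    if plugin not in V2_PLUGINS:
        return True
    return quorum_v2 and all_support(PLUGINS_V2, osd_features)


def may_boot(osd_feature_bits, plugins_in_osdmap):
    """An OSD without the feature is refused once a v2 plugin is in use."""
    if V2_PLUGINS & set(plugins_in_osdmap):
        return bool(osd_feature_bits & PLUGINS_V2)
    return True


# One non upgraded OSD blocks creation of an isa profile.
print(may_create_profile("isa", [PLUGINS_V2, 0], quorum_v2=True))
# An old OSD may not join once an lrc profile is in the OSDMap.
print(may_boot(0, ["jerasure", "lrc"]))
```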
http://tracker.ceph.com/issues/9343 Refs: #9343
Signed-off-by: Loic Dachary <loic-201408@dachary.org>
Reduces the noise caused by read-only operations via the admin socket.
RW commands are still logged at 'info' level.
Fixes: #9455
Signed-off-by: Joao Eduardo Luis <joao@redhat.com>
We must only expand the log file's channel meta variables upon requiring
a channel's log file. As we may have a 'default' channel that will
cover all channels, we must wait to expand channels as they come in and
do so if they haven't yet been expanded. Expanding the 'log_file' in
place would have the unfortunate side effect of expanding, say,
default=/tmp/whatever.$channel.log
to
default=/tmp/whatever.default.log
which would not be what we wanted upon receiving a message that should
go into channel 'foo' -- assuming we specified no such channel in the
options, channel 'foo' should go into '/tmp/whatever.foo.log'.
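
A minimal Python sketch of the lazy expansion, assuming the option has
already been parsed into a channel to template mapping. The class and
method names are hypothetical; only the $channel variable and the
example paths come from the description above.

```python
class ChannelLogFiles:
    """Expand $channel per channel, on first use, never in place."""

    def __init__(self, templates):
        self._templates = dict(templates)  # kept unexpanded
        self._expanded = {}                # channel -> expanded path

    def path_for(self, channel):
        if channel not in self._expanded:
            # Fall back to the 'default' template for unknown channels.
            template = self._templates.get(channel)
            if template is None:
                template = self._templates["default"]
            self._expanded[channel] = template.replace("$channel", channel)
        return self._expanded[channel]


files = ChannelLogFiles({"default": "/tmp/whatever.$channel.log"})
print(files.path_for("foo"))
print(files.path_for("default"))
```

Because the stored template is never overwritten, a message for channel
'foo' still sees "$channel" and ends up in /tmp/whatever.foo.log even
after the 'default' channel itself has been expanded.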
Signed-off-by: Joao Eduardo Luis <joao@redhat.com>
Keeps backward compatibility when there are entities that do not know
what a channel is. This way we ensure that those messages are logged as
they were expected to be before channels were introduced: to the cluster
log.
Signed-off-by: Joao Eduardo Luis <joao@redhat.com>
and relieve the DataStats struct from clutter by using
ceph_data_stats_t instead of multiple fields.
Signed-off-by: Joao Eduardo Luis <joao@redhat.com>
If we cache_evict a head in a cache pool, we need to prevent
make_writeable() from cloning the head and finish_ctx() from
preserving the snapdir object.
Fixes: #8629
Backport: firefly
Signed-off-by: Sage Weil <sage@redhat.com>
FileStore calls should_commit_now() to determine whether it should
loop and do a second sync (among other things). During shutdown, this
can force us into a livelock: the journal is shutting down, but the
sync_entry loop never completes and repeatedly syncs because the
journal is full. Since the journal is otherwise stopped, no expire
happens and we never become unfull, and we're stuck.
This seems to be triggered semi-reliably by the ceph_objectstore_tool
import function.
Fix by not requesting a sync while shutting down.
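
The livelock and the fix can be illustrated with a toy loop in Python.
This is a sketch under simplifying assumptions, not the FileStore code:
the function and its parameters are hypothetical, and max_iterations
only exists so the unfixed variant terminates in the example.

```python
def sync_entry(journal_full, stopping, honor_stop, max_iterations=1000):
    """Return how many syncs the loop performs before it exits.

    The journal is assumed to stay full, since nothing expires entries
    once it has been stopped.
    """
    syncs = 0
    while syncs < max_iterations:
        syncs += 1  # perform one sync
        # Stand-in for should_commit_now(): a full journal asks for
        # another sync, unless the fix suppresses it during shutdown.
        want_another = journal_full and not (honor_stop and stopping)
        if not want_another:
            break
    return syncs


print(sync_entry(True, True, honor_stop=False))  # spins until the cap
print(sync_entry(True, True, honor_stop=True))   # exits after one sync
```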
Fixes: #9545
Signed-off-by: Sage Weil <sage@redhat.com>