In lookup_pool and pool_delete, a lock is taken
before invoking wait_for_osdmap, but it is not
released when the call fails. Release the lock on the failure path as well.
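The pattern behind this fix can be sketched as follows. This is a minimal illustration, not the librados code: the Client struct, the simplified wait_for_osdmap() and the lock_is_free() helper are hypothetical stand-ins, and std::lock_guard is only one assumed way to guarantee the release on every return path.

```cpp
#include <cassert>
#include <mutex>
#include <string>

// Hypothetical stand-in for the client object; not the real librados class.
struct Client {
  std::mutex lock;
  bool osdmap_ready = false;

  // Stand-in for wait_for_osdmap(): 0 on success, negative on failure.
  int wait_for_osdmap() { return osdmap_ready ? 0 : -2; }

  // The guard releases the lock on every return path, including the
  // early return taken when wait_for_osdmap() fails.
  int lookup_pool(const std::string& name) {
    std::lock_guard<std::mutex> guard(lock);
    int r = wait_for_osdmap();
    if (r < 0)
      return r;  // the guard's destructor unlocks here
    return name.empty() ? -1 : 0;  // placeholder for the real lookup
  }
};

// Returns true if the lock can be acquired, i.e. nobody left it held.
bool lock_is_free(Client& c) {
  if (!c.lock.try_lock())
    return false;
  c.lock.unlock();
  return true;
}
```

With this shape, a failed wait_for_osdmap() can no longer leave the lock held for the next caller.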
Fixes: #9022
Signed-off-by: Pavan Rallabhandi <pavan.rallabhandi@sandisk.com>
Rationale: I found I had created a series of OSD directories under "/dev/" when disks I thought existed did not exist.
Warning: This change will be noticed by end users and may affect deployment infrastructures.
Signed-off-by: Owen Synge <osynge@suse.com>
need_to_wait wasn't passed into processor->throttle_data(). This was
broken in fix for #8937.
CID 1229541: (PW.PARAM_SET_BUT_NOT_USED)
Backport: firefly
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
(cherry picked from commit e93818df33)
Fixes: #8937
Following the fix to #8928 we end up accumulating pending data that
needs to be written. Beforehand it was working fine because we were
feeding it with the exact amount of bytes we were writing.
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
(cherry picked from commit 0553890e79)
The ceph_erasure_code_benchmark output is converted into a JSON series
suitable to display in HTML with the http://www.flotcharts.org/
library. A self-contained copy of the HTML, JS, CSS files is included for
durability and can be used from the source tree with:
CEPH_ERASURE_CODE_BENCHMARK=src/ceph_erasure_code_benchmark \
PLUGIN_DIRECTORY=src/.libs \
qa/workunits/erasure-code/bench.sh fplot jerasure |
tee qa/workunits/erasure-code/bench.js
and display with:
firefox qa/workunits/erasure-code/bench.html
Signed-off-by: Loic Dachary <loic@dachary.org>
Remove partial list of contributions since Cloudwatt copyright has been
placed in the copyright notices of the files where works covered by
copyright have been included.
Signed-off-by: Loic Dachary <loic@dachary.org>
Expand the default suite to enumerate all cases that are relevant to the
current code base so that it is easier to consume. Namely it means
* iterating over object sizes of 4KB (what is used by default) and
1MB (what was previously benchmarked)
* grouping results in series that would make sense to plot to get the
behavior of a given technique for a series of K/M values and all
possible erasures.
Instead of specifying the iterations to run, set the size of the total
data set to be exercised and compute the iterations by dividing it by
the object size. Since the object size varies, it is impractical to
preset the number of iterations and get meaningful results.
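The arithmetic is simple enough to show directly. The helper below is a hypothetical illustration in C++ (the benchmark driver itself is a shell script); the name iterations_for and the byte units are assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: derive the number of iterations from the total
// data set size and the object size, both expressed in bytes.
std::uint64_t iterations_for(std::uint64_t total_size,
                             std::uint64_t object_size) {
  // Guard against a zero object size instead of dividing by zero.
  return object_size == 0 ? 0 : total_size / object_size;
}
```

For a fixed total size, 4KB objects run 256 times more iterations than 1MB objects, which is why a preset iteration count would not give comparable results.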
The PARAMETERS environment variable is added to enable the caller to
inject --parameter jerasure-variant=generic, for instance.
The packet size is calculated based on the other parameters. The
options are limited when packets are small (4KB) and it would not make a
real difference to give control over it. The packet size is capped to
a maximum of 3100 bytes which is roughly what has been found to be an
optimal value for large packets (1MB).
Signed-off-by: Loic Dachary <loic@dachary.org>
The jerasure-variant parameter is interpreted as the name of the plugin
variant to be loaded regardless of the available CPU features. The
values can be sse3, sse4, generic. It is undocumented and meant for
benchmarking purposes, primarily to force the generic plugin to be
loaded when the sse4 would be chosen.
Signed-off-by: Loic Dachary <loic@dachary.org>
Only output a message about adjusting the buffer size when it is
adjusted, not when the size does not need adjustment.
Signed-off-by: Loic Dachary <loic@dachary.org>
jerasure expects chunk sizes that are aligned on the largest possible
vector size that could be used by SSE instructions, when available (
LARGEST_VECTOR_WORDSIZE == 16 bytes ).
For techniques derived from Cauchy, encoding and decoding is done by
subdividing the chunk into packets of packetsize bytes. The operations
are done w * packetsize bytes at a time. It follows that each chunk must
have a size that is a multiple of w * packetsize bytes.
For techniques derived from Vandermonde, it is enough for a chunk to be
a multiple of w * LARGEST_VECTOR_WORDSIZE.
ErasureCodeJerasure::get_alignment returns a size alignment constraint
that has to be enforced as a multiple of the object size. The resulting
object size then has to match the chunk constraints described above
although they have no relationship with K. For Cauchy, it leads to
excessive padding, making it impossible to set sensible parameters
when the object size is small.
When the per_chunk_alignement data member is true, the semantics of
ErasureCodeJerasure::get_alignment change: it returns a size alignment
constraint to be enforced as a multiple of the chunk size. The
ErasureCodeJerasure::get_chunk_size method is modified to use the new
semantics when appropriate.
The jerasure-per-chunk-alignement parameter is parsed to set
per_chunk_alignement for the Vandermonde and Cauchy techniques.
The memory address of a chunk is implicitly aligned to a page boundary
because it is allocated with buffer::create_page_aligned.
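The per-chunk constraints described above can be sketched as follows. This is a minimal illustration and not the ErasureCodeJerasure implementation: the function names are hypothetical, and splitting the object into k chunks by rounding up before padding is an assumption.

```cpp
#include <cassert>
#include <cstddef>

// 16 bytes, as stated in the description above.
const std::size_t LARGEST_VECTOR_WORDSIZE = 16;

// Cauchy-derived techniques: a chunk must be a multiple of w * packetsize.
std::size_t cauchy_chunk_alignment(std::size_t w, std::size_t packetsize) {
  return w * packetsize;
}

// Vandermonde-derived techniques: a multiple of w * LARGEST_VECTOR_WORDSIZE.
std::size_t vandermonde_chunk_alignment(std::size_t w) {
  return w * LARGEST_VECTOR_WORDSIZE;
}

// Split the object into k chunks (rounding up), then pad each chunk up to
// the next multiple of the per-chunk alignment.
std::size_t padded_chunk_size(std::size_t object_size, std::size_t k,
                              std::size_t alignment) {
  std::size_t chunk = (object_size + k - 1) / k;
  std::size_t tail = chunk % alignment;
  return tail ? chunk + (alignment - tail) : chunk;
}
```

For example, with k=4, w=8 and a packetsize of 64 bytes, a 4096 byte object yields 1024 byte chunks with no padding, because 1024 is already a multiple of 512.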
http://tracker.ceph.com/issues/8475
Fixes: #8475
Signed-off-by: Loic Dachary <loic@dachary.org>
Enforce the restriction at initialization time, the same way it is done
for Reed Solomon. Choosing a w value different from 8,16,32 will lead to
memory corruption that cannot easily be traced to the cause.
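A check of this kind might look like the sketch below. The function name check_w and the -EINVAL return value are assumptions made for illustration, not the code in the plugin.

```cpp
#include <cassert>
#include <cerrno>

// Hypothetical initialization-time check: accept only the w values that
// are known to be safe (8, 16, 32) and reject everything else up front.
int check_w(int w) {
  if (w == 8 || w == 16 || w == 32)
    return 0;
  return -EINVAL;
}
```

Rejecting the value at initialization turns a hard-to-trace memory corruption into an immediate, explicit error.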
Signed-off-by: Loic Dachary <loic@dachary.org>
This makes it deterministic whether we output
2014-08-03 20:59:45.482614 4036c80 -1 did not load config file, using default settings.
or not, and will make the unit tests stop intermittently failing.
Signed-off-by: Sage Weil <sage@redhat.com>
We cannot guarantee that conf->osd_recovery_max_chunk does not change while
recovering an erasure-coded object.
If it changes between RecoveryOp::READING and RecoveryOp::WRITING, it can cause this bug:
2014-07-30 10:12:09.599220 7f7ff26c0700 -1 osd/ECBackend.cc: In function
'void ECBackend::continue_recovery_op(ECBackend::RecoveryOp&,
RecoveryMessages*)' thread 7f7ff26c0700 time 2014-07-30 10:12:09.596837
osd/ECBackend.cc: 529: FAILED assert(pop.data.length() ==
sinfo.aligned_logical_offset_to_chunk_offset(
after_progress.data_recovered_to -
op.recovery_progress.data_recovered_to))
ceph version 0.83-383-g3cfda57
(3cfda577b1)
1: (ECBackend::continue_recovery_op(ECBackend::RecoveryOp&,RecoveryMessages*)+0x1a50) [0x928070]
2: (ECBackend::handle_recovery_read_complete(hobject_t const&,
boost::tuples::tuple<unsigned long, unsigned long, std::map<pg_shard_t,
ceph::buffer::list, std::less<pg_shard_t>,
std::allocator<std::pair<pg_shard_t const, ceph::buffer::list> > >,
boost::tuples::null_type, boost::tuples::null_type,
boost::tuples::null_type, boost::tuples::null_type,
boost::tuples::null_type, boost::tuples::null_type,
boost::tuples::null_type>&, boost::optional<std::map<std::string,
ceph::buffer::list, std::less<std::string>,
std::allocator<std::pair<std::string const, ceph::buffer::list> > > >,
RecoveryMessages*)+0x90c) [0x92952c]
3: (OnRecoveryReadComplete::finish(std::pair<RecoveryMessages*,
ECBackend::read_result_t&>&)+0x121) [0x938481]
4: (GenContext<std::pair<RecoveryMessages*,
ECBackend::read_result_t&>&>::complete(std::pair<RecoveryMessages*,
ECBackend::read_result_t&>&)+0x9) [0x929d69]
5: (ECBackend::complete_read_op(ECBackend::ReadOp&,RecoveryMessages*)+0x63) [0x91c6e3]
6: (ECBackend::handle_sub_read_reply(pg_shard_t, ECSubReadReply&,RecoveryMessages*)+0x96d) [0x920b4d]
7: (ECBackend::handle_message(std::tr1::shared_ptr<OpRequest>)+0x17e)[0x92884e]
8: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>&,ThreadPool::TPHandle&)+0x23b) [0x7b34db]
9: (OSD::dequeue_op(boost::intrusive_ptr<PG>,std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x428)
[0x638d58]
10: (OSD::ShardedOpWQ::_process(unsigned int,ceph::heartbeat_handle_d*)+0x346) [0x6392f6]
11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x8ce)[0xa5caae]
12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xa5ed00]
13: (()+0x8182) [0x7f800b5d3182]
14: (clone()+0x6d) [0x7f800997430d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.
So we only call get_recovery_chunk_size() at RecoveryOp::READING and
record the result in RecoveryOp::extent_requested.
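The idea can be sketched as follows. This is a simplified stand-in, not the ECBackend code: the Config struct and the start_read, finish_read and expected_write_length helpers are hypothetical, and only the extent_requested field and the READING/WRITING states come from the description above.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for ECBackend::RecoveryOp.
struct RecoveryOp {
  enum State { IDLE, READING, WRITING };
  State state = IDLE;
  std::uint64_t extent_requested = 0;  // chunk size sampled at READING
};

// Hypothetical configuration that may be changed at any time.
struct Config {
  std::uint64_t osd_recovery_max_chunk;
};

// Sample the configured chunk size exactly once, when the read starts.
void start_read(RecoveryOp& op, const Config& conf) {
  op.extent_requested = conf.osd_recovery_max_chunk;
  op.state = RecoveryOp::READING;
}

// Move to WRITING without consulting the configuration again.
void finish_read(RecoveryOp& op) {
  op.state = RecoveryOp::WRITING;
}

// The write side relies on the recorded value, so a configuration change
// between READING and WRITING cannot make the two sizes disagree.
std::uint64_t expected_write_length(const RecoveryOp& op) {
  return op.extent_requested;
}
```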
Signed-off-by: Ma Jianpeng <jianpeng.ma@intel.com>
Avoid this deadlock:
- a fault
- delay thread entry gets a fast dispatch message
- drops delay_lock
- calls into fast_dispatch
- reaper tries to reap the pipe
- pipe->join()
- delay_thread->join()
- blocks waiting for delay_thread to exit
- delay thread / fast dispatch blocks on msgr->lock trying to mark_down
The solution is to drop the msgr lock while joining the thread. This will
allow the join() to complete. Adjust the reaper thread to recheck the
exit condition since the lock may have been dropped. The other two callers
do not care.
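A minimal sketch of the approach, using standard C++ threads rather than the Ceph Thread and Mutex classes. The Messenger struct, join_unlocked and reap_once are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical messenger guarded by a single lock.
struct Messenger {
  std::mutex lock;
  int marked_down = 0;

  // Needs the messenger lock, like mark_down in the scenario above.
  void mark_down() {
    std::lock_guard<std::mutex> g(lock);
    ++marked_down;
  }
};

// Called with the messenger lock held through `held`. The lock is dropped
// around join() so a worker blocked on it can finish, then taken again.
void join_unlocked(std::unique_lock<std::mutex>& held, std::thread& worker) {
  held.unlock();
  worker.join();
  held.lock();
}

// Reaper-like caller: holds the lock, joins a worker that itself needs the
// lock, then re-reads the state because the lock was dropped in between.
int reap_once(Messenger& msgr) {
  std::unique_lock<std::mutex> held(msgr.lock);
  std::thread worker([&msgr] { msgr.mark_down(); });
  join_unlocked(held, worker);
  return msgr.marked_down;  // read again under the reacquired lock
}
```

If join() were called with the lock still held, the worker would block in mark_down() forever and the join would never return, which is the deadlock listed above.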
Fixes: #8891
Signed-off-by: Sage Weil <sage@redhat.com>
need_to_wait wasn't passed into processor->throttle_data(). This was
broken in fix for #8937.
CID 1229541: (PW.PARAM_SET_BUT_NOT_USED)
Backport: firefly
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>