34729 Commits

Author SHA1 Message Date
xinxin shu
c09036aca2 enable info_log_level config option for rocksdb
Signed-off-by: xinxin shu <xinxin.shu@intel.com>
2014-08-06 02:11:38 +08:00
xinxin shu
b9b022e5de add annotation for rocksdb config option
Signed-off-by: xinxin shu <xinxin.shu@intel.com>
2014-08-05 06:24:44 +08:00
Sage Weil
541006c83d Merge pull request #1875 from dachary/wip-8437
erasure-code: benchmarking jerasure

Reviewed-by: Sage Weil <sage@redhat.com>
2014-08-04 17:41:53 -07:00
Sage Weil
0b445e0f6f Merge remote-tracking branch 'gh/next' 2014-08-04 13:56:24 -07:00
Sage Weil
047c18db6f doc/release-notes: make note about init-radosgw change
This changed back in 524aee6f95 but
was not mentioned in the release notes.

Signed-off-by: Sage Weil <sage@redhat.com>
2014-08-04 13:48:06 -07:00
John Wilkins
354c4112a9 doc: Added 'x' to monitor cap.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2014-08-04 11:47:58 -07:00
Samuel Just
18b7a37c96 Merge pull request #2166 from majianpeng/bug-fix
os/FileJournal: When dumping the journal, use the correct seq to avoid misjudging the journal as corrupt.

Reviewed-by: Samuel Just <sam.just@inktank.com>
2014-08-04 10:33:16 -07:00
Samuel Just
6878e8cdbf Merge pull request #2184 from majianpeng/fix2
ECBackend: Don't directly use get_recovery_chunk_size() in RecoveryOp::WRITING state

Reviewed-by: Samuel Just <sam.just@inktank.com>
2014-08-04 10:32:07 -07:00
Samuel Just
6f05ff8f59 Merge pull request #2194 from majianpeng/fix1
osd/ECBackend: clean up assert(r==0) in continue_recovery_op.

Reviewed-by: Samuel Just <sam.just@inktank.com>
2014-08-04 10:31:18 -07:00
Samuel Just
3897f09a70 Merge pull request #2192 from ceph/wip-8891
msg/SimpleMessenger: drop msgr lock when joining a Pipe

Reviewed-by: Samuel Just <sam.just@inktank.com>
2014-08-04 10:30:25 -07:00
Yehuda Sadeh
7b2c8b3310 cls_rgw: fix object name of objects removed on object creation
Fixes: #8972
Backport: firefly, dumpling

Reported-by: Patrycja Szabłowska <szablowska.patrycja@gmail.com>
Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
(cherry picked from commit 0f8929a68a)
2014-08-04 10:04:55 -07:00
Yehuda Sadeh
8519e9ab06 rgw: need to pass need_to_wait for throttle_data()
need_to_wait wasn't passed into processor->throttle_data(). This was
broken in fix for #8937.

CID 1229541:    (PW.PARAM_SET_BUT_NOT_USED)

Backport: firefly

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
(cherry picked from commit e93818df33)
2014-08-04 10:03:40 -07:00
Yehuda Sadeh
062062479a rgw: call processor->handle_data() again if needed
Fixes: #8937

Following the fix to #8928 we end up accumulating pending data that
needs to be written. Beforehand it was working fine because we were
feeding it with the exact amount of bytes we were writing.

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
(cherry picked from commit 0553890e79)
2014-08-04 09:52:51 -07:00
Sage Weil
2f44d768e7 Merge pull request #2191 from ceph/wip-rgw-need-to-wait
rgw: need to pass need_to_wait for throttle_data()

Reviewed-by: Sage Weil <sage@redhat.com>
2014-08-04 09:51:43 -07:00
Loic Dachary
78b2c10aa8 Merge pull request #2195 from apeters1971/wip-ec-isa-fast-xor
EC-ISA: provide a 10% faster simple parity operation for (k, m=1)

Reviewed-by: Loic Dachary <loic@dachary.org>
2014-08-04 18:41:05 +02:00
Loic Dachary
b22a8a9e60 Merge pull request #2193 from ceph/wip-ceph-conf
ceph-conf: flush log on exit

Reviewed-by: Loic Dachary <loic@dachary.org>
2014-08-04 18:28:49 +02:00
Andreas-Joachim Peters
516101ae99 EC-ISA: provide a 10% faster simple parity operation for (k, m=1). Add simple parity unit test for k=4,m=1 2014-08-04 15:51:15 +02:00
Ma Jianpeng
985b7c2fc2 osd/ECBackend: clean up assert(r==0) in continue_recovery_op.
After commit d9106ce5e4, the assert(r==0) is no longer
necessary.
2014-08-04 18:02:12 +08:00
Loic Dachary
8363a94a60 erasure-code: HTML display of benchmark results
The ceph_erasure_code_benchmark output is converted into a JSON series
suitable to display in HTML with the http://www.flotcharts.org/
library. A self contained copy of the HTML,JS,CSS files is included for
durability and can be used from the source tree with:

    CEPH_ERASURE_CODE_BENCHMARK=src/ceph_erasure_code_benchmark  \
    PLUGIN_DIRECTORY=src/.libs \
        qa/workunits/erasure-code/bench.sh fplot jerasure |
        tee qa/workunits/erasure-code/bench.js

and display with:

    firefox qa/workunits/erasure-code/bench.html

Signed-off-by: Loic Dachary <loic@dachary.org>
2014-08-04 11:42:05 +02:00
Loic Dachary
3cc723450a COPYING: Cloudwatt copyright is inline
Remove partial list of contributions since Cloudwatt copyright has been
placed in the copyright notices of the files where works covered by
copyright have been included.

Signed-off-by: Loic Dachary <loic@dachary.org>
2014-08-04 11:42:03 +02:00
Loic Dachary
e11c3fcc3b erasure-code: rework benchmark suite
Expand the default suite to enumerate all cases that are relevant to the
current code base so that it is easier to consume. Namely it means

 * iterating over object sizes of 4KB (what is used by default) and
   1MB (what was previously benchmarked)
 * grouping results in series that would make sense to plot to get the
   behavior of a given technique for a series of K/M values and all
   possible erasures.

Instead of specifying the iterations to run, set the size of the total
data set to be exercised and compute the iterations by dividing it by
the object size. Since the object size varies, it is impractical to
preset the number of iterations and get meaningful results.
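
As a rough illustration of the paragraph above, the iteration count can be derived from a fixed total data set size. This is a minimal Python sketch, not the bench.sh code; the function name and the 1 GiB total are assumptions made up for the example.

```python
KiB = 1024
MiB = 1024 * KiB


def iterations_for(total_size: int, object_size: int) -> int:
    """Return how many objects of object_size fit in total_size bytes."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    return total_size // object_size


if __name__ == "__main__":
    total_size = 1024 * MiB  # hypothetical total; the commit does not state one
    for object_size in (4 * KiB, 1 * MiB):
        print(object_size, iterations_for(total_size, object_size))
```

With a fixed total, small objects get many iterations and large objects get few, so each run exercises a comparable amount of data.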

The PARAMETERS environment variable is added to enable the caller to
inject --parameter jerasure-variant=generic, for instance.

The packet size is calculated based on the other parameters. The
options are limited when packets are small (4KB) and it would not make a
real difference to give control over it. The packet size is capped to
a maximum of 3100 bytes which is roughly what has been found to be an
optimal value for large packets (1MB).

Signed-off-by: Loic Dachary <loic@dachary.org>
2014-08-04 11:42:01 +02:00
Loic Dachary
90592e9d0e erasure-code: properly indent ErasureCodePluginSelectJerasure.cc
Signed-off-by: Loic Dachary <loic@dachary.org>
2014-08-04 11:42:00 +02:00
Loic Dachary
be3e1e4097 erasure-code: control jerasure plugin variant selection
The jerasure-variant parameter is interpreted as the name of the plugin
variant to be loaded regardless of the available CPU features. The
values can be sse3, sse4, generic. It is undocumented and meant for
benchmarking purposes, primarily to force the generic plugin to be
loaded when the sse4 would be chosen.
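
The override described above could look roughly like the following Python sketch. It is not the plugin's C++ code: the feature names in cpu_features and the default preference order are simplifying assumptions, and only the three variant names come from the commit message.

```python
from typing import Optional, Set

VARIANTS = ("sse4", "sse3", "generic")  # names taken from the commit message


def select_variant(cpu_features: Set[str], forced: Optional[str] = None) -> str:
    """Pick a plugin variant; a forced value wins over CPU detection."""
    if forced is not None:
        if forced not in VARIANTS:
            raise ValueError("unknown variant: %s" % forced)
        return forced
    # Assumed default: prefer the most capable variant the CPU supports.
    for variant in ("sse4", "sse3"):
        if variant in cpu_features:
            return variant
    return "generic"


if __name__ == "__main__":
    features = {"sse3", "sse4"}
    print(select_variant(features))                    # CPU-based choice
    print(select_variant(features, forced="generic"))  # benchmarking override
```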

Signed-off-by: Loic Dachary <loic@dachary.org>
2014-08-04 11:41:58 +02:00
Loic Dachary
5fb4354f35 erasure-code: reduce jerasure verbosity
Only output a message about adjusting the buffer size when it is
adjusted, not when the size does not need adjustment.

Signed-off-by: Loic Dachary <loic@dachary.org>
2014-08-04 11:41:56 +02:00
Loic Dachary
c7daaaf5e6 erasure-code: implement alignment on chunk sizes
jerasure expects chunk sizes that are aligned on the largest possible
vector size that could be used by SSE instructions, when available
(LARGEST_VECTOR_WORDSIZE == 16 bytes).

For techniques derived from Cauchy, encoding and decoding is done by
subdividing the chunk into packets of packetsize bytes. The operations
are done w * packetsize bytes at a time. It follows that each chunk must
have a size that is a multiple of w * packetsize bytes.

For techniques derived from Vandermonde, it is enough for a chunk to be
a multiple of w * LARGEST_VECTOR_WORDSIZE.
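
The two chunk constraints above can be written down as a small Python sketch. The helper names are hypothetical and the technique labels are simplified to "cauchy" and "vandermonde"; only the formulas and the value 16 come from the text.

```python
LARGEST_VECTOR_WORDSIZE = 16  # bytes, as stated in the commit message


def chunk_multiple(technique: str, w: int, packetsize: int = 0) -> int:
    """Return the size a chunk must be a multiple of."""
    if technique == "cauchy":
        return w * packetsize
    if technique == "vandermonde":
        return w * LARGEST_VECTOR_WORDSIZE
    raise ValueError("unknown technique: %s" % technique)


def round_up(size: int, multiple: int) -> int:
    """Round size up to the next multiple (ceiling division, then scale)."""
    return -(-size // multiple) * multiple


if __name__ == "__main__":
    m = chunk_multiple("cauchy", w=8, packetsize=32)
    print(m, round_up(1000, m))
```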

ErasureCodeJerasure::get_alignment returns a size alignment constraint
that has to be enforced as a multiple of the object size. The resulting
object size then has to match the chunk constraints described above
although they have no relationship with K. For Cauchy, it leads to
excessive padding, making it impossible to set sensible parameters for
when the object size is small.

When the per_chunk_alignement data member is true, the semantic of
ErasureCodeJerasure::get_alignment is changed to return a size alignment
constraint to be enforced as a multiple of the chunk size. The
ErasureCodeJerasure::get_chunk_size method is modified to use the new
semantic when appropriate.
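
A minimal Python sketch of the per-chunk semantic, assuming the chunk size is the object size divided by K (rounded up) and then padded to the alignment. The function name is hypothetical and this is not the actual ErasureCodeJerasure::get_chunk_size implementation.

```python
def chunk_size_per_chunk(object_size: int, k: int, alignment: int) -> int:
    """Split object_size over k data chunks, then pad each chunk to alignment."""
    chunk_size = -(-object_size // k)  # ceiling division
    remainder = chunk_size % alignment
    if remainder:
        chunk_size += alignment - remainder
    return chunk_size


if __name__ == "__main__":
    # A 4KB object over k=3 chunks with a 128-byte chunk alignment.
    print(chunk_size_per_chunk(4096, 3, 128))
```

Because the padding is applied to each chunk rather than to the whole object, the amount of padding is bounded by the alignment and does not grow with K.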

The jerasure-per-chunk-alignement parameter is parsed to set
per_chunk_alignement for the Vandermonde and Cauchy techniques.

The memory address of a chunk is implicitly aligned to a page boundary
because it is allocated with buffer::create_page_aligned.

http://tracker.ceph.com/issues/8475
Fixes: #8475

Signed-off-by: Loic Dachary <loic@dachary.org>
2014-08-04 10:54:21 +02:00
Loic Dachary
3987ac2a41 erasure-code: cauchy techniques allow w 8,16,32
Enforce the restriction at initialization time, the same way it is done
for Reed Solomon. Choosing a w value different from 8,16,32 will lead to
memory corruption that cannot easily be traced to the cause.
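
The check amounts to rejecting unsupported values up front, before any encoding takes place. A minimal Python sketch with a hypothetical function name:

```python
ALLOWED_W = (8, 16, 32)  # values listed in the commit message


def check_w(w: int) -> int:
    """Validate w at initialization time instead of failing later."""
    if w not in ALLOWED_W:
        raise ValueError("w=%d must be one of %s" % (w, ALLOWED_W))
    return w


if __name__ == "__main__":
    print(check_w(8))
```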

Signed-off-by: Loic Dachary <loic@dachary.org>
2014-08-04 10:54:21 +02:00
Sage Weil
3230060f07 ceph-conf: flush log on exit
This makes it deterministic whether we output

2014-08-03 20:59:45.482614 4036c80 -1 did not load config file, using default settings.

or not, and will make the unit tests stop intermittently failing.

Signed-off-by: Sage Weil <sage@redhat.com>
2014-08-03 21:00:51 -07:00
Ma Jianpeng
076f33afb3 ECBackend: Don't directly use get_recovery_chunk_size() in RecoveryOp::WRITING state.
We cannot guarantee that conf->osd_recovery_max_chunk doesn't change while
recovering an erasure-coded object.
If it changes between RecoveryOp::READING and RecoveryOp::WRITING, it can cause this bug:

2014-07-30 10:12:09.599220 7f7ff26c0700 -1 osd/ECBackend.cc: In function
'void ECBackend::continue_recovery_op(ECBackend::RecoveryOp&,
RecoveryMessages*)' thread 7f7ff26c0700 time 2014-07-30 10:12:09.596837
osd/ECBackend.cc: 529: FAILED assert(pop.data.length() ==
sinfo.aligned_logical_offset_to_chunk_offset(
after_progress.data_recovered_to -
op.recovery_progress.data_recovered_to))

 ceph version 0.83-383-g3cfda57
(3cfda577b1)
 1: (ECBackend::continue_recovery_op(ECBackend::RecoveryOp&,RecoveryMessages*)+0x1a50) [0x928070]
 2: (ECBackend::handle_recovery_read_complete(hobject_t const&,
boost::tuples::tuple<unsigned long, unsigned long, std::map<pg_shard_t,
ceph::buffer::list, std::less<pg_shard_t>,
std::allocator<std::pair<pg_shard_t const, ceph::buffer::list> > >,
boost::tuples::null_type, boost::tuples::null_type,
boost::tuples::null_type, boost::tuples::null_type,
boost::tuples::null_type, boost::tuples::null_type,
boost::tuples::null_type>&, boost::optional<std::map<std::string,
ceph::buffer::list, std::less<std::string>,
std::allocator<std::pair<std::string const, ceph::buffer::list> > > >,
RecoveryMessages*)+0x90c) [0x92952c]
 3: (OnRecoveryReadComplete::finish(std::pair<RecoveryMessages*,
ECBackend::read_result_t&>&)+0x121) [0x938481]
 4: (GenContext<std::pair<RecoveryMessages*,
ECBackend::read_result_t&>&>::complete(std::pair<RecoveryMessages*,
ECBackend::read_result_t&>&)+0x9) [0x929d69]
 5: (ECBackend::complete_read_op(ECBackend::ReadOp&,RecoveryMessages*)+0x63) [0x91c6e3]
 6: (ECBackend::handle_sub_read_reply(pg_shard_t, ECSubReadReply&,RecoveryMessages*)+0x96d) [0x920b4d]
 7: (ECBackend::handle_message(std::tr1::shared_ptr<OpRequest>)+0x17e)[0x92884e]
 8: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>&,ThreadPool::TPHandle&)+0x23b) [0x7b34db]
 9: (OSD::dequeue_op(boost::intrusive_ptr<PG>,std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x428)
[0x638d58]
 10: (OSD::ShardedOpWQ::_process(unsigned int,ceph::heartbeat_handle_d*)+0x346) [0x6392f6]
 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x8ce)[0xa5caae]
 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xa5ed00]
 13: (()+0x8182) [0x7f800b5d3182]
 14: (clone()+0x6d) [0x7f800997430d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.

So we only call get_recovery_chunk_size() in RecoveryOp::READING and
record the result in RecoveryOp::extent_requested.
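
The idea of the fix can be illustrated with a small Python sketch: the size is read from the configuration once, in the reading state, and the writing state uses the recorded value. The classes and helper functions are simplified stand-ins, not the ECBackend code; only the names osd_recovery_max_chunk and extent_requested come from the commit message.

```python
from dataclasses import dataclass


@dataclass
class Config:
    osd_recovery_max_chunk: int  # may be changed at runtime


@dataclass
class RecoveryOp:
    state: str = "READING"
    extent_requested: int = 0


def start_read(op: RecoveryOp, conf: Config) -> int:
    """READING: sample the config once and remember what was requested."""
    op.extent_requested = conf.osd_recovery_max_chunk
    op.state = "WRITING"
    return op.extent_requested


def expected_write_length(op: RecoveryOp) -> int:
    """WRITING: use the recorded value, never the live config."""
    return op.extent_requested


if __name__ == "__main__":
    conf = Config(osd_recovery_max_chunk=8 << 20)
    op = RecoveryOp()
    start_read(op, conf)
    conf.osd_recovery_max_chunk = 1 << 20  # config changes mid-recovery
    print(expected_write_length(op) == 8 << 20)
```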

Signed-off-by: Ma Jianpeng <jianpeng.ma@intel.com>
2014-08-04 11:09:32 +08:00
Sage Weil
98997f3b22 msg/SimpleMessenger: drop msgr lock when joining a Pipe
Avoid this deadlock:

- a fault
- delay thread entry gets a fast dispatch message
 - drops delay_lock
 - calls into fast_dispatch
- reaper tries to reap the pipe
 - pipe->join()
  - delay_thread->join()
   - blocks waiting for delay_thread to exit
- delay thread / fast dispatch blocks on msgr->lock trying to mark_down

The solution is to drop the msgr lock while joining the thread.  This will
allow the join() to complete.  Adjust the reaper thread to recheck the
exit condition since the lock may have been dropped.  The other two callers
do not care.
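
The pattern can be reproduced in a few lines of Python. This is only a sketch of the locking idea under simplified assumptions (one lock, one thread to join); the function names are hypothetical and it does not model the real Pipe or reaper code.

```python
import threading


def reap_pipe(msgr_lock, delay_thread, events):
    """Caller holds msgr_lock on entry and holds it again on return."""
    msgr_lock.release()      # drop the lock so the thread can finish
    try:
        delay_thread.join()
    finally:
        msgr_lock.acquire()  # re-take it; state may have changed meanwhile
    events.append("reaped")


def run_demo():
    msgr_lock = threading.Lock()
    events = []

    def delay_thread_body():
        # Stands in for fast dispatch calling mark_down, which needs the lock.
        with msgr_lock:
            events.append("mark_down")

    delay_thread = threading.Thread(target=delay_thread_body)
    with msgr_lock:
        delay_thread.start()  # blocks on msgr_lock until it is dropped
        reap_pipe(msgr_lock, delay_thread, events)
    return events


if __name__ == "__main__":
    print(run_demo())
```

If reap_pipe called join() without releasing the lock first, the thread would never acquire it and the join would never return, which is the deadlock listed above.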

Fixes: #8891
Signed-off-by: Sage Weil <sage@redhat.com>
2014-08-03 18:26:34 -07:00
Sage Weil
e36babc825 os/MemStore: fix lock leak
CID 1228868 (#2-1 of 2): Missing unlock (LOCK)
12. missing_unlock: Returning without unlocking oc->lock.L.

Signed-off-by: Sage Weil <sage@redhat.com>
2014-08-03 11:23:33 -07:00
Yehuda Sadeh
e93818df33 rgw: need to pass need_to_wait for throttle_data()
need_to_wait wasn't passed into processor->throttle_data(). This was
broken in fix for #8937.

CID 1229541:    (PW.PARAM_SET_BUT_NOT_USED)

Backport: firefly

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
2014-08-02 20:40:37 -07:00
Sage Weil
3de7b7c52c doc/release-notes: fix syntax error
Attempt 2...

ERROR: /srv/autobuild-ceph/gitbuilder.git/build/doc/release-notes.rst:22: Unknown target name: "leveldb".

Signed-off-by: Sage Weil <sage@redhat.com>
2014-08-01 21:19:26 -07:00
Sage Weil
3caf1e3f8e Merge pull request #2188 from wonzhq/obj-mtime
osd: add local_mtime to struct object_info_t

Reviewed-by: Sage Weil <sage@redhat.com>
2014-08-01 19:27:01 -07:00
Sage Weil
c95e91ef1a os/KeyValueStore: clean up operator<< for KVSuperBlock
Signed-off-by: Sage Weil <sage@redhat.com>
2014-08-01 19:24:26 -07:00
Sage Weil
e408f98d89 Merge pull request #2174 from yuyuyu101/kvstore-superblock
Kvstore superblock

Reviewed-by: Sage Weil <sage@redhat.com>
2014-08-01 19:23:35 -07:00
Sage Weil
3a05ff9257 Merge pull request #2169 from ceph/wip-double-pc
mon: s/%%/%/

Realized where these came from; it was an accident.
2014-08-01 18:01:43 -07:00
Sage Weil
79d1aff182 Merge branch 'wip-cache-second'
Reviewed-by: Samuel Just <sam.just@inktank.com>
2014-08-01 15:37:33 -07:00
Signed-off-by: Zhiqiang Wang
1417eded65 ceph_test_rados_api_tier: test promote-on-second-read behavior
Signed-off-by: Zhiqiang Wang <wonzhq@hotmail.com>
Signed-off-by: Sage Weil <sage@redhat.com>
2014-08-01 15:37:22 -07:00
Zhiqiang Wang
0ed3adc1e0 osd: promotion on 2nd read for cache tiering
Signed-off-by: Zhiqiang Wang <wonzhq@hotmail.com>
2014-08-01 15:37:22 -07:00
Samuel Just
52c2182fe8 Merge pull request #2183 from majianpeng/master
ECBackend: Using ROUND_UP_TO to refactor function get_recovery_chunk_size()

Reviewed-by: Samuel Just <sam.just@inktank.com>
2014-08-01 13:31:08 -07:00
Samuel Just
dad092c6e5 Merge pull request #2175 from majianpeng/fix1
ReplicatedPG: For async-read, set the real result after completing read.

Reviewed-by: Samuel Just <sam.just@inktank.com>
2014-08-01 13:30:04 -07:00
Sage Weil
f752ff49fe Merge pull request #2180 from ceph/wip-ec-isa
osd: add support for intel ISA-L EC library
2014-08-01 10:00:23 -07:00
Samuel Just
f335c73b12 Merge pull request #2172 from ceph/wip-8714
osd: prevent old clients from using tiered pools

Reviewed-by: Samuel Just <sam.just@inktank.com>
2014-08-01 09:57:24 -07:00
Sage Weil
cd64a63db0 Merge remote-tracking branch 'gh/next' 2014-08-01 07:08:28 -07:00
Gregory Farnum
a00777f428 Merge pull request #2190 from ceph/wip-osd-leaks
osd: do not leak Session* ref in _send_boot()

Reviewed-by: Greg Farnum <greg@inktank.com>
2014-08-01 10:08:16 -04:00
Gregory Farnum
fc2d18bb1e Merge pull request #2182 from ceph/wip-round
use llrintl when converting double to micro

Reviewed-by: Greg Farnum <greg@inktank.com>
2014-08-01 08:11:08 -04:00
Zhiqiang Wang
13b9dc7084 osd: add local_mtime to struct object_info_t
This fixes a bug that occurs when the clocks of the OSDs and clients are
not synchronized (especially when the client is ahead of the OSD): once the
cache tier dirty ratio reaches the threshold, the agent skips the flush
work because it thinks the object is too young.
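
A small Python sketch of why a timestamp taken from the OSD's own clock helps. The function and the min_flush_age parameter are hypothetical names for the example; the actual agent logic is not shown in this commit message.

```python
def too_young(osd_now: float, timestamp: float, min_flush_age: float) -> bool:
    """An object is skipped by the agent if its age is below min_flush_age."""
    return osd_now - timestamp < min_flush_age


if __name__ == "__main__":
    min_flush_age = 120.0
    local_mtime = 1000.0           # OSD clock when the write arrived
    mtime = local_mtime + 600.0    # client clock, 600 seconds ahead
    osd_now = local_mtime + 300.0  # 300 seconds later on the OSD

    print(too_young(osd_now, mtime, min_flush_age))        # client timestamp
    print(too_young(osd_now, local_mtime, min_flush_age))  # OSD timestamp
```

With the client-supplied mtime the computed age is negative, so the object always looks too young; with the OSD-local timestamp the age is the real 300 seconds.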

Signed-off-by: Zhiqiang Wang <wonzhq@hotmail.com>
2014-08-01 16:09:50 +08:00
Sage Weil
c2fc1a9429 Merge branch 'wip-rocksdb' 2014-07-31 21:11:44 -07:00
Sage Weil
57fd60cdd2 rocksdb: -Wno-portability
Signed-off-by: Sage Weil <sage@redhat.com>
2014-07-31 21:11:25 -07:00
Sage Weil
c574e653e4 autogen.sh: debug with -x
Signed-off-by: Sage Weil <sage@redhat.com>
2014-07-31 21:11:25 -07:00