
36366 Commits

Author SHA1 Message Date
Loic Dachary
beade63a17 qa/workunits/cephtool/test.sh: fix thrash (ultimate)
Keep the osd thrash test to ensure it is a valid command but make it a
noop by giving it a zero argument (meaning thrash 0 OSD maps).

Remove the loops that were added after the command in an attempt to wait
for the cluster to recover and not pollute the rest of the tests. Actual
testing of osd thrash would require a dedicated cluster because the
side effects are random and it is unnecessarily difficult to ensure they
are finished.

http://tracker.ceph.com/issues/9620 Fixes: #9620

Signed-off-by: Loic Dachary <loic-201408@dachary.org>
2014-09-29 13:47:06 +02:00
Loic Dachary
9ced13789c Merge pull request #2590 from dachary/wip-9592-librados-large-object
librados large object early check

Reviewed-by: Sage Weil <sage@redhat.com>
2014-09-29 08:38:34 +02:00
Loic Dachary
5d1d9dba60 librados: cap the IoCtxImpl::{aio_}*{write,append} buffer length
If the value of the len parameter is greater than UINT_MAX/2,
IoCtxImpl::aio_write, IoCtxImpl::aio_write_full, IoCtxImpl::aio_append,
IoCtxImpl::write, IoCtxImpl::append will fail with E2BIG.

IoCtxImpl::write_full is the exception because it does not have a
length argument to check.

For more information see 33501d2426

http://tracker.ceph.com/issues/9592 Fixes: #9592

Signed-off-by: Loic Dachary <loic-201408@dachary.org>
2014-09-29 08:35:54 +02:00
Sage Weil
4db51bb882 Merge pull request #2400 from majianpeng/fix2
osd: Make RPGTransaction::get_bytes_written return the correct size.

Reviewed-by: Sage Weil <sage@redhat.com>
2014-09-28 17:46:57 -07:00
Loic Dachary
becc1140ac librados: test s/E2BIG/TooBig/
Because E2BIG does not allow selection

./ceph_test_rados_api_aio --gtest_filter=LibRadosAio.E2BIG
Running main() from gtest_main.cc
Note: Google Test filter = LibRadosAio.E2BIG
[==========] Running 0 tests from 0 test cases.
[==========] 0 tests from 0 test cases ran. (0 ms total)
[  PASSED  ] 0 tests.

probably because it contains a number.

Signed-off-by: Loic Dachary <loic-201408@dachary.org>
2014-09-28 10:37:21 +02:00
Loic Dachary
32195f948d librados: cap the rados_aio_*{write,append} buffer length
If the value of the len parameter is greater than UINT_MAX/2,
rados_aio_write, rados_aio_write_full and rados_aio_append will fail
with E2BIG.

For more information see 33501d2426

http://tracker.ceph.com/issues/9592 Fixes: #9592

Signed-off-by: Loic Dachary <loic-201408@dachary.org>
2014-09-28 10:26:23 +02:00
Jianpeng Ma
f777fc6ef3 osd: Make RPGTransaction::get_bytes_written return the correct size.
It records a size larger than what the client wrote. Like
ECTransaction::get_bytes_written, it should only return the size that the
client wrote. It should contain omap data.

Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
2014-09-28 15:01:46 +08:00
Sage Weil
7849d792a8 crushtool: add --show-location <id> command
Include some tests.

Signed-off-by: Sage Weil <sage@redhat.com>
2014-09-27 07:51:24 -07:00
Sage Weil
8badd5a4dc Merge pull request #2584 from dachary/wip-9592-librados-large-object
librados: cap the rados*{write,append} buffer length

Reviewed-by: Sage Weil <sage@redhat.com>
2014-09-27 05:58:28 -07:00
Loic Dachary
33501d2426 librados: cap the rados*{write,append} buffer length
When the caller submits a payload that will end up being rejected with

  rados.Error: Ioctx.write(rbd): failed to write hw: errno EMSGSIZE

it is stored in a bufferlist whose length is an unsigned int. If the
value of the len parameter is greater than UINT_MAX/2, rados_write,
rados_write_full and rados_append will fail with E2BIG.

Multiple calls to rados_write or rados_append can fill objects larger
than UINT_MAX/2.

http://tracker.ceph.com/issues/9592 Fixes: #9592

Signed-off-by: Loic Dachary <loic-201408@dachary.org>
2014-09-27 10:48:01 +02:00
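
The length cap described in the commit above can be illustrated with a minimal sketch. The helper name check_write_len is hypothetical and is not the librados code; returning a negative errno value is an assumption based on the usual C convention.

```cpp
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstddef>

// Hypothetical helper, not the actual librados implementation: reject a
// single write/append whose length exceeds UINT_MAX/2, as described in
// the commit message. Assumes the negative-errno return convention.
inline int check_write_len(std::size_t len) {
  if (len > UINT_MAX / 2)
    return -E2BIG;  // too large to submit in one call
  return 0;         // length is acceptable
}
```

A caller that needs a larger object would issue several smaller writes or appends, which the commit message notes is still allowed.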
Sage Weil
3f05fbf55b Merge pull request #2580 from cernceph/wip-scientific
ceph-disk: add Scientific Linux as a Redhat clone

Reviewed-by: Sage Weil <sage@redhat.com>
2014-09-26 17:49:59 -07:00
Dan van der Ster
f8ac2248af ceph-disk: add Scientific Linux as a Redhat clone
Scientific Linux is a RHEL clone and needs to use partx.

Signed-off-by: Dan van der Ster <daniel.vanderster@cern.ch>
(cherry picked from commit 5ca7ea5b53)
2014-09-26 17:46:15 -07:00
Loic Dachary
9c3e01a217 Merge pull request #2568 from johnugeorge/wip-9492-crush
Crush: Ensuring at most num-rep osds are selected for any rule

Reviewed-by: Loic Dachary <loic-201408@dachary.org>
2014-09-27 00:25:48 +02:00
Johnu George
6b4d1aa997 Crush: Ensuring at most num-rep osds are selected
Crush temporary buffers are allocated as per the replica size configured
by the user. When there are more final osds (to be selected as per the
rule) than replicas, the buffers overlap and cause a crash. Now, it
ensures that at most num-rep osds are selected even if more
osds are allowed by the rule.

Fixes: #9492

Signed-off-by: Johnu George <johnugeo@cisco.com>
2014-09-26 13:04:58 -07:00
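
The idea of selecting at most num-rep osds can be sketched as a simple clamp. This is an illustration only and not the CRUSH code; the function select_at_most and its parameters are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch, not the CRUSH implementation: keep at most
// num_rep candidates even if the rule would yield more, so that a
// result buffer sized for num_rep entries is never overrun.
inline std::vector<int> select_at_most(const std::vector<int>& candidates,
                                       std::size_t num_rep) {
  const std::size_t n = std::min(candidates.size(), num_rep);
  return std::vector<int>(candidates.begin(), candidates.begin() + n);
}
```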
Dan van der Ster
5ca7ea5b53 ceph-disk: add Scientific Linux as a Redhat clone
Scientific Linux is a RHEL clone and needs to use partx.

Signed-off-by: Dan van der Ster <daniel.vanderster@cern.ch>
2014-09-26 18:01:03 +02:00
John Spray
83fb32ca41 Merge pull request #2572 from ceph/wip-9562
osdc/Filer: drop probe/purge locks before calling objecter

Reviewed-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Yan, Zheng <ukernel@gmail.com>
2014-09-26 11:57:53 +01:00
Sage Weil
0a6f6a49b0 Merge pull request #2575 from ceph/wip-zafman-cleanup
osd: Remove unused PG functions queue_notify(), queue_info(), queue_log(...

Reviewed-by: Loic Dachary <loic@dachary.org>
2014-09-25 17:02:19 -07:00
Loic Dachary
7827e0035e os: io_event.res is the size written
And not an error code to be converted with cpp_strerror()

Signed-off-by: Loic Dachary <loic-201408@dachary.org>
2014-09-26 01:15:53 +02:00
Josh Durgin
b8562959de Merge pull request #2524 from ceph/wip-5768
rbd-fuse: Fix memory leak in enumerate_images

Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2014-09-25 15:19:43 -07:00
Sage Weil
5c2984e6e1 Merge pull request #2531 from dachary/wip-9536-isa-alignment
erasure-code: isa plugin alignment fixes

Reviewed-by: Sage Weil <sage@redhat.com>
2014-09-25 14:05:57 -07:00
Sage Weil
d851c3f233 osd: improve debug output for do_{notifies,queries,infos}
Hunting #9389

Signed-off-by: Sage Weil <sage@redhat.com>
2014-09-25 13:51:46 -07:00
Sage Weil
2ba5ed57b3 Merge pull request #2540 from ceph/wip-giant-messenger-fixes
giant messenger fixes

Reviewed-by: Sage Weil <sage@redhat.com>
2014-09-25 13:01:38 -07:00
Sage Weil
126d0b30e9 osdc/Objecter: only post_rx_buffer if no op timeout
If we post an rx buffer and there is a timeout, the revocation can happen
while the reader has consumed the buffers but before it has decoded and
constructed the message.  In particular, we calculate a crc32c over the
data portion of the message after we've taken the buffers and dropped the
lock.

Instead of fixing this race (for example, by reverifying rx_buffers under
the lock while calculating the crc.. bleh), just skip the rx buffer
optimization entirely when a timeout is present.

Note that this doesn't cover the op_cancel() paths, but none of those users
provide static buffers to read into.

Fixes: #9582
Backport: firefly, dumpling
Signed-off-by: Sage Weil <sage@redhat.com>
2014-09-25 12:34:11 -07:00
Sage Weil
0115a55aa3 Merge pull request #2574 from ceph/wip-msgr-shutdown
msg: allow calling dtor immediately after ctor

Reviewed-by: Sage Weil <sage@redhat.com>
2014-09-25 09:26:18 -07:00
Loic Dachary
ba02a5e638 erasure-code: test isa encode/decode with various object sizes
Create an encode_decode() helper method to be called from the
encode_decode test function with various object size arguments. The
helper method is a copy/paste of the previous test that was using a
single object of a fixed size. The test is slightly adapted to
accommodate for different object sizes but the logic is not modified.

The object sizes being tested are chosen to be under the size of the
required size alignment or on multiple pages, size aligned or not.

Signed-off-by: Loic Dachary <loic-201408@dachary.org>
2014-09-25 18:05:01 +02:00
Loic Dachary
eb8fdfa4f5 erasure-code: add test for isa chunk_size method
Signed-off-by: Loic Dachary <loic-201408@dachary.org>
2014-09-25 18:04:58 +02:00
John Spray
7a468f358b msg: allow calling dtor immediately after ctor
Asserting on reaper_stop only made sense if the
messenger had ever been started: as it stood,
one couldn't create and destroy a messenger
without also starting and stopping it.

Signed-off-by: John Spray <john.spray@redhat.com>
2014-09-25 17:01:10 +01:00
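
The guarded assertion described in the commit above might look like the following sketch. SketchMessenger and its members are hypothetical stand-ins, not the Ceph messenger class; only the name reaper_stop comes from the commit message.

```cpp
#include <cassert>

// Hypothetical stand-in for a messenger, not the actual Ceph class.
class SketchMessenger {
 public:
  void start() { started = true; reaper_stop = false; }
  void shutdown() { reaper_stop = true; }
  bool was_started() const { return started; }

  ~SketchMessenger() {
    // Only insist that the reaper was stopped if the messenger was
    // ever started; a never-started instance can be destroyed freely.
    assert(!started || reaper_stop);
  }

 private:
  bool started = false;
  bool reaper_stop = false;
};
```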
Loic Dachary
af07d29e27 erasure-code: isa encode tests adapted to per chunk alignment
The encode tests use the alignment constraints. It has been changed to
be aligned on a per chunk basis instead of computing a more expensive
object alignment constraint. The test function is modified to take the
change into account but the logic is otherwise unmodified.

Signed-off-by: Loic Dachary <loic-201408@dachary.org>
2014-09-25 17:39:16 +02:00
Loic Dachary
aa9d70be38 erasure-code: isa test compare chunks with memcmp instead of strncmp
Because they may contain null characters.

Signed-off-by: Loic Dachary <loic-201408@dachary.org>
2014-09-25 17:39:16 +02:00
Loic Dachary
ed77178e7d erasure-code: run isa tests via libtool and valgrind
Because running valgrind with no libtool does not test the binary but
the enclosing shell script.

Signed-off-by: Loic Dachary <loic-201408@dachary.org>
2014-09-25 17:39:16 +02:00
Loic Dachary
668c352721 erasure-code: do not use typed tests for isa
Because there only is one type.

Signed-off-by: Loic Dachary <loic-201408@dachary.org>
2014-09-25 17:39:16 +02:00
Loic Dachary
28c2b6e4f2 erasure-code: isa uses per chunk alignment constraints
Copy code from the jerasure plugin to enforce alignment constraints per
chunk instead of using the total object size. It is simpler and reduces
the size of the chunks. See
c7daaaf5e6
for more information.

Signed-off-by: Loic Dachary <loic-201408@dachary.org>
2014-09-25 17:39:10 +02:00
Andreas Peters
6f4909ae59 erasure-code: [ISA] modify get_alignment function to imply a platform/compiler independent alignment constraint of 32-byte aligned buffer addresses & length
2014-09-25 17:37:27 +02:00
Sage Weil
75525712b2 doc/release-notes: v0.67.11
Signed-off-by: Sage Weil <sage@redhat.com>
2014-09-25 07:17:56 -07:00
Loic Dachary
0124d8ee75 Merge pull request #2571 from dachary/wip-9579-isa-documentation
documentation: erasure-code plugin isa does not require k/m

Reviewed-by: Andreas Peters <andreas.joachim.peters@cern.ch>
2014-09-25 15:37:54 +02:00
John Spray
8dc94a2d8c osdc/Filer: drop probe/purge locks before calling objecter
Fixes: #9562

Signed-off-by: John Spray <john.spray@redhat.com>
2014-09-25 13:53:36 +01:00
Loic Dachary
9593d8769f documentation: erasure-code plugin isa does not require k/m
http://tracker.ceph.com/issues/9579 Refs: #9579

Signed-off-by: Loic Dachary <loic-201408@dachary.org>
2014-09-25 12:01:59 +02:00
Abhishek Lekshmanan
688622424e mailmap: Yan Zheng affiliation
Also adding Yan Zheng to .peoplemap to track org. change

Signed-off-by: Abhishek Lekshmanan <abhishek.lekshmanan@gmail.com>
2014-09-25 12:28:12 +05:30
Abhishek Lekshmanan
fc1380b1a2 mailmap: Thorsten Glaser affiliation
Signed-off-by: Abhishek Lekshmanan <abhishek.lekshmanan@gmail.com>
Reviewed-by: Thorsten Glaser <tg@mirbsd.de>
2014-09-25 12:27:26 +05:30
David Zafman
7973280a48 osd: Remove unused PG functions queue_notify(), queue_info(), queue_log()
Signed-off-by: David Zafman <dzafman@redhat.com>
2014-09-24 17:55:28 -07:00
Guang Yang
0f884fdb31 For pgls OP, get/put budget on per list session basis, instead of per OP basis, which could lead to deadlock.
Signed-off-by: Guang Yang (yguang@yahoo-inc.com)
2014-09-25 00:47:46 +00:00
Samuel Just
7f87cf1b1d ReplicatedPG: clean out completed trimmed objects as we go
Also, explicitly maintain a max number of concurrently trimming
objects.

Fixes: 9113
Backport: dumpling, firefly, giant
Signed-off-by: Samuel Just <sam.just@inktank.com>
2014-09-24 15:33:11 -07:00
Josh Durgin
7a39e7cbe6 Merge remote-tracking branch 'origin/giant'
2014-09-24 15:27:02 -07:00
Sage Weil
c5906eca2f Merge pull request #2567 from dachary/wip-6697-strncmp-vs-memcmp
tests: use memcmp to compare binary buffers

Reviewed-by: Sage Weil <sage@redhat.com>
2014-09-24 07:30:37 -07:00
Loic Dachary
2cd9b5f969 tests: use memcmp to compare binary buffers
instead of strncmp because it will stop at the first \0

http://tracker.ceph.com/issues/6697 Fixes: #6697

Signed-off-by: Loic Dachary <loic-201408@dachary.org>
2014-09-24 16:00:44 +02:00
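
The difference between strncmp and memcmp on binary buffers can be shown with a small sketch. The buffers and helper names below are made up for illustration and are not taken from the Ceph tests.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Two 4-byte buffers that differ only after an embedded '\0'.
static const char buf_a[4] = {'a', '\0', 'b', 'c'};
static const char buf_b[4] = {'a', '\0', 'x', 'y'};

// strncmp stops at the first '\0', so bytes after it are never compared.
inline bool equal_as_strings(const char* a, const char* b, std::size_t n) {
  return std::strncmp(a, b, n) == 0;
}

// memcmp compares all n bytes regardless of their values.
inline bool equal_as_bytes(const char* a, const char* b, std::size_t n) {
  return std::memcmp(a, b, n) == 0;
}
```

With these buffers, the string comparison reports equality while the byte comparison detects the difference, which is why binary chunks should be compared with memcmp.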
Loic Dachary
468d245a02 Merge pull request #2506 from dachary/wip-9304-unintended-implicit-ruleset
erasure-code: pool create must not always create a ruleset

Reviewed-by: João Eduardo Luís <joao@redhat.com>
2014-09-24 13:35:55 +02:00
John Spray
b8e6a6b180 Merge remote-tracking branch 'origin/giant'
2014-09-24 11:40:52 +01:00
Samuel Just
c17ac03a50 ReplicatedPG: don't move on to the next snap immediately
If we have a bunch of trimmed snaps for which we have no
objects, we'll spin for a long time.  Instead, requeue.

Fixes: #9487
Backport: dumpling, firefly, giant
Reviewed-by: Sage Weil <sage@redhat.com>
Signed-off-by: Samuel Just <sam.just@inktank.com>
2014-09-23 16:28:04 -07:00
Sage Weil
255b430a87 osd: initialize purged_snap on backfill start; restart backfill if change
If we backfill a PG to a new OSD, we currently neglect to initialize
purged_snaps.  As a result, the first time the snaptrimmer runs it has to
churn through every deleted snap for all time, and to make matters worse
does so in one go with the PG lock held.  This leads to badness on any
cluster with a significant number of removed snaps that experiences
backfill.

Resolve this by initializing purged_snaps when we finish backfill.  The
backfill itself will clear out any stray snaps and ensure the object set
is in sync with purged_snaps.  Note that purged_snaps on the primary
that is driving backfill will not change during this period as the
snaptrimmer is not scheduled unless the PG is clean (which it won't be
during backfill).

If we by chance interrupt backfill, go clean with other OSDs,
purge snaps, and then let this OSD rejoin, we will either restart
backfill (non-contiguous log) or the log will include the result of
the snap trim (the events that remove the trimmed snap).

Fixes: #9487
Backport: firefly, dumpling
Signed-off-by: Sage Weil <sage@redhat.com>
2014-09-23 16:28:04 -07:00
Samuel Just
4be53d5eeb PG: check full ratio again post-reservation
Otherwise, we might queue 30 pgs for backfill at 0.80 fullness
and then never check again, filling the osd after pg 11.

Fixes: #9574
Backport: dumpling, firefly, giant
Signed-off-by: Samuel Just <sam.just@inktank.com>
2014-09-23 12:53:41 -07:00