Commit Graph

24086 Commits

Author SHA1 Message Date
Sage Weil
0f42eddef5 msgr: drop messages on cons with CLOSED Pipes
Back in commit 6339c5d439, we tried to make
this deal with a race between a faulting pipe and new messages being
queued.  The sequence is

- fault starts on pipe
- fault drops pipe_lock to unregister the pipe
- user (objecter) queues new message on the con
- submit_message reopens a Pipe (due to this bug)
- the message managed to make it out over the wire
- fault finishes faulting, calls ms_reset
- user (objecter) closes the con
- user (objecter) resends everything

It appears as though the previous patch *meant* to drop *m on the floor in
this case, which is what this patch does.  And that fixes the crash I am
hitting; see #4271.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-02-28 16:57:42 -08:00
Samuel Just
5d54ab154c FileJournal::wrap_read_bl: adjust pos before returning
Otherwise, we may feed an offset past the end of the journal to
check_header in read_entry and incorrectly determine that the entry is
corrupt.

Fixes: 4296
Backport: bobtail
Backport: argonaut
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-02-28 11:08:27 -08:00
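
The position adjustment described in 5d54ab154c can be illustrated with a minimal sketch. `RingJournal`, `wrap_read`, and the layout (a header followed by a circular data region) are assumptions made for illustration; this is not Ceph's FileJournal code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical ring journal: bytes [0, header_size) hold a header, and
// entries live in [header_size, data.size()). Not Ceph's FileJournal.
struct RingJournal {
  std::string data;
  uint64_t header_size;

  // Read len bytes starting at pos, wrapping past the end back to
  // header_size. The next valid offset is stored in *next_pos, so the
  // caller never receives an offset past the end of the journal.
  std::string wrap_read(uint64_t pos, uint64_t len, uint64_t *next_pos) const {
    const uint64_t end = data.size();
    assert(pos >= header_size && pos < end);
    assert(len <= end - header_size);
    std::string out;
    while (len > 0) {
      const uint64_t chunk = std::min(len, end - pos);
      out.append(data, pos, chunk);
      pos += chunk;
      len -= chunk;
      if (pos == end)
        pos = header_size;  // the adjustment: wrap before returning
    }
    *next_pos = pos;
    return out;
  }
};
```

A read that ends exactly at the end of the region reports `header_size` as the next offset rather than an offset one past the end.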
Josh Durgin
f58601d681 librbd: fix rollback size
The duplicate calls to get_image_size() and get_snap_size() replaced
by 5806226cf0 uncovered this. The first
call was using the currently set snap_id instead of the snapshot being
rolled back to.

Fixes: #4272
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-02-26 14:55:11 -08:00
Sage Weil
c8dd2b67b3 msg: fix entity_addr_t::is_same_host() for IPv6
We weren't checking the memcmp return value properly!  Aie...

Backport: bobtail
Signed-off-by: Sage Weil <sage@inktank.com>
2013-02-26 14:07:12 -08:00
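
The class of bug fixed in c8dd2b67b3 is easy to reproduce: `memcmp` returns 0 for equal buffers, so treating its return value as a boolean inverts the comparison. A minimal sketch, where `Ipv6Bytes` and `same_host_v6` are illustrative stand-ins rather than the real `entity_addr_t` code:

```cpp
#include <array>
#include <cassert>
#include <cstring>

// Stand-in for a 16-byte IPv6 address; the real entity_addr_t is not
// reproduced here.
using Ipv6Bytes = std::array<unsigned char, 16>;

// memcmp returns 0 when the buffers are equal, so equality must be
// tested with "== 0". Using the raw return value as a bool would report
// "same host" exactly when the addresses differ.
inline bool same_host_v6(const Ipv6Bytes &a, const Ipv6Bytes &b) {
  return std::memcmp(a.data(), b.data(), a.size()) == 0;
}
```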
Sage Weil
95a379aa73 ceph_common.sh: tolerate missing mds, mon, osds in conf
With set -e this seems to fail (at least on some machines) if, say, there
is no MDS in the conf file.  This fixes it.

Tested-by: Mark Nelson <mark.nelson@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>
2013-02-26 11:10:44 -08:00
Sage Weil
9096d70642 Merge remote-tracking branch 'gh/wip-4249' into next 2013-02-25 17:48:07 -08:00
Josh Durgin
9d472ca75d systest: restrict list error acceptance
Only ignore errors after the midway point if the midway_sem_post is
defined.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
(cherry picked from commit 5b24a68b6e)
2013-02-25 16:50:47 -08:00
Josh Durgin
b64d26176a systest: fix race with pool deletion
The second test had pool deletion and object listing wait on the same
semaphore to connect and start. This led to errors sometimes when the
pool was deleted before it could be opened by the listing process. Add
another semaphore so the pool deletion happens only after the listing
has begun.

Fixes: #4147
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
(cherry picked from commit b0271e3905)
2013-02-25 16:50:44 -08:00
Josh Durgin
5806226cf0 librbd: drop snap_lock before invalidating cache
Writeback will take the snap_lock, so read everything we need under it
before invalidating the cache. This avoids a recursive lock when writeback
uses snap_lock while snap_rollback() was holding it.

Remove a not-very-useful debugging message that depended on snap_lock being held.

Fixes: #4249
Backport: bobtail
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2013-02-25 11:36:58 -08:00
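
The locking pattern from 5806226cf0 (copy what is needed under the lock, release it, then call code that takes the same lock) might look like the sketch below. The `Image` type and its members are hypothetical, and `std::mutex` stands in for whatever lock type librbd actually uses.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical image state; the names are illustrative, not librbd's.
struct Image {
  std::mutex snap_lock;
  uint64_t snap_size = 0;          // protected by snap_lock
  uint64_t invalidated_up_to = 0;  // protected by snap_lock

  // Stands in for a cache invalidation whose writeback path takes
  // snap_lock.
  void invalidate_cache(uint64_t size) {
    std::lock_guard<std::mutex> l(snap_lock);
    invalidated_up_to = size;
  }

  void rollback() {
    uint64_t size;
    {
      // Copy everything we need while holding the lock...
      std::lock_guard<std::mutex> l(snap_lock);
      size = snap_size;
    }
    // ...then call into the cache with the lock released. Calling this
    // inside the scope above would lock a std::mutex recursively, which
    // is undefined behavior.
    invalidate_cache(size);
  }
};
```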
Sage Weil
0cd215ee5b mds: reencode MDSMap in MMDSMap if MDSENC feature is not present
In some cases the MMDSMap message from mon -> client passes from leader ->
peon -> client, and the leader doesn't encode with the correct feature
bits.  As with MMOSDMap, we reencode the nested MDSMap based on the
features if relevant bits are not present.

We forgot to include this with the mds encoding changes.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-02-23 16:36:52 -08:00
Sage Weil
c07e8ea7dd qa/run_xfstests.sh: use $TESTDIR instead of /tmp/cephtest
Signed-off-by: Sage Weil <sage@inktank.com>
2013-02-23 08:38:10 -08:00
Sage Weil
8235b16c1a osd: an interval can't go readwrite if its acting is empty
Let's not forget that min_size can be zero.

Fixes: #4159
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 4277265d99)
2013-02-23 08:34:07 -08:00
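
The point of 8235b16c1a is that a size comparison alone is not enough when `min_size` can be zero. A minimal sketch of such a check, using a hypothetical helper name and ignoring every other condition the real OSD code evaluates:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper; not the actual OSD code. An interval may serve
// writes only if it has at least one acting OSD and meets min_size.
// Without the empty() check, min_size == 0 would let an empty acting
// set pass the size comparison.
inline bool interval_maybe_went_rw(const std::vector<int> &acting,
                                   std::size_t min_size) {
  return !acting.empty() && acting.size() >= min_size;
}
```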
Dan Mick
8c05af5dc3 configuration parsing: give better error for missing =
A ceph.conf line with "key" and no "= value" currently shows
"unexpected character while parsing putative key value,
at char N line M".  There's no reason it can't be clearer.

Fixes: #4229
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-02-21 21:45:27 -08:00
Sage Weil
dd007db3ca ceph_common.sh: fix iteration of items in ceph.conf
This broke in c8f528a407.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-02-21 17:30:06 -08:00
Greg Farnum
6bd8781dda mds: use inode_t::layout for dir layout policy
This cherry-pick is going in the reverse direction of normal. That's
because this direction makes for the minimal change -- this patchset
is required to fix the loss of directory layouts we were previously
seeing, but fixing it requires changing the encoding versions. So we
wrote it on top of Bobtail and let it update the struct_v's as they existed
then. Note that we here change a few encoding versions in ways which are
NOT COMPATIBLE with previous development code (but not any releases). In
particular, development code introduced and this removes the
file_layout_policy_t, and some of the CInode and EMetaBlob encoding
struct_v values were used in development code to mean one thing, but
mean something different due to the Bobtail patch.

Remove the default_file_layout struct, which was just a ceph_file_layout,
and store it in the inode_t.  Rip out all the annoying code that put this
on the heap.

To aid in this usage, add a clear_layout() function to inode_t.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 36ed407e0f)
Conflicts:

	src/mds/CInode.cc
	src/mds/CInode.h
	src/mds/MDCache.cc
	src/mds/Server.cc
	src/mds/events/EMetaBlob.h
Cherry-pick-Reviewed-by: Sage Weil <sage@inktank.com>
2013-02-21 13:44:01 -08:00
Sage Weil
84ef1649c5 mds: parse ceph.*.layout vxattr key/value content
Use qi to parse a strictly formatted set of key/value pairs.  Be picky
about whitespace.  Any subset of recognized keys is allowed.  Parse the
same set of keys as the ceph.*.layout.* vxattrs.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 5551aa5b3b)
2013-02-21 13:44:01 -08:00
Jan Harkes
ad00fc72e1 Fix failing > 4MB range requests through radosgw S3 API.
When a range request is made for more than rgw_get_obj_max_req_size
bytes the first returned chunk sets 'ret' to STATUS_PARTIAL_CONTENT and
all remaining chunks behave as if there is an error state and only
return a minimal header.

Fix this by passing STATUS_PARTIAL_CONTENT to set_req_state_err, but
leave the 'ret' member variable untouched.

Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
(cherry picked from commit c83a01d4e8)
2013-02-21 12:51:40 -08:00
Sage Weil
6d8dfb18fe osd: clear recovery state on pg removal
This ensures we release our in-progress recovery counters, which prevents
recovery from getting blocked indefinitely when a pool removal races with
recovery ops.

Fixes: #4217
Backport: bobtail
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
2013-02-21 10:43:20 -08:00
Yehuda Sadeh
0201cc80d4 rgw: refactor header grants
Move definition to a static array.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2013-02-20 12:39:37 -08:00
caleb miles
eb0f49d4b6 rgw_acl: Support ACL grants in headers.
Issue 3669: Support S3 ACL grants specified in request headers. Allow
requests, excluding POST object, to specify ACL grants in HTTP headers.

Signed-off-by: caleb miles <caleb.miles@inktank.com>

Conflicts:
	src/rgw/rgw_acl_s3.cc
	src/rgw/rgw_acl_s3.h
	src/rgw/rgw_rest_s3.cc
	src/rgw/rgw_rest_s3.h

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2013-02-20 12:32:41 -08:00
Sage Weil
2e1b02bf01 osd: lock pg in build_past_intervals_parallel()
Methods called by write_if_dirty() (get_osdmap()) assert that the pg
is locked.

Backport: bobtail
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
2013-02-20 10:22:48 -08:00
Yehuda Sadeh
db99fb4417 rgw: fix multipart uploads listing
Fixes: #4177
Backport: bobtail
Listing multipart uploads had a typo, and was requiring the
wrong resource (uploadId instead of uploads).

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-02-19 17:59:09 -08:00
Yehuda Sadeh
34f885be53 rgw: don't copy object when it's copied into itself
Fixes: #4150
Backport: bobtail

When an object is copied into itself, it is not fully copied: the tail
reference count stays the same and the head part is rewritten.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-02-19 17:58:52 -08:00
Sage Weil
4eb9bf21cb test/bufferlist: fix warning
In file included from test/bufferlist.cc:31:0:
../src/gtest/include/gtest/gtest.h: In function ‘testing::AssertionResult testing::internal::CmpHelperEQ(const char*, const char*, const T1&, const T2&) [with T1 = unsigned int, T2 = int]’:
../src/gtest/include/gtest/gtest.h:1300:30: instantiated from ‘static testing::AssertionResult testing::internal::EqHelper::Compare(const char*, const char*, const T1&, const T2&) [with T1 = unsigned int, T2 = int, bool lhs_is_null_literal = false]’
test/bufferlist.cc:1604:227: instantiated from here
warning: ../src/gtest/include/gtest/gtest.h:1263:3: comparison between signed and unsigned integer expressions [-Wsign-compare]

Signed-off-by: Sage Weil <sage@inktank.com>
2013-02-19 15:33:20 -08:00
Gary Lowell
d0424ebced Merge branch 'master' of https://github.com/ceph/ceph 2013-02-19 14:55:14 -08:00
Gary Lowell
bcb210c677 Merge branch 'next' 2013-02-19 14:53:54 -08:00
Joe Buck
3ff0fe0fc7 testing: updating hadoop-internal test
Small tweaks to the hadoop-internal test
to better use existing environment variables.

Signed-off-by: Joe Buck <jbbuck@gmail.com>
Reviewed-by: Noah Watkins <noahwatkins@gmail.com>
2013-02-19 14:05:38 -08:00
Noah Watkins
f1bff178a4 qa: sample test for new replication tests
Signed-off-by: Joe Buck <jbbuck@gmail.com>
2013-02-19 14:05:11 -08:00
Sage Weil
60d9465b53 doc/release-notes: v0.57
Signed-off-by: Sage Weil <sage@inktank.com>
2013-02-19 13:50:18 -08:00
Samuel Just
dbadb3e292 PG: remove weirdness log for last_complete < log.tail
In the case of a divergent object prior to log.tail,
last_complete may end up before log.tail.

Backport: bobtail
Fixes: #4174
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-02-19 10:52:49 -08:00
Sage Weil
5fc83c8d98 os/FileStore: check replay guard on src for collection rename
This avoids a problematic sequence like:

     - rename A/ -> B/
     - remove B/1...100
     - destroy B/
     - create A/
     - write A/101...
     <crash>
     - replay A/ -> B/
     - remove B/1...100  (fails but tolerated)
     - destroy B/        (fails with ENOTEMPTY)

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
2013-02-19 10:41:09 -08:00
Sage Weil
56c5a07708 osd: requeue pg waiters at the front of the finished queue
We could have a sequence like:

- op1
- notify
- op2

in the finished queue.  Op1 gets put on waiting_for_pg, the notify
creates the pg and requeues op1 (at the end), op2 is handled, and
finally op1 is handled.  That breaks ordering; see #2947.

Instead, when we wake up a pg, queue the waiting messages at the front
of the dispatch queue.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
2013-02-19 10:41:09 -08:00
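
The reordering fix in 56c5a07708 amounts to putting woken waiters ahead of everything already queued. A minimal sketch with `std::list`, where the queues hold op names instead of real messages and `requeue_at_front` is a hypothetical name:

```cpp
#include <cassert>
#include <list>
#include <string>

// Hypothetical queues of op names; the real OSD queues hold messages.
// Move all waiters to the front of the finished queue, keeping their
// relative order, so they run before anything queued after them.
inline void requeue_at_front(std::list<std::string> &finished,
                             std::list<std::string> &waiters) {
  finished.splice(finished.begin(), waiters);
}
```

With `finished = {op2}` and `waiters = {op1}`, the call leaves `finished` as `{op1, op2}`, which restores the original submission order.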
Sage Weil
f1841e4189 osd: pull requeued requests off one at a time
Pull items off the finished queue one at a time.  In certain cases, an
event may result in new items getting added to the finished queue that
will be put at the *front* instead of the back.  See latest incarnation
of #2947.

Note that this is a significant change in behavior in that we can
theoretically starve if an event keeps resulting in new events getting
generated.  Beware!

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
2013-02-19 10:41:09 -08:00
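
The behavior change in f1841e4189 can be sketched as a drain loop that re-reads the head of the queue on every iteration, so items a handler pushes onto the front are processed next. `drain` and the handler signature are assumptions for illustration only.

```cpp
#include <cassert>
#include <functional>
#include <list>
#include <string>
#include <vector>

// Drain the queue one item at a time. The handler may push new items
// onto the front of the queue; because front() is re-read on every
// iteration, those items run next. Swapping the whole list out first
// and iterating over the copy would handle them too late.
inline std::vector<std::string> drain(
    std::list<std::string> &finished,
    const std::function<void(const std::string &,
                             std::list<std::string> &)> &handle) {
  std::vector<std::string> order;  // records the processing order
  while (!finished.empty()) {
    std::string item = finished.front();
    finished.pop_front();
    order.push_back(item);
    handle(item, finished);
  }
  return order;
}
```

As the commit message warns, a handler that always pushes a new item keeps this loop running forever.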
Gary Lowell
9a7a9d06c0 v0.57 2013-02-19 10:07:42 -08:00
Sage Weil
4002d70ac0 osd: fix printf warning on pg_log_entry_t::get_key_name
warning: osd/osd_types.cc:1716:76: format '%lu' expects argument of type 'long unsigned int', but argument 5 has type 'version_t {aka long long unsigned int}' [-Wformat]

Signed-off-by: Sage Weil <sage@inktank.com>
2013-02-19 09:12:52 -08:00
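
One common way to silence a `-Wformat` warning like the one quoted in 4002d70ac0 is to make the conversion specifier match the argument type. The sketch below assumes `version_t` is `unsigned long long`, as the warning text states; `format_version` is a hypothetical helper and may not match the actual patch.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Matches the type named in the compiler warning.
using version_t = unsigned long long;

// "%lu" expects unsigned long; for unsigned long long the matching
// length modifier is "ll".
inline std::string format_version(version_t v) {
  char buf[32];
  std::snprintf(buf, sizeof(buf), "%llu", v);
  return std::string(buf);
}
```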
Sage Weil
f80f84936e qa: test_mon_workloadgen: use default config file path
I'm not sure why we wouldn't.  Also, this makes this test work without
annoying plumbing to pass the explicit path through.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-02-19 09:08:57 -08:00
Sage Weil
6d338591b7 qa: mon/workloadgen.sh: drop TEST_CEPH_CONF code
The binaries already pick up on CEPH_CONF, which will be set as needed.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-02-19 09:02:14 -08:00
Sage Weil
8ca2274cc0 rbd: udevadm settle before unmap
udev runs blkid on device close, and other such nonsense that can
make unmap fail with EBUSY.  Settle before we unmap to avoid this if
possible.  See #4183.

Closes: #4186
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
2013-02-19 08:44:34 -08:00
Joe Buck
b45f67e0b5 test: correcting hadoop-internal tests
Changing the hadoop-internal tests to use the
newly added $TESTDIR environment variable.
Also, removed unneeded variables.

Signed-off-by: Joe Buck <jbbuck@gmail.com>
Reviewed-by: Sam Lang <sam.lang@inktank.com>
2013-02-19 08:36:36 -08:00
Joe Buck
d2dbab1f4f testing: adding a Hadoop wordcount test
Signed-off-by: Joe Buck <jbbuck@gmail.com>
Reviewed-by: Sam Lang <sam.lang@inktank.com>
2013-02-19 08:35:13 -08:00
Sage Weil
45a4fe0915 qa: rbd map-snapshot-io: udevadm settle
Udev runs blkid on device close, thwarting any rbd unmap that
immediately follows use of the device.  Explicitly settle for now.

See #4183.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-02-18 20:32:41 -08:00
Sage Weil
8e0be54857 debian: allow extra args to get passed to ./configure via the environment
Signed-off-by: Sage Weil <sage@inktank.com>
2013-02-18 17:07:55 -08:00
Sage Weil
231dc1bee6 qa: rbd/map-snapshot-io: remove image when done
Signed-off-by: Sage Weil <sage@inktank.com>
2013-02-18 11:24:46 -08:00
Sage Weil
1a7a57ac8f qa: fix quoting of wget URLs
Broke this in ae0c2bbb50.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-02-18 10:58:10 -08:00
Sage Weil
3612ed617e osd: log weirdness if caller_ops hash gets bigger than the log
Signed-off-by: Sage Weil <sage@inktank.com>
2013-02-18 10:53:11 -08:00
Sage Weil
c2f2e563c3 Merge pull request #65 from javacruft/wip-ocf-rbd
Strip any trailing whitespace from rbd showmapped

Reviewed-by: Sage Weil <sage@inktank.com>
2013-02-18 09:18:54 -08:00
James Page
ad84ea07ca Strip any trailing whitespace from rbd showmapped
More recent versions of ceph append a bit of whitespace to the line
after the name of the /dev/rbdX device; this causes the monitor check
to fail as it can't find the device name due to the whitespace.

This fix excludes any characters after the /dev/rbdN match.
2013-02-18 16:24:54 +00:00
Sage Weil
133d0ea2ce buffer: drop large malloc tests
These succeed on my machine and eat unseemly amounts of RAM.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-02-17 21:47:30 -08:00
Sage Weil
7fcbfdc09b buffer: put big buffer on heap, not stack
This fixes a segfault on my x86_64 wheezy box.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-02-17 21:47:07 -08:00
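
The change in 7fcbfdc09b follows a general rule: large buffers belong on the heap, because a large local array can exceed the thread's stack limit (often a few megabytes by default). A minimal sketch with a hypothetical helper, not the test code from the commit:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A large local array such as "char big[64 * 1024 * 1024];" lives on the
// stack and can overflow it. std::vector keeps the storage on the heap
// and releases it automatically when the vector goes out of scope.
inline std::vector<char> make_big_buffer(std::size_t len, char fill) {
  return std::vector<char>(len, fill);
}
```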
Loic Dachary
fb472a57c6 unit tests for src/common/buffer.{cc,h}
Implement unit tests covering most lines of code ( > 92% ) and all
methods as shown by the output of make check-coverage:
http://dachary.org/wp-uploads/2013/03/ceph-lcov/ .

The following static constructors are implemented by opaque classes
defined in buffer.cc ( buffer::raw_char, buffer::raw_posix_aligned
etc. ). Testing the implementation of these classes is done by
variations of the calls to the static constructors.

    copy(const char *c, unsigned len);
    create(unsigned len);
    claim_char(unsigned len, char *buf);
    create_malloc(unsigned len);
    claim_malloc(unsigned len, char *buf);
    create_static(unsigned len, char *buf);
    create_page_aligned(unsigned len);

The raw_mmap_pages class cannot be tested because it is commented out in
raw_posix_aligned. The raw_hack_aligned class is only tested under Cygwin.
The raw_posix_aligned class is not tested under Cygwin.

The unittest_bufferlist.sh script calls unittest_bufferlist with the
CEPH_BUFFER_TRACK=true environment variable to enable the code
tracking the memory usage. It cannot be done within the bufferlist.cc
file itself because it relies on the initialization of a global
variable  ( buffer_track_alloc ).

When raw_posix_aligned is called on DARWIN, the data is not aligned
on CEPH_PAGE_SIZE because it calls valloc(size) which is the equivalent of
memalign(sysconf(_SC_PAGESIZE),size) and not memalign(CEPH_PAGE_SIZE,size).
For this reason the alignment test is de-activated on DARWIN.

The tests are grouped in

TEST(BufferPtr, ... ) for buffer::ptr
TEST(BufferListIterator, ...) for buffer::list::iterator
TEST(BufferList, ...) for buffer::list
TEST(BufferHash, ...) for buffer::hash

and each method ( and all variations of the prototype ) are
included into a single TEST() function.

Although most aspects of the methods are tested, including exceptions
and border cases, inconsistencies are not highlighted. For
instance

    buffer::list::iterator i;
    i.advance(1);

would dereference a buffer::raw NULL pointer although

    buffer::ptr p;
    p.wasted();

asserts instead of dereferencing the buffer::raw NULL pointer. It
would be better to always assert in case a NULL pointer is about to be
used. But this is a minor inconsistency that is probably not worth a
test.

The following buffer::list methods

    ssize_t read_fd(int fd, size_t len);
    int write_fd(int fd) const;

are not fully tested because the border cases cannot be reliably
reproduced. Going through a pointer indirection when calling the ::writev
or safe_read functions would allow the test to create mockups to synthesize
the conditions for border cases.

tracker.ceph.com/issues/4066 refs #4066

Signed-off-by: Loic Dachary <loic@dachary.org>
2013-02-17 21:30:51 -08:00