24097 Commits

Author SHA1 Message Date
Sage Weil
fea77682a6 osdc/Objecter: unwatch is a mutation, not a read
This was causing librados to unblock after the ACK on unwatch, which meant
that librbd users raced and tried to delete the image before the unwatch
change was committed, and got EBUSY.  See #3958.

The watch operation has a similar problem.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-02-21 13:28:47 -08:00
Sage Weil
4277265d99 osd: an interval can't go readwrite if its acting is empty
Let's not forget that min_size can be zero.

Fixes: #4159
Signed-off-by: Sage Weil <sage@inktank.com>
2013-02-21 11:32:39 -08:00
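
The condition described in the commit above can be sketched as follows. This is an illustrative Python sketch, not the Ceph C++ code; `could_go_readwrite` and its parameters are hypothetical names chosen for the example.

```python
def could_go_readwrite(acting, min_size):
    """Return True if an interval with this acting set could have served writes.

    Hypothetical helper: `acting` is the list of OSD ids in the acting set,
    `min_size` is the pool's minimum replica count (which may be zero).
    """
    # Without the explicit emptiness check, an empty acting set would pass
    # whenever min_size == 0, because 0 >= 0.
    return len(acting) > 0 and len(acting) >= min_size
```

With `min_size` set to zero, the size comparison alone accepts an empty acting set, which is why the emptiness check is written out separately.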
Sage Weil
c8d0889df5 Merge branch 'next'
Conflicts:
	src/osd/ReplicatedPG.cc
2013-02-21 10:44:04 -08:00
Sage Weil
6d8dfb18fe osd: clear recovery state on pg removal
This ensures we release our in-progress recovery counters, which prevents
recovery from getting blocked indefinitely when a pool removal races with
recovery ops.

Fixes: #4217
Backport: bobtail
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
2013-02-21 10:43:20 -08:00
Sage Weil
5551aa5b3b mds: parse ceph.*.layout vxattr key/value content
Use qi to parse a strictly formatted set of key/value pairs.  Be picky
about whitespace.  Any subset of recognized keys is allowed.  Parse the
same set of keys as the ceph.*.layout.* vxattrs.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-02-20 17:09:49 -08:00
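
A minimal sketch of strict key/value parsing in the spirit of the commit above. The commit itself uses qi in C++; this Python version uses regular expressions instead. The wire format (single-space-separated `key=value` pairs) and the key names `stripe_unit`, `stripe_count`, `object_size`, and `pool` are assumptions made for the example, and `parse_layout` is a hypothetical name.

```python
import re

# Assumed key set; numeric keys take decimal digits, pool takes a name.
_NUMERIC = re.compile(r"(stripe_unit|stripe_count|object_size)=(\d+)")
_POOL = re.compile(r"(pool)=(\S+)")

def parse_layout(value):
    """Parse 'key=value key=value ...' strictly; return a dict or raise ValueError."""
    result = {}
    if value == "":
        return result
    # Exactly one space between pairs; leading, trailing, or doubled
    # whitespace produces an empty token and is rejected below.
    for token in value.split(" "):
        m = _NUMERIC.fullmatch(token) or _POOL.fullmatch(token)
        if m is None:
            raise ValueError("bad layout token: %r" % token)
        key, val = m.groups()
        if key in result:
            raise ValueError("duplicate key: %s" % key)
        result[key] = val if key == "pool" else int(val)
    return result
```

Any subset of the recognized keys is accepted, so `parse_layout("pool=data")` is valid on its own, while an unknown key or stray whitespace raises `ValueError`.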
Samuel Just
b531aa3688 Merge branch 'wip_watch_cleanup'
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-02-20 13:29:31 -08:00
Samuel Just
0202bf2903 ReplicatedPG: allow multiple watches in one transaction
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-02-20 13:29:20 -08:00
Samuel Just
9a399afd71 doc: add some internal docs for watch/notify
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-02-20 13:29:20 -08:00
Samuel Just
661a28320b librados/: include watch cookie in notify_ack
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-02-20 13:29:20 -08:00
Samuel Just
8ece91ff21 ReplicatedPG: accept watch cookie value with notify ack
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-02-20 13:29:20 -08:00
Samuel Just
ebdf66dfbf Watch/Notify: rework watch/notify
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-02-20 13:29:20 -08:00
Samuel Just
7af3299797 osd/: move ObjectContext over to osd_types.h
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-02-20 13:29:20 -08:00
Samuel Just
22ec5bc315 PG: check object_contexts on flushed
At FlushedEvt, all outstanding io should be complete and
the object_contexts map should be empty.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-02-20 13:29:19 -08:00
Samuel Just
359c0dfd29 ReplicatedPG: add intrusive_ptr hooks
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-02-20 13:29:19 -08:00
Samuel Just
7fe7eff92e Timer.cc: use complete() rather than finish()
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-02-20 13:29:19 -08:00
Sage Weil
8713f18da8 osd: remove force hack for testing the HASHPSPOOL code
Also from 8cc2b0f124.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-02-20 13:25:58 -08:00
Sage Weil
ceb390f672 mon: allow syslog level and facility for cluster log to be controlled
Allow user to control the minimum level to go to syslog for the client-
and server-side submission paths for the cluster log, along with the syslog
'facility'.  See syslog(3) man page.

Also move the level checks into a LogEntry method.

Closes: #3704
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Luis <joao.luis@inktank.com>
2013-02-20 12:52:32 -08:00
Yehuda Sadeh
0201cc80d4 rgw: refactor header grants
Move definition to a static array.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2013-02-20 12:39:37 -08:00
caleb miles
eb0f49d4b6 rgw_acl: Support ACL grants in headers.
Issue 3669: Support S3 ACL grants specified in request headers. Allow
requests, excluding POST object, to specify ACL grants in HTTP headers.

Signed-off-by: caleb miles <caleb.miles@inktank.com>

Conflicts:
	src/rgw/rgw_acl_s3.cc
	src/rgw/rgw_acl_s3.h
	src/rgw/rgw_rest_s3.cc
	src/rgw/rgw_rest_s3.h

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2013-02-20 12:32:41 -08:00
Sage Weil
04f3fe4e2c mon: fix new pool type
I broke this in 8cc2b0f124.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-02-20 10:37:01 -08:00
Sage Weil
2e1b02bf01 osd: lock pg in build_past_intervals_parallel()
Methods called by write_if_dirty() (get_osdmap()) assert that the pg
is locked.

Backport: bobtail
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
2013-02-20 10:22:48 -08:00
Sage Weil
473beb53c5 qa: mon/pool_ops.sh: fix last test
Got this one backwards, bah!

Signed-off-by: Sage Weil <sage@inktank.com>
2013-02-20 08:44:03 -08:00
Greg Farnum
3692ccd696 doc: make the cephfs man page marginally more truthful
Put it in the right place this time.

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-02-19 18:00:44 -08:00
Yehuda Sadeh
db99fb4417 rgw: fix multipart uploads listing
Fixes: #4177
Backport: bobtail
Listing multipart uploads had a typo, and was requiring the
wrong resource (uploadId instead of uploads).

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-02-19 17:59:09 -08:00
Yehuda Sadeh
34f885be53 rgw: don't copy object when it's copied into itself
Fixes: #4150
Backport: bobtail

When an object is copied into itself, the object will not be fully copied:
the tail reference count stays the same, and the head part is rewritten.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-02-19 17:58:52 -08:00
Greg Farnum
efc4947599 man: make the cephfs man page marginally more truthful
Signed-off-by: Greg Farnum <greg@inktank.com>
2013-02-19 17:48:26 -08:00
Sage Weil
de892bbaf6 Merge branch 'wip-pool'
Reviewed-by: Samuel Just <sam.just@inktank.com>
2013-02-19 16:16:11 -08:00
Sage Weil
128cb17d87 osd/OSDMap: note OSDHASHPSPOOL feature when pool FLAG_HASHPSPOOL is set
This allows the osd and mon to enforce feature bits on their connections.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-02-19 16:12:09 -08:00
Sage Weil
b90167d6bc mon: move OSDMap feature bit calculation into an OSDMap method
Signed-off-by: Sage Weil <sage@inktank.com>
2013-02-19 16:12:09 -08:00
Sage Weil
8cc2b0f124 osd: introduce HASHPSPOOL pool flag, feature to avoid overlapping pg placements
The existing code will overlay the placement of PGs from pools because
it simply adds the ps to the pool as the CRUSH input.  That means that
the layout/placement for pg 0.10 == 1.9 == 2.8 == 3.7 == 4.6 == ...,
which is not optimal.

Instead, use hash(ps, poolid).  This avoids the initial problem of
the sequence being adjacent to other pools.  It also avoids the (small)
possibility that hash(poolid) will drop us somewhere in the output
number space where our sequence of outputs overlaps with some other
pool; instead, our output sequence will be fully random (for a well-
behaved hash).

Use the multi-input hash functions used by CRUSH for this.

Default to the legacy behavior for now.  We won't enable this until
deployed systems and kernel code catch up.

Fixes: #4128
Signed-off-by: Sage Weil <sage@inktank.com>
2013-02-19 15:59:00 -08:00
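
The overlap described in the commit above can be illustrated with a small Python sketch. `legacy_seed` and `hashed_seed` are hypothetical names, and CRC32 is only a stand-in for a multi-input hash; CRUSH uses its own hash functions, not CRC32.

```python
import struct
import zlib

def legacy_seed(ps, pool_id):
    # Legacy behavior per the commit message: ps is simply added to the pool id.
    return ps + pool_id

def hashed_seed(ps, pool_id):
    # Stand-in for hash(ps, poolid): mix both inputs instead of adding them.
    return zlib.crc32(struct.pack("<II", ps, pool_id))

# (pool, ps) pairs corresponding to pgs 0.10, 1.9, 2.8, 3.7, 4.6.
pgs = [(0, 10), (1, 9), (2, 8), (3, 7), (4, 6)]

legacy = [legacy_seed(ps, pool) for pool, ps in pgs]
hashed = [hashed_seed(ps, pool) for pool, ps in pgs]

print("legacy seeds:", legacy)  # every entry is the same value
print("distinct hashed seeds:", len(set(hashed)))
```

Because the legacy input is a plain sum, all five pgs feed the same value into placement, whereas mixing both inputs through a hash gives each pg its own input.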
Sage Weil
96e153aeef qa: mon/pool_ops.sh: test pool set size
Signed-off-by: Sage Weil <sage@inktank.com>
2013-02-19 15:54:52 -08:00
Sage Weil
7bd49d02ad qa: mon/pool_ops.sh: fix pool tests
The '! command' doesn't fail properly, even with -e, in bash (wtf!).

Also, the last pool deletion command succeeds because the pool
'--yes-i-really-really-mean-it' doesn't exist.  So drop that test.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-02-19 15:54:07 -08:00
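
The `! command` behavior noted in the commit above can be reproduced with a short Python sketch that shells out to bash (this assumes `bash` is on the PATH). `run_bash` is a hypothetical helper, and the `if ...; then exit 1; fi` form is one common workaround, not necessarily the one the commit used.

```python
import subprocess

def run_bash(script):
    """Run a script under `bash -e`; return (exit_code, stdout)."""
    proc = subprocess.run(
        ["bash", "-e", "-c", script],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout.strip()

# `! true` evaluates to a failure status, but errexit ignores pipelines
# that begin with `!`, so the script keeps going.
negated = run_bash("! true; echo reached")

# Spelling the expectation out makes the script stop when the command
# unexpectedly succeeds.
explicit = run_bash("if true; then exit 1; fi; echo reached")

print(negated)
print(explicit)
```

Here `true` plays the role of a command that was expected to fail but succeeded; only the explicit form stops the script.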
Sage Weil
c86f9239a2 Merge remote-tracking branch 'gh/wip-4159'
2013-02-19 15:52:03 -08:00
Sage Weil
4eb9bf21cb test/bufferlist: fix warning
In file included from test/bufferlist.cc:31:0:
../src/gtest/include/gtest/gtest.h: In function ‘testing::AssertionResult testing::internal::CmpHelperEQ(const char*, const char*, const T1&, const T2&) [with T1 = unsigned int, T2 = int]’:
../src/gtest/include/gtest/gtest.h:1300:30: instantiated from ‘static testing::AssertionResult testing::internal::EqHelper::Compare(const char*, const char*, const T1&, const T2&) [with T1 = unsigned int, T2 = int, bool lhs_is_null_literal = false]’
test/bufferlist.cc:1604:227: instantiated from here
warning: ../src/gtest/include/gtest/gtest.h:1263:3: comparison between signed and unsigned integer expressions [-Wsign-compare]

Signed-off-by: Sage Weil <sage@inktank.com>
2013-02-19 15:33:20 -08:00
Gary Lowell
d0424ebced Merge branch 'master' of https://github.com/ceph/ceph
2013-02-19 14:55:14 -08:00
Gary Lowell
bcb210c677 Merge branch 'next'
2013-02-19 14:53:54 -08:00
Joe Buck
3ff0fe0fc7 testing: updating hadoop-internal test
Small tweaks to the hadoop-internal test
to better use existing environment variables.

Signed-off-by: Joe Buck <jbbuck@gmail.com>
Reviewed-by: Noah Watkins <noahwatkins@gmail.com>
2013-02-19 14:05:38 -08:00
Noah Watkins
f1bff178a4 qa: sample test for new replication tests
Signed-off-by: Joe Buck <jbbuck@gmail.com>
2013-02-19 14:05:11 -08:00
Sage Weil
60d9465b53 doc/release-notes: v0.57
Signed-off-by: Sage Weil <sage@inktank.com>
2013-02-19 13:50:18 -08:00
Samuel Just
dbadb3e292 PG: remove weirdness log for last_complete < log.tail
In the case of a divergent object prior to log.tail,
last_complete may end up before log.tail.

Backport: bobtail
Fixes #4174
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-02-19 10:52:49 -08:00
Sage Weil
5fc83c8d98 os/FileStore: check replay guard on src for collection rename
This avoids a problematic sequence like:

     - rename A/ -> B/
     - remove B/1...100
     - destroy B/
     - create A/
     - write A/101...
     <crash>
     - replay A/ -> B/
     - remove B/1...100  (fails but tolerated)
     - destroy B/        (fails with ENOTEMPTY)

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
2013-02-19 10:41:09 -08:00
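
The replay sequence in the commit above can be modeled with a toy Python simulation. This is a sketch under stated assumptions, not FileStore's implementation: collections are sets of object ids, and the "guard" is a hypothetical per-collection record of the newest op sequence number already applied to it.

```python
def replay(check_src_guard):
    """Replay journal ops 1-3 on the post-crash state; return 'ok' or 'ENOTEMPTY'."""
    # On-disk state at the crash: ops 1-5 were applied, so A is the *new*
    # collection created by op 4 and holds object 101 written by op 5.
    colls = {"A": {101}}
    # Hypothetical guard: sequence number of the newest op applied to a collection.
    guard = {"A": 5}
    journal = [
        (1, "rename", "A", "B"),
        (2, "remove", "B", range(1, 101)),
        (3, "destroy", "B", None),
    ]
    for seq, op, coll, arg in journal:
        if op == "rename":
            if check_src_guard and guard.get(coll, 0) >= seq:
                continue  # source already reflects a later op: skip the replay
            if coll in colls:
                colls[arg] = colls.pop(coll)
        elif op == "remove" and coll in colls:
            colls[coll] -= set(arg)  # already-missing objects are tolerated
        elif op == "destroy" and coll in colls:
            if colls[coll]:
                return "ENOTEMPTY"
            del colls[coll]
    return "ok"

print(replay(check_src_guard=False))
print(replay(check_src_guard=True))
```

Without the check on the source, the replayed rename moves the new A (and object 101) to B, and the later destroy of B fails. With the check, the rename is skipped and the remaining replayed ops find nothing to do.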
Sage Weil
56c5a07708 osd: requeue pg waiters at the front of the finished queue
We could have a sequence like:

- op1
- notify
- op2

in the finished queue.  Op1 gets put on waiting_for_pg, the notify
creates the pg and requeues op1 (at the end), op2 is handled, and
finally op1 is handled.  That breaks ordering; see #2947.

Instead, when we wake up a pg, queue the waiting messages at the front
of the dispatch queue.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
2013-02-19 10:41:09 -08:00
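
The ordering problem in the commit above can be reproduced with a small Python model. The names `dispatch`, `finished`, and `waiting_for_pg` mirror the commit message, but the code is an illustrative sketch, not the OSD's dispatch logic.

```python
from collections import deque

def dispatch(requeue_at_front):
    """Process op1, notify, op2 and return the order in which ops are handled."""
    finished = deque(["op1", "notify", "op2"])
    waiting_for_pg = []
    pg_exists = False
    handled = []
    while finished:
        item = finished.popleft()
        if item == "notify":
            # The notify creates the pg and wakes up everything waiting on it.
            pg_exists = True
            if requeue_at_front:
                # reversed() keeps the waiters in their original order.
                finished.extendleft(reversed(waiting_for_pg))
            else:
                finished.extend(waiting_for_pg)
            waiting_for_pg.clear()
        elif not pg_exists:
            waiting_for_pg.append(item)
        else:
            handled.append(item)
    return handled

print(dispatch(requeue_at_front=False))
print(dispatch(requeue_at_front=True))
```

Requeuing at the back lets op2 overtake op1; requeuing at the front preserves the submission order.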
Sage Weil
f1841e4189 osd: pull requeued requests off one at a time
Pull items off the finished queue one at a time.  In certain cases, an
event may result in new items getting added to the finished queue that
will be put at the *front* instead of the back.  See latest incarnation
of #2947.

Note that this is a significant change in behavior in that we can
theoretically starve if an event keeps resulting in new events getting
generated.  Beware!

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
2013-02-19 10:41:09 -08:00
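
A sketch of why draining one item at a time matters when handlers may push new work onto the front of the queue. This Python model is illustrative only; `drain` and the item names are hypothetical.

```python
from collections import deque

def drain(batch):
    """Drain a queue whose first item requeues follow-up work at the front."""
    finished = deque(["a", "b"])
    handled = []
    spawned = False
    while finished:
        if batch:
            # Take everything currently queued and process it as one batch.
            work = list(finished)
            finished.clear()
        else:
            # Take exactly one item, so front insertions are seen next.
            work = [finished.popleft()]
        for item in work:
            handled.append(item)
            if item == "a" and not spawned:
                spawned = True
                finished.appendleft("a-requeued")
    return handled

print(drain(batch=True))
print(drain(batch=False))
```

In batch mode the requeued item is only seen after the whole batch, so "b" runs before it. Pulling one item at a time handles it immediately after "a", at the cost noted in the commit: a handler that keeps generating new front-of-queue work could starve the items behind it.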
Gary Lowell
9a7a9d06c0 v0.57
2013-02-19 10:07:42 -08:00
Sage Weil
4002d70ac0 osd: fix printf warning on pg_log_entry_t::get_key_name
warning: osd/osd_types.cc:1716:76: format '%lu' expects argument of type 'long unsigned int', but argument 5 has type 'version_t {aka long long unsigned int}' [-Wformat]

Signed-off-by: Sage Weil <sage@inktank.com>
2013-02-19 09:12:52 -08:00
Sage Weil
f80f84936e qa: test_mon_workloadgen: use default config file path
I'm not sure why we wouldn't.  Also, this makes this test work without
annoying plumbing to pass the explicit path through.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-02-19 09:08:57 -08:00
Sage Weil
6d338591b7 qa: mon/workloadgen.sh: drop TEST_CEPH_CONF code
The binaries already pick up on CEPH_CONF, which will be set as needed.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-02-19 09:02:14 -08:00
Sage Weil
8ca2274cc0 rbd: udevadm settle before unmap
udev runs blkid on device close, and other such nonsense that can
make unmap fail with EBUSY.  Settle before we unmap to avoid this if
possible.  See #4183.

Closes: #4186
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
2013-02-19 08:44:34 -08:00
Joe Buck
b45f67e0b5 test: correcting hadoop-internal tests
Changing the hadoop-internal tests to use the
newly added $TESTDIR environment variable.
Also, removed unneeded variables.

Signed-off-by: Joe Buck <jbbuck@gmail.com>
Reviewed-by: Sam Lang <sam.lang@inktank.com>
2013-02-19 08:36:36 -08:00
Joe Buck
d2dbab1f4f testing: adding a Hadoop wordcount test
Signed-off-by: Joe Buck <jbbuck@gmail.com>
Reviewed-by: Sam Lang <sam.lang@inktank.com>
2013-02-19 08:35:13 -08:00