Commit Graph

24156 Commits

Author SHA1 Message Date
Sage Weil
64267eb3d8 test/librados/watch_notify: fix warning
In file included from test/librados/watch_notify.cc:8:0:
../src/gtest/include/gtest/gtest.h: In function ‘testing::AssertionResult testing::internal::CmpHelperEQ(const char*, const char*, const T1&, const T2&) [with T1 = long unsigned int, T2 = int]’:
../src/gtest/include/gtest/gtest.h:1300:30: instantiated from ‘static testing::AssertionResult testing::internal::EqHelper::Compare(const char*, const char*, const T1&, const T2&) [with T1 = long unsigned int, T2 = int, bool lhs_is_null_literal = false]’
test/librados/watch_notify.cc:67:224: instantiated from here
warning: ../src/gtest/include/gtest/gtest.h:1263:3: comparison between signed and unsigned integer expressions [-Wsign-compare]

Signed-off-by: Sage Weil <sage@inktank.com>
2013-02-22 14:57:45 -08:00
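
The warning quoted in this commit comes from comparing an unsigned value (`long unsigned int`) with a signed `int` inside a gtest equality assertion. The snippet below is a minimal sketch of the usual remedy, not the actual patch: the helper name `has_expected_size` is hypothetical and only illustrates keeping both operands unsigned.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the Ceph tree): compares a container size,
// which is unsigned, against an expected count that is also unsigned.
// Because both operands of == are std::size_t, -Wsign-compare has nothing
// to flag. Passing a plain int literal to a templated comparison (as gtest's
// ASSERT_EQ does) would instead deduce T2 = int and trigger the warning.
bool has_expected_size(const std::vector<int>& v, std::size_t expected) {
  return v.size() == expected;
}
```

In a gtest assertion the equivalent change is typically to write the expected value with an unsigned suffix (for example `1u`) or to cast one side explicitly.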
Sage Weil
53586e71f3 ceph-object-corpus: re-update
This was set by 9af94eea20, the single
paxos merge, then accidentally reverted by the next commit,
6cb53740f2.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-02-22 14:40:10 -08:00
Samuel Just
2dae6a68ee PG::proc_replica_log: oinfo.last_complete must be *before* first entry in omissing
Fixes: #4189
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-02-22 14:19:55 -08:00
Sage Weil
e4fd70fcec Merge remote-tracking branch 'gh/wip-rbd-flatten-deadlock'
Reviewed-by: Sage Weil <sage@inktank.com>
2013-02-22 14:23:45 -08:00
Sage Weil
e03657e452 Merge remote-tracking branch 'gh/wip-objecter-fsx'
Reviewed-by: Sage Weil <sage@inktank.com>
2013-02-22 14:16:07 -08:00
David Zafman
d612a9abac Merge branch 'wip-3403-4-rebase'
Feature: #3403

Signed-off-by: David Zafman <david.zafman@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
2013-02-22 12:50:19 -08:00
Josh Durgin
3105034067 objecter: don't resend linger ops unnecessarily
recalc_linger_op_target() was checking and then setting
linger_op->pgid and linger_op->active, but these were only set by
recalc_linger_op_target(). This was only called by handle_osd_map(),
so the first osdmap after a watch was established would cause a resend
of the watch. Analogous to the normal Op, set this information by
calling recalc_linger_op_target in send_linger().

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2013-02-21 23:33:46 -08:00
Josh Durgin
15bb9ba9fb objecter: initialize linger op snapid
Since they are write ops now, it must be CEPH_NOSNAP or the OSD
returns EINVAL.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2013-02-21 23:23:02 -08:00
David Zafman
5648117626 Add test for list_watchers() C++ interface
Signed-off-by: David Zafman <david.zafman@inktank.com>
2013-02-21 21:50:02 -08:00
David Zafman
1c3241e3bf Add listwatchers command to rados
Signed-off-by: David Zafman <david.zafman@inktank.com>
2013-02-21 21:50:02 -08:00
David Zafman
af339aee46 Add ObjectReadOperation and IoCtx functions
Signed-off-by: David Zafman <david.zafman@inktank.com>
2013-02-21 21:50:02 -08:00
David Zafman
cfe923920c librados: expose a list of watchers on an object
Add new op CEPH_OSD_OP_LIST_WATCHERS
Add Objecter handling

Signed-off-by: David Zafman <david.zafman@inktank.com>
2013-02-21 21:50:02 -08:00
David Zafman
bf5cf3318d Add rados_types.h header file
Signed-off-by: David Zafman <david.zafman@inktank.com>
2013-02-21 21:50:01 -08:00
Sage Weil
dc181224ab osd/PG: fix typo, missing -> omissing
From ce7ffc3440.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-02-21 17:55:21 -08:00
Josh Durgin
94ae725465 test_librbd_fsx: fix image closing
Always close the image we opened in check_clone(), and check the
return code of the rbd_close() called before cloning.

Refs: #3958
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2013-02-21 17:39:22 -08:00
Sage Weil
6c08c7c1c6 objecter: separate out linger_read() and linger_mutate()
A watch is a mutation, while a notify is a read.  The mutations need to
pass in a proper snap context to be fully correct.

Also, make the WRITE flag implicit so the caller doesn't need to pass it
in.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-02-21 17:31:41 -08:00
Sage Weil
de4fa95f03 osd: make watch OSDOp print sanely
Signed-off-by: Sage Weil <sage@inktank.com>
2013-02-21 17:31:41 -08:00
Sage Weil
60ebf02a28 Merge branch 'next'
2013-02-21 17:30:46 -08:00
Sage Weil
dd007db3ca ceph_common.sh: fix iteration of items in ceph.conf
This broke in c8f528a407.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-02-21 17:30:06 -08:00
Dan Mick
6cb53740f2 ceph-conf.rst: missing '=' in example network settings
Signed-off-by: Dan Mick <dan.mick@inktank.com>
2013-02-21 17:02:17 -08:00
Sage Weil
9af94eea20 Merge remote-tracking branch 'gh/wsp.bobtail.2merge'
2013-02-21 15:45:36 -08:00
Samuel Just
ce7ffc3440 PG::proc_replica_log: adjust oinfo.last_complete based on omissing
Otherwise, search_for_missing may neglect to check the missing
set for some objects assuming that if the need version is
prior to last_complete, the replica must have it.

Fixes: #4994
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-02-21 15:37:14 -08:00
Samuel Just
8086d1d8c0 Merge remote-tracking branch 'upstream/wip_clone_attrs'
Reviewed-by: Sage Weil <sage@inktank.com>
2013-02-21 14:42:33 -08:00
Greg Farnum
79f09bf33e MDS: remove a few other unnecessary is_base() checks
We should let users remove xattrs as well as set them. ;) And
the check in handle_client_setlayout was totally useless -- perhaps
intended for setdirlayout?

This is a follow-on to 9f82ae60fa and
should be taken wherever it goes.

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-02-21 14:30:42 -08:00
Greg Farnum
9f82ae60fa mds: allow xattrs on the root inode
This was previously disallowed because Once Upon a Time, the root
inode wasn't persisted to disk and was an entirely in-memory construct. But
it's safe now, and has been for a while.

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-02-21 14:21:08 -08:00
Greg Farnum
6bd8781dda mds: use inode_t::layout for dir layout policy
This cherry-pick is going in the reverse direction of normal. That's
because this direction makes for the minimal change -- this patchset
is required to fix the loss of directory layouts we were previously
seeing, but fixing it requires changing the encoding versions. So we
wrote it on top of Bobtail and let it update the struct_v's as they existed
then. Note that we here change a few encoding versions in ways which are
NOT COMPATIBLE with previous development code (but not any releases). In
particular, development code introduced and this removes the
file_layout_policy_t, and some of the CInode and EMetaBlob encoding
struct_v values were used in development code to mean one thing, but
mean something different due to the Bobtail patch.

Remove the default_file_layout struct, which was just a ceph_file_layout,
and store it in the inode_t.  Rip out all the annoying code that put this
on the heap.

To aid in this usage, add a clear_layout() function to inode_t.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 36ed407e0f)
Conflicts:

	src/mds/CInode.cc
	src/mds/CInode.h
	src/mds/MDCache.cc
	src/mds/Server.cc
	src/mds/events/EMetaBlob.h
Cherry-pick-Reviewed-by: Sage Weil <sage@inktank.com>
2013-02-21 13:44:01 -08:00
Sage Weil
84ef1649c5 mds: parse ceph.*.layout vxattr key/value content
Use qi to parse a strictly formatted set of key/value pairs.  Be picky
about whitespace.  Any subset of recognized keys is allowed.  Parse the
same set of keys as the ceph.*.layout.* vxattrs.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 5551aa5b3b)
2013-02-21 13:44:01 -08:00
Sage Weil
fea77682a6 osdc/Objecter: unwatch is a mutation, not a read
This was causing librados to unblock after the ACK on unwatch, which meant
that librbd users raced and tried to delete the image before the unwatch
change was committed, and got EBUSY.  See #3958.

The watch operation has a similar problem.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-02-21 13:28:47 -08:00
Samuel Just
81bd996428 FileStore::_clone: use _fsetattrs rather than _setattrs
The omap portion of the clone happened above in DBObjectMap::clone.
Only the fs stored attrs need to be explicitly copied.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-02-21 13:28:26 -08:00
Samuel Just
5b48e63c03 FileStore::_setattrs: use _fsetattrs
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-02-21 13:26:56 -08:00
Samuel Just
c33c51f01f FileStore: add _fsetattrs
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-02-21 13:26:40 -08:00
Samuel Just
2ec04f9633 FileStore::_setattrs: only do omap operations if necessary
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-02-21 13:25:49 -08:00
Samuel Just
83fad1c7f2 FileStore::_setattrs no need to grab an Index lock for the omap operations
Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-02-21 13:24:42 -08:00
Yehuda Sadeh
08efb158ae Merge pull request #67 from jaharkes/content_length
Handle empty CONTENT_LENGTH environment variable.
2013-02-21 12:59:06 -08:00
Jan Harkes
ad00fc72e1 Fix failing > 4MB range requests through radosgw S3 API.
When a range request is made for more than rgw_get_obj_max_req_size
bytes the first returned chunk sets 'ret' to STATUS_PARTIAL_CONTENT and
all remaining chunks behave as if there is an error state and only
return a minimal header.

Fix this by passing STATUS_PARTIAL_CONTENT to set_req_state_err, but
leave the 'ret' member variable untouched.

Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
(cherry picked from commit c83a01d4e8)
2013-02-21 12:51:40 -08:00
Yehuda Sadeh
e5a01317db Merge pull request #66 from jaharkes/range_requests
Fix failing > 4MB range requests through radosgw S3 API.
2013-02-21 12:42:06 -08:00
Jan Harkes
96896eb092 Handle empty CONTENT_LENGTH environment variable.
nginx seems to be providing a CONTENT_LENGTH environment variable with no data
when the request body is empty.

Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
2013-02-21 15:36:30 -05:00
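
The commit above describes a server passing a CONTENT_LENGTH variable that is present but empty. As a rough sketch of that kind of defensive parsing (the function name `parse_content_length` and the choice to treat an empty value as zero are assumptions, not taken from the radosgw source):

```cpp
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: interpret a CONTENT_LENGTH value that may be missing
// (null pointer) or present but empty (""), as the commit says nginx can
// send for an empty request body. Both cases are assumed here to mean a
// zero-length body.
std::int64_t parse_content_length(const char* value) {
  if (value == nullptr || *value == '\0') {
    return 0;
  }
  // strtoll stops at the first non-digit; stricter validation is omitted.
  return std::strtoll(value, nullptr, 10);
}
```

A caller would typically pass the result of `std::getenv("CONTENT_LENGTH")` straight into this helper.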
Jan Harkes
c83a01d4e8 Fix failing > 4MB range requests through radosgw S3 API.
When a range request is made for more than rgw_get_obj_max_req_size
bytes the first returned chunk sets 'ret' to STATUS_PARTIAL_CONTENT and
all remaining chunks behave as if there is an error state and only
return a minimal header.

Fix this by passing STATUS_PARTIAL_CONTENT to set_req_state_err, but
leave the 'ret' member variable untouched.

Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
2013-02-21 15:29:11 -05:00
Sage Weil
4277265d99 osd: an interval can't go readwrite if its acting is empty
Let's not forget that min_size can be zero.

Fixes: #4159
Signed-off-by: Sage Weil <sage@inktank.com>
2013-02-21 11:32:39 -08:00
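
The point of the commit above is that a size check against min_size alone passes trivially when min_size is zero, even for an empty acting set. A minimal sketch of the combined condition follows; `interval_may_go_rw` and its parameters are hypothetical and do not reproduce the OSD's actual interval logic.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical predicate: an interval may go read-write only if the acting
// set is non-empty AND it meets the pool's min_size. The explicit emptiness
// check matters because acting.size() >= 0 is always true.
bool interval_may_go_rw(const std::vector<int>& acting, std::size_t min_size) {
  return !acting.empty() && acting.size() >= min_size;
}
```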
Josh Durgin
a1ae856287 librbd: make sure racing flattens don't crash
The only way for a parent to disappear is a racing flatten completing,
or possibly in the future the image being forcibly removed. In either
case, continuing to flatten makes no sense, so stop early.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2013-02-21 11:26:49 -08:00
Josh Durgin
995ff0e3ea librbd: use rwlocks instead of mutexes for several fields
Image metadata like snapshots, size, and parent is frequently read,
but rarely updated. During flatten, we were depending on the parent
lock to prevent the parent ImageCtx from disappearing out from under
us while we read from it. The copy-up path also needed the parent lock
to be able to read from the parent image, which led to a deadlock.

Convert parent_lock, snap_lock, and md_lock to RWLocks, and change
their use to read instead of exclusive locks where appropriate. The
main place exclusive locks are needed is in ictx_refresh, so this is
pretty simple. This fixes the deadlock, since parent_lock is only
needed for read access in both flatten and the copy-up operation.

cache_lock and refresh_lock are only really used for exclusive access,
so leave them as regular mutexes.

One downside to this is that there's no way to assert is_locked()
for RWLocks, so we'll have to be very careful about changing code
in the future.

Fixes: #3665
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2013-02-21 11:19:40 -08:00
Josh Durgin
e0f8e5a80d common: add lockers for RWLocks
This makes them easier to use, especially instead of existing mutexes.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2013-02-21 11:15:43 -08:00
Sage Weil
c8d0889df5 Merge branch 'next'
Conflicts:
	src/osd/ReplicatedPG.cc
2013-02-21 10:44:04 -08:00
Sage Weil
6d8dfb18fe osd: clear recovery state on pg removal
This ensures we release our in-progress recovery counters, which prevents
recovery from getting blocked indefinitely when a pool removal races with
recovery ops.

Fixes: #4217
Backport: bobtail
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
2013-02-21 10:43:20 -08:00
Josh Durgin
94e5deebc6 test: fix run-rbd-tests pool deletion
Use the new safety check

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2013-02-21 10:38:38 -08:00
Joao Eduardo Luis
6612b0402e ceph-object-corpus: use temporary 'wsp.master.new' corpus until we get merged into master
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
2013-02-21 18:29:36 +00:00
Joao Eduardo Luis
beafca57fb Merge branch 'wsp.bobtail.2merge' into wsp.bobtail.master
Conflicts:
	src/.gitignore
	src/Makefile.am
	src/include/ceph_features.h
	src/mon/MDSMonitor.cc
	src/mon/PGMonitor.cc
2013-02-21 18:04:22 +00:00
Joao Eduardo Luis
04dac7ee7a vstart.sh: Create mon data directory before --mkfs
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
2013-02-21 18:02:23 +00:00
Joao Eduardo Luis
89f920492d test: ObjectMap: add a generic leveldb store tool
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
2013-02-21 18:02:23 +00:00
Joao Eduardo Luis
cb85fb7d9a mon: ceph-mon: convert an old monitor store to the new format
With the single-paxos patches we shifted from an approach with multiple
paxos instances (one for each paxos service) keeping their own versions
to a single paxos instance for all the paxos services, thus ending up
with a single global version for paxos.

With the release of v0.52, the monitor started tracking these global
versions, keeping them for the single purpose of making it possible to
convert the store to a single-paxos format.

This patch now introduces a mechanism to convert a GV-enabled store to
the single-paxos format store when the monitor is upgraded.

As we require the global versions to be present, we first check if the
store has the GV feature set: if not we will not proceed, but we will
start the conversion otherwise.

At the end of the conversion, the monitor data directory will have a
brand new 'store.db' directory, where the key/value store lies,
alongside the old store.  This makes it possible to revert to a
previous monitor version if things go sideways, without jeopardizing the
data in the store.

The conversion is done during a rolling upgrade, without any
intervention by the user.  Fire up the new monitor version on an old
store, and the monitor itself will convert the store, trim any lingering
versions that might not be required, and proceed to start as expected.

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
2013-02-21 18:02:23 +00:00