Commit Graph

32979 Commits

Author SHA1 Message Date
Sage Weil
68e27116d0 Merge pull request #1609 from ceph/wip-7739
mds: fix some uninitialized message fields

Reviewed-by: Zheng Yan <zheng.z.yan@intel.com>
2014-04-06 17:56:05 -07:00
Sage Weil
76cbd5dd82 mds: fix uninit MMDSSlaveRequest lock_type
Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-06 17:36:52 -07:00
Samuel Just
c0fd3df41e Merge pull request #1608 from ceph/wip-8002
osd: fix osd map subscribe on YOU_DIED osd_ping

Reviewed-by: Samuel Just <sam.just@inktank.com>
2014-04-06 16:32:38 -07:00
Sage Weil
4ea9e4818f osd: fix map subscription in YOU_DIED osd_ping handler
If we have epoch X and find out we died as of epoch Y, we still want to
request X+1.  Among other things, this fixes a 'stall' if Y happens to be
the most recent map published and no new maps are generated because we will
never get anything back from our subscription.

This makes this osdmap_subscribe() caller match every other caller by
passing in current epoch + 1.

Fixes: #8002
Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-06 16:03:50 -07:00
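
The subscription rule in 4ea9e4818f can be illustrated with a small model. This is only a sketch under stated assumptions: `subscribe_start()` and `subscription_gets_reply()` are hypothetical helpers, not functions from the Ceph tree, and the reply condition is a deliberate simplification of how a monitor answers map subscriptions.

```cpp
#include <cassert>
#include <cstdint>

using epoch_t = std::uint32_t;  // stand-in for the OSD map epoch type

// Simplified model (assumption): a subscription that starts at `start`
// only yields a reply once a map with epoch >= start has been published.
bool subscription_gets_reply(epoch_t latest_published, epoch_t start) {
  return latest_published >= start;
}

// Hypothetical helper mirroring the rule in the commit message:
// always ask for the epoch right after the one we already have.
epoch_t subscribe_start(epoch_t have_epoch) {
  return have_epoch + 1;
}
```

With X = 10 held locally and Y = 15 as the latest published map, starting at X+1 is answered immediately in this model, whereas a subscription starting past Y would wait until a new map is generated.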
Sage Weil
2f7522c83a msgr: add ms_dump_on_send option
This is useful only for debugging.  The encoded contents of a message are
dumped to the log on message send.  This is useful when valgrind is
triggering warnings about uninitialized memory in messages because the
call chain will indicate which message type is to blame, whereas the
usual writer thread context does not tell us any useful information.

Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-06 13:19:11 -07:00
Sage Weil
87e6a62e4f mds: fix uninitialized fields in MDiscover
Fixes: #7739
Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-06 13:18:40 -07:00
Sage Weil
67fd4218d3 mon: wait for quorum for MMonGetVersion
We should not respond to checks for map versions when we are in the
probing or electing states or else clients will get incorrect results when
they ask what the latest map version is.

Fixes: #7997
Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-05 16:58:55 -07:00
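
A minimal sketch of the idea in 67fd4218d3, assuming a toy monitor with four states. `MiniMon`, `MonState`, and `handle_get_version()` are hypothetical names for illustration and do not reflect the real monitor classes; the point is only that version queries are deferred while the monitor is probing or electing.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class MonState { Probing, Electing, Leader, Peon };

struct MiniMon {
  MonState state = MonState::Probing;
  std::uint64_t latest_version = 0;
  std::vector<int> deferred;  // ids of requests waiting for quorum

  bool in_quorum() const {
    return state == MonState::Leader || state == MonState::Peon;
  }

  // Answers immediately (returns true, fills *version) only when in quorum;
  // otherwise the request is queued and false is returned.
  bool handle_get_version(int request_id, std::uint64_t* version) {
    if (!in_quorum()) {
      deferred.push_back(request_id);
      return false;
    }
    *version = latest_version;
    return true;
  }
};
```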
Yan, Zheng
a75af4c253 client: try shrinking kernel inode cache when trimming session caps
Notify the kernel to invalidate top-level directory entries. As a side
effect, the kernel inode cache gets shrunk.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-04-05 21:23:54 +08:00
Yan, Zheng
82015e409d client: release clean pages if no open file wants RDCACHE
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-04-05 10:35:22 +08:00
Sage Weil
9484daf9d8 osd: disable agent when stats_invalid (post-split)
After a split the pg stats are approximate but not precisely correct.  Any
inaccuracy can be problematic for the agent because it determines the
level of effort and potentially full/blocking behavior based on that.

We could conceivably do some estimation here that is "safe" in that we
don't commit to too much effort (or back off later if it isn't paying off)
and never block, but that is error-prone.

Instead, just disable the agent until a scrub makes the stats reliable
again.

We should document that a scrub after split is recommended (in any case)
and especially important on cache tiers, but there are currently *no*
user docs about PG splitting.

Fixes: #7975
Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-04 18:15:04 -07:00
Sage Weil
6a4c50d7f2 Merge pull request #1605 from ceph/wip-7993
ceph-post-file: use getopt for multiple options, add longopts to help

Reviewed-by: Sage Weil <sage@inktank.com>
2014-04-04 18:07:52 -07:00
Greg Farnum
232ac1a52a OSD: _share_map_outgoing whenever sending a message to a peer
This ensures that they get new maps before an op which requires them (that
they would then request from the monitor).

Signed-off-by: Greg Farnum <greg@inktank.com>
2014-04-04 16:06:05 -07:00
Dan Mick
6f40b64463 ceph-post-file: use getopt for multiple options, add longopts to help
Fixes: #7993
Signed-off-by: Dan Mick <dan.mick@inktank.com>
2014-04-04 15:26:42 -07:00
Samuel Just
ebb865b12c Merge pull request #1603 from ceph/wip-7983
osd/ReplicatedPG: do not hit_set_persist while potentially backfilling hit_set_*

Reviewed-by: Samuel Just <sam.just@inktank.com>
2014-04-04 15:17:00 -07:00
Dan Mick
f2edd959fc Merge pull request #1604 from ceph/wip-7992
ceph-post-file: fix installation of ssh key files
2014-04-04 14:41:02 -07:00
Sage Weil
2f6a62b457 ceph-post-file: fix installation of ssh key files
Fixes: #7992
Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-04 14:39:56 -07:00
Sage Weil
e02b7f93ab osd/ReplicatedPG: do not hit_set_persist while potentially backfilling hit_set_*
The hit_set transactions may include both a modify of the new hit_set and
deletion of an old one, spanning the backfill boundary, and we may end up
sending a backfill target a blank transaction that does not correctly
remove the old object.  Later it will notice the stray object and
throw an assertion.

Fix this by skipping hit_set_persist() if any of the backfill targets are
still working on the very first hash value in the PG (which is where all
of the hit_set objects live).  This is coarse but simple.

Another solution would be to send separate ops for the trim/deletion and
new hit_set update, but that is a bit more complex and adds a bit more
runtime overhead (twice the messages).

Fixes: #7983
Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-04 13:56:33 -07:00
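
The coarse check described in e02b7f93ab might look roughly like the following. This is a sketch, not the actual ReplicatedPG code: `BackfillTarget`, `last_backfill_hash`, and `can_persist_hit_set()` are hypothetical, and treating "still working on the first hash value" as `<=` is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct BackfillTarget {
  std::uint32_t last_backfill_hash;  // hash position the target has reached
};

// Returns false while any backfill target has not yet moved past the first
// hash value in the PG, which is where the hit_set objects live.
bool can_persist_hit_set(const std::vector<BackfillTarget>& targets,
                         std::uint32_t first_hash_in_pg) {
  for (const auto& t : targets) {
    if (t.last_backfill_hash <= first_hash_in_pg) {
      return false;  // coarse but simple: skip persisting for now
    }
  }
  return true;
}
```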
Sage Weil
4aef403dbc doc/release-notes: note about emperor backport of mon auth fix
Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-04 12:59:41 -07:00
Joao Eduardo Luis
db266a3fb2 mon: MonCommands.h: have 'auth' read-only operations require 'x' cap
This reintroduces the same semantics that were in place in dumpling prior
to the refactoring of the cap/command matching code.

We haven't added this requirement to auth read-write operations as that
would have the potential to break a lot of well-configured keyrings once
the users upgraded, without any significant gain -- we assume that if
they have set 'rw' caps on a given entity, they are indeed expecting said
entity to be a sort-of-privileged entity with regard to monitor access.

Fixes: #7919

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2014-04-04 12:51:27 -07:00
Greg Farnum
9caf3dbc37 Migrator: use a null ref instead of NULL when calling into path_traverse
Signed-off-by: Greg Farnum <greg@inktank.com>
2014-04-04 10:40:52 -07:00
Greg Farnum
0c9af9397c Migrator: use MDRequestRef and MutationRef instead of raw pointers
Signed-off-by: Greg Farnum <greg@inktank.com>
2014-04-04 10:40:51 -07:00
Greg Farnum
3429dc594d SimpleLock: use MutationRef instead of raw pointers
While we're here, remove the non-const get_xlock_by() (because
we don't need it). Also note we return a full MutationRef
(instead of a ref to the stored one). It's necessary in case we
don't have a set-up more() object.

Signed-off-by: Greg Farnum <greg@inktank.com>
2014-04-04 10:40:51 -07:00
Greg Farnum
c09878e92a Mutation: move self_ref into MutationImpl instead of MDRequestImpl
We keep an MDRequestImpl::set_self_ref(MDRequestRef&) function so
that we don't need to do the pointer conversion elsewhere.

Signed-off-by: Greg Farnum <greg@inktank.com>
2014-04-04 10:40:51 -07:00
Greg Farnum
3be138f51f Mutation: rename to MutationImpl and define MutationRef
Signed-off-by: Greg Farnum <greg@inktank.com>
2014-04-04 10:40:51 -07:00
Greg Farnum
f41a2f87c2 Locker: use MDRequestRef instead of MDRequest*
Signed-off-by: Greg Farnum <greg@inktank.com>
2014-04-04 10:39:58 -07:00
Greg Farnum
5872c2d873 MDCache: use MDRequestRef instead of MDRequest*
Signed-off-by: Greg Farnum <greg@inktank.com>
2014-04-04 10:39:58 -07:00
Greg Farnum
565b2c8938 Server: Use MDRequestRef instead of raw pointers
Signed-off-by: Greg Farnum <greg@inktank.com>
2014-04-04 10:39:58 -07:00
Greg Farnum
90ceb7c557 MDS: Convert the request_start* functions and their immediate callers
Also, the active_requests mapping gets weak pointers.

Signed-off-by: Greg Farnum <greg@inktank.com>
2014-04-04 10:39:58 -07:00
Greg Farnum
f773307ec9 mds: MDRequest: rename to MDRequestImpl, and declare MDRequestRef
We're switching the MDRequest to be used as a shared pointer. This is the
first step on the path to inserting an OpTracker into the MDS.
Give the MDRequestImpl a weak_ptr self_ref so that we can keep
using the elist for now.

Signed-off-by: Greg Farnum <greg@inktank.com>
2014-04-04 10:39:58 -07:00
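
The shared-pointer pattern described in f773307ec9 can be sketched as follows. This assumes `std::shared_ptr` and `std::weak_ptr` for illustration; the accessor `get_ref()` and the factory `make_request()` are hypothetical names, and the real class carries far more state.

```cpp
#include <cassert>
#include <memory>

struct MDRequestImpl;
using MDRequestRef = std::shared_ptr<MDRequestImpl>;

struct MDRequestImpl {
  // Weak self-reference: lets code that only holds a raw pointer (for
  // example an intrusive list entry) recover a counted reference without
  // the object keeping itself alive.
  std::weak_ptr<MDRequestImpl> self_ref;

  void set_self_ref(const MDRequestRef& ref) { self_ref = ref; }
  MDRequestRef get_ref() const { return self_ref.lock(); }
};

// Hypothetical factory: create the request and wire up its self-reference.
MDRequestRef make_request() {
  MDRequestRef r = std::make_shared<MDRequestImpl>();
  r->set_self_ref(r);
  return r;
}
```

Because the self-reference is weak, it does not contribute to the use count, so the request is destroyed as soon as the last external `MDRequestRef` goes away.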
Greg Farnum
fd235cddf1 include/memory: add static_pointer_cast
Signed-off-by: Greg Farnum <greg@inktank.com>
2014-04-04 10:38:52 -07:00
Samuel Just
82d2551c8c Merge pull request #1602 from ceph/wip-cache-create-fix
ReplicatedPG: fix CEPH_OSD_OP_CREATE on cache pools

Reviewed-by: Samuel Just <sam.just@inktank.com>
2014-04-04 10:34:40 -07:00
Yan, Zheng
d12167812b client: fix null pointer dereference in Client::unlink
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-04-05 00:49:37 +08:00
Yan, Zheng
f68e60ea6c ObjectCacher: assert no waiter when remove buffer head
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-04-05 00:49:37 +08:00
Yan, Zheng
4be0b6b103 client: cleanup Client::_invalidate_inode_cache()
drop parameter 'keep_caps'

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-04-05 00:49:36 +08:00
Yan, Zheng
abc19dd4f2 client: drop Fr cap before getattr CEPH_STAT_CAP_SIZE
When the MDS receives the getattr request, the corresponding inode's filelock
can be in an unstable state that is waiting for the client's Fr cap.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-04-05 00:48:47 +08:00
Yan, Zheng
954007e6df client: properly retain used caps
Pass 'retain' properly to Client::send_cap() because it is used to
adjust cap->issued.

Also make Client::encode_inode_release() not release used/dirty caps.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-04-05 00:48:33 +08:00
Yan, Zheng
2d5bd84b93 client: assign implemented caps to caps field of MClientCaps
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-04-05 00:47:51 +08:00
Yan, Zheng
1538a98a4f client: hold Fcr caps during readahead
Fcr caps prevent the file from being truncated.

Fixes: #7958
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-04-05 00:47:51 +08:00
Yan, Zheng
701c22a81b client: implement RDCACHE reference tracking
Make the code able to track Fc caps used by async buffer reads.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-04-05 00:47:45 +08:00
Ilya Dryomov
b219c8f917 ReplicatedPG: fix CEPH_OSD_OP_CREATE on cache pools
The following

./ceph osd pool create data-cache 8 8
./ceph osd tier add data data-cache
./ceph osd tier cache-mode data-cache writeback
./ceph osd tier set-overlay data data-cache

./rados -p data create foo
./rados -p data stat foo

results in

  error stat-ing data/foo: No such file or directory

even though foo exists in the data-cache pool, as it should.  STAT
checks for (exists && !is_whiteout()), but the whiteout flag isn't
cleared on CREATE as it is on WRITE and WRITEFULL.  The problem is
that, for newly created 0-sized cache pool objects, CREATE handler in
do_osd_ops() doesn't get a chance to queue OP_TOUCH, and so the logic
in prepare_transaction() considers CREATE to be a read and therefore
doesn't clear whiteout.  Fix it by allowing CREATE handler to queue
OP_TOUCH at all times, mimicking WRITE and WRITEFULL behaviour.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
2014-04-04 20:23:14 +04:00
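
The whiteout problem fixed in b219c8f917 can be modelled in a few lines. This is a simplified sketch, not the do_osd_ops() code: `ObjectState`, `stat_finds()`, and `handle_create()` are hypothetical, and the assumption that the old handler queued a touch only for objects that did not yet exist is an illustration of the behaviour the commit message describes.

```cpp
#include <cassert>

struct ObjectState {
  bool exists;
  bool whiteout;
};

// STAT succeeds only for objects that exist and are not whiteouts.
bool stat_finds(const ObjectState& o) { return o.exists && !o.whiteout; }

// Model of CREATE. With always_touch == false (assumed old behaviour) no op
// is queued for an already-present whiteout, so the transaction is empty,
// the op is treated as a read, and the whiteout flag is left set.
void handle_create(ObjectState& o, bool always_touch) {
  int queued_ops = 0;
  if (!o.exists || always_touch) {
    ++queued_ops;  // stands in for queueing OP_TOUCH
  }
  const bool treated_as_write = queued_ops > 0;
  if (treated_as_write) {
    o.exists = true;
    o.whiteout = false;  // writes clear the whiteout flag
  }
}
```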
Sage Weil
2bd548e915 Merge pull request #1600 from ceph/wip-7922
Wip 7922

Passes my manual testing and the new teuthology test case.

Reviewed-by: Sage Weil <sage@inktank.com>
2014-04-04 09:22:42 -07:00
David Zafman
be8b228140 osd: Send REJECT to all previously acquired reservations
When getting a REJECT from a backfill target, tell already GRANTed targets to
go back to RepNotRecovering state by sending a REJECT to them.

Fixes: #7922

Signed-off-by: David Zafman <david.zafman@inktank.com>
2014-04-03 22:13:17 -07:00
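
A rough sketch of the behaviour in be8b228140, under the assumption that each backfill target simply records whether its reservation was granted. `Target` and `on_reject()` are hypothetical names; the real code drives peering state machines rather than a flag.

```cpp
#include <cassert>
#include <vector>

struct Target {
  int osd;       // id of the backfill target
  bool granted;  // true once the target has GRANTed its reservation
};

// Called when `rejecting_osd` sends a REJECT. Every other target that had
// already granted is released and reported as needing a REJECT of its own.
std::vector<int> on_reject(std::vector<Target>& targets, int rejecting_osd) {
  std::vector<int> notified;
  for (auto& t : targets) {
    if (t.osd != rejecting_osd && t.granted) {
      t.granted = false;  // back to the not-recovering state
      notified.push_back(t.osd);
    }
  }
  return notified;
}
```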
Sage Weil
18201efd65 doc/release-notes: v0.79 release notes
Signed-off-by: Sage Weil <sage@inktank.com>
2014-04-03 18:28:15 -07:00
Dan Mick
4dc62669ec Fix byte-order dependency in calculation of initial challenge
Fixes: #7977
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2014-04-03 18:28:15 -07:00
Samuel Just
6cb50d74a3 ReplicatedPG::_delete_oid: adjust num_object_clones
Signed-off-by: Samuel Just <sam.just@inktank.com>
2014-04-03 17:53:42 -07:00
Samuel Just
0f2ab4dd76 ReplicatedPG::agent_choose_mode: improve debugging
Signed-off-by: Samuel Just <sam.just@inktank.com>
2014-04-03 17:53:40 -07:00
Sage Weil
80a1ed8a74 Merge pull request #1599 from ceph/wip-7978
rgw: only look at next placement rule if we're not at the last rule

Reviewed-by: Sage Weil <sage@inktank.com>
2014-04-03 17:44:13 -07:00
Yehuda Sadeh
0552ecbabb rgw: only look at next placement rule if we're not at the last rule
Fixes: #7978
We tried to move to the next placement rule, but we were already at the
last one, so we ended up looping forever.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2014-04-03 15:15:41 -07:00
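
The guard described in 0552ecbabb can be sketched as a bounded walk over placement rules. This is an illustration only: representing rules as a sorted vector of start offsets and the name `rule_for_offset()` are assumptions, not the rgw data structures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the index of the rule covering `ofs`, given the sorted start
// offsets of each rule. The `i + 1 < size()` check is the guard: the next
// rule is only examined when the current one is not the last, so the loop
// always terminates.
std::size_t rule_for_offset(const std::vector<std::uint64_t>& rule_starts,
                            std::uint64_t ofs) {
  std::size_t i = 0;
  while (i + 1 < rule_starts.size() && ofs >= rule_starts[i + 1]) {
    ++i;
  }
  return i;
}
```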
Samuel Just
eb23ac46e9 ReplicatedPG::agent_choose_mode: use num_user_objects for target_max_bytes calc
Signed-off-by: Samuel Just <sam.just@inktank.com>
2014-04-03 13:04:41 -07:00
Samuel Just
cc9ca67af3 ReplicatedPG::agent_choose_mode: exclude omap objects for ec base pool
Fixes: #7831
Signed-off-by: Samuel Just <sam.just@inktank.com>
2014-04-03 13:04:03 -07:00