Commit Graph

12284 Commits

Author SHA1 Message Date
Yehuda Sadeh
e493c7ae93 osd: handle notify-ack 2010-11-23 16:47:11 -08:00
Yehuda Sadeh
3110e36144 osd: basic watch/notify handling 2010-11-23 16:47:11 -08:00
Yehuda Sadeh
2bce34e78d osd: handle watch op, register client on object xattr 2010-11-23 16:47:11 -08:00
Colin Patrick McCabe
2f13dd8ed9 gui: more reindenting
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
2010-11-23 15:39:53 -08:00
Colin Patrick McCabe
66a78c23b7 gui: reindent a bunch of code
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
2010-11-23 15:37:15 -08:00
Greg Farnum
d8652de616 mdcache: in trim_non_auth, only print out path if it has a parent dentry.
This should only occur with the root inode, but caused a segfault for
anybody running more than one MDS who restarted.

Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
2010-11-23 14:40:54 -08:00
Herb Shiu
8768b52dc4 mds: Reply checking_lock while reading filelock
Use checking_lock to replace lock_state in the extra buffer list so that the client gets the correct file lock reply.
2010-11-23 14:04:03 -08:00
Sage Weil
5ed06ffc7d client: remove inode from flush_caps list when auth_cap changes
Avoid confusing other code (e.g. kick_flushing_caps) by staying on the mds
flushing_caps list when we don't even have an auth_cap with them anymore.
We'll need to re-flush to a new MDS later.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-11-23 13:59:09 -08:00
Sage Weil
4041bf0dda mds: fix set_state_rejoin auth_pin check
We carry an auth pin IFF !stable AND auth.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-11-23 13:59:09 -08:00
Sage Weil
e97eae1518 init-ceph: tolerate failure in cleanalllogs
Otherwise /var/log/ceph/stat makes rm -f error out and we fail.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-11-23 13:52:22 -08:00
Sage Weil
5498c46780 osd: fix recover_replicas() unfound check
missing_loc.count(soid) == 0 only means unfound if it's not missing on the
primary.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-11-23 13:52:22 -08:00
Sage Weil
5452dae6f9 osd: recover_primary() until primary has all found objects
The logic in that if was effectively reversed.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-11-23 13:52:22 -08:00
Sage Weil
7ea7a43584 osd: only discover_all_missing if unfound
Signed-off-by: Sage Weil <sage@newdream.net>
2010-11-23 13:52:22 -08:00
Sage Weil
671b1c09fa osd: add get_num_unfound() helper
Signed-off-by: Sage Weil <sage@newdream.net>
2010-11-23 13:52:22 -08:00
Sage Weil
413ecb0bcf osd: only search_for_missing if there are unfound objects
Signed-off-by: Sage Weil <sage@newdream.net>
2010-11-23 13:52:22 -08:00
Sage Weil
36f703e1e7 osd: removing unused variable, fix warning
Signed-off-by: Sage Weil <sage@newdream.net>
2010-11-23 13:52:22 -08:00
Sage Weil
285cc94674 osd: fix is_all_uptodate()
This should only return true when recovery is done, i.e., no more missing
objects.  Nothing to do with unfound.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-11-23 13:52:22 -08:00
Colin Patrick McCabe
55570baf03 osd: fix PG::is_all_uptodate
In PG::is_all_uptodate, don't try to look for peer_missing[osd->whoami].
The primary keeps that in PG::missing!

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
2010-11-23 13:52:21 -08:00
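
The check described in the two is_all_uptodate commits above can be sketched roughly as follows. This is a minimal illustration, not Ceph's real PG class: `PGSketch`, the integer OSD ids, and the string object names are all hypothetical stand-ins. The point is that the primary's own missing set lives apart from the per-peer map, and that unfound objects are not consulted at all.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical, simplified stand-in for a placement group as seen by the primary.
struct PGSketch {
    int whoami = 0;                                     // id of this (primary) OSD
    std::set<std::string> missing;                      // objects the primary itself lacks
    std::map<int, std::set<std::string>> peer_missing;  // objects each replica lacks

    // True only when recovery is finished, i.e. nobody is missing anything.
    // Whether objects are unfound plays no part in the answer.
    bool is_all_uptodate() const {
        // The primary's state is kept in `missing`, never in peer_missing[whoami].
        assert(peer_missing.count(whoami) == 0);
        if (!missing.empty())
            return false;
        for (const auto& entry : peer_missing) {
            if (!entry.second.empty())
                return false;
        }
        return true;
    }
};
```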
Colin Patrick McCabe
c0c301d5d8 osd: PG::read_log: don't be clever with lost xattr
Formerly, we had a special case in read_log for dealing with objects
whose data was present on the disk, but not their attributes. This
conflicts with our plans to mark objects as lost by putting a bit in the
object attributes, since without those attributes, we'll never know if
the objects were formerly marked as lost.

This should almost never happen, and if it does, we just handle the
objects as missing in the normal way.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
2010-11-23 13:52:21 -08:00
Colin Patrick McCabe
0e15da8d2e Rename peer_summary_requested to peer_backlog_req
Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
2010-11-23 13:52:21 -08:00
Colin Patrick McCabe
846122866d Build might_have_unfound set at activation
The might_have_unfound set is used by the primary OSD during recovery.
This set tracks the OSDs which might have unfound objects that the
primary OSD needs. As we receive Missing from each OSD in
might_have_unfound, we will remove the OSD from the set.

When might_have_unfound is empty, we will mark objects as LOST if the
latest version of the object resided on an OSD marked as lost.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
2010-11-23 13:52:21 -08:00
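
The bookkeeping described in the might_have_unfound commit above might look roughly like the sketch below. Every name here (`MightHaveUnfoundSketch`, `on_missing_received`, `objects_to_mark_lost`) is hypothetical, OSDs are plain integers, and objects are plain strings; it only illustrates the "remove each OSD as it reports, decide what is LOST once the set is empty" idea, not the actual Ceph code.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical tracker for the OSDs that might still hold unfound objects.
class MightHaveUnfoundSketch {
public:
    // Built once at activation from the candidate OSDs.
    explicit MightHaveUnfoundSketch(std::set<int> osds)
        : pending_(std::move(osds)) {}

    // Called when a Missing report arrives from `osd`.
    // Returns true once every candidate OSD has reported.
    bool on_missing_received(int osd) {
        pending_.erase(osd);
        return pending_.empty();
    }

    // Once the set is empty, an object is marked LOST if its latest version
    // resided on an OSD that has itself been marked lost.
    std::set<std::string> objects_to_mark_lost(
            const std::map<std::string, int>& latest_holder,
            const std::set<int>& lost_osds) const {
        assert(pending_.empty());  // only valid after all reports are in
        std::set<std::string> lost;
        for (const auto& entry : latest_holder) {
            if (lost_osds.count(entry.second) != 0)
                lost.insert(entry.first);
        }
        return lost;
    }

private:
    std::set<int> pending_;
};
```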
Samuel Just
36c6569c11 monmaptool: Return a non-zero error code and print a useful error
message if unable to read the monmap file.

Signed-off-by: Samuel Just <samuelj@hq.newdream.net>
2010-11-23 12:26:38 -08:00
Sage Weil
fc212548ae mds: allow for old fs's with stray instead of stray0
New fs's get stray0, but we want to still behave with old ones.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-11-23 09:43:49 -08:00
Sage Weil
de61991a87 Merge branch 'testing' into unstable
Conflicts:
	configure.ac
2010-11-23 09:37:13 -08:00
Sage Weil
868665d5f2 v0.23.1 2010-11-22 23:02:09 -08:00
Sage Weil
c327c6a206 mon: always use send_reply for auth replies
Signed-off-by: Sage Weil <sage@newdream.net>
2010-11-22 22:41:57 -08:00
Sage Weil
61dd4f03e6 mon: simplify send_reply code
No need to specify destination in send_reply, as we always have the request
for reference.

Simplify MRoute constructors (keep the ones we use) for tid and bcast
best-effort case.

Do NOT do a best-effort forward of a reply with a tid specified if the tid
is not in the routed-request map.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-11-22 22:41:42 -08:00
Colin Patrick McCabe
2c71bd3345 osd: add assert to _process_pg_info
When activating an inactive replica, assert that we are doing so based
on a message from the primary.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
2010-11-22 17:37:55 -08:00
Colin Patrick McCabe
a70943fded osd: re-indent some code in _process_pg_info
Re-indent the code and add a comment.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
2010-11-22 17:35:12 -08:00
Sage Weil
71369541ab msgr: tolerate 0 bytes from tcp_read_nonblocking
This can happen, I believe, when we get a signal or something.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-11-22 16:12:10 -08:00
Sage Weil
7ec0034b65 init-ceph: fix (and test!) cleanlogs and cleanalllogs
Signed-off-by: Sage Weil <sage@newdream.net>
2010-11-22 16:12:01 -08:00
Sage Weil
7b4a801fec mds: fix rejoin_scour_survivor_replicas inode check
We want to remove replicas that we don't ack, but those don't appear in
the strong_inode map; they're appended to the base_inode bufferlist.  Make
a (temporary) set to track who those are so that we know who to get rid of.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-11-22 16:03:58 -08:00
Greg Farnum
dd11fe270c types: Allow inodeno_t structs to alias.
This removes a compiler warning that appeared in a gcc upgrade and
is apparently erroneous, about its usage violating strict-aliasing rules
when the + operator is used.
2010-11-22 15:08:15 -08:00
Greg Farnum
8d95b5b61a messenger: init rc to -1, removing compiler warning.
This actually is initialized before all uses, but compilers tend to
have trouble with assignment in if-else branches, and -1 is considered
invalid so there's no danger of refactoring breaking anything.
2010-11-22 15:08:15 -08:00
Samuel Just
ac6b018acb Causes the MDSes to switch among a set of stray directories when
switching to a new journal segment.

MDSCache:
	The stray member has been replaced with strays, an array of inodes
	representing the set of available stray directories, as well as
	stray_index indicating the index of the current stray directory.

	get_stray() now returns a pointer to the current stray directory
	inode.

	advance_stray() advances stray_index to the next stray directory.

	migrate_stray no longer takes a source argument; the source mds
	is inferred from the parent of the dir entry.

	stray dir entries are now stray<index> rather than stray.

	scan_stray_dir now scans all stray directories.

MDSLog:
	start_new_segment now calls advance_stray() on MDSCache to force a new
	stray directory.

mdstypes:
	NUM_STRAY indicates the number of stray directories to use per MDS

	MDS_INO_STRAY now takes an index argument as well as the mds number

	MDS_INO_STRAY_OWNER(i) returns the mds owner of the stray directory i

	MDS_INO_STRAY_INDEX(i) returns the index of the stray directory i

Signed-off-by: Samuel Just <samuelj@hq.newdream.net>
2010-11-22 13:25:14 -08:00
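
One way to picture the stray-directory scheme described in the commit above is the sketch below. The constants `kNumStray` and `kStrayOffset`, the helper names, and the wrap-around in `advance_stray()` are all assumptions made for illustration; the commit message states neither the number of stray directories nor how inode numbers are laid out.

```cpp
#include <cassert>
#include <cstdint>

// Assumed values, for illustration only.
const unsigned kNumStray = 10;             // stands in for NUM_STRAY
const std::uint64_t kStrayOffset = 0x600;  // hypothetical first stray inode number

// Inode number of stray directory `index` belonging to MDS `mds`.
inline std::uint64_t stray_ino(unsigned mds, unsigned index) {
    assert(index < kNumStray);
    return kStrayOffset + static_cast<std::uint64_t>(mds) * kNumStray + index;
}

// Recover the owning MDS and the per-MDS index from a stray inode number.
inline unsigned stray_owner(std::uint64_t ino) {
    return static_cast<unsigned>((ino - kStrayOffset) / kNumStray);
}
inline unsigned stray_index(std::uint64_t ino) {
    return static_cast<unsigned>((ino - kStrayOffset) % kNumStray);
}

// Hypothetical cache state: which stray directory is currently in use.
struct StrayRotationSketch {
    unsigned mds = 0;
    unsigned current = 0;

    std::uint64_t get_stray() const { return stray_ino(mds, current); }

    // Called when a new journal segment starts; wrapping around is an assumption.
    void advance_stray() { current = (current + 1) % kNumStray; }
};
```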
Samuel Just
3f8f59059a Timer must be initialized in Client::init and shutdown in
Client::shutdown.

Signed-off-by: Samuel Just <samuelj@hq.newdream.net>
2010-11-22 13:16:28 -08:00
Colin Patrick McCabe
8eb4de9e6e generate_past_intervals:generate back to lastclean
PG::generate_past_intervals needs to generate all the intervals back to
history.last_epoch_clean, rather than just to
history.last_epoch_started. This is required by
PG::build_might_have_unfound, which needs to examine these intervals
when building the might_have_unfound set.

Move the check for whether past_intervals is up-to-date into
generate_past_intervals itself. Fix the check.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
2010-11-22 10:47:53 -08:00
Sage Weil
80f2823571 vstart.sh: 'init-ceph stop' instead of 'stop.sh'
This just makes it easier to run multiple vstart sessions as the same user
on the same host.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-11-22 10:07:40 -08:00
Sage Weil
53d0650a42 Merge branch 'osd_msgr' into unstable 2010-11-22 09:55:37 -08:00
Sage Weil
27c6f217ca mds: remove bogus assert
Causes problems during resolve finish.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-11-22 09:55:01 -08:00
Sage Weil
9e15ade88d mds: do not eval subtree root when replay|resolve
This is nonsensical.  And can lead to scatter_writebehind, which breaks
horribly.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-11-22 09:55:01 -08:00
Sage Weil
c0c81d53b4 mds: trim exported subtree _after_ adjusting auth
We need to set the subtree bounds before trimming it away, or else we may
throw out things we're still auth for.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-11-22 09:55:01 -08:00
Sage Weil
cd53719f3c mds: resolve cleanup
Only track ambiguous imports and such if we get a resolve message while in
the resolve state.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-11-22 09:55:01 -08:00
Sage Weil
924b1fcbf7 osd: bind to new cluster address when wrongly marked down
If we come back up on the same address, there is a possible race.  Other
nodes will mark_down when they see us go down.  If we go up first, queue
some messages, and _then_ they see that we're down and mark_down, the
messages we queued will get lost.  Since it's stateful on the cluster
backend, we need to introduce an ordering so that closing out the _old_
session doesn't break the new session.  We do this by binding to a new
address (just a different port, actually) before marking ourselves back
up.

Fixes #592.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-11-22 09:49:43 -08:00
Sage Weil
1940976339 msgr: implement rebind() to pick a new port
Closes out all old connections and binds to a _different_ port.  This
ensures that someone doing mark_down on our old address won't get us.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-11-22 09:45:29 -08:00
Greg Farnum
f7170f95f0 client: only encode_cap_releases once per request.
Accomplish this by making a list of cap releases in the (permanent)
MetaRequest, and then copying that into the (potentially-temporary)
MClientRequest.
2010-11-22 09:09:01 -08:00
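
The approach in the commit above (build the release list once in the long-lived request, then copy it into each outgoing message) could be sketched like this. `RequestSketch`, `MessageSketch`, and `CapReleaseSketch` are hypothetical stand-ins for MetaRequest, MClientRequest, and a cap release record; the real structures and encoding are not shown in the commit message.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical record of one capability release.
struct CapReleaseSketch {
    std::uint64_t ino;
    int caps;
};

// Stand-in for the (potentially temporary) wire message, rebuilt on each send.
struct MessageSketch {
    std::vector<CapReleaseSketch> releases;
};

// Stand-in for the (permanent) request that outlives any single message.
struct RequestSketch {
    std::vector<CapReleaseSketch> cap_releases;
    bool releases_encoded = false;

    // Builds the release list at most once per request; later calls are no-ops.
    void encode_cap_releases(const std::vector<CapReleaseSketch>& to_release) {
        if (releases_encoded)
            return;
        cap_releases = to_release;
        releases_encoded = true;
    }

    // Every (re)sent message gets a copy of the same, already-built list.
    MessageSketch build_message() const {
        assert(releases_encoded);
        MessageSketch m;
        m.releases = cap_releases;
        return m;
    }
};
```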
Sage Weil
51abcaa2c0 mon: clean up cluster_addr code a bit, better debug output
Signed-off-by: Sage Weil <sage@newdream.net>
2010-11-21 20:52:41 -08:00
Sage Weil
28498a00cf osd: send correct ip addrs to monitor for cluster_, hb_addr
Signed-off-by: Sage Weil <sage@newdream.net>
2010-11-21 20:52:40 -08:00
Sage Weil
2031364451 osdmap: fix cluster_addr encoding; printing
The cluster addrs were getting lost because we were checking v instead of
ev.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-11-21 20:52:40 -08:00
Sage Weil
ec434eda6a osd: unconditionally set up separate msgr instance for osd<->osd msgs
Always set up cluster_messenger (before we would only do so if there was
an explicit address configured for it).  The overhead to do so is minimal,
it simplifies the code, and will allow us to fix down->up transitions
(later).

Signed-off-by: Sage Weil <sage@newdream.net>
2010-11-21 19:59:43 -08:00