Commit Graph

7785 Commits

Sage Weil
ef095e1f36 mds: clear dirtyscattered in remove_inode() 2010-06-03 09:27:56 -07:00
Sage Weil
26822162bd mds: allow dup lookups in anchorclient
It's not practical for callers to avoid dups, particularly since they may
be unaware of each other.  And it's trivial to support it here.
2010-06-03 09:17:13 -07:00
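The idea in the commit above — let duplicate in-flight lookups share one outstanding request instead of forcing callers to coordinate — can be sketched as below. This is a minimal illustration, not the actual AnchorClient code; `LookupClient`, `pending`, and `sent` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <vector>

// Hypothetical sketch: pending lookups keyed by inode number.  A second
// caller asking about the same ino is appended to the waiter list rather
// than rejected, so callers need not be aware of each other.  Only the
// first caller triggers a real request; one reply wakes all waiters.
struct LookupClient {
  std::map<uint64_t, std::vector<std::function<void()>>> pending;
  int sent = 0;  // stand-in for requests actually sent to the anchor table

  void lookup(uint64_t ino, std::function<void()> onfinish) {
    auto &waiters = pending[ino];
    waiters.push_back(std::move(onfinish));
    if (waiters.size() == 1)  // first lookup for this ino: send for real
      sent++;
  }

  void handle_reply(uint64_t ino) {  // one reply completes every dup caller
    for (auto &fin : pending[ino])
      fin();
    pending.erase(ino);
  }
};
```

With this shape, supporting dups really is trivial: the duplicate path is just a `push_back`.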
Sage Weil
8a2a9bd6e4 assert: fix assert vs atomic_ops.h breakage
This was causing us to use the system assert, not the ceph one.
2010-06-03 09:01:58 -07:00
Sage Weil
f5ccc66289 mds: ensure past snap parents get opened before doing file recovery
Otherwise we can fail to get_snaps() when we start the recovery:

#0  0x00007fa037625f55 in *__GI_raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007fa037628d90 in *__GI_abort () at abort.c:88
#2  0x00007fa03761f07a in *__GI___assert_fail (assertion=0x9f3d81 "oldparent", file=<value optimized out>, line=170, function=0x9f4680 "void SnapRealm::build_snap_set(std::set<snapid_t, std::less<snapid_t>, std::allocator<snapid_t> >&, snapid_t&, snapid_t&, snapid_t&, snapid_t, snapid_t)") at assert.c:78
#3  0x00000000008f7656 in SnapRealm::build_snap_set (this=0x222a300, s=..., max_seq=..., max_last_created=..., max_last_destroyed=..., first=..., last=...) at mds/snap.cc:170
#4  0x00000000008f7e8c in SnapRealm::check_cache (this=0x222a300) at mds/snap.cc:194
#5  0x00000000008f892a in SnapRealm::get_snaps (this=0x222a300) at mds/snap.cc:209
#6  0x00000000007f2c85 in MDCache::queue_file_recover (this=0x2202a00, in=0x7fa0340f5450) at mds/MDCache.cc:4398
#7  0x0000000000865011 in Locker::file_recover (this=0x21fe850, lock=0x7fa0340f59b0) at mds/Locker.cc:3437
#8  0x00000000007e5899 in MDCache::start_files_to_recover (this=0x2202a00, recover_q=..., check_q=...) at mds/MDCache.cc:4503
#9  0x00000000007e887e in MDCache::rejoin_gather_finish (this=0x2202a00) at mds/MDCache.cc:3904
#10 0x00000000007ed6cf in MDCache::handle_cache_rejoin_strong (this=0x2202a00, strong=0x7fa030025440) at mds/MDCache.cc:3618
#11 0x00000000007ed84a in MDCache::handle_cache_rejoin (this=0x2202a00, m=0x7fa030025440) at mds/MDCache.cc:3063
#12 0x00000000007fade6 in MDCache::dispatch (this=0x2202a00, m=0x7fa030025440) at mds/MDCache.cc:5668
#13 0x0000000000735313 in MDS::_dispatch (this=0x22014d0, m=0x7fa030025440) at mds/MDS.cc:1390
#14 0x00000000007372a3 in MDS::ms_dispatch (this=0x22014d0, m=0x7fa030025440) at mds/MDS.cc:1295
#15 0x0000000000728b97 in Messenger::ms_deliver_dispatch(Message*) ()
#16 0x0000000000716c5e in SimpleMessenger::dispatch_entry (this=0x2202350) at msg/SimpleMessenger.cc:332
#17 0x00000000007119c7 in SimpleMessenger::DispatchThread::entry (this=0x2202760) at msg/SimpleMessenger.h:494
#18 0x000000000071f4e7 in Thread::_entry_func (arg=0x2202760) at ./common/Thread.h:39
#19 0x00007fa03849673a in start_thread (arg=<value optimized out>) at pthread_create.c:300
#20 0x00007fa0376bf6dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112

Signed-off-by: Sage Weil <sage@newdream.net>
2010-06-03 08:19:24 -07:00
Sage Weil
c0e9d21009 mds: relax lock state before encoding export (and lock state)
We can't fuss with lock state in the finish method because we already
encoded the old state to the new auth, and we are now just a replica.

We do still want to relax the lock state to be more replica friendly,
though, so do that in the encode_export_inode method.
2010-06-03 08:04:33 -07:00
Sage Weil
3768ef941e mds: do not bother tableserver until it is active
We resend these requests when the TS does go active, and if we send dups
things get all screwed up (see partial log below).

Should we worry about dup queries?

10.06.02_22:32:08.112834 7f881dfdb910 -- 10.3.64.22:6802/7866 --> mds0 10.3.64.22:6803/13552 -- mds_table_request(anchortable prepare 69 148 bytes) v1 -- ?+0 0x7f88180e4580
10.06.02_22:32:08.116427 7f881dfdb910 mds1.tableserver(anchortable) handle_mds_recovery mds0
10.06.02_22:32:08.116449 7f881dfdb910 mds1.tableclient(anchortable) handle_mds_recovery mds0
10.06.02_22:32:08.116457 7f881dfdb910 mds1.tableclient(anchortable) resending 69
10.06.02_22:32:08.116470 7f881dfdb910 -- 10.3.64.22:6802/7866 --> mds0 10.3.64.22:6803/13552 -- mds_table_request(anchortable prepare 69 148 bytes) v1 -- ?+0 0x7f8818120cb0
10.06.02_22:32:08.116840 7f881dfdb910 -- 10.3.64.22:6802/7866 <== mds0 10.3.64.22:6803/13552 7 ==== mds_table_request(anchortable agree 69 tid 165) v1 ==== 16+0+0 (1328913316 0 0) 0x2362830
10.06.02_22:32:08.116861 7f881dfdb910 mds1.tableclient(anchortable) handle_request mds_table_request(anchortable agree 69 tid 165) v1
10.06.02_22:32:08.116872 7f881dfdb910 mds1.tableclient(anchortable) got agree on 69 atid 165
10.06.02_22:32:08.127662 7f881dfdb910 mds1.tableclient(anchortable) commit 165
10.06.02_22:32:08.127683 7f881dfdb910 -- 10.3.64.22:6802/7866 --> mds0 10.3.64.22:6803/13552 -- mds_table_request(anchortable commit tid 165) v1 -- ?+0 0x7f8818114860
10.06.02_22:32:08.128244 7f881dfdb910 mds1.tableclient(anchortable) _prepare 70
10.06.02_22:32:08.128261 7f881dfdb910 -- 10.3.64.22:6802/7866 --> mds0 10.3.64.22:6803/13552 -- mds_table_request(anchortable prepare 70 82 bytes) v1 -- ?+0 0x7f88180e4580
10.06.02_22:32:08.131873 7f881dfdb910 -- 10.3.64.22:6802/7866 <== mds0 10.3.64.22:6803/13552 8 ==== mds_table_request(anchortable agree 69 tid 165 148 bytes) v1 ==== 164+0+0 (4238497285 0 0) 0x2362310
10.06.02_22:32:08.131900 7f881dfdb910 mds1.tableclient(anchortable) handle_request mds_table_request(anchortable agree 69 tid 165 148 bytes) v1
10.06.02_22:32:08.131911 7f881dfdb910 mds1.tableclient(anchortable) stray agree on 69 tid 165, already committing, resending COMMIT
10.06.02_22:32:08.131923 7f881dfdb910 -- 10.3.64.22:6802/7866 --> mds0 10.3.64.22:6803/13552 -- mds_table_request(anchortable commit tid 165) v1 -- ?+0 0x7f8818120cb0
10.06.02_22:32:08.144147 7f881dfdb910 -- 10.3.64.22:6802/7866 <== mds0 10.3.64.22:6803/13552 10 ==== mds_table_request(anchortable ack tid 165) v1 ==== 16+0+0 (584840829 0 0) 0x246dd20
10.06.02_22:32:08.144179 7f881dfdb910 mds1.tableclient(anchortable) handle_request mds_table_request(anchortable ack tid 165) v1
10.06.02_22:32:08.144195 7f881dfdb910 mds1.tableclient(anchortable) got ack on tid 165, logging
10.06.02_22:32:08.144217 7f881dfdb910 mds1.log submit_entry 5515297~17 : ETableClient anchortable ack tid 165
10.06.02_22:32:08.152419 7f881dfdb910 -- 10.3.64.22:6802/7866 <== mds0 10.3.64.22:6803/13552 11 ==== mds_table_request(anchortable agree 69 tid 166 148 bytes) v1 ==== 164+0+0 (4238497285 0 0) 0x2362830
10.06.02_22:32:08.152448 7f881dfdb910 mds1.tableclient(anchortable) handle_request mds_table_request(anchortable agree 69 tid 166 148 bytes) v1
10.06.02_22:32:08.152460 7f881dfdb910 mds1.tableclient(anchortable) stray agree on 69 tid 166, sending ROLLBACK
10.06.02_22:32:08.152470 7f881dfdb910 -- 10.3.64.22:6802/7866 --> mds0 10.3.64.22:6803/13552 -- mds_table_request(anchortable rollback tid 166) v1 -- ?+0 0x7f8818120cb0
10.06.02_22:32:08.172729 7f881dfdb910 -- 10.3.64.22:6802/7866 <== mds0 10.3.64.22:6803/13552 13 ==== mds_table_request(anchortable ack tid 165) v1 ==== 16+0+0 (584840829 0 0) 0x2362310
10.06.02_22:32:08.172770 7f881dfdb910 mds1.tableclient(anchortable) handle_request mds_table_request(anchortable ack tid 165) v1
10.06.02_22:32:08.172786 7f881dfdb910 mds1.tableclient(anchortable) got ack on tid 165, logging
10.06.02_22:32:08.172806 7f881dfdb910 mds1.log submit_entry 5515318~17 : ETableClient anchortable ack tid 165
10.06.02_22:32:08.174091 7f881dfdb910 -- 10.3.64.22:6802/7866 <== mds0 10.3.64.22:6803/13552 14 ==== mds_table_request(anchortable agree 70 tid 168 82 bytes) v1 ==== 98+0+0 (1154743153 0 0) 0x246dd20
10.06.02_22:32:08.174119 7f881dfdb910 mds1.tableclient(anchortable) handle_request mds_table_request(anchortable agree 70 tid 168 82 bytes) v1
10.06.02_22:32:08.174131 7f881dfdb910 mds1.tableclient(anchortable) got agree on 70 atid 168
10.06.02_22:32:08.202508 7f881dfdb910 mds1.tableclient(anchortable) _logged_ack 165
10.06.02_22:32:08.202530 7f881dfdb910 mds1.tableclient(anchortable) _logged_ack 165
<crash>
2010-06-02 23:07:42 -07:00
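The rule the commit above adopts — queue prepares while the table server is down, then resend them once on recovery, so the server never sees the dup prepares that produced the stray agrees in the log — can be sketched like this. A minimal illustration only; `TableClient`, `pending`, and `wire` are hypothetical names, not the real MDSTableClient interface.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical sketch: never transmit a prepare unless the server is
// active; park it in `pending` instead.  On recovery every pending
// request is (re)sent exactly once, keyed by reqid.
struct TableClient {
  bool server_active = false;
  std::map<uint64_t, bool> pending;  // reqid -> transmitted this epoch?
  std::vector<uint64_t> wire;        // requests actually put on the wire

  void prepare(uint64_t reqid) {
    pending[reqid] = false;
    maybe_send(reqid);
  }

  void maybe_send(uint64_t reqid) {
    if (server_active && !pending[reqid]) {
      pending[reqid] = true;
      wire.push_back(reqid);
    }
  }

  void handle_mds_recovery() {  // table server came (back) up
    server_active = true;
    for (auto &p : pending) {   // resend everything still outstanding
      p.second = false;
      maybe_send(p.first);
    }
  }
};
```

The log above shows the failure mode this avoids: the original send plus the recovery resend both reach the server, which then hands back two agrees for the same prepare.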
Sage Weil
7c0df05407 mds: do not reset filelock state when checking max_size during recovery
This was broken by d5574993 (probably, that commit fixed a similar
problem).  The rejoin_ack initializes replica state properly, so we can't
go changing it now.  I'm not sure why this was resetting the state to
LOCK, because that's clearly not allowed.

Print when check_max_size does a no-op so that this is a bit easier to see
next time.
2010-06-02 22:14:54 -07:00
Sage Weil
15c6651ff5 mds: lock->sync replica state is lock, not sync
It's not readable yet.  And after the lock->sync gather completes we send
out a SYNC.

Fixes failed assertion like:

10.06.02_21:27:04.444202 7f17a25ac910 mds1.locker handle_file_lock a=sync on (ifile sync) from mds0 [inode 1 [...2,head] / rep@0.2 v7 snaprealm=0xe27400 f(v0 m10.06.02_21:26:13.366344 1=0+1) ds=1=0+1 rb=0 rf=0 rd=0 (iauth sync) (ilink sync) (idft sync) (isnap sync) (inest sync) (ifile sync) (ixattr sync) (iversion lock) | nref=1 0x7f179c006280]
mds/Locker.cc: In function 'void Locker::handle_file_lock(ScatterLock*, MLock*)':
mds/Locker.cc:3468: FAILED assert(lock->get_state() == 2 || lock->get_state() == 15 || lock->get_state() == 21)
 1: (Locker::handle_file_lock(ScatterLock*, MLock*)+0x1d8) [0x86d70a]
 2: (Locker::handle_lock(MLock*)+0x191) [0x86e30f]
 3: (Locker::dispatch(Message*)+0x41) [0x870f27]
 4: (MDS::_dispatch(Message*)+0x1a17) [0x7364cb]
 5: (MDS::ms_dispatch(Message*)+0x2f) [0x737961]
 6: (Messenger::ms_deliver_dispatch(Message*)+0x55) [0x72918d]
 7: (SimpleMessenger::dispatch_entry()+0x532) [0x71710a]
 8: (SimpleMessenger::DispatchThread::entry()+0x29) [0x711f25]
 9: (Thread::_entry_func(void*)+0x20) [0x7232f4]
 10: /lib/libpthread.so.0 [0x7f17a407073a]
 11: (clone()+0x6d) [0x7f17a329469d]

Signed-off-by: Sage Weil <sage@newdream.net>
2010-06-02 21:33:40 -07:00
Sage Weil
1c930f9b38 msg: add missing msg_types.cc 2010-06-02 19:37:44 -07:00
Sage Weil
5262a96a07 mds: add export_dir command 2010-06-02 12:40:23 -07:00
Sage Weil
4075b95c5c mds: add MDCache::cache_traverse() 2010-06-02 12:40:15 -07:00
Sage Weil
eac36cb5b3 initscript: unmount btrfs if we mounted it 2010-06-02 11:50:29 -07:00
Sage Weil
0d1e5dbf4c move addr parse() into entity_addr_t 2010-06-02 11:50:29 -07:00
Sage Weil
a3323c98d6 tcp: parse ipv4 and ipv6 addresses 2010-06-02 11:50:29 -07:00
Greg Farnum
08afc8df68 mon: fix unsynchronized clock logic;
change output for clarity
2010-06-02 11:34:40 -07:00
Sage Weil
b441fbdc9f mds: lookup exact snap dn on import 2010-06-01 16:34:16 -07:00
Sage Weil
38cb2403c0 mds: update dn->first too when lock state adjusts inode->first
This keeps dn->first in sync with inode->first.
2010-06-01 16:33:53 -07:00
Sage Weil
9248cd9e64 mds: don't change lock states on replicated inode
The reconnect will infer some client caps, which will affect what lock
states we want.  If we're not replicated, fine, just pick something good.
Otherwise, try_eval() and go through the proper channels.

This _might_ be the source of #165...
2010-06-01 15:23:46 -07:00
Sage Weil
afadb12245 mds: fix root null deref in recalc_auth_bits
Root may be null if we don't have any subtrees besides ~mds$id.
2010-06-01 15:02:56 -07:00
Sage Weil
364f3cb061 mds: adjust subtree map when unlinking dirs
Otherwise we get subtree bounds in the stray dir and get confused down
the line.
2010-06-01 14:14:23 -07:00
Sage Weil
c4bbb0008b mds: discover snapped paths on retried ops
This is intended to mitigate a livelock issue with traversing to snapped
metadata.  The client specifies all snap requests relative to a non-snap
inode.  The traversal through the snapped portion of the namespace will
normally happen on the auth node, but the actual target may be on another
node that does not have that portion of the namespace.  To avoid indefinite
request ping-pong, the mds will begin to discover and replicate the snapped
path components if the request has been retried.

This doesn't perform optimally, but it will at least work.
2010-06-01 12:57:23 -07:00
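The retry heuristic described above — forward on the first attempt, but discover/replicate the snapped components once the request has bounced back — reduces to a small decision rule. The sketch below is only an illustration of that rule; `Action` and `on_missing_snapped_component` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical sketch of the anti-livelock rule: on the first attempt a
// traversal that hits a snapped path component we do not have is
// forwarded toward the auth MDS; once the request has been retried we
// discover (replicate) the component locally instead, so two nodes cannot
// ping-pong the request between them indefinitely.
enum class Action { Forward, Discover };

Action on_missing_snapped_component(int num_retries) {
  return num_retries > 0 ? Action::Discover : Action::Forward;
}
```

As the commit says, this trades some efficiency (extra replication) for guaranteed forward progress.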
Greg Farnum
464e46c81d mon: add wiggle room for clock synchronization check 2010-06-01 11:39:58 -07:00
Greg Farnum
7f8a743c29 mds: add case for CEPH_LOCK_DVERSION to LockType 2010-06-01 10:30:05 -07:00
Greg Farnum
00c3dafd5a xlist: add assert to catch invalid iterator usage 2010-05-29 18:36:05 -07:00
Greg Farnum
79b3962545 ObjectCacher: do not try to deref an invalidated xlist::iterator
Fixes #159
2010-05-29 11:06:15 -07:00
Sage Weil
83094d97a5 paxos: fix store_state fix 2010-05-28 13:21:19 -07:00
Sage Weil
62e290e87f msgr: print bind errors to stderr 2010-05-28 12:59:25 -07:00
Sage Weil
3a705ded1e paxos: cleanup 2010-05-28 12:50:37 -07:00
Sage Weil
3c3e82e0f5 paxos: only store committed values in store_state
The uncommitted value is handled specially by handle_last()
2010-05-28 12:48:41 -07:00
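The invariant the commit above restores — store_state persists only values at or below the sender's last_committed, leaving the uncommitted value (last_committed+1) for handle_last() — can be sketched as follows. This is an illustration of the invariant under that assumption, not Ceph's Paxos code; `PaxosStore` and its members are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical sketch: absorb shared state, writing only versions that the
// peer has actually committed.  A value above peer_last_committed is the
// uncommitted proposal and must not land in the committed store here.
struct PaxosStore {
  std::map<uint64_t, int> committed;  // version -> value
  uint64_t last_committed = 0;

  void store_state(const std::map<uint64_t, int> &vals,
                   uint64_t peer_last_committed) {
    for (const auto &[ver, data] : vals)
      if (ver > last_committed && ver <= peer_last_committed)
        committed[ver] = data;
    if (peer_last_committed > last_committed)
      last_committed = peer_last_committed;
  }
};
```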
Sage Weil
187011cdbc initscript: fix typo with $lockfile stuff 2010-05-28 12:41:41 -07:00
Sage Weil
6b72d70be4 paxos: set last_committed in share_state()
It wasn't getting set for the LAST message, which broke recovery somewhat.

Broken by 8e76c5a1d827e01f77149245679bd00ba27120e0.
2010-05-28 12:37:24 -07:00
Sage Weil
4b79774563 mds: fix null dn deref during anchor_prepare 2010-05-27 16:32:43 -07:00
Sage Weil
892a0e25cc config: parse in $host from conf file
So you can do stuff like
	log dir = /data/$host
2010-05-27 14:59:27 -07:00
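The $host expansion described above is a simple token substitution over config values. A minimal sketch, assuming the only variable is the literal token `$host`; `expand_host` is a hypothetical name, not the real config-parser entry point.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch: replace every "$host" token in a config value with
// the local hostname, so per-host paths like "log dir = /data/$host"
// work from a single shared ceph.conf.
std::string expand_host(std::string value, const std::string &host) {
  const std::string key = "$host";
  std::size_t pos = 0;
  while ((pos = value.find(key, pos)) != std::string::npos) {
    value.replace(pos, key.size(), host);
    pos += host.size();  // skip past the substitution, avoiding rescans
  }
  return value;
}
```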
Sage Weil
594d45687f osdmaptool: include raw, up, acting mappings 2010-05-27 14:59:27 -07:00
Sage Weil
0a1d526bd3 osdmap: assert maxrep >= minrep 2010-05-27 14:59:27 -07:00
Sage Weil
a1a1350237 mkcephfs: pass -c to cmon --mkfs 2010-05-27 14:59:27 -07:00
Sage Weil
330e1e21e6 osd: warn, don't crash, on purged_snaps shrinkage 2010-05-27 14:59:27 -07:00
Sage Weil
d2c40055c4 initscript: incorporate Josef's fedora fixes
Add 'status' command.
Add chkconfig line.
Do lockfile stuff only if /var/run/subsys exists.

Still specifying the runlevels, though.  The init script bails out (with
success code) if the ceph.conf is missing.
2010-05-27 14:58:56 -07:00
Sage Weil
a3dc4bdac2 sample.ceph.conf: include debug options, commented out 2010-05-26 21:47:35 -07:00
Greg Farnum
05256bb030 rados: you can now set the crush rule to use when creating a pool 2010-05-26 16:58:38 -07:00
Greg Farnum
8044f7ac7e librados: add crush_rule parameter to create_pool functions 2010-05-26 16:58:38 -07:00
Greg Farnum
a9e1727172 objecter: add optional crush_rule parameter; set in pool_op_submit as needed 2010-05-26 16:58:38 -07:00
Greg Farnum
78375cfde9 mon: add crush_rule data member to MPoolOp; use it in new pool creation on mon 2010-05-26 16:58:38 -07:00
Sage Weil
648ce97628 mds: LAZYIO is not liked, but it is allowed 2010-05-26 14:47:49 -07:00
Sage Weil
297d3ecd45 client: update ioctl.h (lazyio, invalidate_range) 2010-05-26 14:47:49 -07:00
Sage Weil
a13b5b1c16 mds: include LAZYIO cap in sync->mix and mix->sync transitions 2010-05-26 14:47:49 -07:00
Sage Weil
a92df208ff mds: include LAZYIO in CEPH_CAP_ANY set 2010-05-26 14:47:49 -07:00
Greg Farnum
75de272367 mon: warn to log, not just dout, on clock drift 2010-05-26 14:35:15 -07:00
Greg Farnum
9b4d25b9b1 mon: detect and warn on clock synchronization problems;
change MMonPaxos::lease_expire to lease_timestamp
2010-05-26 14:35:15 -07:00
Christian Brunner
bee74a1e4c ceph: add conversion to qemu coding style
Hi Yehuda,

I've added a small hack to make push_to_qemu.pl convert tabs to spaces.

Christian
2010-05-26 14:11:25 -07:00