Sage Weil
21a97d1e7c
mon: don't leak MAuth
2010-06-06 22:15:18 -07:00
Sage Weil
0c38b3d63d
objectcacher: add verify_stats() debugging helper
2010-06-04 16:32:07 -07:00
Sage Weil
dff7cb33aa
objectcacher: fix stat accounting when resizing bufferheads
...
Must keep stats in mind when adjusting bufferheads!
2010-06-04 16:32:07 -07:00
Sage Weil
a76d8fc65d
objectcacher: cleanup formatting
2010-06-04 16:32:07 -07:00
Sage Weil
462552ab9d
objectcacher: fix use of invalid iterator in map_write()
...
The p points to bh, which is removed by merge_left. Move it back to final,
so we can advance to the new next a few lines down.
2010-06-04 16:32:06 -07:00
Sage Weil
12a5d7b2b5
objectcacher: match states before merging in map_write
...
The caller is going to set us to dirty, so we don't care what state we
have; as long as the left and right bits we're merging match, all is ok.
2010-06-04 16:32:06 -07:00
Yehuda Sadeh
522c12e547
osd: fix rollback when head points at the rolled back snapshot
2010-06-04 16:23:38 -07:00
Sage Weil
8d1e7739a5
Merge branch 'rbd' into unstable
2010-06-04 13:10:28 -07:00
Sage Weil
7b6aea6aea
osd: clean up rollback debug output
2010-06-04 13:09:42 -07:00
Sage Weil
1b5920f806
uclient: handle inode with no caps from mds
...
This happens when you readdir and some inodes are in a different snaprealm.
2010-06-04 13:01:21 -07:00
Greg Farnum
e79a3fae4e
osd: filter_xattrs on a rollback op
2010-06-04 12:57:59 -07:00
Greg Farnum
48555f527a
osd: fix naughty iterator usage after invalidating it
2010-06-04 12:55:27 -07:00
Greg Farnum
a70a3668c0
osd: _make_clone now properly duplicates xattrs
2010-06-04 12:49:49 -07:00
Greg Farnum
c730b85ce3
osd: add filter_xattrs function to remove non-user xattrs from a map of them
2010-06-04 12:49:49 -07:00
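The filtering idea in the commit above can be sketched as follows. This is a minimal illustration, not the OSD code: the name filter_xattrs_sketch, the string-to-string map type, and the caller-supplied user_prefix are all assumptions made for the example, since the log does not say how user xattrs are distinguished from internal ones.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical sketch: drop every entry whose key does not start with
// user_prefix. Both the prefix convention and the map's value type are
// assumptions of this example, not taken from the OSD source.
void filter_xattrs_sketch(std::map<std::string, std::string>& attrs,
                          const std::string& user_prefix) {
  assert(!user_prefix.empty());
  for (auto it = attrs.begin(); it != attrs.end(); ) {
    // compare() yields non-zero when the key is shorter than the prefix
    // or differs from it within the first user_prefix.size() characters.
    if (it->first.compare(0, user_prefix.size(), user_prefix) != 0)
      it = attrs.erase(it);  // erase() returns the next valid iterator
    else
      ++it;
  }
}
```

Advancing with the iterator returned by erase() avoids touching an iterator that has just been invalidated.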
Sage Weil
84b279a4b3
mds: fix straydn->first part deux
...
9ed0c30ecf forgot to remove the old code.
2010-06-04 11:07:09 -07:00
Greg Farnum
97f00aec19
debugging output
2010-06-03 18:22:24 -07:00
Greg Farnum
d386327205
rados: print out pool instead of object
2010-06-03 18:22:18 -07:00
Sage Weil
c4e6482d30
mds: only purge dentries with no extra refs (besides dirty)
...
Signed-off-by: Sage Weil <sage@newdream.net>
2010-06-03 17:33:30 -07:00
Sage Weil
9ed0c30ecf
mds: set straydn first to match inode on unlink
2010-06-03 17:33:30 -07:00
Sage Weil
ec0aa43a6c
mds: don't export stray (~mdsfoo/stray), and ignore in balancer
...
We _must_ keep mdsdir and stray on local mds for normal operations.
Signed-off-by: Sage Weil <sage@newdream.net>
2010-06-03 17:33:30 -07:00
Sage Weil
074a9b10f4
mds: make discover work for multiversion inodes (e.g. dirs)
...
If we don't have the specific snap, look up the head and see if it's
multiversion.
This doesn't give us a "range" lookup like we get with dentries because
the inode_map is a hash, not a map. However, we shouldn't need it,
because we always have a specific snapped inode we're looking for (because
it is referred to by a dentry) or we are looking at a multiversion head.
2010-06-03 17:33:30 -07:00
Sage Weil
9ead80f8bc
mds: fix CDir::take_sub_waiting vs dnwaiter pin
...
Signed-off-by: Sage Weil <sage@newdream.net>
2010-06-03 17:33:30 -07:00
Sage Weil
791ca28295
mds: kill open_foreign_stray; but open remote mdsdirs instead
...
Signed-off-by: Sage Weil <sage@newdream.net>
2010-06-03 17:33:29 -07:00
Sage Weil
551a12f52e
mds: fix cap clone logic to look at matching first, not last
...
The cap->client_follows is set to follows+1 by flushsnap, since the real
follows value isn't convenient. But it is enough to know that it is more
than the old version's follows, so do that.
2010-06-03 17:33:29 -07:00
Yehuda Sadeh
ff0e871565
libatomic: fix assert.h compilation
2010-06-03 16:45:06 -07:00
Greg Farnum
3989ae40e3
osd: make sure we don't return EAGAIN to client
2010-06-03 14:12:41 -07:00
Sage Weil
62b900f5e2
mds: open past snap parents at end of rejoin phase
...
We really need past parents open before we go active or else anything
that needs to build a snap context will fail.
2010-06-03 14:14:04 -07:00
Sage Weil
26449e7c6a
mdsmap: show individual mds states in summary
2010-06-03 13:48:10 -07:00
Sage Weil
09185a0078
osd: improve snap_trimmer debug output
2010-06-03 13:26:39 -07:00
Sage Weil
2b33d99b8c
mds: another cap_exports message/mdcache encoding fix
...
Signed-off-by: Sage Weil <sage@newdream.net>
2010-06-03 13:24:48 -07:00
Sage Weil
55da048fb7
mds: only adjust dn->first on lock msg if !multiversion
...
The multiversion dn->first references a range of inode versions; don't
drag it forward. Fixes 38cb2403c0.
Signed-off-by: Sage Weil <sage@newdream.net>
2010-06-03 13:08:16 -07:00
Sage Weil
5f905961c5
mds: more fix cap_exports typing
2010-06-03 12:03:23 -07:00
Sage Weil
054669ab2b
mds: fix scatter_nudge infinite loop
2010-06-03 11:59:50 -07:00
Sage Weil
40b23227cb
mds: fix ESessions type
2010-06-03 11:08:00 -07:00
Sage Weil
5cd7919ab7
mds: drag in->first forward with straydn in handle_dentry_unlink
2010-06-03 11:04:05 -07:00
Sage Weil
394d9c3db5
mds: fix anchorclient dup lookups, again
2010-06-03 10:38:56 -07:00
Sage Weil
980f234fae
mds: only log successful requests as completed
2010-06-03 10:17:37 -07:00
Sage Weil
fa1e560344
mds: anchor dir on mksnap
2010-06-03 10:09:19 -07:00
CC Lien
c09d610c00
mkcephfs: error when creating journal file in a directory that differs from OSD data dir
...
mkcephfs creates osd data directory automatically, but it doesn't create a
directory for the osd journal file.
When you have a journal file in a directory that differs from the osd data
directory in your configuration, like:
osd data = /osd/osd$id
osd journal = /journal/osd$id
You will receive a "mount failed to open journal /journal/osd0/journal: No
such file or directory" error when doing mkcephfs
Signed-off-by: CC Lien <cc_lien@tcloudcomputing.com>
2010-06-03 09:45:10 -07:00
Sage Weil
5dd4a2d6b7
mds: fix mismatched cap_exports type between msg and MDCache
...
The types need to match because they are encoded/decoded interchangeably.
See MMDSCacheRejoin::decode() and MDCache::rejoin_send_rejoins().
2010-06-03 09:40:57 -07:00
Sage Weil
609e657204
mds: fix trim_unlinked iterator badness
...
We may remove the next inode in the map. Queue up unlinked roots first,
which we know remove_inode_recursive() won't reach, and iterate over those.
2010-06-03 09:33:27 -07:00
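The two-phase approach described in the commit above (queue the unlinked roots first, then iterate over the queue) can be sketched roughly as below. This is an illustration under stated assumptions, not the MDCache code: Node, NodeMap, the unlinked_root flag, remove_recursive, and trim_unlinked_sketch are hypothetical stand-ins for the inode map and remove_inode_recursive().

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-in for a cached inode.
struct Node {
  bool unlinked_root = false;  // assumed flag: root of a detached subtree
  std::vector<int> children;   // ids of child nodes
};

using NodeMap = std::map<int, Node>;

// Erase a node and all of its descendants from the map.
void remove_recursive(NodeMap& nodes, int id) {
  auto it = nodes.find(id);
  if (it == nodes.end())
    return;
  std::vector<int> kids = it->second.children;  // copy before erasing
  nodes.erase(it);
  for (int kid : kids)
    remove_recursive(nodes, kid);
}

// Phase 1 only reads the map; phase 2 mutates it while walking the queue,
// so no map iterator is live during a removal.
std::size_t trim_unlinked_sketch(NodeMap& nodes) {
  std::vector<int> roots;
  for (const auto& entry : nodes)
    if (entry.second.unlinked_root)
      roots.push_back(entry.first);

  for (int id : roots) {
    // A queued root is never a descendant of another queued root,
    // so it must still be present when we reach it.
    assert(nodes.count(id) == 1);
    remove_recursive(nodes, id);
  }
  return roots.size();
}
```

Removing entries during a single pass over the map would be unsafe here because the recursive removal can erase the element the loop is about to visit.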
Sage Weil
915ab3ca2d
mds: define MDS_REF_SET in unstable
2010-06-03 09:28:15 -07:00
Sage Weil
ef095e1f36
mds: clear dirtyscattered in remove_inode()
2010-06-03 09:27:56 -07:00
Sage Weil
26822162bd
mds: allow dup lookups in anchorclient
...
It's not practical for callers to avoid dups, particularly since they may
be unaware of each other. And it's trivial to support it here.
2010-06-03 09:17:13 -07:00
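One way to tolerate duplicate lookups, as the commit above describes, is to keep a list of waiters per key and send a request only for the first caller. The sketch below assumes that design; LookupClientSketch, its Callback type, and the integer result are hypothetical and do not reflect the anchorclient interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Hypothetical client that coalesces duplicate lookups on the same key.
class LookupClientSketch {
public:
  using Callback = std::function<void(int)>;

  // Register a waiter. Returns true only for the first waiter on a key,
  // i.e. when the caller actually needs to send a request.
  bool lookup(std::uint64_t key, Callback cb) {
    std::vector<Callback>& waiters = pending_[key];
    bool first = waiters.empty();
    waiters.push_back(std::move(cb));
    return first;
  }

  // Deliver one reply to every waiter on the key.
  // Returns the number of waiters woken (0 for an unknown key).
  std::size_t handle_reply(std::uint64_t key, int result) {
    auto it = pending_.find(key);
    if (it == pending_.end())
      return 0;
    // Detach the list first so callbacks may safely issue new lookups.
    std::vector<Callback> waiters = std::move(it->second);
    pending_.erase(it);
    assert(!waiters.empty());
    for (Callback& cb : waiters)
      cb(result);
    return waiters.size();
  }

private:
  std::map<std::uint64_t, std::vector<Callback>> pending_;
};
```

With this shape, callers that are unaware of each other can both ask for the same key; the second call simply joins the existing waiter list.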
Sage Weil
8a2a9bd6e4
assert: fix assert vs atomic_ops.h breakage
...
This was causing us to use the system assert, not the ceph one.
2010-06-03 09:01:58 -07:00
Sage Weil
f5ccc66289
mds: ensure past snap parents get opened before doing file recovery
...
Otherwise we can fail to get_snaps() when we start the recovery:
#0 0x00007fa037625f55 in *__GI_raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x00007fa037628d90 in *__GI_abort () at abort.c:88
#2 0x00007fa03761f07a in *__GI___assert_fail (assertion=0x9f3d81 "oldparent", file=<value optimized out>, line=170, function=0x9f4680 "void SnapRealm::build_snap_set(std::set<snapid_t, std::less<snapid_t>, std::allocator<snapid_t> >&, snapid_t&, snapid_t&, snapid_t&, snapid_t, snapid_t)") at assert.c:78
#3 0x00000000008f7656 in SnapRealm::build_snap_set (this=0x222a300, s=..., max_seq=..., max_last_created=..., max_last_destroyed=..., first=..., last=...) at mds/snap.cc:170
#4 0x00000000008f7e8c in SnapRealm::check_cache (this=0x222a300) at mds/snap.cc:194
#5 0x00000000008f892a in SnapRealm::get_snaps (this=0x222a300) at mds/snap.cc:209
#6 0x00000000007f2c85 in MDCache::queue_file_recover (this=0x2202a00, in=0x7fa0340f5450) at mds/MDCache.cc:4398
#7 0x0000000000865011 in Locker::file_recover (this=0x21fe850, lock=0x7fa0340f59b0) at mds/Locker.cc:3437
#8 0x00000000007e5899 in MDCache::start_files_to_recover (this=0x2202a00, recover_q=..., check_q=...) at mds/MDCache.cc:4503
#9 0x00000000007e887e in MDCache::rejoin_gather_finish (this=0x2202a00) at mds/MDCache.cc:3904
#10 0x00000000007ed6cf in MDCache::handle_cache_rejoin_strong (this=0x2202a00, strong=0x7fa030025440) at mds/MDCache.cc:3618
#11 0x00000000007ed84a in MDCache::handle_cache_rejoin (this=0x2202a00, m=0x7fa030025440) at mds/MDCache.cc:3063
#12 0x00000000007fade6 in MDCache::dispatch (this=0x2202a00, m=0x7fa030025440) at mds/MDCache.cc:5668
#13 0x0000000000735313 in MDS::_dispatch (this=0x22014d0, m=0x7fa030025440) at mds/MDS.cc:1390
#14 0x00000000007372a3 in MDS::ms_dispatch (this=0x22014d0, m=0x7fa030025440) at mds/MDS.cc:1295
#15 0x0000000000728b97 in Messenger::ms_deliver_dispatch(Message*) ()
#16 0x0000000000716c5e in SimpleMessenger::dispatch_entry (this=0x2202350) at msg/SimpleMessenger.cc:332
#17 0x00000000007119c7 in SimpleMessenger::DispatchThread::entry (this=0x2202760) at msg/SimpleMessenger.h:494
#18 0x000000000071f4e7 in Thread::_entry_func (arg=0x2202760) at ./common/Thread.h:39
#19 0x00007fa03849673a in start_thread (arg=<value optimized out>) at pthread_create.c:300
#20 0x00007fa0376bf6dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
Signed-off-by: Sage Weil <sage@newdream.net>
2010-06-03 08:19:24 -07:00
Sage Weil
c0e9d21009
mds: relax lock state before encoding export (and lock state)
...
We can't fuss with lock state in the finish method because we already
encoded the old state to the new auth, and we are now just a replica.
We do still want to relax the lock state to be more replica friendly,
though, so do that in the encode_export_inode method.
2010-06-03 08:04:33 -07:00
Sage Weil
3768ef941e
mds: do not bother tableserver until it is active
...
We resend these requests when the TS does go active, and if we send dups
things get all screwed up (see partial log below).
Should we worry about dup queries?
10.06.02_22:32:08.112834 7f881dfdb910 -- 10.3.64.22:6802/7866 --> mds0 10.3.64.22:6803/13552 -- mds_table_request(anchortable prepare 69 148 bytes) v1 -- ?+0 0x7f88180e4580
10.06.02_22:32:08.116427 7f881dfdb910 mds1.tableserver(anchortable) handle_mds_recovery mds0
10.06.02_22:32:08.116449 7f881dfdb910 mds1.tableclient(anchortable) handle_mds_recovery mds0
10.06.02_22:32:08.116457 7f881dfdb910 mds1.tableclient(anchortable) resending 69
10.06.02_22:32:08.116470 7f881dfdb910 -- 10.3.64.22:6802/7866 --> mds0 10.3.64.22:6803/13552 -- mds_table_request(anchortable prepare 69 148 bytes) v1 -- ?+0 0x7f8818120cb0
10.06.02_22:32:08.116840 7f881dfdb910 -- 10.3.64.22:6802/7866 <== mds0 10.3.64.22:6803/13552 7 ==== mds_table_request(anchortable agree 69 tid 165) v1 ==== 16+0+0 (1328913316 0 0) 0x2362830
10.06.02_22:32:08.116861 7f881dfdb910 mds1.tableclient(anchortable) handle_request mds_table_request(anchortable agree 69 tid 165) v1
10.06.02_22:32:08.116872 7f881dfdb910 mds1.tableclient(anchortable) got agree on 69 atid 165
10.06.02_22:32:08.127662 7f881dfdb910 mds1.tableclient(anchortable) commit 165
10.06.02_22:32:08.127683 7f881dfdb910 -- 10.3.64.22:6802/7866 --> mds0 10.3.64.22:6803/13552 -- mds_table_request(anchortable commit tid 165) v1 -- ?+0 0x7f8818114860
10.06.02_22:32:08.128244 7f881dfdb910 mds1.tableclient(anchortable) _prepare 70
10.06.02_22:32:08.128261 7f881dfdb910 -- 10.3.64.22:6802/7866 --> mds0 10.3.64.22:6803/13552 -- mds_table_request(anchortable prepare 70 82 bytes) v1 -- ?+0 0x7f88180e4580
10.06.02_22:32:08.131873 7f881dfdb910 -- 10.3.64.22:6802/7866 <== mds0 10.3.64.22:6803/13552 8 ==== mds_table_request(anchortable agree 69 tid 165 148 bytes) v1 ==== 164+0+0 (4238497285 0 0) 0x2362310
10.06.02_22:32:08.131900 7f881dfdb910 mds1.tableclient(anchortable) handle_request mds_table_request(anchortable agree 69 tid 165 148 bytes) v1
10.06.02_22:32:08.131911 7f881dfdb910 mds1.tableclient(anchortable) stray agree on 69 tid 165, already committing, resending COMMIT
10.06.02_22:32:08.131923 7f881dfdb910 -- 10.3.64.22:6802/7866 --> mds0 10.3.64.22:6803/13552 -- mds_table_request(anchortable commit tid 165) v1 -- ?+0 0x7f8818120cb0
10.06.02_22:32:08.144147 7f881dfdb910 -- 10.3.64.22:6802/7866 <== mds0 10.3.64.22:6803/13552 10 ==== mds_table_request(anchortable ack tid 165) v1 ==== 16+0+0 (584840829 0 0) 0x246dd20
10.06.02_22:32:08.144179 7f881dfdb910 mds1.tableclient(anchortable) handle_request mds_table_request(anchortable ack tid 165) v1
10.06.02_22:32:08.144195 7f881dfdb910 mds1.tableclient(anchortable) got ack on tid 165, logging
10.06.02_22:32:08.144217 7f881dfdb910 mds1.log submit_entry 5515297~17 : ETableClient anchortable ack tid 165
10.06.02_22:32:08.152419 7f881dfdb910 -- 10.3.64.22:6802/7866 <== mds0 10.3.64.22:6803/13552 11 ==== mds_table_request(anchortable agree 69 tid 166 148 bytes) v1 ==== 164+0+0 (4238497285 0 0) 0x2362830
10.06.02_22:32:08.152448 7f881dfdb910 mds1.tableclient(anchortable) handle_request mds_table_request(anchortable agree 69 tid 166 148 bytes) v1
10.06.02_22:32:08.152460 7f881dfdb910 mds1.tableclient(anchortable) stray agree on 69 tid 166, sending ROLLBACK
10.06.02_22:32:08.152470 7f881dfdb910 -- 10.3.64.22:6802/7866 --> mds0 10.3.64.22:6803/13552 -- mds_table_request(anchortable rollback tid 166) v1 -- ?+0 0x7f8818120cb0
10.06.02_22:32:08.172729 7f881dfdb910 -- 10.3.64.22:6802/7866 <== mds0 10.3.64.22:6803/13552 13 ==== mds_table_request(anchortable ack tid 165) v1 ==== 16+0+0 (584840829 0 0) 0x2362310
10.06.02_22:32:08.172770 7f881dfdb910 mds1.tableclient(anchortable) handle_request mds_table_request(anchortable ack tid 165) v1
10.06.02_22:32:08.172786 7f881dfdb910 mds1.tableclient(anchortable) got ack on tid 165, logging
10.06.02_22:32:08.172806 7f881dfdb910 mds1.log submit_entry 5515318~17 : ETableClient anchortable ack tid 165
10.06.02_22:32:08.174091 7f881dfdb910 -- 10.3.64.22:6802/7866 <== mds0 10.3.64.22:6803/13552 14 ==== mds_table_request(anchortable agree 70 tid 168 82 bytes) v1 ==== 98+0+0 (1154743153 0 0) 0x246dd20
10.06.02_22:32:08.174119 7f881dfdb910 mds1.tableclient(anchortable) handle_request mds_table_request(anchortable agree 70 tid 168 82 bytes) v1
10.06.02_22:32:08.174131 7f881dfdb910 mds1.tableclient(anchortable) got agree on 70 atid 168
10.06.02_22:32:08.202508 7f881dfdb910 mds1.tableclient(anchortable) _logged_ack 165
10.06.02_22:32:08.202530 7f881dfdb910 mds1.tableclient(anchortable) _logged_ack 165
<crash>
2010-06-02 23:07:42 -07:00
Sage Weil
7c0df05407
mds: do not reset filelock state when checking max_size during recovery
...
This was broken by d5574993 (probably, that commit fixed a similar
problem). The rejoin_ack initializes replica state properly, so we can't
go changing it now. I'm not sure why this was resetting the state to
LOCK, because that's clearly not allowed.
Print when check_max_size does a no-op so that this is a bit easier to see
next time.
2010-06-02 22:14:54 -07:00
Sage Weil
15c6651ff5
mds: lock->sync replica state is lock, not sync
...
It's not readable yet. And after the lock->sync gather completes we send
out a SYNC.
Fixes failed assertion like:
10.06.02_21:27:04.444202 7f17a25ac910 mds1.locker handle_file_lock a=sync on (ifile sync) from mds0 [inode 1 [...2,head] / rep@0.2 v7 snaprealm=0xe27400 f(v0 m10.06.02_21:26:13.366344 1=0+1) ds=1=0+1 rb=0 rf=0 rd=0 (iauth sync) (ilink sync) (idft sync) (isnap sync) (inest sync) (ifile sync) (ixattr sync) (iversion lock) | nref=1 0x7f179c006280]
mds/Locker.cc: In function 'void Locker::handle_file_lock(ScatterLock*, MLock*)':
mds/Locker.cc:3468: FAILED assert(lock->get_state() == 2 || lock->get_state() == 15 || lock->get_state() == 21)
1: (Locker::handle_file_lock(ScatterLock*, MLock*)+0x1d8) [0x86d70a]
2: (Locker::handle_lock(MLock*)+0x191) [0x86e30f]
3: (Locker::dispatch(Message*)+0x41) [0x870f27]
4: (MDS::_dispatch(Message*)+0x1a17) [0x7364cb]
5: (MDS::ms_dispatch(Message*)+0x2f) [0x737961]
6: (Messenger::ms_deliver_dispatch(Message*)+0x55) [0x72918d]
7: (SimpleMessenger::dispatch_entry()+0x532) [0x71710a]
8: (SimpleMessenger::DispatchThread::entry()+0x29) [0x711f25]
9: (Thread::_entry_func(void*)+0x20) [0x7232f4]
10: /lib/libpthread.so.0 [0x7f17a407073a]
11: (clone()+0x6d) [0x7f17a329469d]
Signed-off-by: Sage Weil <sage@newdream.net>
2010-06-02 21:33:40 -07:00