Commit Graph

9311 Commits

Author SHA1 Message Date
Sage Weil
9ad91c0f72 objectstore: simpler transaction encoding
Just concatenate operations to a bufferlist as we go.  No
distinct decoding step is needed; we parse the transaction as it
is replayed/applied.  This avoids the old decoded intermediate
representation overhead.

Since we still decode the old version, that code is still there,
but not used for anything new.
2010-02-18 15:05:33 -08:00
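The "concatenate as we go, parse on apply" idea in 9ad91c0f72 can be sketched roughly as below. This is a minimal illustration, not the ObjectStore code: the opcodes, the byte layout, the std::vector buffer (standing in for a bufferlist), and the replay_transaction() helper are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <set>
#include <string>
#include <vector>

// Hypothetical opcodes and layout, for illustration only.
enum : uint32_t { OP_TOUCH = 1, OP_REMOVE = 2 };

class Transaction {
  std::vector<char> buf;  // ops are appended here as they are added

  void put_u32(uint32_t v) {
    const char* p = reinterpret_cast<const char*>(&v);
    buf.insert(buf.end(), p, p + sizeof(v));
  }
  // Each op is encoded as: opcode, name length, name bytes.
  void put_op(uint32_t op, const std::string& oid) {
    assert(oid.size() <= UINT32_MAX);
    put_u32(op);
    put_u32(static_cast<uint32_t>(oid.size()));
    buf.insert(buf.end(), oid.begin(), oid.end());
  }

public:
  void touch(const std::string& oid) { put_op(OP_TOUCH, oid); }
  void remove(const std::string& oid) { put_op(OP_REMOVE, oid); }
  const std::vector<char>& data() const { return buf; }
};

// Parse and apply in a single pass, with no decoded intermediate form.
// Returns false if the buffer is malformed.
bool replay_transaction(const Transaction& t, std::set<std::string>& store) {
  const std::vector<char>& b = t.data();
  std::size_t off = 0;
  auto get_u32 = [&](uint32_t& v) {
    if (b.size() - off < sizeof(v)) return false;
    std::memcpy(&v, b.data() + off, sizeof(v));
    off += sizeof(v);
    return true;
  };
  while (off < b.size()) {
    uint32_t op = 0, len = 0;
    if (!get_u32(op) || !get_u32(len) || b.size() - off < len) return false;
    std::string oid(b.data() + off, len);
    off += len;
    if (op == OP_TOUCH) store.insert(oid);
    else if (op == OP_REMOVE) store.erase(oid);
    else return false;
  }
  return true;
}
```

In this sketch the only representation of the transaction is the byte buffer itself, which is the property the commit message describes.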
Sage Weil
59d1b673b8 vstart: default to 3 mds 2010-02-18 12:36:04 -08:00
Sage Weil
ed87bef243 uclient: do not retain caps being revoked
Matches kclient commit 68c28323.
2010-02-18 11:50:49 -08:00
Sage Weil
aac8930fa5 debug: fix warnings, use larger path buffers 2010-02-18 09:57:34 -08:00
Sage Weil
4aeed8101f logger: fix warning 2010-02-18 09:57:15 -08:00
Sage Weil
109f37f3ab workqueue: behave when multiple threads call drain()
Use a counter, not a bool.
2010-02-17 21:47:18 -08:00
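Why a counter rather than a bool matters in 109f37f3ab can be sketched as follows. The class below is a hypothetical, simplified queue written for this example (DrainableQueue, in_flight, and drainers() are not names from the Ceph source); it only shows the bookkeeping pattern.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical, simplified queue; not the actual Ceph WorkQueue code.
class DrainableQueue {
  std::mutex lock;
  std::condition_variable cond;
  int in_flight = 0;  // items queued or being processed
  int draining = 0;   // threads currently inside drain(); a counter, not a bool

public:
  void start_item() {
    std::lock_guard<std::mutex> l(lock);
    ++in_flight;
  }

  void finish_item() {
    std::lock_guard<std::mutex> l(lock);
    assert(in_flight > 0);
    --in_flight;
    // Wake waiters only if someone is draining. With a bool, the first
    // drainer to leave would clear the flag, and a second drainer that is
    // still waiting might never be signalled.
    if (in_flight == 0 && draining > 0)
      cond.notify_all();
  }

  void drain() {
    std::unique_lock<std::mutex> l(lock);
    ++draining;
    cond.wait(l, [this] { return in_flight == 0; });
    --draining;
  }

  int drainers() {
    std::lock_guard<std::mutex> l(lock);
    return draining;
  }
};
```

Because each caller increments on entry and decrements on exit, the "someone is draining" state stays true until the last drainer has left.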
Sage Weil
f8b2584178 mds: add support for directory sticky bit
Take an rdlock on the directory authlock, so that we can reliably set the
new inode's gid if the directory mode has SGID bit set.
2010-02-17 21:24:50 -08:00
Sage Weil
4540b5e61a filestore: only do btrfs_snap if btrfs 2010-02-17 21:11:30 -08:00
Sage Weil
0e33171c60 Merge commit 'origin/filestore' into unstable
Conflicts:

	src/os/FileStore.cc
	src/os/FileStore.h
2010-02-17 14:57:31 -08:00
Sage Weil
ef27fd6871 update release checklist 2010-02-17 14:55:40 -08:00
Sage Weil
98f5be53f8 v0.19 2010-02-17 13:53:06 -08:00
Sage Weil
801d248ae8 mon: disable 'osd setmap'
This is dangerous, since it doesn't preserve old pool ids or pool_max, and
will confuse osds and generally wreak havoc.
2010-02-17 09:18:16 -08:00
Sage Weil
e2ed6db844 osdmap: fix uninit var warning
Harmless, but this shuts it up.
2010-02-16 19:53:51 -08:00
Sage Weil
f4a5f53b7c mon: add 'auth export [name]' to export a full or partial keyring 2010-02-16 16:37:59 -08:00
Sage Weil
65f5123645 qa: fix snaptest1.sh 2010-02-16 16:00:59 -08:00
Sage Weil
ab03efb06c osdmap: decode old osdmaps prior to pool_max stuff 2010-02-16 15:59:09 -08:00
Sage Weil
157dec1663 osdmap: get rid of useless max_pools 2010-02-16 15:49:59 -08:00
Sage Weil
1b0711dc57 osd: pool cleanups
missed this before:

 - no need to initialize in create_pending(), constructor does that
 - int32_t, not int
 - pool_max while we're at it
 - initialize pool_max in OSDMap constructor
2010-02-16 15:49:33 -08:00
Sage Weil
465b46fb0c todo 2010-02-16 15:02:00 -08:00
Sage Weil
9f79756f8e mds: ignore session RENEWCAPS if state not open|stale
This avoids breakage where a renewcaps races with a session
being purged, for example.
2010-02-16 15:02:00 -08:00
Greg Farnum
455c594c6c osdmap/mon: Be more defensive about highest_pool_num usage 2010-02-16 14:38:15 -08:00
Greg Farnum
5bbb3d6e83 rados tool: mkpool/rmpool commands now available 2010-02-16 14:34:44 -08:00
Greg Farnum
c555ee34f3 mon: can now delete pools via 'ceph osd pool delete foo' 2010-02-16 14:34:44 -08:00
Greg Farnum
12e3742fe8 rgw: actually delete pools when using rados! 2010-02-16 14:34:44 -08:00
Greg Farnum
9851a2020d rados/objecter: can now delete pools! 2010-02-16 14:34:44 -08:00
Greg Farnum
aea1082ec4 mon/msg: MPoolOp can carry POOL_OP_DELETE; OSDMon puts pool in incre old_pools 2010-02-16 14:34:44 -08:00
Greg Farnum
69f923d216 librados: init PoolCtx properly -- was always setting snap_seq to CEPH_NOSNAP 2010-02-16 14:34:44 -08:00
Greg Farnum
3ced5e7de2 osd: Deal with pools being removed from OSDMap.
This potentially has issues, since pools are not removed from the map
until after all the PGs are removed (which is threaded, not inline with
map delivery). But Sage thinks it's okay and the system keeps working
even if you delete a pool while benchmarking on it with rados.
2010-02-16 14:34:05 -08:00
Greg Farnum
212a9fd6a6 OSDMap: get_pg_pool now returns a pointer
This lets us return NULL if the pool isn't in the map, which is
needed functionality for pool deletion. Meanwhile, code which
expects the pool to exist will continue to cause a crash if it doesn't.
2010-02-16 09:27:41 -08:00
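The pointer-returning lookup described in 212a9fd6a6 follows a common pattern, sketched below under assumptions: PoolMap, PoolInfo, and pool_size_or_die() are hypothetical stand-ins for this example, and only the get_pg_pool name comes from the commit message.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

struct PoolInfo { int32_t size; };  // hypothetical pool record

// Hypothetical stand-in for the part of OSDMap that tracks pools.
class PoolMap {
  std::map<int32_t, PoolInfo> pools;

public:
  void add_pool(int32_t id, PoolInfo p) { pools[id] = p; }
  void remove_pool(int32_t id) { pools.erase(id); }

  // Returns nullptr when the pool is not in the map, so callers that can
  // tolerate a deleted pool are able to check for it.
  const PoolInfo* get_pg_pool(int32_t id) const {
    auto it = pools.find(id);
    return it == pools.end() ? nullptr : &it->second;
  }
};

// A caller that expects the pool to exist still fails loudly if it does not.
int32_t pool_size_or_die(const PoolMap& m, int32_t id) {
  const PoolInfo* p = m.get_pg_pool(id);
  assert(p != nullptr);
  return p->size;
}
```

Code that handles deletion checks the returned pointer; code that assumes the pool exists keeps its old crash-on-missing behavior.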
Greg Farnum
60594796cd rados: fix seg fault on cleanup of a failed pool open 2010-02-16 09:26:55 -08:00
Sage Weil
3b1a90e648 mds: infer 'follows' in journal_dirty_inode on non-head inodes
There are lots of callers to journal_dirty_inode that may
unwittingly be dealing with a non-head inode (e.g.
check_file_max).  If the provided inode is snapped, infer an
appropriate follows value so as not to cow_inode() again.
2010-02-15 13:47:50 -08:00
Sage Weil
01ed8d0aaa mds: clear cap->issued on flushsnap
This allows _do_cap_update to clear out the client_range.

Kill (now) unused/unnecessary 'wanted' arg to _do_cap_update.

Also delay cap removal until after _do_cap_update (which takes
a Capability*).  This probably needs further cleanup.
2010-02-15 13:27:01 -08:00
Sage Weil
6deb60b7dd mds: don't croak on null dentries in cache during reconnect/rejoin
They're created when we replay unlink events from the log.
2010-02-15 11:40:20 -08:00
Yehuda Sadeh
34ad5bf859 objectcacher: use trimtrunc read/write ops 2010-02-12 16:03:47 -08:00
Yehuda Sadeh
92baa54b41 osdc: clean up some mess 2010-02-12 16:03:47 -08:00
Yehuda Sadeh
bc32b8f763 objecter: add read_trunc, write_trunc 2010-02-12 16:03:47 -08:00
Sage Weil
7a73f915e0 mkmonfs: rm -rf, so that we kill 0600 admin_keyring.bin 2010-02-12 14:54:01 -08:00
Sage Weil
ce464a5a0a osd: fix recovery requeue race
If a recovery op finished right as another recovery op was
being started, we could get into start_recovery_ops() and get
max = 0 and not start anything.  Since the PG wasn't being
requeued for later, it would never recover.  So, requeue if we
race and get max == 0.
2010-02-12 14:45:02 -08:00
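The requeue-on-max-zero fix in ce464a5a0a can be illustrated with a single-threaded model. Everything here is an assumption for the sake of the example (RecoveryScheduler, its fields, and the integer PG ids); only the start_recovery_ops name and the "max == 0" condition come from the commit message.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>

// Hypothetical scheduler; names and fields are illustrative, not Ceph's.
struct RecoveryScheduler {
  int max_active;         // cap on concurrent recovery ops
  int active = 0;         // ops currently running
  std::deque<int> queue;  // ids of PGs waiting for a recovery slot

  explicit RecoveryScheduler(int max) : max_active(max) {}

  // Try to start up to `wanted` ops for `pg`; returns how many started.
  int start_recovery_ops(int pg, int wanted) {
    int max = max_active - active;
    if (max <= 0) {
      // No slot is available right now: requeue the PG so it is retried
      // later instead of being forgotten.
      queue.push_back(pg);
      return 0;
    }
    int n = std::min(max, wanted);
    active += n;
    return n;
  }

  void finish_recovery_op() {
    assert(active > 0);
    --active;
  }
};
```

Without the push_back in the max <= 0 branch, a PG that found no free slot would drop out of the queue and never be retried, which is the failure the commit describes.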
Sage Weil
72d7117771 init-ceph: print 'already started' instead of failing to start 2010-02-12 14:20:02 -08:00
Sage Weil
10ae652f03 msgr: more conservative locking, thread join asserts
We caught a bunch of crashes like this:

10.02.11 17:01:01.600660 7f87070c3950 -- 10.3.14.134:6800/8203 >> 10.3.14.130:6800/18914 pipe(0x7fc2be2cebe0 sd=36 pgs=2409 cs=1 l=0).do_sendmsg error Broken pipe
10.02.11 17:01:01.600700 7f87070c3950 -- 10.3.14.134:6800/8203 >> 10.3.14.130:6800/18914 pipe(0x7fc2be2cebe0 sd=36 pgs=2409 cs=1 l=0).writer error sending 0x7fc27da1c570, 32: Broken pipe
10.02.11 17:01:01.600796 7f87070c3950 -- 10.3.14.134:6800/8203 >> 10.3.14.130:6800/18914 pipe(0x7fc2be2cebe0 sd=-1 pgs=2409 cs=1 l=0).fault initiating reconnect
...
./common/Thread.h: In function 'int Thread::join(void**)':
./common/Thread.h:66: FAILED assert(0)
 1: (Thread::join(void**)+0x73) [0x64fcd3]
 2: (SimpleMessenger::Pipe::join_reader()+0x68) [0x6555a2]
 3: (SimpleMessenger::Pipe::connect()+0xf5) [0x645be9]
 4: (SimpleMessenger::Pipe::writer()+0x157) [0x64793d]
 5: (SimpleMessenger::Pipe::Writer::entry()+0x19) [0x63e107]
 6: (Thread::_entry_func(void*)+0x20) [0x64e816]
 7: /lib/libpthread.so.0 [0x7fc2c3bbdfc7]
 8: (clone()+0x6d) [0x7fc2c2e005ad]

that look a bit like multiple procs were racing into
join_reader().  Add an assert to catch that if it happens again,
and also wrap thread starts in pipe_lock to ensure we keep the
_running flags in sync with reality.  Add in a few other
sanity checks too.
2010-02-12 13:38:38 -08:00
Sage Weil
f5209d750c mon: note mds beacon times more carefully
We need to update the beacon timestamp even when we are updating
the mds state.  Otherwise we can get caught in a busy loop
between marking an mds laggy and !laggy because the beacon stamp
never updates.

So even if we are updating, and the reply will be slow, update
our timestamp, so we don't mark the mds laggy.
2010-02-12 13:35:57 -08:00
Sage Weil
28257a0057 osd: bail out of interval loop completely
We're going backwards, so once this test fails, it always fails,
and we can break instead of continue.  Any skipped intervals will
be pruned shortly anyway.
2010-02-12 13:27:49 -08:00
Sage Weil
d7eb8ce54d osd: always update up_thru if pg changes before going active
We already required this if prior PG members were down, so this
affected the 'failure' case.  We now also require it for
non-failure PG changes (expansion, migration).

This fixes our maybe_went_rw calculation for prior PG intervals,
which is based on up_thru.  If maybe_went_rw is false when the
pg actually went rw, we can lose (and have lost) data.  But it is
not practical to calculate without up_thru being consistently
updated, because determining whether a pg would have been able to
go active depends on knowing last_epoch_started at a previous
point in time, which then determines how many prior intervals
may have been considered, which in turn determines whether
up_thru would have been updated, etc.  Much simpler to update it
all the time.

This should not impose a significantly greater cost, since we
already need it for the failure case.  And in general the
migration/expansion/whatever case is no more common nor critical
than the failure case.
2010-02-12 13:26:19 -08:00
Sage Weil
f931fec558 osd: simplify, and version, pg attrs 2010-02-12 12:52:18 -08:00
Sage Weil
d8b622a4d0 osd: remove some dead code from build_prior
Not sure what any_up_now used to be for, but it's not used now.
2010-02-12 12:45:15 -08:00
Sage Weil
d17257b983 osd: fail startup if store is in use (before we fork) 2010-02-12 11:20:30 -08:00
Sage Weil
226c727eaf osd: set heartbeat addr properly
This was broken by the osd startup change in 8538efc
2010-02-12 11:07:20 -08:00
Sage Weil
5f3f958d33 osd: fix memset transposed params 2010-02-11 16:25:59 -08:00
Sage Weil
8538efc5c2 osd: don't block on mon negotiation on startup
That means we don't do the monmap vs ondisk fsid checks and
such.  They're mostly useless anyway.
2010-02-11 16:18:54 -08:00
Sage Weil
315c8cd9d3 mkcephfs: fix up permissions, ownership on temp keyrings 2010-02-11 16:10:26 -08:00