
27489 Commits

Author SHA1 Message Date
David Zafman
fe6633172e Handle non-existent front interface in maps from older MONs
Fix OSDService::get_con_osd_hb() to not try to get_connection() without front interface
Fix OSD::handle_osd_map() to check for missing front interface

Fixes: #5460

Signed-off-by: David Zafman <david.zafman@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-06-26 21:06:59 -07:00
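A minimal Python sketch of the defensive check described above. It assumes a simplified OSD-map entry in which a missing front heartbeat address is represented as None. `OSDInfo`, `connect()` and the return shape are hypothetical. The real fix lives in the C++ `OSDService::get_con_osd_hb()` and is not reproduced here.

```python
from typing import Optional, Tuple


class OSDInfo:
    """Hypothetical OSD map entry: maps from older MONs may lack a front address."""

    def __init__(self, back_addr: str, front_addr: Optional[str] = None):
        self.back_addr = back_addr
        self.front_addr = front_addr


def connect(addr: str) -> str:
    # Stand-in for opening a messenger connection to addr.
    return "conn:" + addr


def get_con_osd_hb(info: OSDInfo) -> Tuple[str, Optional[str]]:
    """Return (back, front) heartbeat connections; front is None if the map has none."""
    back = connect(info.back_addr)
    # Only try to connect to the front interface if the map actually has one.
    front = connect(info.front_addr) if info.front_addr else None
    return back, front
```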
Sage Weil
867ead91e4 qa/workunits/rbd/simple_1tb: add simple rbd read/write test on large image
Motivated by #5454.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-26 20:41:58 -07:00
Sage Weil
8a17f33b14 ceph-disk: do not mount over an osd directly in /var/lib/ceph/osd/$cluster-$id
If we see a 'ready' file in the target OSD dir, do not mount our device
on top of it.

Among other things, this prevents ceph-disk activate on stray disks from
stepping on teuthology osds.

Fixes: #5445
Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-26 18:28:01 -07:00
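The guard can be sketched in Python, which is ceph-disk's own language. Only the 'ready' marker file comes from the commit message. `should_mount()` and `activate()` are hypothetical helpers, and `mount` is passed in as a callable so the sketch has no side effects.

```python
import os
from typing import Callable


def should_mount(osd_dir: str) -> bool:
    """False if osd_dir already holds an active OSD (a 'ready' file is present)."""
    return not os.path.exists(os.path.join(osd_dir, "ready"))


def activate(dev: str, osd_dir: str, mount: Callable[[str, str], None]) -> bool:
    """Mount dev on osd_dir unless that would shadow an existing OSD; True if mounted."""
    if not should_mount(osd_dir):
        return False
    mount(dev, osd_dir)
    return True
```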
Sage Weil
986185ca02 mon/PGMonitor: avoid duplicating map_pg_create() effort on same maps
If we have an election and refresh, but the osdmap does not change, there
is no need to recalculate the pg create maps.  However, if we register new
creating pgs, we do need to, namely when the last_pg_scan update gets
pulled out of paxos (i.e., on both leader and peon mons).

Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-26 17:34:39 -07:00
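A Python sketch of the idea: remember which inputs the last pg-create mapping was computed from, and skip the work when they are unchanged. Keying on the pair (osdmap epoch, last_pg_scan) is an assumption drawn from the message. `PGCreateMapper` and `primary_of` are hypothetical names, not the PGMonitor API.

```python
from typing import Callable, Dict, Iterable, Optional, Set, Tuple


class PGCreateMapper:
    """Recompute the pg -> OSD create mapping only when its inputs change (hypothetical)."""

    def __init__(self) -> None:
        self._last_key: Optional[Tuple[int, int]] = None
        self.computations = 0
        self.creating_by_osd: Dict[int, Set[int]] = {}

    def map_pg_creates(self, osdmap_epoch: int, last_pg_scan: int,
                       creating_pgs: Iterable[int],
                       primary_of: Callable[[int], int]) -> Dict[int, Set[int]]:
        key = (osdmap_epoch, last_pg_scan)
        if key == self._last_key:
            # Same osdmap and no newly registered creating pgs: reuse the old result.
            return self.creating_by_osd
        self._last_key = key
        self.computations += 1
        by_osd: Dict[int, Set[int]] = {}
        for pgid in creating_pgs:
            by_osd.setdefault(primary_of(pgid), set()).add(pgid)
        self.creating_by_osd = by_osd
        return by_osd
```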
Dan Mick
ca55c3416e cephtool/test.sh: add case for auth add with no caps
Test case for failure in #5467.  Supplying new auth info overwrites.

Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-06-26 17:09:49 -07:00
Dan Mick
bfed2d60a5 MonCommands.h: auth add doesn't require caps (it can use -i <file>)
This was a regression from the old behavior, introduced by the
CLI rewrite.

Fixes: #5467
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-06-26 16:18:47 -07:00
Dan Mick
d1d902846d Merge branch 'next' 2013-06-26 12:39:15 -07:00
Dan Mick
71f3e56d4b Makefile.am: fix libglobal.la race with ceph_test_cors
ceph_test_cors had libglobal.la in its _LDFLAGS macro definition;
it should have been in _LDADD.  Moreover, things using libglobal.la
ought to be using LIBGLOBAL_LDA to add it to _LDADD.  Fix them all.

Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-06-26 12:28:09 -07:00
Sage Weil
e635c47851 mon/PGMonitor: use post_paxos_update, not init, to refresh from osdmap
We do two things here:
 - make init a one-time unconditional init method, which is what the
   health service expects/needs.
 - switch PGMonitor::init to be post_paxos_update() which is called after
   the other services update, which is what PGMonitor really needs.

This is a new version of the fix originally in commit
a2fe013794 (and those around it).  That is,
this re-fixes a problem where osds do not see pg creates from their
subscribe due to map_pg_creates() not getting called.

Backport: cuttlefish
Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-26 06:55:02 -07:00
Sage Weil
131686980f mon/PaxosService: add post_paxos_update() hook
Some services need to update internal state based on other services'
state, and thus need to be run after everyone has pulled their info out of
paxos.

Backport: cuttlefish
Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-26 06:55:02 -07:00
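The two-phase refresh that this hook enables can be modeled in a few lines of Python. Only the names `update_from_paxos()` and `post_paxos_update()` follow the commit messages. The classes are simplified stand-ins for the C++ monitor services, and the event log exists only to make the ordering visible.

```python
from typing import List


class PaxosService:
    def __init__(self, name: str, log: List[str]) -> None:
        self.name = name
        self.log = log

    def update_from_paxos(self) -> None:
        # Pull this service's own state out of paxos.
        self.log.append(self.name + ".update_from_paxos")

    def post_paxos_update(self) -> None:
        # Default: nothing to do once every service has refreshed.
        pass


class OSDMonitor(PaxosService):
    pass


class PGMonitor(PaxosService):
    def post_paxos_update(self) -> None:
        # Safe to read OSDMonitor state here: all services have already refreshed.
        self.log.append(self.name + ".post_paxos_update")


def refresh(services: List[PaxosService]) -> None:
    for svc in services:
        svc.update_from_paxos()
    for svc in services:
        svc.post_paxos_update()
```

Even if PGMonitor is listed first, its hook runs only after OSDMonitor has pulled its state out of paxos.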
Sage Weil
ea1f316e5d mon: do not reopen MonitorDBStore during startup
leveldb doesn't seem to like this when it races with an internal compaction
attempt (see below).  Instead, let the store get opened by the ceph_mon
caller, and pull a bit of the logic into the caller to make the flow a
little easier to follow.

    -2> 2013-06-25 17:49:25.184490 7f4d439f8780 10 needs_conversion
    -1> 2013-06-25 17:49:25.184495 7f4d4065c700  5 asok(0x13b1460) entry start
     0> 2013-06-25 17:49:25.316908 7f4d3fe5b700 -1 *** Caught signal (Segmentation fault) **
 in thread 7f4d3fe5b700

 ceph version 0.64-667-g089cba8 (089cba8fc0e8ae8aef9a3111cba7342ecd0f8314)
 1: ceph-mon() [0x649f0a]
 2: (()+0xfcb0) [0x7f4d435dccb0]
 3: (leveldb::Table::BlockReader(void*, leveldb::ReadOptions const&, leveldb::Slice const&)+0x154) [0x806e54]
 4: ceph-mon() [0x808840]
 5: ceph-mon() [0x808b39]
 6: ceph-mon() [0x806540]
 7: (leveldb::DBImpl::DoCompactionWork(leveldb::DBImpl::CompactionState*)+0xdd) [0x7f363d]
 8: (leveldb::DBImpl::BackgroundCompaction()+0x2c0) [0x7f4210]
 9: (leveldb::DBImpl::BackgroundCall()+0x68) [0x7f4cc8]
 10: ceph-mon() [0x80b3af]
 11: (()+0x7e9a) [0x7f4d435d4e9a]
 12: (clone()+0x6d) [0x7f4d4196bccd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-26 06:55:02 -07:00
Sage Weil
516445bebc mon/Paxos: simplify trim()
Collapse all the trim methods into a single simple method.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-26 06:55:02 -07:00
Sage Weil
b8d04a2a8b mon/PaxosService: rename scrub
Make the name match the one in Paxos.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-26 06:55:02 -07:00
Sage Weil
ac63b2e095 mon/Paxos: clean up removal of pre-conversion paxos states
Use a helper, independent of trim machinery, and call on leader, too.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-26 06:55:02 -07:00
Sage Weil
d2f3811814 mon/Paxos: update first_committed only from paxos
Do not touch the in-memory first_committed until the trim commits.  This
avoids any possible confusion due to races and keeps commit() as similar
to store_state() as possible.

Similarly, do not touch first_committed from store_state.  We should
*only* pull it out of the kv store.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-26 06:55:02 -07:00
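A Python sketch of the rule, with a dict standing in for the key/value store. Trim only builds a transaction, and the in-memory first_committed changes solely when the committed transaction is read back from the store. The class shape and key names are illustrative assumptions, not the Paxos implementation.

```python
from typing import List, Tuple

Op = Tuple  # ("erase", key) or ("put", key, value)


class Paxos:
    def __init__(self, store: dict) -> None:
        self.store = store
        self.refresh()

    def refresh(self) -> None:
        # first_committed / last_committed are only ever pulled from the store.
        self.first_committed = self.store.get("first_committed", 0)
        self.last_committed = self.store.get("last_committed", 0)

    def trim_transaction(self, new_first: int) -> List[Op]:
        """Build (without applying) a transaction dropping versions below new_first."""
        ops: List[Op] = [("erase", v) for v in range(self.first_committed, new_first)]
        ops.append(("put", "first_committed", new_first))
        return ops

    def commit(self, ops: List[Op]) -> None:
        for op in ops:
            if op[0] == "erase":
                self.store.pop(op[1], None)
            else:
                self.store[op[1]] = op[2]
        self.refresh()
```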
Sage Weil
290ccde1dc mon/Paxos: set first_committed on first commit
Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-26 06:55:01 -07:00
Gary Lowell
5511daf345 doc: public network statement needed on new monitors.
When using ceph-deploy to create a new monitor on a host that is not
in the initial set of hosts defined by the ceph-deploy new command,
a "public network" statement needs to be added to the ceph.conf file.
Fixes #5195.

Signed-off-by: Gary Lowell <gary.lowell@inktank.com>
2013-06-26 06:27:17 -07:00
Yehuda Sadeh
af00f73348 rgw: automatic pool creation for placement pools
With the new pools configuration, now we auto create the
pools when needed (through bucket creation). Also, make
sure only to configure default placement in zone structure,
if old config hasn't been done yet.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2013-06-25 22:55:53 -07:00
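One way to picture on-demand pool creation at bucket-creation time, as a Python sketch. The placement field names (`index_pool`, `data_pool`) and the `create_pool` callable are assumptions made for illustration. They are not taken from the rgw code.

```python
from typing import Callable, Dict, List, Set


def ensure_placement_pools(existing: Set[str], placement: Dict[str, str],
                           create_pool: Callable[[str], None]) -> List[str]:
    """Create any pool named by the placement target that does not exist yet."""
    created: List[str] = []
    for field in ("index_pool", "data_pool"):
        pool = placement[field]
        if pool not in existing:
            create_pool(pool)
            existing.add(pool)  # avoid creating the same pool twice
            created.append(pool)
    return created
```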
Sage Weil
fe365339b9 mon/Paxos: never write first_committed except during trim
The trimming is handled by proposing transactions.  Do not confuse matters
by writing (incorrect) first_committed values at any other point.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-25 21:25:04 -07:00
Sage Weil
e93730b7ff mon: enable leveldb cache by default
512 MB sounds reasonable to me.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-25 21:25:04 -07:00
Sage Weil
ad9c294850 mon/Paxos: assert that the store gives us back what we just wrote
In bug #5424 I observed leveldb failing internally and then returning
bad info.  We then hit a random/confusing assert.  Try to detect this
earlier by verifying that a get of a just-written last_committed gives
us back the right thing.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
2013-06-25 21:25:04 -07:00
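The check amounts to a read-after-write verification. A Python sketch, with a dict standing in for leveldb. It raises an explicit exception rather than using `assert`, because Python strips asserts under -O. `StoreCorruption` and `write_last_committed()` are hypothetical names.

```python
from typing import MutableMapping


class StoreCorruption(RuntimeError):
    pass


def write_last_committed(store: MutableMapping[str, int], version: int) -> None:
    store["last_committed"] = version
    # Read the value straight back: a misbehaving store is caught here, right
    # after the write, instead of surfacing later as an unrelated-looking assert.
    readback = store.get("last_committed")
    if readback != version:
        raise StoreCorruption(
            f"wrote last_committed={version}, read back {readback!r}")
```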
Sage Weil
11e0325372 mon/Paxos: drop unnecessary last_committed loads
Drop (apparently) ad-hoc refreshes of last_committed from the store.
These are unnecessary and confusing.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-25 21:25:04 -07:00
Sage Weil
d31ed95064 mon/PaxosService: allow paxos service writes while paxos is updating
In commit f985de28f8 I mistakenly made
is_writeable() false while paxos was updating due to a misread of
Paxos::propose_new_value() (I didn't see that it would queue).
This is problematic because it narrows the window during which each service
is writeable for no reason.

Allow services to be writeable both when paxos is active and when it is updating.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-25 21:25:04 -07:00
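A Python model of the corrected predicate. The state names mirror the message. The leader check, the `RECOVERING` state and the queueing `propose_new_value()` stand-in are simplifying assumptions, not the monitor's actual code.

```python
from enum import Enum, auto
from typing import List


class State(Enum):
    RECOVERING = auto()
    ACTIVE = auto()
    UPDATING = auto()


class Paxos:
    def __init__(self) -> None:
        self.state = State.ACTIVE
        self.queued: List[str] = []

    def propose_new_value(self, value: str) -> None:
        # Proposals made while a round is in flight are queued, not rejected.
        self.queued.append(value)


def is_writeable(paxos: Paxos, is_leader: bool = True) -> bool:
    # Writeable while active *or* updating, since proposals simply queue.
    return is_leader and paxos.state in (State.ACTIVE, State.UPDATING)
```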
Sage Weil
2d2aa00ed3 mon/PGMonitor: store PGMap directly in store, bypassing PaxosService stash_full
Instead of encoding incrementals and periodically dumping the whole encoded
PGMap, store everything in a range of keys, and update them
between versions using transactions.  The per-version values are now
breadcrumbs indicating which keys were dirtied so they can be refreshed
via update_from_paxos().

This has several benefits:
 - we avoid ever encoding the entire PGMap
 - we avoid dumping that blob into leveldb keys
 - we limit the amount of data living in forward-moving keys, which leveldb
   has a hard time compacting away
 - pgmap data instead lives over a fixed range of keys, which leveldb
   excels at
 - we only keep the latest copy of the PGMap (which is all we care about)

Bump the internal monitor protocol version.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-25 21:25:04 -07:00
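A Python sketch of the storage layout. A dict stands in for leveldb, and plain assignments stand in for transactions. The key prefixes (`pgmap_pg/`, `pgmap/`) are illustrative assumptions. The point is that per-PG values are overwritten in place, while each version stores only a breadcrumb list of the keys it dirtied.

```python
import json
from typing import Dict


class PGMapStore:
    def __init__(self) -> None:
        self.kv: Dict[str, str] = {}  # stand-in for leveldb

    def apply_incremental(self, version: int, pg_stats: Dict[str, str]) -> None:
        dirty = []
        for pgid, stat in pg_stats.items():
            key = "pgmap_pg/" + pgid
            self.kv[key] = stat  # overwrite in place: only the latest value is kept
            dirty.append(key)
        # Breadcrumb: record which keys this version touched.
        self.kv["pgmap/%d" % version] = json.dumps(dirty)
        self.kv["pgmap/last_committed"] = str(version)

    def update_from_paxos(self, in_memory: Dict[str, str], have_version: int) -> int:
        """Refresh only the keys dirtied since have_version; return the new version."""
        last = int(self.kv.get("pgmap/last_committed", "0"))
        for v in range(have_version + 1, last + 1):
            for key in json.loads(self.kv["pgmap/%d" % v]):
                in_memory[key.split("/", 1)[1]] = self.kv[key]
        return last
```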
Yehuda Sadeh
7a2566c60f rgw: remove test placement info
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2013-06-25 20:06:45 -07:00
Yehuda Sadeh
224130c9f7 rgw (test): remove some warnings
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2013-06-25 19:18:51 -07:00
Yehuda Sadeh
1b162ce662 rgw: initialize user system flag
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2013-06-25 17:59:37 -07:00
Yehuda Sadeh
7681c58e03 rgw: log in the same shard for bucket entry point and instance
We'd like to have the bucket entry point and instance info in the same
log shard, so that we can process them in order.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2013-06-25 15:30:44 -07:00
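A Python sketch of the placement rule: derive the shard from the bucket name alone, so that the entry-point and instance records for one bucket always share a shard. CRC32, the shard count and the `bucket:` / `bucket.instance:` key shapes are assumptions for illustration, not rgw's actual hashing.

```python
import zlib
from typing import List, Tuple


def log_shard(bucket_name: str, num_shards: int) -> int:
    # Hash only the bucket name, never the full metadata key.
    return zlib.crc32(bucket_name.encode("utf-8")) % num_shards


def metadata_log_entries(bucket_name: str, instance_id: str,
                         num_shards: int) -> List[Tuple[int, str]]:
    shard = log_shard(bucket_name, num_shards)
    return [
        (shard, "bucket:" + bucket_name),
        (shard, "bucket.instance:%s:%s" % (bucket_name, instance_id)),
    ]
```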
Yehuda Sadeh
d4e39a7676 rgw: unlink/link don't always update entry point
Some operations already update the entry point, so there is no
need to do it again.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2013-06-25 14:59:49 -07:00
Sage Weil
5680fa1e85 doc/release-notes: v0.65
Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-25 14:14:39 -07:00
Yehuda Sadeh
6673b2d3aa rgw: tie metadata put to bucket link/unlink
and lots of constifying

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2013-06-25 14:00:59 -07:00
Yehuda Sadeh
5c3df085c6 cls_rgw: cleanup
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2013-06-25 14:00:45 -07:00
Gary Lowell
70be76b2e2 Merge branch 'next' 2013-06-25 13:45:22 -07:00
Yehuda Sadeh
82db84bec5 rgw: some more internal api cleanups
Use rgw_bucket when referring to the bucket instance, and the
bucket name when referring to the bucket entry point.
Also, remove the bucket input param where it is not needed (internally
it was using the bucket structure from the bucket info).

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2013-06-25 12:56:25 -07:00
Yehuda Sadeh
c4be5a7057 rgw: unlink bucket from user on metadata rm bucket:<bucket>
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2013-06-25 12:19:17 -07:00
Yehuda Sadeh
86c73c94ff rgw: fixes to object versioning tracking
There are a few different cases for setting the object version.
Either we need to create a new version, or we need to set the
version provided (on metadata put). We also need to make sure
that we log the correct previous version of the object. This
commit fixes a few cases.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2013-06-25 11:09:19 -07:00
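The two cases can be sketched in Python as below. `ObjVersion`, with a `ver` counter and a `tag` string, is an assumed shape, and `next_version()` is hypothetical. The key point is that the previous version to log is captured before any change is made.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class ObjVersion:
    ver: int = 0
    tag: str = ""


def next_version(current: ObjVersion,
                 provided: Optional[ObjVersion] = None) -> Tuple[ObjVersion, ObjVersion]:
    """Return (new_version, previous_version_to_log)."""
    previous = current  # captured first so the log records the true prior version
    if provided is not None:
        new = provided  # metadata put: adopt the caller-supplied version
    else:
        new = replace(current, ver=current.ver + 1)  # ordinary write: bump the counter
    return new, previous
```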
Yehuda Sadeh
8bd31d42a2 rgw: filter read xattrs
We're only interested in object xattrs that have the specific rgw.user
prefix.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2013-06-25 11:05:15 -07:00
Yehuda Sadeh
422bb6d0ac rgw: add str_startswith()
useful util

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2013-06-25 11:03:12 -07:00
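The filtering in "rgw: filter read xattrs", combined with a prefix helper like str_startswith(), can be sketched in Python. The prefix is passed in rather than hard-coded, since the exact rgw constant is not shown here. Both function bodies are illustrations, not the C++ code.

```python
from typing import Dict


def str_startswith(s: str, prefix: str) -> bool:
    # Same contract as a C++ prefix helper: compare the first len(prefix) chars.
    return s[:len(prefix)] == prefix


def filter_xattrs(attrs: Dict[str, bytes], prefix: str) -> Dict[str, bytes]:
    """Keep only the xattrs whose names carry the given prefix."""
    return {name: val for name, val in attrs.items() if str_startswith(name, prefix)}
```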
Josh Durgin
0e1612b3c4 Merge pull request #380 from dachary/wip-4907
get_xattr() can return more than 4KB

Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-06-25 10:57:41 -07:00
Yehuda Sadeh
8db289f2e2 cls_ver: rename version xattr, add some more logging
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2013-06-25 10:36:00 -07:00
Sage Weil
12678a1093 Merge pull request #379 from dachary/wip-5312
skip TEST(EXT4StoreTest, _detect_fs) if DISK or MOUNTPOINT are undefined

Reviewed-by: Sage Weil <sage@inktank.com>
2013-06-25 10:15:10 -07:00
Sage Weil
c8f793694c mon/AuthMonitor: start at format 1 (latest) for new clusters
Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-25 09:57:00 -07:00
Sage Weil
950c0f353b mon/PaxosService: move upgrade_format() machinery into PaxosService
We originally did this in AuthMonitor, but it is perfect for PGMonitor too,
so make it generic.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-25 09:57:00 -07:00
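A Python sketch of a generic format-upgrade hook in a base class. It also reflects the "start at format 1 (latest) for new clusters" commit above. The store key, `latest_format()` and `upgrade_step()` are hypothetical names, and the actual PaxosService interface may differ.

```python
class PaxosService:
    """Generic on-disk format upgrade machinery (illustrative)."""

    def __init__(self, store: dict, new_cluster: bool = False) -> None:
        self.store = store
        if new_cluster:
            # A brand-new cluster has nothing to migrate: start at the latest format.
            store.setdefault("format_version", self.latest_format())

    def latest_format(self) -> int:
        return 0

    def format_version(self) -> int:
        return int(self.store.get("format_version", 0))

    def upgrade_step(self, from_version: int) -> None:
        raise NotImplementedError

    def upgrade_format(self) -> None:
        # Apply one migration step at a time until the store is current.
        while self.format_version() < self.latest_format():
            v = self.format_version()
            self.upgrade_step(v)
            self.store["format_version"] = v + 1


class AuthMonitor(PaxosService):
    def latest_format(self) -> int:
        return 1

    def upgrade_step(self, from_version: int) -> None:
        # Record the migration so the example's effect is observable.
        self.store.setdefault("upgraded_from", []).append(from_version)
```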
Sage Weil
0d73eb4dad mon/PGMonitor: drop some dead code
Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-25 09:57:00 -07:00
Sage Weil
0fd776da48 mon/PGMap: make int type explicit
We get away with this because int is 32 bits on both x86_64 and i386, but
we should be explicit anyway!

Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-25 09:57:00 -07:00
Sage Weil
29e14bafa4 mon/PaxosService: s/get_version()/get_last_committed()/
Avoid aliasing simple accessors; use a single name instead.  Also, function
name overloading will throw a wrench in the class inheritance later.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-25 09:57:00 -07:00
Gary Lowell
c2d517ef96 v0.65 2013-06-25 09:19:32 -07:00
Loic Dachary
3016f46f53 get_xattr() can return more than 4KB
Instead of failing if the attribute to be returned is larger than 4KB,
double the buffer size each time librados.rados_getxattr returns
-errno.ERANGE and try again.

http://tracker.ceph.com/issues/4907 fixes #4907

Signed-off-by: Loic Dachary <loic@dachary.org>
2013-06-25 16:10:22 +02:00
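The retry loop can be sketched in Python as below. Instead of calling librados, the sketch takes a hypothetical `read_fn(name, size)` that returns `(ret, buf)`. Here `ret` is the value length or a negative errno, which mirrors the C convention the commit relies on. The 4096-byte starting size matches the old 4KB limit.

```python
import errno
import os
from typing import Callable, Tuple

ReadFn = Callable[[str, int], Tuple[int, bytes]]


def get_xattr(read_fn: ReadFn, name: str, initial_size: int = 4096) -> bytes:
    size = initial_size
    while True:
        ret, buf = read_fn(name, size)
        if ret == -errno.ERANGE:
            size *= 2  # buffer too small: double it and retry
            continue
        if ret < 0:
            raise OSError(-ret, os.strerror(-ret))
        return buf[:ret]  # ret is the number of valid bytes
```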
Loic Dachary
6e320a1bd3 skip TEST(EXT4StoreTest, _detect_fs) if DISK or MOUNTPOINT are undefined
The TEST(EXT4StoreTest, _detect_fs) test is meant to be run from
qa/workunits/filestore/filestore.sh, after the ext4 file system was
created. If the DISK and MOUNTPOINT environment variables are not
defined, display a message explaining the expected environment and
silently skip the test. The tests in store_test.cc are not unit tests
because they depend on their environment.

http://tracker.ceph.com/issues/5312 fixes #5312

Signed-off-by: Loic Dachary <loic@dachary.org>
2013-06-25 15:09:57 +02:00
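The original test is C++ (gtest). As a rough analogue only, the same skip-with-explanation behavior in Python's unittest might look like this. `missing_env()` is a hypothetical helper, and the test body is a placeholder, not the real _detect_fs check.

```python
import os
import unittest
from typing import List, Sequence

REQUIRED_ENV = ("DISK", "MOUNTPOINT")


def missing_env(names: Sequence[str] = REQUIRED_ENV) -> List[str]:
    return [n for n in names if not os.environ.get(n)]


class EXT4StoreTest(unittest.TestCase):
    @unittest.skipIf(missing_env(),
                     "DISK and MOUNTPOINT must be set; run from "
                     "qa/workunits/filestore/filestore.sh after creating the ext4 fs")
    def test_detect_fs(self) -> None:
        # Placeholder check: the real test inspects the filesystem at MOUNTPOINT.
        self.assertTrue(os.path.isdir(os.environ["MOUNTPOINT"]))
```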
Yehuda Sadeh
63e81afeb8 rgw: multiple fixes related to metadata, bucket creation
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
2013-06-24 23:43:50 -07:00