35056 Commits

Author SHA1 Message Date
Sage Weil
675b0042ef mon: add a cluster fingerprint
Generate it on cluster creations with the initial monmap.  Include it in
the report.  Provide no way for this uuid to be fed in to the cluster
(intentionally or not) so that it can be assumed to be a truly unique
identifier for the cluster.

Signed-off-by: Sage Weil <sage@redhat.com>
2014-08-21 11:14:46 -07:00
Sage Weil
f71c8898e4 Merge pull request #2282 from dachary/wip-9153-jerasure-upgrade
erasure-code: preload the jerasure plugin

Reviewed-by: Sage Weil <sage@redhat.com>
2014-08-20 10:08:39 -07:00
Dan Mick
790de974a8 doc/start/quick-ceph-deploy: missing {ceph-node} from mon create-initial
Signed-off-by: Dan Mick <dan.mick@inktank.com>
2014-08-19 21:24:12 -07:00
Sage Weil
b3624500e1 Merge pull request #2283 from somnathr/wip-sd-9145
CollectionIndex: Collection name is added to the access_lock name

Reviewed-by: Samuel Just <sam.just@inktank.com>
2014-08-19 20:56:06 -07:00
Somnath Roy
615d2d9040 CollectionIndex: Collection name is added to the access_lock name
The CollectionIndex constructor is changed to accept the coll_t
so that the collection name can be used to form access_lock(RWLock)
name. This is needed; otherwise lockdep will report a recursive lock error
and assert. lockdep needs unique lock names for each Index object.

Fixes: #9145

Signed-off-by: Somnath Roy <somnath.roy@sandisk.com>
2014-08-19 18:50:01 -07:00
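
The need for unique lock names described in 615d2d9040 can be illustrated with a toy checker. This is only a sketch: `ToyLockdep`, `access_lock_name`, and the name format are hypothetical and are not Ceph's lockdep or its real naming scheme.

```python
class ToyLockdep:
    """Toy stand-in for a lock-dependency checker keyed by lock name."""

    def __init__(self):
        self.held = set()

    def acquire(self, name):
        # Two locks sharing a name look like one lock taken twice.
        if name in self.held:
            raise RuntimeError(f"recursive lock of {name}")
        self.held.add(name)

    def release(self, name):
        self.held.discard(name)


def access_lock_name(collection):
    # Assumed naming scheme: a fixed prefix plus the collection name.
    return f"CollectionIndex::access_lock::{collection}"


checker = ToyLockdep()
checker.acquire(access_lock_name("0.1_head"))
checker.acquire(access_lock_name("0.2_head"))  # distinct name, so no error
```

If both Index objects used the same fixed name, the second `acquire` would raise even though two different locks are involved.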
Loic Dachary
9b802701f7 erasure-code: preload the jerasure plugin
Load the jerasure plugin when ceph-osd starts to avoid the following
scenario:

* ceph-osd-v1 is running but did not load jerasure

* ceph-osd-v2 is being installed but takes time: the files
  are installed before ceph-osd is restarted

* ceph-osd-v1 is required to handle an erasure coded placement group and
  loads jerasure (the v2 version which is not API compatible)

* ceph-osd-v1 calls the v2 jerasure plugin and does not reference the
  expected part of the code and crashes

Although this problem shows in the context of teuthology, it is unlikely
to happen on a real cluster because it involves upgrading immediately
after installing and running an OSD. Once it is backported to firefly,
it will not even happen in teuthology tests because the upgrade from
firefly to master will use the firefly version including this fix.

While it would be possible to walk the plugin directory and preload
whatever it contains, that would not work for plugins such as jerasure
that load other plugins depending on the CPU features, or even plugins
such as isa which only work on specific CPUs.

http://tracker.ceph.com/issues/9153 Fixes: #9153

Backport: firefly
Signed-off-by: Loic Dachary <loic-201408@dachary.org>
2014-08-20 02:31:32 +02:00
Samuel Just
bb77e3af0e Merge pull request #2043 from guangyy/wip-pg-splitting
Support 'expected_num_objects' parameter when creating pool for pg folder splitting

Reviewed-by: Samuel Just <sam.just@inktank.com>
2014-08-19 15:45:31 -07:00
Sage Weil
fc41273495 mon: fix signed/unsigned warnings
Signed-off-by: Sage Weil <sage@redhat.com>
2014-08-19 14:33:54 -07:00
Gregory Farnum
23a9b76387 Merge pull request #2287 from ceph/wip-reweight-tunables
mon: make reweight-by-* sanity limits configurable

Reviewed-by: Greg Farnum <greg@inktank.com>
2014-08-19 13:06:08 -07:00
Gregory Farnum
d9cf299134 Merge pull request #2279 from ceph/wip-hadoop
fix and reorg hadoop workunits

Reviewed-by: Greg Farnum <greg@inktank.com>
2014-08-19 11:47:07 -07:00
Sage Weil
82409ee644 mon: make reweight-by-* sanity limits configurable
Also drop the somewhat redundant osd_sum.kb check; the main thing we care
about here is

Signed-off-by: Sage Weil <sage@redhat.com>
2014-08-19 11:32:07 -07:00
Sage Weil
c36b72c1d1 Merge pull request #2199 from ceph/wip-reweight
mon: allow reweighting of osds by pg (instead of bytes used)

Reviewed-by: Guang Yang <yguang@yahoo-inc.com>
2014-08-19 10:40:42 -07:00
Sage Weil
33048410c8 mon/OSDMonitor: respect CRUSH weights for reweight-by-pg
Do not assume that all OSDs are weighted equally for reweight-by-pg.

Note that reweight-by-utilization already reweights based on the size of
the OSD volume; we presume that this is already reflected by the CRUSH
weights.

Signed-off-by: Sage Weil <sage@redhat.com>
2014-08-19 08:16:56 -07:00
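
One way to picture the change in 33048410c8 is to compare OSDs by PGs per unit of CRUSH weight rather than by raw PG count. The sketch below is an assumption-laden illustration: the function name and the ratio it returns are hypothetical and are not taken from OSDMonitor.

```python
def relative_pg_load(pg_counts, crush_weights):
    """Hypothetical sketch: PG load of each OSD relative to its CRUSH weight.

    A result of 1.0 means the OSD holds exactly its weighted share of PGs.
    """
    total_pgs = sum(pg_counts.values())
    total_weight = sum(crush_weights.values())
    per_weight = total_pgs / total_weight  # PGs expected per unit of weight
    return {
        osd: (pg_counts.get(osd, 0) / weight) / per_weight
        for osd, weight in crush_weights.items()
        if weight > 0
    }


# osd.0 has twice the CRUSH weight of osd.1 and twice the PGs.
load = relative_pg_load({0: 20, 1: 10}, {0: 2.0, 1: 1.0})
```

Here both OSDs come out balanced, whereas a comparison that assumed equal weights would flag osd.0 as overloaded.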
Sage Weil
1ecf44eb57 mon/OSDMonitor: reweight-by-pg for pool(s)
Allow the reweight-by-pg to look at a specific set of pools.  If the list
is omitted, use PGs from all pools.  This allows you to focus on a
specific pool (the one that will dominate data usage).  Otherwise things
may not be quite right because other pools may have PGs that contain
much less data.

Signed-off-by: Sage Weil <sage@redhat.com>
2014-08-19 08:16:55 -07:00
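
The pool restriction in 1ecf44eb57 amounts to filtering the PG set before counting. A minimal sketch, assuming PG ids are strings of the form `<pool>.<seed>`; the function name and data layout are hypothetical:

```python
def pgs_in_pools(pg_to_osds, pools=None):
    """Hypothetical sketch: keep only PGs that belong to the given pools.

    An empty or missing pool list means "use PGs from all pools".
    """
    if not pools:
        return dict(pg_to_osds)
    wanted = {str(pool) for pool in pools}
    return {
        pgid: osds
        for pgid, osds in pg_to_osds.items()
        if pgid.split(".", 1)[0] in wanted
    }


mapping = {"1.0": [0, 1], "1.1": [1, 2], "2.0": [0, 2]}
only_pool_1 = pgs_in_pools(mapping, pools=[1])
```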
Sage Weil
8b971e94d4 mon/OSDMonitor: adjust weights up, when possible
Also note when OSDs are underloaded.  If that is the case, adjust the
OSD reweight value up, if possible.  (It won't always be possible since
weights are capped at 1.)

Note that we set the underload threshold to the average, as we want to
aggressively adjust weights up (back to 1.0) whenever possible.  This gets
us a more efficient mapping calculation and reduces the amount of "noise"
in the weights.

Signed-off-by: Sage Weil <sage@redhat.com>
2014-08-19 08:16:41 -07:00
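
The upward adjustment in 8b971e94d4 can be sketched as follows. The underload threshold is the average, as the commit message states; the proportional scaling rule and the function name are assumptions for illustration only.

```python
def adjust_up(pgs, average, weight, max_weight=1.0):
    """Hypothetical sketch: raise the reweight of an underloaded OSD."""
    if pgs >= average or weight >= max_weight:
        return weight  # not underloaded, or already at the cap
    if pgs == 0:
        return max_weight
    # Scale up toward the average, but never past the cap of 1.0.
    return min(max_weight, weight * average / pgs)
```

For example, an OSD with half the average load and a reweight of 0.4 would move to 0.8, while one at 0.8 would be capped at 1.0.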
Sage Weil
977f85279f qa/workunits/cephtool/test.sh: test reweight-by-pg
Signed-off-by: Sage Weil <sage@redhat.com>
2014-08-19 08:16:41 -07:00
Sage Weil
01cb40547c mon/OSDMonitor: reweight-by-pg
This is just like reweight-by-utilization, but looks purely at the PG to
OSD mapping, not at the number of bytes used on the target disks.  This
allows the reweighting to be done before any data is written into the
cluster, when no data will need to migrate as a result of the reweight.

Signed-off-by: Sage Weil <sage@redhat.com>
2014-08-19 08:16:39 -07:00
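
As a rough illustration of the idea in 01cb40547c, the sketch below counts PGs per OSD from a PG-to-OSD mapping and lowers the reweight of OSDs that hold too many. The function name, the `oload` factor, and the proportional scaling rule are assumptions, not the monitor's actual code.

```python
from collections import Counter


def reweight_by_pg(pg_to_osds, weights, oload=1.2):
    """Hypothetical sketch: lower the reweight of OSDs holding too many PGs.

    pg_to_osds: dict mapping a PG id to the list of OSD ids it maps to.
    weights:    dict mapping an OSD id to its current reweight in (0, 1].
    oload:      assumed overload factor relative to the average PG count.
    """
    if not weights:
        return {}
    counts = Counter(osd for osds in pg_to_osds.values() for osd in osds)
    average = sum(counts.values()) / len(weights)
    new_weights = dict(weights)
    for osd, weight in weights.items():
        pgs = counts.get(osd, 0)
        if pgs > average * oload:
            # Scale the weight down in proportion to the overload.
            new_weights[osd] = weight * average / pgs
    return new_weights


mapping = {"1.0": [0, 1], "1.1": [0, 2], "1.2": [0, 1], "1.3": [0, 2]}
result = reweight_by_pg(mapping, {0: 1.0, 1: 1.0, 2: 1.0})
```

Only the mapping is consulted, never bytes used, so the calculation works on an empty cluster.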
Guang Yang
dbf624e1a2 Add tests for the collection hint OP: 1) Store Test 2) Idempotent Test.
Signed-off-by: Guang Yang (yguang@yahoo-inc.com)
2014-08-19 07:10:47 +00:00
Guang Yang
ad6a2be402 Implement the collection hint transaction; add a new transaction type for the expected number of objects.
Signed-off-by: Guang Yang (yguang@yahoo-inc.com)
2014-08-19 07:10:47 +00:00
Guang Yang
7d266d1304 Add a new transaction OP (collection hint) to ObjectStore.
Signed-off-by: Guang Yang (yguang@yahoo-inc.com)
2014-08-19 07:08:51 +00:00
Guang Yang
35f323d99a Add a new monitor command to let user specify the expected number of objects during pool creation.
Signed-off-by: Guang Yang (yguang@yahoo-inc.com)
2014-08-19 07:08:51 +00:00
Guang Yang
da37273de7 Add a new field 'expected_num_objects' to pg_pool_t which denotes the expected number of objects on this pool.
Signed-off-by: Guang Yang (yguang@yahoo-inc.com)
2014-08-19 07:08:51 +00:00
Sage Weil
92b227e1c0 Merge remote-tracking branch 'gh/next'
2014-08-18 21:10:32 -07:00
John Wilkins
ab886c4a0b doc: Removed quick guide and wireshark from top-level IA.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2014-08-18 14:29:09 -07:00
John Wilkins
acee2e5833 doc: Move wireshark documentation to dev.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2014-08-18 14:28:38 -07:00
Sage Weil
ce6e9a916b doc/release-notes: v0.84
Signed-off-by: Sage Weil <sage@redhat.com>
2014-08-18 11:57:59 -07:00
Sage Weil
a59bc86594 Merge pull request #2280 from ceph/wip-fs-docs
doc: add notes on using "ceph fs new"

Reviewed-by: Sage Weil <sage@redhat.com>
2014-08-18 10:04:41 -07:00
john
b016f84682 doc: add notes on using "ceph fs new"
Signed-off-by: John Spray <john.spray@redhat.com>
2014-08-18 17:47:31 +01:00
Jenkins
8336f81c5c 0.84
2014-08-18 09:02:20 -07:00
Sage Weil
bda230186f qa/workunits/rbd/qemu-iotests: touch common.env
This seems to be necessary on trusty.

Backport: firefly, dumpling
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 055be68cf8)
2014-08-18 08:47:36 -07:00
Sage Weil
1dc1fb8a60 qa/workunits/hadoop: move all hadoop tests into a hadoop/ dir
Signed-off-by: Sage Weil <sage@redhat.com>
2014-08-18 08:39:14 -07:00
Sage Weil
3d3fcc98be qa/workunits/hadoop-wordcount: fix/use -rmr command
-rm -r -f ... doesn't seem to work; use -rmr instead.

Signed-off-by: Sage Weil <sage@redhat.com>
2014-08-18 08:38:10 -07:00
Sage Weil
adaf5a6a88 qa/workunits/hadoop-wordcount: use -x
Signed-off-by: Sage Weil <sage@redhat.com>
2014-08-18 08:37:38 -07:00
Sage Weil
055be68cf8 qa/workunits/rbd/qemu-iotests: touch common.env
This seems to be necessary on trusty.

Backport: firefly, dumpling
Signed-off-by: Sage Weil <sage@redhat.com>
2014-08-17 20:54:28 -07:00
Sage Weil
313e60b360 Merge pull request #2010 from ceph/wip-misplaced
osd: track misplaced objects separately from degraded objects

Reviewed-by: Samuel Just <sam.just@inktank.com>
2014-08-17 20:49:05 -07:00
Sage Weil
5045c5cb4c qa/workunits/rest/test.py: use rbd instead of data pool for size tests
Signed-off-by: Sage Weil <sage@redhat.com>
2014-08-16 22:07:56 -07:00
Sage Weil
3279f3e737 qa/workunits/rest/test.py: do snap test on our data2/3 pool
This way it works when a 'data' pool doesn't already exist.

Signed-off-by: Sage Weil <sage@redhat.com>
2014-08-16 22:07:56 -07:00
Sage Weil
6d7a229c14 qa/workunits/rest/test.py: fix rd_kb -> rd_bytes
Signed-off-by: Sage Weil <sage@redhat.com>
2014-08-16 22:07:56 -07:00
Sage Weil
284647f350 Merge pull request #2272 from ceph/wip-8621
Wip 8621

Reviewed-by: Sage Weil <sage@redhat.com>
2014-08-16 22:04:13 -07:00
Sage Weil
0e07f7f045 osd: fix theoretical use-after-free of OSDMap
In practice, the map will remain pinned for a while, but this
will make coverity happy.

*** CID 1231685:  Use after free  (USE_AFTER_FREE)
/osd/OSD.cc: 6223 in OSD::handle_osd_map(MOSDMap *)()
6217
6218           if (o->test_flag(CEPH_OSDMAP_FULL))
6219            last_marked_full = e;
6220           pinned_maps.push_back(add_map(o));
6221
6222           bufferlist fbl;
>>>     CID 1231685:  Use after free  (USE_AFTER_FREE)
>>>     Calling "encode" dereferences freed pointer "o".
6223           o->encode(fbl);
6224
6225           hobject_t fulloid = get_osdmap_pobject_name(e);
6226           t.write(coll_t::META_COLL, fulloid, 0, fbl.length(), fbl);
6227           pin_map_bl(e, fbl);
6228           continue;

Signed-off-by: Sage Weil <sage@redhat.com>
2014-08-16 14:51:31 -07:00
Sage Weil
44a0e3766a Merge pull request #2259 from ceph/wip-9039
Wip 9039

Reviewed-by: Sage Weil <sage@redhat.com>
2014-08-16 13:41:41 -07:00
Sage Weil
904a5f1c31 vstart.sh: make filestore fd cache size smaller
I hit the fd limit on a vstart cluster with the default 128; reduce this
to 16.

Signed-off-by: Sage Weil <sage@redhat.com>
2014-08-16 13:19:46 -07:00
Sage Weil
932e478783 mon: track stuck undersized
Signed-off-by: Sage Weil <sage@redhat.com>
2014-08-16 13:19:46 -07:00
Sage Weil
190dc2f38f mon: track pgs that get stuck degraded
Signed-off-by: Sage Weil <sage@redhat.com>
2014-08-16 13:19:46 -07:00
Sage Weil
5168907fe2 osd: track last_fullsized in pg_stat_t
Signed-off-by: Sage Weil <sage@redhat.com>
2014-08-16 13:19:46 -07:00
Sage Weil
dbc3f65046 osd: track last_undegraded pg stat
Signed-off-by: Sage Weil <sage@redhat.com>
2014-08-16 13:19:46 -07:00
Sage Weil
1907066fee osd/osd_types: add last_undegraded, last_undersized to pg_stat_t
Keep track of the last time the PG was known to not be degraded or
undersized.

Signed-off-by: Sage Weil <sage@redhat.com>
2014-08-16 13:19:45 -07:00
Sage Weil
6d6767d34c osd/PG: track PG_STATE_UNDERSIZED separately from DEGRADED
DEGRADED means there are objects without complete redundancy; also check
for needs_recovery().

UNDERSIZED means acting set is too small.

Signed-off-by: Sage Weil <sage@redhat.com>
2014-08-16 13:18:54 -07:00
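
The distinction drawn in 6d6767d34c can be sketched with two independent state bits. The bit values, function name, and parameters below are hypothetical and do not reflect Ceph's actual constants or the logic in PG.cc.

```python
DEGRADED = 1 << 0    # illustrative bit values, not Ceph's
UNDERSIZED = 1 << 1


def pg_state_bits(acting_size, pool_size, degraded_objects, needs_recovery):
    """Hypothetical sketch: derive the two states from separate conditions."""
    state = 0
    if degraded_objects > 0 or needs_recovery:
        # Some objects lack complete redundancy.
        state |= DEGRADED
    if acting_size < pool_size:
        # The acting set is smaller than the pool's replica count.
        state |= UNDERSIZED
    return state
```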
Sage Weil
b037e47a36 osd: add PG_STATE_UNDERSIZED
This is a distinct concept from degraded.

Signed-off-by: Sage Weil <sage@redhat.com>
2014-08-16 13:18:54 -07:00
Sage Weil
6c0a213436 osd/PG: account for misplaced objects separately from degraded
A degraded object does not have enough replicas or shards, while a
misplaced object is not stored in the correct place.  Account for them
separately.

Signed-off-by: Sage Weil <sage@redhat.com>
2014-08-16 13:18:54 -07:00
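
The separate accounting described in 6c0a213436 can be pictured per object. In this sketch an object is degraded by the number of copies it lacks and misplaced by the number of copies that sit outside the intended set of OSDs; the function name and inputs are assumptions, not the PG code.

```python
def classify_object(copies_on, up_set, pool_size):
    """Hypothetical sketch: return (degraded, misplaced) copy counts.

    copies_on: OSD ids that currently hold a copy of the object.
    up_set:    OSD ids where the copies are supposed to live.
    pool_size: number of copies the pool requires.
    """
    degraded = max(0, pool_size - len(set(copies_on)))
    misplaced = len(set(copies_on) - set(up_set))
    return degraded, misplaced
```

With three required copies, an object held only on OSDs 0 and 1 is degraded by one copy, while an object held on OSDs 0, 1 and 5 when the target is 0, 1 and 2 has full redundancy but one misplaced copy.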