25854 Commits

Author SHA1 Message Date
Greg Farnum
3cf5824f60 Merge branch 'wip-4837-election-syncing' into next
Reviewed-by: Sage Weil <sage@inktank.com>
2013-04-30 15:39:21 -07:00
Sage Weil
cd1d6fb3f9 ceph-disk: tolerate /sbin/service or /usr/sbin/service
CentOS/RH has it in /sbin, others in /usr/sbin.

Backport: bobtail
Signed-off-by: Sage Weil <sage@inktank.com>
2013-04-30 14:16:04 -07:00
Joao Eduardo Luis
a97eccadf7 mon: Monitor: disregard paxos_max_join_drift when deciding whether to sync
We should only rely on whether our paxos versions overlap with whatever
they have -- we'll catch up later with them.

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
2013-04-30 13:50:40 -07:00
Greg Farnum
a39bbdf32e mon: if we get our own sync_start back, drop it on the floor.
We have timeouts that will clean everything up, and this can happen
in some cases that we've decided are legitimate. Hopefully we'll
be able to do something else later.

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-04-30 13:50:40 -07:00
Greg Farnum
d00b4cd783 Revert "mon: update assert for looser requirements"
We reverted the gating by paxos sequences, so now we don't
need to look at them at all.

This reverts commit 1e6f02b337.
Signed-off-by: Greg Farnum <greg@inktank.com>
2013-04-30 13:50:40 -07:00
Greg Farnum
cedcb1934f Revert "mon: when electing, be sure acked leaders have new enough stores to lead"
This was somehow broken -- out-of-date leaders were being elected -- and
we've decided smaller band-aids are more appropriate. We don't completely
revert the MMonElection changes, though -- there have been user clusters
running the code which includes these messages so we can't pretend it
never happened. We can make them clearly unused in the code, though.

This reverts commit fcaabf1a22.

Signed-off-by: Greg Farnum <greg@inktank.com>
2013-04-30 13:50:40 -07:00
Josh Durgin
c2bcc2a60c ObjectCacher: wait for all reads when stopping flusher
Stopping the flusher is essentially the shutdown step for the
ObjectCacher - the next thing is actually destroying it.

If we leave any reads outstanding, when they complete they will
attempt to use the now-destroyed ObjectCacher. This is particularly a
problem with rbd images, since an -ENOENT can instantly complete many
readers, so the upper layers don't wait for the other rados-level
reads of that object to finish before trying to shut down the cache.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-04-30 13:47:47 -07:00
Sage Weil
17612a407a Merge branch 'wip-mon-compact' into next
Reviewed-by: Samuel Just <sam.just@inktank.com>
2013-04-30 11:49:31 -07:00
Greg Farnum
6ae9bbb5d0 elector: trigger a mon reset whenever we bump the epoch
We need to call reset during every election cycle; luckily we
can call it more than once. bump_epoch is (by definition!) only called
once per cycle, and it's called at the beginning, so we put it there.

Fixes #4858.

Signed-off-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-04-30 11:01:54 -07:00
David Zafman
53a2c64ff1 Merge branch 'wip-2209' into next
Reviewed-by: Samuel Just <sam.just@inktank.com>
2013-04-30 10:55:12 -07:00
Sage Weil
0acede3bff mon: change leveldb block size to 64K
#leveldb on freenode says > 2MB is nonsense (it might explain the weird
behavior we saw).  Riak tuning guide suggests 256KB for large data block
environments.  Default is 8KB.  64KB seems sane for us.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-04-30 10:26:24 -07:00
Sage Weil
7d4c0dcfe4 Merge branch 'next'
2013-04-29 20:58:15 -07:00
John Wilkins
6f2a7df4b0 doc: Fix typo.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2013-04-29 18:57:05 -07:00
John Wilkins
35a9823449 doc: Added reference to transition from mkcephfs to ceph-deploy.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2013-04-29 18:54:04 -07:00
John Wilkins
de31b61864 doc: Updated index for new pages. Added inner table.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2013-04-29 18:53:37 -07:00
John Wilkins
fa9f17c5f9 doc: Added transition from mkcephfs to ceph-deploy page.
fixes: #4756

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2013-04-29 18:53:12 -07:00
John Wilkins
02853c5e62 doc: Added purge page to ceph-deploy.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2013-04-29 18:52:16 -07:00
John Wilkins
45d12f12ef doc: Added OSD page to ceph-deploy.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2013-04-29 18:51:46 -07:00
John Wilkins
0b912f46f1 doc: Added mds page for ceph-deploy.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2013-04-29 18:51:23 -07:00
John Wilkins
3c46c519c8 doc: Added admin tasks page for ceph-deploy.
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
2013-04-29 18:51:05 -07:00
David Zafman
1c15636b22 Set num_rd, num_wr_kb and num_wr in various places that needed it
Signed-off-by: David Zafman <david.zafman@inktank.com>
2013-04-29 17:46:15 -07:00
David Zafman
adb7c8a060 osd: read kb stats not tracked?
In read cases track stats in PG::unstable_stats
Include unstable_stats in write_info() and publish_stats_to_osd()
For now this information may not get persisted

fixes: #2209

Signed-off-by: David Zafman <david.zafman@inktank.com>
2013-04-29 17:46:15 -07:00
David Zafman
b5e246106d osd: Rename members and methods related to stat publish
pg_stats_lock to pg_stats_publish_lock
pg_stats_valid to pg_stats_publish_valid
pg_stats_stable to pg_stats_publish
update_stats() to publish_stats_to_osd()
clear_stats() to clear_publish_stats()

Signed-off-by: David Zafman <david.zafman@inktank.com>
2013-04-29 17:46:15 -07:00
Sage Weil
bd68b82bd6 mon: enable 'mon compact on trim' by default; trim in larger increments
This resolves the leveldb growth-without-bound problem observed by
mikedawson, and all the badness that stems from it.  Enable this by
default until we figure out why leveldb is not behaving better.

While we are at it, trim more states at a time.  This will make
compaction less frequent, which should help given that there is some
overhead unrelated to the amount of deleted data.

Fixes: #4815
Signed-off-by: Sage Weil <sage@inktank.com>
2013-04-29 17:20:39 -07:00
Sage Weil
95ece01251 Merge pull request #249 from ceph/wip-cuttle-man
man page updates

Reviewed-by: Sage Weil <sage@inktank.com>
2013-04-29 17:09:37 -07:00
Sage Weil
929a9944c9 mon: share extra probe peers with debug log, mon_status
This is useful when debugging initial quorum formation.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-04-29 17:08:04 -07:00
Sage Weil
030bf8aaa1 debian: only start/stop upstart jobs if upstart is present
This avoids errors on non-upstart distros (like wheezy).

Signed-off-by: Sage Weil <sage@inktank.com>
2013-04-29 17:01:55 -07:00
Sage Weil
5d20c39caa Merge remote-tracking branch 'gh/wip-up' into next
Reviewed-by: Sam Lang <sam.lang@inktank.com>
2013-04-29 16:57:13 -07:00
Sage Weil
4b9325b2b3 Merge pull request #248 from ctrlaltdel/next
Fix a README typo
2013-04-29 16:46:52 -07:00
Josh Durgin
23c591ed99 Merge pull request #244 from dalgaaf/wip-da-pylint-2
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-04-29 16:20:42 -07:00
Josh Durgin
825a43176b man: update remaining copyright notices
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2013-04-29 16:01:38 -07:00
Josh Durgin
4abf081495 man: refresh content from rst
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2013-04-29 16:01:03 -07:00
Samuel Just
2b5dda0e6a Merge branch 'wip_4860' into next
Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
2013-04-29 15:57:29 -07:00
Samuel Just
1bd011a101 PG,OSD: _remove_pg must remove pg keys
Instead of doing this in OSD::_remove_pg, pass a transaction
to on_removal and do it in PG.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-04-29 15:56:54 -07:00
Samuel Just
714601261b OSD: no need to remove snapdirs on _remove_pg()
The snapmapper patches removed snapdirs altogether.

Signed-off-by: Samuel Just <sam.just@inktank.com>
2013-04-29 15:56:19 -07:00
Sage Weil
8f6a1b8fa9 mon/Paxos: compact on trim
Compact the paxos keys when we trim old paxos states.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-04-29 15:45:58 -07:00
Sage Weil
3cb4f6783b mon: compact PaxosService prefix on trim
Each time we trim a PaxosService, have leveldb compact so that the
space from removed states is reclaimed.

This is probably not optimal if leveldb's heuristics are doing the right
thing, but it currently appears as if they are not.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-04-29 15:45:56 -07:00
Sage Weil
e8c9824102 mon: add compact_prefix transaction operation
Add a prefix compaction operation to the transaction that will be
performed after the transaction applies.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-04-29 15:45:41 -07:00
Sage Weil
a2f7d1d1f1 leveldb: add compact_prefix method
Signed-off-by: Sage Weil <sage@inktank.com>
2013-04-29 15:45:41 -07:00
Sage Weil
90b6b6df31 mon: compact leveldb on bootstrap
This is an opportunistic time to optimize our local data since we are
out of quorum.  It serves as a safety net for cases where leveldb's
automatic compaction doesn't work quite right and lets things get out
of hand.

Anecdotally we have seen stores in excess of 30GB compact down to a few
hundred KB.  And a 9GB store compact down to 900MB in only 1 minute.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-04-29 15:45:39 -07:00
Sage Weil
ee3cdaa86c mon: compact leveldb on bootstrap
This is an opportunistic time to optimize our local data since we are
out of quorum.  It serves as a safety net for cases where leveldb's
automatic compaction doesn't work quite right and lets things get out
of hand.

Anecdotally we have seen stores in excess of 30GB compact down to a few
hundred KB.  And a 9GB store compact down to 900MB in only 1 minute.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-04-29 15:45:17 -07:00
Sage Weil
5fa0f04852 mon: --compact argument, config option to compact the store on start
Signed-off-by: Sage Weil <sage@inktank.com>
2013-04-29 15:44:58 -07:00
Sage Weil
6a00f33251 leveldb: add compact() method
This will compact the entire store; it will be slow!

Signed-off-by: Sage Weil <sage@inktank.com>
2013-04-29 15:43:47 -07:00
Josh Durgin
ffc8557acd doc: update rbd man page for new options
--no-progress and --allow-shrink were added recently.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2013-04-29 15:37:06 -07:00
Samuel Just
8b2a1475b0 gitignore: add ceph_monstore_tool
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-04-29 15:05:37 -07:00
Sage Weil
29831f9662 Makefile: fix java build warning
This is a workaround that makes the warning go away.  Not certain there
isn't something we should be changing...

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joe Buck <joe.buck@inktank.com>
2013-04-29 14:50:41 -07:00
Sage Weil
6a5be251df Merge branch 'wip-mon-pg' into next
Reviewed-by: Samuel Just <sam.just@inktank.com>
2013-04-29 11:27:22 -07:00
Sage Weil
a2fe013794 mon: remap creating pgs on startup
After Monitor::init_paxos() has loaded all of the PaxosService state,
we should then map creating pgs to osds.  This ensures we do so after the
osdmap has been loaded and the pgs actually map somewhere meaningful.

Fixes: #4675
Signed-off-by: Sage Weil <sage@inktank.com>
2013-04-29 11:11:27 -07:00
Sage Weil
278186d750 mon: only map/send pg creations if osdmap is defined
This avoids calculating new pg creation mappings if the osdmap isn't
loaded yet, which currently happens during Monitor::paxos_init()
on startup.  Assuming osdmap epoch is nonzero, it should always be
safe to do this (although possibly unnecessary).

More cleanup here is certainly possible, but this is one step toward fixing
the bad behavior for #4675.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-04-29 11:11:24 -07:00
Sage Weil
28d495a371 mon: factor map_pg_creates() out of send_pg_creates()
Factor out the portion of the function that remaps creating pgs to osds
from the part that sends those pending creates out.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-04-29 11:07:08 -07:00