Commit Graph

26290 Commits

Author SHA1 Message Date
Sage Weil
1e99be1569 vstart.sh: make client logs unique
Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-02 20:59:03 -07:00
Sage Weil
eb6d5fcf99 os/LevelDBStore: fix merge loop
We were double-incrementing p, both in the for statement and in the
body.  While we are here, drop the unnecessary else clauses.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-06-02 18:08:11 -07:00
Sage Weil
d7e2ab1451 mon: fix uninitialized fields in MMonHealth
Backport: cuttlefish
Signed-off-by: Sage Weil <sage@inktank.com>
2013-05-31 21:16:54 -07:00
Sage Weil
f1ccb2d808 mon: start lease timer from peon_init()
In the scenario:

 - leader wins, peons lose
 - leader sees it is too far behind on paxos and bootstraps
 - leader tries to sync with someone, waits for a quorum of the others
 - peons sit around forever waiting

The problem is that they never time out because paxos never issues a lease,
which is the normal timeout that lets them detect a leader failure.

Avoid this by starting the lease timeout as soon as we lose the election.
The timeout callback just does a bootstrap and does not rely on any other
state.

I see one possible danger here: there may be some "normal" cases where the
leader takes a long time to issue its first lease that we currently
tolerate, but won't with this new check in place.  I hope that raising
the lease interval/timeout or reducing the allowed paxos drift will make
that a non-issue.  If it is problematic, we will need a separate explicit
"i am alive" from the leader while it is getting ready to issue the lease
to prevent a live-lock.

Backport: cuttlefish, bobtail
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-05-31 17:09:19 -07:00
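The lease-timeout change in f1ccb2d808 can be sketched as a minimal model, assuming a hypothetical `Peon` class driven by an explicit clock value (this is not Ceph's actual `Paxos`/`SafeTimer` code). The deadline is armed in `peon_init()`, so a leader that never issues a lease still causes a bootstrap, and every lease pushes the deadline forward:

```cpp
#include <cassert>
#include <chrono>
#include <optional>

// Hypothetical model of a peon's lease timeout; names are illustrative,
// not Ceph's real Paxos/Monitor API.
class Peon {
public:
  using Clock = std::chrono::steady_clock;

  explicit Peon(Clock::duration lease_timeout) : lease_timeout_(lease_timeout) {}

  // Called as soon as we lose the election: arm the timeout immediately
  // instead of waiting for the first lease to arrive.
  void peon_init(Clock::time_point now) { deadline_ = now + lease_timeout_; }

  // Each lease from the leader extends the deadline.
  void handle_lease(Clock::time_point now) { deadline_ = now + lease_timeout_; }

  // Polled by a timer. Returns true if the timeout fired and we bootstrapped;
  // the check depends on nothing but the deadline itself.
  bool tick(Clock::time_point now) {
    if (deadline_ && now >= *deadline_) {
      deadline_.reset();
      ++bootstraps_;
      return true;
    }
    return false;
  }

  int bootstraps() const { return bootstraps_; }

private:
  Clock::duration lease_timeout_;
  std::optional<Clock::time_point> deadline_;
  int bootstraps_ = 0;
};
```

The danger the commit mentions maps directly onto this model: if a healthy leader needs longer than `lease_timeout` to send its first lease, `tick()` fires anyway.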
Sage Weil
fb3cd0c2a8 mon: discard messages from disconnected clients
If the client is not connected, discard the message.  They will
reconnect and resend anyway, so there is no point in processing it
twice (now and later).

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-05-31 17:05:17 -07:00
Sage Weil
6b8e74f064 mon/Paxos: adjust trimming defaults up; rename options
- trim more at a time (by an order of magnitude)
- rename fields to paxos_trim_{min,max}; only trim when there are min items
  that are trimmable, and trim at most max items at a time.
- adjust the paxos_service_trim_{min,max} values up by a factor of 2.

Since we are compacting every time we trim, adjusting these up means less
frequent compactions and less overall work for the monitor.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-05-31 17:05:03 -07:00
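The `paxos_trim_{min,max}` policy from 6b8e74f064 fits in a small function. This is a sketch under assumptions: `trim_range` and its parameters are hypothetical names, versions are plain integers, and the numbers in the test are arbitrary rather than Ceph's defaults.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Trim only once at least trim_min versions are trimmable, and never more
// than trim_max at once. Returns the half-open range [first, last) to trim;
// an empty range means "do nothing yet". Names are hypothetical.
std::pair<uint64_t, uint64_t> trim_range(uint64_t first_committed,
                                         uint64_t trim_to,
                                         uint64_t trim_min,
                                         uint64_t trim_max) {
  assert(trim_min <= trim_max);
  if (trim_to <= first_committed)
    return {first_committed, first_committed};
  uint64_t trimmable = trim_to - first_committed;
  if (trimmable < trim_min)
    return {first_committed, first_committed};  // wait until enough pile up
  uint64_t n = trimmable < trim_max ? trimmable : trim_max;
  return {first_committed, first_committed + n};
}
```

Raising `trim_min` makes each trim (and therefore each compaction) rarer but larger, which is the trade-off the commit describes.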
Sage Weil
b2e490413a Merge branch 'wip-osd-leaks' into next
Reviewed-by: David Zafman <david.zafman@inktank.com>
2013-05-31 14:48:51 -07:00
Sage Weil
cec8379800 osd: fix msg leak on shutdown in ms_dispatch
Reported-by: David Zafman <david.zafman@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>
2013-05-31 14:48:33 -07:00
Sage Weil
9865bb460b osd: reset heartbeat peers during shutdown
This fixes a leak of the Connections and related structures.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-05-31 14:48:33 -07:00
Sage Weil
923683ff73 mon/MonClient: fix leak of MMonGetVersionReply
Signed-off-by: Sage Weil <sage@inktank.com>
2013-05-31 14:48:32 -07:00
Sage Weil
222059ec28 osd: fix leak of MOSDMarkMeDown
Signed-off-by: Sage Weil <sage@inktank.com>
2013-05-31 14:48:32 -07:00
Sage Weil
0f246a3a90 Merge pull request #338 from alram/next
Reviewed-by: Sage Weil <sage@inktank.com>
2013-05-31 12:47:24 -07:00
Alexandre Marangone
851619ab66 upstart: handle upper case in cluster name and id
Signed-off-by: Alexandre Marangone <alexandre.marangone@inktank.com>
2013-05-31 12:33:11 -07:00
Yehuda Sadeh
c5fc52ae0f rgw: only append prefetched data if reading from head
Fixes: #5209
Backport: bobtail, cuttlefish
If the head object wrongfully contains data, but according to the
manifest we don't read from the head, we shouldn't copy the prefetched
data. Also fix the length calculation for that data.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-05-31 10:10:51 -07:00
Yehuda Sadeh
b1312f94ed rgw: don't copy object idtag when copying object
Fixes: #5204
When copying an object, we ended up also copying the original
object's idtag, which overrode the newly generated one. When
refcount put is called with the wrong idtag, the count
doesn't go down.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-05-31 10:10:42 -07:00
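Why a stale idtag keeps the count from dropping can be shown with a toy model of tag-based refcounting. It assumes each referencing head object holds one tag on a shared tail and that `put` removes only that tag; `RefcountedTail`, `HeadObject`, and `copy_object` are hypothetical and are not rgw's refcount class code.

```cpp
#include <cassert>
#include <set>
#include <string>

// Toy model: a shared tail object tracks one tag per referencing head.
// "put" removes only the given tag, so putting the wrong tag is a no-op.
struct RefcountedTail {
  std::set<std::string> tags;
  void get(const std::string& tag) { tags.insert(tag); }
  void put(const std::string& tag) { tags.erase(tag); }
  bool can_delete() const { return tags.empty(); }
};

struct HeadObject {
  std::string idtag;
};

// Copy an object: the copy takes a reference under a freshly generated tag
// and must keep that tag. With buggy == true, the source's idtag overrides
// the new one, as described in the commit above.
HeadObject copy_object(const HeadObject& src, RefcountedTail& tail,
                       const std::string& new_tag, bool buggy) {
  tail.get(new_tag);
  HeadObject dst{new_tag};
  if (buggy)
    dst.idtag = src.idtag;  // overrides the newly generated tag
  return dst;
}
```

With the buggy copy, deleting both objects puts the original tag twice and never puts the new one, so the tail is never released.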
Sage Weil
df2d06db6f mon: destroy MonitorDBStore before g_ceph_context
Put it on the heap so that we can destroy it before the g_ceph_context
cct that it references.  This fixes a crash like

*** Caught signal (Segmentation fault) **
in thread 4034a80
ceph version 0.63-204-gcf9aa7a (cf9aa7a003)
1: ceph-mon() [0x59932a]
2: (()+0xfcb0) [0x4e41cb0]
3: (Mutex::Lock(bool)+0x1b) [0x6235bb]
4: (PerfCountersCollection::remove(PerfCounters*)+0x27) [0x6a0877]
5: (LevelDBStore::~LevelDBStore()+0x1b) [0x582b2b]
6: (LevelDBStore::~LevelDBStore()+0x9) [0x582da9]
7: (main()+0x1386) [0x48db16]
8: (__libc_start_main()+0xed) [0x658076d]
9: ceph-mon() [0x4909ad]

Signed-off-by: Sage Weil <sage@inktank.com>
2013-05-30 21:43:50 -07:00
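The ordering problem behind the backtrace above can be reduced to a few lines. In this sketch `Context` stands in for `g_ceph_context` and `Store` for `LevelDBStore` (both hypothetical stand-ins): the store's destructor dereferences the context, and because the context is torn down explicitly before `main()` returns, a stack-allocated store would be destroyed too late. Holding the store in a `std::unique_ptr` lets us destroy it first.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins that record the order in which they are destroyed.
struct Context {
  std::vector<std::string>& log;
  explicit Context(std::vector<std::string>& l) : log(l) {}
  ~Context() { log.push_back("~Context"); }
};

struct Store {
  Context* cct;
  explicit Store(Context* c) : cct(c) { assert(cct != nullptr); }
  // Like unregistering perf counters: dereferences cct, so cct must be alive.
  ~Store() { cct->log.push_back("~Store"); }
};

std::vector<std::string> run_monitor() {
  std::vector<std::string> log;
  Context* cct = new Context(log);            // stand-in for g_ceph_context
  auto store = std::make_unique<Store>(cct);  // on the heap, not the stack
  // ... monitor runs ...
  store.reset();  // destroy the store while cct is still valid
  delete cct;     // explicit context teardown, as at the end of main()
  return log;
}
```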
Sage Weil
cf9aa7a003 debian: guard upstart {start,stop} with -x check
Sigh.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-05-30 17:23:36 -07:00
Sage Weil
a40010534f Merge branch 'wip-deb-removal' into next
Tested by Tamil, Gary.
2013-05-30 17:17:43 -07:00
Sage Weil
38ed3e43f5 Merge pull request #334 from ceph/wip-mon
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
2013-05-30 16:27:02 -07:00
Sage Weil
1d75b49c72 debian: add radosgw.postinst
Start radosgw-all job.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-05-30 16:22:54 -07:00
Sage Weil
d126a205ca debian: invoke-rc.d does not work with upstart jobs
Broken by 19c5ac37ef.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-05-30 16:22:40 -07:00
Sage Weil
446e0770c7 fix test users of LevelDBStore
Need to pass in cct.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-05-30 15:53:35 -07:00
Sage Weil
3cc0f3d803 Merge pull request #335 from ceph/wip-5176
Reviewed-by: Samuel Just <sam.just@inktank.com>
2013-05-30 15:04:21 -07:00
Sage Weil
7802292e0a os/LevelDBStore: add perfcounters
Signed-off-by: Sage Weil <sage@inktank.com>
2013-05-30 14:57:42 -07:00
Sage Weil
a47ca58398 mon: make compaction bounds overlap
When we trim items N to M, compact over range (N-1) to M so that the
items in the queue will share bounds and get merged.  There is no harm in
compacting over a larger range here when the lower bound is a key that
doesn't exist anyway.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-05-30 14:36:41 -07:00
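The (N-1) trick from a47ca58398 is simple arithmetic. A sketch, assuming versions are plain integers (the monitor's store uses string keys under a prefix) and using the hypothetical helper name `compaction_bounds`: consecutive trims then produce ranges that share an endpoint, which is what lets the compaction queue merge them.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// When trimming versions [first, last], compact [first - 1, last] instead.
// Key first - 1 was removed by the previous trim, so widening the range
// costs nothing, but it now shares a bound with the previous range.
std::pair<uint64_t, uint64_t> compaction_bounds(uint64_t first, uint64_t last) {
  assert(first >= 1 && first <= last);
  return {first - 1, last};
}
```

For example, trimming 1..10 compacts 0..10 and trimming 11..20 compacts 10..20; both ranges contain 10, so they overlap.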
Sage Weil
f628dd0e4a os/LevelDBStore: merge adjacent ranges in compactionqueue
If we get behind and multiple adjacent ranges end up in the queue, merge
them so that we fire off compaction on larger ranges.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-05-30 14:26:42 -07:00
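Merging ranges on enqueue could look like the following sketch. `queue_compaction` and the inclusive `Range` pair are hypothetical, and keys compare lexicographically as `std::string`. Note that the iterator advances in exactly one place per pass: either `++p` for a disjoint entry or the iterator returned by `erase()`. Advancing it in both the loop header and the body is the kind of double increment fixed later in eb6d5fcf99.

```cpp
#include <cassert>
#include <list>
#include <string>
#include <utility>

// Inclusive key range [first, second]; keys compare as std::string.
using Range = std::pair<std::string, std::string>;

// Queue [start, end] for compaction, absorbing every queued range that
// overlaps it or shares a bound with it.
void queue_compaction(std::list<Range>& q, std::string start, std::string end) {
  assert(start <= end);
  for (auto p = q.begin(); p != q.end(); ) {
    if (p->second < start || end < p->first) {
      ++p;  // disjoint: advance exactly once
      continue;
    }
    // Overlapping or touching: widen our range and drop the old entry.
    if (p->first < start) start = p->first;
    if (end < p->second) end = p->second;
    p = q.erase(p);  // erase() already returns the next element
  }
  q.push_back({std::move(start), std::move(end)});
}
```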
Sage Weil
1ba1433617 Merge pull request #333 from ceph/wip-5203
Reviewed-by: Sage Weil <sage@inktank.com>
2013-05-30 11:42:45 -07:00
Sage Weil
c888d1d3f1 mon: fix leak of health_monitor and config_key_service
Switch to using regular pointers here.  The lifecycle of these services is
simple enough that refcounting is overkill.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-05-30 11:17:04 -07:00
Sage Weil
3c5706163b mon: return instead of exit(3) via preforker
This lets us run all the locally-scoped dtors so that leak checking will
work.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-05-30 11:17:04 -07:00
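The reason 3c5706163b returns an exit code instead of calling `exit(3)`: `std::exit()` does not unwind the stack, so destructors of local objects never run and a leak checker reports whatever they own. A minimal illustration with a hypothetical `Tracked` type that counts live instances:

```cpp
#include <cassert>

// Counts live instances so the test can see whether the destructor ran.
struct Tracked {
  int* live;
  explicit Tracked(int* l) : live(l) { ++*live; }
  ~Tracked() {
    assert(*live > 0);
    --*live;
  }
};

int run(int* live, bool failed) {
  Tracked t(live);
  if (failed)
    return 3;  // instead of exit(3): t's destructor still runs
  return 0;
}
```

The caller (for the monitor, the preforker path) then passes the returned code on as the process exit status.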
Joao Eduardo Luis
626de387e6 mon: Monitor: backup monmap using all ceph features instead of quorum's
When a monitor is freshly created and for some reason its initial sync is
aborted, it will end up with an incorrect backup monmap.  This monmap is
incorrect in the sense that it will not contain the monitor names that it
expects on the next run.

This results from our using the quorum features to encode the monmap
when backing it up, instead of CEPH_FEATURES_ALL.

Fixes: #5203

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
2013-05-30 18:21:25 +01:00
Sage Weil
59916b8efa debian: stop radosgw daemons on package removal
Signed-off-by: Sage Weil <sage@inktank.com>
2013-05-30 08:53:22 -07:00
Sage Weil
9e658f0321 debian: stop sysvinit ceph-mds daemons
Signed-off-by: Sage Weil <sage@inktank.com>
2013-05-30 08:53:05 -07:00
Sage Weil
70a383204b debian: only stop daemons on remove, not upgrade
Signed-off-by: Sage Weil <sage@inktank.com>
2013-05-30 08:51:16 -07:00
Sage Weil
50ac8917f1 osd: initialize new_state field when we use it
If we use operator[] on a new int field, its value is undefined; avoid
reading it or using |= et al. until we initialize it.

Fixes: #4967
Backport: cuttlefish, bobtail
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
2013-05-29 16:50:04 -07:00
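A minimal sketch of initializing the entry explicitly before OR-ing bits into it, assuming the state is kept in a `std::map<int, uint8_t>`. The helper `state_entry` and the `STATE_UP` bit value are hypothetical and used for illustration only.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical state bit for illustration.
constexpr uint8_t STATE_UP = 0x2;

// Return the entry for osd, inserting an explicit 0 if it is absent, so
// callers only ever apply |= to a value they know was initialized.
uint8_t& state_entry(std::map<int, uint8_t>& new_state, int osd) {
  return new_state.try_emplace(osd, uint8_t{0}).first->second;
}
```

Usage: `state_entry(m, osd) |= STATE_UP;` leaves any existing entry untouched and starts a new one from zero.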
Sage Weil
482733e960 mds: stay in SCAN state in file_eval
If we are in the SCAN state, stay there until the recovery finishes.  Do
not jump to another state from file_eval().

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 0071b8e75b)
2013-05-29 10:28:25 -07:00
Sage Weil
29e4e7e316 osd: do not assume head obc object exists when getting snapdir
For a list-snaps operation on the snapdir, do not assume that the obc for the
head means the object exists.  This fixes a race between a head deletion and
a list-snaps that wrongly returns ENOENT, triggered by the DiffIterateStress
test when thrashing OSDs.

Fixes: #5183
Backport: cuttlefish
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
2013-05-29 09:49:11 -07:00
Sage Weil
6da4b20ca5 mon: compact trimmed range, not entire prefix
This will reduce the work that leveldb is asked to do by only triggering
compaction of the keys that were just trimmed.

We may want to further reduce the work by compacting less frequently, but
this is at least a step in that direction.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-05-29 08:40:32 -07:00
Sage Weil
ab09f1e5c1 mon/MonitorDBStore: allow compaction of ranges
Allow a transaction to describe the compaction of a range of keys.  Do this
in a backward-compatible way, such that older code will interpret the
compaction of a prefix + range as compaction of the entire prefix.  This
allows us to avoid introducing any new feature bits.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-05-29 08:35:44 -07:00
Sage Weil
e20c9a3f79 os/LevelDBStore: allow compaction of key ranges
Signed-off-by: Sage Weil <sage@inktank.com>
2013-05-29 08:34:13 -07:00
Sage Weil
1bb4e7435c mon: disable tdump by default
Grr.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-05-28 22:13:11 -07:00
Sage Weil
6afc22a158 Merge remote-tracking branch 'gh/last'
2013-05-28 22:10:21 -07:00
Sage Weil
b6be785775 Merge branch 'wip-5172'
Reviewed-by: Samuel Just <sam.just@inktank.com>
2013-05-28 20:44:48 -07:00
Sage Weil
4af917d447 os/LevelDBStore: do compact_prefix() work asynchronously
We generally do not want to block while compacting a range of leveldb.
Push the blocking+waiting off to a separate thread.  (leveldb will do what
it can to avoid blocking internally; no reason for us to wait explicitly.)

This addresses part of #5176.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-05-28 20:40:28 -07:00
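Pushing the blocking compaction onto a separate thread, as 4af917d447 describes, follows a standard producer/consumer pattern. This sketch assumes a `compact` callback standing in for the store's range compaction; `AsyncCompactor` is a hypothetical class, not LevelDBStore's actual implementation. Callers return immediately, and the worker drains any remaining queue entries before shutting down.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

class AsyncCompactor {
public:
  using Range = std::pair<std::string, std::string>;

  explicit AsyncCompactor(std::function<void(const Range&)> compact)
      : compact_(std::move(compact)), thread_([this] { run(); }) {}

  ~AsyncCompactor() {
    {
      std::lock_guard<std::mutex> l(lock_);
      stop_ = true;
    }
    cond_.notify_one();
    thread_.join();
  }

  // Returns immediately; the compaction happens on the worker thread.
  void compact_range_async(std::string start, std::string end) {
    {
      std::lock_guard<std::mutex> l(lock_);
      queue_.emplace_back(std::move(start), std::move(end));
    }
    cond_.notify_one();
  }

private:
  void run() {
    std::unique_lock<std::mutex> l(lock_);
    for (;;) {
      cond_.wait(l, [this] { return stop_ || !queue_.empty(); });
      if (queue_.empty())
        return;  // stop requested and nothing left to do
      Range r = std::move(queue_.front());
      queue_.pop_front();
      l.unlock();  // never hold the lock during the slow compaction
      compact_(r);
      l.lock();
    }
  }

  std::function<void(const Range&)> compact_;
  std::mutex lock_;
  std::condition_variable cond_;
  std::deque<Range> queue_;
  bool stop_ = false;
  std::thread thread_;  // declared last: starts after the members it uses
};
```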
Sage Weil
dd35c26e5b osd: fix note_down_osd
Fix bug introduced in 27381c0c62.

Signed-off-by: Sage Weil <sage@inktank.com>
2013-05-28 20:39:33 -07:00
Sage Weil
45b84f39ba osd: fix hb con failure handler
Fix a few bugs introduced by 27381c0c62:

- check against both front and back cons; either one may have failed.
- close *both* front and back before reopening either.  This is
  overkill, but it makes for slightly simpler code.
- fix leak of con when marking down
- handle race against osdmap update and note_down_osd

Fixes: #5172
Signed-off-by: Sage Weil <sage@inktank.com>
2013-05-28 20:39:30 -07:00
Sage Weil
ce6fc2ed87 Merge pull request #319 from dalgaaf/wip-da-pylint-3
Fix some smaller Python issues
2013-05-28 19:52:41 -07:00
Sage Weil
648dcb9240 Merge pull request #326 from dalgaaf/wip-da-CID-727978
kv_flat_btree_async.cc: fix AioCompletion resource leak
2013-05-28 15:48:11 -07:00
Gary Lowell
054e96cf79 v0.63
2013-05-28 13:58:22 -07:00
Samuel Just
5bca9c38ef HashIndex: sync top directory during start_split,merge,col_split
Otherwise, the links might be ordered after the in-progress
operation tag write.  We need the in progress operation tag to
correctly recover from an interrupted merge, split, or col_split.

Fixes: #5180
Backport: cuttlefish, bobtail
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-05-28 12:47:51 -07:00
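Syncing the top directory, as in 5bca9c38ef, relies on a POSIX detail: new links are directory entries, and they are persisted by calling `fsync()` on a descriptor opened on the directory itself. The helper `sync_dir` below is a hypothetical sketch, not HashIndex's actual code. It returns 0 or a negative errno, following the convention common in Ceph.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

// Flush a directory's entries (for example, links just created) to disk,
// so that they are durable before a marker such as an in-progress
// operation tag is written.
int sync_dir(const char* path) {
  assert(path != nullptr);
  int fd = ::open(path, O_RDONLY | O_DIRECTORY);
  if (fd < 0)
    return -errno;
  int r = ::fsync(fd);
  int err = r < 0 ? -errno : 0;
  ::close(fd);
  return err;
}
```

Calling `sync_dir()` before writing the tag keeps the links from landing on disk after the tag, which is the reordering the commit guards against.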
Sage Weil
e8f5284026 Merge pull request #325 from dalgaaf/wip-da-CID-727980
kv_flat_btree_async.cc: fix AioCompletion resource leak
2013-05-28 10:27:56 -07:00