46085 Commits

Author SHA1 Message Date
Loic Dachary
75bd1a51d1 Merge pull request #6679 from suckowbiz/patch-1
Fixed typos

Reviewed-by: Loic Dachary <ldachary@redhat.com>
2015-11-23 17:33:52 +01:00
Sage Weil
4025f75466 doc/release-notes: fix typo
Signed-off-by: Sage Weil <sage@redhat.com>
2015-11-23 11:02:58 -05:00
Sage Weil
efbcd120da doc/release-notes: final v10.0.0 notes
Signed-off-by: Sage Weil <sage@redhat.com>
2015-11-23 11:00:29 -05:00
suckowbiz
5972a44106 doc: fix message typos in systemd
Signed-off-by: Tobias Suckow <tobias@suckow.biz>
2015-11-23 16:50:07 +01:00
Sage Weil
d4694f6a8e Merge branch 'master' of github.com:ceph/ceph
2015-11-23 09:01:30 -05:00
Sage Weil
8631b72590 Merge pull request #6666 from dachary/wip-release-notes
release-notes: draft v10.0.0 release notes
2015-11-23 09:01:48 -05:00
Sage Weil
5135292d95 Merge branch 'wip-bigbang'
Reviewed-by: Joao Eduardo Luis <joao@suse.de>
Reviewed-by: Greg Farnum <gfarnum@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2015-11-23 08:39:46 -05:00
Sage Weil
9aabc8a9b8 test/mon/osd-crush.sh: escape ceph tell mon.*
Signed-off-by: Sage Weil <sage@redhat.com>
2015-11-23 08:38:52 -05:00
Sage Weil
72edab2823 osd: make some of the pg_temp methods/fields private
Reported-by: Kefu Chai <kchai@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
2015-11-23 08:38:51 -05:00
Sage Weil
987f68a8df osdc/Objecter: call notify completion only once
If we race with a reconnect we could get a second notify message
before the notify linger op is torn down.  Ensure we only ever
call the notify completion once to prevent a segfault.

Fixes: #13805
Signed-off-by: Sage Weil <sage@redhat.com>
2015-11-23 08:38:51 -05:00
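
The "call the notify completion only once" guard described in 987f68a8df can be illustrated with a minimal sketch. This is not the Objecter code (which is C++); the class name NotifyCompletion and its methods are hypothetical and only show the general pattern of a completion that ignores a second delivery.

```python
import threading


class NotifyCompletion:
    """Hypothetical wrapper: runs a callback at most once."""

    def __init__(self, callback):
        self._callback = callback
        self._lock = threading.Lock()
        self._fired = False

    def complete(self, result):
        # Decide under the lock whether this is the first delivery.
        with self._lock:
            if self._fired:
                return False  # duplicate notify (e.g. after a reconnect race)
            self._fired = True
        # Invoke the callback outside the lock so it cannot deadlock on us.
        self._callback(result)
        return True
```

A second call to complete() returns False and leaves the callback untouched, which is the property the commit message asks for.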
Sage Weil
d201c6d93f mon: change mon_osd_min_down_reporters from 1 -> 2
This makes more sense to me.

Signed-off-by: Sage Weil <sage@redhat.com>
2015-11-23 08:38:51 -05:00
Sage Weil
0269a0c177 mon/OSDMonitor: simplify failure reporters vs reports logic
Since each OSD only sends a failure report for a given peer once,
we don't need to count reports vs reporters separately.  (This was
probably a bad idea anyway.)  Remove this logic and the associated
config option.

Reported-by: Greg Farnum <gfarnum@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
2015-11-23 08:38:51 -05:00
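
As a rough illustration of counting reporters only (0269a0c177) against a threshold such as mon_osd_min_down_reporters (d201c6d93f), the sketch below keeps one set of reporters per failed peer. The FailureTracker class and its method names are assumptions made for this example, not the OSDMonitor implementation.

```python
from collections import defaultdict


class FailureTracker:
    """Hypothetical tracker: one entry per distinct reporter of a target."""

    def __init__(self, min_down_reporters=2):
        # Default of 2 mirrors the new mon_osd_min_down_reporters value.
        self.min_down_reporters = min_down_reporters
        self._reporters = defaultdict(set)

    def report_failure(self, target, reporter):
        # A set makes a repeated report from the same OSD a no-op, so
        # there is no separate "reports" counter to maintain.
        self._reporters[target].add(reporter)
        return len(self._reporters[target]) >= self.min_down_reporters
```

With the default threshold, report_failure() returns True only once a second, distinct OSD has reported the same peer.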
Sage Weil
53f2c7f291 osd: simplify pg creation
We used to have a complicated pg creation process in which we
would query any previous mappings for the pg before we created the
new 'empty' pg locally.  The tracking of the prior mappings was
very simple (and broken), but it didn't really matter because the
mon would resend pg create messages periodically.  Now it doesn't,
so that broke.

However, none of this is necessary: the PG peering process does
all of the same things.  Namely, it

- enumerates past intervals
- determines which ones may have been rw
- queries OSDs from each one to gather any potential changes

This is a more robust version of what the creation code was (or
should have been) doing.  So, let's rip it all out and let
peering handle it.  As long as the newly instantiated PG sets
last_epoch_started and _clean to the created epoch we will probe
and consider all of these prior mappings and find any previous
instance of the PG (if one existed).

Yay for removing unnecessary code!

Signed-off-by: Sage Weil <sage@redhat.com>
2015-11-23 08:38:51 -05:00
Sage Weil
57121dbe2c mon/MonClient: make _sub_got behave if we "got" old stuff
Signed-off-by: Sage Weil <sage@redhat.com>
2015-11-23 08:38:51 -05:00
Sage Weil
ca75e37a30 mon/OSDMonitor: fix oldest_map in send_incremental
This should be the oldest map on the sender (like every other
place that generates an MOSDMap message).

Signed-off-by: Sage Weil <sage@redhat.com>
2015-11-23 08:38:51 -05:00
Sage Weil
9864a79abc mon/PGMonitor: avoid useless pg gets when pool is deleted
If the .0 pg no longer exists, we know the entire pool was
deleted, and can avoid querying every other pg.  (This is a good
thing because leveldb and rocksdb can be very slow to query
missing keys.)

Signed-off-by: Sage Weil <sage@redhat.com>
2015-11-23 08:38:51 -05:00
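
The shortcut in 9864a79abc (probe the .0 pg first, skip the rest if it is gone) might look like the following sketch. The dict-based store and the function name load_pool_pgs are hypothetical stand-ins for the monitor's key/value store, chosen only to show the control flow.

```python
def load_pool_pgs(store, pool_id, pg_num):
    """Return the pgs of a pool, or {} if the pool appears to be deleted.

    `store` is assumed to be a mapping keyed by (pool_id, pg_seed).
    """
    # If pg <pool>.0 is missing, assume the whole pool was deleted and
    # avoid pg_num - 1 further lookups of missing keys.
    if (pool_id, 0) not in store:
        return {}
    return {
        seed: store[(pool_id, seed)]
        for seed in range(pg_num)
        if (pool_id, seed) in store
    }
```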
Sage Weil
1f4b7141c5 mon/PGMonitor: revamp how pg creates are tracked
Previously we were calculating and managing in-core state that
wasn't committed as part of the pg_map, leading to all sorts of
ugliness that didn't really work.  Instead,

 * set mapping in all creating pgs in the committed pg_map
 * make all pg create message sending be based on committed state
 * update mappings for creating pgs every time we consume a new
   osdmap, so that we have a reliable/stable epoch to attach to
   it.

In particular, having that stable epoch means we have a reference
we can put in the pg create message that will also be used for
the subscription version.  That way OSDs get consistent creates
from any mon.

Signed-off-by: Sage Weil <sage@redhat.com>
2015-11-23 08:38:51 -05:00
Sage Weil
3ad0c9215f mon/PGMonitor: only send pg create messages to up osds
If the OSD is down it will ignore the message.  If it gets marked up, we
will eventually consume that map and call check_subs().

Signed-off-by: Sage Weil <sage@redhat.com>
2015-11-23 08:38:51 -05:00
Sage Weil
23d4df3e01 mon/PGMonitor: only churn mapping_epoch if the primary changes
This results in fewer resent pg create messages.

Signed-off-by: Sage Weil <sage@redhat.com>
2015-11-23 08:38:50 -05:00
Sage Weil
59123824a9 mon/PGMonitor: a bunch of cosmetic cleanup
Signed-off-by: Sage Weil <sage@redhat.com>
2015-11-23 08:38:50 -05:00
Sage Weil
0389763a5b mon/PGMonitor: drop old creating_pgs_by_osd
Obsoleted by creating_pgs_by_osd_epoch.

Signed-off-by: Sage Weil <sage@redhat.com>
2015-11-23 08:38:50 -05:00
Sage Weil
160a0205c1 osd: reduce mon_subscribe messages
1. MonClient remembers our subscriptions; only indicate we want
osd_pg_creates once, in init.

2. We don't need to re-request the latest osdmap each time we
reconnect.

Signed-off-by: Sage Weil <sage@redhat.com>
2015-11-23 08:38:50 -05:00
Sage Weil
7fcffe3d9f mon/MonClient: only send new subscriptions
Instead of resending all subscriptions, only send the new ones.  This
avoids races like

 - ask for 4+
 - mon sends maps 4-50
 - ask for 4+ and something else
 - mon has to resend same maps and the other thing

Signed-off-by: Sage Weil <sage@redhat.com>
2015-11-23 08:38:50 -05:00
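
The "only send the new ones" behaviour from 7fcffe3d9f can be sketched as a tracker that separates already-sent subscriptions from pending ones. SubscriptionTracker, want() and renew() are hypothetical names for this illustration and are not claimed to match the MonClient API.

```python
class SubscriptionTracker:
    """Hypothetical sketch: remember what was sent, send only what is new."""

    def __init__(self):
        self._sent = {}  # what -> start epoch already requested
        self._new = {}   # what -> start epoch not yet sent

    def want(self, what, start):
        # Ignore a request that is already sent or already queued.
        if self._sent.get(what) == start or self._new.get(what) == start:
            return False
        self._new[what] = start
        return True

    def renew(self):
        # Hand back only the pending subscriptions and mark them as sent.
        to_send = dict(self._new)
        self._sent.update(to_send)
        self._new.clear()
        return to_send
```

Replaying the race from the commit message: after asking for osdmap 4+ and sending it, asking again for 4+ plus something else yields only the something else on the next renew().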
Sage Weil
c85b15234a mon/PGMonitor: send pg creates via persistent subscriptions, not spam
Generate and send pg create messages only for those OSDs who have
subscribed on this monitor.  This is N times more efficient (where there
are N monitors) than the previous method.

Signed-off-by: Sage Weil <sage@redhat.com>
2015-11-23 08:38:50 -05:00
Sage Weil
0938bf055d mon/PGMonitor: only map and send pg creates post paxos update
These other call sites are no longer needed.

Signed-off-by: Sage Weil <sage@redhat.com>
2015-11-23 08:38:50 -05:00
Sage Weil
6cbdd6750c mon/PGMonitor: remove map_pg_creates, send_pg_creates commands
These shouldn't be triggered manually.

Signed-off-by: Sage Weil <sage@redhat.com>
2015-11-23 08:38:50 -05:00
Sage Weil
c1f6eec94b messages/MOSDPGCreate: make it more readable
1- include the epoch
2- drop the 'pg'
3- hide the timestamp

Signed-off-by: Sage Weil <sage@redhat.com>
2015-11-23 08:38:49 -05:00
Sage Weil
d3eba9b0af osd: subscribe to all pg creates, not just once on start
We want to know about all future pg creations, not just those pending
when we start.  (This only helps once the mon knows how to do this...)

Signed-off-by: Sage Weil <sage@redhat.com>
2015-11-23 08:38:49 -05:00
Sage Weil
dd91837a8e mon/PGMonitor: track creating_pgs_by_osd_epoch
Track pg creations, grouped by the first epoch they mapped to a particular
OSD.  This will be necessary to send messages only for new creations.

Signed-off-by: Sage Weil <sage@redhat.com>
2015-11-23 08:38:49 -05:00
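
One way to picture the grouping described in dd91837a8e is a two-level map from OSD to epoch to pgs, queried by a starting epoch. The sketch below is an assumption about the shape of such a structure; only the name creating_pgs_by_osd_epoch comes from the log, and the CreatingPgs class and its methods are hypothetical.

```python
from collections import defaultdict


class CreatingPgs:
    """Hypothetical index: osd -> first mapped epoch -> set of pg ids."""

    def __init__(self):
        self.by_osd_epoch = defaultdict(lambda: defaultdict(set))

    def add(self, osd, epoch, pgid):
        self.by_osd_epoch[osd][epoch].add(pgid)

    def creates_since(self, osd, start_epoch):
        # Only creations mapped at or after start_epoch are returned, so
        # an OSD that already saw older epochs is not sent them again.
        epochs = self.by_osd_epoch.get(osd, {})
        return {
            epoch: sorted(pgs)
            for epoch, pgs in sorted(epochs.items())
            if epoch >= start_epoch
        }
```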
Sage Weil
2754007c4b mon/PGMap: assert our pg counts don't go negative
Signed-off-by: Sage Weil <sage@redhat.com>
2015-11-23 08:38:49 -05:00
Sage Weil
b3b0a95e43 mon/OSDMonitor: do not prime pg_temp for creating pgs
It will be less work for the old primary to ignore the create message
and the new one to query it and find nothing than for the slightly more
complicated peering and removal process to happen.  Also, this reduces
bloat in the OSDMap a bit.

Signed-off-by: Sage Weil <sage@redhat.com>
2015-11-23 08:38:49 -05:00
Sage Weil
242bf504f1 mon/PGMonitor: note mapping_epoch for creating pgs
Signed-off-by: Sage Weil <sage@redhat.com>
2015-11-23 08:38:49 -05:00
Sage Weil
39e06ef8f0 mon: let peon mons send the osdmap replies
Currently the leader mon often replies to OSDs by sending a set of
incremental OSDmaps (e.g., in response to an osd boot or failure).

Instead, send a small message to the proxying peon mon (if any)
with the epoch to start from and let *them* generate a suitable
reply.

Signed-off-by: Sage Weil <sage@redhat.com>
2015-11-23 08:38:48 -05:00
Sage Weil
05aaa60eb5 msg/simple/Pipe: show keepalives at level 2
Signed-off-by: Sage Weil <sage@redhat.com>
2015-11-23 08:38:48 -05:00
Sage Weil
6557b76f84 mon: set mon_subscribe_interval to a day
This is only needed for legacy clients to avoid confusing them--
we don't actually need the renewals at all.  Make them infrequent
to reduce mon load.

Signed-off-by: Sage Weil <sage@redhat.com>
2015-11-23 08:38:48 -05:00
Sage Weil
26496b9077 mon: only ack subscriptions (and renew) if client or mon is old
Old clients expect an ack so they can schedule renewal; send it for
them only.

Old mons expect renewals.

Signed-off-by: Sage Weil <sage@redhat.com>
2015-11-23 08:38:48 -05:00
Sage Weil
ae9d5ee65c mon: remove old subscribe renewal-based timeouts
This is no longer needed/used.

Signed-off-by: Sage Weil <sage@redhat.com>
2015-11-23 08:38:48 -05:00
Sage Weil
6f30002485 mon: small cleanup in _ms_dispatch
Signed-off-by: Sage Weil <sage@redhat.com>
2015-11-23 08:38:48 -05:00
Sage Weil
e5fc790329 mon: new session_timeout mechanism that is not subscribe-based
Simplify the session liveness detection:

 - renew on any message
 - renew on keepalive[2] messages (lightweight ping in msgr)

Signed-off-by: Sage Weil <sage@redhat.com>
2015-11-23 08:38:48 -05:00
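
The liveness rule in e5fc790329 (renew on any message, renew on keepalive) amounts to pushing an expiry time forward on every sign of life. The Session class below is a hypothetical sketch with an explicit clock argument so it can be exercised deterministically; it is not the monitor's session code.

```python
class Session:
    """Hypothetical session whose expiry is renewed by any traffic."""

    def __init__(self, now, timeout):
        self.timeout = timeout
        self.expires_at = now + timeout

    def on_message(self, now):
        # Any message from the peer renews the session.
        self.expires_at = now + self.timeout

    def on_keepalive(self, now):
        # A lightweight keepalive renews it in exactly the same way.
        self.on_message(now)

    def is_expired(self, now):
        return now >= self.expires_at
```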
Sage Weil
536c70281a msg: make last_keepalive[_ack] lock safe
Signed-off-by: Sage Weil <sage@redhat.com>
2015-11-23 08:38:48 -05:00
Sage Weil
fb9dfada02 msg: track stamp of last keepalive[2] received
Signed-off-by: Sage Weil <sage@redhat.com>
2015-11-23 08:38:47 -05:00
Sage Weil
d781f48438 common: mirror leveldb default tuning w/ rocksdb
Signed-off-by: Sage Weil <sage@redhat.com>
2015-11-23 08:38:47 -05:00
Sage Weil
73bdf0fc04 mon/MonClient: don't send log if we're reconnecting
Signed-off-by: Sage Weil <sage@redhat.com>
2015-11-23 08:38:47 -05:00
Sage Weil
a12dd1b612 mon: disabled rocksdb compression when used as the backend
This significantly reduced CPU utilization on the bigbang scale
testing cluster at CERN.  Note that it is already disabled for
leveldb by default (in ceph_mon.cc).

Signed-off-by: Sage Weil <sage@redhat.com>
2015-11-23 08:38:47 -05:00
Sage Weil
7489ec4849 osd: cap adjusted max mon report interval at 2/3 of timeout
This ensures that we don't throttle back mon reports so much that
the mon times out due to no pg stat reports.  Since there is
little value in having a lower max anyway, just set this at an
upper bound (relative to the mon's timeout value).

Signed-off-by: Sage Weil <sage@redhat.com>
2015-11-23 08:38:47 -05:00
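
The cap from 7489ec4849 is simple arithmetic: the adjusted maximum report interval is never allowed above two thirds of the mon's timeout. The function and parameter names below are hypothetical; only the 2/3 ratio comes from the commit message.

```python
def capped_report_interval(adjusted_max, mon_timeout):
    """Limit the adjusted max report interval to 2/3 of the mon timeout."""
    return min(adjusted_max, mon_timeout * 2.0 / 3.0)
```

For example, with a mon timeout of 900 seconds an adjusted interval of 1000 seconds would be capped at 600 seconds, while an interval of 100 seconds would be left alone.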
Sage Weil
39c1495406 osd: protect mon reporting with mon_report_lock
We need an exclusive lock over paths that update state related to
mon reports, lest they step on fields like up_thru_*, *stats_ack*,
last_mon_report, and so on.  Everybody still needs a read lock
on map_lock too to get a stable OSDMap epoch.

Signed-off-by: Sage Weil <sage@redhat.com>
2015-11-23 08:38:47 -05:00
Sage Weil
e31b69514a osd: fix reconnect behavior from booting state
We don't need to restart the boot process unless we are in preboot;
if we are in booting state we just need to resend the boot
message.

Signed-off-by: Sage Weil <sage@redhat.com>
2015-11-23 08:38:44 -05:00
Guang Yang
8b5b6c85cc osd: move the monitor report to OSD::tick_without_osd_lock
Fixes: #12722
Reviewed-by: Guang Yang <yguang@yahoo-inc.com>
2015-11-23 08:36:15 -05:00
Guang Yang
7bc4763ed7 osd: _got_mon_epochs - refactor the lock scope to avoid a race (which fails make check)
Reviewed-by: Guang Yang <yguang@yahoo-inc.com>
2015-11-23 08:36:15 -05:00
Sage Weil
21ca0b591a osd: don't send dup subscribes so much
The subscribe MonClient service is stateful--we don't need to
force a new subscribe send unless sub_want() says we need to.

Keep forcing it for instances where we request an *old* map.

Signed-off-by: Sage Weil <sage@redhat.com>
2015-11-23 08:36:15 -05:00