In 313c1566d3 we switched to using the
get_addr() accessor methods, which assert that the osd exists. Check that
before calling.
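A minimal sketch of the guard, assuming a toy map type. ToyOSDMap and
addr_or_empty() are hypothetical names for illustration and are not the
real OSDMap interface:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the map; not the real OSDMap class.
struct ToyOSDMap {
  std::vector<bool> present;       // present[i] is true if osd i exists
  std::vector<std::string> addrs;  // address string per osd id

  bool exists(int osd) const {
    return osd >= 0 && osd < static_cast<int>(present.size()) && present[osd];
  }

  // Accessor that asserts existence, like the accessors described above.
  const std::string& get_addr(int osd) const {
    assert(exists(osd));
    return addrs[osd];
  }
};

// Caller-side guard: only call the accessor once exists() has passed.
std::string addr_or_empty(const ToyOSDMap& m, int osd) {
  if (!m.exists(osd))
    return std::string();
  return m.get_addr(osd);
}
```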
Fixes: #2361
Signed-off-by: Sage Weil <sage@newdream.net>
Set this up in either global_init() or common_init_finish(), both opportune
times that occur after config parsing has happened and the user has the
option to modify this behavior. The exception would be libraries like
librados, which can't use rados_conf_* to enable this. Arguably flush
functionality should be exposed through the librados API directly, instead
of futzing with on_exit().
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
We may send an MOSDMap as a reply to various requests, including
- a failure report
- a boot message
- a pg_temp message
- an up_thru message
In these cases, send a single MOSDMap message, but limit how big it gets.
All recipients here are osds, which are smart enough to request more maps
based on the MOSDMap::newest_map field.
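The size limit and the follow-up request might look roughly like this.
ToyMapReply, build_limited_reply() and needs_more() are hypothetical names,
and the cap is passed in as a plain parameter rather than read from any
real config option:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

using epoch_t = uint32_t;

// Hypothetical reply descriptor; the real MOSDMap carries encoded maps.
struct ToyMapReply {
  epoch_t first;       // first epoch included in the message
  epoch_t last;        // last epoch included in the message
  epoch_t newest_map;  // newest epoch the sender knows about
};

// Include at most max_maps epochs starting at 'first'.
// Assumes first <= newest, max_maps >= 1, and no epoch_t overflow.
ToyMapReply build_limited_reply(epoch_t first, epoch_t newest,
                                epoch_t max_maps) {
  epoch_t last = std::min<epoch_t>(newest, first + max_maps - 1);
  return ToyMapReply{first, last, newest};
}

// Receiver side: more maps are needed if the reply stopped short.
bool needs_more(const ToyMapReply& r) {
  return r.last < r.newest_map;
}
```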
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Compare *every* address for a match, or else note that it is (or might be)
different. Previously, we falsely took diff==0 to mean that all addrs
were definitely equal, which was not necessarily the case.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Fixes: #2353. The problem was that there were (at least) two osd processes
that were racing for the fs detection, which triggered some errors
in the btrfs create/remove snapshot.
Signed-off-by: Yehuda Sadeh <yehuda.sadeh@dreamhost.com>
This allows you to get and set subsystem debug levels via the
normal config access methods. Among other things, this allows librados
users to set debug levels.
Fixes: #2350
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
We only deal with the case where the entire map is identical, since the
individual items are too small to make the pointer overhead worthwhile.
Too bad. An in-memory btree-like structure would work better for this.
Signed-off-by: Sage Weil <sage@newdream.net>
These are cruft from the old primary/chain/splay replication code. All
current code says <0 is stray, 0 is primary, and >0 is replica. That is,
the role is the acting vector position, or -1 if not in the vector.
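The convention above can be sketched as follows. role_of() and the three
predicates are hypothetical helper names used only for illustration:

```cpp
#include <cassert>
#include <vector>

// Role is the position of 'osd' in the acting vector, or -1 if absent.
int role_of(int osd, const std::vector<int>& acting) {
  for (int i = 0; i < static_cast<int>(acting.size()); ++i) {
    if (acting[i] == osd)
      return i;
  }
  return -1;
}

bool is_primary(int role) { return role == 0; }
bool is_replica(int role) { return role > 0; }
bool is_stray(int role)   { return role < 0; }
```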
Signed-off-by: Sage Weil <sage@newdream.net>
Compare two maps. If an addr matches, share the reference. If all
addrs match, share the entire vector.
This leads to a roughly 70% drop in memory utilization for the set of
thrashed maps I'm working with.
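A rough sketch of the comparison, assuming addresses are held as
shared_ptr entries inside a shared vector. ToyMap, Addr, AddrVec and
dedup() here are hypothetical stand-ins, not the real map types:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

using Addr = std::string;  // stand-in for an address type
using AddrVec = std::vector<std::shared_ptr<Addr>>;

struct ToyMap {
  std::shared_ptr<AddrVec> addrs;
};

// Make 'newer' share storage with 'older' wherever the contents match.
// Assumes both maps have non-null addrs and no null entries.
void dedup(const ToyMap& older, ToyMap& newer) {
  const AddrVec& o = *older.addrs;
  AddrVec& n = *newer.addrs;
  bool all_same = (o.size() == n.size());
  for (std::size_t i = 0; i < n.size(); ++i) {
    if (i < o.size() && *o[i] == *n[i])
      n[i] = o[i];       // share the single address
    else
      all_same = false;  // this address differs or has no counterpart
  }
  if (all_same)
    newer.addrs = older.addrs;  // share the entire vector
}
```

Every entry is checked before the whole vector is shared, so one differing
address is enough to keep the two vectors separate.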
Signed-off-by: Sage Weil <sage@newdream.net>
It is possible that the crush map contains device ids that do not exist as
osds. Filter them out of the CRUSH result.
Drop the max devices assert, as that is trivially violated by adding a new
item to the crush map beyond max_osd (via 'ceph osd crush add ...').
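The filtering step might look roughly like this. filter_nonexistent() is a
hypothetical helper, and existence is modeled as a plain vector indexed by
osd id:

```cpp
#include <cassert>
#include <vector>

// Keep only device ids that exist as osds. 'exists' is indexed by osd id;
// ids at or beyond exists.size() (e.g. beyond max_osd) are dropped rather
// than asserted on.
std::vector<int> filter_nonexistent(const std::vector<int>& raw,
                                    const std::vector<bool>& exists) {
  std::vector<int> out;
  for (int id : raw) {
    if (id >= 0 && id < static_cast<int>(exists.size()) && exists[id])
      out.push_back(id);
  }
  return out;
}
```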
Signed-off-by: Sage Weil <sage@newdream.net>
We share a lot of identical addresses between map versions because they
don't tend to change very often. Instead of having a separate copy for
every map, use shared_ptr and share references. Also use a reference for
the entire addr vector(s) in case no addrs differ at all.
Create new encode/decode macros for vector< shared_ptr<T> >.
Signed-off-by: Sage Weil <sage@newdream.net>
Consider pending changes when calculating the current up/in ratios. Among
other things, this will make the marking of osds down->out stop once it
hits the min in ratio.
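A minimal sketch of the idea, assuming pending changes are kept as a map
from osd id to its new in/out state. pending_num_in() and can_mark_out()
are hypothetical here, and the exact comparison against the minimum ratio
is an assumption:

```cpp
#include <cassert>
#include <map>
#include <vector>

// Count 'in' osds after applying pending changes on top of committed state.
// Pending entries for ids outside 'in' are ignored in this sketch.
int pending_num_in(const std::vector<bool>& in,
                   const std::map<int, bool>& pending) {
  int n = 0;
  for (int i = 0; i < static_cast<int>(in.size()); ++i) {
    auto p = pending.find(i);
    bool is_in = (p != pending.end()) ? p->second : in[i];
    if (is_in)
      ++n;
  }
  return n;
}

// Assumption: refuse to mark more osds out once the pending-adjusted
// in ratio has dropped below min_in_ratio.
bool can_mark_out(const std::vector<bool>& in,
                  const std::map<int, bool>& pending,
                  double min_in_ratio) {
  if (in.empty())
    return false;
  double ratio = static_cast<double>(pending_num_in(in, pending)) /
                 static_cast<double>(in.size());
  return ratio >= min_in_ratio;
}
```

Using only the committed state would keep reporting the old ratio until
the pending changes are applied, which is why the pending map is consulted
first.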
Signed-off-by: Sage Weil <sage@newdream.net>