Merge branch 'wip-quorum'

Sage Weil 2012-05-22 17:13:28 -07:00
commit 55c21de537
34 changed files with 1162 additions and 201 deletions

199
doc/dev/mon-bootstrap.rst Normal file
@ -0,0 +1,199 @@
===================
Monitor bootstrap
===================
Terminology:
* ``cluster``: a set of monitors
* ``quorum``: an active set of monitors consisting of a majority of the cluster
To initialize a new monitor, you must always feed it:
#. a logical name
#. secret keys
#. a cluster fsid (uuid)
In addition, a monitor needs to know two things:
#. what address to bind to
#. who its peers are (if any)
There is a range of ways to do both.
Logical id
==========
The logical id should be unique across the cluster. It will be
appended to ``mon.`` to logically describe the monitor in the Ceph
cluster. For example, if the logical id is ``foo``, the monitor's
name will be ``mon.foo``.
For most users, there is no more than one monitor per host, which
makes the short hostname a logical choice.
Secret keys
===========
The ``mon.`` secret key is stored in a ``keyring`` file in the ``mon data`` directory. It can be generated
with a command like::
ceph-authtool --create-keyring /path/to/keyring --gen-key -n mon.
When creating a new monitor cluster, the keyring should also contain a ``client.admin`` key that can be used
to administer the system::
ceph-authtool /path/to/keyring --gen-key -n client.admin
The resulting keyring is fed to ``ceph-mon --mkfs`` with the ``--keyring <keyring>`` command-line argument.
Cluster fsid
============
The cluster fsid is a normal uuid, like that generated by the ``uuidgen`` command. It
can be provided to the monitor in two ways:
#. via the ``--fsid <uuid>`` command-line argument (or config file option)
#. via a monmap provided to the new monitor via the ``--monmap <path>`` command-line argument.
Monitor address
===============
The monitor address can be provided in several ways.
#. via the ``--public-addr <ip[:port]>`` command-line option (or config file option)
#. via the ``--public-network <cidr>`` command-line option (or config file option)
#. via the monmap provided via ``--monmap <path>``, if it includes a monitor with our name
#. via the bootstrap monmap (provided via ``--monmap <path>`` or generated from ``--mon-host <list>``) if it includes a monitor with no name (``noname-<something>``) and an address configured on the local host.
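The ``noname-<something>`` placeholder names in the last case come from the position of each address in the host list. The following is a minimal sketch of that naming scheme, modeled on ``MonMap::build_from_host_list`` in this commit. The free function ``placeholder_names`` and the use of plain strings for addresses are assumptions made for illustration and are not Ceph API.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of Ceph): pair each address from a
// host list with a placeholder name built from a prefix plus a letter,
// mirroring the 'a' + i scheme in MonMap::build_from_host_list.
// Addresses stay plain strings here; the real code parses them into
// entity_addr_t values and fills in a default port when none is given.
std::vector<std::pair<std::string, std::string> >
placeholder_names(const std::vector<std::string>& addrs,
                  const std::string& prefix)
{
  std::vector<std::pair<std::string, std::string> > out;
  for (std::size_t i = 0; i < addrs.size(); ++i) {
    std::string name = prefix;
    name += static_cast<char>('a' + i);  // noname-a, noname-b, ...
    out.push_back(std::make_pair(name, addrs[i]));
  }
  return out;
}
```

With the prefix ``noname-`` and two addresses, the generated names are ``noname-a`` and ``noname-b``. As in the code it is modeled on, only the first 26 addresses map to letters.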
Peers
=====
The monitor peers are provided in several ways:
#. via the initial monmap, provided via ``--monmap <filename>``
#. via the bootstrap monmap generated from ``--mon-host <list>``
#. via the bootstrap monmap generated from ``[mon.*]`` sections with ``mon addr`` in the config file
#. dynamically via the admin socket
However, these methods are not completely interchangeable because of
the complexity of creating a new monitor cluster without danger of
races.
Cluster creation
================
There are three basic approaches to creating a cluster:
#. Create a new cluster by specifying the monitor names and addresses ahead of time.
#. Create a new cluster by specifying the monitor names ahead of time, and dynamically setting the addresses as ``ceph-mon`` daemons configure themselves.
#. Create a new cluster by specifying the monitor addresses ahead of time.
Names and addresses
-------------------
Generate a monmap using ``monmaptool`` with the names and addresses of the initial
monitors. The generated monmap will also include a cluster fsid. Feed that monmap
to each monitor daemon::
ceph-mon --mkfs -i <name> --monmap <initial_monmap> --keyring <initial_keyring>
When the daemons start, they will know exactly who they and their peers are.
Addresses only
--------------
The initial monitor addresses can be specified with the ``mon host`` configuration value,
either via a config file or the command-line argument. This method has the advantage that
a single global config file for the cluster can have a line like::
mon host = a.foo.com, b.foo.com, c.foo.com
and will also serve to inform any ceph clients or daemons who the monitors are.
The ``ceph-mon`` daemons will need to be fed the initial keyring and cluster fsid to
initialize themselves::
ceph-mon --mkfs -i <name> --fsid <uuid> --keyring <initial_keyring>
When the daemons first start up, they will share their names with each other and form a
new cluster.
Names only
----------
In dynamic "cloud" environments, the cluster creator may not (yet)
know what the addresses of the monitors are going to be. Instead,
they may want machines to configure and start themselves in parallel
and, as they come up, form a new cluster on their own. The problem is
that the monitor cluster relies on strict majorities to keep itself
consistent, and in order to "create" a new cluster, it needs to know
what the *initial* set of monitors will be.
This can be done with the ``mon initial members`` config option, which
should list the ids of the initial monitors that are allowed to create
the cluster::
mon initial members = foo, bar, baz
The monitors can then be initialized by providing the other pieces of
information (the keyring, cluster fsid, and a way of determining
their own address). For example::
ceph-mon --mkfs -i <name> --mon-initial-members 'foo,bar,baz' --keyring <initial_keyring> --public-addr <ip>
When these daemons are started, they will know their own address, but
not their peers. They can learn those addresses via the admin socket::
ceph --admin-daemon /var/run/ceph/mon.<id>.asok add_bootstrap_peer_hint <peer ip>
Once they learn enough of their peers from the initial member set,
they will be able to create the cluster.
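How many peers count as "enough" follows from the majority rule: the comment on ``mon_initial_members`` in this commit says a majority of the listed ids is needed to form the initial quorum. The following is a minimal sketch of that arithmetic. Both function names are hypothetical, and counting the monitor itself among the located members is an assumption about how such a check would be applied.

```cpp
#include <cstddef>
#include <set>
#include <string>

// Hypothetical helper: smallest number of monitors that forms a strict
// majority of n (more than half).
std::size_t majority_of(std::size_t n)
{
  return n / 2 + 1;
}

// Hypothetical check: true once the initial members this monitor can
// account for (itself plus the peers it has located) make up a strict
// majority of the configured initial member set. Names outside the
// initial set do not count toward the majority.
bool can_form_initial_quorum(const std::set<std::string>& initial_members,
                             const std::set<std::string>& located)
{
  std::size_t known = 0;
  for (std::set<std::string>::const_iterator p = located.begin();
       p != located.end(); ++p) {
    if (initial_members.count(*p))
      ++known;
  }
  return known >= majority_of(initial_members.size());
}
```

With ``mon initial members = foo, bar, baz``, a monitor would need to account for two of the three names before cluster creation can proceed.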
Cluster expansion
=================
Cluster expansion is slightly less demanding than creation, because
the creation of the initial quorum is not an issue and there is no
worry about creating separate, independent clusters.
New nodes can be forced to join an existing cluster in two ways:
#. by providing no initial monitor peer addresses, and feeding them dynamically.
#. by specifying the ``mon initial members`` config option to prevent the new nodes from forming a new, independent cluster, and feeding some existing monitors via any available method.
Initially peerless expansion
----------------------------
Create a new monitor and give it no peer addresses other than its own. For
example::
ceph-mon --mkfs -i <myid> --fsid <fsid> --keyring <mon secret key> --public-addr <ip>
Once the daemon starts, you can give it one or more peer addresses to join with::
ceph --admin-daemon /var/run/ceph/mon.<id>.asok add_bootstrap_peer_hint <peer ip>
This monitor will never participate in cluster creation; it can only join an existing
cluster.
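The ``add_bootstrap_peer_hint`` command carries an argument, so the admin socket has to match a request on its first word rather than on the whole line; this commit changes ``AdminSocket::do_accept`` to look up the handler that way. The following is a minimal sketch of the split. The function name is hypothetical, and returning the argument text separately is an assumption for illustration (the code in this commit passes the whole request string to the handler).

```cpp
#include <string>
#include <utility>

// Hypothetical helper: split an admin socket request into the command
// word (used to look up the handler) and the remaining argument text.
// Only the first space is significant; a request without a space is
// all command and has an empty argument.
std::pair<std::string, std::string> split_request(const std::string& req)
{
  std::string::size_type sp = req.find(' ');
  if (sp == std::string::npos)
    return std::make_pair(req, std::string());
  return std::make_pair(req.substr(0, sp), req.substr(sp + 1));
}
```

For a request such as ``add_bootstrap_peer_hint <peer ip>``, the first element selects the handler and the second holds the peer address text.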
Expanding with initial members
------------------------------
You can feed the new monitor some peer addresses initially and avoid accidentally
forming a new, independent cluster by also setting ``mon initial members``. For example::
ceph-mon --mkfs -i <myid> --fsid <fsid> --keyring <mon secret key> --public-addr <ip> --mon-host foo,bar,baz
When the daemon is started, ``mon initial members`` must be set via the command line or config file::
ceph-mon -i <myid> --mon-initial-members foo,bar,baz
to prevent any risk of split-brain.

@ -44,6 +44,22 @@ Options
will create a new monitor map with a new UUID (and with it, a new,
empty Ceph file system).
.. option:: --generate
generate a new monmap based on the values on the command line or specified
in the ceph configuration. This is, in order of preference,
#. ``--monmap filename`` to specify a monmap to load
#. ``--mon-host 'host1,ip2'`` to specify a list of hosts or ip addresses
#. ``[mon.foo]`` sections containing ``mon addr`` settings in the config
.. option:: --filter-initial-members
filter the initial monmap by applying the ``mon initial members``
setting. Monitors not present in that list will be removed, and
initial members not present in the map will be added with dummy
addresses.
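The filtering rule can be sketched as follows. This is a simplified model of ``MonMap::set_initial_members`` from this commit, not the real API: the monmap is a plain ``std::map`` from name to address string, and the ``dummy-<n>`` placeholder strings are an assumption standing in for the blank addresses with a distinguishing nonce that the real code generates.

```cpp
#include <algorithm>
#include <list>
#include <map>
#include <string>

typedef std::map<std::string, std::string> SimpleMonMap;  // name -> address

// Hypothetical stand-in for MonMap::set_initial_members.
void filter_initial_members(SimpleMonMap& monmap,
                            const std::list<std::string>& initial_members)
{
  // Drop monitors that are not listed as initial members.
  for (SimpleMonMap::iterator it = monmap.begin(); it != monmap.end(); ) {
    bool listed = std::find(initial_members.begin(), initial_members.end(),
                            it->first) != initial_members.end();
    if (listed)
      ++it;
    else
      monmap.erase(it++);  // post-increment keeps the iterator valid
  }

  // Add listed members that are missing, each with a unique dummy address.
  int n = 1;
  for (std::list<std::string>::const_iterator p = initial_members.begin();
       p != initial_members.end(); ++p) {
    if (monmap.count(*p))
      continue;
    monmap[*p] = "dummy-" + std::to_string(n++);
  }
}
```

Applied to a map holding ``a``, ``b`` and ``c`` with ``mon initial members = a,b,d`` (the combination used by one of the test scripts in this commit), the sketch keeps ``a`` and ``b``, drops ``c``, and adds ``d`` with a placeholder address.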
.. option:: --add name ip:port
will add a monitor with the specified ip:port to the map.

29
qa/mon/bootstrap/host.sh Executable file
@ -0,0 +1,29 @@
#!/bin/sh -ex
cwd=`pwd`
cat > conf <<EOF
[global]
mon host = 127.0.0.1:6789
[mon]
admin socket =
log file = $cwd/\$name.log
debug mon = 20
debug ms = 1
EOF
rm -f mm
fsid=`uuidgen`
rm -f keyring
ceph-authtool --create-keyring keyring --gen-key -n client.admin
ceph-authtool keyring --gen-key -n mon.
ceph-mon -c conf -i a --mkfs --fsid $fsid --mon-data mon.a -k keyring
ceph-mon -c conf -i a --mon-data $cwd/mon.a
ceph -c conf -k keyring health
killall ceph-mon
echo OK

@ -0,0 +1,39 @@
#!/bin/sh -ex
cwd=`pwd`
cat > conf <<EOF
[mon]
admin socket =
log file = $cwd/\$name.log
debug mon = 20
debug ms = 1
mon initial members = a,b,d
EOF
rm -f mm
monmaptool --create mm \
--add a 127.0.0.1:6789 \
--add b 127.0.0.1:6790 \
--add c 127.0.0.1:6791
rm -f keyring
ceph-authtool --create-keyring keyring --gen-key -n client.admin
ceph-authtool keyring --gen-key -n mon.
ceph-mon -c conf -i a --mkfs --monmap mm --mon-data $cwd/mon.a -k keyring
ceph-mon -c conf -i b --mkfs --monmap mm --mon-data $cwd/mon.b -k keyring
ceph-mon -c conf -i c --mkfs --monmap mm --mon-data $cwd/mon.c -k keyring
ceph-mon -c conf -i a --mon-data $cwd/mon.a
ceph-mon -c conf -i c --mon-data $cwd/mon.b
ceph-mon -c conf -i b --mon-data $cwd/mon.c
ceph -c conf -k keyring --monmap mm health
ceph -c conf -k keyring --monmap mm health
if ceph -c conf -k keyring --monmap mm mon stat | grep a= | grep b= | grep c= ; then
break
fi
killall ceph-mon
echo OK

@ -0,0 +1,66 @@
#!/bin/sh -ex
cwd=`pwd`
cat > conf <<EOF
[mon]
log file = $cwd/\$name.log
debug mon = 20
debug ms = 1
debug asok = 20
mon initial members = a,b,d
admin socket = $cwd/\$name.asok
EOF
rm -f mm
fsid=`uuidgen`
rm -f keyring
ceph-authtool --create-keyring keyring --gen-key -n client.admin
ceph-authtool keyring --gen-key -n mon.
ceph-mon -c conf -i a --mkfs --fsid $fsid --mon-data $cwd/mon.a -k keyring
ceph-mon -c conf -i b --mkfs --fsid $fsid --mon-data $cwd/mon.b -k keyring
ceph-mon -c conf -i c --mkfs --fsid $fsid --mon-data $cwd/mon.c -k keyring
ceph-mon -c conf -i a --mon-data $cwd/mon.a --public-addr 127.0.0.1:6789
ceph-mon -c conf -i b --mon-data $cwd/mon.c --public-addr 127.0.0.1:6790
ceph-mon -c conf -i c --mon-data $cwd/mon.b --public-addr 127.0.0.1:6791
sleep 1
if timeout 5 ceph -c conf -k keyring -m localhost mon stat | grep "a,b,c" ; then
echo WTF
exit 1
fi
ceph --admin-daemon mon.a.asok add_bootstrap_peer_hint 127.0.0.1:6790
while true; do
if ceph -c conf -k keyring -m 127.0.0.1 mon stat | grep 'a,b'; then
break
fi
sleep 1
done
ceph --admin-daemon mon.c.asok add_bootstrap_peer_hint 127.0.0.1:6790
while true; do
if ceph -c conf -k keyring -m 127.0.0.1 mon stat | grep 'a,b,c'; then
break
fi
sleep 1
done
ceph-mon -c conf -i d --mkfs --fsid $fsid --mon-data $cwd/mon.d -k keyring
ceph-mon -c conf -i d --mon-data $cwd/mon.d --public-addr 127.0.0.1:6792
ceph --admin-daemon mon.d.asok add_bootstrap_peer_hint 127.0.0.1:6790
while true; do
if ceph -c conf -k keyring -m 127.0.0.1 mon stat | grep 'a,b,c,d'; then
break
fi
sleep 1
done
killall ceph-mon
echo OK

36
qa/mon/bootstrap/simple.sh Executable file
@ -0,0 +1,36 @@
#!/bin/sh -e
cwd=`pwd`
cat > conf <<EOF
[mon]
admin socket =
EOF
rm -f mm
monmaptool --create mm \
--add a 127.0.0.1:6789 \
--add b 127.0.0.1:6790 \
--add c 127.0.0.1:6791
rm -f keyring
ceph-authtool --create-keyring keyring --gen-key -n client.admin
ceph-authtool keyring --gen-key -n mon.
ceph-mon -c conf -i a --mkfs --monmap mm --mon-data $cwd/mon.a -k keyring
ceph-mon -c conf -i b --mkfs --monmap mm --mon-data $cwd/mon.b -k keyring
ceph-mon -c conf -i c --mkfs --monmap mm --mon-data $cwd/mon.c -k keyring
ceph-mon -c conf -i a --mon-data $cwd/mon.a
ceph-mon -c conf -i c --mon-data $cwd/mon.b
ceph-mon -c conf -i b --mon-data $cwd/mon.c
while true; do
ceph -c conf -k keyring --monmap mm health
if ceph -c conf -k keyring --monmap mm mon stat | grep 'quorum 0,1,2'; then
break
fi
sleep 1
done
killall ceph-mon
echo OK

@ -0,0 +1,60 @@
#!/bin/sh -ex
cwd=`pwd`
cat > conf <<EOF
[mon]
admin socket =
log file = $cwd/\$name.log
debug mon = 20
debug ms = 1
EOF
rm -f mm
monmaptool --create mm \
--add a 127.0.0.1:6789 \
--add b 127.0.0.1:6790 \
--add c 127.0.0.1:6791
rm -f keyring
ceph-authtool --create-keyring keyring --gen-key -n client.admin
ceph-authtool keyring --gen-key -n mon.
ceph-mon -c conf -i a --mkfs --monmap mm --mon-data $cwd/mon.a -k keyring
ceph-mon -c conf -i b --mkfs --monmap mm --mon-data $cwd/mon.b -k keyring
ceph-mon -c conf -i c --mkfs --monmap mm --mon-data $cwd/mon.c -k keyring
ceph-mon -c conf -i a --mon-data $cwd/mon.a
ceph-mon -c conf -i c --mon-data $cwd/mon.b
ceph-mon -c conf -i b --mon-data $cwd/mon.c
ceph -c conf -k keyring --monmap mm health
## expand via a kludged monmap
monmaptool mm --add d 127.0.0.1:6792
ceph-mon -c conf -i d --mkfs --monmap mm --mon-data $cwd/mon.d -k keyring
ceph-mon -c conf -i d --mon-data $cwd/mon.d
while true; do
ceph -c conf -k keyring --monmap mm health
if ceph -c conf -k keyring --monmap mm mon stat | grep 'quorum 0,1,2,3'; then
break
fi
sleep 1
done
# again
monmaptool mm --add e 127.0.0.1:6793
ceph-mon -c conf -i e --mkfs --monmap mm --mon-data $cwd/mon.e -k keyring
ceph-mon -c conf -i e --mon-data $cwd/mon.e
while true; do
ceph -c conf -k keyring --monmap mm health
if ceph -c conf -k keyring --monmap mm mon stat | grep 'quorum 0,1,2,3,4'; then
break
fi
sleep 1
done
killall ceph-mon
echo OK

@ -0,0 +1,44 @@
#!/bin/sh -ex
cwd=`pwd`
cat > conf <<EOF
[mon]
admin socket =
EOF
rm -f mm
monmaptool --create mm \
--add a 127.0.0.1:6789 \
--add b 127.0.0.1:6790 \
--add c 127.0.0.1:6791
rm -f keyring
ceph-authtool --create-keyring keyring --gen-key -n client.admin
ceph-authtool keyring --gen-key -n mon.
ceph-mon -c conf -i a --mkfs --monmap mm --mon-data $cwd/mon.a -k keyring
ceph-mon -c conf -i b --mkfs --monmap mm --mon-data $cwd/mon.b -k keyring
ceph-mon -c conf -i c --mkfs --monmap mm --mon-data $cwd/mon.c -k keyring
ceph-mon -c conf -i a --mon-data $cwd/mon.a
ceph-mon -c conf -i c --mon-data $cwd/mon.b
ceph-mon -c conf -i b --mon-data $cwd/mon.c
ceph -c conf -k keyring --monmap mm health
## expand via a kludged monmap
monmaptool mm --add d 127.0.0.1:6792
ceph-mon -c conf -i d --mkfs --monmap mm --mon-data $cwd/mon.d -k keyring
ceph-mon -c conf -i d --mon-data $cwd/mon.d
while true; do
ceph -c conf -k keyring --monmap mm health
if ceph -c conf -k keyring --monmap mm mon stat | grep d=; then
break
fi
sleep 1
done
killall ceph-mon
echo OK

@ -0,0 +1,54 @@
#!/bin/sh -ex
cwd=`pwd`
cat > conf <<EOF
[mon]
admin socket =
log file = $cwd/\$name.log
debug mon = 20
debug ms = 1
EOF
rm -f mm
monmaptool --create mm \
--add a 127.0.0.1:6789
rm -f keyring
ceph-authtool --create-keyring keyring --gen-key -n client.admin
ceph-authtool keyring --gen-key -n mon.
ceph-mon -c conf -i a --mkfs --monmap mm --mon-data $cwd/mon.a -k keyring
ceph-mon -c conf -i a --mon-data $cwd/mon.a
ceph -c conf -k keyring --monmap mm health
## expand via a kludged monmap
monmaptool mm --add d 127.0.0.1:6702
ceph-mon -c conf -i d --mkfs --monmap mm --mon-data $cwd/mon.d -k keyring
ceph-mon -c conf -i d --mon-data $cwd/mon.d
while true; do
ceph -c conf -k keyring --monmap mm health
if ceph -c conf -k keyring --monmap mm mon stat | grep 'quorum 0,1'; then
break
fi
sleep 1
done
# again
monmaptool mm --add e 127.0.0.1:6793
ceph-mon -c conf -i e --mkfs --monmap mm --mon-data $cwd/mon.e -k keyring
ceph-mon -c conf -i e --mon-data $cwd/mon.e
while true; do
ceph -c conf -k keyring --monmap mm health
if ceph -c conf -k keyring --monmap mm mon stat | grep 'quorum 0,1,2'; then
break
fi
sleep 1
done
killall ceph-mon
echo OK

@ -0,0 +1,40 @@
#!/bin/sh -ex
cwd=`pwd`
cat > conf <<EOF
[mon]
admin socket =
log file = $cwd/\$name.log
debug mon = 20
debug ms = 1
EOF
rm -f mm
ip=`host \`hostname\` | awk '{print $4}'`
monmaptool --create mm \
--add a $ip:6779
rm -f keyring
ceph-authtool --create-keyring keyring --gen-key -n client.admin
ceph-authtool keyring --gen-key -n mon.
ceph-mon -c conf -i a --mkfs --monmap mm --mon-data $cwd/mon.a -k keyring
ceph-mon -c conf -i a --mon-data $cwd/mon.a
ceph -c conf -k keyring --monmap mm health
## expand via a local_network
ceph-mon -c conf -i d --mkfs --monmap mm --mon-data $cwd/mon.d -k keyring
ceph-mon -c conf -i d --mon-data $cwd/mon.d --public-network 127.0.0.1/32
while true; do
ceph -c conf -k keyring --monmap mm health
if ceph -c conf -k keyring --monmap mm mon stat | grep 'quorum 0,1'; then
break
fi
sleep 1
done
killall ceph-mon
echo OK

29
qa/mon/bootstrap/single_host.sh Executable file
@ -0,0 +1,29 @@
#!/bin/sh -ex
cwd=`pwd`
cat > conf <<EOF
[global]
mon host = 127.0.0.1:6789
[mon]
admin socket =
log file = $cwd/\$name.log
debug mon = 20
debug ms = 1
EOF
rm -f mm
fsid=`uuidgen`
rm -f keyring
ceph-authtool --create-keyring keyring --gen-key -n client.admin
ceph-authtool keyring --gen-key -n mon.
ceph-mon -c conf -i a --mkfs --fsid $fsid --mon-data $cwd/mon.a -k keyring
ceph-mon -c conf -i a --mon-data $cwd/mon.a
ceph -c conf -k keyring health
killall ceph-mon
echo OK

@ -0,0 +1,39 @@
#!/bin/sh -ex
cwd=`pwd`
cat > conf <<EOF
[global]
[mon]
admin socket =
log file = $cwd/\$name.log
debug mon = 20
debug ms = 1
mon host = 127.0.0.1:6789 127.0.0.1:6790 127.0.0.1:6791
EOF
rm -f mm
fsid=`uuidgen`
rm -f keyring
ceph-authtool --create-keyring keyring --gen-key -n client.admin
ceph-authtool keyring --gen-key -n mon.
ceph-mon -c conf -i a --mkfs --fsid $fsid --mon-data $cwd/mon.a -k keyring --public-addr 127.0.0.1:6789
ceph-mon -c conf -i b --mkfs --fsid $fsid --mon-data $cwd/mon.b -k keyring --public-addr 127.0.0.1:6790
ceph-mon -c conf -i c --mkfs --fsid $fsid --mon-data $cwd/mon.c -k keyring --public-addr 127.0.0.1:6791
ceph-mon -c conf -i a --mon-data $cwd/mon.a
ceph-mon -c conf -i b --mon-data $cwd/mon.b
ceph-mon -c conf -i c --mon-data $cwd/mon.c
ceph -c conf -k keyring health -m 127.0.0.1
while true; do
if ceph -c conf -k keyring -m 127.0.0.1 mon stat | grep 'a,b,c'; then
break
fi
sleep 1
done
killall ceph-mon
echo OK

@ -124,17 +124,18 @@ int main(int argc, const char **argv)
}
try {
monmap.decode(monmapbl);
// always mark seed/mkfs monmap as epoch 0
monmap.set_epoch(0);
}
catch (const buffer::error& e) {
cerr << argv[0] << ": error decoding monmap " << g_conf->monmap << ": " << e.what() << std::endl;
exit(1);
}
} else {
int err = MonClient::build_initial_monmap(g_ceph_context, monmap);
int err = monmap.build_initial(g_ceph_context, cerr);
if (err < 0) {
cerr << argv[0] << ": error generating initial monmap: " << cpp_strerror(err) << std::endl;
usage();
exit(1);
cerr << argv[0] << ": warning: no initial monitors; must use admin socket to feed hints" << std::endl;
}
// am i part of the initial quorum?
@ -346,7 +347,7 @@ int main(int argc, const char **argv)
ipaddr = g_conf->public_addr;
} else {
MonMap tmpmap;
int err = MonClient::build_initial_monmap(g_ceph_context, tmpmap);
int err = tmpmap.build_initial(g_ceph_context, cerr);
if (err < 0) {
cerr << argv[0] << ": error generating initial monmap: " << cpp_strerror(err) << std::endl;
usage();

@ -299,11 +299,17 @@ bool AdminSocket::do_accept()
bool rval = false;
string firstword;
if (c.find(" ") == string::npos)
firstword = c;
else
firstword = c.substr(0, c.find(" "));
m_lock.Lock();
map<string,AdminSocketHook*>::iterator p = m_hooks.find(c);
map<string,AdminSocketHook*>::iterator p = m_hooks.find(firstword);
bufferlist out;
if (p == m_hooks.end()) {
lderr(m_cct) << "AdminSocket: request '" << c << "' not defined" << dendl;
lderr(m_cct) << "AdminSocket: request '" << firstword << "' not defined" << dendl;
} else {
bool success = p->second->call(c, out);
if (!success) {

@ -96,6 +96,7 @@ OPTION(ms_rwthread_stack_bytes, OPT_U64, 1024 << 10)
OPTION(ms_tcp_read_timeout, OPT_U64, 900)
OPTION(ms_inject_socket_failures, OPT_U64, 0)
OPTION(mon_data, OPT_STR, "/var/lib/ceph/mon/$cluster-$id")
OPTION(mon_initial_members, OPT_STR, "") // list of initial cluster mon ids; if specified, need majority to form initial quorum and create new cluster
OPTION(mon_sync_fs_threshold, OPT_INT, 5) // sync() when writing this many objects; 0 to disable.
OPTION(mon_tick_interval, OPT_INT, 5)
OPTION(mon_subscribe_interval, OPT_DOUBLE, 300)

@ -21,6 +21,9 @@
class MMonProbe : public Message {
public:
static const int HEAD_VERSION = 2;
static const int COMPAT_VERSION = 1;
enum {
OP_PROBE = 1,
OP_REPLY = 2,
@ -46,15 +49,18 @@ public:
set<int32_t> quorum;
bufferlist monmap_bl;
map<string, version_t> paxos_versions;
bool has_ever_joined;
string machine_name;
map<string, map<version_t,bufferlist> > paxos_values;
bufferlist latest_value;
version_t latest_version, newest_version, oldest_version;
MMonProbe() : Message(MSG_MON_PROBE) {}
MMonProbe(const uuid_d& f, int o, const string& n)
: Message(MSG_MON_PROBE), fsid(f), op(o), name(n),
MMonProbe()
: Message(MSG_MON_PROBE, HEAD_VERSION, COMPAT_VERSION) {}
MMonProbe(const uuid_d& f, int o, const string& n, bool hej)
: Message(MSG_MON_PROBE, HEAD_VERSION, COMPAT_VERSION),
fsid(f), op(o), name(n), has_ever_joined(hej),
latest_version(0), newest_version(0), oldest_version(0) {}
private:
~MMonProbe() {}
@ -69,6 +75,8 @@ public:
out << " versions " << paxos_versions;
if (machine_name.length())
out << " machine_name " << machine_name << " " << oldest_version << "-" << newest_version;
if (!has_ever_joined)
out << " new";
out << ")";
}
@ -85,6 +93,7 @@ public:
::encode(paxos_values, payload);
::encode(latest_value, payload);
::encode(latest_version, payload);
::encode(has_ever_joined, payload);
}
void decode_payload() {
bufferlist::iterator p = payload.begin();
@ -100,6 +109,10 @@ public:
::decode(paxos_values, p);
::decode(latest_value, p);
::decode(latest_version, p);
if (header.version >= 2)
::decode(has_ever_joined, p);
else
has_ever_joined = false;
}
};

@ -308,6 +308,13 @@ void Elector::dispatch(Message *m)
return;
}
if (!mon->monmap->contains(m->get_source_addr())) {
dout(1) << "discarding election message: " << m->get_source_addr()
<< " not in my monmap " << *mon->monmap << dendl;
m->put();
return;
}
MonMap *peermap = new MonMap;
peermap->decode(em->monmap_bl);
if (peermap->epoch > mon->monmap->epoch) {

@ -73,128 +73,10 @@ MonClient::~MonClient()
delete rotating_secrets;
}
/*
* build an initial monmap with any known monitor
* addresses.
*/
int MonClient::build_initial_monmap(CephContext *cct, MonMap &monmap)
{
const md_config_t *conf = cct->_conf;
// file?
if (!conf->monmap.empty()) {
int r;
try {
r = monmap.read(conf->monmap.c_str());
}
catch (const buffer::error &e) {
r = -EINVAL;
}
if (r >= 0)
return 0;
cerr << "unable to read/decode monmap from " << conf->monmap
<< ": " << cpp_strerror(-r) << std::endl;
return r;
}
// fsid from conf?
if (!cct->_conf->fsid.is_zero()) {
monmap.fsid = cct->_conf->fsid;
}
// -m foo?
if (!conf->mon_host.empty()) {
vector<entity_addr_t> addrs;
if (parse_ip_port_vec(conf->mon_host.c_str(), addrs)) {
for (unsigned i=0; i<addrs.size(); i++) {
char n[2];
n[0] = 'a' + i;
n[1] = 0;
if (addrs[i].get_port() == 0)
addrs[i].set_port(CEPH_MON_PORT);
string name = "noname-";
name += n;
monmap.add(name, addrs[i]);
}
return 0;
} else { //maybe they passed us a DNS-resolvable name
char *hosts = NULL;
hosts = resolve_addrs(conf->mon_host.c_str());
if (!hosts)
return -EINVAL;
bool success = parse_ip_port_vec(hosts, addrs);
free(hosts);
if (success) {
for (unsigned i=0; i<addrs.size(); i++) {
char n[2];
n[0] = 'a' + i;
n[1] = 0;
if (addrs[i].get_port() == 0)
addrs[i].set_port(CEPH_MON_PORT);
monmap.add(n, addrs[i]);
}
return 0;
} else cerr << "couldn't parse_ip_port_vec on " << hosts << std::endl;
}
cerr << "unable to parse addrs in '" << conf->mon_host << "'" << std::endl;
}
// What monitors are in the config file?
std::vector <std::string> sections;
int ret = conf->get_all_sections(sections);
if (ret) {
cerr << "Unable to find any monitors in the configuration "
<< "file, because there was an error listing the sections. error "
<< ret << std::endl;
return -ENOENT;
}
std::vector <std::string> mon_names;
for (std::vector <std::string>::const_iterator s = sections.begin();
s != sections.end(); ++s) {
if ((s->substr(0, 4) == "mon.") && (s->size() > 4)) {
mon_names.push_back(s->substr(4));
}
}
// Find an address for each monitor in the config file.
for (std::vector <std::string>::const_iterator m = mon_names.begin();
m != mon_names.end(); ++m) {
std::vector <std::string> sections;
std::string m_name("mon");
m_name += ".";
m_name += *m;
sections.push_back(m_name);
sections.push_back("mon");
sections.push_back("global");
std::string val;
int res = conf->get_val_from_conf_file(sections, "mon addr", val, true);
if (res) {
cerr << "failed to get an address for mon." << *m << ": error "
<< res << std::endl;
continue;
}
entity_addr_t addr;
if (!addr.parse(val.c_str())) {
cerr << "unable to parse address for mon." << *m
<< ": addr='" << val << "'" << std::endl;
continue;
}
if (addr.get_port() == 0)
addr.set_port(CEPH_MON_PORT);
monmap.add(m->c_str(), addr);
}
if (monmap.size() == 0) {
cerr << "unable to find any monitors in conf. "
<< "please specify monitors via -m monaddr or -c ceph.conf" << std::endl;
return -ENOENT;
}
return 0;
}
int MonClient::build_initial_monmap()
{
ldout(cct, 10) << "build_initial_monmap" << dendl;
return build_initial_monmap(cct, monmap);
return monmap.build_initial(cct, cerr);
}
int MonClient::get_monmap()

@ -181,8 +181,6 @@ public:
log_client = clog;
}
static int build_initial_monmap(CephContext *cct, MonMap &monmap);
int build_initial_monmap();
int get_monmap();
int get_monmap_privately();

@ -8,6 +8,11 @@
#include "common/Formatter.h"
#include "include/ceph_features.h"
#include "include/addr_parsing.h"
#include "common/ceph_argparse.h"
#include "common/errno.h"
#include "common/dout.h"
using ceph::Formatter;
@ -141,3 +146,173 @@ void MonMap::dump(Formatter *f) const
}
f->close_section();
}
int MonMap::build_from_host_list(std::string hostlist, std::string prefix)
{
vector<entity_addr_t> addrs;
if (parse_ip_port_vec(hostlist.c_str(), addrs)) {
for (unsigned i=0; i<addrs.size(); i++) {
char n[2];
n[0] = 'a' + i;
n[1] = 0;
if (addrs[i].get_port() == 0)
addrs[i].set_port(CEPH_MON_PORT);
string name = prefix;
name += n;
add(name, addrs[i]);
}
return 0;
}
// maybe they passed us a DNS-resolvable name
char *hosts = NULL;
hosts = resolve_addrs(hostlist.c_str());
if (!hosts)
return -EINVAL;
bool success = parse_ip_port_vec(hosts, addrs);
free(hosts);
if (!success)
return -EINVAL;
for (unsigned i=0; i<addrs.size(); i++) {
char n[2];
n[0] = 'a' + i;
n[1] = 0;
if (addrs[i].get_port() == 0)
addrs[i].set_port(CEPH_MON_PORT);
string name = prefix;
name += n;
add(name, addrs[i]);
}
return 0;
}
void MonMap::set_initial_members(CephContext *cct,
list<std::string>& initial_members,
string my_name, entity_addr_t my_addr,
set<entity_addr_t> *removed)
{
// remove non-initial members
unsigned i = 0;
while (i < size()) {
string n = get_name(i);
if (std::find(initial_members.begin(), initial_members.end(), n) != initial_members.end()) {
lgeneric_dout(cct, 1) << " keeping " << n << " " << get_addr(i) << dendl;
i++;
continue;
}
lgeneric_dout(cct, 1) << " removing " << get_name(i) << " " << get_addr(i) << dendl;
if (removed)
removed->insert(get_addr(i));
remove(n);
assert(!contains(n));
}
// add missing initial members
for (list<string>::iterator p = initial_members.begin(); p != initial_members.end(); ++p) {
if (!contains(*p)) {
if (*p == my_name) {
lgeneric_dout(cct, 1) << " adding self " << *p << " " << my_addr << dendl;
add(*p, my_addr);
} else {
entity_addr_t a;
a.set_family(AF_INET);
for (int n=1; ; n++) {
a.set_nonce(n);
if (!contains(a))
break;
}
lgeneric_dout(cct, 1) << " adding " << *p << " " << a << dendl;
add(*p, a);
}
assert(contains(*p));
}
}
}
int MonMap::build_initial(CephContext *cct, ostream& errout)
{
const md_config_t *conf = cct->_conf;
// file?
if (!conf->monmap.empty()) {
int r;
try {
r = read(conf->monmap.c_str());
}
catch (const buffer::error &e) {
r = -EINVAL;
}
if (r >= 0)
return 0;
errout << "unable to read/decode monmap from " << conf->monmap
<< ": " << cpp_strerror(-r) << std::endl;
return r;
}
// fsid from conf?
if (!cct->_conf->fsid.is_zero()) {
fsid = cct->_conf->fsid;
}
// -m foo?
if (!conf->mon_host.empty()) {
int r = build_from_host_list(conf->mon_host, "noname-");
if (r < 0)
errout << "unable to parse addrs in '" << conf->mon_host << "'" << std::endl;
}
// What monitors are in the config file?
std::vector <std::string> sections;
int ret = conf->get_all_sections(sections);
if (ret) {
errout << "Unable to find any monitors in the configuration "
<< "file, because there was an error listing the sections. error "
<< ret << std::endl;
return -ENOENT;
}
std::vector <std::string> mon_names;
for (std::vector <std::string>::const_iterator s = sections.begin();
s != sections.end(); ++s) {
if ((s->substr(0, 4) == "mon.") && (s->size() > 4)) {
mon_names.push_back(s->substr(4));
}
}
// Find an address for each monitor in the config file.
for (std::vector <std::string>::const_iterator m = mon_names.begin();
m != mon_names.end(); ++m) {
std::vector <std::string> sections;
std::string m_name("mon");
m_name += ".";
m_name += *m;
sections.push_back(m_name);
sections.push_back("mon");
sections.push_back("global");
std::string val;
int res = conf->get_val_from_conf_file(sections, "mon addr", val, true);
if (res) {
errout << "failed to get an address for mon." << *m << ": error "
<< res << std::endl;
continue;
}
entity_addr_t addr;
if (!addr.parse(val.c_str())) {
errout << "unable to parse address for mon." << *m
<< ": addr='" << val << "'" << std::endl;
continue;
}
if (addr.get_port() == 0)
addr.set_port(CEPH_MON_PORT);
add(m->c_str(), addr);
}
if (size() == 0) {
errout << "unable to find any monitors in conf. "
<< "please specify monitors via -m monaddr or -c ceph.conf" << std::endl;
return -ENOENT;
}
return 0;
}

@ -103,6 +103,14 @@ class MonMap {
calc_ranks();
}
void rename(string oldname, string newname) {
assert(contains(oldname));
assert(!contains(newname));
mon_addr[newname] = mon_addr[oldname];
mon_addr.erase(oldname);
calc_ranks();
}
bool contains(const string& name) {
return mon_addr.count(name);
}
@ -120,6 +128,13 @@ class MonMap {
assert(n < rank_name.size());
return rank_name[n];
}
string get_name(entity_addr_t a) const {
map<entity_addr_t,string>::const_iterator p = addr_name.find(a);
if (p == addr_name.end())
return string();
else
return p->second;
}
int get_rank(const string& n) {
for (unsigned i=0; i<rank_name.size(); i++)
@ -140,13 +155,6 @@ class MonMap {
return true;
}
void rename(string oldname, string newname) {
assert(contains(oldname));
assert(!contains(newname));
mon_addr[newname] = mon_addr[oldname];
mon_addr.erase(oldname);
}
const entity_addr_t& get_addr(const string& n) {
assert(mon_addr.count(n));
return mon_addr[n];
@ -155,6 +163,11 @@ class MonMap {
assert(m < rank_addr.size());
return rank_addr[m];
}
void set_addr(const string& n, entity_addr_t a) {
assert(mon_addr.count(n));
mon_addr[n] = a;
calc_ranks();
}
entity_inst_t get_inst(const string& n) {
assert(mon_addr.count(n));
int m = get_rank(n);
@ -186,6 +199,50 @@ class MonMap {
int write(const char *fn);
int read(const char *fn);
/**
* build an initial bootstrap monmap from conf
*
* Build an initial bootstrap monmap from the config. This will
* try, in this order:
*
* 1 monmap -- an explicitly provided monmap
* 2 mon_host -- list of monitors
* 3 config [mon.*] sections, and 'mon addr' fields in those sections
*
* @param cct context (and associated config)
* @param errout ostream to send error messages too
*/
int build_initial(CephContext *cct, ostream& errout);
/**
* build a monmap from a list of hosts or ips
*
* Resolve dns as needed. Give mons dummy names.
*
* @param hosts list of hosts, space or comma separated
* @param prefix prefix to prepend to generated mon names
* @return 0 for success, -errno on error
*/
int build_from_host_list(std::string hosts, std::string prefix);
/**
* filter monmap given a set of initial members.
*
* Remove mons that aren't in the initial_members list. Add missing
* mons and give them dummy IPs (blank IPv4, with a non-zero
* nonce). If the name matches my_name, then my_addr will be used in
* place of a dummy addr.
*
* @param initial_members list of initial member names
* @param my_name name of self, can be blank
* @param my_addr my addr
* @param removed optional pointer to a set that receives the addrs of removed mons
*/
void set_initial_members(CephContext *cct,
list<std::string>& initial_members,
string my_name, entity_addr_t my_addr,
set<entity_addr_t> *removed);
void print(ostream& out) const;
void print_summary(ostream& out) const;
void dump(ceph::Formatter *f) const;
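
The filtering described in the `set_initial_members()` comment above can be sketched roughly as follows. This is a simplified model, not the code from this commit: addresses are plain strings rather than `entity_addr_t`, the nonce on dummy addresses is ignored, and `filter_initial_members` and `kBlankAddr` are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <list>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for a blank entity_addr_t. The real dummy
// address also carries a non-zero nonce, which this sketch ignores.
static const std::string kBlankAddr = "0.0.0.0:0";

// Simplified model of the documented behaviour: keep only initial
// members, record the addresses of removed mons, and add placeholders
// for initial members that are not in the map yet.
void filter_initial_members(std::map<std::string, std::string>& mon_addr,
                            const std::list<std::string>& initial_members,
                            const std::string& my_name,
                            const std::string& my_addr,
                            std::set<std::string>* removed)
{
  std::set<std::string> keep(initial_members.begin(), initial_members.end());

  // Drop monitors that are not initial members, remembering their addrs.
  for (auto p = mon_addr.begin(); p != mon_addr.end(); ) {
    if (keep.count(p->first) == 0) {
      if (removed)
        removed->insert(p->second);
      p = mon_addr.erase(p);
    } else {
      ++p;
    }
  }

  // Add missing initial members; only our own entry gets a real address.
  for (const std::string& name : keep) {
    if (mon_addr.count(name) == 0)
      mon_addr[name] = (name == my_name) ? my_addr : kBlankAddr;
  }

  // Afterwards the map holds exactly the initial members.
  assert(mon_addr.size() == keep.size());
}
```

In `Monitor::init()` the removed addresses are collected into `extra_probe_peers`, so monitors dropped from the seed map are still probed during bootstrap.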


@ -49,6 +49,7 @@
#include "include/color.h"
#include "include/ceph_fs.h"
#include "include/str_list.h"
#include "OSDMonitor.h"
#include "MDSMonitor.h"
@ -93,6 +94,7 @@ Monitor::Monitor(CephContext* cct_, string nm, MonitorStore *s, Messenger *m, Mo
messenger(m),
lock("Monitor::lock"),
timer(cct_, lock),
has_ever_joined(false),
logger(NULL), cluster_logger(NULL), cluster_logger_registered(false),
monmap(map),
clog(cct_, messenger, monmap, NULL, LogClient::FLAG_MON),
@ -210,6 +212,8 @@ void Monitor::do_admin_command(string command, ostream& ss)
_mon_status(ss);
else if (command == "quorum_status")
_quorum_status(ss);
else if (command.find("add_bootstrap_peer_hint") == 0)
_add_bootstrap_peer_hint(command, ss);
else
assert(0 == "bad AdminSocket command binding");
}
@ -225,8 +229,6 @@ int Monitor::init()
{
lock.Lock();
rank = monmap->get_rank(messenger->get_myaddr());
dout(1) << "init fsid " << monmap->fsid << dendl;
assert(!logger);
@ -278,6 +280,25 @@ int Monitor::init()
dout(10) << "features " << features << dendl;
}
// have we ever joined a quorum?
has_ever_joined = store->exists_bl_ss("joined");
dout(10) << "has_ever_joined = " << (int)has_ever_joined << dendl;
if (!has_ever_joined) {
// impose initial quorum restrictions?
list<string> initial_members;
get_str_list(g_conf->mon_initial_members, initial_members);
if (initial_members.size()) {
dout(1) << " initial_members " << initial_members << ", filtering seed monmap" << dendl;
monmap->set_initial_members(g_ceph_context, initial_members, name, messenger->get_myaddr(),
&extra_probe_peers);
dout(10) << " monmap is " << *monmap << dendl;
}
}
// init paxos
for (int i = 0; i < PAXOS_NUM; ++i) {
paxos[i]->init();
@ -325,6 +346,9 @@ int Monitor::init()
r = admin_socket->register_command("quorum_status", admin_hook,
"show current quorum status");
assert(r == 0);
r = admin_socket->register_command("add_bootstrap_peer_hint", admin_hook,
"add peer address as potential bootstrap peer for cluster bringup");
assert(r == 0);
// i'm ready!
messenger->add_dispatcher_tail(this);
@ -420,7 +444,7 @@ void Monitor::bootstrap()
int newrank = monmap->get_rank(messenger->get_myaddr());
if (newrank < 0 && rank >= 0) {
// was i ever part of the quorum?
if (store->exists_bl_ss("joined")) {
if (has_ever_joined) {
dout(0) << " removed from monmap, suicide." << dendl;
exit(0);
}
@ -448,14 +472,55 @@ void Monitor::bootstrap()
reset_probe_timeout();
// i'm outside the quorum
outside_quorum.insert(name);
if (monmap->contains(name))
outside_quorum.insert(name);
// probe monitors
dout(10) << "probing other monitors" << dendl;
for (unsigned i = 0; i < monmap->size(); i++) {
if ((int)i != rank)
messenger->send_message(new MMonProbe(monmap->fsid, MMonProbe::OP_PROBE, name), monmap->get_inst(i));
messenger->send_message(new MMonProbe(monmap->fsid, MMonProbe::OP_PROBE, name, has_ever_joined),
monmap->get_inst(i));
}
for (set<entity_addr_t>::iterator p = extra_probe_peers.begin();
p != extra_probe_peers.end();
++p) {
if (*p != messenger->get_myaddr()) {
entity_inst_t i;
i.name = entity_name_t::MON(-1);
i.addr = *p;
messenger->send_message(new MMonProbe(monmap->fsid, MMonProbe::OP_PROBE, name, has_ever_joined), i);
}
}
}
void Monitor::_add_bootstrap_peer_hint(string cmd, ostream& ss)
{
dout(10) << "_add_bootstrap_peer_hint '" << cmd << "'" << dendl;
if (is_leader() || is_peon()) {
ss << "mon already active; ignoring bootstrap hint";
return;
}
size_t off = cmd.find(" ");
if (off == std::string::npos) {
ss << "syntax is 'add_bootstrap_peer_hint ip[:port]'";
return;
}
entity_addr_t addr;
const char *end = 0;
if (!addr.parse(cmd.c_str() + off + 1, &end)) {
ss << "failed to parse addr '" << (cmd.c_str() + off + 1) << "'";
return;
}
if (addr.get_port() == 0)
addr.set_port(CEPH_MON_PORT);
extra_probe_peers.insert(addr);
ss << "adding peer " << addr << " to list: " << extra_probe_peers;
}
// called by bootstrap(), or on leader|peon -> electing
@ -543,36 +608,76 @@ void Monitor::handle_probe(MMonProbe *m)
void Monitor::handle_probe_probe(MMonProbe *m)
{
dout(10) << "handle_probe_probe " << m->get_source_inst() << *m << dendl;
MMonProbe *r = new MMonProbe(monmap->fsid, MMonProbe::OP_REPLY, name);
MMonProbe *r = new MMonProbe(monmap->fsid, MMonProbe::OP_REPLY, name, has_ever_joined);
r->name = name;
r->quorum = quorum;
monmap->encode(r->monmap_bl, m->get_connection()->get_features());
for (vector<Paxos*>::iterator p = paxos.begin(); p != paxos.end(); ++p)
r->paxos_versions[(*p)->get_machine_name()] = (*p)->get_version();
messenger->send_message(r, m->get_connection());
// did we discover a peer here?
if (!monmap->contains(m->get_source_addr())) {
dout(1) << " adding peer " << m->get_source_addr() << " to list of hints" << dendl;
extra_probe_peers.insert(m->get_source_addr());
}
m->put();
}
void Monitor::handle_probe_reply(MMonProbe *m)
{
dout(10) << "handle_probe_reply " << m->get_source_inst() << *m << dendl;
dout(10) << " monmap is " << *monmap << dendl;
if (!is_probing()) {
m->put();
return;
}
// newer map?
MonMap *newmap = new MonMap;
newmap->decode(m->monmap_bl);
if (newmap->get_epoch() > monmap->get_epoch()) {
dout(10) << " got new monmap epoch " << newmap->get_epoch()
<< " > my " << monmap->get_epoch() << dendl;
monmap->decode(m->monmap_bl);
m->put();
// newer map, or they've joined a quorum and we haven't?
bufferlist mybl;
monmap->encode(mybl, m->get_connection()->get_features());
// make sure it's actually different; the checks below err toward
// taking the other guy's map, which could cause us to loop.
if (!mybl.contents_equal(m->monmap_bl)) {
MonMap *newmap = new MonMap;
newmap->decode(m->monmap_bl);
if (m->has_ever_joined && (newmap->get_epoch() > monmap->get_epoch() ||
!has_ever_joined)) {
dout(10) << " got newer/committed monmap epoch " << newmap->get_epoch()
<< ", mine was " << monmap->get_epoch() << dendl;
delete newmap;
monmap->decode(m->monmap_bl);
m->put();
bootstrap();
return;
bootstrap();
return;
}
delete newmap;
}
// rename peer?
string peer_name = monmap->get_name(m->get_source_addr());
if (monmap->get_epoch() == 0 && peer_name.find("noname-") == 0) {
dout(10) << " renaming peer " << m->get_source_addr() << " "
<< peer_name << " -> " << m->name << " in my monmap"
<< dendl;
monmap->rename(peer_name, m->name);
} else {
dout(10) << " peer name is " << peer_name << dendl;
}
// new initial peer?
if (monmap->contains(m->name)) {
if (monmap->get_addr(m->name).is_blank_ip()) {
dout(1) << " learned initial mon " << m->name << " addr " << m->get_source_addr() << dendl;
monmap->set_addr(m->name, m->get_source_addr());
m->put();
bootstrap();
return;
}
}
// is there an existing quorum?
@ -608,11 +713,12 @@ void Monitor::handle_probe_reply(MMonProbe *m)
}
}
if (ok) {
if (monmap->contains(name)) {
if (monmap->contains(name) &&
!monmap->get_addr(name).is_blank_ip()) {
// i'm part of the cluster; just initiate a new election
start_election();
} else {
dout(10) << " ready to join, but i'm not in the monmap, trying to join" << dendl;
dout(10) << " ready to join, but i'm not in the monmap or my addr is blank, trying to join" << dendl;
messenger->send_message(new MMonJoin(monmap->fsid, name, messenger->get_myaddr()),
monmap->get_inst(*m->quorum.begin()));
}
@ -623,14 +729,21 @@ void Monitor::handle_probe_reply(MMonProbe *m)
}
} else {
// not part of a quorum
outside_quorum.insert(m->name);
if (monmap->contains(m->name))
outside_quorum.insert(m->name);
else
dout(10) << " mostly ignoring mon." << m->name << ", not part of monmap" << dendl;
unsigned need = monmap->size() / 2 + 1;
dout(10) << " outside_quorum now " << outside_quorum << ", need " << need << dendl;
if (outside_quorum.size() >= need) {
dout(10) << " that's enough to form a new quorum, calling election" << dendl;
start_election();
if (outside_quorum.count(name)) {
dout(10) << " that's enough to form a new quorum, calling election" << dendl;
start_election();
} else {
dout(10) << " that's enough to form a new quorum, but it does not include me; waiting" << dendl;
}
} else {
dout(10) << " that's not yet enough for a new quorum, waiting" << dendl;
}
@ -673,7 +786,7 @@ void Monitor::slurp()
if (!pax->is_slurping()) {
pax->start_slurping();
}
MMonProbe *m = new MMonProbe(monmap->fsid, MMonProbe::OP_SLURP, name);
MMonProbe *m = new MMonProbe(monmap->fsid, MMonProbe::OP_SLURP, name, has_ever_joined);
m->machine_name = p->first;
m->oldest_version = pax->get_first_committed();
m->newest_version = pax->get_version();
@ -687,7 +800,7 @@ void Monitor::slurp()
if (!pax->is_slurping()) {
pax->start_slurping();
}
MMonProbe *m = new MMonProbe(monmap->fsid, MMonProbe::OP_SLURP_LATEST, name);
MMonProbe *m = new MMonProbe(monmap->fsid, MMonProbe::OP_SLURP_LATEST, name, has_ever_joined);
m->machine_name = p->first;
m->oldest_version = pax->get_first_committed();
m->newest_version = pax->get_version();
@ -710,7 +823,7 @@ void Monitor::slurp()
MMonProbe *Monitor::fill_probe_data(MMonProbe *m, Paxos *pax)
{
MMonProbe *r = new MMonProbe(monmap->fsid, MMonProbe::OP_DATA, name);
MMonProbe *r = new MMonProbe(monmap->fsid, MMonProbe::OP_DATA, name, has_ever_joined);
r->machine_name = m->machine_name;
r->oldest_version = pax->get_first_committed();
r->newest_version = pax->get_version();
@ -874,8 +987,13 @@ void Monitor::finish_election()
update_logger();
register_cluster_logger();
// make note of the fact that i was, once, part of the quorum.
store->put_int(1, "joined");
// am i named properly?
string cur_name = monmap->get_name(messenger->get_myaddr());
if (cur_name != name) {
dout(10) << " renaming myself from " << cur_name << " -> " << name << dendl;
messenger->send_message(new MMonJoin(monmap->fsid, name, messenger->get_myaddr()),
monmap->get_inst(*quorum.begin()));
}
}
@ -1582,10 +1700,11 @@ bool Monitor::_ms_dispatch(Message *m)
dout(0) << "MMonElection received from entity without enough caps!"
<< s->caps << dendl;
}
if (!is_probing() && !is_slurping())
if (!is_probing() && !is_slurping()) {
elector.dispatch(m);
else
} else {
m->put();
}
break;
case MSG_FORWARD:
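
The "not part of a quorum" branch of `handle_probe_reply()` above boils down to a majority check plus a check that the probing monitor is itself one of the candidates. A minimal sketch of that decision, assuming monitor names are plain strings; `ProbeDecision` and `decide_after_probe` are hypothetical names and are not part of the commit:

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical result type for the three outcomes logged by the monitor.
enum class ProbeDecision { Wait, WaitNotIncluded, StartElection };

// monmap_size:    number of monitors in the current monmap
// outside_quorum: names of monmap members known to be outside any quorum
// my_name:        name of the monitor making the decision
ProbeDecision decide_after_probe(std::size_t monmap_size,
                                 const std::set<std::string>& outside_quorum,
                                 const std::string& my_name)
{
  // A strict majority of the monmap is needed to form a quorum.
  std::size_t need = monmap_size / 2 + 1;
  if (outside_quorum.size() < need)
    return ProbeDecision::Wait;

  // Enough monitors exist, but only call an election if we are one of them.
  return outside_quorum.count(my_name) ? ProbeDecision::StartElection
                                       : ProbeDecision::WaitNotIncluded;
}
```

With three monitors in the map, two known candidates are enough; a monitor that is not itself in `outside_quorum` (for example because it is not in the monmap) keeps waiting instead of calling an election.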


@ -102,6 +102,11 @@ public:
Mutex lock;
SafeTimer timer;
/// true if we have ever joined a quorum. if false, we are either a
/// new cluster, a newly joining monitor, or a just-upgraded
/// monitor.
bool has_ever_joined;
PerfCounters *logger, *cluster_logger;
bool cluster_logger_registered;
@ -110,6 +115,8 @@ public:
MonMap *monmap;
set<entity_addr_t> extra_probe_peers;
LogClient clog;
KeyRing keyring;
KeyServer key_server;
@ -199,6 +206,12 @@ public:
epoch_t get_epoch();
int get_leader() { return leader; }
const set<int>& get_quorum() { return quorum; }
set<string> get_quorum_names() {
set<string> q;
for (set<int>::iterator p = quorum.begin(); p != quorum.end(); ++p)
q.insert(monmap->get_name(*p));
return q;
}
void bootstrap();
void reset();
@ -266,6 +279,7 @@ public:
bool _allowed_command(MonSession *s, const vector<std::string>& cmd);
void _mon_status(ostream& ss);
void _quorum_status(ostream& ss);
void _add_bootstrap_peer_hint(string cmd, ostream& ss);
void handle_command(class MMonCommand *m);
void handle_route(MRoute *m);
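
The `has_ever_joined` flag declared above changes when a probing monitor adopts a peer's monmap. The condition used in `handle_probe_reply()` can be restated as a small predicate. This is an illustrative sketch only; `should_adopt_peer_monmap` and its parameter names are hypothetical, and the encoded-map comparison is reduced to a boolean.

```cpp
#include <cassert>

// maps_differ:          encoded peer monmap is not byte-identical to ours
// peer_has_ever_joined: peer reports it has been part of a quorum
// i_have_ever_joined:   we have been part of a quorum ourselves
bool should_adopt_peer_monmap(bool maps_differ,
                              bool peer_has_ever_joined,
                              unsigned peer_epoch,
                              unsigned my_epoch,
                              bool i_have_ever_joined)
{
  // Identical maps are never adopted; otherwise we could loop forever.
  if (!maps_differ)
    return false;

  // Only trust maps from monitors that have joined a quorum, and take
  // theirs if it is newer or if we have never joined a quorum ourselves.
  return peer_has_ever_joined &&
         (peer_epoch > my_epoch || !i_have_ever_joined);
}
```

A fresh monitor therefore takes the committed map of any established peer regardless of epoch, while two fresh monitors never overwrite each other's seed maps.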


@ -39,11 +39,8 @@ static ostream& _prefix(std::ostream *_dout, Monitor *mon) {
void MonmapMonitor::create_initial()
{
bufferlist bl;
mon->store->get_bl_ss(bl, "mkfs", "monmap");
pending_map.decode(bl);
dout(10) << "create_initial set fed epoch " << pending_map.epoch << dendl;
assert(pending_map.epoch == 0); // fix mkfs()
dout(10) << "create_initial using current monmap" << dendl;
pending_map = *mon->monmap;
pending_map.epoch = 1;
}
@ -105,6 +102,19 @@ void MonmapMonitor::encode_pending(bufferlist& bl)
pending_map.encode(bl, CEPH_FEATURES_ALL);
}
void MonmapMonitor::on_active()
{
if (paxos->get_version() >= 1 && !mon->has_ever_joined) {
// make note of the fact that i was, once, part of the quorum.
dout(10) << "noting that i was, once, part of an active quorum." << dendl;
mon->store->put_int(1, "joined");
mon->has_ever_joined = true;
}
if (mon->is_leader())
mon->clog.info() << "monmap " << *mon->monmap << "\n";
}
bool MonmapMonitor::preprocess_query(PaxosServiceMessage *m)
{
switch (m->get_type()) {
@ -133,7 +143,8 @@ bool MonmapMonitor::preprocess_command(MMonCommand *m)
if (m->cmd.size() > 1) {
if (m->cmd[1] == "stat") {
mon->monmap->print_summary(ss);
ss << ", election epoch " << mon->get_epoch() << ", quorum " << mon->get_quorum();
ss << ", election epoch " << mon->get_epoch() << ", quorum " << mon->get_quorum()
<< " " << mon->get_quorum_names();
r = 0;
}
else if (m->cmd.size() == 2 && m->cmd[1] == "getmap") {
@ -263,12 +274,6 @@ bool MonmapMonitor::prepare_update(PaxosServiceMessage *m)
return false;
}
void MonmapMonitor::on_active()
{
if (mon->is_leader())
mon->clog.info() << "monmap " << *mon->monmap << "\n";
}
bool MonmapMonitor::prepare_command(MMonCommand *m)
{
stringstream ss;
@ -331,12 +336,12 @@ bool MonmapMonitor::preprocess_join(MMonJoin *join)
{
dout(10) << "preprocess_join " << join->name << " at " << join->addr << dendl;
if (pending_map.contains(join->name)) {
if (pending_map.contains(join->name) && !pending_map.get_addr(join->name).is_blank_ip()) {
dout(10) << " already have " << join->name << dendl;
join->put();
return true;
}
if (pending_map.contains(join->addr)) {
if (pending_map.contains(join->addr) && pending_map.get_name(join->addr) == join->name) {
dout(10) << " already have " << join->addr << dendl;
join->put();
return true;
@ -345,7 +350,11 @@ bool MonmapMonitor::preprocess_join(MMonJoin *join)
}
bool MonmapMonitor::prepare_join(MMonJoin *join)
{
dout(0) << "adding " << join->name << " at " << join->addr << " to monitor cluster" << dendl;
dout(0) << "adding/updating " << join->name << " at " << join->addr << " to monitor cluster" << dendl;
if (pending_map.contains(join->name))
pending_map.remove(join->name);
if (pending_map.contains(join->addr))
pending_map.remove(pending_map.get_name(join->addr));
pending_map.add(join->name, join->addr);
pending_map.last_changed = ceph_clock_now(g_ceph_context);
join->put();
@ -370,7 +379,7 @@ void MonmapMonitor::get_health(list<pair<health_status_t, string> >& summary,
int actual = mon->get_quorum().size();
if (actual < max) {
ostringstream ss;
ss << (max-actual) << " mons down, quorum " << mon->get_quorum();
ss << (max-actual) << " mons down, quorum " << mon->get_quorum() << " " << mon->get_quorum_names();
summary.push_back(make_pair(HEALTH_WARN, ss.str()));
if (detail) {
set<int> q = mon->get_quorum();


@ -53,7 +53,6 @@ class MonmapMonitor : public PaxosService {
void on_active();
bool preprocess_query(PaxosServiceMessage *m);
bool prepare_update(PaxosServiceMessage *m);


@ -26,10 +26,11 @@ using namespace std;
#include "common/ceph_argparse.h"
#include "global/global_init.h"
#include "mon/MonMap.h"
#include "include/str_list.h"
void usage()
{
cout << " usage: [--print] [--create [--clobber][--fsid uuid]] [--add name 1.2.3.4:567] [--rm name] <mapfilename>" << std::endl;
cout << " usage: [--print] [--create [--clobber][--fsid uuid]] [--generate] [--set-initial-members] [--add name 1.2.3.4:567] [--rm name] <mapfilename>" << std::endl;
exit(1);
}
@ -45,6 +46,8 @@ int main(int argc, const char **argv)
bool create = false;
bool clobber = false;
bool modified = false;
bool generate = false;
bool filter = false;
map<string,entity_addr_t> add;
list<string> rm;
@ -63,6 +66,10 @@ int main(int argc, const char **argv)
create = true;
} else if (ceph_argparse_flag(args, i, "--clobber", (char*)NULL)) {
clobber = true;
} else if (ceph_argparse_flag(args, i, "--generate", (char*)NULL)) {
generate = true;
} else if (ceph_argparse_flag(args, i, "--set-initial-members", (char*)NULL)) {
filter = true;
} else if (ceph_argparse_flag(args, i, "--add", (char*)NULL)) {
string name = *i;
i = args.erase(i);
@ -130,6 +137,28 @@ int main(int argc, const char **argv)
}
modified = true;
}
if (generate) {
int r = monmap.build_initial(g_ceph_context, cerr);
if (r < 0)
return r;
}
if (filter) {
// apply initial members
list<string> initial_members;
get_str_list(g_conf->mon_initial_members, initial_members);
if (initial_members.size()) {
cout << "initial_members " << initial_members << ", filtering seed monmap" << std::endl;
set<entity_addr_t> removed;
monmap.set_initial_members(g_ceph_context, initial_members,
string(), entity_addr_t(),
&removed);
cout << "removed " << removed << std::endl;
}
modified = true;
}
if (!g_conf->fsid.is_zero()) {
monmap.fsid = g_conf->fsid;
cout << me << ": set fsid to " << monmap.fsid << std::endl;
@ -159,9 +188,6 @@ int main(int argc, const char **argv)
if (!print && !modified)
usage();
if (modified && !create)
monmap.epoch++;
if (print)
monmap.print(cout);


@ -600,7 +600,7 @@ static int do_kernel_add(const char *poolname, const char *imgname,
const char *user)
{
MonMap monmap;
int r = MonClient::build_initial_monmap(g_ceph_context, monmap);
int r = monmap.build_initial(g_ceph_context, cerr);
if (r < 0)
return r;


@ -7,16 +7,16 @@
$ monmaptool --add foo 2.3.4.5:6789 mymonmap
monmaptool: monmap file mymonmap
monmaptool: writing epoch 1 to mymonmap (1 monitors)
monmaptool: writing epoch 0 to mymonmap (1 monitors)
$ monmaptool --add foo 3.4.5.6:7890 mymonmap
monmaptool: monmap file mymonmap
monmaptool: map already contains mon.foo
usage: [--print] [--create [--clobber][--fsid uuid]] [--add name 1.2.3.4:567] [--rm name] <mapfilename>
usage: [--print] [--create [--clobber][--fsid uuid]] [--generate] [--set-initial-members] [--add name 1.2.3.4:567] [--rm name] <mapfilename>
[1]
$ monmaptool --print mymonmap
monmaptool: monmap file mymonmap
epoch 1
epoch 0
fsid [0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12} (re)
last_changed \d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+ (re)
created \d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+ (re)


@ -7,17 +7,17 @@
$ monmaptool --add foo 2.3.4.5:6789 mymonmap
monmaptool: monmap file mymonmap
monmaptool: writing epoch 1 to mymonmap (1 monitors)
monmaptool: writing epoch 0 to mymonmap (1 monitors)
$ monmaptool --add bar 3.4.5.6:7890 mymonmap
monmaptool: monmap file mymonmap
monmaptool: writing epoch 2 to mymonmap (2 monitors)
monmaptool: writing epoch 0 to mymonmap (2 monitors)
$ monmaptool --add baz 4.5.6.7:8901 mymonmap
monmaptool: monmap file mymonmap
monmaptool: writing epoch 3 to mymonmap (3 monitors)
monmaptool: writing epoch 0 to mymonmap (3 monitors)
$ monmaptool --print mymonmap
monmaptool: monmap file mymonmap
epoch 3
epoch 0
fsid [0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12} (re)
last_changed \d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+ (re)
created \d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+ (re)


@ -1,3 +1,3 @@
$ monmaptool --help
usage: [--print] [--create [--clobber][--fsid uuid]] [--add name 1.2.3.4:567] [--rm name] <mapfilename>
usage: [--print] [--create [--clobber][--fsid uuid]] [--generate] [--set-initial-members] [--add name 1.2.3.4:567] [--rm name] <mapfilename>
[1]


@ -9,7 +9,7 @@
monmaptool: monmap file mymonmap
monmaptool: removing doesnotexist
monmaptool: map does not contain doesnotexist
usage: [--print] [--create [--clobber][--fsid uuid]] [--add name 1.2.3.4:567] [--rm name] <mapfilename>
usage: [--print] [--create [--clobber][--fsid uuid]] [--generate] [--set-initial-members] [--add name 1.2.3.4:567] [--rm name] <mapfilename>
[1]
$ monmaptool --print mymonmap

View File

@ -8,11 +8,11 @@
$ monmaptool --rm foo mymonmap
monmaptool: monmap file mymonmap
monmaptool: removing foo
monmaptool: writing epoch 1 to mymonmap (0 monitors)
monmaptool: writing epoch 0 to mymonmap (0 monitors)
$ monmaptool --print mymonmap
monmaptool: monmap file mymonmap
epoch 1
epoch 0
fsid [0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12} (re)
last_changed \d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+ (re)
created \d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+ (re)


@ -1,4 +1,4 @@
$ monmaptool
monmaptool: must specify monmap filename
usage: [--print] [--create [--clobber][--fsid uuid]] [--add name 1.2.3.4:567] [--rm name] <mapfilename>
usage: [--print] [--create [--clobber][--fsid uuid]] [--generate] [--set-initial-members] [--add name 1.2.3.4:567] [--rm name] <mapfilename>
[1]


@ -76,7 +76,7 @@ static void parse_cmd_args(vector<const char*> &args,
*admin_socket = val;
if (i == args.end())
usage();
*admin_socket_cmd = *i;
*admin_socket_cmd = *i++;
} else if (ceph_argparse_flag(args, i, "-s", "--status", (char*)NULL)) {
*mode = CEPH_TOOL_MODE_STATUS;
} else if (ceph_argparse_flag(args, i, "-w", "--watch", (char*)NULL)) {
@ -98,6 +98,9 @@ static void parse_cmd_args(vector<const char*> &args,
} else if (ceph_argparse_flag(args, i, "-h", "--help", (char*)NULL)) {
usage();
} else {
if (admin_socket_cmd && admin_socket_cmd->length()) {
*admin_socket_cmd += " " + string(*i);
}
++i;
}
}
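
The change above lets the tool pass a multi-word admin socket command, which `_add_bootstrap_peer_hint()` then splits at the first space. A rough sketch of both halves follows, under several assumptions: the address is handled as a plain `host[:port]` string (IPv4 only) rather than parsed into an `entity_addr_t`, the default port 6789 stands in for `CEPH_MON_PORT`, and `join_admin_command` and `parse_peer_hint` are hypothetical helper names.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Assumed default monitor port; stands in for CEPH_MON_PORT.
static const int kDefaultMonPort = 6789;

// Client side: join the command word and its arguments with single spaces.
std::string join_admin_command(const std::vector<std::string>& words)
{
  std::string cmd;
  for (const std::string& w : words) {
    if (!cmd.empty())
      cmd += " ";
    cmd += w;
  }
  return cmd;
}

// Monitor side: split "add_bootstrap_peer_hint ip[:port]" into host and
// port. Returns false on a syntax error; a missing or zero port falls
// back to the default.
bool parse_peer_hint(const std::string& cmd, std::string* host, int* port)
{
  assert(host && port);
  std::size_t off = cmd.find(' ');
  if (off == std::string::npos)
    return false;                          // no argument given
  std::string arg = cmd.substr(off + 1);
  std::size_t colon = arg.find(':');
  std::string h = arg.substr(0, colon);    // whole string if no ':'
  int p = 0;
  if (colon != std::string::npos) {
    std::string digits = arg.substr(colon + 1);
    if (digits.empty() || digits.size() > 5)
      return false;
    for (char c : digits) {
      if (!std::isdigit(static_cast<unsigned char>(c)))
        return false;
      p = p * 10 + (c - '0');
    }
  }
  if (h.empty())
    return false;
  *host = h;
  *port = (p == 0) ? kDefaultMonPort : p;
  return true;
}
```

As in the monitor code, a hint without a port is accepted and completed with the default monitor port.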