A multipath device is detected because there is a
/sys/dev/block/M:m/dm/uuid file with the mpath- prefix (or part\w+-mpath
prefix).
When ceph-disk prepares data or journal devices on a multipath device,
it sets the partition typecode to MPATH_JOURNAL_UUID, MPATH_OSD_UUID and
MPATH_TOBE_UUID to
a) help the udev rules distinguish them from other devices in
devicemapper
b) allow ceph-disk to fail if an attempt is made to activate a device
with this type without accessing it via a multipath device
The 95-ceph-osd.rules call ceph-disk activate on partitions of type
MPATH_JOURNAL_UUID, MPATH_OSD_UUID. It relies on ceph-disk to do nothing
if the device is not accessed via multipath.
http://tracker.ceph.com/issues/11881Fixes: #11881
Signed-off-by: Loic Dachary <ldachary@redhat.com>
The dm-* devices are not excluded and will have by-partuuid symlinks
etc. This will include devices managed by multipath as well as
others. Since this only is used on partitions:
# ignore partitions that span the entire disk
TEST=="whole_disk", GOTO="persistent_storage_end_two"
It may create symlinks for dm-* devices that are unrelated to Ceph and
we assume this is going to be ok.
Signed-off-by: Loic Dachary <ldachary@redhat.com>
The udev(7) man page states:
RUN
...
This can only be used for very short-running foreground tasks. Running
an event process for a long period of time may block all further
events for this or a dependent device.
Starting daemons or other long-running processes is not appropriate
for udev; the forked processes, detached or not, will be
unconditionally killed after the event handling has finished.
ceph-disk activate is far from a short-running task:
- check whether path is a block dev, for dirs call through to
activate_dir()
- call blkid to obtain the filesystem type for the block dev
- pull mount options from hard-coded ceph.conf file
- mount the OSD dev at a temporary path
- check the ceph magic for mounted filesystem
- read cluster uuid and locate corresponding /etc/ceph/{cluster}.conf
path
- read or generate (if missing) the OSD uuid
- create a file indicating init system usage (systemd)
- mount the device at a second (final) location
- umount (lazy) the temporary mount path
- enable the systemd ceph-osd@{osd_id} service
- start the systemd ceph-osd@{osd_id} service
This logic is therefore best left in a systemd service for execution. As
it is less limited in terms of execution time, and also allows for
improved event handling in future (fsck, dmcrypt mapping etc.).
This change sees 95-ceph-osd.rules.systemd trigger ceph-disk activate or
ceph-disk activate-journal via new ceph-disk-activate-journal@.service,
ceph-disk-activate@.service and ceph-disk-dmcrypt-activate@.service
systemd service files.
ceph-disk-dmcrypt-activate@.service makes use of the newly added
--dmcrypt parameter for ceph-disk activate.
Signed-off-by: David Disseldorp <ddiss@suse.de>
LUKS allows for validation of the key at mount time (rather than
simply mounting a random partition), specification of the encryption
parameters in the header and key rollover of the slot key (the one
that needs to be stored).
New parameters 'osd cryptsetup parameters' and 'osd dmcrypt key size' are
added. These allow these important policy choices to be overridden or
kept consistent per-site.
The previous default plain mode (rather than using LUKS) remains, select
LUKS by setting 'osd dmcrypt type = luks'
Signed-off-by: Andrew Bartlett <abartlet@catalyst.net.nz>
Activate an osd via its journal device. udev populates its symlinks and
triggers events in an order that is not related to whether the device is
an osd data partition or a journal. That means that triggering
'ceph-disk activate' can happen before the journal (or journal symlink)
is present and then fail.
Similarly, it may be that they are on different disks that are hotplugged
with the journal second.
This can be wired up to the journal partition type to ensure that osds are
started when the journal appears second.
Include the udev rules to trigger this.
Signed-off-by: Sage Weil <sage@inktank.com>
Wheezy's udev (175-7.2) has broken rules for the /dev/disk/by-partuuid/
symlinks that ceph-disk relies on. Install parallel rules that work. On
new udev, this is harmless; old older udev, this will make life better.
Fixes: #4865
Backport: cuttlefish
Signed-off-by: Sage Weil <sage@inktank.com>
Two fixes for Centos 6.3 and other systems with udev versions
prior to 172. The disk peristant name using the GPT UUID does
not exist, so use the by_path persistent name instead for the
journal symlink.
The gpt label fields are not available for use in udev rules. Add
ceph-disk-udev wrapper script that extracts the partition
type guid from the label and calls ceph-disk-activate if it is
a ceph guid type. (Bug #4632)
Signed-off-by: Gary Lowell <gary.lowell@inktank.com>
Automatically map encrypted journal partitions.
For encrypted OSD partitions, map them, wait for the mapped device to
appear, and then ceph-disk-activate.
This is much simpler than doing the work in ceph-disk-activate.
Signed-off-by: Sage Weil <sage@inktank.com>
Below is a patch which makes the ceph-rbdnamer script more robust and
fixes a problem with the rbd udev rules.
On our setup we encountered a symlink which was linked to the wrong rbd:
/dev/rbd/mypool/myrbd -> /dev/rbd1
While that link should have gone to /dev/rbd3 (on which a
partition /dev/rbd3p1 was present).
Now the old udev rule passes %n to the ceph-rbdnamer script, the problem
with %n is that %n results in a value of 3 (for rbd3), but in a value of
1 (for rbd3p1), so it seems it can't be depended upon for rbdnaming.
In the patch below the ceph-rbdnamer script is made more robust and it
now it can be called in various ways:
/usr/bin/ceph-rbdnamer /dev/rbd3
/usr/bin/ceph-rbdnamer /dev/rbd3p1
/usr/bin/ceph-rbdnamer rbd3
/usr/bin/ceph-rbdnamer rbd3p1
/usr/bin/ceph-rbdnamer 3
Even with all these different styles of calling the modified script, it
should now return the same rbdname. This change "has" to be combined
with calling it from udev with %k though.
With that fixed, we hit the second problem. We ended up with:
/dev/rbd/mypool/myrbd -> /dev/rbd3p1
So the rbdname was symlinked to the partition on the rbd instead of the
rbd itself. So what probably went wrong is udev discovering the disk and
running ceph-rbdnamer which resolved it to myrbd so the following
symlink was created:
/dev/rbd/mypool/myrbd -> /dev/rbd3
However partitions would be discovered next and ceph-rbdnamer would be
run with rbd3p1 (%k) as parameter, resulting in the name myrbd too, with
the previous correct symlink being overwritten with a faulty one:
/dev/rbd/mypool/myrbd -> /dev/rbd3p1
The solution to the problem is in differentiating between disks and
partitions in udev and handling them slightly differently. So with the
patch below partitions now get their own symlinks in the following style
(which is fairly consistent with other udev rules):
/dev/rbd/mypool/myrbd-part1 -> /dev/rbd3p1
Please let me know any feedback you have on this patch or the approach
used.
Regards,
Pascal de Bruijn
Unilogic B.V.
Signed-off-by: Pascal de Bruijn <pascal@unilogicnetworks.net>
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
The device number depends on how many rbd images have been
mapped. Removing it makes the name determined solely by the name,
image, and snapshot that are mapped, for ease of scripting or persistence
across reboots.
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>