ceph-disk activate-all walks /dev/disk/by-parttypeuuid at boot time. It
is not necessary when udev fires an ADD event for each partition and
95-ceph-osd.rules gets a chance to activate a ceph disk or journal.
There are various reasons why udev ADD events may not be fired at
boot (for instance Debian Jessie 8.5 never does it and CentOS 7.2 seems
to be racy in that regard when an LVM root is being used).
Populating /dev/disk/by-parttypeuuid fixes ceph-disk activate-all, which
would not work without it. It also guarantees that disks are activated
at boot time regardless of whether udev fires ADD events at the right
time (or at all).
The new udev file is a partial resurrection of the
60-ceph-partuuid-workaround-rules that was removed by
9f77244b8e0782921663e52005b725cca58a8753. It is given a name that
reflects its new purpose.
Fixes: http://tracker.ceph.com/issues/16351
Signed-off-by: Loic Dachary <loic@dachary.org>
These were added to get /dev/disk/by-partuuid/ symlinks to work on
wheezy. They are no longer needed for the supported distros (el7+,
jessie+, trusty+), and they apparently break dm by opening devices they
should not.
Fixes: http://tracker.ceph.com/issues/15516
Signed-off-by: Sage Weil <sage@redhat.com>
Instead of storing the dmcrypt keys in the /etc/ceph/dmcrypt-keys
directory, they are stored in the monitor. If a machine with
OSDs created with ceph-disk prepare --dmcrypt is lost, it does
not contain the keys needed to decrypt their content.
The dmcrypt key is retrieved from the monitor using a different keyring
for each OSD. It is stored in a small partition called the lockbox. At
boot time the lockbox is mounted at
/var/lib/ceph/osd-lockbox/$uuid
and used when the $uuid partition is detected by udev to map it with
cryptsetup.
The OSDs that were prepared prior to the lockbox implementation are
supported by looking up the key found in /etc/ceph/dmcrypt-keys before
looking in /var/lib/ceph/osd-lockbox/$uuid.
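The lookup order for legacy and lockbox OSDs can be sketched in Python
as follows. This is only an illustration under stated assumptions, not
the ceph-disk code: the function name locate_key_source is hypothetical,
and so is the assumption that a legacy key is a file named after the
partition uuid directly inside the keys directory.

```python
import os


def locate_key_source(uuid,
                      legacy_dir="/etc/ceph/dmcrypt-keys",
                      lockbox_root="/var/lib/ceph/osd-lockbox"):
    """Return ("legacy", path) or ("lockbox", path), or None if neither exists.

    The legacy directory is consulted first so that OSDs prepared before
    the lockbox existed keep working.
    """
    # Assumption: one key file per partition, named after its uuid.
    legacy_key = os.path.join(legacy_dir, uuid)
    if os.path.isfile(legacy_key):
        return ("legacy", legacy_key)

    # Otherwise fall back to the mounted lockbox for this partition.
    lockbox = os.path.join(lockbox_root, uuid)
    if os.path.isdir(lockbox):
        return ("lockbox", lockbox)

    return None
```

Both directories are parameters so the ordering can be exercised against
a scratch tree instead of the real /etc and /var paths.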
http://tracker.ceph.com/issues/14669 Fixes: #14669
Signed-off-by: Loic Dachary <loic@dachary.org>
Copy paste the journal code and s/journal/block/
More work will be needed to support multiple auxiliary
devices (block.wal etc). But the goal is to minimize the change because
this commit is part of a series of commits focusing on refactoring
prepare, not the entire ceph-disk codebase.
Signed-off-by: Loic Dachary <loic@dachary.org>
On a udev change event the owner of the device switches back to the
default. If that happens on a journal while an OSD is being activated,
it will fail with permission denied.
Make sure all ceph device types are chowned to ceph on udev change.
http://tracker.ceph.com/issues/13000 Fixes: #13000
Signed-off-by: Loic Dachary <ldachary@redhat.com>
A multipath device is detected because there is a
/sys/dev/block/M:m/dm/uuid file with the mpath- prefix (or part\w+-mpath
prefix).
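A minimal Python sketch of that detection, for illustration only. The
helper names uuid_is_mpath and is_mpath and the sys_root parameter are
assumptions made for this example; the regular expression is a direct
transcription of the two prefixes named above.

```python
import os
import re

# "mpath-" for the multipath device itself, "part\w+-mpath" for a
# partition on top of a multipath device.
MPATH_UUID_RE = re.compile(r"(?:mpath-|part\w+-mpath)")


def uuid_is_mpath(dm_uuid):
    """True if a device-mapper uuid string carries a multipath prefix."""
    return MPATH_UUID_RE.match(dm_uuid.strip()) is not None


def is_mpath(major, minor, sys_root="/sys"):
    """Check /sys/dev/block/M:m/dm/uuid for the given device numbers."""
    path = os.path.join(sys_root, "dev", "block",
                        "%d:%d" % (major, minor), "dm", "uuid")
    try:
        with open(path) as f:
            return uuid_is_mpath(f.read())
    except OSError:
        # No dm/uuid file: not a device-mapper device, hence not multipath.
        return False
```

sys_root exists only so the function can be pointed at a fake sysfs
tree when testing.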
When ceph-disk prepares data or journal devices on a multipath device,
it sets the partition typecode to MPATH_JOURNAL_UUID, MPATH_OSD_UUID or
MPATH_TOBE_UUID in order to
a) help the udev rules distinguish them from other devices in
devicemapper
b) allow ceph-disk to fail if an attempt is made to activate a device
with this type without accessing it via a multipath device
The 95-ceph-osd.rules call ceph-disk activate on partitions of type
MPATH_JOURNAL_UUID or MPATH_OSD_UUID. They rely on ceph-disk to do
nothing if the device is not accessed via multipath.
http://tracker.ceph.com/issues/11881 Fixes: #11881
Signed-off-by: Loic Dachary <ldachary@redhat.com>
The dm-* devices are not excluded and will have by-partuuid symlinks
etc. This will include devices managed by multipath as well as
others. This only applies to partitions, because of:
# ignore partitions that span the entire disk
TEST=="whole_disk", GOTO="persistent_storage_end_two"
It may create symlinks for dm-* devices that are unrelated to Ceph and
we assume this is going to be ok.
Signed-off-by: Loic Dachary <ldachary@redhat.com>
The udev(7) man page states:
RUN
...
This can only be used for very short-running foreground tasks. Running
an event process for a long period of time may block all further
events for this or a dependent device.
Starting daemons or other long-running processes is not appropriate
for udev; the forked processes, detached or not, will be
unconditionally killed after the event handling has finished.
ceph-disk activate is far from a short-running task:
- check whether path is a block dev, for dirs call through to
activate_dir()
- call blkid to obtain the filesystem type for the block dev
- pull mount options from hard-coded ceph.conf file
- mount the OSD dev at a temporary path
- check the ceph magic for mounted filesystem
- read cluster uuid and locate corresponding /etc/ceph/{cluster}.conf
path
- read or generate (if missing) the OSD uuid
- create a file indicating init system usage (systemd)
- mount the device at a second (final) location
- umount (lazy) the temporary mount path
- enable the systemd ceph-osd@{osd_id} service
- start the systemd ceph-osd@{osd_id} service
This logic is therefore best left to a systemd service, which is less
limited in terms of execution time and also allows for improved event
handling in future (fsck, dmcrypt mapping etc.).
This change sees 95-ceph-osd.rules.systemd trigger ceph-disk activate or
ceph-disk activate-journal via new ceph-disk-activate-journal@.service,
ceph-disk-activate@.service and ceph-disk-dmcrypt-activate@.service
systemd service files.
ceph-disk-dmcrypt-activate@.service makes use of the newly added
--dmcrypt parameter for ceph-disk activate.
Signed-off-by: David Disseldorp <ddiss@suse.de>
LUKS allows for validation of the key at mount time (rather than
simply mounting a random partition), specification of the encryption
parameters in the header and key rollover of the slot key (the one
that needs to be stored).
New parameters 'osd cryptsetup parameters' and 'osd dmcrypt key size' are
added. They allow these important policy choices to be overridden or
kept consistent per-site.
The previous default plain mode (rather than using LUKS) remains; select
LUKS by setting 'osd dmcrypt type = luks'.
Signed-off-by: Andrew Bartlett <abartlet@catalyst.net.nz>
Activate an osd via its journal device. udev populates its symlinks and
triggers events in an order that is not related to whether the device is
an osd data partition or a journal. That means that triggering
'ceph-disk activate' can happen before the journal (or journal symlink)
is present and then fail.
Similarly, it may be that they are on different disks that are hotplugged
with the journal second.
This can be wired up to the journal partition type to ensure that osds are
started when the journal appears second.
Include the udev rules to trigger this.
Signed-off-by: Sage Weil <sage@inktank.com>
Wheezy's udev (175-7.2) has broken rules for the /dev/disk/by-partuuid/
symlinks that ceph-disk relies on. Install parallel rules that work. On
new udev, this is harmless; on older udev, this will make life better.
Fixes: #4865
Backport: cuttlefish
Signed-off-by: Sage Weil <sage@inktank.com>
Two fixes for CentOS 6.3 and other systems with udev versions
prior to 172. The persistent disk name using the GPT UUID does
not exist, so use the by_path persistent name instead for the
journal symlink.
The gpt label fields are not available for use in udev rules. Add
a ceph-disk-udev wrapper script that extracts the partition
type guid from the label and calls ceph-disk-activate if it is
a ceph guid type. (Bug #4632)
Signed-off-by: Gary Lowell <gary.lowell@inktank.com>
Automatically map encrypted journal partitions.
For encrypted OSD partitions, map them, wait for the mapped device to
appear, and then ceph-disk-activate.
This is much simpler than doing the work in ceph-disk-activate.
Signed-off-by: Sage Weil <sage@inktank.com>
Below is a patch which makes the ceph-rbdnamer script more robust and
fixes a problem with the rbd udev rules.
On our setup we encountered a symlink which was linked to the wrong rbd:
/dev/rbd/mypool/myrbd -> /dev/rbd1
That link should have pointed to /dev/rbd3 (on which a
partition /dev/rbd3p1 was present).
The old udev rule passes %n to the ceph-rbdnamer script. The problem is
that %n yields 3 for rbd3 but 1 for rbd3p1, so it cannot be relied upon
for rbd naming.
In the patch below the ceph-rbdnamer script is made more robust and it
can now be called in various ways:
/usr/bin/ceph-rbdnamer /dev/rbd3
/usr/bin/ceph-rbdnamer /dev/rbd3p1
/usr/bin/ceph-rbdnamer rbd3
/usr/bin/ceph-rbdnamer rbd3p1
/usr/bin/ceph-rbdnamer 3
Even with all these different styles of calling the modified script, it
should now return the same rbdname. This change "has" to be combined
with calling it from udev with %k though.
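To illustrate the idea (this is a Python sketch, not the ceph-rbdnamer
script itself, and the function name rbd_device_number is hypothetical),
all five calling styles can be reduced to the same rbd device number
before the name is looked up:

```python
import re

# Optional "/dev/" prefix, optional "rbd" prefix, the device number,
# then an optional partition suffix such as "p1".
_RBD_ARG_RE = re.compile(r"^(?:/dev/)?(?:rbd)?(\d+)(?:p\d+)?$")


def rbd_device_number(arg):
    """Return the rbd device number (as a string) for any accepted form."""
    match = _RBD_ARG_RE.match(arg)
    if match is None:
        raise ValueError("not an rbd device: %r" % (arg,))
    return match.group(1)


if __name__ == "__main__":
    for form in ("/dev/rbd3", "/dev/rbd3p1", "rbd3", "rbd3p1", "3"):
        print(form, "->", rbd_device_number(form))
```

Because the partition suffix is discarded, rbd3 and rbd3p1 resolve to
the same device, which is exactly what %n could not provide.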
With that fixed, we hit the second problem. We ended up with:
/dev/rbd/mypool/myrbd -> /dev/rbd3p1
So the rbdname was symlinked to the partition on the rbd instead of the
rbd itself. So what probably went wrong is udev discovering the disk and
running ceph-rbdnamer which resolved it to myrbd so the following
symlink was created:
/dev/rbd/mypool/myrbd -> /dev/rbd3
However partitions would be discovered next and ceph-rbdnamer would be
run with rbd3p1 (%k) as parameter, resulting in the name myrbd too, with
the previous correct symlink being overwritten with a faulty one:
/dev/rbd/mypool/myrbd -> /dev/rbd3p1
The solution to the problem is in differentiating between disks and
partitions in udev and handling them slightly differently. So with the
patch below partitions now get their own symlinks in the following style
(which is fairly consistent with other udev rules):
/dev/rbd/mypool/myrbd-part1 -> /dev/rbd3p1
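The naming scheme can be expressed as a small Python sketch. This is an
illustration only; in the patch the distinction is made in the udev
rules, and the function name rbd_symlink is hypothetical.

```python
import re

_KERNEL_NAME_RE = re.compile(r"^rbd\d+(?:p(\d+))?$")


def rbd_symlink(kernel_name, pool, image):
    """Return the symlink path for a disk (rbd3) or a partition (rbd3p1)."""
    match = _KERNEL_NAME_RE.match(kernel_name)
    if match is None:
        raise ValueError("not an rbd kernel name: %r" % (kernel_name,))
    base = "/dev/rbd/%s/%s" % (pool, image)
    partition = match.group(1)
    if partition is None:
        # Whole disk: plain image name.
        return base
    # Partition: same name with a -partN suffix, so it can never
    # overwrite the symlink of the disk itself.
    return "%s-part%s" % (base, partition)
```

With this split, rbd3 maps to /dev/rbd/mypool/myrbd and rbd3p1 maps to
/dev/rbd/mypool/myrbd-part1.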
Please let me know any feedback you have on this patch or the approach
used.
Regards,
Pascal de Bruijn
Unilogic B.V.
Signed-off-by: Pascal de Bruijn <pascal@unilogicnetworks.net>
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
The device number depends on how many rbd images have been
mapped. Removing it makes the name determined solely by the pool,
image, and snapshot that are mapped, for ease of scripting or persistence
across reboots.
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>