On kernels that support it, and if 'rbd map' is given a chance to
modprobe, turn on the single-major device number allocation scheme.
Users who for some reason don't want it can work around this by loading
the rbd module manually before executing the first 'rbd map' command.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
With the preparatory commits ("rbd: match against wholedisk device
numbers on unmap" and "rbd: match against both major and minor on unmap
on kernels >= 3.14") in, this amounts to choosing to work with the new
rbd bus interfaces (/sys/bus/rbd/{add,remove}_single_major) if they are
available, instead of the old ones (/sys/bus/rbd/{add,remove}).
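For illustration only, a minimal Python sketch of the selection described
above. The function name pick_rbd_interfaces and the bus_dir parameter are
hypothetical and not taken from the rbd tool; only the sysfs paths come from
this message.

```python
import os

RBD_BUS = "/sys/bus/rbd"  # bus directory named in the commit message


def pick_rbd_interfaces(bus_dir=RBD_BUS):
    """Return (add_path, remove_path), preferring the single-major files.

    Hypothetical helper: falls back to the old add/remove files when the
    *_single_major files are not both present.
    """
    add_sm = os.path.join(bus_dir, "add_single_major")
    remove_sm = os.path.join(bus_dir, "remove_single_major")
    if os.path.exists(add_sm) and os.path.exists(remove_sm):
        return add_sm, remove_sm
    return os.path.join(bus_dir, "add"), os.path.join(bus_dir, "remove")
```

The bus_dir parameter exists only so the sketch can be exercised against a
temporary directory instead of the real sysfs tree.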
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
As described in commit "rbd: match against wholedisk device numbers on
unmap", currently we only match against major numbers. In preparation
for supporting the single-major device number allocation scheme, also
match against minor numbers, which newer kernels provide in
a /sys/bus/rbd/devices/<id>/minor sysfs attribute.
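For illustration only, a Python sketch of matching on both attributes. The
helper names are hypothetical, and falling back to a major-only match when
the minor attribute is absent is an assumption about how older kernels would
be handled, not a description of the actual patch.

```python
import os


def _read_int(path):
    """Return the integer in a sysfs attribute, or None if absent or unparsable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def find_rbd_id(major, minor, devices_dir="/sys/bus/rbd/devices"):
    """Return the rbd id whose major (and minor, if exposed) match, else None."""
    for dev_id in sorted(os.listdir(devices_dir)):
        base = os.path.join(devices_dir, dev_id)
        if _read_int(os.path.join(base, "major")) != major:
            continue
        minor_path = os.path.join(base, "minor")
        if not os.path.exists(minor_path):
            # Assumed fallback: no minor attribute, so major alone decides.
            return dev_id
        if _read_int(minor_path) == minor:
            return dev_id
    return None
```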
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Currently 'rbd unmap' translates a user-provided block device into an
rbd id by matching the major number of the specified device against
/sys/bus/rbd/devices/<id>/major for each rbd mapping and declaring
success on the first match. This works for both entire disks and
partitions, because under the current device number allocation scheme,
each mapping means a new major number.
In preparation for supporting the single-major device number allocation
scheme, which would require matching both major and minor numbers, make
sure to always match against entire disk device numbers, by converting
the specified device major:minor pair into a wholedisk major:minor pair.
To achieve that, use the libblkid library, which accomplishes this goal
by walking stable sysfs structures.
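For illustration only, a simplified Python sketch of a partition to
wholedisk conversion done by walking sysfs. This is not libblkid's code. It
assumes the usual layout in which /sys/dev/block/<major>:<minor> links to the
device directory, a partition directory contains a "partition" attribute, and
its parent directory is the whole disk. The function name to_wholedisk is
hypothetical.

```python
import os


def to_wholedisk(major, minor, sysfs_block="/sys/dev/block"):
    """Map a device major:minor to the major:minor of its whole disk."""
    link = os.path.join(sysfs_block, "%d:%d" % (major, minor))
    dev_dir = os.path.realpath(link)
    # A partition directory carries a "partition" attribute and lives
    # directly under the directory of the disk it belongs to.
    if os.path.exists(os.path.join(dev_dir, "partition")):
        dev_dir = os.path.dirname(dev_dir)
    with open(os.path.join(dev_dir, "dev")) as f:
        disk_major, disk_minor = f.read().strip().split(":")
    return int(disk_major), int(disk_minor)
```

A whole disk maps to itself, so callers can run every user-provided device
through the same conversion before matching.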
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Use common/strict_strtol, which parses integers properly, instead of
atoi for parsing /sys/bus/rbd/devices/<id>/major. This is important
because the kernel apparently can write things like "(none)" into that
file, and strict parsing is more bulletproof in general.
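For illustration only, a Python sketch contrasting the two parsing styles.
atoi_like emulates the lenient behaviour of C atoi (leading digits are used,
anything else yields 0). strict_parse_int is a hypothetical stand-in for a
strict parser and does not reproduce the strict_strtol interface.

```python
import re

_INT_RE = re.compile(r"[+-]?[0-9]+")
_LEADING_INT_RE = re.compile(r"\s*([+-]?[0-9]+)")


def atoi_like(text):
    """Emulate C atoi: parse a leading integer, return 0 if there is none."""
    m = _LEADING_INT_RE.match(text)
    return int(m.group(1)) if m else 0


def strict_parse_int(text):
    """Parse text as a base-10 integer; raise ValueError on anything else.

    Surrounding whitespace is tolerated (an assumption), because sysfs
    attributes end with a newline.
    """
    s = text.strip()
    if not _INT_RE.fullmatch(s):
        raise ValueError("not an integer: %r" % text)
    return int(s)
```

With the lenient parser, "(none)" silently becomes major number 0, while the
strict parser raises an error the caller can act on.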
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Clusters with many OSDs require a higher nofiles ulimit than the RHEL default. Increase it.
Tested-by: Dan van der Ster <daniel.vanderster@cern.ch>
Signed-off-by: Dan van der Ster <daniel.vanderster@cern.ch>
This reverts commit e80ab94bf4.
We accept non-CephInt arguments again, now that we've got the monitors
handling differing APIs intelligently.
Signed-off-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
We ran into problems before when we made this a string because a mixed
cluster of mons might forward a client request with the wrong schema.
To make this work, we make the new code understand both the new and
old schema, and also backport a change to emperor and dumpling to
handle the new schema.
For the previous attempt to do this, see:
337195f0462fe0d0d97a
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Merge in changes to unify the API presented by the monitors and handle changes gracefully.
(Upgrade tests) Tested-by: Tamil Muthamizhan <tamil.muthamizhan@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
Much like the CRUSH tunables, this first appears in kernel v3.9.
Unlike the CRUSH tunables, it does not appear in Ceph until v0.64
(post cuttlefish, pre dumpling).
Signed-off-by: Sage Weil <sage@inktank.com>
68fdcfa1cc changed the ObjectStore interface in the 'next' branch,
which was merged into master by e5a02c33e2. Unfortunately the Memstore
(added via the master branch) was not corrected for this interface
change.
Signed-off-by: Greg Farnum <greg@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
Replace
Ceph distributed file system
with
Ceph distributed storage system
to help dispel the idea that Ceph is just a file system.
Signed-off-by: Loic Dachary <loic@dachary.org>
We can easily deadlock if we put this in the Finisher thread behind other
work; do it synchronously!
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
Support the presence of ITEM_NONE device numbers in the indep mapping as
proof of a bad mapping. Implement the associated unit tests.
Signed-off-by: Loic Dachary <loic@dachary.org>
The leader now checks to see if any monitors did not provide their
command set, and if so, shares the list of "classic" commands instead
of its own set. This will prevent users from seeing different commands
(depending on whether they connect to an old or new mon) while
performing upgrades, and will make it really obvious if they forgot
to upgrade one of the monitors!
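For illustration only, a Python sketch of the decision the leader makes. The
function name, the dictionary layout, and the command names are all
hypothetical placeholders, not the monitor's data structures.

```python
# Placeholder stand-in for the "classic" command list (hypothetical names).
CLASSIC_COMMANDS = frozenset({"cmd_a", "cmd_b"})


def commands_to_share(leader_commands, quorum_command_sets,
                      classic=CLASSIC_COMMANDS):
    """Pick the command set the leader shares with the quorum.

    quorum_command_sets maps a monitor name to the set of commands it
    provided, or to None if it provided nothing.
    """
    if any(cmds is None for cmds in quorum_command_sets.values()):
        # At least one monitor did not report its commands: share the
        # classic list so every monitor presents the same commands.
        return set(classic)
    return set(leader_commands)
```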
Signed-off-by: Greg Farnum <greg@inktank.com>
We're about to use this at a basic level, to identify when we have
"classic" monitors in-quorum, but could also do something more
sophisticated like a set intersection on the commands.
Signed-off-by: Greg Farnum <greg@inktank.com>
If the Elector doesn't receive a set of commands from the elected leader, it
assumes the leader is a "classic" monitor and uses the Dumpling command set as
the leader set.
Signed-off-by: Greg Farnum <greg@inktank.com>