The number of arguments to vfs_fsync changed in kernel 2.6.35.
The S_BIAS macro (removed at about the same time, in 2.6.35) is used to
detect whether vfs_fsync must be called with 2 or 3 args. In RHEL6
kernels (based on 2.6.32), the removal of S_BIAS was backported, but the
change in vfs_fsync was not. So a check for RHEL_MAJOR < 7 is used in
addition to S_BIAS to find the correct number of args for the vfs_fsync
call.
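
The selection described above can be sketched roughly as follows. This is
a userspace illustration, not the actual patch: stub_vfs_fsync, do_fsync
and FSYNC_NARGS are hypothetical names, and the stubs only stand in for
the real vfs_fsync so that the preprocessor logic can be compiled outside
the kernel.

```c
/*
 * Sketch only. In a kernel build, S_BIAS would come from <linux/fs.h>
 * (kernels before 2.6.35) and RHEL_MAJOR from the RHEL version headers.
 * The stubs below are hypothetical stand-ins for vfs_fsync().
 */
struct file;
struct dentry;

#if defined(S_BIAS) || (defined(RHEL_MAJOR) && (RHEL_MAJOR < 7))
/* Old interface: file, dentry, datasync (also RHEL6, where S_BIAS is gone). */
#define FSYNC_NARGS 3
static int stub_vfs_fsync(struct file *file, struct dentry *dentry, int datasync)
{
	(void)file; (void)dentry; (void)datasync;
	return 0;
}
#define do_fsync(file, dentry) stub_vfs_fsync((file), (dentry), 1)
#else
/* New interface (2.6.35 and later): file, datasync. */
#define FSYNC_NARGS 2
static int stub_vfs_fsync(struct file *file, int datasync)
{
	(void)file; (void)datasync;
	return 0;
}
#define do_fsync(file, dentry) stub_vfs_fsync((file), 1)
#endif
```

Callers always pass both file and dentry to do_fsync; the macro drops the
dentry when the 2-argument form is selected.
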
Signed-off-by: Thomas Schoebel-Theuer <tst@1und1.de>
The old code used mf->mf_max for correcting the file size, but that
was wrong when multiple writes were in flight.
A really correct solution would have to remember all in-flight writes
and compute their minimum IO position. Since that would be too
costly, we just use the old size from before any writes have started.
This might be too conservative for extremely high load patterns
(possible starvation problem). For now, use this and check later
whether more effort is really needed.
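
One possible reading of this scheme, as a minimal userspace sketch. All
names here (file_size_state, write_start, write_done, reported_size) are
hypothetical and not taken from the MARS source; the assumption is that
the size is snapshotted when the first write goes in flight and that the
snapshot is reported until no write is in flight any more.

```c
#include <stdint.h>

/* Hypothetical per-file state; names are illustrative only. */
struct file_size_state {
	int64_t actual_size;      /* size as currently seen on the file */
	int64_t safe_size;        /* snapshot from before the first in-flight write */
	int     writes_in_flight;
};

/* Called before a write is submitted. */
static void write_start(struct file_size_state *st)
{
	/* Only the first write of a burst takes the snapshot. */
	if (st->writes_in_flight++ == 0)
		st->safe_size = st->actual_size;
}

/* Called when a write covering [pos, pos + len) has completed. */
static void write_done(struct file_size_state *st, int64_t pos, int64_t len)
{
	if (pos + len > st->actual_size)
		st->actual_size = pos + len;
	st->writes_in_flight--;
}

/* Conservative size: never beyond data that is known to be complete. */
static int64_t reported_size(const struct file_size_state *st)
{
	return st->writes_in_flight > 0 ? st->safe_size : st->actual_size;
}
```

In this sketch the starvation risk is visible directly: as long as
writes_in_flight never drops to zero, reported_size never advances.
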
By default, {dis}connect and {pause,resume}-{replay,sync} should
only switch the _local_ buttons. Otherwise, side effects that are
unexpected from a human point of view could result in bigger
clusters (#nodes >> 2).
The new behaviour differs from DRBD, but DRBD was (until recently)
only working on _pairs_, so spreading out globally was impossible.
Global switching may be requested at any time by appending the suffix
"-global"; it is just no longer the default in MARS.
If anyone has objections, it is straightforward to change the
defaults again.
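
As an illustration of the naming rule only (this is not the marsadm
code, and is_global_command is a hypothetical helper), detecting the
"-global" suffix on a command name could look like this:

```c
#include <stdbool.h>
#include <string.h>

/* Returns true when cmd ends in "-global", e.g. "pause-sync-global". */
static bool is_global_command(const char *cmd)
{
	static const char suffix[] = "-global";
	size_t cmd_len = strlen(cmd);
	size_t suffix_len = sizeof(suffix) - 1;

	/* A bare "-global" without a command name does not count. */
	return cmd_len > suffix_len &&
	       strcmp(cmd + cmd_len - suffix_len, suffix) == 0;
}
```

With this rule, "disconnect" acts locally and "disconnect-global" acts on
the whole cluster.
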
Timestamps of symlinks are used for Lamport comparison
(any newer one overwrites any older one).
That concept should _not_ be used for any other comparison, since
bulk updates of symlinks do not (yet) have any "transactional"
property (the Lamport condition treats each symlink independently
of any other).
Until such "transactions" are introduced at the strategy layer,
timestamp comparisons between _different_ symlinks are
meaningless in general.
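
A minimal sketch of the per-symlink rule, assuming a simplified in-memory
representation. The types and names (stamp, link_entry, link_update) are
hypothetical; the sketch only shows that the comparison is between two
versions of the _same_ symlink, and that a strictly newer one wins.

```c
#include <stdbool.h>

/* Hypothetical timestamp: seconds plus nanoseconds. */
struct stamp {
	long long sec;
	long      nsec;
};

/* Hypothetical stand-in for one symlink: its target and its timestamp. */
struct link_entry {
	const char  *target;
	struct stamp stamp;
};

/* Returns <0, 0 or >0 when a is older than, equal to or newer than b. */
static int stamp_cmp(const struct stamp *a, const struct stamp *b)
{
	if (a->sec != b->sec)
		return a->sec < b->sec ? -1 : 1;
	if (a->nsec != b->nsec)
		return a->nsec < b->nsec ? -1 : 1;
	return 0;
}

/*
 * Both arguments must describe the same symlink. The remote version
 * overwrites the local one only when it is strictly newer.
 * Returns true when the local entry was updated.
 */
static bool link_update(struct link_entry *local, const struct link_entry *remote)
{
	if (stamp_cmp(&remote->stamp, &local->stamp) <= 0)
		return false;
	*local = *remote;
	return true;
}
```
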
Switch on only when all predecessor bricks are also on.
Failing to do so can result in fatal errors.
Similarly, switch off only when no successor exists any more.
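
The two conditions can be sketched as follows. The struct and function
names are hypothetical and much simpler than real brick code; in
particular, tracking successors by a plain counter is an assumption made
for the illustration.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_PRED 4

/* Hypothetical, simplified brick. */
struct brick {
	bool          on;
	struct brick *pred[MAX_PRED]; /* predecessor bricks */
	size_t        nr_pred;
	int           nr_succ;        /* successors that still exist */
};

/* Switching on is allowed only when every predecessor is already on. */
static bool may_switch_on(const struct brick *b)
{
	size_t i;

	for (i = 0; i < b->nr_pred; i++)
		if (!b->pred[i]->on)
			return false;
	return true;
}

/* Switching off is allowed only when no successor exists any more. */
static bool may_switch_off(const struct brick *b)
{
	return b->nr_succ == 0;
}

/* Returns true when the brick is in the requested state afterwards. */
static bool brick_switch(struct brick *b, bool on)
{
	if (on ? !may_switch_on(b) : !may_switch_off(b))
		return false;
	b->on = on;
	return true;
}
```
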
Add infrastructure for splitting commands into multiple phases.
Usually, phase0 will check for some preconditions, while
phase1 will execute the command. The final result will only
be committed if nothing fails.
The difference from the old behaviour will only show up when combined
with 'all' resources. If anything fails in phase0, nothing will be
touched in phase1. The old behaviour could touch some resources,
but omit others when something failed.
The new behaviour is more transaction-like.
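
The phase split can be sketched as below. This is not the actual
implementation: the types and names (resource, command, run_command and
the two example callbacks) are hypothetical, and stopping at the first
phase1 failure is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical resource with just enough state for the illustration. */
struct resource {
	bool healthy;
	int  touched;
};

typedef bool (*phase_fn)(struct resource *res);

struct command {
	phase_fn phase0; /* check preconditions, must not modify anything */
	phase_fn phase1; /* execute the command */
};

/* Example phase0: the precondition is that the resource is healthy. */
static bool check_healthy(struct resource *res)
{
	return res->healthy;
}

/* Example phase1: record that the resource was modified. */
static bool touch(struct resource *res)
{
	res->touched++;
	return true;
}

/* Runs cmd on all resources; returns true only if nothing failed. */
static bool run_command(const struct command *cmd,
			struct resource **all, size_t nr)
{
	size_t i;

	/* Phase 0: check every resource before touching any of them. */
	for (i = 0; i < nr; i++)
		if (!cmd->phase0(all[i]))
			return false;

	/* Phase 1: execute; stopping at the first failure is an assumption. */
	for (i = 0; i < nr; i++)
		if (!cmd->phase1(all[i]))
			return false;

	return true;
}
```

With one unhealthy resource in the list, run_command returns false and
no resource is touched, which corresponds to the 'all' case described
above.
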