Fix typos

[small adaptations by Thomas Schoebel-Theuer, and
some problems with LyX-specific file format fixed]
Andrea Gelmini 2023-03-06 12:55:28 +01:00 committed by Thomas Schoebel-Theuer
parent b5792db970
commit dd1e4e1323
9 changed files with 155 additions and 155 deletions

@ -18,7 +18,7 @@ Finally we have [4/4] replicas.
3) handover the primary to one node of the _new_ pair.
4) use "marsadm leave-resource" to get rid of the _old_ two replicas.
-New replica status is again [2/4], but the the replicas have been
+New replica status is again [2/4], but the replicas have been
migrated to the new pair in the meantime.
5) Finally, use "marsadm split-cluster" to go back to [2/2].

@ -31,7 +31,7 @@ This document including all attachments is under GNU FDL.
---++ 3. Processes
%IMAGE{"processes.png" size="1000"}%
-After the initial synchronzation of the secondary with the primary the data written on the mars device /dev/mars/<resource-name> of a primary is at first written to sequential logfiles (residing in primary's directory /mars/resource-<resource-name>/) and then copied to the primary's underlying device. The secondary fetches the primary's logfiles in it's /mars/resource-<resource-name>/ directory and then copies the data to it's underlying device. %BR%
+After the initial synchronzation of the secondary with the primary the data written on the mars device /dev/mars/<resource-name> of a primary is at first written to sequential logfiles (residing in primary's directory /mars/resource-<resource-name>/) and then copied to the primary's underlying device. The secondary fetches the primary's logfiles in its /mars/resource-<resource-name>/ directory and then copies the data to its underlying device. %BR%
We distinguish the following subprocesses of a replication:
* Sync: Synchronizes the underlying device of the secondary with the data of the underlying device of the primary. Triggered by: <verbatim>marsadm invalidate <resource-name></verbatim> <verbatim>marsadm join-resource <resource-name> /dev/...</verbatim> During sync the data on the secondary's underlying device is inconsistent, i.e. unusable.
@ -40,7 +40,7 @@ We distinguish the following subprocesses of a replication:
#ChapterFour
---++ 4. Process state
-Mars works asynchronously. This means that each of the processes mentioned above does it's job without waiting for the others.%BR% Examples:
+Mars works asynchronously. This means that each of the processes mentioned above does its job without waiting for the others.%BR% Examples:
* An application may write on the mars device on the primary (process 1) thereby filling the logfiles (process 2) independent from the process (the replay (process 3)) which writes the logfile data to the underlying device. If there is a gap between the writer and the replay (which occurs very rarely) this can be shown with <verbatim>marsadm view-1and1 <resource-name>|all</verbatim> or more specific <verbatim>marsadm view-replay-line-1and1 <resource-name>|all</verbatim> or if you are interested in numbers given in bytes <verbatim>marsadm view-replay-rest <resource-name>|all</verbatim>
* If the network connection between primary and secondary is slow the fetch process (process 4) lags behind, i.e. there are more logfile data on the primary than on the secondary. Replacing "replay" with "fetch" in the commands given above shows the relevant information for the fetch process.
* As you already guessed probably: The sync process can be watched by replacing "replay" with "sync" in the mentioned commands.

@ -257,7 +257,7 @@ sub display_partner {
print_screen "$SStatus$SSpeed\n";
print_screen "\t\t---> WORK: Sync in progress = ($SStatus% < 100.00%)", "$Color_blue";
if ( "$SConnect" ne "OK" ){
-print_screen ", transfered from $SConnect\n", "$Color_blue";
+print_screen ", transferred from $SConnect\n", "$Color_blue";
} else {
print_screen "\n";
}
@ -333,7 +333,7 @@ sub display_partner {
print_screen "$RStatus$RSpeed\n";
print_screen "\t\t---> WORK: Replay in progress = ($RStatus% < 100.00%)", "$Color_blue";
if ( "$RConnect" ne "OK" ){
-print_screen ", transfered from $RConnect\n", "$Color_blue";
+print_screen ", transferred from $RConnect\n", "$Color_blue";
} else {
print_screen "\n";
}
@ -353,7 +353,7 @@ sub display_partner {
### replay - hints
if ($PLogFile[2] != 0) {
-print_screen "\t\t---> WORK: Replay-Todo is actualy $PLogFile[2], ", "$Color_blue";
+print_screen "\t\t---> WORK: Replay-Todo is actually $PLogFile[2], ", "$Color_blue";
if ( $PLogFile[2] < 0 ) {
print_screen "replaying backwards ??? Check this !!!\n", "$Color_red";
} elsif ( $PLogFile[2] > 0 ) {
@ -930,7 +930,7 @@ sub info_version {
### status
print_screen "MARS Status - $himself, $version", "$Color_blue";
-if ( $params->{'resource'} ) { print_screen ", Ressource: $params->{'resource'}", "$Color_blue"; }
+if ( $params->{'resource'} ) { print_screen ", Resource: $params->{'resource'}", "$Color_blue"; }
print_screen "\n";
### marsadm
@ -1061,14 +1061,14 @@ sub check_limit {
print_screen "$mars_limit_sol $LimitSolEin,";
### restliches
} elsif ( $mars_limit_sol < 1 ) {
-print_screen "is now unsed,", "$Color_green";
+print_screen "is now unused,", "$Color_green";
} else {
print_screen "is set to ";
print_screen "$mars_limit_sol $LimitSolEin,", "$Color_red";
}
} elsif ( !($LimitSolVar) && ($LimitIstVar) ) {
### only ist
-print_screen "is actualy ";
+print_screen "is actually ";
if ( $mars_limit_ist < 1 ) {
if ( $LimitIstEin eq "on/off" ) {
@ -1086,12 +1086,12 @@ sub check_limit {
# TODO fixen !
# } elsif ( ($LimitSolVar) && ($LimitIstVar) && ($mars_limit_sol < 1) ) {
# ### sol & ist = 0
-# print_screen "is actualy unused(X),";
+# print_screen "is actually unused(X),";
} else {
### sol & ist / rest ...
print_screen "is set to ";
print_screen "$mars_limit_sol $LimitSolEin", "$Color_red";
-print_screen ", actualy used ";
+print_screen ", actually used ";
print_screen "$mars_limit_ist $LimitIstEin,", "$Color_red";
}
@ -1147,7 +1147,7 @@ sub check_systemstatus {
my $mars_disk_space = `df '$mars_dir' | grep '$mars_dir'| awk '{print \$2}'`;
$mars_disk_space = sprintf("%01.2f", $mars_disk_space / 1024);
-check_limit "-> Free-Space-Limit on /mars", "required_free_space_1_gb", "mb (actualy $mars_disk_space mb used)";
+check_limit "-> Free-Space-Limit on /mars", "required_free_space_1_gb", "mb (actually $mars_disk_space mb used)";
print "\n";
}

@ -172,7 +172,7 @@ Scope
\end_layout
\begin_layout Standard
-The following topics are covered withing this document:
+The following topics are covered within this document:
\end_layout
\begin_layout Itemize
@ -568,7 +568,7 @@ transaction logfile
\emph on
Any
\emph default
-write reqeuest is treated like a transaction which changes the contents
+write request is treated like a transaction which changes the contents
of your LV.
\end_layout
@ -717,7 +717,7 @@ fsync()
\family typewriter
/dev/mars/mydata
\family default
-is signalled that the write was successful
+is signaled that the write was successful
\begin_inset Foot
status open
@ -1309,7 +1309,7 @@ https://github.com/schoebel/blkreplay/raw/master/doc/blkreplay.pdf
\end_layout
\begin_layout Standard
-For many application workloads, RAID-6 provides a good compromize between
+For many application workloads, RAID-6 provides a good compromise between
cost and performance.
Reads are very fast due to RAID-6 striping, while the slow RAID-6 writes
are partially compensated by the MARS kernel memory buffer (see section
@ -1530,7 +1530,7 @@ dedicated HDD
with a capacity of 4 TB or more.
Typically, this will provide you with plenty of headroom even for bigger
networking incidents.
-Performace of a single HDD over a BBU is typically good enough for
+Performance of a single HDD over a BBU is typically good enough for
\family typewriter
/mars
\family default
@ -1565,7 +1565,7 @@ mars
\end_layout
\begin_layout Enumerate
-For extemely high performance, separate SSD sets for the user data VG and
+For extremely high performance, separate SSD sets for the user data VG and
for
\family typewriter
/mars
@ -1578,7 +1578,7 @@ For extemely high performance, separate SSD sets for the user data VG and
exist
\emph default
some workloads where sequntial IO to HDDs is faster than to SSDs.
-Sometimes, there are hidden performance bottlenecks, such as SAS busses,
+Sometimes, there are hidden performance bottlenecks, such as SAS buses,
or some old-generation RAID controllers.
\end_layout
@ -4188,7 +4188,7 @@ netstat --tcp | grep 777
\family default
-Both variants should show up some healty connections.
+Both variants should show up some healthy connections.
If not, fix your network configuration and/or firewalling etc.
Details are outside of the scope of this manual.
\end_layout
@ -4277,7 +4277,7 @@ noprefix "false"
\emph on
some dynamic behaviour
\emph default
-like growth and hardware lifecyle.
+like growth and hardware lifecycle.
Thus they need
\emph on
updates
@ -4666,7 +4666,7 @@ huge
\family typewriter
Football
\family default
-for hardware lifecyle or for long-term load balancing over a very long
+for hardware lifecycle or for long-term load balancing over a very long
time): newer versions of
\family typewriter
marsadm
@ -4839,7 +4839,7 @@ standalone mode
\end_inset
).
-But you can also do so later after setup of (one ore many) secondaries.
+But you can also do so later after setup of (one or many) secondaries.
\end_layout
\begin_layout Enumerate
@ -5720,7 +5720,7 @@ description-text
\family typewriter
%replay-code{}
\family default
-) Typicially this indicates a checksum error in a transaction logfile, or
+) Typically this indicates a checksum error in a transaction logfile, or
another (hardware / filesystem) defect.
This occurs extremely rarely in practice, but has been observed more frequently
during a massive failure of air conditioning in a datacenter, when disk
@ -5805,7 +5805,7 @@ When the damage is only at one of your secondaries, and the primary continues
\family typewriter
marsadm cron
\family default
-, wait for the secondary to get this knowlege over the network, and try
+, wait for the secondary to get this knowledge over the network, and try
\family typewriter
marsadm invalidate
@ -5998,7 +5998,7 @@ marsadm invalidate
\begin_inset Newline newline
\end_inset
-There is an execption: shortly after
+There is an exception: shortly after
\family typewriter
join-resource
\family default
@ -6375,7 +6375,7 @@ UnResponsive
\family typewriter
mars_main
\family default
-did not do any noticable work for more than
+did not do any noticeable work for more than
\family typewriter
%{window}
\family default
@ -6776,7 +6776,7 @@ fetch:
\family typewriter
F
\family default
-= according to knowlege, fetched logfiles are up-to-date,
+= according to knowledge, fetched logfiles are up-to-date,
\family typewriter
f
\family default
@ -7639,7 +7639,7 @@ right
\emph on
actual
\emph default
-primary mode during that time, and the secondaries will sync therefrom.
+primary mode during that time, and the secondaries will sync there from.
As soon as the local
\family typewriter
/dev/mars/mydata
@ -8097,7 +8097,7 @@ status open
\begin_layout Plain Layout
Notice: in certain network outage scenarios, you may not be able to remotely
login to the console and to check whether a server is running.
-Therefore it may happen that you erronously think hostA is dead, while
+Therefore it may happen that you erroneously think hostA is dead, while
in reality it continues running.
Even if you would know it, you might not be able to remotely kill it in
a STONITH-like manner.
@ -8292,7 +8292,7 @@ connection loss
\emph default
(e.g.
networking problems / network partitions), you may not be able to reliably
-detect whether a split brain has actually occured, or not.
+detect whether a split brain has actually occurred, or not.
\end_layout
\begin_layout Paragraph
@ -8747,7 +8747,7 @@ In rare cases (when
\family typewriter
/mars
\family default
-is almost full somewhere, or when emergency mode has occured somewhere),
+is almost full somewhere, or when emergency mode has occurred somewhere),
you may need to run
\family typewriter
marsadm cron
@ -8821,7 +8821,7 @@ On those cluster nodes where you want to retain some SPLIT BRAIN version
\begin_inset Quotes eld
\end_inset
-emergengy backup
+emergency backup
\begin_inset Quotes erd
\end_inset
@ -9079,7 +9079,7 @@ first
one
\emph default
of them.
-Leave the other one intact, by not umounting
+Leave the other one intact, by not unmounting
\family typewriter
/dev/mars/mydata
\family default
@ -10471,7 +10471,7 @@ write_throttle_start_percent
slowly
\emph default
.
-Defaul value is 0, which means
+Default value is 0, which means
\begin_inset Quotes eld
\end_inset
@ -10480,7 +10480,7 @@ off
\end_inset
.
-Practical values for this coule be around 80%.
+Practical values for this could be around 80%.
\end_layout
\begin_layout Description
@ -10844,7 +10844,7 @@ marsadm cron
\end_layout
\begin_layout Enumerate
-As soon as emough space has been freed everywhere to leave the
+As soon as enough space has been freed everywhere to leave the
\family typewriter
EMEGENCY MODE HYSTERESIS
\family default
@ -10914,7 +10914,7 @@ Wait until sync has finished at hostB.
\begin_layout Enumerate
If you have more than 2 replicas in total: proceed with step 4 at hostC,
and so on.
-This time, you could join multipe resources in parallel, because you already
+This time, you could join multiple resources in parallel, because you already
have a life replica at hostB.
\end_layout
@ -11182,7 +11182,7 @@ Don't use in scripts! Only use by hand!
\begin_layout Plain Layout
\size scriptsize
-This option does not change the internal waiting logic for thois commands
+This option does not change the internal waiting logic for this commands
which emulate synchronous behaviour on top of the asynchronous communication
paradigm.
Many commands are waiting until the desired effect has succeeded.
@ -11679,7 +11679,7 @@ planned
\end_inset
-Careful when using this on extremely huge LVs where the sync may take serveral
+Careful when using this on extremely huge LVs where the sync may take several
days, or weeks.
It is your sysadmin decision what you want to prefer: restarting the sync,
or planned handover.
@ -12819,7 +12819,7 @@ status open
\size scriptsize
Avoid any potential timeouts / hangs caused by networks or firewalls, by
explicitly disabling the old ssh-based communication method, and relying
-on the new MARS communication protocol (by defaut on port 7777).
+on the new MARS communication protocol (by default on port 7777).
\end_layout
\end_inset
@ -13008,7 +13008,7 @@ all local network interfaces are scanned by
\family typewriter
/sbin/ip
\family default
-for IPv4 adresses, and the
+for IPv4 addresses, and the
\emph on
first
\emph default
@ -13327,7 +13327,7 @@ Postcondition: the initial symlink tree is created in
/mars/uuid
\family default
symlink is created for later distribution in the cluster.
-It uniquely indentifies the cluster in the world.
+It uniquely identifies the cluster in the world.
\end_layout
\begin_layout Plain Layout
@ -13631,7 +13631,7 @@ In ancient MARS versions before mars0.1astable101 the kernel module
\emph on
must not
\emph default
-be loaded, and a working ssh connecttion to
+be loaded, and a working ssh connection to
\family typewriter
$host
\family default
@ -13823,7 +13823,7 @@ marsadm leave-resource
\family default
).
The kernel module should be loaded and the network should be operating
-in order to also propogate the effect to the other cluster nodes.
+in order to also propagate the effect to the other cluster nodes.
\end_layout
\begin_layout Plain Layout
@ -13898,7 +13898,7 @@ rmmod
\end_inset
-passivley fetching the symlink tree.
+passively fetching the symlink tree.
In order to really stop all communication, the kernel module should be
unloaded afterwards (rmmod mars).
The local
@ -13926,7 +13926,7 @@ zombies
\size scriptsize
In case of an unintended hardware destruction (e.g.
-fire, water, ...) this command should be used on another healty cluster node
+fire, water, ...) this command should be used on another healthy cluster node
$helper in order to finally remove $damaged from the cluster via the command
\family typewriter
@ -14902,7 +14902,7 @@ Postcondition: the
/mars/uuid
\family default
symlink is created for later distribution in the cluster.
-It uniquely indentifies the cluster in the world.
+It uniquely identifies the cluster in the world.
\end_layout
\begin_layout Plain Layout
@ -14967,7 +14967,7 @@ Instead of executing
\family typewriter
marsadm
\family default
-commands serveral times for each resource argument, you may give the special
+commands several times for each resource argument, you may give the special
resource argument
\family typewriter
all
@ -15328,8 +15328,8 @@ Postcondition: the resource
\family typewriter
$res
\family default
-is created, the inital role of the current node is primary.
-The corresponding symlink tree information is asynchonously distributed
+is created, the initial role of the current node is primary.
+The corresponding symlink tree information is asynchronously distributed
in the cluster (in the background).
The device
\family typewriter
@ -15544,7 +15544,7 @@ Postcondition: the current node becomes a member of resource
\family typewriter
$res
\family default
-, the inital role is secondary.
+, the initial role is secondary.
The initial full sync should start after a while.
\end_layout
@ -15693,7 +15693,7 @@ marsadm down
\family default
).
The kernel module should be loaded and the network should be operating
-in order to also propogate the effect to the other cluster nodes.
+in order to also propagate the effect to the other cluster nodes.
\end_layout
\begin_layout Plain Layout
@ -15931,7 +15931,7 @@ leave-resource
--host=somebodyelse
\family default
argument in order to desperately try to destroy remains of incomplete or
-pysically damaged hardware.
+physically damaged hardware.
\end_layout
\begin_layout Plain Layout
@ -16113,7 +16113,7 @@ half-dead
\begin_inset Quotes erd
\end_inset
-zombie nodes (beware of shapshot / restores on virtual machines!!).
+zombie nodes (beware of snapshot / restores on virtual machines!!).
MARS does its best to avoid problems even in case the new resource name
should equal the old one, but there can be
\emph on
@ -16422,7 +16422,7 @@ marsadm cron
\emph on
temporary
\emph default
-purposes (in constrast to
+purposes (in contrast to
\emph on
full
\emph default
@ -17121,7 +17121,7 @@ pause-sync-global
\size scriptsize
-WARNING! After this, and ather having paused any remote data access, you
+WARNING! After this, and other having paused any remote data access, you
might use the underlying disk for your own purposes, such as test-mounting
it in
\emph on
@ -17130,7 +17130,7 @@ readonly
mode.
\series bold
-Don't modifiy
+Don't modify
\series default
its contents in any way! Not even by an
\family typewriter
@ -17188,7 +17188,7 @@ primary
\emph default
side, you may choose to resolve the inconsistencies by
\family typewriter
-marsadm invalide $res
+marsadm invalid $res
\family default
on
\emph on
@ -20738,7 +20738,7 @@ marsadm secondary
not recommended
\emph default
), at least the old primary must be reachable.
-The (old) primarie's virutal device
+The (old) primarie's virtual device
\family typewriter
/dev/mars/mydata
\family default
@ -21628,7 +21628,7 @@ reference "subsec:Forced-Switching"
\emph on
really
\emph default
-need this command: before finally destroying a resouce via the
+need this command: before finally destroying a resource via the
\emph on
last
\emph default
@ -23399,7 +23399,7 @@ marked
\size scriptsize
-THIS IS HIGLY DANGEROUS FOR DATA CONSISTENCY!
+THIS IS HIGHLY DANGEROUS FOR DATA CONSISTENCY!
\end_layout
\begin_layout Plain Layout
@ -23954,7 +23954,7 @@ status open
\begin_layout Plain Layout
\size scriptsize
-Deprectated, will vanish.
+Deprecated, will vanish.
Use
\family typewriter
view-role
@ -24067,7 +24067,7 @@ status open
\begin_layout Plain Layout
\size scriptsize
-Deprectated, will vanish.
+Deprecated, will vanish.
Use
\family typewriter
view-state
@ -24180,7 +24180,7 @@ status open
\begin_layout Plain Layout
\size scriptsize
-Deprectated, will vanish.
+Deprecated, will vanish.
Use
\family typewriter
view-cstate
@ -24293,7 +24293,7 @@ status open
\begin_layout Plain Layout
\size scriptsize
-Deprectated, will vanish.
+Deprecated, will vanish.
Use
\family typewriter
view-dstate
@ -24406,7 +24406,7 @@ status open
\begin_layout Plain Layout
\size scriptsize
-Deprectated.
+Deprecated.
Use
\family typewriter
view-status
@ -24550,7 +24550,7 @@ status open
\begin_layout Plain Layout
\size scriptsize
-Deprectated, will vanish.
+Deprecated, will vanish.
Don't use it.
Use
\family typewriter
@ -24664,7 +24664,7 @@ status open
\begin_layout Plain Layout
\size scriptsize
-Deprectated, will vanish.
+Deprecated, will vanish.
Don't use it.
Use
\family typewriter
@ -24778,7 +24778,7 @@ status open
\begin_layout Plain Layout
\size scriptsize
-Deprectated, will vanish.
+Deprecated, will vanish.
Don't use it.
Implement your own macros instead.
\end_layout
@ -24888,7 +24888,7 @@ status open
\begin_layout Plain Layout
\size scriptsize
-Deprectated, will vanish.
+Deprecated, will vanish.
Use
\family typewriter
view-the-err-msg
@ -25005,7 +25005,7 @@ status open
\begin_layout Plain Layout
\size scriptsize
-Write the file content to stdout, but replace all occurences of numeric
+Write the file content to stdout, but replace all occurrences of numeric
timestamps converted to a human-readable format.
Thus is most useful for inspection of status and log files, e.g.
@ -25740,7 +25740,7 @@ status open
\begin_layout Plain Layout
\size scriptsize
-Inquiry of the maxium sync concurrency.
+Inquiry of the maximum sync concurrency.
See also the primitive macro
\family typewriter
%global-sync-limit-value{}
@ -26787,7 +26787,7 @@ marsadm
\emph on
tries
\emph default
-to achieves the intended result (typicially, you may use this after the
+to achieves the intended result (typically, you may use this after the
\family typewriter
is-
@ -30319,7 +30319,7 @@ Now we come to benchmarking
\end_layout
\begin_layout Standard
-So you might expect that performace of
+So you might expect that performance of
\family typewriter
/dev/mars/lv-0
\family default
@ -31193,7 +31193,7 @@ socket bundling
\end_layout
\begin_layout Standard
-It is mostly intendend for lines showing high packet loss.
+It is mostly intended for lines showing high packet loss.
By using multiple TCP sockets in parallel for emulating a single logical
connection, throughput can be significantly increased.
\end_layout
@ -31273,7 +31273,7 @@ random
\begin_layout Standard
The next graphics shows the same, but over a medium distance of about 50km.
This line is even more heavily loaded with respect to the number of TCP
-connections running in parallel (probly some 10,000 or even 100,000 if
+connections running in parallel (probably some 10,000 or even 100,000 if
not more), and there is some kind of
\begin_inset Quotes eld
\end_inset
@ -31517,7 +31517,7 @@ $class.*.status
\end_inset
-Beware, any permamently present
+Beware, any permanently present
\family typewriter
*.log
\family default
@ -31566,7 +31566,7 @@ syslog_min_class
\family default
(rw) The
\emph on
-mimimum
+minimum
\emph default
class number for
\emph on
@ -31574,7 +31574,7 @@ permanent
\emph default
syslogging.
By default, this is set to -1 in order to switch off perment logging completely.
-Permament logging can easily flood your syslog with such huge amounts of
+Permanent logging can easily flood your syslog with such huge amounts of
messages (in particular when class=0), that your system as a whole may
become unusable (because vital kernel threads may be blocked too long or
too often by the userspace syslog daemon).
@ -31605,7 +31605,7 @@ permanent
\family typewriter
syslog_flood_class
\family default
-(rw) The mimimum class of flood-protected syslogging.
+(rw) The minimum class of flood-protected syslogging.
The maximum class is always 4.
\end_layout
@ -31615,9 +31615,9 @@ syslog_flood_class
\family typewriter
syslog_flood_limit
\family default
-(rw) The maxmimum number of messages after which the flood protection will
+(rw) The maximum number of messages after which the flood protection will
start.
-This is a hard limit for the the number of messages written to the syslog.
+This is a hard limit for the number of messages written to the syslog.
\end_layout
\begin_layout Labeling
@ -32845,7 +32845,7 @@ $HOME/.marsadm/systemd-templates/
/usr/local/lib/marsadm/systemd-templates/
\family default
.
-Futher places can be defined by overriding the $
+Further places can be defined by overriding the $
\family typewriter
MARS_PATH
\family default
@ -34203,7 +34203,7 @@ man systemd
to fail seem to be different from general conflicts in unit dependencies.
Although not precisely documented, the observed behaviour luckily appears
to make HA more likely.
-There remains some uncertainity caused by the documented failure possibility.
+There remains some uncertainty caused by the documented failure possibility.
A new option called
\family typewriter
--job-mode=append
@ -34226,7 +34226,7 @@ unnecessary
\end_inset
-, which resulted in some behavioural improvements) the
+, which resulted in some behavioral improvements) the
\family typewriter
--job-mode=fail
\family default
@ -34291,7 +34291,7 @@ is-failed
\begin_layout Enumerate
Systemd lacks an important property called Idempotence.
Idempotence is a very common feature in big industry plants, where hundreds
-of human workers may act on controlling hundrets of facilities.
+of human workers may act on controlling hundreds of facilities.
Each alarm call may cause a different person to try to
\begin_inset Quotes eld
\end_inset
@ -34416,7 +34416,7 @@ controlled abortion
at all.
Care must be taken that failures caused by aborts will not occur too frequently
for HA.
-When failures caused by aborts are occuring too frequently, the concept
+When failures caused by aborts are occurring too frequently, the concept
of abort should be disabled.
\end_layout
@ -35268,7 +35268,7 @@ BindsTo=
\family typewriter
PartOf=
\family default
-dependencies, peferably augmented with
+dependencies, preferably augmented with
\family typewriter
After=
\family default
@ -35897,7 +35897,7 @@ marsadm
\family typewriter
{all,the}-global-{inf,wrn,err}-msg
\family default
-Dito, but more specific.
+Ditto, but more specific.
\end_layout
\begin_layout Labeling
@ -35906,7 +35906,7 @@ marsadm
\family typewriter
{all,the}-pretty-{global-,}{inf-,wrn-,err-,}msg
\family default
-Dito, but show numerical timestamps in a human readable form.
+Ditto, but show numerical timestamps in a human readable form.
\end_layout
\begin_layout Labeling
@ -35961,7 +35961,7 @@ todo-primary
get-primary
\family default
is equal to the current host.
-Similary,
+Similarly,
\family typewriter
todo-secondary
\family default
@ -36014,7 +36014,7 @@ get-resource-{fat,err,wrn}
\family typewriter
get-resource-{fat,err,wrn}-count
\family default
-Dito, but get the number of lines instead of the text.
+Ditto, but get the number of lines instead of the text.
\end_layout
\begin_layout Labeling
@ -36362,7 +36362,7 @@ device-opened
\family typewriter
/dev/mars/mydata
\family default
-has been actually openend, e.g.
+has been actually opened, e.g.
by
\family typewriter
mount
@ -36392,7 +36392,7 @@ device-nrflying
\family default
Show the number of currently flying IO requests.
This is an indicator of queueing at the low-level device.
-When it is permenantly very high, it may point at IO problems, such as
+When it is permanently very high, it may point at IO problems, such as
RAID degradation.
\end_layout
@ -36408,10 +36408,10 @@ disk-error
\emph on
known
\emph default
-IO error, as reported upwards to applications, and before it was resetted
+IO error, as reported upwards to applications, and before it was reset
for whatever reason.
For example, it may be the last open() error on the underlying disk, or
-something else may have occured during operations, and sometimes it may
+something else may have occurred during operations, and sometimes it may
have corrected itself.
Normally, this should be always zero.
When < 0 according to return-code conventions as explained at
@ -36433,7 +36433,7 @@ device-error
\emph on
known
\emph default
-IO error, as reported upwards to applications, and before it was resetted
+IO error, as reported upwards to applications, and before it was reset
for whatever reason.
Normally, this should be always zero.
When < 0 according to return-code conventions as explained at
@ -36927,8 +36927,8 @@ true state
\end_inset
does not exist at all in a distributed system.
-Anything you can know in a distributed system is always local knowlege,
-which races with other (remote) knowlege, and may be outdated at
+Anything you can know in a distributed system is always local knowledge,
+which races with other (remote) knowledge, and may be outdated at
\emph on
any
\emph default
@ -37169,7 +37169,7 @@ told
\emph on
believes
\emph default
-it has commited the data in a reboot-safe way.
+it has committed the data in a reboot-safe way.
Whether this is
\emph on
really
@ -37770,7 +37770,7 @@ mydata
\family typewriter
global-sync-limit-value
\family default
-(global) Report the maxium parallelism degree of sync, as configurable
+(global) Report the maximum parallelism degree of sync, as configurable
via
\family typewriter
set-global-sync-limit
@ -38550,7 +38550,7 @@ delimiter
index
\family default
\emph default
-'th list element is the assigend to, or substituted by,
+'th list element is the assigned to, or substituted by,
\family typewriter
\emph on
expression
@ -38582,7 +38582,7 @@ arg2
\emph default
}
\family default
-Evaluates the arguments, inteprets them as numbers, and adds them together.
+Evaluates the arguments, interprets them as numbers, and adds them together.
\end_layout
\begin_layout Itemize
@ -39936,7 +39936,7 @@ arg1
argn
\family default
\emph default
-are evaluted in the
+are evaluated in the
\emph on
old
\emph default
@ -40340,7 +40340,7 @@ The value given by the
\family typewriter
--timeout=
\family default
-option, or the corresonding default value.
+option, or the corresponding default value.
\end_layout
\begin_layout Itemize
@ -40352,7 +40352,7 @@ The value given by the
\family typewriter
--threshold=
\family default
-option, or the corresonding default value.
+option, or the corresponding default value.
\end_layout
\begin_layout Itemize
@ -40364,7 +40364,7 @@ The value given by the
\family typewriter
--window=
\family default
-option, or the corresonding default value (60s).
+option, or the corresponding default value (60s).
\end_layout
\begin_layout Itemize
@ -41988,7 +41988,7 @@ marsadm view-flags all
\family default
, it is known to be inconsistent (e.g.
during a sync).
-When there is a dash instead, it usually means that the disk is detatched
+When there is a dash instead, it usually means that the disk is detached
or misconfigured or the kernel module is not started.
Please fix these problems first before believing that your local disk is
unusable.
@ -42498,7 +42498,7 @@ old snapshot
\family typewriter
/mars/
\family default
-and/or of some underly resource disk /
+and/or of some underlie resource disk /
\emph on
underlying
\emph default
@ -42527,7 +42527,7 @@ some(!)
\family default
has survived and the storage is operational again.
For exampley, any defective RAID disks have been already replaced and the
-underlaying RAID is now
+underlying RAID is now
\emph on
rebuilt
\emph default
@ -43154,7 +43154,7 @@ longer
Do not blame MARS for anything which is outside its scope residing at kernel
level.
-Wheter and when a failover is needed for whatever reason, and in which
+Whether and when a failover is needed for whatever reason, and in which
parallelism degree, is clearly outside the scope of a
\emph on
component
@ -43558,7 +43558,7 @@ good
for emergency cases), don't re-join them all in parallel, but rather start
with the oldest / most outdated / worst / inconsistent version first.
It is recommended to start the next one only when the previous one has
sucessfully finished.
successfully finished.
\end_layout
\begin_layout Chapter
@ -43775,7 +43775,7 @@ It should be very hard to finally trash a secondary, because the transaction
md5
\family default
checksums for all data records.
Any attempt to replay currupted logfiles is refused by MARS.
Any attempt to replay corrupted logfiles is refused by MARS.
In addition, the sequence numbers of rotated logfiles (e.g.
via
\family typewriter
@ -44211,7 +44211,7 @@ The following is a further alternative for
experts
\series default
who really know what they are doing.
The method is very simple and therefore well-suited for coping with mass
The method is very simple and therefore well-suited for coping with mass
failures, e.g.
\series bold
@ -44499,7 +44499,7 @@ name "chap:Creating-Backups-via"
\end_layout
\begin_layout Standard
When all your secondaries are all homogenously located in a standby datacenter,
When your secondaries are all homogeneously located in a standby datacenter,
they will be almost idle all the time.
This is a waste of computing resources.
\end_layout

View File

@ -65,7 +65,7 @@ config MARS_DEBUG
default n
help
Some of these checks and some additional error tracing may
consume noticable amounts of memory.
consume noticeable amounts of memory.
OFF for production systems. ON for testing!
config MARS_DEBUG_DEFAULT
@ -111,7 +111,7 @@ config MARS_DEFAULT_PORT
default 7777
help
Best practice is to uniformly use the same port number
in a cluster. Therefore, this is a compiletime constant.
in a cluster. Therefore, this is a compile time constant.
You may override this at insmod time via the mars_port= parameter.
config MARS_SEPARATE_PORTS
@ -175,24 +175,24 @@ config MARS_ROLLOVER_INTERVAL
depends on MARS
default 3
help
May influence the system load; dont use too low nubmers.
May influence the system load; don't use too low numbers.
config MARS_SCAN_INTERVAL
int "re-scanning of symlinks in /mars/ (in seconds)"
depends on MARS
default 5
help
May influence the system load; dont use too low nubmers.
May influence the system load; don't use too low numbers.
config MARS_PROPAGATE_INTERVAL
int "network propagation delay of changes in /mars/ (in seconds)"
depends on MARS
default 5
help
May influence the system load; dont use too low nubmers.
May influence the system load; don't use too low numbers.
config MARS_SYNC_FLIP_INTERVAL
int "interrpt sync by logfile update after (seconds)"
int "interrupt sync by logfile update after (seconds)"
depends on MARS
default 60
help

View File

@ -383,7 +383,7 @@ int _make_mref(struct copy_brick *brick,
st = &GET_STATE(brick, index);
old_mref = READ_ONCE(st->table[queue]);
if (unlikely(old_mref)) {
MARS_ERR("cannot overrride old_mref=%p at index=%u queue=%d pos=%lld+%lld flags=%d\n",
MARS_ERR("cannot override old_mref=%p at index=%u queue=%d pos=%lld+%lld flags=%d\n",
old_mref,
index, queue,
current_pos, diff, flags);
@ -570,7 +570,7 @@ restart:
brick->copy_shutdown_started.tv_sec) {
struct lamport_time force_when;
/* We use the force alrady after mars_copy_timeout / 2
/* We use the force already after mars_copy_timeout / 2
* because the shutdown itself may take some
* further time (e.g. over network).
*/
@ -743,7 +743,7 @@ restart:
* _starting_ the writes is in order.
* This is only correct when all lower bricks obey the
* order of ref_io() operations.
* Currenty, bio and aio are obeying this. Be careful when
* Currently, bio and aio are obeying this. Be careful when
* implementing new IO bricks!
*/
if (mars_copy_strict_write_order &&

View File

@ -2,7 +2,7 @@ GPLed software AS IS, sponsored by 1&1 Internet AG (www.1und1.de).
The test suite is work in progress.
The test suite was developped by Frank Liepold during his stay at 1&1.
The test suite was developed by Frank Liepold during his stay at 1&1.
The email address frank.liepold@1und1.de will no longer work, since Frank has
left 1&1 since May 2014.
@ -51,7 +51,7 @@ Contents
1.1. Global settings
------------------
The directory where this README resides (normally <git-repo>/test_suite) is
called base directory in the following. If relativ paths are given they refer to
called base directory in the following. If relative paths are given they refer to
this base directory.
The framework of the test suite consists of the files README,
@ -119,11 +119,11 @@ subdirectory (we call it start directory) as follows:
branch start directory -> test directory) by default. It may be changed by
the option --config_root_dir=<my dir>.
If the start directory coincides with the test directory the file
<test directory name>.conf is included for conveniance (in the strict
<test directory name>.conf is included for convenience (in the strict
sense there are no true subdirectories of the start directory residing
above the test directory).
The <subdirnam>.conf files may reside in the test directory or any of
it's parent directories (up to 20 levels).
its parent directories (up to 20 levels).
Examples:
@ -212,7 +212,7 @@ The output consists of the following sections:
--------------------------------------
Calling start_test.sh --help from a test directory doesn't start a test but
gives you the output mentioned in 1.2.1 without the sections produced during
real test excecution.
real test execution.
In particular the section "Configuration variables" is printed. So you can
determine which functions are called via the variable run_list. These functions
should be commented extensively enough to be able to understand the test's

View File

@ -364,7 +364,7 @@ sub device_exists {
# Silent fallback to local detection for old kernel module versions
my $buildtag = get_alive_link("buildtag", $peer, 1);
if (!$buildtag) {
# VERY old MARS modules dont report their version
# VERY old MARS modules don't report their version
$buildtag = `cut -d' ' -f1 < /proc/sys/mars/version`;
# Sometimes "never touch a running system" is a BAD strategy...
lwarn "Please upgrade your EXTREMELY OLD module version '$buildtag'\n" if $buildtag;
@ -815,7 +815,7 @@ sub _scan_caches {
lhint "Peer '$this_peer' looks like decommissioned (or we are in split-cluster).\n";
}
}
# ABOLUTE NOGO: the currently running host CANNOT be deleted
# ABSOLUTE NOGO: the currently running host CANNOT be deleted
if ($this_peer eq $real_host &&
(!$raw_ip || get_link_stamp($path) < $now - $window)) {
lwarn "IMPORTANT: this script is running under the REAL hostname '$real_host'\n";
@ -854,7 +854,7 @@ sub _scan_caches {
next;
}
}
# All has been checked now: rember this peer.
# All has been checked now: remember this peer.
$total_peers{$this_peer} = {};
}
# Add all known resources to %total_resources but _not_ to %any_resources.
@ -1145,7 +1145,7 @@ my $generated_scripts_subdir = defined($ENV{SYSTEMD_SCRIPTS_SUBDIR}) ?
my $predefined_unit_path = "/etc/systemd/system,/run/systemd/system,/usr/lib/systemd/system";
my $systemd_system_dirs =
# prefer the "offical" systemd path as documented in "man systemd.unit"
# prefer the "official" systemd path as documented in "man systemd.unit"
defined($ENV{SYSTEMD_UNIT_PATH}) ?
join(",", split(":", $ENV{SYSTEMD_UNIT_PATH})) .
(
@ -2113,7 +2113,7 @@ sub systemd_commit {
# At the moment, the complete transitive closure is re-computed once
# a small detail has changed. This is on the safe side, but not optimal.
# There is certainly room for improvement. However be cautious
# with respect to correctness under all cirumstances.
# with respect to correctness under all circumstances.
#
# Knuth is cited: "I can do it in half the time if it doesn't have
# to be correct".
@ -2144,7 +2144,7 @@ sub __systemd_generate {
@res_list = ($res);
} else {
@res_list = get_any_resources($host);
# We can only delete when the full set of transitive dependecies is known.
# We can only delete when the full set of transitive dependencies is known.
$do_delete = ($make_want && $make_watcher);
}
# Create initial systemd units
@ -2607,7 +2607,7 @@ sub get_global_versions {
lwarn "using different minor versions is possible, but you should upgrade your kernel module ASAP\n";
}
}
# compute the mimimum of kernel features capabilities
# compute the minimum of kernel features capabilities
my $start_time = mars_time();
my $stone_age = $start_time;
if ($cron_autoclean_days) {
@ -2689,7 +2689,7 @@ sub get_alive_links {
next if $peer =~ $match_reserved_id;
# After join-cluster & co, links may take a while to appear
$peers{$peer} = 1 if $non_participating_peers;
# peer must be a candiate matching the hosts spec
# peer must be a candidate matching the hosts spec
if ($hosts && $hosts ne "*") {
next unless $peer =~ m/(^|[+,{}])$hosts($|[+,{}])/;
}
@ -3384,7 +3384,7 @@ sub check_not_primary {
if (!$is_primary_recent || $desginated_primary_recent) {
if ($max_retry-- < 0) {
lwarn "Sorry, the primary status on resource '$res' is UNSTABLE or FLIPPING AROUND\n";
ldie "Please check whether there are DISTRIBUED RACES or amok-running scripts etc.\n" unless $force;
ldie "Please check whether there are DISTRIBUTED RACES or amok-running scripts etc.\n" unless $force;
lwarn "You said --force, I will continue AT YOUR RISK\n"
} else {
_trigger();
@ -4893,7 +4893,7 @@ sub senseless_cmd {
sub forbidden_cmd {
my ($cmd, $res) = @_;
ldie "command '$cmd' on resource '$res' cannot be used with MARS (migth affect too many hosts, lead to undesired consequences)\n";
ldie "command '$cmd' on resource '$res' cannot be used with MARS (might affect too many hosts, lead to undesired consequences)\n";
}
sub nyi_cmd {
@ -5318,11 +5318,11 @@ sub merge_cluster_old {
if ($other_uuid eq $uuid) {
lprint "Other cluster peer '$peer' has the same UUID.\n";
lprint "No resource name checking necessary.\n";
lprint "Operation '$cmd' will therfore work logically idempotent.\n";
lprint "Operation '$cmd' will therefore work logically idempotent.\n";
} else {
if (link_exists("$mars/tree-$peer")) {
lwarn "A valid tree signature '$mars/tree-$peer' already exists, thus it appears to be already merged!\n";
ldie "Aborting for saftey. Override via --force only if you know what you are doing!\n" unless $force;
ldie "Aborting for safety. Override via --force only if you know what you are doing!\n" unless $force;
}
# Check that both sets of resources are disjoint
lprint "Other cluster peer '$peer' has a different UUID, checking for resource name conflicts.\n";
@ -5344,7 +5344,7 @@ sub merge_cluster_old {
foreach my $res (@conflicts) {
lprint "\t$res\n";
}
ldie "Cannot $cmd: some resource directories exist at both clusters with same name.\nThis cannot be overriden.\nPlease resolve the conflict by hand.\n";
ldie "Cannot $cmd: some resource directories exist at both clusters with same name.\nThis cannot be overridden.\nPlease resolve the conflict by hand.\n";
}
lprint "List of total resources:\n";
foreach my $res (keys(%total_res)) {
@ -6134,7 +6134,7 @@ sub logrotate_res {
lwarn "logfile '$next' already exists - nothing to do\n";
return 0;
}
# safeguard defective /mars: the corresonding versionlink must exist.
# safeguard defective /mars: the corresponding versionlink must exist.
if (!is_link_recent($last)) {
my $vers_path = $last;
$vers_path =~ s:/log-:/version-:;
@ -6432,7 +6432,7 @@ sub link_purge_global {
# removal, because this would induce a _plethora_ of further changes to
# many reports / commands / interfaces / etc etc.
# Thus we _cannot_ use the _get_min_time() protection against dead / decommissioned
# peers here, UNFORTUNATLY :(
# peers here, UNFORTUNATELY :(
# Reason: this protection can only protect at more fine-grained layers, but it
# cannot protect the _base_ of all of this.
# Example: if you destroy the _foundation_ of a building, you have agreed to
@ -6567,7 +6567,7 @@ sub logdelete_res {
my $next = shift(@paths);
# never delete the very last logfile
last unless $next;
# safeguard: only delete logfiles having a minium age
# safeguard: only delete logfiles having a minimum age
last if !$force && is_link_recent($first);
$nr = $first;
$nr =~ s/^.*log-([0-9]+)-.+$/$1/;
@ -7822,7 +7822,7 @@ sub progress_bar {
sub make_numeric {
my $number = shift;
return 0 if (!defined($number) || $number eq "");
# skip followin parts of comma-separated lists
# skip following parts of comma-separated lists
$number =~ s/,.*//;
return $number;
}
@ -9742,7 +9742,7 @@ my %trivial_globs =
"occupied-size"
=> "",
"replay-code"
=> "When negative, this indidates that a replay/recovery error has occurred.",
=> "When negative, this indicates that a replay/recovery error has occurred.",
"errno-text"
=> "Convert errno numbers (positive or negative) into human readable text.",
"{sync,fetch,replay,work,syncpos}-{size,pos}"
@ -10235,7 +10235,7 @@ my %cmd_table =
"Deprecated.",
"Please use \"marsadm cron\" instead.",
"When possible, globally delete all old transaction logfiles which",
"are known to be superflous, i.e. all secondaries no longer need",
"are known to be superfluous, i.e. all secondaries no longer need",
"to replay them.",
"This must be regularly called by a cron job or similar, in order",
"to prevent overflow of the /mars/ directory.",
@ -10933,7 +10933,7 @@ marsadm [<global_options>] view[-<macroname>] [<resource_names> | all ]
--verbose
Increase verbosity of some commands.
--parallel
Only resonable when combined with \"all\".
Only reasonable when combined with \"all\".
For each resource, fork() a sub-process running independently
from other resources. May speed up handover a lot.
However, several cluster managers are known to have problems
@ -10974,13 +10974,13 @@ marsadm [<global_options>] view[-<macroname>] [<resource_names> | all ]
--timeout=<seconds>
Current default: $timeout
Abort safety checks and waiting loops after timeout with an error.
When giving 'all' as resource agument, this works for each
When giving 'all' as resource argument, this works for each
resource independently.
The special value -1 means \"infinite\".
--window=<seconds>
Current default: $window
Treat other cluster nodes as healthy when some communcation has
occured during the given time window.
Treat other cluster nodes as healthy when some communication has
occurred during the given time window.
--stuck-seconds=<seconds>
Current default: $stuck_seconds
Some warnings, like stuck fetch or replay, will appear in
@ -11732,7 +11732,7 @@ if (ref($func) eq "ARRAY") {
sleep(1);
my $now = mars_time();
if ($now - $start_time > $timeout) {
lwarn "Condition '$headline' for resources '$res' not reached withing $timeout s\n";
lwarn "Condition '$headline' for resources '$res' not reached within $timeout s\n";
last;
}
}

View File

@ -16,7 +16,7 @@ the time to fully analyze all distros / distro versions and their udev
rules.
Since I am not an expert in writing udev rules (and I just needed
a quickfix for my own work), the files in this directoy should
a quickfix for my own work), the files in this directory should
be regarded as examples.
For example, the file 65-mars.rules should be copied to /lib/udev/rules.d/