user-manual: rework scripting advice

Thomas Schoebel-Theuer 2019-09-10 12:38:59 +02:00 committed by Thomas Schoebel-Theuer
parent 72c1ad5d78
commit 2cde907480
1 changed files with 187 additions and 193 deletions


@ -31718,6 +31718,193 @@ Same as the
firmly built in.
\end_layout
\begin_layout Section
Scripting Advice
\begin_inset CommandInset label
LatexCommand label
name "sec:Scripting-HOWTO"
\end_inset
\end_layout
\begin_layout Standard
Both the
\series bold
asynchronous communication model
\series default
of MARS including the Lamport clock, and the
\series bold
state model
\series default
(cf section
\begin_inset CommandInset ref
LatexCommand ref
reference "sec:The-State-of"
\end_inset
) are something you
\emph on
definitely
\emph default
should have in mind when you want to do some scripting.
Here is some advice:
\end_layout
\begin_layout Itemize
Don't access anything on
\family typewriter
/mars/
\family default
directly, except for debugging purposes.
Use
\family typewriter
marsadm
\family default
.
\end_layout
\begin_layout Itemize
Avoid running scripts in parallel, other than for inspection / monitoring
purposes.
When you run two
\family typewriter
marsadm
\family default
commands in parallel (whether on the same host, or on different hosts belonging
to the same cluster), they can interfere with each other and produce a mess.
\family typewriter
marsadm
\family default
has no internal locking.
There is no cluster-wide locking at all, because it would cause trouble
during long-distance network outages.
Unfortunately, some systems like Pacemaker violate this rule in many cases
(depending on their configuration).
It is best to have a dedicated, more or less centralized
\series bold
control machine
\series default
which controls masses of your georedundant working servers.
This reduces the risk of running interfering actions in parallel.
Of course, you need backup machines for your control machines, and in different
locations.
Not obeying this advice can easily lead to problems such as complex races
which are very difficult to solve in long-distance distributed systems,
even in general (not limited to MARS).
\end_layout
\begin_layout Itemize
\family typewriter
marsadm wait-cluster
\family default
is your friend.
Whenever your (near-)central script has to switch between different hosts
\family typewriter
A
\family default
and
\family typewriter
B
\family default
(of the same cluster), use it in the following way:
\begin_inset Newline newline
\end_inset
\family typewriter
ssh A
\begin_inset Quotes eld
\end_inset
marsadm action1
\begin_inset Quotes erd
\end_inset
; ssh B
\begin_inset Quotes eld
\end_inset
marsadm wait-cluster; marsadm action2
\begin_inset Quotes erd
\end_inset
\begin_inset Newline newline
\end_inset
\family default
\begin_inset Graphics
filename images/MatieresCorrosives.png
lyxscale 50
scale 17
\end_inset
Don't ignore this advice! Interference is almost
\emph on
certain
\emph default
! As a rule of thumb, precede almost any action command with some appropriate
waiting command!
\end_layout
\begin_layout Itemize
Further friends are any
\family typewriter
marsadm wait-*
\family default
commands, such as
\family typewriter
wait-umount
\family default
.
\end_layout
\begin_layout Itemize
In some places, busy-wait loops might be needed, e.g.
for waiting until a specific resource is
\family typewriter
UpToDate
\family default
or matches some other condition.
Examples of waiting conditions can be found under
\family typewriter
github.com/schoebel/test-suite
\family default
in subdirectory
\family typewriter
mars/modules/
\family default
, specifically
\family typewriter
02_predicates.sh
\family default
or similar.
\end_layout
\begin_layout Itemize
In case of network problems, some commands may hang forever if you don't
set the
\family typewriter
--timeout=
\family default
option.
Don't forget to check the return status of any failed or timed-out commands,
and to take appropriate measures!
\end_layout
\begin_layout Itemize
Test your scripts in failure scenarios!
\end_layout
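\begin_layout Standard
The advice above can be combined into a small wrapper script.
The following is only a minimal sketch under stated assumptions: the helper
names run_on, sequenced and wait_until, the variables MARS_TIMEOUT and
REMOTE_SHELL, and the timeout value are hypothetical and not part of MARS.
The names action1 and action2 are the placeholders from the example above.
Placing the --timeout= option directly after marsadm is also an assumption,
and should be checked against your marsadm version.
\end_layout

```shell
#!/bin/sh
# Sketch only: all helper names and variables here are hypothetical.

# Timeout in seconds handed to marsadm via --timeout= (value is arbitrary).
MARS_TIMEOUT="${MARS_TIMEOUT:-30}"

# Run one marsadm command on a remote host.
# REMOTE_SHELL can be overridden, e.g. for dry runs or tests.
# Assumption: --timeout= is accepted directly after "marsadm".
run_on() {
    host="$1"; shift
    ${REMOTE_SHELL:-ssh} "$host" "marsadm --timeout=$MARS_TIMEOUT $*"
}

# Run action1 on host A, then wait-cluster followed by action2 on host B.
# Stops at the first failed or timed-out step and reports which one it was.
sequenced() {
    host_a="$1"; action1="$2"; host_b="$3"; action2="$4"
    run_on "$host_a" "$action1" || {
        echo "failed on $host_a: $action1" >&2; return 1; }
    run_on "$host_b" wait-cluster || {
        echo "wait-cluster failed on $host_b" >&2; return 2; }
    run_on "$host_b" "$action2" || {
        echo "failed on $host_b: $action2" >&2; return 3; }
}

# Busy-wait loop: poll a predicate command until it succeeds,
# giving up after max_tries unsuccessful polls.
# usage: wait_until <max_tries> <delay_seconds> <predicate> [args...]
wait_until() {
    max_tries="$1"; delay="$2"; shift 2
    tries=0
    until "$@"; do
        tries=$((tries + 1))
        if [ "$tries" -ge "$max_tries" ]; then
            return 1
        fi
        sleep "$delay"
    done
    return 0
}
```

\begin_layout Standard
For example, calling sequenced A action1 B action2 from the control machine
would return a non-zero status as soon as one step fails or times out, so
that the calling script can take appropriate measures.
Unlike the one-line ssh example above, each step uses its own ssh call here,
so that its return status can be checked separately.
The predicate passed to wait_until is left to the caller.
\end_layout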
\begin_layout Chapter
Troubleshooting
\end_layout
@ -32014,199 +32201,6 @@ name "chap:The-Macro-Processor"
\end_layout
\begin_layout Section
Scripting HOWTO
\begin_inset CommandInset label
LatexCommand label
name "sec:Scripting-HOWTO"
\end_inset
\end_layout
\begin_layout Standard
Both the
\series bold
asynchronous communication model
\series default
of MARS (cf section
\begin_inset CommandInset ref
LatexCommand ref
reference "sec:The-Lamport-Clock"
\end_inset
) including the Lamport clock, and the
\series bold
state model
\series default
(cf section
\begin_inset CommandInset ref
LatexCommand ref
reference "sec:The-State-of"
\end_inset
) is something you
\emph on
definitely
\emph default
should have in mind when you want to do some scripting.
Here is some further concrete advice:
\end_layout
\begin_layout Itemize
Don't access anything on
\family typewriter
/mars/
\family default
directly, except for debugging purposes.
Use
\family typewriter
marsadm
\family default
.
\end_layout
\begin_layout Itemize
Avoid running scripts in parallel, other than for inspection / monitoring
purposes.
When you give two
\family typewriter
marsadm
\family default
commands in parallel (whether on the same host, or on different hosts belonging
to the same cluster), it is very likely to produce a mess.
\family typewriter
marsadm
\family default
has no internal locking.
There is no cluster-wide locking at all.
Unfortunately, some systems like Pacemaker are violating this in many cases
(depending on their configuration).
Best is if you have a dedicated / more or less centralized
\series bold
control machine
\series default
which controls masses of your georedundant working servers.
This reduces the risk of running interfering actions in parallel.
Of course, you need backup machines for your control machines, and in different
locations.
Not obeying this advice can easily lead to problems such as complex races
which are very difficult to solve in long-distance distributed systems,
even in general (not limited to MARS).
\end_layout
\begin_layout Itemize
\family typewriter
marsadm wait-cluster
\family default
is your friend.
Whenever your (near-)central script has to switch between different hosts
\family typewriter
A
\family default
and
\family typewriter
B
\family default
(of the same cluster), use it in the following way:
\begin_inset Newline newline
\end_inset
\family typewriter
ssh A
\begin_inset Quotes eld
\end_inset
marsadm action1
\begin_inset Quotes erd
\end_inset
; ssh B
\begin_inset Quotes eld
\end_inset
marsadm wait-cluster; marsadm action2
\begin_inset Quotes erd
\end_inset
\begin_inset Newline newline
\end_inset
\family default
\begin_inset Graphics
filename images/MatieresCorrosives.png
lyxscale 50
scale 17
\end_inset
Don't ignore this advice! Interference is almost
\emph on
sure
\emph default
! As a rule of thumb, precede almost any action command with some appropriate
waiting command!
\end_layout
\begin_layout Itemize
Further friends are any
\family typewriter
marsadm wait-*
\family default
commands, such as
\family typewriter
wait-umount
\family default
.
\end_layout
\begin_layout Itemize
In some places, busy-wait loops might be needed, e.g.
for waiting until a specific resource is
\family typewriter
UpToDate
\family default
or matches some other condition.
Examples of waiting conditions can be found under
\family typewriter
github.com/schoebel/test-suite
\family default
in subdirectory
\family typewriter
mars/modules/
\family default
, specifically
\family typewriter
02_predicates.sh
\family default
or similar.
\end_layout
\begin_layout Itemize
In case of network problems, some command may hang (forever), if you don't
set the
\family typewriter
--timeout=
\family default
option.
Don't forget to check the return status of any failed or timed-out commands,
and to take appropriate measures!
\end_layout
\begin_layout Itemize
Test your scripts in failure scenarios!
\end_layout
\begin_layout Chapter
The Sysadmin Interface (
\family typewriter