doc: describe Football

Thomas Schoebel-Theuer 2018-04-24 15:14:13 +02:00
parent c059daa99d
commit 2883c153e4
3 changed files with 724 additions and 0 deletions

docu/images/football.fig

@ -0,0 +1,60 @@
#FIG 3.2 Produced by xfig version 3.2.5c
Landscape
Center
Metric
A4
100.00
Single
-2
1200 2
6 -135 855 495 1485
1 3 0 1 0 7 50 -1 -1 0.000 1 0.0000 180 1170 315 315 180 1170 495 1170
4 1 0 50 -1 18 11 0.0000 4 135 375 180 1305 CSV\001
4 1 0 50 -1 18 10 0.0000 4 120 495 180 1125 action\001
-6
1 2 0 1 0 7 50 -1 -1 0.000 1 0.0000 4950 3600 225 1125 4725 4725 5175 2475
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
1710 2475 3465 2475 3465 2925 1710 2925 1710 2475
2 1 0 2 0 7 50 -1 -1 0.000 0 0 -1 1 0 2
1 1 2.00 75.00 120.00
2475 1800 2475 2475
2 1 0 2 0 7 50 -1 -1 0.000 0 0 -1 1 0 2
1 1 2.00 75.00 120.00
2700 2475 2700 1800
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
1575 450 3600 450 3600 1800 1575 1800 1575 450
2 1 1 2 0 7 50 -1 -1 6.000 0 0 -1 1 0 2
1 1 2.00 75.00 120.00
4230 1080 3555 1080
2 1 1 2 0 7 50 -1 -1 6.000 0 0 -1 1 0 2
1 1 2.00 75.00 120.00
3600 1215 4230 1215
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
1710 450 3465 450 3465 900 1710 900 1710 450
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
1575 1395 3600 1395 3600 1800 1575 1800 1575 1395
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
1125 450 1575 450 1575 1800 1125 1800 1125 450
2 1 0 2 0 7 50 -1 -1 0.000 0 0 -1 1 0 2
1 1 2.00 75.00 120.00
495 1125 1125 1125
2 1 0 2 0 7 50 -1 -1 0.000 0 0 -1 1 0 2
1 1 2.00 75.00 120.00
4050 3600 4680 3600
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
3600 2475 4050 2475 4050 4725 3600 4725 3600 2475
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
1575 4050 3600 4050 3600 4725 1575 4725 1575 4050
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
1575 2475 3600 2475 3600 4050 1575 4050 1575 2475
4 1 0 50 -1 18 10 0.0000 4 150 1665 2610 2745 parameters / settings\001
4 1 0 50 -1 18 14 0.0000 4 180 1350 2565 1170 screener.sh\001
4 1 0 50 -1 18 12 0.0000 4 150 765 4725 1215 Humans\001
4 1 0 50 -1 18 10 0.0000 4 150 1665 2610 720 parameters / settings\001
4 1 0 50 -1 18 12 0.0000 4 195 1590 2610 1665 Screener plugins\001
4 1 0 50 -1 18 11 1.5708 4 165 960 1395 1125 input filters\001
4 1 0 50 -1 18 10 0.0000 4 120 285 4410 3465 ssh\001
4 1 0 50 -1 18 14 0.0000 4 180 1200 2655 3510 football.sh\001
4 1 0 50 -1 18 12 0.0000 4 195 1530 2655 4410 Football plugins\001
4 1 0 50 -1 18 11 1.5708 4 165 1140 4995 3600 machine pool\001
4 1 0 50 -1 18 11 1.5708 4 165 1200 3870 3600 output drivers\001


@ -0,0 +1,45 @@
#FIG 3.2 Produced by xfig version 3.2.5c
Landscape
Center
Metric
A4
100.00
Single
-2
1200 2
6 810 1350 1170 1485
4 1 0 50 -1 18 10 0.0000 4 120 285 990 1485 ssh\001
-6
1 2 0 1 0 7 50 -1 -1 0.000 1 0.0000 450 1575 225 1350 225 2925 675 225
1 3 0 1 0 7 50 -1 -1 0.000 1 0.0000 5715 1575 315 315 5715 1575 6030 1575
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
2250 225 2700 225 2700 2925 2250 2925 2250 225
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
4725 225 5175 225 5175 2925 4725 2925 4725 225
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
1350 225 1800 225 1800 2925 1350 2925 1350 225
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
2700 225 4725 225 4725 2250 2700 2250 2700 225
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
2700 2250 4725 2250 4725 2925 2700 2925 2700 2250
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
2835 225 4590 225 4590 675 2835 675 2835 225
2 1 0 2 0 7 50 -1 -1 0.000 0 0 -1 1 0 2
1 1 2.00 75.00 120.00
720 1575 1350 1575
2 1 0 2 0 7 50 -1 -1 0.000 0 0 -1 1 0 2
1 1 2.00 75.00 120.00
1800 1575 2205 1575
2 1 0 2 0 7 50 -1 -1 0.000 0 0 -1 1 0 2
1 1 2.00 75.00 120.00
5175 1575 5400 1575
4 1 0 50 -1 18 14 0.0000 4 225 1635 3735 1620 pool-optimizer\001
4 1 0 50 -1 18 11 1.5708 4 165 1140 540 1620 machine pool\001
4 1 0 50 -1 18 11 1.5708 4 135 1110 1620 1530 status cache\001
4 1 0 50 -1 18 11 1.5708 4 165 1080 2520 1575 input drivers\001
4 1 0 50 -1 18 11 1.5708 4 165 1200 4995 1575 output drivers\001
4 1 0 50 -1 18 12 0.0000 4 195 1335 3735 2655 action plugins\001
4 1 0 50 -1 18 14 0.0000 4 240 855 3735 1350 generic\001
4 1 0 50 -1 18 10 0.0000 4 150 1665 3735 495 parameters / settings\001
4 1 0 50 -1 18 11 0.0000 4 135 375 5715 1710 CSV\001
4 1 0 50 -1 18 10 0.0000 4 120 495 5715 1530 action\001


@ -263,6 +263,24 @@ asynchronously
modes.
\end_layout
\begin_layout Abstract
MARS supports a new method for building Cloud Storage / Software Defined
Storage, called
\series bold
LV Football
\series default
.
\end_layout
\begin_layout Abstract
It comes with some automation scripts, providing functionality similar to
Kubernetes, but devoted to stateful LVs over
\series bold
virtual LVM pools
\series default
in the petabyte range.
\end_layout
\begin_layout Abstract
\paragraph_spacing double
\noindent
@ -38465,6 +38483,607 @@ By adding
, you can get a list of parameters for configuring and tweaking.
\end_layout
\begin_layout Section
Football Overview
\begin_inset CommandInset label
LatexCommand label
name "sec:Football-Overview"
\end_inset
\end_layout
\begin_layout Standard
Topmost architectural level:
\end_layout
\begin_layout Standard
\noindent
\align center
\begin_inset Graphics
filename images/pool-optimizer.fig
width 100col%
\end_inset
\end_layout
\begin_layout Standard
\noindent
The heart of the Football system is the generic pool optimizer, which aims
to provide functionality similar to Kubernetes, but working on a sharding
architecture.
Instead of controlling
\emph on
stateless
\emph default
Docker containers, its designated goal is to control masses of LVs on thousands
of machines, creating a
\begin_inset Quotes eld
\end_inset
Virtually Distributed LVM pool
\begin_inset Quotes erd
\end_inset
(petabytes of total storage), and doing things similar to Software Defined
Storage (SDS) on the virtual pool.
\end_layout
\begin_layout Standard
In addition to load balancing of storage space (and its special cases like
hardware lifecycle), there are dedicated plugins for dealing with CPU
and RAM dimensions.
Further dimensions and a variety of goal functions could be added via future
plugins.
The optimizer itself aims to be as generic as possible, while functionality
and interfaces can be added via plugins and/or drivers.
Future versions might even support DRBD in addition to MARS.
The current version uses a simple greedy algorithm for solving the underlying
\begin_inset Formula ${\cal NP}$
\end_inset
-complete problem, but could be augmented with more sophisticated problem
solvers in the future.
\end_layout
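\begin_layout Standard
The greedy idea can be illustrated with a minimal sketch.
It is not the actual pool-optimizer code: the function name, the input dictionaries
and the single storage dimension are assumptions made for illustration only.
In each round, one LV is proposed for migration from the fullest machine
to the emptiest one.
\end_layout

```python
from typing import Dict, List, Tuple


def greedy_migrations(
    used: Dict[str, int],                    # machine -> used space in GiB (hypothetical input)
    lvs: Dict[str, List[Tuple[str, int]]],   # machine -> [(lv_name, size_gib)] (hypothetical input)
    max_moves: int = 10,
) -> List[Tuple[str, str, str]]:
    """Propose (lv_name, source, target) migrations that reduce storage imbalance."""
    used = dict(used)                                   # work on copies
    lvs = {m: list(lvs.get(m, [])) for m in used}
    moves: List[Tuple[str, str, str]] = []
    for _ in range(max_moves):
        src = max(used, key=used.get)                   # fullest machine
        dst = min(used, key=used.get)                   # emptiest machine
        gap = used[src] - used[dst]
        # Moving an LV of size s changes the gap to |gap - 2*s|,
        # so only LVs with s < gap improve the balance.
        candidates = [lv for lv in lvs[src] if lv[1] < gap]
        if not candidates:
            break
        # Greedy choice: the LV whose size is closest to half of the gap.
        name, size = min(candidates, key=lambda lv: abs(gap / 2 - lv[1]))
        lvs[src].remove((name, size))
        lvs[dst].append((name, size))
        used[src] -= size
        used[dst] += size
        moves.append((name, src, dst))
    return moves


if __name__ == "__main__":
    used = {"a": 100, "b": 20, "c": 60}
    lvs = {"a": [("lv1", 40), ("lv2", 35), ("lv3", 25)],
           "b": [("lv4", 20)],
           "c": [("lv5", 60)]}
    for move in greedy_migrations(used, lvs):
        print("migrate", *move)
```

\begin_layout Standard
A greedy heuristic like this makes locally good choices and does not guarantee
a globally optimal placement, which is why more sophisticated solvers remain
an option.
\end_layout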
\begin_layout Standard
The automatic operations generated by pool-optimizer are not only customizable
by dozens of parameters, but also extendable by action plugins.
At the moment, the following
\family typewriter
football.sh
\family default
actions are implemented:
\end_layout
\begin_layout Description
\family typewriter
migrate
\family default
This will move an LV (together with its VM / LXC container / etc) to a
different machine in the machine pool.
This is the classical Football
\begin_inset Quotes eld
\end_inset
kick
\begin_inset Quotes erd
\end_inset
operation.
\end_layout
\begin_layout Description
\family typewriter
shrink
\family default
This decreases the occupied LV space of a filesystem (currently only
\family typewriter
xfs
\family default
implemented, but easily extendable) via creation of a smaller temporary
LV at the hypervisor, then transferring all data during operations via
local
\family typewriter
rsync
\family default
, then shutting down the VM for a short period, doing a final incremental
\family typewriter
rsync
\family default
, renaming the copied temporary LV to its original name, restarting the
VM on the new version (which contains the same data as before but wastes
less space), and finally re-establishing the MARS replicas (but of course
with smaller LV size).
\end_layout
\begin_layout Description
\family typewriter
extend
\family default
This is much easier than shrinking: it first increases the underlying LV
size dynamically on all replicas, then
\family typewriter
marsadm resize
\family default
, and finally calls
\family typewriter
xfs_growfs
\family default
while the filesystem remains mounted and while the VM / container is running.
\end_layout
\begin_layout Description
\family typewriter
migrate+shrink
\family default
Similar to
\family typewriter
migrate
\family default
immediately followed by
\family typewriter
shrink
\family default
, but produces less network traffic and runs faster.
\end_layout
\begin_layout Description
\family typewriter
migrate+shrink+back
\family default
Use this when there is not enough local temporary space for shrinking.
The LV is first migrated to a temporary host, then shrunk, and finally
migrated back to its original position.
\end_layout
\begin_layout Standard
By running the overall system in an endless loop, a control loop for permanent
optimization can be established.
Typical periods are every few days, or once a week.
Manual triggering is also possible.
\end_layout
\begin_layout Standard
The result of an (incremental) pool-optimizer run is a CSV file, which may
be automatically forwarded to the execution engine
\family typewriter
football.sh
\family default
for
\emph on
manual
\emph default
execution, or to
\family typewriter
screener.sh
\family default
for mass execution on a common control machine.
Alternatively, intermediate steps like manual checking, filtering etc. may
be inserted into the processing pipeline.
\end_layout
\begin_layout Standard
\noindent
\align center
\begin_inset Graphics
filename images/football.fig
width 90col%
\end_inset
\end_layout
\begin_layout Standard
\noindent
The so-called Screener is simply a generic program allowing mass execution
of arbitrary scripts in background
\family typewriter
screen
\family default
sessions.
This allows masses (several hundreds, possibly thousands) of long-lasting
processes (hours or days) to run
\emph on
unattended
\emph default
in background, while allowing a (larger) group of sysadmins to attach /
detach to
\family typewriter
screen
\family default
sessions at any time for corrective by-hand actions, e.g.
in case of failures or other problems, or for supervision, etc.
\end_layout
\begin_layout Standard
When Screener is combined with the Football execution engine
\family typewriter
football.sh
\family default
, more specialized functionality is available (via a variety of plugins):
\end_layout
\begin_layout Itemize
Optional waiting for sysadmin confirmation before some customer downtime
is initiated.
\end_layout
\begin_layout Itemize
Automatic generation of
\family typewriter
motd
\family default
status reporting to other sysadmins.
\end_layout
\begin_layout Itemize
Automatic sending of email alerts or status reports, e.g.
on errors or critical errors, etc.
By sending email to SMS gateways, real-time alerting can be configured
(e.g.
over the weekend).
\end_layout
\begin_layout Itemize
Generic interfacing to external scripts with configurable parameters, e.g.
for triggering monitoring systems, feeding external databases, etc.
\end_layout
\begin_layout Standard
Screener can detect and will automatically manage the following states (in
this example, all state lists are empty):
\end_layout
\begin_layout Standard
\begin_inset listings
inline false
status open
\begin_layout Plain Layout
$common_user> ./screener.sh list
\end_layout
\begin_layout Plain Layout
List of waiting:
\end_layout
\begin_layout Plain Layout
List of delayed:
\end_layout
\begin_layout Plain Layout
List of running:
\end_layout
\begin_layout Plain Layout
List of critical:
\end_layout
\begin_layout Plain Layout
List of serious:
\end_layout
\begin_layout Plain Layout
List of failed:
\end_layout
\begin_layout Plain Layout
List of done:
\end_layout
\end_inset
\end_layout
\begin_layout Standard
\noindent
Screener can distinguish the
\emph on
severity
\emph default
of errors as follows:
\end_layout
\begin_layout Description
\family typewriter
failed
\family default
An error occurred
\emph on
outside
\emph default
of critical sections, e.g.
during preparation of LV space etc.
During ordinary operations, VMs / containers are usually running continuously,
and there is no customer impact to be expected.
Typically,
\family typewriter
./screener.sh restart $resource
\family default
should fix the problem if it is only a temporary problem.
However, for maximum safety, manual inspection via
\family typewriter
./screener.sh attach $resource
\family default
or inspection of the logfile via
\family typewriter
./screener.sh show $resource
\family default
is recommended before trying an automatic restart.
\end_layout
\begin_layout Description
\family typewriter
serious
\family default
An error occurred while a VM / container was temporarily stopped, which
\series bold
would
\series default
normally lead to customer downtime, but Football was able to
\emph on
compensate
\emph default
for the problem
\emph on
for now
\emph default
by
\emph on
automatically
\emph default
restarting the VM.
Thus no long-lasting customer impact has likely occurred.
However, manual inspection and repair by sysadmins is likely necessary.
\end_layout
\begin_layout Description
\family typewriter
critical
\family default
An
\emph on
uncompensated
\emph default
error occurred during customer downtime.
The VM / container is likely down.
This will need manual sysadmin actions ASAP, such as hardware replacement,
networking fixes, etc.
\end_layout
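\begin_layout Standard
The distinction between the three error states can be summarized as a small
decision function.
This is only a sketch of the rules described above, not Screener's implementation;
the function and parameter names are hypothetical.
\end_layout

```python
def classify_error(during_downtime: bool, vm_restarted: bool) -> str:
    """Map the circumstances of an error to a Screener error state.

    during_downtime: the error happened while the VM / container was stopped
                     (i.e. inside a critical section).
    vm_restarted:    Football managed to restart the VM automatically.
    Both parameter names are assumptions made for this illustration.
    """
    if not during_downtime:
        # Outside of critical sections: no customer impact expected.
        return "failed"
    if vm_restarted:
        # Compensated for now, but manual inspection is likely necessary.
        return "serious"
    # Uncompensated error during downtime: needs sysadmin action ASAP.
    return "critical"


if __name__ == "__main__":
    for downtime in (False, True):
        for restarted in (False, True):
            print(downtime, restarted, classify_error(downtime, restarted))
```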
\begin_layout Standard
\noindent
Ordinary Screener states during execution:
\end_layout
\begin_layout Description
\family typewriter
running
\family default
This means that a (background) process is currently running.
You can attach to the screen session either manually via
\family typewriter
screen -x $pid.$resource
\family default
, or more comfortably via
\family typewriter
./screener.sh attach $resource
\family default
.
Then you can use
\family typewriter
screen
\family default
as documented in
\family typewriter
man screen
\family default
.
The most important operation is detaching via keystrokes
\family typewriter
Ctrl-a d
\family default
.
\begin_inset Newline newline
\end_inset
\begin_inset Graphics
filename images/MatieresCorrosives.png
lyxscale 50
scale 17
\end_inset
Notice: don't press
\family typewriter
Ctrl-c
\family default
unless you know what you are doing.
In most cases, this will terminate the running process, and in consequence
lead to
\family typewriter
\series bold
failed
\family default
\series default
or even
\family typewriter
\series bold
critical
\family default
\series default
state (depending on the moment of keypress).
Depending on parameter
\family typewriter
drop_shell
\family default
, the Screener session will also terminate, or you will get an interactive
shell for manual repair.
\end_layout
\begin_layout Description
\family typewriter
waiting
\family default
When the plugins
\family typewriter
football-waiting
\family default
and
\family typewriter
screener-waiting
\family default
are configured properly (which is
\emph on
not
\emph default
the default), the script execution will pause immediately before a customer
downtime action would be started.
Now any sysadmin from the larger group has a chance to
\family typewriter
./screener.sh attach $resource
\family default
and to press RETURN to continue the waiting script and to personally watch
the course of the critical section.
There are some more comfortable variants like
\family typewriter
./screener.sh continue $resource
\family default
for background continuation of a single session, or
\family typewriter
./screener.sh continue 100
\family default
which can be used for continuing masses of waiting sessions.
There are further variants which automatically attach to sessions,
see Appendix
\begin_inset CommandInset ref
LatexCommand ref
reference "sec:screenerhelp"
\end_inset
.
\end_layout
\begin_layout Description
\family typewriter
delayed
\family default
This state is only entered before
\family typewriter
lvremove $resource
\family default
is executed (which will destroy your old internal backup copy), and when
configured appropriately.
Typically, you also need to configure the
\family typewriter
$wait_before_cleanup
\family default
variable in order to avoid endless waiting.
Note that old LV data soon becomes outdated, so please don't
unnecessarily prolong the running time of your scripts by choosing too
long
\family typewriter
$wait_before_cleanup
\family default
values.
\end_layout
\begin_layout Description
\family typewriter
done
\family default
This means that the script reported successful execution by exit status
\family typewriter
0
\family default
.
The background screen session terminated automatically.
You can inspect the logfile manually via
\family typewriter
./screener.sh show $resource
\family default
, or by looking into the directory
\family typewriter
$screener_logdir/done/
\family default
.
\end_layout
\begin_layout Standard
\begin_inset Graphics
filename images/lightbulb_brightlit_benj_.png
lyxscale 12
scale 7
\end_inset
Logfiles of other states can also be inspected (or monitored by standard
tools like
\family typewriter
grep
\family default
) by looking into sister directories, such as
\family typewriter
$screener_logdir/running/
\family default
.
\end_layout
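\begin_layout Standard
As an illustration of such monitoring, the following sketch counts the logfiles
found per state directory.
It assumes that every state has a directory named after it below the log
directory; only
\family typewriter
done/
\family default
 and
\family typewriter
running/
\family default
 are named explicitly above, the other names are an assumption, as is the
helper function itself.
\end_layout

```python
import os
from typing import Dict

# State names as printed by "./screener.sh list"; that each one maps to a
# directory of the same name is an assumption of this sketch.
STATES = ["waiting", "delayed", "running", "critical", "serious", "failed", "done"]


def count_logfiles(logdir: str) -> Dict[str, int]:
    """Count regular files in each per-state directory below logdir."""
    counts: Dict[str, int] = {}
    for state in STATES:
        path = os.path.join(logdir, state)
        if os.path.isdir(path):
            counts[state] = sum(1 for entry in os.scandir(path) if entry.is_file())
        else:
            counts[state] = 0  # a missing directory is treated as empty
    return counts


if __name__ == "__main__":
    # The fallback path is a placeholder, not a documented default.
    logdir = os.environ.get("screener_logdir", "/tmp/screener-logs")
    for state, number in count_logfiles(logdir).items():
        print(f"{state}: {number}")
```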
\begin_layout Standard
\begin_inset Graphics
filename images/MatieresCorrosives.png
lyxscale 50
scale 17
\end_inset
When running Screener for several months or years, old logfiles will accumulate
in these directories over time.
Call
\family typewriter
./screener.sh purge
\family default
or
\family typewriter
./screener.sh cron
\family default
regularly via a cron job, or archive your old logfiles from time to time
via another method.
\end_layout
\begin_layout Chapter
MARS for Developers
\end_layout