doc: appendix --help commands

2025-04-01 22:58:34 +00:00 · 2018-04-23 15:25:32 +02:00 · 2018-04-23 15:25:32 +02:00 · c059daa99d
commit c059daa99d
parent 63be004b9e
7 changed files with 2056 additions and 0 deletions
--- a/docu/football-verbose.help
+++ b/docu/football-verbose.help
@@ -0,0 +1,574 @@
+\begin{verbatim}
+verbose=1
+Usage:
+  ./football.sh --help [--verbose]
+     Show help
+  ./football.sh --variable=<value>
+     Override any shell variable
+
+Actions for resource migration:
+
+  ./football.sh migrate         <resource> <target_primary> [<target_secondary>]
+     Run the sequence
+     migrate_prepare ; migrate_wait ; migrate_finish ; migrate_cleanup.
+
+  ./football.sh migrate_prepare <resource> <target_primary> [<target_secondary>]
+     Allocate LVM space at the targets and start MARS replication.
+
+  ./football.sh migrate_wait    <resource> <target_primary> [<target_secondary>]
+     Wait until MARS replication reports UpToDate.
+
+  ./football.sh migrate_finish  <resource> <target_primary> [<target_secondary>]
+     Call hooks for handover to the targets.
+
+  ./football.sh migrate_cleanup <resource>
+     Remove old / currently unused LV replicas from MARS and deallocate
+     from LVM.
+
+Actions for (manual) repair in emergency situations:
+
+  ./football.sh manual_migrate_config  <resource> <target_primary> [<target_secondary>]
+     Transfer only the cluster config, without changing the MARS replicas.
+     This does no resource stopping / restarting.
+     Useful for reverting a failed migration.
+
+  ./football.sh manual_config_update <hostname>
+     Only update the cluster config, without changing anything else.
+     Useful for manual repair of failed migration.
+
+  ./football.sh manual_merge_cluster <hostname1> <hostname2>
+     Run "marsadm merge-cluster" for the given hosts.
+     Hostnames must be from different (former) clusters.
+
+  ./football.sh manual_split_cluster <hostname_list>
+     Run "marsadm split-cluster" at the given hosts.
+     Useful for fixing failed / asymmetric splits.
+     Hint: provide _all_ hostnames which have formerly participated
+     in the cluster.
+
+  ./football.sh repair_vm <resource> <primary_candidate_list>
+     Try to restart the VM <resource> on one of the given machines.
+     Useful during unexpected customer downtime.
+
+  ./football.sh repair_mars <resource> <primary_candidate_list>
+     Before restarting the VM like in repair_vm, try to find a local
+     LV where a stand-alone MARS resource can be found and built up.
+     Use this only when the MARS resources are gone, and when you are
+     desperate. Problem: this will likely create a MARS setup which is
+     not usable for production, and therefore must be corrected later
+     by hand. Use this only during an emergency situation in order to
+     get the customers online again, while buying the downsides of this
+     command.
+
+Actions for inplace FS shrinking:
+
+  ./football.sh shrink          <resource> <percent>
+     Run the sequence shrink_prepare ; shrink_finish ; shrink_cleanup.
+
+  ./football.sh shrink_prepare  <resource> [<percent>]
+     Allocate temporary LVM space (when possible) and create initial
+     raw FS copy.
+     Default percent value (when left out) is 85.
+
+  ./football.sh shrink_finish   <resource>
+     Incrementally update the FS copy, swap old <=> new copy with
+     small downtime.
+
+  ./football.sh shrink_cleanup  <resource>
+     Remove old FS copy from LVM.
+
+Actions for inplace FS extension:
+
+  ./football.sh extend          <resource> <percent>
+
+Combined actions:
+
+  ./football.sh migrate+shrink <resource> <target_primary> [<target_secondary>] [<percent>]
+     Similar to migrate ; shrink but produces less network traffic.
+     Default percent value (when left out) is 85.
+
+  ./football.sh migrate+shrink+back <resource> <tmp_primary> [<percent>]
+     Migrate temporarily to <tmp_primary>, then shrink there,
+     finally migrate back to old primary and secondaries.
+     Default percent value (when left out) is 85.
+
+Global maintenance:
+
+  ./football.sh lv_cleanup      <resource>
+
+General features:
+
+  - Instead of <percent>, an absolute amount of storage with suffix
+    'k' or 'm' or 'g' can be given.
+
+  - When <resource> is currently stopped, login to the container is
+    not possible, and in turn the hypervisor node and primary storage node
+    cannot be automatically determined. In such a case, the missing
+    nodes can be specified via the syntax
+        <resource>:<hypervisor>:<primary_storage>
+
+  - The following LV suffixes are used (naming convention):
+    -tmp = currently emerging version for shrinking
+    -preshrink = old version before shrinking took place
+
+  - By adding the option --screener, you can hand over football execution
+    to ./screener.sh .
+    When some --enable_*_waiting is also added, then the critical
+    sections involving customer downtime are temporarily halted until
+    a sysadmin says "screener.sh continue $resource" or
+    attaches to the session and presses the RETURN key.
+
+  ## football_includes
+  # List of directories where football-*.conf files can be found.
+  football_includes="${football_includes:-/usr/lib/mars/plugins /etc/mars/plugins $script_dir/plugins $HOME/.mars/plugins ./plugins}"
+
+  ## dry_run
+  # When set, actions are only simulated.
+  dry_run=${dry_run:-0}
+
+  ## verbose
+  # Increase verbosity.
+  verbose=${verbose:-0}
+
+  ## confirm
+  # Only for debugging: manually started operations can be
+  # manually checked and confirmed before actually starting operations.
+  confirm=${confirm:-1}
+
+  ## force
+  # Normally, shrinking and extending will only be started if there
+  # is something to do.
+  # Enable this for debugging and testing: the check is then skipped.
+  force=${force:-0}
+
+  ## debug_injection_point
+  # RTFS: don't set this unless you are a developer who knows what you are doing.
+  debug_injection_point="${debug_injection_point:-0}"
+
+  ## football_logdir
+  # Where the logfiles should be created.
+  # HINT: after playing Football in masses for a while, your $logdir will
+  # be easily populated with hundreds or thousands of logfiles.
+  # Set this to your convenience.
+  football_logdir="${football_logdir:-${logdir:-$HOME/football-logs}}"
+
+  ## screener
+  # When enabled, handover execution to the screener.
+  # Very useful for running Football in masses.
+  screener="${screener:-0}"
+
+  ## min_space
+  # When testing / debugging with extremely small LVs, it may happen
+  # that mkfs refuses to create extremely small filesystems.
+  # Use this to ensure a minimum size.
+  min_space="${min_space:-20000000}"
+
+  ## cache_repeat_lapse
+  # When using the waiting capabilities of screener, and when waits
+  # are lasting very long, your dentry cache may become cold.
+  # Use this for repeated refreshes of the dentry cache after some time.
+  cache_repeat_lapse="${cache_repeat_lapse:-120}" # Minutes
+
+  ## ssh_opt
+  # Useful for customization to your ssh environment.
+  ssh_opt="${ssh_opt:--4 -A -o StrictHostKeyChecking=no -o ForwardX11=no -o KbdInteractiveAuthentication=no -o VerifyHostKeyDNS=no}"
+
+  ## rsync_opt
+  # The rsync options in general.
+  # IMPORTANT: some intermediate progress report is absolutely needed,
+  # because otherwise a false-positive TIMEOUT may be assumed when
+  # no output is generated for several hours.
+  rsync_opt="${rsync_opt:- -aSH --info=progress2,STATS}"
+
+  ## rsync_opt_prepare
+  # Additional rsync options for preparation and updating
+  # of the temporary shrink mirror filesystem.
+  rsync_opt_prepare="${rsync_opt_prepare:---exclude='.filemon2' --delete}"
+
+  ## rsync_nice
+  # Typically, the preparation steps are run with background priority.
+  rsync_nice="${rsync_nice:-nice -19}"
+
+  ## rsync_repeat_prepare and rsync_repeat_hot
+  # Tuning: increases the reliability of rsync and ensures that the dentry cache
+  # remains hot.
+  rsync_repeat_prepare="${rsync_repeat_prepare:-5}"
+  rsync_repeat_hot="${rsync_repeat_hot:-3}"
+
+  ## wait_timeout
+  # Avoid infinite loops upon waiting.
+  wait_timeout="${wait_timeout:-$(( 24 * 60 ))}" # Minutes
+
+  ## lvremove_opt
+  # Some LVM versions require this for unattended batch operations.
+  lvremove_opt="${lvremove_opt:--f}"
+
+  ## critical_status
+  # This is the "magic" exit code indicating _criticality_
+  # of a failed command.
+  critical_status="${critical_status:-199}"
+
+  ## serious_status
+  # This is the "magic" exit code indicating _seriousness_
+  # of a failed command.
+  serious_status="${serious_status:-198}"
+
+  ## pre_hand or --pre-hand=
+  # Set this to do an ordinary handover to a new start position before doing
+  # anything else. This may be used for handover to a different datacenter
+  # and running Football there.
+  pre_hand="${pre_hand:-}"
+
+  ## startup_when_locked
+  # When == 0:
+  #  Don't abort and don't wait when a lock is detected at startup.
+  # When == 1 and when enable_startup_waiting=1:
+  #  Wait until the lock is gone.
+  # When == 2:
+  #  Abort start of script execution when a lock is detected.
+  #  Later, when locks are set _during_ execution, they will
+  #  be obeyed when enable_*_waiting is set (instead), and will
+  #  lead to waits instead of aborts.
+  startup_when_locked="${startup_when_locked:-1}"
+
+  ## user_name
+  # Normally automatically derived from ssh agent or from $LOGNAME.
+  # Please override this only when really necessary.
+  export user_name="${user_name:-$(get_real_ssh_user)}"
+  export user_name="${user_name:-$LOGNAME}"
+
+
+PLUGIN football-cm3
+
+   1&1-specific plugin for dealing with the cm3 cluster manager
+   and its concrete operating environment (singleton instance).
+
+   Current maximum cluster size limit: 
+
+   Maximum #syncs running before migration can start: 
+
+   Following marsadm --version must be installed: 
+
+   Following mars kernel modules must be loaded: 
+
+  ## enable_cm3
+  # ShaHoLin-specific plugin for working with the infong platform
+  # (istore, icpu, infong) via 1&1-specific clustermanager cm3
+  # and related toolsets. Much of it is bound to a singleton database
+  # instance (clustermw & siblings).
+  enable_cm3="${enable_cm3:-$(if [[ "$0" =~ tetris ]]; then echo 1; else echo 0; fi)}"
+
+  ## skip_resource_ping
+  # Enable this only for testing. Normally, a resource name denotes a
+  # container name == machine name which must be running as a precondition,
+  # and thus must be pingable over the network.
+  skip_resource_ping="${skip_resource_ping:-0}"
+
+  ## date_lock
+  # Don't enter critical sections at certain days of the week,
+  # and/or during certain hours.
+  # This is a regex matching against "date +%u_%H"
+  date_lock="${date_lock:-}"
+
+  ## workaround_firewall
+  # Documentation of technical debt for later generations:
+  # This is needed since July 2017. In the many years before, no firewalling
+  # was effective at the replication network, because it is a physically
+  # separate network from the rest of the networking infrastructure.
+  # An attacker would first need to gain root access to the _hypervisor_
+  # (not only to the LXC container and/or to KVM) before gaining access to
+  # those physical replication network interfaces.
+  # Since about that time, which is about the same time when the requirements
+  # for Container Football had been communicated, somebody introduced some
+  # unnecessary firewall rules, based on "security arguments".
+  # These arguments were however explicitly _not_ required by the _real_
+  # security responsible person, and explicitly _not_ recommended by him.
+  # Now the problem is that it is almost politically impossible to get
+  # rid of such a "security feature".
+  # Until the problem is resolved, Container Football requires
+  # the _entire_ local firewall to be _temporarily_ shut down in order to
+  # allow marsadm commands over ssh to work.
+  # Notice: this is _not_ increasing the general security in any way.
+  # LONGTERM solution / TODO: future versions of mars should no longer
+  # depend on ssh.
+  # Then this "feature" can be turned off.
+  workaround_firewall="${workaround_firewall:-1}"
+
+  ## ip_magic
+  # Similarly to workaround_firewall, this is needed since somebody
+  # introduced additional firewall rules also disabling sysadmin ssh
+  # connections at the _ordinary_ sysadmin network.
+  ip_magic="${ip_magic:-1}"
+
+  ## do_split_cluster
+  # The current MARS branch 0.1a.y is not yet constructed for forming
+  # a BigCluster consisting of several thousands of machines.
+  # When a future version of mars0.1b.y (or 0.2.y) allows this,
+  # this can be disabled.
+  do_split_cluster="${do_split_cluster:-1}"
+
+  ## clustertool_host
+  # URL prefix of the internal configuration database REST interface.
+  clustertool_host="${clustertool_host:-http://clustermw:3042}"
+
+  ## clustertool_user
+  # Username for clustertool access.
+  # By default, scans for a *.password file (see next option).
+  clustertool_user="${clustertool_user:-$(shopt -u nullglob; ls *.password | head -1 | cut -d. -f1)}" ||    echo "cannot find a password file *.password for clustermw: you MUST supply the credentials via default curl config files (see man page)"
+
+  ## clustertool_passwd
+  # Here you can supply the encrypted password.
+  # By default, a file $clustertool_user.password is used
+  # containing the encrypted password.
+  clustertool_passwd="${clustertool_passwd:-$([[ -r $clustertool_user.password ]] && cat $clustertool_user.password)}"
+
+  ## do_migrate
+  # Keep this enabled. Only disable for testing.
+  do_migrate="${do_migrate:-1}" # must be enabled; disable for dry-run testing
+
+  ## always_migrate
+  # Only use for testing, or for special situations.
+  # This skips the test whether the resource has already been migrated.
+  always_migrate="${always_migrate:-0}" # only enable for testing
+
+  ## check_segments
+  # 0 = disabled
+  # 1 = only display the segment names
+  # 2 = check for equality
+  # WORKAROUND, potentially harmful when used inadequately.
+  # The historical physical segment borders need to be removed for
+  # Container Football.
+  # Unfortunately, the subproject aiming to accomplish this has not
+  # progressed for one year now. In the meantime, Container Football can
+  # be only played within the ancient segment borders.
+  # After this big impediment is eventually resolved, this option
+  # should be switched off.
+  check_segments="${check_segments:-1}"
+
+  ## backup_dir
+  # Directory for keeping JSON backups of clustermw.
+  backup_dir="${backup_dir:-.}"
+
+  ## enable_mod_deflate
+  # Internal, for support.
+  enable_mod_deflate="${enable_mod_deflate:-1}"
+
+  ## enable_segment_move
+  # Seems to be needed by some other tooling.
+  enable_segment_move="${enable_segment_move:-1}"
+
+  ## override_hwclass_id
+  # When necessary, override this from $include_dir/plugins/*.conf
+  override_hwclass_id="${override_hwclass_id:-25007}"
+
+  ## override_hvt_id
+  # When necessary, override this from $include_dir/plugins/*.conf
+  override_hvt_id="${override_hvt_id:-8059}"
+
+  ## iqn_base and iet_type and iscsi_eth and iscsi_tid
+  # Workaround: this is needed for _dynamic_ generation of iSCSI sessions
+  # bypassing the ordinary ones as automatically generated by the
+  # cm3 cluster manager (only at the old istore architecture).
+  # Notice: not needed for regular operations, only for testing.
+  # Normally, you don't want to shrink over a _shared_ 1MBit iSCSI line.
+  iqn_base="${iqn_base:-iqn.2000-01.info.test:test}"
+  iet_type="${iet_type:-blockio}"
+  iscsi_eth="${iscsi_eth:-eth1}"
+  iscsi_tid="${iscsi_tid:-4711}"
+
+  ## monitis_downtime_script
+  # ShaHoLin-internal
+  monitis_downtime_script="${monitis_downtime_script:-}"
+
+  ## monitis_downtime_duration
+  # ShaHoLin-internal
+  monitis_downtime_duration="${monitis_downtime_duration:-20}" # Minutes
+
+  ## shaholin_finished_log
+  # ShaHoLin-specific logfile, reporting _only_ successful completion
+  # of an action.
+  shaholin_finished_log="${shaholin_finished_log:-$football_logdir/shaholin-finished.log}"
+
+  ## ticket
+  # OPTIONAL: the meaning is ShaHoLin specific.
+  # This can be used for updating JIRA tickets.
+  # Can be set on the command line like "./tetris.sh $args --ticket=TECCM-4711"
+  ticket="${ticket:-}"
+
+  ## ticket_get_cmd
+  # Optional: when set, this script can be used for retrieving ticket IDs
+  # in place of commandline option --ticket=
+  ticket_get_cmd="${ticket_get_cmd:-}"
+
+  ## ticket_update_cmd
+  # This can be used for calling an external command which updates
+  # the ticket(s) given by the $ticket parameter.
+  ticket_update_cmd="${ticket_update_cmd:-}"
+
+  ## shaholin_action
+  # OPTIONAL: specific action script with parameters.
+  shaholin_action="${shaholin_action:-}"
+
+
+PLUGIN football-basic
+
+   Generic driver for systemd-controlled MARS pools.
+   The current version supports only a flat model:
+   (1) There is a single "big cluster" at metadata level.
+       All cluster members are joined via merge-cluster.
+       All occurring names need to be globally unique.
+   (2) The network uses BGP or other means, thus any hypervisor
+       can (potentially) start any VM at any time.
+   (3) iSCSI or remote devices are not supported for now
+       (LocalSharding model). This may be extended in a future
+       release.
+   This plugin is exclusive-or with cm3.
+
+Plugin specific actions:
+
+   ./football.sh basic_add_host <hostname>
+      Manually add another host to the hostname cache.
+
+  ## pool_cache_dir
+  # Directory for caching the pool status.
+  pool_cache_dir="${pool_cache_dir:-$script_dir/pool-cache}"
+
+  ## initial_hostname_file
+  # This file must contain a list of storage and/or hypervisor hostnames
+  # where a /mars directory must exist.
+  # These hosts are then scanned for further cluster members,
+  # and the transitive closure of all host names is computed.
+  initial_hostname_file="${initial_hostname_file:-./hostnames.input}"
+
+  ## hostname_cache
+  # This file contains the transitive closure of all host names.
+  hostname_cache="${hostname_cache:-$pool_cache_dir/hostnames.cache}"
+
+  ## resources_cache
+  # This file contains the transitive closure of all resource names.
+  resources_cache="${resources_cache:-$pool_cache_dir/resources.cache}"
+
+  ## res2hyper_cache
+  # This file contains the association between resources and hypervisors.
+  res2hyper_cache="${res2hyper_cache:-$pool_cache_dir/res2hyper.assoc}"
+
+  ## enable_basic
+  # This plugin is exclusive-or with cm3.
+  enable_basic="${enable_basic:-$(if [[ "$0" =~ football ]]; then echo 1; else echo 0; fi)}"
+
+  ## ssh_port
+  # Set this for separating sysadmin access from customer access
+  ssh_port="${ssh_port:-}"
+
+  ## basic_mnt_dir
+  # Names the mountpoint directory at hypervisors.
+  # This must co-incide with the systemd mountpoints.
+  basic_mnt_dir="${basic_mnt_dir:-/mnt}"
+
+
+PLUGIN football-motd
+
+  Generic plugin for motd. Communicate that Football is running
+  at login via motd.
+
+  ## enable_motd
+  # whether to use the motd plugin.
+  enable_motd="${enable_motd:-0}"
+
+  ## update_motd_cmd
+  # Distro-specific command for generating motd from several sources.
+  # Only tested for Debian Jessie at the moment.
+  update_motd_cmd="${update_motd_cmd:-update-motd}"
+
+  ## download_motd_script and motd_script_dir
+  # When no script has been installed into /etc/update-motd.d/
+  # you can do it dynamically here, bypassing any "official" deployment
+  # methods. Use this only for testing!
+  # An example script (which should be deployed via your ordinary methods)
+  # can be found under $script_dir/update-motd.d/67-football-running
+  download_motd_script="${download_motd_script:-}"
+  motd_script_dir="${motd_script_dir:-/etc/update-motd.d}"
+
+  ## motd_file
+  # This will contain the reported motd message.
+  # It is created by this plugin.
+  motd_file="${motd_file:-/var/motd/football.txt}"
+
+  ## motd_color_on and motd_color_off
+  # ANSI escape sequences for coloring the generated motd message.
+  motd_color_on="${motd_color_on:-\\033[31m}"
+  motd_color_off="${motd_color_off:-\\033[0m}"
+
+
+PLUGIN football-report
+
+  Generic plugin for communication of reports.
+
+  ## report_cmd_{start,warning,failed,finished}
+  # External command which is called at start / warning / failure / finish
+  # of Football.
+  # The following variables can be used (e.g. as parameters) when
+  # escaped with a backslash:
+  #  $res              = name of the resource (LV, container, etc)
+  #  $primary          = the current (old) primary
+  #  $secondary_list   = list of current (old) secondaries
+  #  $target_primary   = the target primary name
+  #  $target_secondary = list of target secondaries
+  #  $operation        = the operation name
+  #  $target_percent   = the value used for shrinking
+  #  $txt              = some informative text from Football
+  #  Further variables are possible by looking at the sourcecode, or by
+  #  defining your own variables or functions externally or via plugins.
+  # Empty = don't do anything
+  report_cmd_start="${report_cmd_start:-}"
+  report_cmd_warning="${report_cmd_warning:-$script_dir/screener.sh notify "$res" warning "$txt"}"
+  report_cmd_failed="${report_cmd_failed:-}"
+  report_cmd_finished="${report_cmd_finished:-}"
+
+
+PLUGIN football-waiting
+
+  Generic plugin, interfacing with screener: when this is used
+  by your script and enabled, then you will be able to wait for
+  "screener.sh continue" operations at certain points in your
+  script.
+
+  ## enable_*_waiting
+  #
+  # When this is enabled, and when Football had been started by screener,
+  # then football will delay the start of several operations until a sysadmin
+  # does one of the following manually:
+  #
+  #  a) ./screener.sh continue $session
+  #  b) ./screener.sh resume $session
+  #  c) ./screener.sh attach $session and press the RETURN key
+  #  d) doing nothing, and $wait_timeout has been exceeded
+  #
+  # CONVENTION: football resource names are used as screener session ids.
+  # This ensures that only 1 operation can be started for the same resource,
+  # and it simplifies the handling for junior sysadmins.
+  #
+  enable_startup_waiting="${enable_startup_waiting:-0}"
+  enable_handover_waiting="${enable_handover_waiting:-0}"
+  enable_migrate_waiting="${enable_migrate_waiting:-0}"
+  enable_shrink_waiting="${enable_shrink_waiting:-0}"
+
+  ## enable_cleanup_delayed and wait_before_cleanup
+  # By setting this, you can delay the cleanup operations for some time.
+  # This way, you are keeping the old LV contents as a kind of "backup"
+  # for some limited time.
+  # HINT: don't set wait_before_cleanup to large values, because it can
+  # seriously slow down Football.
+  enable_cleanup_delayed="${enable_cleanup_delayed:-0}"
+  wait_before_cleanup="${wait_before_cleanup:-180}" # Minutes
+
+  ## reduce_wait_msg
+  # Instead of reporting the waiting status once per minute,
+  # decrease the frequency of reporting.
+  # Warning: don't increase this too much. Do not exceed
+  # session_timeout/2 from screener. Because of the Nyquist criterion,
+  # stay on the safe side by setting session_timeout to at least _twice_
+  # the time set here.
+  reduce_wait_msg="${reduce_wait_msg:-60}" # Minutes
+
+\end{verbatim}
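Every option in the verbose listing above follows the pattern `name="${name:-default}"`, so a value that is already set (from the environment or from `--variable=<value>`) wins over the documented default. The following is a minimal sketch of that interaction. The function `apply_overrides` is hypothetical and written only for illustration; it is not taken from football.sh, whose real option parser may work differently.

```shell
#!/bin/bash
# Hypothetical sketch, NOT the actual football.sh option parser.

# Turn each --name=value argument into a shell variable assignment.
apply_overrides() {
    local arg
    for arg in "$@"; do
        if [[ "$arg" =~ ^--([A-Za-z_][A-Za-z0-9_]*)=(.*)$ ]]; then
            # printf -v assigns to the named variable without using eval.
            printf -v "${BASH_REMATCH[1]}" '%s' "${BASH_REMATCH[2]}"
        fi
    done
}

apply_overrides --dry_run=1 --wait_timeout=30

# Same pattern as in the listing: keep a value that is already set,
# otherwise fall back to the documented default.
dry_run="${dry_run:-0}"
wait_timeout="${wait_timeout:-$(( 24 * 60 ))}"   # Minutes
min_space="${min_space:-20000000}"

echo "dry_run=$dry_run wait_timeout=$wait_timeout min_space=$min_space"
```

Under these assumptions, `dry_run` and `wait_timeout` keep the values given on the command line, while `min_space` falls back to its default unless it was already set in the environment.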
--- a/docu/football.help
+++ b/docu/football.help
@@ -0,0 +1,173 @@
+\begin{verbatim}
+Usage:
+  ./football.sh --help [--verbose]
+     Show help
+  ./football.sh --variable=<value>
+     Override any shell variable
+
+Actions for resource migration:
+
+  ./football.sh migrate         <resource> <target_primary> [<target_secondary>]
+     Run the sequence
+     migrate_prepare ; migrate_wait ; migrate_finish ; migrate_cleanup.
+
+  ./football.sh migrate_prepare <resource> <target_primary> [<target_secondary>]
+     Allocate LVM space at the targets and start MARS replication.
+
+  ./football.sh migrate_wait    <resource> <target_primary> [<target_secondary>]
+     Wait until MARS replication reports UpToDate.
+
+  ./football.sh migrate_finish  <resource> <target_primary> [<target_secondary>]
+     Call hooks for handover to the targets.
+
+  ./football.sh migrate_cleanup <resource>
+     Remove old / currently unused LV replicas from MARS and deallocate
+     from LVM.
+
+Actions for (manual) repair in emergency situations:
+
+  ./football.sh manual_migrate_config  <resource> <target_primary> [<target_secondary>]
+     Transfer only the cluster config, without changing the MARS replicas.
+     This does no resource stopping / restarting.
+     Useful for reverting a failed migration.
+
+  ./football.sh manual_config_update <hostname>
+     Only update the cluster config, without changing anything else.
+     Useful for manual repair of failed migration.
+
+  ./football.sh manual_merge_cluster <hostname1> <hostname2>
+     Run "marsadm merge-cluster" for the given hosts.
+     Hostnames must be from different (former) clusters.
+
+  ./football.sh manual_split_cluster <hostname_list>
+     Run "marsadm split-cluster" at the given hosts.
+     Useful for fixing failed / asymmetric splits.
+     Hint: provide _all_ hostnames which have formerly participated
+     in the cluster.
+
+  ./football.sh repair_vm <resource> <primary_candidate_list>
+     Try to restart the VM <resource> on one of the given machines.
+     Useful during unexpected customer downtime.
+
+  ./football.sh repair_mars <resource> <primary_candidate_list>
+     Before restarting the VM like in repair_vm, try to find a local
+     LV where a stand-alone MARS resource can be found and built up.
+     Use this only when the MARS resources are gone, and when you are
+     desperate. Problem: this will likely create a MARS setup which is
+     not usable for production, and therefore must be corrected later
+     by hand. Use this only during an emergency situation in order to
+     get the customers online again, while buying the downsides of this
+     command.
+
+Actions for inplace FS shrinking:
+
+  ./football.sh shrink          <resource> <percent>
+     Run the sequence shrink_prepare ; shrink_finish ; shrink_cleanup.
+
+  ./football.sh shrink_prepare  <resource> [<percent>]
+     Allocate temporary LVM space (when possible) and create initial
+     raw FS copy.
+     Default percent value (when left out) is 85.
+
+  ./football.sh shrink_finish   <resource>
+     Incrementally update the FS copy, swap old <=> new copy with
+     small downtime.
+
+  ./football.sh shrink_cleanup  <resource>
+     Remove old FS copy from LVM.
+
+Actions for inplace FS extension:
+
+  ./football.sh extend          <resource> <percent>
+
+Combined actions:
+
+  ./football.sh migrate+shrink <resource> <target_primary> [<target_secondary>] [<percent>]
+     Similar to migrate ; shrink but produces less network traffic.
+     Default percent value (when left out) is 85.
+
+  ./football.sh migrate+shrink+back <resource> <tmp_primary> [<percent>]
+     Migrate temporarily to <tmp_primary>, then shrink there,
+     finally migrate back to old primary and secondaries.
+     Default percent value (when left out) is 85.
+
+Global maintenance:
+
+  ./football.sh lv_cleanup      <resource>
+
+General features:
+
+  - Instead of <percent>, an absolute amount of storage with suffix
+    'k' or 'm' or 'g' can be given.
+
+  - When <resource> is currently stopped, login to the container is
+    not possible, and in turn the hypervisor node and primary storage node
+    cannot be automatically determined. In such a case, the missing
+    nodes can be specified via the syntax
+        <resource>:<hypervisor>:<primary_storage>
+
+  - The following LV suffixes are used (naming convention):
+    -tmp = currently emerging version for shrinking
+    -preshrink = old version before shrinking took place
+
+  - By adding the option --screener, you can hand over football execution
+    to ./screener.sh .
+    When some --enable_*_waiting is also added, then the critical
+    sections involving customer downtime are temporarily halted until
+    a sysadmin says "screener.sh continue $resource" or
+    attaches to the session and presses the RETURN key.
+
+
+PLUGIN football-cm3
+
+   1&1-specific plugin for dealing with the cm3 cluster manager
+   and its concrete operating environment (singleton instance).
+
+   Current maximum cluster size limit: 
+
+   Maximum #syncs running before migration can start: 
+
+   Following marsadm --version must be installed: 
+
+   Following mars kernel modules must be loaded: 
+
+
+PLUGIN football-basic
+
+   Generic driver for systemd-controlled MARS pools.
+   The current version supports only a flat model:
+   (1) There is a single "big cluster" at metadata level.
+       All cluster members are joined via merge-cluster.
+       All occurring names need to be globally unique.
+   (2) The network uses BGP or other means, thus any hypervisor
+       can (potentially) start any VM at any time.
+   (3) iSCSI or remote devices are not supported for now
+       (LocalSharding model). This may be extended in a future
+       release.
+   This plugin is exclusive-or with cm3.
+
+Plugin specific actions:
+
+   ./football.sh basic_add_host <hostname>
+      Manually add another host to the hostname cache.
+
+
+PLUGIN football-motd
+
+  Generic plugin for motd. Communicate that Football is running
+  at login via motd.
+
+
+PLUGIN football-report
+
+  Generic plugin for communication of reports.
+
+
+PLUGIN football-waiting
+
+  Generic plugin, interfacing with screener: when this is used
+  by your script and enabled, then you will be able to wait for
+  "screener.sh continue" operations at certain points in your
+  script.
+
+\end{verbatim}
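The "General features" part of the help text describes two argument conventions: `<resource>:<hypervisor>:<primary_storage>` for stopped resources, and a size that is either a percentage or an absolute amount with suffix 'k', 'm' or 'g'. The sketch below shows one way such arguments could be taken apart. Both helper names are hypothetical and the exact validation rules are an assumption; the real football.sh may accept or reject other forms.

```shell
#!/bin/bash
# Hypothetical helpers for illustration; the names are NOT taken from football.sh.

# Split "<resource>:<hypervisor>:<primary_storage>" into three fields.
# Fields that are left out come back empty.
split_resource_spec() {
    local res hyper storage
    IFS=: read -r res hyper storage <<< "$1"
    echo "res=$res hyper=$hyper storage=$storage"
}

# Classify a size argument (assumed rules):
# plain number = percent, number with suffix k, m or g = absolute amount.
classify_size() {
    if [[ "$1" =~ ^[0-9]+$ ]]; then
        echo "percent"
    elif [[ "$1" =~ ^[0-9]+[kmg]$ ]]; then
        echo "absolute"
    else
        echo "invalid"
    fi
}

split_resource_spec "lv-example:hyper1:store1"
split_resource_spec "lv-example"
classify_size 85
classify_size 200g
```

The host and resource names used here are placeholders.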
--- a/docu/make-help.sh
+++ b/docu/make-help.sh
@@ -0,0 +1,18 @@
+#!/bin/bash
+
+football_dir="${football_dir:-../football}"
+
+function make_latex_include
+{
+    local cmd="$1"
+
+    echo '\begin{verbatim}'
+    eval "$cmd" | sed 's/\\/\\\\/g'
+    echo '\end{verbatim}'
+}
+
+make_latex_include "../userspace/marsadm --help" > marsadm.help
+make_latex_include "(cd $football_dir/ && ./football.sh --help)" > football.help
+make_latex_include "(cd $football_dir/ && ./football.sh --help --verbose)" > football-verbose.help
+make_latex_include "(cd $football_dir/ && ./screener.sh --help)" > screener.help
+make_latex_include "(cd $football_dir/ && ./screener.sh --help --verbose)" > screener-verbose.help
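To see what `make_latex_include` in make-help.sh produces without a MARS checkout, the wrapper can be fed a stand-in command. The sketch below mirrors the function from the script, except that it uses `printf` instead of `echo` so that the backslash in `\begin` is never interpreted by the shell. The function `fake_help` is hypothetical and only replaces `./football.sh --help`.

```shell
#!/bin/bash
# Mirror of the wrapper in make-help.sh (printf used instead of echo).
make_latex_include() {
    local cmd="$1"

    printf '%s\n' '\begin{verbatim}'
    # Double every backslash in the command output.
    eval "$cmd" | sed 's/\\/\\\\/g'
    printf '%s\n' '\end{verbatim}'
}

# Hypothetical stand-in for "./football.sh --help":
# prints one line that contains a single backslash.
fake_help() {
    printf '%s\n' 'motd_color_off=\033[0m'
}

make_latex_include "fake_help"
```

The `sed` step doubles each backslash of the wrapped output, which is presumably why values such as `motd_color_on` appear with `\\033` in the generated football-verbose.help.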
--- a/docu/mars-manual.lyx
+++ b/docu/mars-manual.lyx
@@ -39413,6 +39413,13 @@ maximum 100 logfiles per resource

 \begin_layout Chapter
 Handout for Midnight Problem Solving
+\begin_inset CommandInset label
+LatexCommand label
+name "chap:Handout-for-Midnight"
+
+\end_inset
+
+
 \end_layout

 \begin_layout Standard
@@ -42012,6 +42019,162 @@ A_{s,p,T}(k,n)=n^{s+1}*T*\sum_{\bar{k}=k}^{k*n}C(k,\bar{k},k*n)*\binom{k*n}{\bar
 \end_inset


+\end_layout
+
+\begin_layout Chapter
+Command Documentation for Userspace Tools
+\begin_inset CommandInset label
+LatexCommand label
+name "chap:Command-Documentation-for"
+
+\end_inset
+
+
+\end_layout
+
+\begin_layout Section
+
+\family typewriter
+marsadm --help
+\begin_inset CommandInset label
+LatexCommand label
+name "sec:marsadm-–help"
+
+\end_inset
+
+
+\end_layout
+
+\begin_layout Standard
+\begin_inset ERT
+status open
+
+\begin_layout Plain Layout
+
+
+\backslash
+input{marsadm.help}
+\end_layout
+
+\end_inset
+
+
+\end_layout
+
+\begin_layout Section
+
+\family typewriter
+football.sh --help
+\begin_inset CommandInset label
+LatexCommand label
+name "sec:football-–help"
+
+\end_inset
+
+
+\end_layout
+
+\begin_layout Standard
+\begin_inset ERT
+status open
+
+\begin_layout Plain Layout
+
+
+\backslash
+input{football.help}
+\end_layout
+
+\end_inset
+
+
+\end_layout
+
+\begin_layout Section
+
+\family typewriter
+football.sh --help --verbose
+\begin_inset CommandInset label
+LatexCommand label
+name "sec:football-help-verbose"
+
+\end_inset
+
+
+\end_layout
+
+\begin_layout Standard
+\begin_inset ERT
+status open
+
+\begin_layout Plain Layout
+
+
+\backslash
+input{football-verbose.help}
+\end_layout
+
+\end_inset
+
+
+\end_layout
+
+\begin_layout Section
+
+\family typewriter
+screener.sh --help
+\begin_inset CommandInset label
+LatexCommand label
+name "sec:screener–help"
+
+\end_inset
+
+
+\end_layout
+
+\begin_layout Standard
+\begin_inset ERT
+status open
+
+\begin_layout Plain Layout
+
+
+\backslash
+input{screener.help}
+\end_layout
+
+\end_inset
+
+
+\end_layout
+
+\begin_layout Section
+
+\family typewriter
+screener.sh --help --verbose
+\begin_inset CommandInset label
+LatexCommand label
+name "sec:screener-help-verbose"
+
+\end_inset
+
+
+\end_layout
+
+\begin_layout Standard
+\begin_inset ERT
+status open
+
+\begin_layout Plain Layout
+
+
+\backslash
+input{screener-verbose.help}
+\end_layout
+
+\end_inset
+
+
 \end_layout

 \begin_layout Chapter
--- a/docu/marsadm.help
+++ b/docu/marsadm.help
@ -0,0 +1,570 @@
+\begin{verbatim}
+
+Thorough documentation is in mars-manual.pdf. Please use the PDF manual
+as authoritative reference! Here is only a short summary of the most
+important sub-commands / options:
+
+marsadm [<global_options>] <command> [<resource_name> | all | <args> ]
+marsadm [<global_options>] view[-<macroname>] [<resource_name> | all ]
+
+<global_option> =
+  --force
+    Skip safety checks.
+    Use this only when you really know what you are doing!
+    Warning! This is dangerous! First try --dry-run.
+    Not combinable with 'all'.
+  --dry-run
+    Don't modify the symlink tree, but tell what would be done.
+    Use this before starting potentially harmful actions such as
+    'delete-resource'.
+  --verbose
+    Increase verbosity of some commands.
+  --logger=/path/to/usr/bin/logger
+    Use an alternative syslog messenger.
+    When empty, disable syslogging.
+  --max-deletions=<number>
+    When your network or your firewall rules are defective over a
+    longer time, too many deletion links may accumulate at
+    /mars/todo-global/delete-* and sibling locations.
+    This limit is preventing overflow of the filesystem as well
+    as overloading the worker threads.
+  --thresh-logfiles=<number>
+  --thresh-logsize=<number>
+    Prevention of too many small logfiles when secondaries are not
+    catching up. When more than thresh-logfiles are already present,
+    the next one is only created when the last one has at least
+    size thresh-logsize (in units of GB).
+  --timeout=<seconds>
+    Abort safety checks after timeout with an error.
+    When giving 'all' as resource argument, this works for each
+    resource independently.
+  --window=<seconds>
+    Treat other cluster nodes as healthy when some communication has
+    occurred during the given time window.
+  --threshold=<bytes>
+    Some macros like 'fetch-threshold-reached' use this for determining
+    their sloppiness.
+  --host=<hostname>
+    Act as if the command was running on cluster node <hostname>.
+    Warning! This is dangerous! First try --dry-run.
+  --backup-dir=</absolute_path>
+    Only for experts.
+    Used by several special commands like merge-cluster, split-cluster
+    etc for creating backups of important data.
+  --ip=<ip>
+    Override the IP address stored in the symlink tree, as well as
+    the default IP determined from the list of network interfaces.
+    Usually you will need this only at 'create-cluster' or
+    'join-cluster' for resolving ambiguities.
+  --ssh-port=<port_nr>
+    Override the default ssh port (22) for ssh and rsync.
+    Useful for running {join,merge}-cluster on non-standard ssh ports.
+  --ssh-opts="<ssh_commandline_options>"
+    Override the default ssh commandline options. Also used for rsync.
+  --macro=<text>
+    Handy for testing short macro evaluations at the command line.
+
+<command> =
+  attach
+    usage: attach <resource_name>
+    Attaches the local disk (backing block device) to the resource.
+    The disk must have been previously configured at
+    {create,join}-resource.
+    When designated as a primary, /dev/mars/$res will also appear.
+    This does not change the state of {fetch,replay}.
+    For a complete local startup of the resource, use 'marsadm up'.
+
+  cat
+    usage: cat <path>
+    Print internal debug output in human readable form.
+    Numerical timestamps and numerical error codes are replaced
+    by more readable means.
+    Example: marsadm cat /mars/5.total.status
+
+  connect
+    usage: connect <resource_name>
+    See resume-fetch-local.
+
+  connect-global
+    usage: connect-global <resource_name>
+    Like resume-fetch-local, but affects all resource members
+    in the cluster (remotely).
+
+  connect-local
+    usage: connect-local <resource_name>
+    See resume-fetch-local.
+
+  create-cluster
+    usage: create-cluster (no parameters)
+    This must be called exactly once when creating a new cluster.
+    Don't call this again! Use join-cluster on the secondary nodes.
+    Please read the PDF manual for details.
+
+  create-resource
+    usage: create-resource <resource_name> </dev/lv/mydata>
+    (further syntax variants are described in the PDF manual).
+    Create a new resource out of a pre-existing disk (backing
+    block device) /dev/lv/mydata (or similar).
+    The current node will start in primary role, thus
+    /dev/mars/<resource_name> will appear after a short time, initially
+    showing the same contents as the underlying disk /dev/lv/mydata.
+    It is good practice to give the resource <resource_name> and the
+    disk identical names.
+
+  cron
+    usage: cron (no parameters)
+    Do all necessary regular housekeeping tasks.
+    This is equivalent to log-rotate all; sleep 5; log-delete-all all.
+
+  delete-resource
+    usage: delete-resource <resource_name>
+    CAUTION! This is dangerous when the network is somehow
+    interrupted, or when damaged nodes are later resurrected
+    in any way.
+
+    Precondition: the resource must no longer have any members
+    (see leave-resource).
+    This is only needed when you _insist_ on re-using a damaged
+    resource for re-creating a new one with exactly the same
+    old <resource_name>.
+    HINT: best practice is to not use this, but just create a _new_
+    resource with a new <resource_name> out of your local disks.
+    Please read the PDF manual on potential consequences.
+
+  detach
+    usage: detach <resource_name>
+    Detaches the local disk (backing block device) from the
+    MARS resource.
+    Caution! you may read data from the local disk afterwards,
+    but ensure that no data is written to it!
+    Otherwise, you are likely to produce harmful inconsistencies.
+    When running in primary role, /dev/mars/$res will also disappear.
+    This does not change the state of {fetch,replay}.
+    For a complete local shutdown of the resource, use 'marsadm down'.
+
+  disconnect
+    usage: disconnect <resource_name>
+    See pause-fetch-local.
+
+  disconnect-global
+    usage: disconnect-global <resource_name>
+    Like pause-fetch-local, but affects all resource members
+    in the cluster (remotely).
+
+  disconnect-local
+    usage: disconnect-local <resource_name>
+    See pause-fetch-local.
+
+  down
+    usage: down <resource_name>
+    Shortcut for detach + pause-sync + pause-fetch + pause-replay.
+
+  get-emergency-limit
+    usage: get-emergency-limit <resource_name>
+    Counterpart of set-emergency-limit (per-resource emergency limit)
+
+  get-sync-limit-value
+    usage: get-sync-limit-value (no parameters)
+    For retrieval of the value set by set-sync-limit-value.
+
+  get-systemd-unit
+    usage: get-systemd-unit <resource_name>
+    Show the systemd units (for start and stop), or empty when unset.
+
+  invalidate
+    usage: invalidate <resource_name>
+    Only useful on a secondary node.
+    Forces MARS to consider the local replica disk as being
+    inconsistent, and therefore starting a fast full-sync from
+    the currently designated primary node (which must exist;
+    therefore avoid the 'secondary' command).
+    This is usually needed for resolving emergency mode.
+    When having k=2 replicas, this can be also used for
+    quick-and-simple split-brain resolution.
+    In other cases, or when the split-brain is not resolved by
+    this command, please use the 'leave-resource' / 'join-resource'
+    method as described in the PDF manual (in the right order as
+    described there).
+
+  join-cluster
+    usage: join-cluster <hostname_of_primary>
+    Establishes a new cluster membership.
+    This must be called once on any new cluster member.
+    This is a prerequisite for join-resource.
+
+  join-resource
+    usage: join-resource <resource_name> </dev/lv/mydata>
+    (further syntax variants are described in the PDF manual).
+    The resource <resource_name> must have been already created on
+    another cluster node, and the network must be healthy.
+    The contents of the local replica disk /dev/lv/mydata will be
+    overwritten by the initial fast full sync from the currently
+    designated primary node.
+    After the initial full sync has finished, the current host will
+    act in secondary role.
+    For details on size constraints etc, refer to the PDF manual.
+
+  leave-cluster
+    usage: leave-cluster (no parameters)
+    This can be used for final deconstruction of a cluster member.
+    Prior to this, all resources must have been left
+    via leave-resource.
+    Notice: this will never destroy the cluster UID on the /mars/
+    filesystem.
+    Please read the PDF manual for details.
+
+  leave-resource
+    usage: leave-resource <resource_name>
+    Precondition: the local host must be in secondary role.
+    Stop being a member of the resource, and thus stop all
+    replication activities. The status of the underlying disk
+    will remain in its current state (whatever it is).
+
+  log-delete
+    usage: log-delete <resource_name>
+    When possible, globally delete all old transaction logfiles which
+    are known to be superfluous, i.e. all secondaries no longer need
+    to replay them.
+    This must be regularly called by a cron job or similar, in order
+    to prevent overflow of the /mars/ directory.
+    For regular maintenance cron jobs, please prefer 'marsadm cron'.
+    For details and best practices, please refer to the PDF manual.
+
+  log-delete-all
+    usage: log-delete-all <resource_name>
+    Alias for log-delete
+
+  log-delete-one
+    usage: log-delete-one <resource_name>
+    When possible, globally delete at most one old transaction logfile
+    which is known to be superfluous, i.e. all secondaries no longer
+    need to replay it.
+    Hint: use this only for testing and manual inspection.
+    For regular maintenance cron jobs, please prefer cron
+    or log-delete-all.
+
+  log-purge-all
+    usage: log-purge-all <resource_name>
+    This is potentially dangerous.
+    Use this only if you are really desperate in trying to resolve a
+    split brain. Use this only after reading the PDF manual!
+
+  log-rotate
+    usage: log-rotate <resource_name>
+    Only useful at the primary side.
+    Start writing transaction logs into a new transaction logfile.
+    This should be regularly called by a cron job or similar.
+    For regular maintenance cron jobs, please prefer 'marsadm cron'.
+    For details and best practices, please refer to the PDF manual.
+
+  lowlevel-delete-host
+    usage: lowlevel-delete-host <resource_name>
+    Delete cluster member.
+
+  lowlevel-ls-host-ips
+    usage: lowlevel-ls-host-ips <resource_name>
+    List cluster member names and IP addresses.
+
+  lowlevel-set-host-ip
+    usage: lowlevel-set-host-ip <resource_name>
+    Set IP for host.
+
+  merge-cluster
+    usage: merge-cluster <hostname_of_other_cluster>
+    Precondition: the resource names of both clusters must be disjoint.
+    Create the union of two clusters, consisting of the
+    union of all machines, and the union of all resources.
+    The members of each resource are _not_ changed by this.
+    This is useful for creating a big "virtual LVM cluster" where
+    resources can be almost arbitrarily migrated between machines via
+    later join-resource / leave-resource operations.
+
+  merge-cluster-check
+    usage: merge-cluster-check <hostname_of_other_cluster>
+    Check whether the resources of both clusters are disjoint.
+    Useful for checking in advance whether merge-cluster would be
+    possible.
+
+  merge-cluster-list
+    usage: merge-cluster-list
+    Determine the local list of resources.
+    Useful for checking or analysis of merge-cluster disjointness by hand.
+
+  pause-fetch
+    usage: pause-fetch <resource_name>
+    See pause-fetch-local.
+
+  pause-fetch-global
+    usage: pause-fetch-global <resource_name>
+    Like pause-fetch-local, but affects all resource members
+    in the cluster (remotely).
+
+  pause-fetch-local
+    usage: pause-fetch-local <resource_name>
+    Stop fetching transaction logfiles from the current
+    designated primary.
+    This is independent from any {pause,resume}-replay operations.
+    Only useful on a secondary node.
+
+  pause-replay
+    usage: pause-replay <resource_name>
+    See pause-replay-local.
+
+  pause-replay-global
+    usage: pause-replay-global <resource_name>
+    Like pause-replay-local, but affects all resource members
+    in the cluster (remotely).
+
+  pause-replay-local
+    usage: pause-replay-local <resource_name>
+    Stop replaying transaction logfiles for now.
+    This is independent from any {pause,resume}-fetch operations.
+    This may be used for freezing the state of your replica for some
+    time, if you have enough space on /mars/.
+    Only useful on a secondary node.
+
+  pause-sync
+    usage: pause-sync <resource_name>
+    See pause-sync-local.
+
+  pause-sync-global
+    usage: pause-sync-global <resource_name>
+    Like pause-sync-local, but affects all resource members
+    in the cluster (remotely).
+
+  pause-sync-local
+    usage: pause-sync-local <resource_name>
+    Pause the initial data sync at current stage.
+    This has only an effect if a sync is actually running (i.e.
+    there is something to be actually synced).
+    Don't pause too long, because the local replica will remain
+    inconsistent during the pause.
+    Use this only for limited reduction of system load.
+    Only useful on a secondary node.
+
+  primary
+    usage: primary <resource_name>
+    Promote the resource into primary role.
+    This is necessary for /dev/mars/$res to appear on the local host.
+    Notice: by concept there can be only _one_ designated primary
+    in a cluster at the same time.
+    The role change is automatically distributed to the other nodes
+    in the cluster, provided that the network is healthy.
+    The old primary node will _automatically_ go
+    into secondary role first. This is different from DRBD!
+    With MARS, you don't need an intermediate 'secondary' command
+    for switching roles.
+    It is usually better to _directly_ switch the primary roles
+    between both hosts.
+    When --force is not given, a planned handover is started:
+    the local host will only become actually primary _after_ the
+    old primary is gone, and all old transaction logs have been
+    fetched and replayed at the new designated primary.
+    When --force is given, no handover is attempted. As a consequence,
+    a split brain situation is likely to emerge.
+    Thus, use --force only after an ordinary handover attempt has
+    failed, and when you don't care about the split brain.
+    For more details, please refer to the PDF manual.
+
+  resize
+    usage: resize <resource_name>
+    Prerequisite: all underlying disks (usually /dev/vg/$res) must
+    have been already increased, e.g. at the LVM layer (cf. lvresize).
+    Causes MARS to re-examine all sizing constraints on all members of
+    the resource, and increase the global logical size of the resource
+    accordingly.
+    Shrinking is currently not yet implemented.
+    When successful, /dev/mars/$res at the primary will be increased
+    in size. In addition, all secondaries will start an incremental
+    fast full-sync to get the enlarged parts from the primary.
+
+  resume-fetch
+    usage: resume-fetch <resource_name>
+    See resume-fetch-local.
+
+  resume-fetch-global
+    usage: resume-fetch-global <resource_name>
+    Like resume-fetch-local, but affects all resource members
+    in the cluster (remotely).
+
+  resume-fetch-local
+    usage: resume-fetch-local <resource_name>
+    Start fetching transaction logfiles from the current
+    designated primary node, if there is one.
+    This is independent from any {pause,resume}-replay operations.
+    Only useful on a secondary node.
+
+  resume-replay
+    usage: resume-replay <resource_name>
+    See resume-replay-local.
+
+  resume-replay-global
+    usage: resume-replay-global <resource_name>
+    Like resume-replay-local, but affects all resource members
+    in the cluster (remotely).
+
+  resume-replay-local
+    usage: resume-replay-local <resource_name>
+    Restart replaying transaction logfiles, when there is some
+    data left.
+    This is independent from any {pause,resume}-fetch operations.
+    This should be used for unfreezing the state of your local replica.
+    Only useful on a secondary node.
+
+  resume-sync
+    usage: resume-sync <resource_name>
+    See resume-sync-local.
+
+  resume-sync-global
+    usage: resume-sync-global <resource_name>
+    Like resume-sync-local, but affects all resource members
+    in the cluster (remotely).
+
+  resume-sync-local
+    usage: resume-sync-local <resource_name>
+    Resume any initial / incremental data sync at the stage where it
+    had been interrupted by pause-sync.
+    Only useful on a secondary node.
+
+  secondary
+    usage: secondary <resource_name>
+    Promote all cluster members into secondary role, globally.
+    In contrast to DRBD, this is not needed as an intermediate step
+    for planned handover between an old and a new primary node.
+    The only reasonable usage is before the last leave-resource of the
+    last cluster member, immediately before leave-cluster is executed
+    for final deconstruction of the cluster.
+    In all other cases, please prefer 'primary' for direct handover
+    between cluster nodes.
+    Notice: 'secondary' sets the global designated primary node
+    to '(none)' which in turn prevents the execution of 'invalidate'
+    or 'join-resource' or 'resize' anywhere in the cluster.
+    Therefore, don't unnecessarily give 'secondary'!
+
+  set-emergency-limit
+    usage: set-emergency-limit <resource_name> <value>
+    Set a per-resource emergency limit for disk space in /mars.
+    See PDF manual for details.
+
+  set-sync-limit-value
+    usage: set-sync-limit-value <new_value>
+    Set the maximum number of resources which should be syncing
+    concurrently.
+
+  set-systemd-unit
+    usage: set-systemd-unit <resource_name> <start_unit_name> [<stop_unit_name>]
+    This activates the systemd template engine of marsadm.
+    Please read mars-manual.pdf on this.
+    When <stop_unit_name> is omitted, it will be treated equal to
+    <start_unit_name>.
+
+  split-cluster
+    usage: split-cluster (no parameters)
+    NOT OFFICIALLY SUPPORTED - ONLY FOR EXPERTS.
+    RTFS = Read The Fucking Sourcecode.
+    Use this only if you know what you are doing.
+
+  up
+    usage: up <resource_name>
+    Shortcut for attach + resume-sync + resume-fetch + resume-replay.
+
+  wait-cluster
+    usage: wait-cluster [<resource_name>]
+    Waits until a ping-pong communication has succeeded in the
+    whole cluster (or only the members of <resource_name>).
+    NOTICE: this is extremely useful for avoiding races when scripting
+    in a cluster.
+
+  wait-connect
+    usage: wait-connect [<resource_name>]
+    See wait-cluster.
+
+  wait-resource
+    usage: wait-resource <resource_name>
+                         [[attach|fetch|replay|sync][-on|-off]]
+    Wait until the given condition is met on the resource, locally.
+
+  wait-umount
+    usage: wait-umount <resource_name>
+    Wait until /dev/mars/<resource_name> has disappeared in the
+    cluster (even remotely).
+    Useful on both primary and secondary nodes.
+
+<resource_name> = name of resource or "all" for all resources
+
+
+<macroname> = <complex_macroname> | <primitive_macroname>
+
+<complex_macroname> =
+  1and1
+  comminfo
+  commstate
+  cstate
+  default
+  default-global
+  diskstate
+  diskstate-1and1
+  dstate
+  fetch-line
+  fetch-line-1and1
+  flags
+  flags-1and1
+  outdated-flags
+  outdated-flags-1and1
+  primarynode
+  primarynode-1and1
+  replay-line
+  replay-line-1and1
+  replinfo
+  replinfo-1and1
+  replstate
+  replstate-1and1
+  resource-errors
+  resource-errors-1and1
+  role
+  role-1and1
+  state
+  status
+  sync-line
+  sync-line-1and1
+  syncinfo
+  syncinfo-1and1
+  todo-role
+
+
+<primitive_macroname> =
+  deletable-size
+  device-opened
+  errno-text
+    Convert errno numbers (positive or negative) into human readable text.
+  get-log-status
+  get-resource-{fat,err,wrn}{,-count}
+  get-{disk,device}
+  is-{alive}
+  is-{split-brain,consistent,emergency}
+  occupied-size
+  present-{disk,device}
+    (deprecated, use *-present instead)
+  replay-basenr
+  replay-code
+    When negative, this indicates that a replay/recovery error has occurred.
+  rest-space
+  summary-vector
+  systemd-unit
+  tree
+  uuid
+  wait-{is,todo}-{attach,sync,fetch,replay,primary}-{on,off}
+  {alive,fetch,replay,work}-{timestamp,age,lag}
+  {all,the}-{pretty-,}{global-,}{{err,wrn,inf}-,}msg
+  {cluster,resource}-members
+  {disk,device}-present
+  {disk,resource,device}-size
+  {fetch,replay,work}-{lognr,logcount}
+  {get,actual}-primary
+  {is,todo}-{attach,sync,fetch,replay,primary}
+  {my,all}-resources
+  {sync,fetch,replay,work,syncpos}-{size,pos}
+  {sync,fetch,replay,work}-{rest,{almost-,threshold-,}reached,percent,permille,vector}
+  {sync,fetch,replay}-{rate,remain}
+  {time,real-time}
+\end{verbatim}
--- a/docu/screener-verbose.help
+++ b/docu/screener-verbose.help
@ -0,0 +1,365 @@
+\begin{verbatim}
+OVERRIDE verbose=1
+./screener.sh: Run _unattended_ processes in screen sessions.
+    Useful for MASS automation, running hundreds of unattended
+    commands in parallel.
+    HINT: for running more than ~500 sessions in parallel, you might need
+    some system tuning (e.g. rlimits, kernel patches etc) for creating
+    a huge number of file descriptors / sockets / etc.
+    ADVANTAGE: You may attach to individual screens, kill them, or continue
+    some waiting commands.
+
+Synopsis:
+  ./screener.sh --help [--verbose]
+  ./screener.sh list-running
+  ./screener.sh list-waiting
+  ./screener.sh list-failed
+  ./screener.sh list-critical
+  ./screener.sh list-serious
+  ./screener.sh list-done
+  ./screener.sh list
+  ./screener.sh list-screens
+  ./screener.sh run <file.csv> [<condition_list>]
+  ./screener.sh start <screen_id> <cmd> <args...>
+  ./screener.sh [<options>] <operation> <screen_id>
+
+Inquiry operations:
+
+  ./screener.sh list-screens
+    Equivalent to screen -ls
+
+  ./screener.sh list-<type>
+    Show a list of currently running, waiting (for continuation), failed,
+    and done/completed screen sessions.
+
+  ./screener.sh list
+    First show a list of currently running screens, then
+    for each <type> a list of (old) failed / completed sessions
+    (and so on).
+
+  ./screener.sh status <screen_id>
+    Like list-*, but filter by <screen_id> and don't report timestamps.
+
+  ./screener.sh show <screen_id>
+    Show the last logfile of <screen_id> at standard output.
+
+  ./screener.sh less <screen_id>
+    Show the last logfile of <screen_id> using "less -r".
+
+MASS starting of screen sessions:
+
+  ./screener.sh run <file.csv> <condition_list>
+    Commands are launched in screen sessions via "./screener.sh start" commands,
+    unless the same <screen_id> is already running,
+    or is in some error state, or is already done (see below).
+    The commands are given by a column with CSV header name
+    containing "command", or by the first column.
+    The <screen_id> needs to be given by a column with CSV header
+    name matching "screen_id|resource".
+    The number and type of commands to launch can be reduced via
+    any combination of the following filter conditions:
+
+      --max=<number>
+        Limit the number of _new_ sessions additionally started this time.
+
+      --<column_name>==<value>
+        Only select lines where an arbitrary CSV column (given by its
+        CSV header name in C identifier syntax) has the given value.
+
+      --<column_name>!=<value>
+        Only select lines where the column does _not_ have the given value.
+
+      --<column_name>=~<bash_regex>
+        Only select lines where the bash regular expression matches
+        at the given column.
+
+      --max-per=<number>
+        Limit the number per _distinct_ value of the column denoted by
+        the _next_ filter condition.
+        Example: ./screener.sh run test.csv --dry-run --max-per=2 --dst_network=~.
+        would launch only 2 Football processes per destination network.
+
+    Hint: filter conditions can be easily checked by giving --dry-run.
+
+Start / restart / kill / continue screen sessions:
+
+  ./screener.sh start <screen_id> <cmd> <args...>
+    Start a new screen session, running arbitrary <cmd> and <args...>
+    inside.
+
+  ./screener.sh restart <screen_id>
+    Works only when the last command for <screen_id> failed.
+    This will restart the old <cmd> and its <args...> as before.
+    Use only when you want to repeat the same command once again.
+
+  ./screener.sh kill <screen_id>
+    Terminate the running screen session forcibly.
+
+  ./screener.sh continue
+  ./screener.sh continue <screen_id> [<screen_id_list>]
+  ./screener.sh continue <number>
+    Useful for MASS automation of processes involving critical sections
+    such as customer downtime.
+    When giving a numerical <number> argument, up to that number
+    of sessions are resumed (ordered by age).
+    When no further argument is given, _all_ currently waiting sessions
+    are continued.
+    When --auto-attach is given, it will sequentially resume the
+    sessions to be continued. By default, unless --force_attach is set,
+    it uses "screen -r" skipping those sessions which are already
+    attached to somebody else.
+    This feature works only with prepared scripts which are creating
+    an empty flagfile
+    /home/schoebel/mars/mars-migration.git/screener-logdir-testing/running/$screen_id.waiting
+    whenever they want to wait for manual intervention (for whatever reason).
+    Afterwards, the script must be polling this flagfile for removal.
+    This screener operation simply removes the flagfile, such that
+    the script will then continue afterwards.
+    Example: look into ./football.sh
+    and search for occurrences of substring "call_hook start_wait".
+
+  ./screener.sh wakeup
+  ./screener.sh wakeup <screen_id> [<screen_id_list>]
+  ./screener.sh wakeup <number>
+    Similar to continue, but refers to delayed commands waiting for
+    a timeout. This can be used to individually shorten the timeout
+    period.
+    Example: Football cleanup operations may be artificially delayed
+    before doing "lvremove", to keep some sort of 'backup' for a
+    limited time. When your project is under time pressure, these
+    delays may be hindering.
+    Use this for premature ending of such artificial delays.
+
+  ./screener.sh up <...>
+    Do both continue and wakeup.
+
+  ./screener.sh auto <...>
+    Equivalent to ./screener.sh --auto-attach up <...>
+    Remember that only sessions without current attachment will be
+    attached to.
+
+Attach to a running session:
+
+  ./screener.sh attach <screen_id>
+    This is equivalent to screen -x $screen_id
+
+  ./screener.sh resume <screen_id>
+    This is equivalent to screen -r $screen_id
+
+Communication:
+
+  ./screener.sh notify <screen_id> <txt>
+    May be called from external scripts to send emails etc.
+
+Locking (only when supported by <cmd>):
+
+  ./screener.sh lock
+  ./screener.sh unlock
+  ./screener.sh lock <screen_id>
+  ./screener.sh unlock <screen_id>
+
+Cleanup / bookkeeping:
+
+  ./screener.sh clear-critical <screen_id>
+  ./screener.sh clear-serious <screen_id>
+  ./screener.sh clear-failed  <screen_id>
+    Mark the status as "done" and move the logfile away.
+
+  ./screener.sh purge [<days>]
+    This will remove all old logfiles which are older than
+    <days>. By default, the variable $screener_log_purge_period
+    will be used, which is currently set to '30'.
+
+  ./screener.sh cron
+    You should call this regularly from a user cron job, in order
+    to purge old logfiles, or to detect hanging sessions, or to
+    automatically send pending emails, etc.
+
+Options:
+
+  --variable
+  --variable=$value
+    These must come first, in order to prevent mixup with
+    options of <cmd> <args...>.
+    Allows overriding of any internal shell variable.
+  --help --verbose
+    Show all overridable shell variables, also for plugins.
+
+  ## football_includes
+  # List of directories where screener-*.conf files can be found.
+  football_includes="${football_includes:-/usr/lib/mars/plugins /etc/mars/plugins $script_dir/plugins $HOME/.mars/plugins ./plugins}"
+
+  ## title
+  # Used as a title for startup of screen sessions, and later for
+  # display at list-*
+  title="${title:-}"
+
+  ## auto_attach
+  # Upon start or upon continue/wakeup/up, attach to the
+  # (newly created or existing) session.
+  auto_attach="${auto_attach:-0}"
+
+  ## auto_attach_grace
+  # Before attaching, wait this time in seconds.
+  # The user may abort within this sleep time by
+  # pressing Ctrl-C.
+  auto_attach_grace="${auto_attach_grace:-10}"
+
+  ## force_attach
+  # Use "screen -x" instead of "screen -r" allowing
+  # shared sessions between different users / end terminals.
+  force_attach="${force_attach:-0}"
+
+  ## drop_shell
+  # When a <cmd> fails, the screen session will not be terminated immediately.
+  # Instead, an interactive bash is started, so you can later attach and
+  # rectify any problems.
+  # WARNING! only activate this if you regularly check for failed sessions
+  # and then manually attach to them. Don't use this when running hundreds
+  # or thousands of sessions in parallel.
+  drop_shell="${drop_shell:-0}"
+
+  ## session_timeout
+  # Detect hanging sessions when they don't produce any output anymore
+  # for a longer time. Hanging sessions are then marked as failed or critical.
+  session_timeout="${session_timeout:-$(( 3600 * 3 ))}" # seconds
+
+  ## screener_logdir or logdir
+  # Where the logfiles and all status information is kept.
+  export screener_logdir="${screener_logdir:-${logdir:-$HOME/screener-logs}}"
+
+  ## screener_command_log
+  # This logfile will accumulate all relevant $0 command invocations,
+  # including timestamps and ssh agent identities.
+  # To switch off, use /dev/null here.
+  screener_command_log="${screener_command_log:-$screener_logdir/commands.log}"
+
+  ## screener_cron_log
+  # Since "$0 cron" works silently, you won't notice any errors.
+  # This logfile gives you a chance for checking any problems.
+  screener_cron_log="${screener_cron_log:-$screener_logdir/cron.log}"
+
+  ## screener_log_purge_period
+  # $0 cron or $0 purge will automatically remove all old logfiles
+  # from $screener_logdir/*/ when this period is exceeded.
+  screener_log_purge_period="${screener_log_purge_period:-30}" # Days
+
+  ## dry_run
+  # Don't actually start screen sessions when set.
+  dry_run="${dry_run:-0}"
+
+  ## verbose
+  # Increase verbosity.
+  verbose=${verbose:-0}
+
+  ## debug
+  # Some additional debug messages.
+  debug="${debug:-0}"
+
+  ## sleep
+  # Work around races by keeping sessions open for a few seconds.
+  # This is useful for debugging of immediate script failures.
+  # You have some short time window for attaching.
+  # HINT: instead, just inspect the logfiles in $screener_logdir/*/*.log
+  sleep="${sleep:-3}"
+
+  ## screen_cmd
+  # Customize the screen command (e.g. add some further options, etc).
+  screen_cmd="${screen_cmd:-screen}"
+
+  ## use_screenlog
+  # Add the -L option. Not really useful when running thousands of
+  # parallel screen sessions, because the automatically generated filenames
+  # are crap, and cannot be set in advance.
+  # Useful for basic debugging of setup problems etc.
+  use_screenlog="${use_screenlog:-0}"
+
+  ## waiting_txt and delayed_txt
+  # RTFS. Don't use this unless you know what you are doing.
+  waiting_txt="${waiting_txt:-SCREENER_waiting_WAIT}"
+  delayed_txt="${delayed_txt:-SCREENER_delayed_WAIT}"
+
+  ## critical_status
+  # This is the "magic" exit code indicating _criticality_
+  # of a failed command.
+  critical_status="${critical_status:-199}"
+
+  ## serious_status
+  # This is the "magic" exit code indicating _seriousness_
+  # of a failed command.
+  serious_status="${serious_status:-198}"
+
+  ## less_cmd
+  # Used at $0 less $id
+  less_cmd="${less_cmd:-less -r}"
+
+  ## date_format
+  # Here you can customize the appearance of list-* commands
+  date_format="${date_format:-%Y-%m-%d %H:%M}"
+
+  ## csv_delim
+  # The delimiter used for CSV file parsing
+  csv_delim="${csv_delim:-;}"
+
+  ## csv_cmd_fields
+  # Regex telling the field name for 'cmd'
+  csv_cmd_fields="${csv_cmd_fields:-command}"
+
+  ## csv_id_fields
+  # Regex telling the field name for 'screen_id'
+  csv_id_fields="${csv_id_fields:-screen_id|resource}"
+
+  ## csv_remove
+  # Regex for global removal of command options
+  csv_remove="${csv_remove:---screener}"
+
+  ## user_name
+  # Normally automatically derived from ssh agent or from $LOGNAME.
+  # Please override this only when really necessary.
+  export user_name="${user_name:-$(ssh-add -l | grep -Eo '[^ ]+@[^ ]+' | sort -u | tail -1)}"
+  export user_name="${user_name:-$LOGNAME}"
+
+  ## tmp_dir and tmp_stub
+  # Where temporary files are residing
+  tmp_dir="${tmp_dir:-/tmp}"
+  tmp_stub="${tmp_stub:-$tmp_dir/screener.$$}"
+
+Running hook: email_describe_plugin 
+
+PLUGIN screener-email
+
+  Generic plugin for sending emails (or SMS via gateways)
+  upon status changes, such as script failures.
+
+  ## email_*
+  # List of email addresses.
+  # Empty = don't send emails.
+  email_critical="${email_critical:-}"
+  email_serious="${email_serious:-}"
+  email_failed="${email_failed:-}"
+  email_warning="${email_warning:-}"
+  email_waiting="${email_waiting:-}"
+  email_done="${email_done:-}"
+
+  ## sms_*
+  # List of email addresses of SMS gateways.
+  # These may be distinct from email_*.
+  # Empty = don't send sms.
+  sms_critical="${sms_critical:-}"
+  sms_serious="${sms_serious:-}"
+  sms_failed="${sms_failed:-}"
+  sms_warning="${sms_warning:-}"
+  sms_waiting="${sms_waiting:-}"
+  sms_done="${sms_done:-}"
+
+  ## email_cmd
+  # Command for email sending.
+  # Please include your gateways etc here.
+  email_cmd="${email_cmd:-mailx -S smtp=mx.nowhere.org:587 -S smtp-auth-user=test}"
+
+  ## email_logfiles
+  # Whether to include logfiles in the body.
+  # Not used for sms_*.
+  email_logfiles="${email_logfiles:-1}"
+
+\end{verbatim}
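
The variables listed above all follow the pattern name="${name:-default}", so any value set beforehand wins over the default. The following is a minimal sketch of how leading --name=value options could be turned into such assignments. It is not taken from screener.sh: parse_overrides is a hypothetical helper, and both the mapping of dashes to underscores (--dry-run to dry_run) and the value 1 for a bare --name are assumptions.

```shell
#!/usr/bin/env bash
# Sketch only: parse_overrides is a hypothetical helper, not code from screener.sh.

parse_overrides() {
  local _po_opt _po_name _po_value
  for _po_opt in "$@"; do
    case "$_po_opt" in
      --*=*)
        _po_name="${_po_opt%%=*}"   # everything before the first "="
        _po_value="${_po_opt#*=}"   # everything after the first "="
        ;;
      --*)
        _po_name="$_po_opt"
        _po_value=1                 # assumption: a bare --name means name=1
        ;;
      *)
        break                       # first non-option: <operation> or <cmd> starts here
        ;;
    esac
    _po_name="${_po_name#--}"       # strip the leading "--"
    _po_name="${_po_name//-/_}"     # assumption: dashes map to underscores
    # Only assign when the result is a valid shell identifier.
    if [[ "$_po_name" =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]]; then
      printf -v "$_po_name" '%s' "$_po_value"
    fi
  done
}

# Overrides are parsed first ...
parse_overrides --dry-run --session_timeout=60 start demo_id true

# ... then the defaults fill in whatever is still unset.
dry_run="${dry_run:-0}"
session_timeout="${session_timeout:-$(( 3600 * 3 ))}" # seconds
auto_attach_grace="${auto_attach_grace:-10}"

echo "dry_run=$dry_run session_timeout=$session_timeout auto_attach_grace=$auto_attach_grace"
```

Parsing stops at the first argument that does not start with "--", which is in line with the rule that options must come first to prevent mixup with the options of <cmd> <args...>.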
--- a/docu/screener.help
+++ b/docu/screener.help
@ -0,0 +1,193 @@
+\begin{verbatim}
+./screener.sh: Run _unattended_ processes in screen sessions.
+    Useful for MASS automation, running hundreds of unattended
+    commands in parallel.
+    HINT: for running more than ~500 sessions in parallel, you might need
+    some system tuning (e.g. rlimits, kernel patches etc) for creating
+    a huge number of file descriptors / sockets / etc.
+    ADVANTAGE: You may attach to individual screens, kill them, or continue
+    some waiting commands.
+
+Synopsis:
+  ./screener.sh --help [--verbose]
+  ./screener.sh list-running
+  ./screener.sh list-waiting
+  ./screener.sh list-failed
+  ./screener.sh list-critical
+  ./screener.sh list-serious
+  ./screener.sh list-done
+  ./screener.sh list
+  ./screener.sh list-screens
+  ./screener.sh run <file.csv> [<condition_list>]
+  ./screener.sh start <screen_id> <cmd> <args...>
+  ./screener.sh [<options>] <operation> <screen_id>
+
+Inquiry operations:
+
+  ./screener.sh list-screens
+    Equivalent to screen -ls
+
+  ./screener.sh list-<type>
+    Show a list of currently running, waiting (for continuation), failed,
+    and done/completed screen sessions.
+
+  ./screener.sh list
+    First show a list of currently running screens, then
+    for each <type> a list of (old) failed / completed sessions
+    (and so on).
+
+  ./screener.sh status <screen_id>
+    Like list-*, but filter for <screen_id> and don't report timestamps.
+
+  ./screener.sh show <screen_id>
+    Show the last logfile of <screen_id> at standard output.
+
+  ./screener.sh less <screen_id>
+    Show the last logfile of <screen_id> using "less -r".
+
+MASS starting of screen sessions:
+
+  ./screener.sh run <file.csv> <condition_list>
+    Commands are launched in screen sessions via "./screener.sh start" commands,
+    unless the same <screen_id> is already running,
+    or is in some error state, or is already done (see below).
+    The commands are given by a column with CSV header name
+    containing "command", or by the first column.
+    The <screen_id> needs to be given by a column with CSV header
+    name matching "screen_id|resource".
+    The number and type of commands to launch can be reduced via
+    any combination of the following filter conditions:
+
+      --max=<number>
+        Limit the number of _new_ sessions additionally started this time.
+
+      --<column_name>==<value>
+        Only select lines where an arbitrary CSV column (given by its
+        CSV header name in C identifier syntax) has the given value.
+
+      --<column_name>!=<value>
+        Only select lines where the column does _not_ have the given value.
+
+      --<column_name>=~<bash_regex>
+        Only select lines where the bash regular expression matches
+        at the given column.
+
+      --max-per=<number>
+        Limit the number per _distinct_ value of the column denoted by
+        the _next_ filter condition.
+        Example: ./screener.sh run test.csv --dry-run --max-per=2 --dst_network=~.
+        would launch only 2 Football processes per destination network.
+
+    Hint: filter conditions can be easily checked by giving --dry-run.
+
+Start / restart / kill / continue screen sessions:
+
+  ./screener.sh start <screen_id> <cmd> <args...>
+    Start a new screen session, running arbitrary <cmd> and <args...>
+    inside.
+
+  ./screener.sh restart <screen_id>
+    Works only when the last command for <screen_id> failed.
+    This will restart the old <cmd> and its <args...> as before.
+    Use only when you want to repeat the same command once again.
+
+  ./screener.sh kill <screen_id>
+    Terminate the running screen session forcibly.
+
+  ./screener.sh continue
+  ./screener.sh continue <screen_id> [<screen_id_list>]
+  ./screener.sh continue <number>
+    Useful for MASS automation of processes involving critical sections
+    such as customer downtime.
+    When giving a numerical <number> argument, up to that number
+    of sessions are resumed (ordered by age).
+    When no further argument is given, _all_ currently waiting sessions
+    are continued.
+    When --auto-attach is given, it will sequentially resume the
+    sessions to be continued. By default, unless --force_attach is set,
+    it uses "screen -r" skipping those sessions which are already
+    attached by somebody else.
+    This feature works only with prepared scripts which are creating
+    an empty flagfile
+    /home/schoebel/mars/mars-migration.git/screener-logdir-testing/running/$screen_id.waiting
+    whenever they want to wait for manual intervention (for whatever reason).
+    Afterwards, the script must be polling this flagfile for removal.
+    This screener operation simply removes the flagfile, such that
+    the script will then continue afterwards.
+    Example: look into ./football.sh
+    and search for occurrences of substring "call_hook start_wait".
+
+  ./screener.sh wakeup
+  ./screener.sh wakeup <screen_id> [<screen_id_list>]
+  ./screener.sh wakeup <number>
+    Similar to continue, but refers to delayed commands waiting for
+    a timeout. This can be used to individually shorten the timeout
+    period.
+    Example: Football cleanup operations may be artificially delayed
+    before doing "lvremove", to keep some sort of 'backup' for a
+    limited time. When your project is under time pressure, these
+    delays may be hindering.
+    Use this for premature ending of such artificial delays.
+
+  ./screener.sh up <...>
+    Do both continue and wakeup.
+
+  ./screener.sh auto <...>
+    Equivalent to ./screener.sh --auto-attach up <...>
+    Remember that only sessions without current attachment will be
+    attached to.
+
+Attach to a running session:
+
+  ./screener.sh attach <screen_id>
+    This is equivalent to screen -x $screen_id
+
+  ./screener.sh resume <screen_id>
+    This is equivalent to screen -r $screen_id
+
+Communication:
+
+  ./screener.sh notify <screen_id> <txt>
+    May be called from external scripts to send emails etc.
+
+Locking (only when supported by <cmd>):
+
+  ./screener.sh lock
+  ./screener.sh unlock
+  ./screener.sh lock <screen_id>
+  ./screener.sh unlock <screen_id>
+
+Cleanup / bookkeeping:
+
+  ./screener.sh clear-critical <screen_id>
+  ./screener.sh clear-serious <screen_id>
+  ./screener.sh clear-failed  <screen_id>
+    Mark the status as "done" and move the logfile away.
+
+  ./screener.sh purge [<days>]
+    This will remove all old logfiles which are older than
+    <days>. By default, the variable $screener_log_purge_period
+    will be used, which is currently set to '30'.
+
+  ./screener.sh cron
+    You should call this regularly from a user cron job, in order
+    to purge old logfiles, or to detect hanging sessions, or to
+    automatically send pending emails, etc.
+
+Options:
+
+  --variable
+  --variable=$value
+    These must come first, in order to prevent mixup with
+    options of <cmd> <args...>.
+    Allows overriding of any internal shell variable.
+  --help --verbose
+    Show all overridable shell variables, also for plugins.
+
+
+PLUGIN screener-email
+
+  Generic plugin for sending emails (or SMS via gateways)
+  upon status changes, such as script failures.
+
+\end{verbatim}
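
The "continue" operation described above relies on a flagfile handshake: the script creates an empty $screen_id.waiting file, polls until it disappears, and screener removes it. The script side could look roughly like the sketch below. It is an illustration, not the code used by football.sh: wait_for_continue is a hypothetical name, the poll interval is an arbitrary choice, and placing the flagfile under $screener_logdir/running/ is an assumption based on the path shown in the help text.

```shell
#!/usr/bin/env bash
# Sketch only: wait_for_continue is a hypothetical helper, not code from football.sh.

screener_logdir="${screener_logdir:-$HOME/screener-logs}"

# Usage: wait_for_continue <screen_id> [<poll_seconds>]
wait_for_continue() {
  local screen_id="$1"
  local poll="${2:-1}"
  # Assumption: flagfiles live in $screener_logdir/running/
  local flagfile="$screener_logdir/running/$screen_id.waiting"

  mkdir -p "$screener_logdir/running" || return 1
  : > "$flagfile" || return 1       # create the empty flagfile
  while [ -e "$flagfile" ]; do      # poll until the flagfile has been removed
    sleep "$poll"
  done
}

# Typical call inside a long-running script, just before a critical section:
#   wait_for_continue "$screen_id"
```

Because the handshake consists of nothing but the existence of one file, removing the flagfile by hand has the same effect as "./screener.sh continue <screen_id>" under these assumptions.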