2018-04-23 13:25:32 +00:00
\begin{verbatim}
verbose=1
Usage:
./football.sh --help [--verbose]
Show help
./football.sh --variable=<value>
Override any shell variable
Actions for resource migration:
./football.sh migrate <resource> <target_primary> [<target_secondary>]
Run the sequence
migrate_prepare ; migrate_wait ; migrate_finish; migrate_cleanup.
2018-09-26 13:48:23 +00:00
Dto for testing (do not rely on it):
2018-07-24 06:59:19 +00:00
2018-04-23 13:25:32 +00:00
./football.sh migrate_prepare <resource> <target_primary> [<target_secondary>]
Allocate LVM space at the targets and start MARS replication.
./football.sh migrate_wait <resource> <target_primary> [<target_secondary>]
Wait until MARS replication reports UpToDate.
./football.sh migrate_finish <resource> <target_primary> [<target_secondary>]
Call hooks for handover to the targets.
./football.sh migrate_cleanup <resource>
Remove old / currently unused LV replicas from MARS and deallocate
from LVM.
2018-07-02 08:29:37 +00:00
Actions for inplace FS shrinking:
./football.sh shrink <resource> <percent>
Run the sequence shrink_prepare ; shrink_finish ; shrink_cleanup.
2018-09-26 13:48:23 +00:00
Dto for testing (do not rely on it):
2018-07-24 06:59:19 +00:00
2018-07-02 08:29:37 +00:00
./football.sh shrink_prepare <resource> [<percent>]
Allocate temporary LVM space (when possible) and create initial
raw FS copy.
Default percent value(when left out) is 85.
./football.sh shrink_finish <resource>
Incrementally update the FS copy, swap old <=> new copy with
small downtime.
./football.sh shrink_cleanup <resource>
Remove old FS copy from LVM.
Actions for inplace FS extension:
2018-07-10 12:02:58 +00:00
./football.sh expand <resource> <percent>
2018-07-02 08:29:37 +00:00
./football.sh extend <resource> <percent>
2018-07-10 12:02:58 +00:00
Increase mounted filesystem size during operations.
2018-07-02 08:29:37 +00:00
Combined actions:
./football.sh migrate+shrink <resource> <target_primary> [<target_secondary>] [<percent>]
Similar to migrate ; shrink but produces less network traffic.
Default percent value (when left out) is 85.
./football.sh migrate+shrink+back <resource> <tmp_primary> [<percent>]
Migrate temporarily to <tmp_primary>, then shrink there,
finally migrate back to old primary and secondaries.
Default percent value (when left out) is 85.
2018-04-23 13:25:32 +00:00
Actions for (manual) repair in emergency situations:
2018-07-24 06:59:19 +00:00
./football.sh manual_handover <resource> <target_primary>
This is useful in place of going to the machines and starting
handover on their command line. You dont need to log in.
All hooks (e.g. for downtime / reporting / etc) are automatically
called.
Notice: it will only work when there is already a replica
at <target_primary>, and when further constraints such as
clustermanager constraints will allow it.
For a full Football game between different clusters, use
"migrate" instead.
2018-04-23 13:25:32 +00:00
./football.sh manual_migrate_config <resource> <target_primary> [<target_secondary>]
Transfer only the cluster config, without changing the MARS replicas.
This does no resource stopping / restarting.
Useful for reverting a failed migration.
./football.sh manual_config_update <hostname>
Only update the cluster config, without changing anything else.
Useful for manual repair of failed migration.
./football.sh manual_merge_cluster <hostname1> <hostname2>
Run "marsadm merge-cluster" for the given hosts.
Hostnames must be from different (former) clusters.
./football.sh manual_split_cluster <hostname_list>
Run "marsadm split-cluster" at the given hosts.
Useful for fixing failed / asymmetric splits.
Hint: provide _all_ hostnames which have formerly participated
in the cluster.
./football.sh repair_vm <resource> <primary_candidate_list>
Try to restart the VM <resource> on one of the given machines.
Useful during unexpected customer downtime.
./football.sh repair_mars <resource> <primary_candidate_list>
Before restarting the VM like in repair_vm, try to find a local
LV where a stand-alone MARS resource can be found and built up.
Use this only when the MARS resources are gone, and when you are
desperate. Problem: this will likely create a MARS setup which is
not usable for production, and therefore must be corrected later
by hand. Use this only during an emergency situation in order to
get the customers online again, while buying the downsides of this
command.
2018-07-02 08:29:37 +00:00
./football.sh manual_lock <item> <host_list>
./football.sh manual_unlock <item> <host_list>
Manually lock or unlock an item at all of the given hosts, in
an atomic fashion. In most cases, use "ALL" for the item.
2018-04-23 13:25:32 +00:00
2018-09-26 13:48:23 +00:00
Only for testing / development (no stable interfaces):
./football.sh manual_call_hook <name> <args>
2018-04-23 13:25:32 +00:00
Global maintenance:
./football.sh lv_cleanup <resource>
General features:
- Instead of <percent>, an absolute amount of storage with suffix
'k' or 'm' or 'g' can be given.
- When <resource> is currently stopped, login to the container is
not possible, and in turn the hypervisor node and primary storage node
cannot be automatically determined. In such a case, the missing
nodes can be specified via the syntax
<resource>:<hypervisor>:<primary_storage>
- The following LV suffixes are used (naming convention):
-tmp = currently emerging version for shrinking
-preshrink = old version before shrinking took place
- By adding the option --screener, you can handover football execution
to ./screener.sh .
When some --enable_*_waiting is also added, then the critical
sections involving customer downtime are temporarily halted until
some sysadmins says "screener.sh continue $resource" or
attaches to the sessions and presses the RETURN key.
2018-07-24 06:59:19 +00:00
Configuration:
You can place shell variable definitions for overriding any
tunables into the following locations:
football_includes=/usr/lib/mars/plugins /etc/mars/plugins /home/schoebel/mars/football-master.git/plugins /home/schoebel/.mars/plugins ./plugins
football_confs=/usr/lib/mars/confs /etc/mars/confs /home/schoebel/mars/football-master.git/confs /home/schoebel/.mars/confs ./confs
football_creds=/usr/lib/mars/creds /etc/mars/creds /home/schoebel/mars/football-master.git/creds /home/schoebel/mars/football-master.git /home/schoebel/.mars/creds ./creds
Filenames should match the following patterns:
football-*.preconf Here you may change paths and enable_* variables.
football-*.conf Inteded for main parameters.
football-*.postconf For late overrides after sourcing modules.
football-*.reconf Modify runtime parameters during waits.
2018-04-23 13:25:32 +00:00
## football_includes
2018-07-02 08:29:37 +00:00
# List of directories where football-*.sh and football-*.conf
# files can be found.
2018-04-23 13:25:32 +00:00
football_includes="${football_includes:-/usr/lib/mars/plugins /etc/mars/plugins $script_dir/plugins $HOME/.mars/plugins ./plugins}"
2018-07-02 08:29:37 +00:00
## football_confs
# Another list of directories where football-*.conf files can be found.
# These are sourced in a second pass after $football_includes.
# Thus you can change this during the first pass.
football_confs="${football_confs:-/usr/lib/mars/confs /etc/mars/confs $script_dir/confs $HOME/.mars/confs ./confs}"
## football_creds
# List of directories where various credential files can be found.
football_creds="${football_creds:-/usr/lib/mars/creds /etc/mars/creds $script_dir/creds $script_dir $HOME/.mars/creds ./creds}"
2018-07-24 06:59:19 +00:00
## trap_signals
# List of signal names which should be trapped.
# Traps are importnatn for housekeeping, e.g. automatic
# removal of locks.
trap_signals="${trap_signals:-SIGINT}"
2018-04-23 13:25:32 +00:00
## dry_run
# When set, actions are only simulated.
dry_run=${dry_run:-0}
## verbose
# increase speakiness.
verbose=${verbose:-0}
## confirm
# Only for debugging: manually started operations can be
# manually checked and confirmed before actually starting opersions.
confirm=${confirm:-1}
## force
# Normally, shrinking and extending will only be started if there
# is something to do.
# Enable this for debugging and testing: the check is then skipped.
force=${force:-0}
## debug_injection_point
# RTFS don't set this unless you are a developer knowing what you are doing.
debug_injection_point="${debug_injection_point:-0}"
## football_logdir
# Where the logfiles should be created.
# HINT: after playing Football in masses for a whiile, your $logdir will
# be easily populated with hundreds or thousands of logfiles.
# Set this to your convenience.
football_logdir="${football_logdir:-${logdir:-$HOME/football-logs}}"
2018-07-10 12:02:58 +00:00
## football_backup_dir
# In this directory, various backups are created.
# Intended for manual repair.
football_backup_dir="${football_backup_dir:-$football_logdir/backups}"
2018-04-23 13:25:32 +00:00
## screener
2018-07-24 06:59:19 +00:00
# When enabled, delegate execution to the screener.
2018-04-23 13:25:32 +00:00
# Very useful for running Football in masses.
2018-07-24 06:59:19 +00:00
screener="${screener:-1}"
2018-04-23 13:25:32 +00:00
## min_space
# When testing / debugging with extremely small LVs, it may happen
# that mkfs refuses to create extemely small filesystems.
# Use this to ensure a minimum size.
min_space="${min_space:-20000000}"
## cache_repeat_lapse
# When using the waiting capabilities of screener, and when waits
# are lasting very long, your dentry cache may become cold.
# Use this for repeated refreshes of the dentry cache after some time.
cache_repeat_lapse="${cache_repeat_lapse:-120}" # Minutes
2018-07-24 06:59:19 +00:00
## remote_ping
# Before using ssh, ping the target.
# This is only useful in special cases.
remote_ping="${remote_ping:-0}"
## ping_opts
# Options for ping checks.
ping_opts="${ping_opts:--W 1 -c 1}"
2018-04-23 13:25:32 +00:00
## ssh_opt
# Useful for customization to your ssh environment.
ssh_opt="${ssh_opt:--4 -A -o StrictHostKeyChecking=no -o ForwardX11=no -o KbdInteractiveAuthentication=no -o VerifyHostKeyDNS=no}"
2018-07-24 06:59:19 +00:00
## ssh_auth
# Useful for extra -i options.
ssh_auth="${ssh_auth:-}"
2018-04-23 13:25:32 +00:00
## rsync_opt
# The rsync options in general.
# IMPORTANT: some intermediate progress report is absolutely needed,
# because otherwise a false-positive TIMEOUT may be assumed when
# no output is generated for several hours.
2018-09-26 13:48:23 +00:00
rsync_opt="${rsync_opt:- -aH --inplace --info=progress2,STATS}"
2018-04-23 13:25:32 +00:00
## rsync_opt_prepare
# Additional rsync options for preparation and updating
# of the temporary shrink mirror filesystem.
rsync_opt_prepare="${rsync_opt_prepare:---exclude='.filemon2' --delete}"
2018-07-10 12:02:58 +00:00
## rsync_opt_hot
# This is only used at the final rsync, immediately before going
# online again.
rsync_opt_hot="${rsync_opt_hot:---delete}"
2018-04-23 13:25:32 +00:00
## rsync_nice
# Typically, the preparation steps are run with background priority.
rsync_nice="${rsync_nice:-nice -19}"
## rsync_repeat_prepare and rsync_repeat_hot
# Tuning: increases the reliability of rsync and ensures that the dentry cache
# remains hot.
rsync_repeat_prepare="${rsync_repeat_prepare:-5}"
rsync_repeat_hot="${rsync_repeat_hot:-3}"
2018-07-02 08:29:37 +00:00
## rsync_skip_lines
# Number of rsync lines to skip in output (avoid overflow of logfiles).
rsync_skip_lines="${rsync_skip_lines:-1000}"
2018-07-10 12:02:58 +00:00
## use_tar
# Use incremental Gnu tar in place of rsync:
# 0 = don't use tar
# 1 = only use for the first (full) data transfer, then use rsync
# 2 = always use tar
# Experience: tar has better performance on local data than rsync, but
# it tends to produce false-positive failure return codes on online
# filesystems which are altered during tar.
# The combined mode 1 tries to find a good compromise between both
# alternatives.
use_tar="${use_tar:-1}"
## tar_exe
# Use this for activation of patched tar versions, such as the
# 1&1-internal patched spacetools-tar.
tar_exe="${tar_exe:-/bin/tar}"
## tar_options_src and tar_options_dst
# Here you may give different options for both sides of tar invocations
# (source and destination), such as verbosity options etc.
tar_options_src="${tar_options_src:-}"
tar_options_dst="${tar_options_dst:-}"
## tar_is_fixed
# Tell whether your tar version reports false-positive transfer errors,
# or not.
tar_is_fixed="${tar_is_fixed:-0}"
## tar_state_dir
# This directory is used for keeping incremental tar state information.
tar_state_dir="${tar_state_dir:-/var/tmp}"
## buffer_cmd
# Speed up tar by intermediate buffering.
buffer_cmd="${buffer_cmd:-buffer -m 16m -S 1024m || cat}"
2018-04-23 13:25:32 +00:00
## wait_timeout
# Avoid infinite loops upon waiting.
wait_timeout="${wait_timeout:-$(( 24 * 60 ))}" # Minutes
## lvremove_opt
# Some LVM versions are requiring this for unattended batch operations.
lvremove_opt="${lvremove_opt:--f}"
2018-07-10 12:02:58 +00:00
## automatic recovery options: enable_failure_*
enable_failure_restart_vm="${enable_failure_restart_vm:-1}"
enable_failure_recreate_cluster="${enable_failure_recreate_cluster:-0}"
enable_failure_rebuild_mars="${enable_failure_rebuild_mars:-1}"
2018-04-23 13:25:32 +00:00
## critical_status
# This is the "magic" exit code indicating _criticality_
# of a failed command.
critical_status="${critical_status:-199}"
## serious_status
# This is the "magic" exit code indicating _seriosity_
# of a failed command.
serious_status="${serious_status:-198}"
2018-09-26 13:48:23 +00:00
## interrupted_status
# This is the "magic" exit code indicating a manual interruption
# (e.g. keypress Ctl-c)
interrupted_status="${interrupted_status:-190}"
## illegal_status
# This is the "magic" exit code indicating an illegal command
# (e.g. syntax error, illegal arguments, etc)
illegal_status="${illegal_status:-191}"
2018-04-23 13:25:32 +00:00
## pre_hand or --pre-hand=
2018-07-02 08:29:37 +00:00
# Set this to do an ordinary handover to a new start position
# (in the source cluster) before doing anything else.
# This may be used for handover to a different datacenter,
# in order to minimize cross traffic between datacenters.
2018-04-23 13:25:32 +00:00
pre_hand="${pre_hand:-}"
2018-07-02 08:29:37 +00:00
## post_hand or --post-hand=
# Set this to do an ordinary handover to a final position
# (in the target cluster) after everything has successfully finished.
# This may be used to establish a uniform default running location.
post_hand="${post_hand:-}"
2018-07-24 06:59:19 +00:00
## tmp_suffix
# Only for experts.
tmp_suffix="${tmp_suffix:--tmp}"
## shrink_suffix_old
# Suffix for backup LVs. These are kept for wome time until
# *_cleanup operations will remove them.
shrink_suffix_old="${shrink_suffix_old:--preshrink}"
## start_regex
# At which $operation the hook football_start
# shoule be called
start_regex="${start_regex:-^(migrate_prepare|migrate|migrate+|shrink_prepare|shrink)}"
## finished_regex
# At which $operation the hook football_finished
# shoule be called
finished_regex="${finished_regex:-^(migrate_finish|migrate|migrate+|shrink_finish|shrink)}"
2018-09-26 13:48:23 +00:00
## call_finished
# Whether to call the hook football_failed at failures.
call_finished="${call_finished:-1}"
2018-07-02 08:29:37 +00:00
## lock_break_timeout
# When remote ssh commands are failing, remote locks may sustain forever.
# Avoid deadlocks by breaking remote locks after this timeout has elapsed.
# NOTICE: these type of locks are only intended for short-term locking.
lock_break_timeout="${lock_break_timeout:-3600}" # seconds
2018-04-23 13:25:32 +00:00
## startup_when_locked
# When == 0:
# Don't abort and don't wait when a lock is detected at startup.
# When == 1 and when enable_startup_waiting=1:
# Wait until the lock is gone.
# When == 2:
# Abort start of script execution when a lock is detected.
# Later, when a locks are set _during_ execution, they will
# be obeyed when enable_*_waiting is set (instead), and will
# lead to waits instead of aborts.
startup_when_locked="${startup_when_locked:-1}"
2018-07-24 06:59:19 +00:00
## resource_pre_check
# Useful for debugging of container problems.
# Normally not needed.
resource_pre_check="${resource_pre_check:-0}"
2018-09-26 13:48:23 +00:00
## enable_background_reporting
# Progress reporting to screener.
# Runs in the background, in parallel to forground processes
# like rsync or tar.
enable_background_reporting="${enable_background_reporting:-1}"
2018-07-24 06:59:19 +00:00
## condition_check_interval
# How often conditions should be re-evaluated.
condition_check_interval="${condition_check_interval:-180}" # Seconds
2018-09-26 13:48:23 +00:00
## lease_time
# Any intents (e.g. for creation of new resources) are recorded.
# This is needed for race avoidance, when multiple resources
# are migrated in _parallel_ to the _same_ target.
# This might lead to livelocks when there would be no lease time
# after which the intents are regarded as "invalid".
lease_time="${lease_time:-3600}" # seconds
2018-07-24 06:59:19 +00:00
## limit_syncs
# Limit the number of actually running syncs by waiting
# until less than this number of syncs are running at any
# target host.
limit_syncs="${limit_syncs:-4}"
## limit_shrinks
# Limit the number of actually running shrinks by waiting
# until less than this number of shrinks are running at any
# target host.
limit_shrinks="${limit_shrinks:-1}"
## count_shrinks_by_tmp_mount
# Only count the temporary mounts.
# Otherwise, LVs are counted. The latter may yield false positives
# because LVs may be created in advance (e.g. at another cluster member)
count_shrinks_by_tmp_mount="${count_shrinks_by_tmp_mount:-1}"
## limit_mars_logfile
# Dont handover when too much logfile data is missing at the
# new primary site.
limit_mars_logfile="${limit_mars_logfile:-1024}" # MiB
2018-09-26 13:48:23 +00:00
## shrink_min_ram_gb
# When set, check that the target machines for shrinking
# have enough RAM.
# Rationale: even incremental rsync needs the Dentry cache of the
# kernel. When there is not enough RAM, and when there are some millions
# of inodes, the customer downtime may rise to some hours or even some days
# instead of some minutes (only when the detnry+inode cache does not
# fit into kernel memory <<<=== this is the cruscial point)
shrink_min_ram_gb="${shrink_min_ram_gb:-0}" # GiB
2018-07-24 06:59:19 +00:00
## optimize_dentry_cache
# Don't umount the temporary shrink space unnecessarily.
# Try to shutdown the VM / container without umounting.
# Important for high speed.
optimize_dentry_cache="${optimize_dentry_cache:-1}"
## mkfs_cmd
# Tunable for creation of new filesystems.
mkfs_cmd="${mkfs_cmd:-mkfs.xfs -s size=4096 -d agcount=1024}"
## mount_opts
# Options for temporary mounts.
# Not used for ordinary clustermanager operations.
mount_opts="${mount_opts:--o rw,nosuid,noatime,attr2,inode64,usrquota}"
## reuse_mount
# Assume that already existing temporary mounts are the correct ones.
# This will speed up interrupted and repeated runs by factors.
reuse_mount="${reuse_mount:-1}"
## reuse_lv
# Assume that temporary LVs are reusable.
reuse_lv="${reuse_lv:-1}"
## reuse_lv_check
# When set, this command is executed for checking whether
# the LV can be reused.
reuse_lv_check="${reuse_lv_check:-xfs_db -c sb -c print -r}"
## do_quota
# Transfer xfs quota information.
# 0 = off
# 1 = global xfs quota transfer
# 2 = additionally local one
do_quota="${do_quota:-2}"
## xfs_dump_dir
# Temporary space for keeping xfs quota dumps.
xfs_dump_dir="${xfs_dump_dir:-$football_backup_dir/xfs-quota-$start_stamp}"
## xfs_quota_enable
# Command for re-enabling the quota system after shrink.
xfs_quota_enable="${xfs_quota_enable:-xfs_quota -x -c enable}"
## xfs_dump and xfs_restore
# Commands for transfer of xfs quota information.
xfs_dump="${xfs_dump:-xfs_quota -x -c dump}"
xfs_restore="${xfs_restore:-xfs_quota -x -c restore}"
2018-09-26 13:48:23 +00:00
## shortcut_tar_percent
# Percentage when a shrink space should no longer be considered
# as "inital" (or empty).
shortcut_tar_percent="${shortcut_tar_percent:-5}"
## max_rsync_downtime
# When set, check the _expected_ duration of customer downtime.
# if it takes longer than this limit, abort without causing
# customer downtime.
# Afterward, sysadmins need to decide what to do:
# For example, move the resource to faster hardware with more RAM, or similar.
max_rsync_downtime="${max_rsync_downtime:-0}" # seconds
## merge_shrink_secondaries
# This is only needed when targets are not yet pre-merged.
merge_shrink_secondaries="${merge_shrink_secondaries:-0}"
2018-07-24 06:59:19 +00:00
## fs_resize_cmd
# Command for online filesystem expansion.
fs_resize_cmd="${fs_resize_cmd:-xfs_growfs -d}"
## migrate_two_phase
# This is useful when the new hardware has a better replication network,
# e.g. 10GBit uplink instead of 1GBit.
# Instead of starting two or more syncs in parallel on the old hardware,
# run the syncs in two phases:
# 1. migrate data to the new primary only.
# 1b. handover to new primary.
# 2. now start migration of data to the new secondaries, over the better
# network attachment of the new hardware.
migrate_two_phase="${migrate_two_phase:-0}"
## migrate_always_all
# By default, migrate+shrink creates only 1 replica during the initial
# migration.
# When setting this, all replicas are created, which improves resilience,
# but worsens network performance.
migrate_always_all="${migrate_always_all:-0}"
## migrate_early_cleanup
# Early cleanup of old replicas when using migrate_always_all or
# migrate_two_phase.
# Only reasonable when combined with migrate+shrink.
# This is slightly less safe, but saves time when you want to
# decommission old hardware as fast as popssible.
# Early cleanup of the old replicase will only be done when
# at least 2 replicas are available at the new (target) side.
# These two new replicas can be created either by
# a) migrate_always_all=1 or
# b) migrate_two_phase=1 or automatically selected (or not) via
# c) auto_two_phase=1
migrate_early_cleanup="${migrate_early_cleanup:-1}"
2018-04-23 13:25:32 +00:00
## user_name
# Normally automatically derived from ssh agent or from $LOGNAME.
# Please override this only when really necessary.
export user_name="${user_name:-$(get_real_ssh_user)}"
export user_name="${user_name:-$LOGNAME}"
2018-07-02 08:29:37 +00:00
## replace_ssh_id_file
# When set, replace current ssh user with this one.
# The new user should hot have a passphrase.
# Useful for logging out the original user (interrupting the original
# ssh agent chain).
replace_ssh_id_file="${replace_ssh_id_file:-}"
2018-04-23 13:25:32 +00:00
2018-07-02 08:29:37 +00:00
PLUGIN football-1and1config
2018-04-23 13:25:32 +00:00
2018-07-02 08:29:37 +00:00
1&1 specfic plugin for dealing with the cm3 clusters
2018-07-10 12:02:58 +00:00
and its concrete configuration.
## enable_1and1config
# ShaHoLin-specifc plugin for working with the infong platform
# (istore, icpu, infong) via 1&1-specific clustermanager cm3
# and related toolsets. Much of it is bound to a singleton database
# instance (clustermw & siblings).
enable_1and1config="${enable_1and1config:-$(if [[ "$0" =~ tetris ]]; then echo 1; else echo 0; fi)}"
2018-07-24 06:59:19 +00:00
## runstack_host
# To be provided in a *.conf or *.preconf file.
runstack_host="${runstack_host:-}"
## runstack_cmd
# Command to be provided in a *.conf file.
runstack_cmd="${runstack_cmd:-}"
## runstack_ping
# Only call runstack when the container is pingable.
runstack_ping="${runstack_ping:-1}"
## dastool_host
# To be provided in a *.conf or *.preconf file.
dastool_host="${dastool_host:-}"
## dastool_cmd
# Command to be provided in a *.conf file.
dastool_cmd="${dastool_cmd:-}"
## update_host
# To be provided in a *.conf or *.preconf file.
update_host="${update_host:-}"
## update_cmd
# Command to be provided in a *.conf file.
update_cmd="${update_cmd:-}"
2018-07-10 12:02:58 +00:00
PLUGIN football-cm3
1&1 specfic plugin for dealing with the cm3 cluster manager
and its concrete operating enviroment (singleton instance).
Current maximum cluster size limit:
Maximum #syncs running before migration can start:
Following marsadm --version must be installed:
Following mars kernel modules must be loaded:
Specific actions for plugin football-cm3:
./football.sh clustertool {GET|PUT} <url>
Call through to the clustertool via REST.
Useful for manual inspection and repair.
2018-07-24 06:59:19 +00:00
Specific features with plugin football-cm3:
- Parameter syntax "cluster123" instead of "icpu456 icpu457"
This is an alternate specification syntax, which is
automatically replaced with the real machine names.
It tries to minimize datacenter cross-traffic by
taking the new $target_primary at the same datacenter
location where the container is currenty running.
2018-07-10 12:02:58 +00:00
## enable_cm3
# ShaHoLin-specifc plugin for working with the infong platform
# (istore, icpu, infong) via 1&1-specific clustermanager cm3
# and related toolsets. Much of it is bound to a singleton database
# instance (clustermw & siblings).
enable_cm3="${enable_cm3:-$(if [[ "$0" =~ tetris ]]; then echo 1; else echo 0; fi)}"
## skip_resource_ping
# Enable this only for testing. Normally, a resource name denotes a
# container name == machine name which must be runnuing as a precondition,
# und thus must be pingable over network.
skip_resource_ping="${skip_resource_ping:-0}"
2018-09-26 13:48:23 +00:00
## business_hours
# When set, critical sections are only entered during certain
# days of the week, and/or during certain hours.
# This is a regex matching against "date +%u_%H".
# Example regex: [1-5]_(0[8-9]|1[0-8])
# This means Monday to Friday from 8 to 18 o'clock.
business_hours="${business_hours:-}"
## cm3_stop_safeguard_cmd
# Workaround for a bug.
# Sometimes a systemd unit does not go away.
cm3_stop_safeguard_cmd="${cm3_stop_safeguard_cmd:-{ sleep 2; try=0; while (( try++ < 10 )) && systemctl show $res.scope | grep ActiveState | grep =active; do systemctl stop $res.scope; sleep 6; done; if mountpoint /vol/$res; then umount /vol/$res; fi; }}"
2018-07-10 12:02:58 +00:00
## check_ping_rounds
# Number of pings to try before a container is assumed to
# not respond.
check_ping_rounds="${check_ping_rounds:-5}"
2018-07-24 06:59:19 +00:00
## additional_runstack
# Do an additional runstack after startup of the new container.
# In turn, this will only do something when source and target are
# different.
additional_runstack="${additional_runstack:-1}"
2018-07-10 12:02:58 +00:00
## workaround_firewall
# Documentation of technical debt for later generations:
# This is needed since July 2017. In the many years before, no firewalling
# was effective at the replication network, because it is a physically
# separate network from the rest of the networking infrastructure.
# An attacker would first need to gain root access to the _hypervisor_
# (not only to the LXC container and/or to KVM) before gaining access to
# those physical replication network interfaces.
# Since about that time, which is about the same time when the requirements
# for Container Football had been communicated, somebody introduced some
# unnecessary firewall rules, based on "security arguments".
# These arguments were however explicitly _not_ required by the _real_
# security responsible person, and explicitly _not_ recommended by him.
# Now the problem is that it is almost politically impossible to get
# rid of suchalike "security feature".
# Until the problem is resolved, Container Football requires
# the _entire_ local firewall to be _temporarily_ shut down in order to
# allow marsadm commands over ssh to work.
# Notice: this is _not_ increasing the general security in any way.
# LONGTERM solution / TODO: future versions of mars should no longer
# depend on ssh.
# Then this "feature" can be turned off.
workaround_firewall="${workaround_firewall:-1}"
## ip_magic
# Similarly to workaround_firewall, this is needed since somebody
# introduced additional firewall rules also disabling sysadmin ssh
# connections at the _ordinary_ sysadmin network.
ip_magic="${ip_magic:-1}"
## do_split_cluster
# The current MARS branch 0.1a.y is not yet constructed for forming
# a BigCluster constisting of several thousands of machines.
# When a future version of mars0.1b.y (or 0.2.y) will allow this,
# this can be disabled.
2018-09-26 13:48:23 +00:00
# do_split_cluster >= 2 means that the resulting MARS clusters should
# not exceed these number of members, when possible.
do_split_cluster="${do_split_cluster:-2}"
2018-07-10 12:02:58 +00:00
## forbidden_hosts
# Regex for excluding hostnames from any Football actions.
# The script will fail when some of these is encountered.
forbidden_hosts="${forbidden_hosts:-}"
## forbidden_flavours
# Regex for excluding flavours from any Football actions.
# The script will fail when some of these is encountered.
forbidden_flavours="${forbidden_flavours:-}"
## forbidden_bz_ids
# PROVISIONARY regex for excluding certain bz_ids from any Football actions.
# NOTICE: bz_ids are deprecated and should not be used in future
# (technical debts).
# The script will fail when some of these is encountered.
forbidden_bz_ids="${forbidden_bz_ids:-}"
2018-07-24 06:59:19 +00:00
## auto_two_phase
# When this is set, override the global migrate_two_phase parameter
# at runtime by ShaHoLin-specific checks
auto_two_phase="${auto_two_phase:-1}"
2018-07-10 12:02:58 +00:00
## clustertool_host
# URL prefix of the internal configuation database REST interface.
2018-07-24 06:59:19 +00:00
# Set this via *.preconf config files.
clustertool_host="${clustertool_host:-}"
2018-07-10 12:02:58 +00:00
## clustertool_user
# Username for clustertool access.
# By default, scans for a *.password file (see next option).
clustertool_user="${clustertool_user:-$(get_cred_file "*.password" | head -1 | sed 's:.*/::g' | cut -d. -f1)}"
## clustertool_passwd_file
# Here you can supply the encrpted password.
# By default, a file $clustertool_user.password is used
# containing the encrypted password.
clustertool_passwd_file="${clustertool_passwd_file:-$(get_cred_file "$clustertool_user.password")}"
## clustertool_passwd
# Here you may override the password via config file.
# For security reasons, dont provide this at the command line.
clustertool_passwd="${clustertool_passwd:-$(< $clustertool_passwd_file)}" || echo "cannot read a password file *.password for clustermw: you MUST supply the credentials via default curl config files (see man page)"
## do_migrate
# Keep this enabled. Only disable for testing.
do_migrate="${do_migrate:-1}" # must be enabled; disable for dry-run testing
## always_migrate
# Only use for testing, or for special situation.
# This skip the test whether the resource has already migration.
always_migrate="${always_migrate:-0}" # only enable for testing
## check_segments
# 0 = disabled
# 1 = only display the segment names
# 2 = check for equality
# WORKAROUND, potentially harmful when used inadequately.
# The historical physical segment borders need to be removed for
# Container Football.
# Unfortunately, the subproject aiming to accomplish this did not
# proceed for one year now. In the meantime, Container Football can
# be only played within the ancient segment borders.
# After this big impediment is eventually resolved, this option
# should be switched off.
check_segments="${check_segments:-1}"
## enable_mod_deflate
# Internal, for support.
enable_mod_deflate="${enable_mod_deflate:-1}"
## enable_segment_move
# Seems to be needed by some other tooling.
enable_segment_move="${enable_segment_move:-1}"
## override_hwclass_id
# When necessary, override this from $include_dir/plugins/*.conf
override_hwclass_id="${override_hwclass_id:-}" # typically 25007
## override_hvt_id
# When necessary, override this from $include_dir/plugins/*.conf
override_hvt_id="${override_hvt_id:-}" # typically 8057 or 8059
## override_overrides
# When this is set and other override_* variables are not set,
# then try to _guess_ some values.
# No guarantees for correctness either.
override_overrides=${override_overrides:-1}
## iqn_base and iet_type and iscsi_eth and iscsi_tid
# Workaround: this is needed for _dynamic_ generation of iSCSI sessions
# bypassing the ordinary ones as automatically generated by the
# cm3 cluster manager (only at the old istore architecture).
# Notice: not needed for regular operations, only for testing.
# Normally, you dont want to shrink over a _shared_ 1MBit iSCSI line.
iqn_base="${iqn_base:-iqn.2000-01.info.test:test}"
iet_type="${iet_type:-blockio}"
iscsi_eth="${iscsi_eth:-eth1}"
iscsi_tid="${iscsi_tid:-4711}"
## monitis_downtime_script
# ShaHoLin-internal
monitis_downtime_script="${monitis_downtime_script:-}"
## monitis_downtime_duration
# ShaHoLin-internal
2018-09-26 13:48:23 +00:00
monitis_downtime_duration="${monitis_downtime_duration:-60}" # Minutes
## orwell_downtime_script
# ShaHoLin-internal
orwell_downtime_script="${orwell_downtime_script:-}"
## orwell_tz
# Deal with differences in clock timezones.
orwell_tz="${orwell_tz:-Europe/Berlin}"
## orwell_downtime_duration
# ShaHoLin-internal
orwell_downtime_duration="${orwell_downtime_duration:-20}" # Minutes
## orwell_workaround_sleep
# Workaround for a race condition in Orwell.
# Try to ensure that another check has been executed before
# the downtime is removed.
# 0 = dont remove the downtime at all.
orwell_workaround_sleep="${orwell_workaround_sleep:-300}" # Seconds
2018-07-10 12:02:58 +00:00
2018-07-24 06:59:19 +00:00
## shaholin_customer_report_cmd
# Action script when the hardware has improved.
shaholin_customer_report_cmd="${shaholin_customer_report_cmd:-}"
## shaholin_min_cpus and shaholin_dst_cpus
shaholin_src_cpus="${shaholin_src_cpus:-4}"
shaholin_dst_cpus="${shaholin_dst_cpus:-32}"
2018-09-26 13:48:23 +00:00
## ip_renumber_cmd
# Cross-call with another independent project.
ip_renumber_cmd="${ip_renumber_cmd:-}"
2018-07-10 12:02:58 +00:00
## shaholin_finished_log
# ShaHoLin-specific logfile, reporting _only_ successful completion
# of an action.
shaholin_finished_log="${shaholin_finished_log:-$football_logdir/shaholin-finished.log}"
2018-09-26 13:48:23 +00:00
## update_cmd
2018-07-24 06:59:19 +00:00
# OPTIONAL: specific action script with parameters.
2018-09-26 13:48:23 +00:00
update_cmd="${update_cmd:-}"
## update_host
# To be provided in a *.conf or *.preconf file.
update_host="${update_host:-}"
## parse_ticket
# Regex for identifying tickets from script outputs or arguments
parse_ticket="${parse_ticket:-TECCM-[0-9]+}"
## prefer_parsed_ticket
# Workaround bugs from getting inconsistent ticket IDs from different sources.
prefer_parsed_ticket="${prefer_parsed_ticket:-0}"
## translate_db_state
# Whether to use the following mapping definitions.
translate_db_state="${translate_db_state:-0}"
## db_state_*
# Map logical names to the ones in the database.
db_state_init="${db_state_init:-}"
db_state_prepare="${db_state_prepare:-}"
db_state_finish="${db_state_finish:-}"
db_state_cleanup="${db_state_cleanup:-}"
db_state_done="${db_state_done:-}"
## use_type_for_ticket
# Internal ticketing convention.
use_type_for_ticket="${use_type_for_ticket:-1}"
2018-07-24 06:59:19 +00:00
## auto_handover
# Load-balancing accross locations.
# Works only together with the new syntax "cluster123".
# Depending on the number of syncs currently running, this
# will internally add --pre-hand and --post_hand options
# dynamically at runtime. This will spread much of the sync
# traffic to per-datacenter local behaviour.
# Notice: this may produce more total customer downtime when
# running a high parallelism degree.
# Thus it tries to reduce unnecessary handovers to other locations.
auto_handover="${auto_handover:-1}"
2018-09-26 13:48:23 +00:00
## preferred_location
# When set, override any other pre-handover to this location.
# Useful for maintenance of a whole datacenter.
preferred_location="${preferred_location:-}"
2018-07-24 06:59:19 +00:00
PLUGIN football-ticket
Generic plugin for creating and updating tickets,
e.g. Jira tickets.
You will need to hook in some external scripts which are
then creating / updating the tickets.
Comment texts may be provided with following conventions:
comment.$ticket_state.txt
comment.$ticket_phase.$ticket_state.txt
Directories where comments may reside:
football_creds=/usr/lib/mars/creds /etc/mars/creds /home/schoebel/mars/football-master.git/creds /home/schoebel/mars/football-master.git /home/schoebel/.mars/creds ./creds
football_confs=/usr/lib/mars/confs /etc/mars/confs /home/schoebel/mars/football-master.git/confs /home/schoebel/.mars/confs ./confs
football_includes=/usr/lib/mars/plugins /etc/mars/plugins /home/schoebel/mars/football-master.git/plugins /home/schoebel/.mars/plugins ./plugins
## enable_ticket
enable_ticket="${enable_ticket:-$(if [[ "$0" =~ tetris ]]; then echo 1; else echo 0; fi)}"
2018-07-10 12:02:58 +00:00
## ticket
2018-07-24 06:59:19 +00:00
# OPTIONAL: the meaning is installation specific.
# This can be used for identifying JIRA tickets.
2018-07-10 12:02:58 +00:00
# Can be set on the command line like "./tetris.sh $args --ticket=TECCM-4711
ticket="${ticket:-}"
## ticket_get_cmd
# Optional: when set, this script can be used for retrieving ticket IDs
# in place of commandline option --ticket=
2018-07-24 06:59:19 +00:00
# Retrieval should be unique by resource names.
# You may use any defined bash varibale by escaping them like
# $res .
# Example: ticket_get_cmd="my-ticket-getter-script.pl "$res""
2018-07-10 12:02:58 +00:00
ticket_get_cmd="${ticket_get_cmd:-}"
2018-07-24 06:59:19 +00:00
## ticket_create_cmd
# Optional: when set, this script can be used for creating new tickets.
# It will be called when $ticket_get_cmd does not retrieve anything.
# Example: ticket_create_cmd="my-ticket-create-script.pl "$res" "$target_primary""
# Afterwards, the new ticket needs to be retrievable via $ticket_get_cmd.
ticket_create_cmd="${ticket_create_cmd:-}"
2018-07-10 12:02:58 +00:00
## ticket_update_cmd
# This can be used for calling an external command which updates
# the ticket(s) given by the $ticket parameter.
2018-07-24 06:59:19 +00:00
# Example: ticket_update_cmd="my-script.pl "$ticket" "$res" "$ticket_phase" "$ticket_state""
2018-07-10 12:02:58 +00:00
ticket_update_cmd="${ticket_update_cmd:-}"
2018-07-24 06:59:19 +00:00
## ticket_require_comment
# Only update a ticket when a comment file exists in one of the
# directories $football_creds $football_confs $football_includes
ticket_require_comment="${ticket_require_comment:-1}"
2018-07-10 12:02:58 +00:00
2018-09-26 13:48:23 +00:00
## ticket_for_migrate
# Optional 1&1-specific: separate ticket for migrate.
# Useful when migrate+shink need to post into separate tickets.
ticket_for_migrate="${ticket_for_migrate:-}"
## ticket_for_shrink
# Optional 1&1-specific: separate ticket for migrate.
# Useful when migrate+shink need to post into separate tickets.
ticket_for_shrink="${ticket_for_shrink:-}"
## ticket_prefer_cached
# Workaround a bug in ticket ID retrieval:
# Trust my own cached values more than trust the "inconsistent read".
ticket_prefer_cached="${ticket_prefer_cached:-1}"
## ticket_code
# List of operation:res:shard
ticket_code="${ticket_code:-}"
## get_ticket_code
get_ticket_code="${get_ticket_code:-}"
## max_start_ticket
# Maximum number of instances to start per call
max_start_ticket="${max_start_ticket:-1}"
2018-07-10 12:02:58 +00:00
PLUGIN football-basic
Generic driver for systemd-controlled MARS pools.
The current version supports only a flat model:
(1) There is a single "big cluster" at metadata level.
All cluster members are joined via merge-cluster.
All occurring names need to be globally unique.
(2) The network uses BGP or other means, thus any hypervisor
can (potentially) start any VM at any time.
(3) iSCSI or remote devices are not supported for now
(LocalSharding model). This may be extended in a future
release.
This plugin is exclusive-or with cm3.
Plugin specific actions:
./football.sh basic_add_host <hostname>
Manually add another host to the hostname cache.
## pool_cache_dir
# Directory for caching the pool status.
pool_cache_dir="${pool_cache_dir:-$script_dir/pool-cache}"
## initial_hostname_file
# This file must contain a list of storage and/or hypervisor hostnames
# where a /mars directory must exist.
# These hosts are then scanned for further cluster members,
# and the transitive closure of all host names is computed.
initial_hostname_file="${initial_hostname_file:-./hostnames.input}"
## hostname_cache
# This file contains the transitive closure of all host names.
hostname_cache="${hostname_cache:-$pool_cache_dir/hostnames.cache}"
## resources_cache
# This file contains the transitive closure of all resource names.
resources_cache="${resources_cache:-$pool_cache_dir/resources.cache}"
## res2hyper_cache
# This file contains the association between resources and hypervisors.
res2hyper_cache="${res2hyper_cache:-$pool_cache_dir/res2hyper.assoc}"
## enable_basic
# This plugin is exclusive-or with cm3.
enable_basic="${enable_basic:-$(if [[ "$0" =~ football ]]; then echo 1; else echo 0; fi)}"
## ssh_port
# Set this for separating sysadmin access from customer access
ssh_port="${ssh_port:-}"
## basic_mnt_dir
# Names the mountpoint directory at hypervisors.
# This must co-incide with the systemd mountpoints.
basic_mnt_dir="${basic_mnt_dir:-/mnt}"
PLUGIN football-downtime
Generic plugin for communication of customer downtime.
## downtime_cmd_{set,unset}
# External command for setting / unsetting (or communicating) a downtime
# Empty = don't do anything
downtime_cmd_set="${downtime_cmd_set:-}"
downtime_cmd_unset="${downtime_cmd_unset:-}"
PLUGIN football-motd
Generic plugin for motd. Communicate that Football is running
at login via motd.
## enable_motd
# whether to use the motd plugin.
enable_motd="${enable_motd:-0}"
## update_motd_cmd
# Distro-specific command for generating motd from several sources.
# Only tested for Debian Jessie at the moment.
update_motd_cmd="${update_motd_cmd:-update-motd}"
## download_motd_script and motd_script_dir
# When no script has been installed into /etc/update-motd.d/
# you can do it dynamically here, bypassing any "official" deployment
# methods. Use this only for testing!
# An example script (which should be deployed via your ordinary methods)
# can be found under $script_dir/update-motd.d/67-football-running
download_motd_script="${download_motd_script:-}"
motd_script_dir="${motd_script_dir:-/etc/update-motd.d}"
## motd_file
# This will contain the reported motd message.
# It is created by this plugin.
motd_file="${motd_file:-/var/motd/football.txt}"
## motd_color_on and motd_color_off
# ANSI escape sequences for coloring the generated motd message.
motd_color_on="${motd_color_on:-\\033[31m}"
motd_color_off="${motd_color_off:-\\033[0m}"
PLUGIN football-report
Generic plugin for communication of reports.
## report_cmd_{start,warning,failed,finished}
# External command which is called at start / failure / finish
# of Football.
# The following variables can be used (e.g. as parameters) when
# escaped with a backslash:
# $res = name of the resource (LV, container, etc)
# $primary = the current (old)
# $secondary_list = list of current (old) secondaries
# $target_primary = the target primary name
# $target_secondary = list of target secondaries
# $operation = the operation name
# $target_percent = the value used for shrinking
# $txt = some informative text from Football
# Further variables are possible by looking at the sourcecode, or by
# defining your own variables or functions externally or via plugins.
# Empty = don't do anything
report_cmd_start="${report_cmd_start:-}"
report_cmd_warning="${report_cmd_warning:-$script_dir/screener.sh notify "$res" warning "$txt"}"
report_cmd_failed="${report_cmd_failed:-}"
report_cmd_finished="${report_cmd_finished:-}"
PLUGIN football-waiting
Generic plugig, interfacing with screener: when this is used
by your script and enabled, then you will be able to wait for
"screener.sh continue" operations at certain points in your
script.
## enable_*_waiting
#
# When this is enabled, and when Football had been started by screener,
# then football will delay the start of several operations until a sysadmin
# does one of the following manually:
#
# a) ./screener.sh continue $session
# b) ./screener.sh resume $session
# c) ./screener.sh attach $session and press the RETURN key
# d) doing nothing, and $wait_timeout has exceeded
#
# CONVENTION: football resource names are used as screener session ids.
# This ensures that only 1 operation can be started for the same resource,
# and it simplifies the handling for junior sysadmins.
#
enable_startup_waiting="${enable_startup_waiting:-0}"
enable_handover_waiting="${enable_handover_waiting:-0}"
enable_migrate_waiting="${enable_migrate_waiting:-0}"
enable_shrink_waiting="${enable_shrink_waiting:-0}"
## enable_cleanup_delayed and wait_before_cleanup
# By setting this, you can delay the cleanup operations for some time.
# This way, you are keeping the old LV contents as a kind of "backup"
# for some limited time.
2018-09-26 13:48:23 +00:00
#
# HINT1: dont set wait_before_cleanup to very large values, because it can
# seriously slow down Football.
#
# HINT2: the waiting time starts when the last MARS replica was created.
# Only when the syncing times are _smaller_ than this value,
# an _additional_ delay will be produced.
2018-07-10 12:02:58 +00:00
enable_cleanup_delayed="${enable_cleanup_delayed:-0}"
wait_before_cleanup="${wait_before_cleanup:-180}" # Minutes
## reduce_wait_msg
# Instead of reporting the waiting status once per minute,
# decrease the frequency of resporting.
# Warning: dont increase this too much. Do not exceed
# session_timeout/2 from screener. Because of the Nyquist criterion,
# stay on the safe side by setting session_timeout at least to _twice_
# the time than here.
reduce_wait_msg="${reduce_wait_msg:-60}" # Minutes
2018-04-23 13:25:32 +00:00
\end{verbatim}