mirror of https://github.com/schoebel/mars
synced 2024-12-17 20:24:52 +00:00

doc: appendix --help commands

parent 63be004b9e
commit c059daa99d
574
docu/football-verbose.help
Normal file
@@ -0,0 +1,574 @@
\begin{verbatim}
verbose=1
Usage:
./football.sh --help [--verbose]
Show help
./football.sh --variable=<value>
Override any shell variable

Actions for resource migration:

./football.sh migrate <resource> <target_primary> [<target_secondary>]
Run the sequence
migrate_prepare ; migrate_wait ; migrate_finish ; migrate_cleanup.

./football.sh migrate_prepare <resource> <target_primary> [<target_secondary>]
Allocate LVM space at the targets and start MARS replication.

./football.sh migrate_wait <resource> <target_primary> [<target_secondary>]
Wait until MARS replication reports UpToDate.

./football.sh migrate_finish <resource> <target_primary> [<target_secondary>]
Call hooks for handover to the targets.

./football.sh migrate_cleanup <resource>
Remove old / currently unused LV replicas from MARS and deallocate
from LVM.

Actions for (manual) repair in emergency situations:

./football.sh manual_migrate_config <resource> <target_primary> [<target_secondary>]
Transfer only the cluster config, without changing the MARS replicas.
This does no resource stopping / restarting.
Useful for reverting a failed migration.

./football.sh manual_config_update <hostname>
Only update the cluster config, without changing anything else.
Useful for manual repair of a failed migration.

./football.sh manual_merge_cluster <hostname1> <hostname2>
Run "marsadm merge-cluster" for the given hosts.
Hostnames must be from different (former) clusters.

./football.sh manual_split_cluster <hostname_list>
Run "marsadm split-cluster" at the given hosts.
Useful for fixing failed / asymmetric splits.
Hint: provide _all_ hostnames which have formerly participated
in the cluster.

./football.sh repair_vm <resource> <primary_candidate_list>
Try to restart the VM <resource> on one of the given machines.
Useful during unexpected customer downtime.

./football.sh repair_mars <resource> <primary_candidate_list>
Before restarting the VM like in repair_vm, try to find a local
LV where a stand-alone MARS resource can be found and built up.
Use this only when the MARS resources are gone, and when you are
desperate. Problem: this will likely create a MARS setup which is
not usable for production, and therefore must be corrected later
by hand. Use this only during an emergency situation in order to
get the customers online again, while accepting the downsides of this
command.

Actions for in-place FS shrinking:

./football.sh shrink <resource> <percent>
Run the sequence shrink_prepare ; shrink_finish ; shrink_cleanup.

./football.sh shrink_prepare <resource> [<percent>]
Allocate temporary LVM space (when possible) and create initial
raw FS copy.
Default percent value (when left out) is 85.

./football.sh shrink_finish <resource>
Incrementally update the FS copy, swap old <=> new copy with
small downtime.

./football.sh shrink_cleanup <resource>
Remove old FS copy from LVM.

Actions for in-place FS extension:

./football.sh extend <resource> <percent>

Combined actions:

./football.sh migrate+shrink <resource> <target_primary> [<target_secondary>] [<percent>]
Similar to migrate ; shrink but produces less network traffic.
Default percent value (when left out) is 85.

./football.sh migrate+shrink+back <resource> <tmp_primary> [<percent>]
Migrate temporarily to <tmp_primary>, then shrink there,
finally migrate back to the old primary and secondaries.
Default percent value (when left out) is 85.

Global maintenance:

./football.sh lv_cleanup <resource>

General features:

- Instead of <percent>, an absolute amount of storage with suffix
'k' or 'm' or 'g' can be given.

- When <resource> is currently stopped, login to the container is
not possible, and in turn the hypervisor node and primary storage node
cannot be automatically determined. In such a case, the missing
nodes can be specified via the syntax
<resource>:<hypervisor>:<primary_storage>

- The following LV suffixes are used (naming convention):
-tmp = currently emerging version for shrinking
-preshrink = old version before shrinking took place

- By adding the option --screener, you can hand over Football execution
to ./screener.sh.
When some --enable_*_waiting is also added, then the critical
sections involving customer downtime are temporarily halted until
a sysadmin says "screener.sh continue $resource" or
attaches to the session and presses the RETURN key.

## football_includes
# List of directories where football-*.conf files can be found.
football_includes="${football_includes:-/usr/lib/mars/plugins /etc/mars/plugins $script_dir/plugins $HOME/.mars/plugins ./plugins}"

## dry_run
# When set, actions are only simulated.
dry_run=${dry_run:-0}

## verbose
# Increase verbosity.
verbose=${verbose:-0}

## confirm
# Only for debugging: manually started operations can be
# manually checked and confirmed before actually starting operations.
confirm=${confirm:-1}

## force
# Normally, shrinking and extending will only be started if there
# is something to do.
# Enable this for debugging and testing: the check is then skipped.
force=${force:-0}

## debug_injection_point
# RTFS: don't set this unless you are a developer who knows what you are doing.
debug_injection_point="${debug_injection_point:-0}"

## football_logdir
# Where the logfiles should be created.
# HINT: after playing Football in masses for a while, your $logdir will
# be easily populated with hundreds or thousands of logfiles.
# Set this to your convenience.
football_logdir="${football_logdir:-${logdir:-$HOME/football-logs}}"

## screener
# When enabled, hand over execution to the screener.
# Very useful for running Football in masses.
screener="${screener:-0}"

## min_space
# When testing / debugging with extremely small LVs, it may happen
# that mkfs refuses to create extremely small filesystems.
# Use this to ensure a minimum size.
min_space="${min_space:-20000000}"

## cache_repeat_lapse
# When using the waiting capabilities of screener, and when waits
# last very long, your dentry cache may become cold.
# Use this for repeated refreshes of the dentry cache after some time.
cache_repeat_lapse="${cache_repeat_lapse:-120}" # Minutes

## ssh_opt
# Useful for customization to your ssh environment.
ssh_opt="${ssh_opt:--4 -A -o StrictHostKeyChecking=no -o ForwardX11=no -o KbdInteractiveAuthentication=no -o VerifyHostKeyDNS=no}"

## rsync_opt
# The rsync options in general.
# IMPORTANT: some intermediate progress report is absolutely needed,
# because otherwise a false-positive TIMEOUT may be assumed when
# no output is generated for several hours.
rsync_opt="${rsync_opt:- -aSH --info=progress2,STATS}"

## rsync_opt_prepare
# Additional rsync options for preparation and updating
# of the temporary shrink mirror filesystem.
rsync_opt_prepare="${rsync_opt_prepare:---exclude='.filemon2' --delete}"

## rsync_nice
# Typically, the preparation steps are run with background priority.
rsync_nice="${rsync_nice:-nice -19}"

## rsync_repeat_prepare and rsync_repeat_hot
# Tuning: increases the reliability of rsync and ensures that the dentry cache
# remains hot.
rsync_repeat_prepare="${rsync_repeat_prepare:-5}"
rsync_repeat_hot="${rsync_repeat_hot:-3}"

## wait_timeout
# Avoid infinite loops upon waiting.
wait_timeout="${wait_timeout:-$(( 24 * 60 ))}" # Minutes

## lvremove_opt
# Some LVM versions require this for unattended batch operations.
lvremove_opt="${lvremove_opt:--f}"

## critical_status
# This is the "magic" exit code indicating _criticality_
# of a failed command.
critical_status="${critical_status:-199}"

## serious_status
# This is the "magic" exit code indicating _seriousness_
# of a failed command.
serious_status="${serious_status:-198}"

## pre_hand or --pre-hand=
# Set this to do an ordinary handover to a new start position before doing
# anything else. This may be used for handover to a different datacenter
# and running Football there.
pre_hand="${pre_hand:-}"

## startup_when_locked
# When == 0:
# Don't abort and don't wait when a lock is detected at startup.
# When == 1 and when enable_startup_waiting=1:
# Wait until the lock is gone.
# When == 2:
# Abort start of script execution when a lock is detected.
# Later, when locks are set _during_ execution, they will
# be obeyed when enable_*_waiting is set (instead), and will
# lead to waits instead of aborts.
startup_when_locked="${startup_when_locked:-1}"

## user_name
# Normally automatically derived from ssh agent or from $LOGNAME.
# Please override this only when really necessary.
export user_name="${user_name:-$(get_real_ssh_user)}"
export user_name="${user_name:-$LOGNAME}"


PLUGIN football-cm3

1&1 specific plugin for dealing with the cm3 cluster manager
and its concrete operating environment (singleton instance).

Current maximum cluster size limit:

Maximum #syncs running before migration can start:

Following marsadm --version must be installed:

Following mars kernel modules must be loaded:

## enable_cm3
# ShaHoLin-specific plugin for working with the infong platform
# (istore, icpu, infong) via 1&1-specific clustermanager cm3
# and related toolsets. Much of it is bound to a singleton database
# instance (clustermw & siblings).
enable_cm3="${enable_cm3:-$(if [[ "$0" =~ tetris ]]; then echo 1; else echo 0; fi)}"

## skip_resource_ping
# Enable this only for testing. Normally, a resource name denotes a
# container name == machine name which must be running as a precondition,
# and thus must be pingable over network.
skip_resource_ping="${skip_resource_ping:-0}"

## date_lock
# Don't enter critical sections on certain days of the week,
# and/or during certain hours.
# This is a regex matching against "date +%u_%H"
date_lock="${date_lock:-}"

## workaround_firewall
# Documentation of technical debt for later generations:
# This is needed since July 2017. In the many years before, no firewalling
# was effective at the replication network, because it is a physically
# separate network from the rest of the networking infrastructure.
# An attacker would first need to gain root access to the _hypervisor_
# (not only to the LXC container and/or to KVM) before gaining access to
# those physical replication network interfaces.
# Since about that time, which is about the same time when the requirements
# for Container Football had been communicated, somebody introduced some
# unnecessary firewall rules, based on "security arguments".
# These arguments were however explicitly _not_ required by the _real_
# person responsible for security, and explicitly _not_ recommended by him.
# Now the problem is that it is almost politically impossible to get
# rid of such a "security feature".
# Until the problem is resolved, Container Football requires
# the _entire_ local firewall to be _temporarily_ shut down in order to
# allow marsadm commands over ssh to work.
# Notice: this does _not_ increase the general security in any way.
# LONGTERM solution / TODO: future versions of mars should no longer
# depend on ssh.
# Then this "feature" can be turned off.
workaround_firewall="${workaround_firewall:-1}"

## ip_magic
# Similarly to workaround_firewall, this is needed since somebody
# introduced additional firewall rules also disabling sysadmin ssh
# connections at the _ordinary_ sysadmin network.
ip_magic="${ip_magic:-1}"

## do_split_cluster
# The current MARS branch 0.1a.y is not yet designed for forming
# a BigCluster consisting of several thousands of machines.
# When a future version of mars0.1b.y (or 0.2.y) allows this,
# this can be disabled.
do_split_cluster="${do_split_cluster:-1}"

## clustertool_host
# URL prefix of the internal configuration database REST interface.
clustertool_host="${clustertool_host:-http://clustermw:3042}"

## clustertool_user
# Username for clustertool access.
# By default, scans for a *.password file (see next option).
clustertool_user="${clustertool_user:-$(shopt -u nullglob; ls *.password | head -1 | cut -d. -f1)}" || echo "cannot find a password file *.password for clustermw: you MUST supply the credentials via default curl config files (see man page)"

## clustertool_passwd
# Here you can supply the encrypted password.
# By default, a file $clustertool_user.password is used
# containing the encrypted password.
clustertool_passwd="${clustertool_passwd:-$([[ -r $clustertool_user.password ]] && cat $clustertool_user.password)}"

## do_migrate
# Keep this enabled. Only disable for testing.
do_migrate="${do_migrate:-1}" # must be enabled; disable for dry-run testing

## always_migrate
# Only use for testing, or for special situations.
# This skips the check whether the resource has already been migrated.
always_migrate="${always_migrate:-0}" # only enable for testing

## check_segments
# 0 = disabled
# 1 = only display the segment names
# 2 = check for equality
# WORKAROUND, potentially harmful when used inadequately.
# The historical physical segment borders need to be removed for
# Container Football.
# Unfortunately, the subproject aiming to accomplish this has not
# progressed for one year now. In the meantime, Container Football can
# only be played within the ancient segment borders.
# After this big impediment is eventually resolved, this option
# should be switched off.
check_segments="${check_segments:-1}"

## backup_dir
# Directory for keeping JSON backups of clustermw.
backup_dir="${backup_dir:-.}"

## enable_mod_deflate
# Internal, for support.
enable_mod_deflate="${enable_mod_deflate:-1}"

## enable_segment_move
# Seems to be needed by some other tooling.
enable_segment_move="${enable_segment_move:-1}"

## override_hwclass_id
# When necessary, override this from $include_dir/plugins/*.conf
override_hwclass_id="${override_hwclass_id:-25007}"

## override_hvt_id
# When necessary, override this from $include_dir/plugins/*.conf
override_hvt_id="${override_hvt_id:-8059}"

## iqn_base and iet_type and iscsi_eth and iscsi_tid
# Workaround: this is needed for _dynamic_ generation of iSCSI sessions
# bypassing the ordinary ones as automatically generated by the
# cm3 cluster manager (only at the old istore architecture).
# Notice: not needed for regular operations, only for testing.
# Normally, you don't want to shrink over a _shared_ 1MBit iSCSI line.
iqn_base="${iqn_base:-iqn.2000-01.info.test:test}"
iet_type="${iet_type:-blockio}"
iscsi_eth="${iscsi_eth:-eth1}"
iscsi_tid="${iscsi_tid:-4711}"

## monitis_downtime_script
# ShaHoLin-internal
monitis_downtime_script="${monitis_downtime_script:-}"

## monitis_downtime_duration
# ShaHoLin-internal
monitis_downtime_duration="${monitis_downtime_duration:-20}" # Minutes

## shaholin_finished_log
# ShaHoLin-specific logfile, reporting _only_ successful completion
# of an action.
shaholin_finished_log="${shaholin_finished_log:-$football_logdir/shaholin-finished.log}"

## ticket
# OPTIONAL: the meaning is ShaHoLin specific.
# This can be used for updating JIRA tickets.
# Can be set on the command line like "./tetris.sh $args --ticket=TECCM-4711"
ticket="${ticket:-}"

## ticket_get_cmd
# Optional: when set, this script can be used for retrieving ticket IDs
# in place of the command-line option --ticket=
ticket_get_cmd="${ticket_get_cmd:-}"

## ticket_update_cmd
# This can be used for calling an external command which updates
# the ticket(s) given by the $ticket parameter.
ticket_update_cmd="${ticket_update_cmd:-}"

## shaholin_action
# OPTIONAL: specific action script with parameters.
shaholin_action="${shaholin_action:-}"


PLUGIN football-basic

Generic driver for systemd-controlled MARS pools.
The current version supports only a flat model:
(1) There is a single "big cluster" at metadata level.
All cluster members are joined via merge-cluster.
All occurring names need to be globally unique.
(2) The network uses BGP or other means, thus any hypervisor
can (potentially) start any VM at any time.
(3) iSCSI or remote devices are not supported for now
(LocalSharding model). This may be extended in a future
release.
This plugin is exclusive-or with cm3.

Plugin-specific actions:

./football.sh basic_add_host <hostname>
Manually add another host to the hostname cache.

## pool_cache_dir
# Directory for caching the pool status.
pool_cache_dir="${pool_cache_dir:-$script_dir/pool-cache}"

## initial_hostname_file
# This file must contain a list of storage and/or hypervisor hostnames
# where a /mars directory must exist.
# These hosts are then scanned for further cluster members,
# and the transitive closure of all host names is computed.
initial_hostname_file="${initial_hostname_file:-./hostnames.input}"

## hostname_cache
# This file contains the transitive closure of all host names.
hostname_cache="${hostname_cache:-$pool_cache_dir/hostnames.cache}"

## resources_cache
# This file contains the transitive closure of all resource names.
resources_cache="${resources_cache:-$pool_cache_dir/resources.cache}"

## res2hyper_cache
# This file contains the association between resources and hypervisors.
res2hyper_cache="${res2hyper_cache:-$pool_cache_dir/res2hyper.assoc}"

## enable_basic
# This plugin is exclusive-or with cm3.
enable_basic="${enable_basic:-$(if [[ "$0" =~ football ]]; then echo 1; else echo 0; fi)}"

## ssh_port
# Set this for separating sysadmin access from customer access.
ssh_port="${ssh_port:-}"

## basic_mnt_dir
# Names the mountpoint directory at hypervisors.
# This must coincide with the systemd mountpoints.
basic_mnt_dir="${basic_mnt_dir:-/mnt}"


PLUGIN football-motd

Generic plugin for motd: at login, communicate via motd that
Football is running.

## enable_motd
# Whether to use the motd plugin.
enable_motd="${enable_motd:-0}"

## update_motd_cmd
# Distro-specific command for generating motd from several sources.
# Only tested for Debian Jessie at the moment.
update_motd_cmd="${update_motd_cmd:-update-motd}"

## download_motd_script and motd_script_dir
# When no script has been installed into /etc/update-motd.d/
# you can do it dynamically here, bypassing any "official" deployment
# methods. Use this only for testing!
# An example script (which should be deployed via your ordinary methods)
# can be found under $script_dir/update-motd.d/67-football-running
download_motd_script="${download_motd_script:-}"
motd_script_dir="${motd_script_dir:-/etc/update-motd.d}"

## motd_file
# This will contain the reported motd message.
# It is created by this plugin.
motd_file="${motd_file:-/var/motd/football.txt}"

## motd_color_on and motd_color_off
# ANSI escape sequences for coloring the generated motd message.
motd_color_on="${motd_color_on:-\\033[31m}"
motd_color_off="${motd_color_off:-\\033[0m}"


PLUGIN football-report

Generic plugin for communication of reports.

## report_cmd_{start,warning,failed,finished}
# External command which is called at start / warning / failure / finish
# of Football.
# The following variables can be used (e.g. as parameters) when
# escaped with a backslash:
# $res = name of the resource (LV, container, etc.)
# $primary = the current (old) primary
# $secondary_list = list of current (old) secondaries
# $target_primary = the target primary name
# $target_secondary = list of target secondaries
# $operation = the operation name
# $target_percent = the value used for shrinking
# $txt = some informative text from Football
# Further variables are possible by looking at the sourcecode, or by
# defining your own variables or functions externally or via plugins.
# Empty = don't do anything
report_cmd_start="${report_cmd_start:-}"
report_cmd_warning="${report_cmd_warning:-$script_dir/screener.sh notify "$res" warning "$txt"}"
report_cmd_failed="${report_cmd_failed:-}"
report_cmd_finished="${report_cmd_finished:-}"


PLUGIN football-waiting

Generic plugin, interfacing with screener: when this is used
by your script and enabled, then you will be able to wait for
"screener.sh continue" operations at certain points in your
script.

## enable_*_waiting
#
# When this is enabled, and when Football has been started by screener,
# then Football will delay the start of several operations until a sysadmin
# does one of the following manually:
#
# a) ./screener.sh continue $session
# b) ./screener.sh resume $session
# c) ./screener.sh attach $session and press the RETURN key
# d) do nothing, until $wait_timeout is exceeded
#
# CONVENTION: football resource names are used as screener session ids.
# This ensures that only 1 operation can be started for the same resource,
# and it simplifies the handling for junior sysadmins.
#
enable_startup_waiting="${enable_startup_waiting:-0}"
enable_handover_waiting="${enable_handover_waiting:-0}"
enable_migrate_waiting="${enable_migrate_waiting:-0}"
enable_shrink_waiting="${enable_shrink_waiting:-0}"

## enable_cleanup_delayed and wait_before_cleanup
# By setting this, you can delay the cleanup operations for some time.
# This way, you are keeping the old LV contents as a kind of "backup"
# for some limited time.
# HINT: don't set wait_before_cleanup to large values, because it can
# seriously slow down Football.
enable_cleanup_delayed="${enable_cleanup_delayed:-0}"
wait_before_cleanup="${wait_before_cleanup:-180}" # Minutes

## reduce_wait_msg
# Instead of reporting the waiting status once per minute,
# decrease the frequency of reporting.
# Warning: don't increase this too much. Do not exceed
# session_timeout/2 from screener. Because of the Nyquist criterion,
# stay on the safe side by setting session_timeout to at least _twice_
# the time given here.
reduce_wait_msg="${reduce_wait_msg:-60}" # Minutes

\end{verbatim}
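The date_lock option above is described as a regex matched against the output of `date +%u_%H`, where `%u` is the weekday (1 = Monday ... 7 = Sunday) and `%H` the hour (00..23), giving stamps such as `6_03`. A minimal bash sketch of such a check follows, assuming plain extended-regex matching; the function name `is_date_locked` and the example regex are hypothetical and not taken from football.sh.

```shell
#!/bin/bash
# Hypothetical helper (not part of football.sh): decide whether a
# date_lock regex forbids entering a critical section right now.
# The timestamp format follows the documented "date +%u_%H",
# e.g. "6_03" for Saturday, 03:xx.
is_date_locked()
{
    local lock_regex="$1"
    local stamp="${2:-$(date +%u_%H)}"  # optional 2nd arg eases testing

    # An empty date_lock (the default) never locks.
    [[ -n "$lock_regex" ]] || return 1
    # Unquoted right-hand side: bash treats it as an extended regex.
    [[ "$stamp" =~ $lock_regex ]]
}

# Example policy (made up): no critical sections on weekends (6, 7)
# or between 08:00 and 17:59 on any day.
date_lock='^[67]_|_(0[89]|1[0-7])$'

if is_date_locked "$date_lock"; then
    echo "date_lock matches: postponing critical section"
else
    echo "date_lock does not match: critical section may start"
fi
```

Keeping the regex in a variable avoids bash quoting pitfalls with `=~`, where a quoted right-hand side would be matched as a literal string.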
173
docu/football.help
Normal file
@@ -0,0 +1,173 @@
\begin{verbatim}
Usage:
./football.sh --help [--verbose]
Show help
./football.sh --variable=<value>
Override any shell variable

Actions for resource migration:

./football.sh migrate <resource> <target_primary> [<target_secondary>]
Run the sequence
migrate_prepare ; migrate_wait ; migrate_finish ; migrate_cleanup.

./football.sh migrate_prepare <resource> <target_primary> [<target_secondary>]
Allocate LVM space at the targets and start MARS replication.

./football.sh migrate_wait <resource> <target_primary> [<target_secondary>]
Wait until MARS replication reports UpToDate.

./football.sh migrate_finish <resource> <target_primary> [<target_secondary>]
Call hooks for handover to the targets.

./football.sh migrate_cleanup <resource>
Remove old / currently unused LV replicas from MARS and deallocate
from LVM.

Actions for (manual) repair in emergency situations:

./football.sh manual_migrate_config <resource> <target_primary> [<target_secondary>]
Transfer only the cluster config, without changing the MARS replicas.
This does no resource stopping / restarting.
Useful for reverting a failed migration.

./football.sh manual_config_update <hostname>
Only update the cluster config, without changing anything else.
Useful for manual repair of a failed migration.

./football.sh manual_merge_cluster <hostname1> <hostname2>
Run "marsadm merge-cluster" for the given hosts.
Hostnames must be from different (former) clusters.

./football.sh manual_split_cluster <hostname_list>
Run "marsadm split-cluster" at the given hosts.
Useful for fixing failed / asymmetric splits.
Hint: provide _all_ hostnames which have formerly participated
in the cluster.

./football.sh repair_vm <resource> <primary_candidate_list>
Try to restart the VM <resource> on one of the given machines.
Useful during unexpected customer downtime.

./football.sh repair_mars <resource> <primary_candidate_list>
Before restarting the VM like in repair_vm, try to find a local
LV where a stand-alone MARS resource can be found and built up.
Use this only when the MARS resources are gone, and when you are
desperate. Problem: this will likely create a MARS setup which is
not usable for production, and therefore must be corrected later
by hand. Use this only during an emergency situation in order to
get the customers online again, while accepting the downsides of this
command.

Actions for in-place FS shrinking:

./football.sh shrink <resource> <percent>
Run the sequence shrink_prepare ; shrink_finish ; shrink_cleanup.

./football.sh shrink_prepare <resource> [<percent>]
Allocate temporary LVM space (when possible) and create initial
raw FS copy.
Default percent value (when left out) is 85.

./football.sh shrink_finish <resource>
Incrementally update the FS copy, swap old <=> new copy with
small downtime.

./football.sh shrink_cleanup <resource>
Remove old FS copy from LVM.

Actions for in-place FS extension:

./football.sh extend <resource> <percent>

Combined actions:

./football.sh migrate+shrink <resource> <target_primary> [<target_secondary>] [<percent>]
Similar to migrate ; shrink but produces less network traffic.
Default percent value (when left out) is 85.

./football.sh migrate+shrink+back <resource> <tmp_primary> [<percent>]
Migrate temporarily to <tmp_primary>, then shrink there,
finally migrate back to the old primary and secondaries.
Default percent value (when left out) is 85.

Global maintenance:

./football.sh lv_cleanup <resource>

General features:

- Instead of <percent>, an absolute amount of storage with suffix
'k' or 'm' or 'g' can be given.

- When <resource> is currently stopped, login to the container is
not possible, and in turn the hypervisor node and primary storage node
cannot be automatically determined. In such a case, the missing
nodes can be specified via the syntax
<resource>:<hypervisor>:<primary_storage>

- The following LV suffixes are used (naming convention):
-tmp = currently emerging version for shrinking
-preshrink = old version before shrinking took place

- By adding the option --screener, you can hand over Football execution
to ./screener.sh.
When some --enable_*_waiting is also added, then the critical
sections involving customer downtime are temporarily halted until
a sysadmin says "screener.sh continue $resource" or
attaches to the session and presses the RETURN key.


PLUGIN football-cm3

1&1 specific plugin for dealing with the cm3 cluster manager
and its concrete operating environment (singleton instance).

Current maximum cluster size limit:

Maximum #syncs running before migration can start:

Following marsadm --version must be installed:

Following mars kernel modules must be loaded:


PLUGIN football-basic

Generic driver for systemd-controlled MARS pools.
The current version supports only a flat model:
(1) There is a single "big cluster" at metadata level.
All cluster members are joined via merge-cluster.
All occurring names need to be globally unique.
(2) The network uses BGP or other means, thus any hypervisor
can (potentially) start any VM at any time.
(3) iSCSI or remote devices are not supported for now
(LocalSharding model). This may be extended in a future
release.
This plugin is exclusive-or with cm3.

Plugin-specific actions:

./football.sh basic_add_host <hostname>
Manually add another host to the hostname cache.


PLUGIN football-motd

Generic plugin for motd: at login, communicate via motd that
Football is running.


PLUGIN football-report

Generic plugin for communication of reports.


PLUGIN football-waiting

Generic plugin, interfacing with screener: when this is used
by your script and enabled, then you will be able to wait for
"screener.sh continue" operations at certain points in your
script.

\end{verbatim}
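The football-waiting plugin relies on the flagfile handshake that the screener help in this commit describes. At a wait point the script creates an empty `running/$screen_id.waiting` flagfile in screener's log directory. It then polls until `screener.sh continue` removes that file. The following is a minimal sketch of the script side under those assumptions. The helper name `wait_for_continue` and its parameters are hypothetical; this is not the actual `call_hook start_wait` implementation in football.sh.

```shell
#!/bin/bash
# Minimal sketch of the script side of screener's "continue" handshake.
# Hypothetical helper, not taken from football.sh or screener.sh:
#   $1 = screener log directory (flagfiles live in its running/ subdirectory)
#   $2 = screen id of the current session
#   $3 = polling interval in seconds (default: 5)
wait_for_continue()
{
    local logdir="$1"
    local screen_id="$2"
    local poll_interval="${3:-5}"
    local flagfile="$logdir/running/$screen_id.waiting"

    # Announce the wait point by creating an empty flagfile.
    mkdir -p "$logdir/running" || return 1
    : > "$flagfile" || return 1
    echo "waiting for: ./screener.sh continue $screen_id"

    # "./screener.sh continue" simply removes the flagfile; poll until it is gone.
    while [ -e "$flagfile" ]; do
        sleep "$poll_interval"
    done
    echo "continuing $screen_id"
}
```

In a script started via `./screener.sh start myres ...`, a call such as `wait_for_continue "$logdir" myres` would block until an operator runs `./screener.sh continue myres`. Polling keeps the script side trivial; the cost is up to one polling interval of extra latency.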
18
docu/make-help.sh
Executable file
@@ -0,0 +1,18 @@
#!/bin/bash

football_dir="${football_dir:-../football}"

function make_latex_include
{
	local cmd="$1"

	echo '\begin{verbatim}'
	eval "$cmd" | sed 's/\\/\\\\/g'
	echo '\end{verbatim}'
}

make_latex_include "../userspace/marsadm --help" > marsadm.help
make_latex_include "(cd $football_dir/ && ./football.sh --help)" > football.help
make_latex_include "(cd $football_dir/ && ./football.sh --help --verbose)" > football-verbose.help
make_latex_include "(cd $football_dir/ && ./screener.sh --help)" > screener.help
make_latex_include "(cd $football_dir/ && ./screener.sh --help --verbose)" > screener-verbose.help
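`make_latex_include` evaluates a command string and wraps its output in a `verbatim` environment. Along the way it doubles every backslash with `sed`. The sketch below shows that behaviour without needing marsadm, football.sh or screener.sh. It copies the function verbatim and feeds it a harmless `printf` command, which is purely illustrative.

```shell
#!/bin/bash
# Verbatim copy of make_latex_include from docu/make-help.sh.
function make_latex_include
{
	local cmd="$1"

	echo '\begin{verbatim}'
	eval "$cmd" | sed 's/\\/\\\\/g'
	echo '\end{verbatim}'
}

# Harmless stand-in for a real --help command; the text contains one backslash.
make_latex_include "printf '%s\n' 'usage: foo \\bar'"
# Prints:
# \begin{verbatim}
# usage: foo \\bar
# \end{verbatim}
```

LaTeX's `verbatim` environment prints its contents literally. Any backslash in a tool's help output would therefore appear doubled in the typeset manual. The source does not say whether this is intended; none of the help texts in this commit seem to contain a backslash.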
@@ -39413,6 +39413,13 @@ maximum 100 logfiles per resource

\begin_layout Chapter
Handout for Midnight Problem Solving
\begin_inset CommandInset label
LatexCommand label
name "chap:Handout-for-Midnight"

\end_inset


\end_layout

\begin_layout Standard
@@ -42012,6 +42019,162 @@ A_{s,p,T}(k,n)=n^{s+1}*T*\sum_{\bar{k}=k}^{k*n}C(k,\bar{k},k*n)*\binom{k*n}{\bar
\end_inset


\end_layout

\begin_layout Chapter
Command Documentation for Userspace Tools
\begin_inset CommandInset label
LatexCommand label
name "chap:Command-Documentation-for"

\end_inset


\end_layout

\begin_layout Section

\family typewriter
marsadm --help
\begin_inset CommandInset label
LatexCommand label
name "sec:marsadm-–help"

\end_inset


\end_layout

\begin_layout Standard
\begin_inset ERT
status open

\begin_layout Plain Layout


\backslash
input{marsadm.help}
\end_layout

\end_inset


\end_layout

\begin_layout Section

\family typewriter
football.sh --help
\begin_inset CommandInset label
LatexCommand label
name "sec:football-–help"

\end_inset


\end_layout

\begin_layout Standard
\begin_inset ERT
status open

\begin_layout Plain Layout


\backslash
input{football.help}
\end_layout

\end_inset


\end_layout

\begin_layout Section

\family typewriter
football.sh --help --verbose
\begin_inset CommandInset label
LatexCommand label
name "sec:football-help-verbose"

\end_inset


\end_layout

\begin_layout Standard
\begin_inset ERT
status open

\begin_layout Plain Layout


\backslash
input{football-verbose.help}
\end_layout

\end_inset


\end_layout

\begin_layout Section

\family typewriter
screener.sh --help
\begin_inset CommandInset label
LatexCommand label
name "sec:screener–help"

\end_inset


\end_layout

\begin_layout Standard
\begin_inset ERT
status open

\begin_layout Plain Layout


\backslash
input{screener.help}
\end_layout

\end_inset


\end_layout

\begin_layout Section

\family typewriter
screener.sh --help --verbose
\begin_inset CommandInset label
LatexCommand label
name "sec:screener-help-verbose"

\end_inset


\end_layout

\begin_layout Standard
\begin_inset ERT
status open

\begin_layout Plain Layout


\backslash
input{screener-verbose.help}
\end_layout

\end_inset


\end_layout

\begin_layout Chapter
570
docu/marsadm.help
Normal file
@@ -0,0 +1,570 @@
\begin{verbatim}

Thorough documentation is in mars-manual.pdf. Please use the PDF manual
as authoritative reference! Here is only a short summary of the most
important sub-commands / options:

marsadm [<global_options>] <command> [<resource_name> | all | <args> ]
marsadm [<global_options>] view[-<macroname>] [<resource_name> | all ]

<global_option> =
--force
Skip safety checks.
Use this only when you really know what you are doing!
Warning! This is dangerous! First try --dry-run.
Not combinable with 'all'.
--dry-run
Don't modify the symlink tree, but tell what would be done.
Use this before starting potentially harmful actions such as
'delete-resource'.
--verbose
Increase the verbosity of some commands.
--logger=/path/to/usr/bin/logger
Use an alternative syslog messenger.
When empty, disable syslogging.
--max-deletions=<number>
When your network or your firewall rules are defective over a
longer time, too many deletion links may accumulate at
/mars/todo-global/delete-* and sibling locations.
This limit prevents overflow of the filesystem as well
as overloading of the worker threads.
--thresh-logfiles=<number>
--thresh-logsize=<number>
Prevention of too many small logfiles when secondaries are not
catching up. When more than thresh-logfiles are already present,
the next one is only created when the last one has at least
size thresh-logsize (in units of GB).
--timeout=<seconds>
Abort safety checks after timeout with an error.
When giving 'all' as resource argument, this works for each
resource independently.
--window=<seconds>
Treat other cluster nodes as healthy when some communication has
occurred during the given time window.
--threshold=<bytes>
Some macros like 'fetch-threshold-reached' use this for determining
their sloppiness.
--host=<hostname>
Act as if the command was running on cluster node <hostname>.
Warning! This is dangerous! First try --dry-run.
--backup-dir=</absolute_path>
Only for experts.
Used by several special commands like merge-cluster, split-cluster
etc. for creating backups of important data.
--ip=<ip>
Override the IP address stored in the symlink tree, as well as
the default IP determined from the list of network interfaces.
Usually you will need this only at 'create-cluster' or
'join-cluster' for resolving ambiguities.
--ssh-port=<port_nr>
Override the default ssh port (22) for ssh and rsync.
Useful for running {join,merge}-cluster on non-standard ssh ports.
--ssh-opts="<ssh_commandline_options>"
Override the default ssh commandline options. Also used for rsync.
--macro=<text>
Handy for testing short macro evaluations at the command line.

<command> =
attach
usage: attach <resource_name>
Attaches the local disk (backing block device) to the resource.
The disk must have been previously configured at
{create,join}-resource.
When designated as a primary, /dev/mars/$res will also appear.
This does not change the state of {fetch,replay}.
For a complete local startup of the resource, use 'marsadm up'.

cat
usage: cat <path>
Print internal debug output in human-readable form.
Numerical timestamps and numerical error codes are replaced
by more readable means.
Example: marsadm cat /mars/5.total.status

connect
usage: connect <resource_name>
See resume-fetch-local.

connect-global
usage: connect-global <resource_name>
Like resume-fetch-local, but affects all resource members
in the cluster (remotely).

connect-local
usage: connect-local <resource_name>
See resume-fetch-local.

create-cluster
usage: create-cluster (no parameters)
This must be called exactly once when creating a new cluster.
Don't call this again! Use join-cluster on the secondary nodes.
Please read the PDF manual for details.

create-resource
usage: create-resource <resource_name> </dev/lv/mydata>
(further syntax variants are described in the PDF manual).
Create a new resource out of a pre-existing disk (backing
block device) /dev/lv/mydata (or similar).
The current node will start in primary role, thus
/dev/mars/<resource_name> will appear after a short time, initially
showing the same contents as the underlying disk /dev/lv/mydata.
It is good practice to use identical names for the resource
<resource_name> and the disk.

cron
usage: cron (no parameters)
Do all necessary regular housekeeping tasks.
This is equivalent to log-rotate all; sleep 5; log-delete-all all.

delete-resource
usage: delete-resource <resource_name>
CAUTION! This is dangerous when the network is somehow
interrupted, or when damaged nodes are later resurrected
in any way.

Precondition: the resource must no longer have any members
(see leave-resource).
This is only needed when you _insist_ on re-using a damaged
resource for re-creating a new one with exactly the same
old <resource_name>.
HINT: best practice is to not use this, but just create a _new_
resource with a new <resource_name> out of your local disks.
Please read the PDF manual on potential consequences.

detach
usage: detach <resource_name>
Detaches the local disk (backing block device) from the
MARS resource.
Caution! You may read data from the local disk afterwards,
but ensure that no data is written to it!
Otherwise, you are likely to produce harmful inconsistencies.
When running in primary role, /dev/mars/$res will also disappear.
This does not change the state of {fetch,replay}.
For a complete local shutdown of the resource, use 'marsadm down'.

disconnect
usage: disconnect <resource_name>
See pause-fetch-local.

disconnect-global
usage: disconnect-global <resource_name>
Like pause-fetch-local, but affects all resource members
in the cluster (remotely).

disconnect-local
usage: disconnect-local <resource_name>
See pause-fetch-local.

down
usage: down <resource_name>
Shortcut for detach + pause-sync + pause-fetch + pause-replay.

get-emergency-limit
usage: get-emergency-limit <resource_name>
Counterpart of set-emergency-limit (per-resource emergency limit).

get-sync-limit-value
usage: get-sync-limit-value (no parameters)
For retrieval of the value set by set-sync-limit-value.

get-systemd-unit
usage: get-systemd-unit <resource_name>
Show the systemd units (for start and stop), or empty when unset.

invalidate
usage: invalidate <resource_name>
Only useful on a secondary node.
Forces MARS to consider the local replica disk as being
inconsistent, and therefore to start a fast full-sync from
the currently designated primary node (which must exist;
therefore avoid the 'secondary' command).
This is usually needed for resolving emergency mode.
When having k=2 replicas, this can also be used for
quick-and-simple split-brain resolution.
In other cases, or when the split-brain is not resolved by
this command, please use the 'leave-resource' / 'join-resource'
method as described in the PDF manual (in the right order as
described there).

join-cluster
usage: join-cluster <hostname_of_primary>
Establishes a new cluster membership.
This must be called once on any new cluster member.
This is a prerequisite for join-resource.

join-resource
usage: join-resource <resource_name> </dev/lv/mydata>
(further syntax variants are described in the PDF manual).
The resource <resource_name> must already have been created on
another cluster node, and the network must be healthy.
The contents of the local replica disk /dev/lv/mydata will be
overwritten by the initial fast full sync from the currently
designated primary node.
After the initial full sync has finished, the current host will
act in secondary role.
For details on size constraints etc., refer to the PDF manual.

leave-cluster
usage: leave-cluster (no parameters)
This can be used for final deconstruction of a cluster member.
Prior to this, all resources must have been left
via leave-resource.
Notice: this will never destroy the cluster UID on the /mars/
filesystem.
Please read the PDF manual for details.

leave-resource
usage: leave-resource <resource_name>
Precondition: the local host must be in secondary role.
Stop being a member of the resource, and thus stop all
replication activities. The status of the underlying disk
will remain in its current state (whatever it is).

log-delete
usage: log-delete <resource_name>
When possible, globally delete all old transaction logfiles which
are known to be superfluous, i.e. all secondaries no longer need
to replay them.
This must be regularly called by a cron job or similar, in order
to prevent overflow of the /mars/ directory.
For regular maintenance cron jobs, please prefer 'marsadm cron'.
For details and best practices, please refer to the PDF manual.

log-delete-all
usage: log-delete-all <resource_name>
Alias for log-delete.

log-delete-one
usage: log-delete-one <resource_name>
When possible, globally delete at most one old transaction logfile
which is known to be superfluous, i.e. all secondaries no longer
need to replay it.
Hint: use this only for testing and manual inspection.
For regular maintenance cron jobs, please prefer cron
or log-delete-all.

log-purge-all
usage: log-purge-all <resource_name>
This is potentially dangerous.
Use this only if you are really desperate in trying to resolve a
split brain. Use this only after reading the PDF manual!

log-rotate
usage: log-rotate <resource_name>
Only useful at the primary side.
Start writing transaction logs into a new transaction logfile.
This should be regularly called by a cron job or similar.
For regular maintenance cron jobs, please prefer 'marsadm cron'.
For details and best practices, please refer to the PDF manual.

lowlevel-delete-host
usage: lowlevel-delete-host <resource_name>
Delete cluster member.

lowlevel-ls-host-ips
usage: lowlevel-ls-host-ips <resource_name>
List cluster member names and IP addresses.

lowlevel-set-host-ip
usage: lowlevel-set-host-ip <resource_name>
Set IP for host.

merge-cluster
usage: merge-cluster <hostname_of_other_cluster>
Precondition: the resource names of both clusters must be disjoint.
Create the union of two clusters, consisting of the
union of all machines, and the union of all resources.
The members of each resource are _not_ changed by this.
This is useful for creating a big "virtual LVM cluster" where
resources can be almost arbitrarily migrated between machines via
later join-resource / leave-resource operations.

merge-cluster-check
usage: merge-cluster-check <hostname_of_other_cluster>
Check whether the resources of both clusters are disjoint.
Useful for checking in advance whether merge-cluster would be
possible.

merge-cluster-list
usage: merge-cluster-list
Determine the local list of resources.
Useful for checking or analysis of merge-cluster disjointness by hand.

pause-fetch
usage: pause-fetch <resource_name>
See pause-fetch-local.

pause-fetch-global
usage: pause-fetch-global <resource_name>
Like pause-fetch-local, but affects all resource members
in the cluster (remotely).

pause-fetch-local
usage: pause-fetch-local <resource_name>
Stop fetching transaction logfiles from the current
designated primary.
This is independent from any {pause,resume}-replay operations.
Only useful on a secondary node.

pause-replay
usage: pause-replay <resource_name>
See pause-replay-local.

pause-replay-global
usage: pause-replay-global <resource_name>
Like pause-replay-local, but affects all resource members
in the cluster (remotely).

pause-replay-local
usage: pause-replay-local <resource_name>
Stop replaying transaction logfiles for now.
This is independent from any {pause,resume}-fetch operations.
This may be used for freezing the state of your replica for some
time, if you have enough space on /mars/.
Only useful on a secondary node.

pause-sync
usage: pause-sync <resource_name>
See pause-sync-local.

pause-sync-global
usage: pause-sync-global <resource_name>
Like pause-sync-local, but affects all resource members
in the cluster (remotely).

pause-sync-local
usage: pause-sync-local <resource_name>
Pause the initial data sync at current stage.
This has only an effect if a sync is actually running (i.e.
there is something to be actually synced).
Don't pause too long, because the local replica will remain
inconsistent during the pause.
Use this only for limited reduction of system load.
Only useful on a secondary node.

primary
usage: primary <resource_name>
Promote the resource into primary role.
This is necessary for /dev/mars/$res to appear on the local host.
Notice: by concept there can be only _one_ designated primary
in a cluster at the same time.
The role change is automatically distributed to the other nodes
in the cluster, provided that the network is healthy.
The old primary node will _automatically_ go
into secondary role first. This is different from DRBD!
With MARS, you don't need an intermediate 'secondary' command
for switching roles.
It is usually better to _directly_ switch the primary roles
between both hosts.
When --force is not given, a planned handover is started:
the local host will only become actually primary _after_ the
old primary is gone, and all old transaction logs have been
fetched and replayed at the new designated primary.
When --force is given, no handover is attempted. As a consequence,
a split brain situation is likely to emerge.
Thus, use --force only after an ordinary handover attempt has
failed, and when you don't care about the split brain.
For more details, please refer to the PDF manual.

resize
usage: resize <resource_name>
Prerequisite: all underlying disks (usually /dev/vg/$res) must
have already been increased, e.g. at the LVM layer (cf. lvresize).
Causes MARS to re-examine all sizing constraints on all members of
the resource, and increase the global logical size of the resource
accordingly.
Shrinking is not yet implemented.
When successful, /dev/mars/$res at the primary will be increased
in size. In addition, all secondaries will start an incremental
fast full-sync to get the enlarged parts from the primary.

resume-fetch
usage: resume-fetch <resource_name>
See resume-fetch-local.

resume-fetch-global
usage: resume-fetch-global <resource_name>
Like resume-fetch-local, but affects all resource members
in the cluster (remotely).

resume-fetch-local
usage: resume-fetch-local <resource_name>
Start fetching transaction logfiles from the current
designated primary node, if there is one.
This is independent from any {pause,resume}-replay operations.
Only useful on a secondary node.

resume-replay
usage: resume-replay <resource_name>
See resume-replay-local.

resume-replay-global
usage: resume-replay-global <resource_name>
Like resume-replay-local, but affects all resource members
in the cluster (remotely).

resume-replay-local
usage: resume-replay-local <resource_name>
Restart replaying transaction logfiles, when there is some
data left.
This is independent from any {pause,resume}-fetch operations.
This should be used for unfreezing the state of your local replica.
Only useful on a secondary node.

resume-sync
usage: resume-sync <resource_name>
See resume-sync-local.

resume-sync-global
usage: resume-sync-global <resource_name>
Like resume-sync-local, but affects all resource members
in the cluster (remotely).

resume-sync-local
usage: resume-sync-local <resource_name>
Resume any initial / incremental data sync at the stage where it
had been interrupted by pause-sync.
Only useful on a secondary node.

secondary
usage: secondary <resource_name>
Switch all cluster members into secondary role, globally.
In contrast to DRBD, this is not needed as an intermediate step
for planned handover between an old and a new primary node.
The only reasonable usage is before the last leave-resource of the
last cluster member, immediately before leave-cluster is executed
for final deconstruction of the cluster.
In all other cases, please prefer 'primary' for direct handover
between cluster nodes.
Notice: 'secondary' sets the global designated primary node
to '(none)', which in turn prevents the execution of 'invalidate'
or 'join-resource' or 'resize' anywhere in the cluster.
Therefore, don't unnecessarily give 'secondary'!

set-emergency-limit
usage: set-emergency-limit <resource_name> <value>
Set a per-resource emergency limit for disk space in /mars.
See PDF manual for details.

set-sync-limit-value
usage: set-sync-limit-value <new_value>
Set the maximum number of resources which should be syncing
concurrently.

set-systemd-unit
usage: set-systemd-unit <resource_name> <start_unit_name> [<stop_unit_name>]
This activates the systemd template engine of marsadm.
Please read mars-manual.pdf on this.
When <stop_unit_name> is omitted, it will be treated as equal to
<start_unit_name>.

split-cluster
usage: split-cluster (no parameters)
NOT OFFICIALLY SUPPORTED - ONLY FOR EXPERTS.
RTFS = Read The Fucking Sourcecode.
Use this only if you know what you are doing.

up
usage: up <resource_name>
Shortcut for attach + resume-sync + resume-fetch + resume-replay.

wait-cluster
usage: wait-cluster [<resource_name>]
Waits until a ping-pong communication has succeeded in the
whole cluster (or only the members of <resource_name>).
NOTICE: this is extremely useful for avoiding races when scripting
in a cluster.

wait-connect
usage: wait-connect [<resource_name>]
See wait-cluster.

wait-resource
usage: wait-resource <resource_name>
[[attach|fetch|replay|sync][-on|-off]]
Wait until the given condition is met on the resource, locally.

wait-umount
usage: wait-umount <resource_name>
Wait until /dev/mars/<resource_name> has disappeared in the
cluster (even remotely).
Useful on both primary and secondary nodes.

<resource_name> = name of resource or "all" for all resources


<macroname> = <complex_macroname> | <primitive_macroname>

<complex_macroname> =
1and1
comminfo
commstate
cstate
default
default-global
diskstate
diskstate-1and1
dstate
fetch-line
fetch-line-1and1
flags
flags-1and1
outdated-flags
outdated-flags-1and1
primarynode
primarynode-1and1
replay-line
replay-line-1and1
replinfo
replinfo-1and1
replstate
replstate-1and1
resource-errors
resource-errors-1and1
role
role-1and1
state
status
sync-line
sync-line-1and1
syncinfo
syncinfo-1and1
todo-role


<primitive_macroname> =
deletable-size
device-opened
errno-text
Convert errno numbers (positive or negative) into human readable text.
get-log-status
get-resource-{fat,err,wrn}{,-count}
get-{disk,device}
is-{alive}
is-{split-brain,consistent,emergency}
occupied-size
present-{disk,device}
(deprecated, use *-present instead)
replay-basenr
replay-code
When negative, this indicates that a replay/recovery error has occurred.
rest-space
summary-vector
systemd-unit
tree
uuid
wait-{is,todo}-{attach,sync,fetch,replay,primary}-{on,off}
{alive,fetch,replay,work}-{timestamp,age,lag}
{all,the}-{pretty-,}{global-,}{{err,wrn,inf}-,}msg
{cluster,resource}-members
{disk,device}-present
{disk,resource,device}-size
{fetch,replay,work}-{lognr,logcount}
{get,actual}-primary
{is,todo}-{attach,sync,fetch,replay,primary}
{my,all}-resources
{sync,fetch,replay,work,syncpos}-{size,pos}
{sync,fetch,replay,work}-{rest,{almost-,threshold-,}reached,percent,permille,vector}
{sync,fetch,replay}-{rate,remain}
{time,real-time}
\end{verbatim}
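The marsadm help recommends two things for scripts. First, use `wait-cluster` to avoid races. Second, do a planned handover with a direct `primary` call on the new host, with no intermediate `secondary` and no `--force`. The following is a minimal sketch of that sequence. The wrapper `planned_handover` and its default timeout of 60 seconds are hypothetical. It uses only the sub-commands and the `--timeout` global option listed in the help above, and it assumes `--timeout` also bounds the handover's safety checks.

```shell
#!/bin/bash
# Minimal sketch (hypothetical helper, not part of MARS):
# planned handover of one resource to the local host.
#   $1 = resource name
#   $2 = timeout in seconds for marsadm's safety checks (default: 60)
planned_handover()
{
    local res="$1"
    local timeout="${2:-60}"

    # Avoid races: wait until the members of $res have communicated.
    marsadm wait-cluster "$res" || return 1

    # Switch roles directly; the old primary goes into secondary role
    # automatically. No --force: that would skip the handover and is
    # likely to produce a split brain.
    marsadm --timeout="$timeout" primary "$res" || return 1
}
```

On the intended new primary this would be called as `planned_handover mydata 120`. A non-zero return means either the cluster communication check or the handover failed. The decision whether to retry, or to resort to `--force`, is left to the operator, as the help text advises.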
365
docu/screener-verbose.help
Normal file
@@ -0,0 +1,365 @@
\begin{verbatim}
OVERRIDE verbose=1
./screener.sh: Run _unattended_ processes in screen sessions.
Useful for MASS automation, running hundreds of unattended
commands in parallel.
HINT: for running more than ~500 sessions in parallel, you might need
some system tuning (e.g. rlimits, kernel patches etc.) for creating
a huge number of file descriptors / sockets / etc.
ADVANTAGE: You may attach to individual screens, kill them, or continue
some waiting commands.

Synopsis:
./screener.sh --help [--verbose]
./screener.sh list-running
./screener.sh list-waiting
./screener.sh list-failed
./screener.sh list-critical
./screener.sh list-serious
./screener.sh list-done
./screener.sh list
./screener.sh list-screens
./screener.sh run <file.csv> [<condition_list>]
./screener.sh start <screen_id> <cmd> <args...>
./screener.sh [<options>] <operation> <screen_id>

Inquiry operations:

./screener.sh list-screens
Equivalent to screen -ls

./screener.sh list-<type>
Show a list of currently running, waiting (for continuation), failed,
and done/completed screen sessions.

./screener.sh list
First show a list of currently running screens, then
for each <type> a list of (old) failed / completed sessions
(and so on).

./screener.sh status <screen_id>
Like list-*, but filter for <screen_id> and don't report timestamps.

./screener.sh show <screen_id>
Show the last logfile of <screen_id> at standard output.

./screener.sh less <screen_id>
Show the last logfile of <screen_id> using "less -r".

MASS starting of screen sessions:

./screener.sh run <file.csv> <condition_list>
Commands are launched in screen sessions via "./screener.sh start" commands,
unless the same <screen_id> is already running,
or is in some error state, or is already done (see below).
The commands are given by a column with CSV header name
containing "command", or by the first column.
The <screen_id> needs to be given by a column with CSV header
name matching "screen_id|resource".
The number and type of commands to launch can be reduced via
any combination of the following filter conditions:

--max=<number>
Limit the number of _new_ sessions additionally started this time.

--<column_name>==<value>
Only select lines where an arbitrary CSV column (given by its
|
||||
CSV header name in C identifier syntax) has the given value.
|
||||
|
||||
--<column_name>!=<value>
|
||||
Only select lines where the colum has _not_ the given value.
|
||||
|
||||
--<column_name>=~<bash_regex>
|
||||
Only select lines where the bash regular expression matches
|
||||
at the given column.
|
||||
|
||||
--max-per=<number>
|
||||
Limit the number per _distinct_ value of the column denoted by
|
||||
the _next_ filter condition.
|
||||
Example: ./screener.sh run test.csv --dry-run --max-per=2 --dst_network=~.
|
||||
would launch only 2 Football processes per destination network.
|
||||
|
||||
Hint: filter conditions can be easily checked by giving --dry-run.

Start / restart / kill / continue screen sessions:

./screener.sh start <screen_id> <cmd> <args...>
Start a new screen session, running arbitrary <cmd> and <args...>
inside.

./screener.sh restart <screen_id>
Works only when the last command for <screen_id> failed.
This will restart the old <cmd> and its <args...> as before.
Use only when you want to repeat the same command once again.

./screener.sh kill <screen_id>
Terminate the running screen session forcibly.

./screener.sh continue
./screener.sh continue <screen_id> [<screen_id_list>]
./screener.sh continue <number>
Useful for MASS automation of processes involving critical sections
such as customer downtime.
When a numerical <number> argument is given, up to that number
of sessions are resumed (ordered by age).
When no further argument is given, _all_ currently waiting sessions
are continued.
When --auto-attach is given, it will sequentially resume the
sessions to be continued. By default, unless --force_attach is set,
it uses "screen -r", skipping those sessions which are already
attached to somebody else.
This feature works only with prepared scripts which create
an empty flagfile
/home/schoebel/mars/mars-migration.git/screener-logdir-testing/running/$screen_id.waiting
whenever they want to wait for manual intervention (for whatever reason).
Afterwards, the script must poll this flagfile until it is removed.
This screener operation simply removes the flagfile, so that
the script will then continue.
Example: look into ./football.sh
and search for occurrences of substring "call_hook start_wait".
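Example (sketch): both sides of the flagfile handshake in bash. Assumptions: the flagfile lives at $screener_logdir/running/$screen_id.waiting (the path above looks like this pattern with a test logdir expanded), and wait_for_continue / continue_session are hypothetical names; football.sh uses its own hooks instead.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the "continue" flagfile handshake.
# ASSUMPTION: flagfile path is $screener_logdir/running/$screen_id.waiting.

screener_logdir="${screener_logdir:-$HOME/screener-logs}"

# Script side: create an empty flagfile, then poll until it has been removed.
wait_for_continue() {
    local screen_id="$1" poll_interval="${2:-5}"
    local flag="$screener_logdir/running/$screen_id.waiting"
    mkdir -p "${flag%/*}"
    : > "$flag"
    echo "waiting for: screener.sh continue $screen_id"
    while [[ -e "$flag" ]]; do
        sleep "$poll_interval"
    done
    echo "continuing $screen_id"
}

# Operator side: roughly what "continue <screen_id>" does, i.e. remove the flagfile.
continue_session() {
    rm -f "$screener_logdir/running/$1.waiting"
}
```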

./screener.sh wakeup
./screener.sh wakeup <screen_id> [<screen_id_list>]
./screener.sh wakeup <number>
Similar to continue, but refers to delayed commands waiting for
a timeout. This can be used to individually shorten the timeout
period.
Example: Football cleanup operations may be artificially delayed
before doing "lvremove", to keep some sort of 'backup' for a
limited time. When your project is under time pressure, these
delays may get in the way.
Use this to end such artificial delays prematurely.
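Example (sketch): a delay that can be shortened from outside. This help text does not say how wakeup signals a delayed command, so the sketch ASSUMES a flagfile, analogous to continue; interruptible_delay is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a shortenable artificial delay.
# ASSUMPTION: "wakeup" is modelled as removal of a flagfile; the real
# screener mechanism is not described in this help text.

# interruptible_delay <seconds> <flagfile> [<poll_interval>]
interruptible_delay() {
    local seconds="$1" flag="$2" poll="${3:-5}"
    local deadline=$(( SECONDS + seconds ))
    : > "$flag"                           # announce: delayed, may be woken up
    while (( SECONDS < deadline )); do
        if [[ ! -e "$flag" ]]; then
            echo "woken up early"
            return 0
        fi
        sleep "$poll"
    done
    rm -f "$flag"                         # full delay elapsed; clean up
    echo "delay elapsed"
}
```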

./screener.sh up <...>
Do both continue and wakeup.

./screener.sh auto <...>
Equivalent to ./screener.sh --auto-attach up <...>
Remember that only sessions which are not currently attached
will be attached to.

Attach to a running session:

./screener.sh attach <screen_id>
This is equivalent to screen -x $screen_id

./screener.sh resume <screen_id>
This is equivalent to screen -r $screen_id

Communication:

./screener.sh notify <screen_id> <txt>
May be called from external scripts to send emails etc.

Locking (only when supported by <cmd>):

./screener.sh lock
./screener.sh unlock
./screener.sh lock <screen_id>
./screener.sh unlock <screen_id>

Cleanup / bookkeeping:

./screener.sh clear-critical <screen_id>
./screener.sh clear-serious <screen_id>
./screener.sh clear-failed <screen_id>
Mark the status as "done" and move the logfile away.

./screener.sh purge [<days>]
This will remove all logfiles which are older than
<days>. By default, the variable $screener_log_purge_period
will be used, which is currently set to '30'.
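Example (sketch): purge expressed with find(1). Assumptions: logfiles are named *.log and live one level below $screener_logdir (cf. the hint on $screener_logdir/*/*.log); purge_logs is a hypothetical name. In this sketch, restricting the depth to 2 leaves commands.log and cron.log, which sit directly in $screener_logdir, untouched.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of "purge [<days>]": delete logfiles older than
# <days> from the per-status subdirectories $screener_logdir/*/.
# ASSUMPTION: logfiles end in .log; the real screener.sh may differ.

screener_logdir="${screener_logdir:-$HOME/screener-logs}"
screener_log_purge_period="${screener_log_purge_period:-30}"

purge_logs() {
    local days="${1:-$screener_log_purge_period}"
    # -mtime +N selects files whose age, counted in whole days
    # (fraction dropped), exceeds N.
    find "$screener_logdir" -mindepth 2 -maxdepth 2 -type f -name '*.log' \
        -mtime +"$days" -print -delete
}
```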

./screener.sh cron
You should call this regularly from a user cron job, in order
to purge old logfiles, to detect hanging sessions, or to
automatically send pending emails, etc.

Options:

--variable
--variable=$value
These must come first, in order to prevent a mix-up with
options of <cmd> <args...>.
Allows overriding of any internal shell variable.
--help --verbose
Show all overridable shell variables, also for plugins.
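Example (sketch): one way --variable / --variable=$value could be turned into shell variables. Assumptions, not confirmed by this help text: a bare --variable sets the variable to 1, dashes in names map to underscores (as in --auto-attach vs. $auto_attach), and option parsing stops at the first non-option argument. parse_variable_options and remaining_args are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of "--variable" / "--variable=$value" handling.
# ASSUMPTIONS: bare --name sets name=1, dashes map to underscores,
# and parsing stops at the first argument that is not such an option.

# Defaults in the same ${var:-default} style as listed below; a variable
# already set in the environment takes precedence over the default.
dry_run="${dry_run:-0}"
auto_attach_grace="${auto_attach_grace:-10}"

remaining_args=()

parse_variable_options() {
    local _re='^--([A-Za-z_][A-Za-z0-9_-]*)(=(.*))?$'
    local _name _value
    while (( $# > 0 )); do
        if [[ "$1" == "--" ]]; then         # explicit end of options
            shift
            break
        fi
        [[ "$1" =~ $_re ]] || break         # first non-option: stop parsing
        _name="${BASH_REMATCH[1]//-/_}"
        _value=1
        if [[ -n "${BASH_REMATCH[2]}" ]]; then
            _value="${BASH_REMATCH[3]}"
        fi
        printf -v "$_name" '%s' "$_value"   # assign to the variable named $_name
        shift
    done
    remaining_args=("$@")
}
```

Stopping at the first non-option is what keeps options of <cmd> <args...> from being misread as screener variables.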

## football_includes
# List of directories where screener-*.conf files can be found.
football_includes="${football_includes:-/usr/lib/mars/plugins /etc/mars/plugins $script_dir/plugins $HOME/.mars/plugins ./plugins}"

## title
# Used as a title for startup of screen sessions, and later for
# display by list-*
title="${title:-}"

## auto_attach
# Upon start or upon continue/wakeup/up, attach to the
# (newly created or existing) session.
auto_attach="${auto_attach:-0}"

## auto_attach_grace
# Before attaching, wait this time in seconds.
# The user may abort within this sleep time by
# pressing Ctrl-C.
auto_attach_grace="${auto_attach_grace:-10}"

## force_attach
# Use "screen -x" instead of "screen -r" allowing
# shared sessions between different users / end terminals.
force_attach="${force_attach:-0}"

## drop_shell
# When a <cmd> fails, the screen session will not be terminated immediately.
# Instead, an interactive bash is started, so you can later attach and
# rectify any problems.
# WARNING! only activate this if you regularly check for failed sessions
# and then manually attach to them. Don't use this when running hundreds
# or thousands in parallel.
drop_shell="${drop_shell:-0}"

## session_timeout
# Detect hanging sessions when they don't produce any output anymore
# for a longer time. Hanging sessions are then marked as failed or critical.
session_timeout="${session_timeout:-$(( 3600 * 3 ))}" # seconds
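Example (sketch): hang detection based on logfile age. ASSUMPTION: each running session writes a logfile below $screener_logdir/running/, and "no output" is approximated by the logfile's modification time (GNU stat -c %Y); is_hanging and list_hanging are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of hanging-session detection via logfile age.
# ASSUMPTION: running sessions log to $screener_logdir/running/*.log.

screener_logdir="${screener_logdir:-$HOME/screener-logs}"
session_timeout="${session_timeout:-$(( 3600 * 3 ))}"   # seconds

# is_hanging <logfile> [<now_epoch>]
# Succeed when <logfile> was not modified for more than $session_timeout seconds.
is_hanging() {
    local now="${2:-$(date +%s)}" mtime
    mtime="$(stat -c %Y "$1")" || return 2   # GNU stat: mtime as epoch seconds
    (( now - mtime > session_timeout ))
}

# list_hanging: print the names of all hanging session logfiles.
list_hanging() {
    local log
    for log in "$screener_logdir"/running/*.log; do
        [[ -e "$log" ]] || continue          # unmatched glob stays literal
        if is_hanging "$log"; then
            printf '%s\n' "${log##*/}"
        fi
    done
    return 0
}
```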

## screener_logdir or logdir
# Where the logfiles and all status information are kept.
export screener_logdir="${screener_logdir:-${logdir:-$HOME/screener-logs}}"

## screener_command_log
# This logfile will accumulate all relevant $0 command invocations,
# including timestamps and ssh agent identities.
# To switch off, use /dev/null here.
screener_command_log="${screener_command_log:-$screener_logdir/commands.log}"

## screener_cron_log
# Since "$0 cron" works silently, you won't notice any errors.
# This logfile gives you a chance to check for any problems.
screener_cron_log="${screener_cron_log:-$screener_logdir/cron.log}"

## screener_log_purge_period
# $0 cron or $0 purge will automatically remove all old logfiles
# from $screener_logdir/*/ when this period is exceeded.
screener_log_purge_period="${screener_log_purge_period:-30}" # Days

## dry_run
# Don't actually start screen sessions when set.
dry_run="${dry_run:-0}"

## verbose
# Increase verbosity.
verbose=${verbose:-0}

## debug
# Some additional debug messages.
debug="${debug:-0}"

## sleep
# Work around races by keeping sessions open for a few seconds.
# This is useful for debugging of immediate script failures.
# You have some short time window for attaching.
# HINT: instead, just inspect the logfiles in $screener_logdir/*/*.log
sleep="${sleep:-3}"

## screen_cmd
# Customize the screen command (e.g. add some further options, etc).
screen_cmd="${screen_cmd:-screen}"

## use_screenlog
# Add the -L option. Not really useful when running thousands of
# parallel screen sessions, because the automatically generated filenames
# are crap, and cannot be set in advance.
# Useful for basic debugging of setup problems etc.
use_screenlog="${use_screenlog:-0}"
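Example (sketch): how screen_cmd, use_screenlog, title, dry_run and force_attach could map onto GNU screen options (-d -m: start detached, -S: session name, -t: window title, -L: output logging, -x / -r: shared attach / plain reattach). start_session and attach_command are hypothetical names, not screener.sh functions.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: mapping some variables above onto GNU screen options.
#   -d -m      start the session detached
#   -S <name>  session name (here: the screen_id)
#   -t <title> title of the first window
#   -L         enable output logging (use_screenlog)
#   -x / -r    attach to a possibly shared session / reattach a detached one

screen_cmd="${screen_cmd:-screen}"
use_screenlog="${use_screenlog:-0}"
title="${title:-}"
dry_run="${dry_run:-0}"
force_attach="${force_attach:-0}"

# start_session <screen_id> <cmd> <args...>
start_session() {
    local screen_id="$1"
    shift
    local -a argv
    read -r -a argv <<< "$screen_cmd"       # $screen_cmd may carry extra options
    argv+=(-d -m -S "$screen_id")
    if (( use_screenlog )); then
        argv+=(-L)
    fi
    if [[ -n "$title" ]]; then
        argv+=(-t "$title")
    fi
    argv+=("$@")
    if (( dry_run )); then
        local line
        line="$(printf '%q ' "${argv[@]}")"
        echo "${line% }"                    # print instead of running
    else
        "${argv[@]}"
    fi
}

# attach_command <screen_id>: print the command that would attach to it.
attach_command() {
    if (( force_attach )); then
        echo "$screen_cmd -x $1"
    else
        echo "$screen_cmd -r $1"
    fi
}
```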

## waiting_txt and delayed_txt
# RTFS. Don't use this, unless you know what you are doing.
waiting_txt="${waiting_txt:-SCREENER_waiting_WAIT}"
delayed_txt="${delayed_txt:-SCREENER_delayed_WAIT}"

## critical_status
# This is the "magic" exit code indicating _criticality_
# of a failed command.
critical_status="${critical_status:-199}"

## serious_status
# This is the "magic" exit code indicating _seriousness_
# of a failed command.
serious_status="${serious_status:-198}"
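Example (sketch): classifying the exit status of <cmd> with the magic codes above. Only the values 199 and 198 come from this help text; classify_exit is a hypothetical helper, and the status names mirror the list-* and clear-* operations.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: map the exit status of <cmd> to a screener status.
# Only the numeric defaults 199 / 198 are taken from this help text.

critical_status="${critical_status:-199}"
serious_status="${serious_status:-198}"

# classify_exit <status>
classify_exit() {
    case "$1" in
        0)                  echo done ;;
        "$critical_status") echo critical ;;
        "$serious_status")  echo serious ;;
        *)                  echo failed ;;
    esac
}

# A wrapped command would signal criticality simply by its exit code, e.g.:
#   some_step || exit "$critical_status"
```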

## less_cmd
# Used by $0 less $id
less_cmd="${less_cmd:-less -r}"

## date_format
# Here you can customize the appearance of list-* commands
date_format="${date_format:-%Y-%m-%d %H:%M}"

## csv_delim
# The delimiter used for CSV file parsing
csv_delim="${csv_delim:-;}"

## csv_cmd_fields
# Regex telling the field name for 'cmd'
csv_cmd_fields="${csv_cmd_fields:-command}"

## csv_id_fields
# Regex telling the field name for 'screen_id'
csv_id_fields="${csv_id_fields:-screen_id|resource}"

## csv_remove
# Regex for global removal of command options
csv_remove="${csv_remove:---screener}"

## user_name
# Normally automatically derived from ssh agent or from $LOGNAME.
# Please override this only when really necessary.
export user_name="${user_name:-$(ssh-add -l | grep -o '[^ ]+@[^ ]+' | sort -u | tail -1)}"
export user_name="${user_name:-$LOGNAME}"

## tmp_dir and tmp_stub
# Where temporary files reside
tmp_dir="${tmp_dir:-/tmp}"
tmp_stub="${tmp_stub:-$tmp_dir/screener.$$}"

Running hook: email_describe_plugin

PLUGIN screener-email

Generic plugin for sending emails (or SMS via gateways)
upon status changes, such as script failures.

## email_*
# List of email addresses.
# Empty = don't send emails.
email_critical="${email_critical:-}"
email_serious="${email_serious:-}"
email_failed="${email_failed:-}"
email_warning="${email_warning:-}"
email_waiting="${email_waiting:-}"
email_done="${email_done:-}"

## sms_*
# List of email addresses of SMS gateways.
# These may be distinct from email_*.
# Empty = don't send sms.
sms_critical="${sms_critical:-}"
sms_serious="${sms_serious:-}"
sms_failed="${sms_failed:-}"
sms_warning="${sms_warning:-}"
sms_waiting="${sms_waiting:-}"
sms_done="${sms_done:-}"

## email_cmd
# Command for email sending.
# Please include your gateways etc here.
email_cmd="${email_cmd:-mailx -S smtp=mx.nowhere.org:587 -S smtp-auth-user=test}"

## email_logfiles
# Whether to include logfiles in the body.
# Not used for sms_*.
email_logfiles="${email_logfiles:-1}"

\end{verbatim}
193
docu/screener.help
Normal file
@ -0,0 +1,193 @@
\begin{verbatim}
./screener.sh: Run _unattended_ processes in screen sessions.
Useful for MASS automation, running hundreds of unattended
commands in parallel.
HINT: for running more than ~500 sessions in parallel, you might need
some system tuning (e.g. rlimits, kernel patches etc) for creating
a huge number of file descriptors / sockets / etc.
ADVANTAGE: You may attach to individual screens, kill them, or continue
some waiting commands.

Synopsis:
./screener.sh --help [--verbose]
./screener.sh list-running
./screener.sh list-waiting
./screener.sh list-failed
./screener.sh list-critical
./screener.sh list-serious
./screener.sh list-done
./screener.sh list
./screener.sh list-screens
./screener.sh run <file.csv> [<condition_list>]
./screener.sh start <screen_id> <cmd> <args...>
./screener.sh [<options>] <operation> <screen_id>

Inquiry operations:

./screener.sh list-screens
Equivalent to screen -ls

./screener.sh list-<type>
Show a list of currently running, waiting (for continuation), failed,
and done/completed screen sessions.

./screener.sh list
First show a list of currently running screens, then
for each <type> a list of (old) failed / completed sessions
(and so on).

./screener.sh status <screen_id>
Like list-*, but filter by <screen_id> and don't report timestamps.

./screener.sh show <screen_id>
Show the last logfile of <screen_id> at standard output.

./screener.sh less <screen_id>
Show the last logfile of <screen_id> using "less -r".

MASS starting of screen sessions:

./screener.sh run <file.csv> <condition_list>
Commands are launched in screen sessions via "./screener.sh start" commands,
unless the same <screen_id> is already running,
or is in some error state, or is already done (see below).
The commands are given by a column with CSV header name
containing "command", or by the first column.
The <screen_id> needs to be given by a column with CSV header
name matching "screen_id|resource".
The number and type of commands to launch can be reduced via
any combination of the following filter conditions:

--max=<number>
Limit the number of _new_ sessions additionally started this time.

--<column_name>==<value>
Only select lines where an arbitrary CSV column (given by its
CSV header name in C identifier syntax) has the given value.

--<column_name>!=<value>
Only select lines where the column does _not_ have the given value.

--<column_name>=~<bash_regex>
Only select lines where the bash regular expression matches
the value of the given column.

--max-per=<number>
Limit the number of sessions per _distinct_ value of the column denoted by
the _next_ filter condition.
Example: ./screener.sh run test.csv --dry-run --max-per=2 --dst_network=~.
would launch only 2 Football processes per destination network.

Hint: filter conditions can be easily checked by giving --dry-run.

Start / restart / kill / continue screen sessions:

./screener.sh start <screen_id> <cmd> <args...>
Start a new screen session, running arbitrary <cmd> and <args...>
inside.

./screener.sh restart <screen_id>
Works only when the last command for <screen_id> failed.
This will restart the old <cmd> and its <args...> as before.
Use only when you want to repeat the same command once again.

./screener.sh kill <screen_id>
Terminate the running screen session forcibly.

./screener.sh continue
./screener.sh continue <screen_id> [<screen_id_list>]
./screener.sh continue <number>
Useful for MASS automation of processes involving critical sections
such as customer downtime.
When a numerical <number> argument is given, up to that number
of sessions are resumed (ordered by age).
When no further argument is given, _all_ currently waiting sessions
are continued.
When --auto-attach is given, it will sequentially resume the
sessions to be continued. By default, unless --force_attach is set,
it uses "screen -r", skipping those sessions which are already
attached to somebody else.
This feature works only with prepared scripts which create
an empty flagfile
/home/schoebel/mars/mars-migration.git/screener-logdir-testing/running/$screen_id.waiting
whenever they want to wait for manual intervention (for whatever reason).
Afterwards, the script must poll this flagfile until it is removed.
This screener operation simply removes the flagfile, so that
the script will then continue.
Example: look into ./football.sh
and search for occurrences of substring "call_hook start_wait".

./screener.sh wakeup
./screener.sh wakeup <screen_id> [<screen_id_list>]
./screener.sh wakeup <number>
Similar to continue, but refers to delayed commands waiting for
a timeout. This can be used to individually shorten the timeout
period.
Example: Football cleanup operations may be artificially delayed
before doing "lvremove", to keep some sort of 'backup' for a
limited time. When your project is under time pressure, these
delays may get in the way.
Use this to end such artificial delays prematurely.

./screener.sh up <...>
Do both continue and wakeup.

./screener.sh auto <...>
Equivalent to ./screener.sh --auto-attach up <...>
Remember that only sessions which are not currently attached
will be attached to.

Attach to a running session:

./screener.sh attach <screen_id>
This is equivalent to screen -x $screen_id

./screener.sh resume <screen_id>
This is equivalent to screen -r $screen_id

Communication:

./screener.sh notify <screen_id> <txt>
May be called from external scripts to send emails etc.

Locking (only when supported by <cmd>):

./screener.sh lock
./screener.sh unlock
./screener.sh lock <screen_id>
./screener.sh unlock <screen_id>

Cleanup / bookkeeping:

./screener.sh clear-critical <screen_id>
./screener.sh clear-serious <screen_id>
./screener.sh clear-failed <screen_id>
Mark the status as "done" and move the logfile away.

./screener.sh purge [<days>]
This will remove all logfiles which are older than
<days>. By default, the variable $screener_log_purge_period
will be used, which is currently set to '30'.

./screener.sh cron
You should call this regularly from a user cron job, in order
to purge old logfiles, to detect hanging sessions, or to
automatically send pending emails, etc.

Options:

--variable
--variable=$value
These must come first, in order to prevent a mix-up with
options of <cmd> <args...>.
Allows overriding of any internal shell variable.
--help --verbose
Show all overridable shell variables, also for plugins.

PLUGIN screener-email

Generic plugin for sending emails (or SMS via gateways)
upon status changes, such as script failures.

\end{verbatim}