mirror of https://github.com/schoebel/mars
doc: update help
parent
16db0dacab
commit
143c339745
|
@ -1,246 +1,2 @@
|
|||
\begin{verbatim}
|
||||
Usage:
|
||||
./football.sh --help [--verbose]
|
||||
Show help
|
||||
./football.sh --variable=<value>
|
||||
Override any shell variable
|
||||
|
||||
Actions for resource migration:
|
||||
|
||||
./football.sh migrate <resource> <target_primary> [<target_secondary>]
|
||||
Run the sequence
|
||||
migrate_prepare ; migrate_wait ; migrate_finish; migrate_cleanup.
|
||||
|
||||
The same steps individually, for testing (do not rely on them):
|
||||
|
||||
./football.sh migrate_prepare <resource> <target_primary> [<target_secondary>]
|
||||
Allocate LVM space at the targets and start MARS replication.
|
||||
|
||||
./football.sh migrate_wait <resource> <target_primary> [<target_secondary>]
|
||||
Wait until MARS replication reports UpToDate.
|
||||
|
||||
./football.sh migrate_finish <resource> <target_primary> [<target_secondary>]
|
||||
Call hooks for handover to the targets.
|
||||
|
||||
./football.sh migrate_cleanup <resource>
|
||||
Remove old / currently unused LV replicas from MARS and deallocate
|
||||
from LVM.
|
||||
|
||||
Actions for inplace FS shrinking:
|
||||
|
||||
./football.sh shrink <resource> <percent>
|
||||
Run the sequence shrink_prepare ; shrink_finish ; shrink_cleanup.
|
||||
|
||||
The same steps individually, for testing (do not rely on them):
|
||||
|
||||
./football.sh shrink_prepare <resource> [<percent>]
|
||||
Allocate temporary LVM space (when possible) and create initial
|
||||
raw FS copy.
|
||||
Default percent value (when left out) is 85.
|
||||
|
||||
./football.sh shrink_finish <resource>
|
||||
Incrementally update the FS copy, swap old <=> new copy with
|
||||
small downtime.
|
||||
|
||||
./football.sh shrink_cleanup <resource>
|
||||
Remove old FS copy from LVM.
|
||||
|
||||
Actions for inplace FS extension:
|
||||
|
||||
./football.sh expand <resource> <percent>
|
||||
./football.sh extend <resource> <percent>
|
||||
Increase mounted filesystem size during operations.
|
||||
|
||||
Combined actions:
|
||||
|
||||
./football.sh migrate+shrink <resource> <target_primary> [<target_secondary>] [<percent>]
|
||||
Similar to migrate ; shrink but produces less network traffic.
|
||||
Default percent value (when left out) is 85.
|
||||
|
||||
./football.sh migrate+shrink+back <resource> <tmp_primary> [<percent>]
|
||||
Migrate temporarily to <tmp_primary>, then shrink there,
|
||||
finally migrate back to old primary and secondaries.
|
||||
Default percent value (when left out) is 85.
|
||||
|
||||
Actions for (manual) repair in emergency situations:
|
||||
|
||||
./football.sh manual_handover <resource> <target_primary>
|
||||
This is useful in place of going to the machines and starting
|
||||
handover on their command line. You don't need to log in.
|
||||
All hooks (e.g. for downtime / reporting / etc) are automatically
|
||||
called.
|
||||
Notice: it will only work when there is already a replica
|
||||
at <target_primary>, and when further constraints such as
|
||||
clustermanager constraints will allow it.
|
||||
For a full Football game between different clusters, use
|
||||
"migrate" instead.
|
||||
|
||||
./football.sh manual_migrate_config <resource> <target_primary> [<target_secondary>]
|
||||
Transfer only the cluster config, without changing the MARS replicas.
|
||||
This does no resource stopping / restarting.
|
||||
Useful for reverting a failed migration.
|
||||
|
||||
./football.sh manual_config_update <hostname>
|
||||
Only update the cluster config, without changing anything else.
|
||||
Useful for manual repair of failed migration.
|
||||
|
||||
./football.sh manual_merge_cluster <hostname1> <hostname2>
|
||||
Run "marsadm merge-cluster" for the given hosts.
|
||||
Hostnames must be from different (former) clusters.
|
||||
|
||||
./football.sh manual_split_cluster <hostname_list>
|
||||
Run "marsadm split-cluster" at the given hosts.
|
||||
Useful for fixing failed / asymmetric splits.
|
||||
Hint: provide _all_ hostnames which have formerly participated
|
||||
in the cluster.
|
||||
|
||||
./football.sh repair_vm <resource> <primary_candidate_list>
|
||||
Try to restart the VM <resource> on one of the given machines.
|
||||
Useful during unexpected customer downtime.
|
||||
|
||||
./football.sh repair_mars <resource> <primary_candidate_list>
|
||||
Before restarting the VM like in repair_vm, try to find a local
|
||||
LV where a stand-alone MARS resource can be found and built up.
|
||||
Use this only when the MARS resources are gone, and when you are
|
||||
desperate. Problem: this will likely create a MARS setup which is
|
||||
not usable for production, and therefore must be corrected later
|
||||
by hand. Use this only during an emergency situation in order to
|
||||
get the customers online again, while accepting the downsides of this
|
||||
command.
|
||||
|
||||
./football.sh manual_lock <item> <host_list>
|
||||
./football.sh manual_unlock <item> <host_list>
|
||||
Manually lock or unlock an item at all of the given hosts, in
|
||||
an atomic fashion. In most cases, use "ALL" for the item.
|
||||
|
||||
Only for testing / development (no stable interfaces):
|
||||
|
||||
./football.sh manual_call_hook <name> <args>
|
||||
|
||||
Global maintenance:
|
||||
|
||||
./football.sh lv_cleanup <resource>
|
||||
|
||||
General features:
|
||||
|
||||
- Instead of <percent>, an absolute amount of storage with suffix
|
||||
'k' or 'm' or 'g' can be given.
|
||||
|
||||
- When <resource> is currently stopped, login to the container is
|
||||
not possible, and in turn the hypervisor node and primary storage node
|
||||
cannot be automatically determined. In such a case, the missing
|
||||
nodes can be specified via the syntax
|
||||
<resource>:<hypervisor>:<primary_storage>
|
||||
|
||||
- The following LV suffixes are used (naming convention):
|
||||
-tmp = currently emerging version for shrinking
|
||||
-preshrink = old version before shrinking took place
|
||||
|
||||
- By adding the option --screener, you can hand over football execution
|
||||
to ./screener.sh .
|
||||
When some --enable_*_waiting is also added, then the critical
|
||||
sections involving customer downtime are temporarily halted until
|
||||
a sysadmin says "screener.sh continue $resource" or
|
||||
attaches to the sessions and presses the RETURN key.
|
||||
|
||||
|
||||
PLUGIN football-1and1config
|
||||
|
||||
1&1 specific plugin for dealing with the cm3 clusters
|
||||
and its concrete configuration.
|
||||
|
||||
|
||||
PLUGIN football-cm3
|
||||
|
||||
1&1 specific plugin for dealing with the cm3 cluster manager
|
||||
and its concrete operating environment (singleton instance).
|
||||
|
||||
Current maximum cluster size limit:
|
||||
|
||||
Maximum #syncs running before migration can start:
|
||||
|
||||
Following marsadm --version must be installed:
|
||||
|
||||
Following mars kernel modules must be loaded:
|
||||
|
||||
Specific actions for plugin football-cm3:
|
||||
|
||||
./football.sh clustertool {GET|PUT} <url>
|
||||
Call through to the clustertool via REST.
|
||||
Useful for manual inspection and repair.
|
||||
|
||||
Specific features with plugin football-cm3:
|
||||
|
||||
- Parameter syntax "cluster123" instead of "icpu456 icpu457"
|
||||
This is an alternate specification syntax, which is
|
||||
automatically replaced with the real machine names.
|
||||
It tries to minimize datacenter cross-traffic by
|
||||
taking the new $target_primary at the same datacenter
|
||||
location where the container is currently running.
|
||||
|
||||
|
||||
PLUGIN football-ticket
|
||||
|
||||
Generic plugin for creating and updating tickets,
|
||||
e.g. Jira tickets.
|
||||
|
||||
You will need to hook in some external scripts which are
|
||||
then creating / updating the tickets.
|
||||
|
||||
Comment texts may be provided with following conventions:
|
||||
|
||||
comment.$ticket_state.txt
|
||||
comment.$ticket_phase.$ticket_state.txt
|
||||
|
||||
Directories where comments may reside:
|
||||
|
||||
football_creds=/usr/lib/mars/creds /etc/mars/creds /home/schoebel/mars/football-master.git/creds /home/schoebel/mars/football-master.git /home/schoebel/.mars/creds ./creds
|
||||
football_confs=/usr/lib/mars/confs /etc/mars/confs /home/schoebel/mars/football-master.git/confs /home/schoebel/.mars/confs ./confs
|
||||
football_includes=/usr/lib/mars/plugins /etc/mars/plugins /home/schoebel/mars/football-master.git/plugins /home/schoebel/.mars/plugins ./plugins
|
||||
|
||||
|
||||
PLUGIN football-basic
|
||||
|
||||
Generic driver for systemd-controlled MARS pools.
|
||||
The current version supports only a flat model:
|
||||
(1) There is a single "big cluster" at metadata level.
|
||||
All cluster members are joined via merge-cluster.
|
||||
All occurring names need to be globally unique.
|
||||
(2) The network uses BGP or other means, thus any hypervisor
|
||||
can (potentially) start any VM at any time.
|
||||
(3) iSCSI or remote devices are not supported for now
|
||||
(LocalSharding model). This may be extended in a future
|
||||
release.
|
||||
This plugin is exclusive-or with cm3.
|
||||
|
||||
Plugin specific actions:
|
||||
|
||||
./football.sh basic_add_host <hostname>
|
||||
Manually add another host to the hostname cache.
|
||||
|
||||
|
||||
PLUGIN football-downtime
|
||||
|
||||
Generic plugin for communication of customer downtime.
|
||||
|
||||
|
||||
PLUGIN football-motd
|
||||
|
||||
Generic plugin for motd. Communicate that Football is running
|
||||
at login via motd.
|
||||
|
||||
|
||||
PLUGIN football-report
|
||||
|
||||
Generic plugin for communication of reports.
|
||||
|
||||
|
||||
PLUGIN football-waiting
|
||||
|
||||
Generic plugin, interfacing with screener: when this is used
|
||||
by your script and enabled, then you will be able to wait for
|
||||
"screener.sh continue" operations at certain points in your
|
||||
script.
|
||||
|
||||
|
||||
\end{verbatim}
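The `<resource>:<hypervisor>:<primary_storage>` syntax from the "General features" list above can be illustrated with a small sketch. The helper name `parse_resource_spec` and the example host names are hypothetical and are not taken from football.sh; the script's real parsing may differ.

```shell
#!/bin/sh
# Hypothetical helper (not from football.sh): split an argument of the form
#   <resource>[:<hypervisor>:<primary_storage>]
# into three shell variables.
parse_resource_spec() {
    spec="$1"
    resource="${spec%%:*}"          # text before the first colon
    hypervisor=""
    primary_storage=""
    case "$spec" in
        *:*:*)
            rest="${spec#*:}"       # text after the first colon
            hypervisor="${rest%%:*}"
            primary_storage="${rest#*:}"
            ;;
    esac
}

# Example names are made up for illustration.
parse_resource_spec "vm123:hyper7:store42"
echo "resource=$resource hypervisor=$hypervisor primary_storage=$primary_storage"
```

With a plain `<resource>` argument, the two node variables stay empty, which corresponds to the case where the nodes are determined automatically.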
|
||||
|
|
|
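The football help above states that an absolute amount with suffix 'k', 'm' or 'g' may be given instead of `<percent>`. The following sketch only distinguishes the two forms. The helper name `classify_size_arg` is hypothetical, and no unit conversion is attempted because the help text does not say what the suffixes multiply by.

```shell
#!/bin/sh
# Hypothetical helper (not from football.sh): tell a percentage apart from
# an absolute amount with a k/m/g suffix.
classify_size_arg() {
    arg="$1"
    case "$arg" in
        *[kmg])
            number="${arg%?}"           # strip the one-letter suffix
            unit="${arg#"$number"}"     # the suffix itself
            kind="absolute"
            ;;
        *)
            number="$arg"
            unit="%"
            kind="percent"
            ;;
    esac
    # Reject an empty numeric part or anything containing a non-digit.
    case "$number" in
        ''|*[!0-9]*) echo "invalid"; return 1 ;;
    esac
    echo "$kind $number $unit"
}

classify_size_arg 85
classify_size_arg 20g
```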
@ -262,15 +262,15 @@ marsadm [<global_options>] view[-<macroname>] [<resource_names> | all ]
|
|||
For details and best practices, please refer to the PDF manual.
|
||||
|
||||
lowlevel-delete-host
|
||||
usage: lowlevel-delete-host <resource_name>
|
||||
usage: lowlevel-delete-host <hostname>
|
||||
Delete cluster member.
|
||||
|
||||
lowlevel-ls-host-ips
|
||||
usage: lowlevel-ls-host-ips <resource_name>
|
||||
usage: lowlevel-ls-host-ips
|
||||
List cluster member names and IP addresses.
|
||||
|
||||
lowlevel-set-host-ip
|
||||
usage: lowlevel-set-host-ip <resource_name>
|
||||
usage: lowlevel-set-host-ip <hostname> <new_ip>
|
||||
Set IP for host.
|
||||
|
||||
merge-cluster
|
||||
|
@ -537,6 +537,7 @@ marsadm [<global_options>] view[-<macroname>] [<resource_names> | all ]
|
|||
|
||||
|
||||
<primitive_macroname> =
|
||||
count-{cluster,resource}-members
|
||||
deletable-size
|
||||
device-opened
|
||||
errno-text
|
||||
|
@ -552,12 +553,14 @@ marsadm [<global_options>] view[-<macroname>] [<resource_names> | all ]
|
|||
replay-basenr
|
||||
replay-code
|
||||
When negative, this indicates that a replay/recovery error has occurred.
|
||||
resource-possible-size
|
||||
rest-space
|
||||
summary-vector
|
||||
systemd-unit
|
||||
tree
|
||||
uuid
|
||||
wait-{is,todo}-{attach,sync,fetch,replay,primary}-{on,off}
|
||||
writeback-rest
|
||||
{alive,fetch,replay,work}-{timestamp,age,lag}
|
||||
{all,the}-{pretty-,}{global-,}{{err,wrn,inf}-,}msg
|
||||
{cluster,resource}-members
|
||||
|
|
|
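The marsadm hunk above documents that a negative replay-code indicates a replay/recovery error. A minimal sketch of how a script might act on that follows. The helper name `replay_code_ok` and the resource name `myres` are hypothetical, and it is an assumption that the macro prints a plain integer.

```shell
#!/bin/sh
# Hypothetical helper: interpret a replay-code value passed as an argument.
# Returns 0 when the value is not negative, 1 when it is negative
# (replay/recovery error), 2 when it does not look like a number.
replay_code_ok() {
    case "$1" in
        -[0-9]*) return 1 ;;
        [0-9]*)  return 0 ;;
        *)       return 2 ;;
    esac
}

# Possible use, assuming marsadm is installed and "myres" exists:
#   code="$(marsadm view-replay-code myres)"
#   replay_code_ok "$code" || echo "replay problem on myres: $code" >&2
```

The value is taken as an argument rather than read from marsadm directly, so the check can be tried without a MARS installation.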
@ -1,405 +1,2 @@
|
|||
\begin{verbatim}
|
||||
OVERRIDE verbose=1
|
||||
./screener.sh: Run _unattended_ processes in screen sessions.
|
||||
Useful for MASS automation, running hundreds of unattended
|
||||
commands in parallel.
|
||||
HINT: for running more than ~500 sessions in parallel, you might need
|
||||
some system tuning (e.g. rlimits, kernel patches etc) for creating
|
||||
a huge number of file descriptors / sockets / etc.
|
||||
ADVANTAGE: You may attach to individual screens, kill them, or continue
|
||||
some waiting commands.
|
||||
|
||||
Synopsis:
|
||||
./screener.sh --help [--verbose]
|
||||
./screener.sh list-running
|
||||
./screener.sh list-waiting
|
||||
./screener.sh list-interrupted
|
||||
./screener.sh list-illegal
|
||||
./screener.sh list-timeouted
|
||||
./screener.sh list-failed
|
||||
./screener.sh list-critical
|
||||
./screener.sh list-serious
|
||||
./screener.sh list-done
|
||||
./screener.sh list
|
||||
./screener.sh list-archive
|
||||
./screener.sh list-screens
|
||||
./screener.sh run <file.csv> [<condition_list>]
|
||||
./screener.sh start <screen_id> <cmd> <args...>
|
||||
./screener.sh [<options>] <operation> <screen_id>
|
||||
|
||||
Inquiry operations:
|
||||
|
||||
./screener.sh list-screens
|
||||
Equivalent to screen -ls
|
||||
|
||||
./screener.sh list-<type>
|
||||
Show a list of currently running, waiting (for continuation), failed,
|
||||
and done/completed screen sessions.
|
||||
|
||||
./screener.sh list
|
||||
First show a list of currently running screens, then
|
||||
for each <type> a list of (old) failed / completed sessions
|
||||
(and so on).
|
||||
|
||||
./screener.sh status <screen_id>
|
||||
Like list-*, but filter <screen_id> and don't report timestamps.
|
||||
|
||||
./screener.sh show <screen_id>
|
||||
Show the last logfile of <screen_id> at standard output.
|
||||
|
||||
./screener.sh less <screen_id>
|
||||
Show the last logfile of <screen_id> using "less -r".
|
||||
|
||||
MASS starting of screen sessions:
|
||||
|
||||
./screener.sh run <file.csv> <condition_list>
|
||||
Commands are launched in screen sessions via "./screener.sh start" commands,
|
||||
unless the same <screen_id> is already running,
|
||||
or is in some error state, or is already done (see below).
|
||||
The commands are given by a column with CSV header name
|
||||
containing "command", or by the first column.
|
||||
The <screen_id> needs to be given by a column with CSV header
|
||||
name matching "screen_id|resource".
|
||||
The number and type of commands to launch can be reduced via
|
||||
any combination of the following filter conditions:
|
||||
|
||||
--max=<number>
|
||||
Limit the number of _new_ sessions additionally started this time.
|
||||
|
||||
--<column_name>==<value>
|
||||
Only select lines where an arbitrary CSV column (given by its
|
||||
CSV header name in C identifier syntax) has the given value.
|
||||
|
||||
--<column_name>!=<value>
|
||||
Only select lines where the column does _not_ have the given value.
|
||||
|
||||
--<column_name>=~<bash_regex>
|
||||
Only select lines where the bash regular expression matches
|
||||
at the given column.
|
||||
|
||||
--max-per=<number>
|
||||
Limit the number per _distinct_ value of the column denoted by
|
||||
the _next_ filter condition.
|
||||
Example: ./screener.sh run test.csv --dry-run --max-per=2 --dst_network=~.
|
||||
would launch only 2 Football processes per destination network.
|
||||
|
||||
Hint: filter conditions can be easily checked by giving --dry-run.
|
||||
|
||||
Start / restart / kill / continue screen sessions:
|
||||
|
||||
./screener.sh start <screen_id> <cmd> <args...>
|
||||
Start a new screen session, running arbitrary <cmd> and <args...>
|
||||
inside.
|
||||
|
||||
./screener.sh restart <screen_id>
|
||||
Works only when the last command for <screen_id> failed.
|
||||
This will restart the old <cmd> and its <args...> as before.
|
||||
Use only when you want to repeat the same command once again.
|
||||
|
||||
./screener.sh kill <screen_id>
|
||||
Terminate the running screen session forcibly.
|
||||
|
||||
./screener.sh continue
|
||||
./screener.sh continue <screen_id> [<screen_id_list>]
|
||||
./screener.sh continue <number>
|
||||
Useful for MASS automation of processes involving critical sections
|
||||
such as customer downtime.
|
||||
When giving a numerical <number> argument, up to that number
|
||||
of sessions are resumed (ordered by age).
|
||||
When no further argument is given, _all_ currently waiting sessions
|
||||
are continued.
|
||||
When --auto-attach is given, it will sequentially resume the
|
||||
sessions to be continued. By default, unless --force_attach is set,
|
||||
it uses "screen -r" skipping those sessions which are already
|
||||
attached to somebody else.
|
||||
This feature works only with prepared scripts which are creating
|
||||
an empty flagfile
|
||||
/home/schoebel/mars/mars-migration.git/screener-logdir-testing/running/$screen_id.waiting
|
||||
whenever they want to wait for manual intervention (for whatever reason).
|
||||
Afterwards, the script must be polling this flagfile for removal.
|
||||
This screener operation simply removes the flagfile, such that
|
||||
the script will then continue afterwards.
|
||||
Example: look into ./football.sh
|
||||
and search for occurrences of substring "call_hook start_wait".
|
||||
|
||||
./screener.sh wakeup
|
||||
./screener.sh wakeup <screen_id> [<screen_id_list>]
|
||||
./screener.sh wakeup <number>
|
||||
Similar to continue, but refers to delayed commands waiting for
|
||||
a timeout. This can be used to individually shorten the timeout
|
||||
period.
|
||||
Example: Football cleanup operations may be artificially delayed
|
||||
before doing "lvremove", to keep some sort of 'backup' for a
|
||||
limited time. When your project is under time pressure, these
|
||||
delays may be hindering.
|
||||
Use this for premature ending of such artificial delays.
|
||||
|
||||
./screener.sh up <...>
|
||||
Do both continue and wakeup.
|
||||
|
||||
./screener.sh auto <...>
|
||||
Equivalent to ./screener.sh --auto-attach up <...>
|
||||
Remember that only sessions without a current attachment will be
|
||||
attached to.
|
||||
|
||||
Attach to a running session:
|
||||
|
||||
./screener.sh attach <screen_id>
|
||||
This is equivalent to screen -x $screen_id
|
||||
|
||||
./screener.sh resume <screen_id>
|
||||
This is equivalent to screen -r $screen_id
|
||||
|
||||
Communication:
|
||||
|
||||
./screener.sh notify <screen_id> <txt>
|
||||
May be called from external scripts to send emails etc.
|
||||
|
||||
Locking (only when supported by <cmd>):
|
||||
|
||||
./screener.sh lock
|
||||
./screener.sh unlock
|
||||
./screener.sh lock <screen_id>
|
||||
./screener.sh unlock <screen_id>
|
||||
|
||||
Cleanup / bookkeeping:
|
||||
|
||||
./screener.sh clear-critical <screen_id>
|
||||
./screener.sh clear-serious <screen_id>
|
||||
./screener.sh clear-interrupted <screen_id>
|
||||
./screener.sh clear-illegal <screen_id>
|
||||
./screener.sh clear-timeouted <screen_id>
|
||||
./screener.sh clear-failed <screen_id>
|
||||
Mark the status as "done" and move the logfile away.
|
||||
|
||||
./screener.sh purge [<days>]
|
||||
This will remove all old logfiles which are older than
|
||||
<days>. By default, the variable $screener_log_purge_period
|
||||
will be used, which is currently set to '30'.
|
||||
|
||||
./screener.sh cron
|
||||
You should call this regularly from a user cron job, in order
|
||||
to purge old logfiles, or to detect hanging sessions, or to
|
||||
automatically send pending emails, etc.
|
||||
|
||||
Options:
|
||||
|
||||
--variable
|
||||
--variable=$value
|
||||
These must come first, in order to prevent mixup with
|
||||
options of <cmd> <args...>.
|
||||
Allows overriding of any internal shell variable.
|
||||
--help --verbose
|
||||
Show all overridable shell variables, also for plugins.
|
||||
|
||||
## screener_includes
|
||||
# List of directories where screener-*.conf files can be found.
|
||||
screener_includes="${screener_includes:-/usr/lib/mars/plugins /etc/mars/plugins $script_dir/plugins $HOME/.mars/plugins ./plugins}"
|
||||
|
||||
## screener_confs
|
||||
# Another list of directories where screener-*.conf files can be found.
|
||||
# These are sourced in a second pass after $screener_includes.
|
||||
# Thus you can change this during the first pass.
|
||||
screener_confs="${screener_confs:-/usr/lib/mars/confs /etc/mars/confs $script_dir/confs $HOME/.mars/confs ./confs}"
|
||||
|
||||
## title
|
||||
# Used as a title for startup of screen sessions, and later for
|
||||
# display at list-*
|
||||
title="${title:-}"
|
||||
|
||||
## auto_attach
|
||||
# Upon start or upon continue/wakeup/up, attach to the
|
||||
# (newly created or existing) session.
|
||||
auto_attach="${auto_attach:-0}"
|
||||
|
||||
## auto_attach_grace
|
||||
# Before attaching, wait this time in seconds.
|
||||
# The user may abort within this sleep time by
|
||||
# pressing Ctrl-C.
|
||||
auto_attach_grace="${auto_attach_grace:-10}"
|
||||
|
||||
## force_attach
|
||||
# Use "screen -x" instead of "screen -r" allowing
|
||||
# shared sessions between different users / end terminals.
|
||||
force_attach="${force_attach:-0}"
|
||||
|
||||
## drop_shell
|
||||
# When a <cmd> fails, the screen session will not be terminated immediately.
|
||||
# Instead, an interactive bash is started, so you can later attach and
|
||||
# rectify any problems.
|
||||
# WARNING! Only activate this if you regularly check for failed sessions
|
||||
# and then manually attach to them. Don't use this when running hundreds
|
||||
# or thousands of sessions in parallel.
|
||||
drop_shell="${drop_shell:-0}"
|
||||
|
||||
## session_timeout
|
||||
# Detect hanging sessions when they don't produce any output anymore
|
||||
# for a longer time. Hanging sessions are then marked as either
|
||||
# 'timeout' or 'critical'.
|
||||
session_timeout="${session_timeout:-$(( 3600 * 3 ))}" # seconds
|
||||
|
||||
## screener_logdir or logdir
|
||||
# Where the logfiles and all status information is kept.
|
||||
export screener_logdir="${screener_logdir:-${logdir:-$HOME/screener-logs}}"
|
||||
|
||||
## screener_command_log
|
||||
# This logfile will accumulate all relevant $0 command invocations,
|
||||
# including timestamps and ssh agent identities.
|
||||
# To switch off, use /dev/null here.
|
||||
screener_command_log="${screener_command_log:-$screener_logdir/commands.log}"
|
||||
|
||||
## screener_cron_log
|
||||
# Since "$0 cron" works silently, you won't notice any errors.
|
||||
# This logfile gives you a chance for checking any problems.
|
||||
screener_cron_log="${screener_cron_log:-$screener_logdir/cron.log}"
|
||||
|
||||
## screener_log_purge_period
|
||||
# $0 cron or $0 purge will automatically remove all old logfiles
|
||||
# from $screener_logdir/*/ when this period is exceeded.
|
||||
screener_log_purge_period="${screener_log_purge_period:-30}" # Days
|
||||
|
||||
## screener_log_purge_archive
|
||||
# When set, the logfiles will be moved to $screener_logdir/archive/
|
||||
# Otherwise they will be deleted.
|
||||
screener_log_purge_archive="${screener_log_purge_archive:-1}"
|
||||
|
||||
## dry_run
|
||||
# Don't actually start screen sessions when set.
|
||||
dry_run="${dry_run:-0}"
|
||||
|
||||
## verbose
|
||||
# Increase verbosity.
|
||||
verbose=${verbose:-0}
|
||||
|
||||
## debug
|
||||
# Some additional debug messages.
|
||||
debug="${debug:-0}"
|
||||
|
||||
## sleep
|
||||
# Workaround races by keeping sessions open for a few seconds.
|
||||
# This is useful for debugging of immediate script failures.
|
||||
# You have some short time window for attaching.
|
||||
# HINT: instead, just inspect the logfiles in $screener_logdir/*/*.log
|
||||
sleep="${sleep:-3}"
|
||||
|
||||
## screen_cmd
|
||||
# Customize the screen command (e.g. add some further options, etc).
|
||||
screen_cmd="${screen_cmd:-screen}"
|
||||
|
||||
## use_screenlog
|
||||
# Add the -L option. Not really useful when running thousands of
|
||||
# parallel screen sessions, because the automatically generated filenames
|
||||
# are crap, and cannot be set in advance.
|
||||
# Useful for basic debugging of setup problems etc.
|
||||
use_screenlog="${use_screenlog:-0}"
|
||||
|
||||
## waiting_txt and delayed_txt and condition_txt
|
||||
# RTFS Don't use this, unless you know what you are doing.
|
||||
waiting_txt="${waiting_txt:-SCREENER_waiting_WAIT}"
|
||||
delayed_txt="${delayed_txt:-SCREENER_delayed_WAIT}"
|
||||
condition_txt="${condition_txt:-SCREENER_condition_WAIT}"
|
||||
|
||||
## critical_status
|
||||
# This is the "magic" exit code indicating _criticality_
|
||||
# of a failed command.
|
||||
critical_status="${critical_status:-199}"
|
||||
|
||||
## serious_status
|
||||
# This is the "magic" exit code indicating _seriousness_
|
||||
# of a failed command.
|
||||
serious_status="${serious_status:-198}"
|
||||
|
||||
## interrupted_status
|
||||
# This is the "magic" exit code indicating a manual interruption
|
||||
# (e.g. keypress Ctrl-C)
|
||||
interrupted_status="${interrupted_status:-190}"
|
||||
|
||||
## illegal_status
|
||||
# This is the "magic" exit code indicating an illegal command
|
||||
# (e.g. syntax error, illegal arguments, etc)
|
||||
illegal_status="${illegal_status:-191}"
|
||||
|
||||
## timeouted_status
|
||||
# This is the "magic" internal code indicating a
|
||||
# hanging session (see $session_timeout).
|
||||
timeouted_status="${timeouted_status:-195}"
|
||||
|
||||
## less_cmd
|
||||
# Used at $0 less $id
|
||||
less_cmd="${less_cmd:-less -r}"
|
||||
|
||||
## date_format
|
||||
# Here you can customize the appearance of list-* commands
|
||||
date_format="${date_format:-%Y-%m-%d %H:%M}"
|
||||
|
||||
## csv_delim
|
||||
# The delimiter used for CSV file parsing
|
||||
csv_delim="${csv_delim:-;}"
|
||||
|
||||
## csv_cmd_fields
|
||||
# Regex telling the field name for 'cmd'
|
||||
csv_cmd_fields="${csv_cmd_fields:-command}"
|
||||
|
||||
## csv_id_fields
|
||||
# Regex telling the field name for 'screen_id'
|
||||
csv_id_fields="${csv_id_fields:-screen_id|resource}"
|
||||
|
||||
## csv_remove
|
||||
# Regex for global removal of command options
|
||||
csv_remove="${csv_remove:---screener}"
|
||||
|
||||
## user_name
|
||||
# Normally automatically derived from ssh agent or from $LOGNAME.
|
||||
# Please override this only when really necessary.
|
||||
export user_name="${user_name:-$(ssh-add -l | grep -oE '[^ ]+@[^ ]+' | sort -u | tail -1)}"
|
||||
export user_name="${user_name:-$LOGNAME}"
|
||||
|
||||
## screener_break_timeout
|
||||
# Avoid deadlocks by breaking a screener lock after this timeout has elapsed.
|
||||
# NOTICE: this type of lock is only intended for short-term locking.
|
||||
screener_break_timeout="${screener_break_timeout:-30}" # seconds
|
||||
|
||||
## tmp_dir and tmp_stub
|
||||
# Where temporary files are residing
|
||||
tmp_dir="${tmp_dir:-/tmp}"
|
||||
tmp_stub="${tmp_stub:-$tmp_dir/screener.$$}"
|
||||
|
||||
Running hook: email_describe_plugin
|
||||
|
||||
PLUGIN screener-email
|
||||
|
||||
Generic plugin for sending emails (or SMS via gateways)
|
||||
upon status changes, such as script failures.
|
||||
|
||||
## email_*
|
||||
# List of email addresses.
|
||||
# Empty = don't send emails.
|
||||
email_critical="${email_critical:-}"
|
||||
email_serious="${email_serious:-}"
|
||||
email_failed="${email_failed:-}"
|
||||
email_warning="${email_warning:-}"
|
||||
email_waiting="${email_waiting:-}"
|
||||
email_done="${email_done:-}"
|
||||
|
||||
## sms_*
|
||||
# List of email addresses of SMS gateways.
|
||||
# These may be distinct from email_*.
|
||||
# Empty = don't send sms.
|
||||
sms_critical="${sms_critical:-}"
|
||||
sms_serious="${sms_serious:-}"
|
||||
sms_failed="${sms_failed:-}"
|
||||
sms_warning="${sms_warning:-}"
|
||||
sms_waiting="${sms_waiting:-}"
|
||||
sms_done="${sms_done:-}"
|
||||
|
||||
## email_cmd
|
||||
# Command for email sending.
|
||||
# Please include your gateways etc here.
|
||||
email_cmd="${email_cmd:-mailx -S smtp=mx.nowhere.org:587 -S smtp-auth-user=test}"
|
||||
|
||||
## email_logfiles
|
||||
# Whether to include logfiles in the body.
|
||||
# Not used for sms_*.
|
||||
email_logfiles="${email_logfiles:-1}"
|
||||
|
||||
\end{verbatim}
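The description of "screener.sh continue" above says that a prepared script creates an empty flagfile and then polls for its removal. The script side of that protocol might look roughly like the sketch below. The function name `wait_for_continue` and the `poll_interval` variable are hypothetical, and it is an assumption that the flagfile lives under `$screener_logdir/running/`, as the example path in the help text suggests.

```shell
#!/bin/sh
# Sketch of the script side of the "continue" protocol.
# Assumptions: the caller has set $screener_logdir and $screen_id;
# the function name and $poll_interval are made up for this example.
wait_for_continue() {
    flagfile="$screener_logdir/running/$screen_id.waiting"
    mkdir -p "$screener_logdir/running"
    : > "$flagfile"                  # create the empty flagfile
    while [ -e "$flagfile" ]; do     # poll until somebody removes it
        sleep "${poll_interval:-1}"
    done
}

# Typical call site inside a long-running script:
#   screener_logdir="$HOME/screener-logs" screen_id="myresource"
#   wait_for_continue    # returns once the flagfile has been removed
```

According to the help text, "screener.sh continue" simply removes the flagfile, so the loop ends at the next poll after a sysadmin runs it.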
|
||||
|
|
|
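The "magic" exit codes listed in the screener help above can be mapped back to the state names used by the list-* and clear-* operations. The sketch below uses the documented default values. The helper name `classify_status` is hypothetical, and treating 0 as "done" and every other code as "failed" is an assumption, not something the help text states.

```shell
#!/bin/sh
# Default "magic" exit codes as listed in the help text; all are overridable.
critical_status="${critical_status:-199}"
serious_status="${serious_status:-198}"
interrupted_status="${interrupted_status:-190}"
illegal_status="${illegal_status:-191}"
timeouted_status="${timeouted_status:-195}"

# Hypothetical helper: map an exit code to a session state name.
classify_status() {
    case "$1" in
        0)                      echo "done" ;;        # assumption
        "$critical_status")     echo "critical" ;;
        "$serious_status")      echo "serious" ;;
        "$interrupted_status")  echo "interrupted" ;;
        "$illegal_status")      echo "illegal" ;;
        "$timeouted_status")    echo "timeouted" ;;
        *)                      echo "failed" ;;      # assumption
    esac
}

classify_status 199
```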
@ -1,200 +1,2 @@
|
|||
\begin{verbatim}
|
||||
./screener.sh: Run _unattended_ processes in screen sessions.
|
||||
Useful for MASS automation, running hundreds of unattended
|
||||
commands in parallel.
|
||||
HINT: for running more than ~500 sessions in parallel, you might need
|
||||
some system tuning (e.g. rlimits, kernel patches etc) for creating
|
||||
a huge number of file descriptors / sockets / etc.
|
||||
ADVANTAGE: You may attach to individual screens, kill them, or continue
|
||||
some waiting commands.
|
||||
|
||||
Synopsis:
|
||||
./screener.sh --help [--verbose]
|
||||
./screener.sh list-running
|
||||
./screener.sh list-waiting
|
||||
./screener.sh list-interrupted
|
||||
./screener.sh list-illegal
|
||||
./screener.sh list-timeouted
|
||||
./screener.sh list-failed
|
||||
./screener.sh list-critical
|
||||
./screener.sh list-serious
|
||||
./screener.sh list-done
|
||||
./screener.sh list
|
||||
./screener.sh list-archive
|
||||
./screener.sh list-screens
|
||||
./screener.sh run <file.csv> [<condition_list>]
|
||||
./screener.sh start <screen_id> <cmd> <args...>
|
||||
./screener.sh [<options>] <operation> <screen_id>
|
||||
|
||||
Inquiry operations:
|
||||
|
||||
./screener.sh list-screens
|
||||
Equivalent to screen -ls
|
||||
|
||||
./screener.sh list-<type>
|
||||
Show a list of currently running, waiting (for continuation), failed,
|
||||
and done/completed screen sessions.
|
||||
|
||||
./screener.sh list
|
||||
First show a list of currently running screens, then
|
||||
for each <type> a list of (old) failed / completed sessions
|
||||
(and so on).
|
||||
|
||||
./screener.sh status <screen_id>
|
||||
Like list-*, but filter <screen_id> and don't report timestamps.
|
||||
|
||||
./screener.sh show <screen_id>
|
||||
Show the last logfile of <screen_id> at standard output.
|
||||
|
||||
./screener.sh less <screen_id>
|
||||
Show the last logfile of <screen_id> using "less -r".
|
||||
|
||||
MASS starting of screen sessions:
|
||||
|
||||
./screener.sh run <file.csv> <condition_list>
|
||||
Commands are launched in screen sessions via "./screener.sh start" commands,
|
||||
unless the same <screen_id> is already running,
|
||||
or is in some error state, or is already done (see below).
|
||||
The commands are given by a column with CSV header name
|
||||
containing "command", or by the first column.
|
||||
The <screen_id> needs to be given by a column with CSV header
|
||||
name matching "screen_id|resource".
|
||||
The number and type of commands to launch can be reduced via
|
||||
any combination of the following filter conditions:
|
||||
|
||||
--max=<number>
|
||||
Limit the number of _new_ sessions additionally started this time.
|
||||
|
||||
--<column_name>==<value>
|
||||
Only select lines where an arbitrary CSV column (given by its
|
||||
CSV header name in C identifier syntax) has the given value.
|
||||
|
||||
--<column_name>!=<value>
|
||||
Only select lines where the column does _not_ have the given value.
|
||||
|
||||
--<column_name>=~<bash_regex>
|
||||
Only select lines where the bash regular expression matches
|
||||
at the given column.
|
||||
|
||||
--max-per=<number>
|
||||
Limit the number per _distinct_ value of the column denoted by
|
||||
the _next_ filter condition.
|
||||
Example: ./screener.sh run test.csv --dry-run --max-per=2 --dst_network=~.
|
||||
would launch only 2 Football processes per destination network.
|
||||
|
||||
Hint: filter conditions can be easily checked by giving --dry-run.
|
||||
|
||||
Start / restart / kill / continue screen sessions:
|
||||
|
||||
./screener.sh start <screen_id> <cmd> <args...>
|
||||
Start a new screen session, running arbitrary <cmd> and <args...>
|
||||
inside.
|
||||
|
||||
./screener.sh restart <screen_id>
|
||||
Works only when the last command for <screen_id> failed.
|
||||
This will restart the old <cmd> and its <args...> as before.
|
||||
Use only when you want to repeat the same command once again.
|
||||
|
||||
./screener.sh kill <screen_id>
|
||||
Terminate the running screen session forcibly.
|
||||
|
||||
./screener.sh continue
|
||||
./screener.sh continue <screen_id> [<screen_id_list>]
|
||||
./screener.sh continue <number>
|
||||
Useful for MASS automation of processes involving critical sections
|
||||
such as customer downtime.
|
||||
When giving a numerical <number> argument, up to that number
|
||||
of sessions are resumed (ordered by age).
|
||||
When no further arugment is given, _all_ currently waiting sessions
|
||||
are continued.
|
||||
When --auto-attach is given, it will sequentially resume the
|
||||
sessions to be continued. By default, unless --force_attach is set,
|
||||
it uses "screen -r" skipping those sessions which are already
|
||||
attached to somebody else.
|
||||
This feature works only with prepared scripts which are creating
|
||||
an empty flagfile
|
||||
/home/schoebel/mars/mars-migration.git/screener-logdir-testing/running/$screen_id.waiting
|
||||
whenever they want to wait for manual intervention (for whatever reason).
|
||||
Afterwards, the script must be polling this flagfile for removal.
|
||||
This screener operation simply removes the flagfile, such that
|
||||
the script will then continue afterwards.
|
||||
Example: look into ./football.sh
|
||||
and search for occurrences of substring "call_hook start_wait".
|
||||
|
||||
./screener.sh wakeup
|
||||
./screener.sh wakeup <screen_id> [<screen_id_list>]
|
||||
./screener.sh wakeup <number>
|
||||
Similar to continue, but refers to delayed commands waiting for
|
||||
a timeout. This can be used to individually shorten the timeout
|
||||
period.
|
||||
Example: Football cleanup operations may be artificially delayed
|
||||
before doing "lvremove", to keep some sort of 'backup' for a
|
||||
limited time. When your project is under time pressure, these
|
||||
delays may be hindering.
|
||||
Use this for premature ending of such artificial delays.
|
||||
|
||||
./screener.sh up <...>
|
||||
Do both continue and wakeup.
|
||||
|
||||
./screener.sh auto <...>
|
||||
Equivalent to ./screener.sh --auto-attach up <...>
|
||||
Remember that only session without current attachment will be
|
||||
attached to.
|
||||
|
||||
Attach to a running session:
|
||||
|
||||
./screener.sh attach <screen_id>
|
||||
This is equivalent to screen -x $screen_id
|
||||
|
||||
./screener.sh resume <screen_id>
|
||||
This is equivalent to screen -r $screen_id
|
||||
|
||||
Communication:
|
||||
|
||||
./screener.sh notify <screen_id> <txt>
|
||||
May be called from external scripts to send emails etc.
|
||||
|
||||
Locking (only when supported by <cmd>):
|
||||
|
||||
./screener.sh lock
|
||||
./screener.sh unlock
|
||||
./screener.sh lock <screen_id>
|
||||
./screener.sh unlock <screen_id>
|
||||
|
||||
Cleanup / bookkeeping:
|
||||
|
||||
./screener.sh clear-critical <screen_id>
|
||||
./screener.sh clear-serious <screen_id>
|
||||
./screener.sh clear-interrupted <screen_id>
|
||||
./screener.sh clear-illegal <screen_id>
|
||||
./screener.sh clear-timeouted <screen_id>
|
||||
./screener.sh clear-failed <screen_id>
|
||||
Mark the status as "done" and move the logfile away.
|
||||
|
||||
./screener.sh purge [<days>]
|
||||
This will remove all old logfiles which are older than
|
||||
<days>. By default, the variable $screener_log_purge_period
|
||||
will be used, which is currently set to '30'.
|
||||
|
||||
./screener.sh cron
|
||||
You should call this regulary from a user cron job, in order
|
||||
to purge old logfiles, or to detect hanging sessions, or to
|
||||
automatically send pending emails, etc.
|
||||
|
||||
Options:
|
||||
|
||||
--variable
|
||||
--variable=$value
|
||||
These must come first, in order to prevent mixup with
|
||||
options of <cmd> <args...>.
|
||||
Allows overriding of any internal shell variable.
|
||||
--help --verbose
|
||||
Show all overridable shell variables, also for plugins.
|
||||
|
||||
|
||||
PLUGIN screener-email
|
||||
|
||||
Generic plugin for sending emails (or SMS via gateways)
|
||||
upon status changes, such as script failures.
|
||||
|
||||
\end{verbatim}
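
The flagfile handshake behind \texttt{continue} can be sketched as follows. This is only a minimal illustration under stated assumptions, not code taken from \texttt{football.sh}: the helper \texttt{wait\_for\_continue}, the temporary directory used in place of the screener running directory, and the background \texttt{rm} standing in for \texttt{./screener.sh continue} are all hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the flagfile handshake described in the help text.
# Assumption: a temporary directory stands in for the screener "running"
# directory; the real location is configured inside screener.sh.
running_dir="$(mktemp -d)"
screen_id="demo"
flagfile="$running_dir/$screen_id.waiting"

# Hypothetical helper: create the empty flagfile to signal that the
# script wants manual intervention, then poll until it has been removed.
wait_for_continue()
{
    local poll="${1:-1}"            # polling interval in seconds
    : > "$flagfile"                 # create the empty flagfile
    while [[ -e "$flagfile" ]]; do  # poll for its removal
        sleep "$poll"
    done
}

# Stand-in for the operator running "./screener.sh continue $screen_id",
# which (according to the help text) simply removes the flagfile.
( sleep 2; rm -f "$flagfile" ) &

wait_for_continue 1
wait                                # reap the background stand-in
result="continued"
echo "$result"
rmdir "$running_dir"
```

A script prepared in this way blocks in its critical section until the flagfile disappears, which is what allows many sessions to be released one by one or in bulk.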