mars/userspace/marsadm.8

560 lines
21 KiB
Groff

.TH marsadm 8 "March 06, 2015" "MARS light0.1stable12\-7\-gc2f5f29" "MARS Admin Tool"
.SH NAME
marsadm \- Administration tool for MARS.\" marsadm
.SH SYNOPSIS
.B marsadm \-h \-\-help
.br
.B marsadm \-v \-\-version
.br
.B marsadm {\fIglobal_option\fR} \fIcommand\fR [\fIresource_name\fR | all | {arg}]
.br
.B marsadm {\fIglobal_option\fR} view[\-\fImacroname\fR] [\fIresource_name\fR | all]
.br
.SH DESCRIPTION
MARS is a kernel\-level asynchronous replication system for block devices.
It is tailored for long\-distance replication and through network
bottlenecks.
By default, it works asynchronously (in contrast to DRBD).
.br
More infomation on concepts and differences to DRDB can be found at
https://github.com/schoebel/mars/blob/master/docu/MARS_LinuxTag2014.pdf?raw=true
.SH OPTIONS
\fB\-\-force\fR
Skip safety checks.
Use this only when you really know what you are doing!
Warning! This is dangerous! First try \fB\-\-dry\-run\fR.
Not combinable with 'all'.
\fB\-\-dry\-run\fR
Don't modify the symlink tree, but tell what would be done.
Use this before starting potentially harmful actions such as
'delete\-resource'.
\fB\-\-verbose\fR
Increase speakyness of some commands.
\fB\-\-logger\fR=/path/to/usr/bin/logger
Use an alternative syslog messenger.
When empty, disable syslogging.
\fB\-\-timeout\fR=\fIseconds\fR
Abort safety checks after timeout with an error.
When giving 'all' as resource agument, this works for each
resource independently.
\fB\-\-window\fR=\fIseconds\fR
Treat other cluster nodes as healthy when some communcation has
occured during the given time window.
\fB\-\-threshold\fR=\fIbytes\fR
Some macros like 'fetch\-threshold\-reached' use this for determining
their sloppyness.
\fB\-\-host\fR=\fIhostname\fR
Act as if the command was running on cluster node \fIhostname\fR.
Warning! This is dangerous! First try \fB\-\-dry\-run\fR
\fB\-\-ip\fR=\fIip\fR
Override the IP address stored in the symlink tree, as well as
the default IP determined from the list of network interfaces.
Usually you will need this only at 'create\-cluster' or
'join\-cluster' for resolving ambiguities.
\fB\-\-macro\fR=\fItext\fR
Handy for testing short macro evaluations at the command line.
.SH ORDINARY COMMANDS
\fB attach\fR
usage: attach \fIresource_name\fR
Attaches the local disk (backing block device) to the resource.
The disk must have been previously configured at
{create,join}\-resource.
When designated as a primary, /dev/mars/$res will also appear.
This does not change the state of {fetch,replay}.
For a complete local startup of the resource, use 'marsadm up'.
\fB cat\fR
usage: cat \fIpath\fR
Print internal debug output in human readable form.
Numerical timestamps and numerical error codes are replaced
by more readable means.
Example: marsadm cat /mars/5.total.status
\fB connect\fR
usage: connect \fIresource_name\fR
See resume\-fetch\-local.
\fB connect\-global\fR
usage: connect\-global \fIresource_name\fR
Like resume\-fetch\-local, but affects all resource members
in the cluster (remotely).
\fB connect\-local\fR
usage: connect\-local \fIresource_name\fR
See resume\-fetch\-local.
\fB create\-cluster\fR
usage: create\-cluster (no parameters)
This must be called exactly once when creating a new cluster.
Don't call this again! Use join\-cluster on the secondary nodes.
Please read the PDF manual for details.
\fB create\-resource\fR
usage: create\-resource \fIresource_name\fR \fI/dev/lv/mydata\fR
(further syntax variants are described in the PDF manual).
Create a new resource out of a pre\-existing disk (backing
block device) /dev/lv/mydata (or similar).
The current node will start in primary role, thus
/dev/mars/\fIresource_name\fR will appear after a short time, initially
showing the same contents as the underlying disk /dev/lv/mydata.
It is good practice to name the resource \fIresource_name\fR and the
disk name identical.
\fB delete\-resource\fR
usage: delete\-resource \fIresource_name\fR
CAUTION! This is dangerous when the network is somehow
interrupted, or when damaged nodes are later re\-surrected
in any way.
Precondition: the resource must no longer have any members
(see leave\-resource).
This is only needed when you _insist_ on re\-using a damaged
resource for re\-creating a new one with exactly the same
old \fIresource_name\fR.
HINT: best practice is to not use this, but just create a _new_
resource with a new \fIresource_name\fR out of your local disks.
Please read the PDF manual on potential consequences.
\fB detach\fR
usage: detach \fIresource_name\fR
Detaches the local disk (backing block device) from the
MARS resource.
Caution! you may read data from the local disk afterwards,
but ensure that no data is written to it!
Otherwise, you are likely to produce harmful inconsistencies.
When running in primary role, /dev/mars/$res will also disappear.
This does not change the state of {fetch,replay}.
For a complete local shutdown of the resource, use 'marsadm down'.
\fB disconnect\fR
usage: disconnect \fIresource_name\fR
See pause\-fetch\-local.
\fB disconnect\-global\fR
usage: disconnect\-global \fIresource_name\fR
Like pause\-fetch\-local, but affects all resource members
in the cluster (remotely).
\fB disconnect\-local\fR
usage: disconnect\-local \fIresource_name\fR
See pause\-fetch\-local.
\fB down\fR
usage: down \fIresource_name\fR
Shortcut for detach + pause\-sync + pause\-fetch + pause\-replay.
\fB get\-emergency\-limit\fR
usage: get\-emergency\-limit \fIresource_name\fR
Counterpart of set\-emergency\-limit (per\-resource emergency limit)
\fB get\-sync\-limit\-value\fR
usage: get\-sync\-limit\-value (no parameters)
For retrieval of the value set by set\-sync\-limit\-value.
\fB invalidate\fR
usage: invalidate \fIresource_name\fR
Only useful on a secondary node.
Forces MARS to consider the local replica disk as being
inconsistent, and therefore starting a fast full\-sync from
the currently designated primary node (which must exist;
therefore avoid the 'secondary' command).
This is usually needed for resolving emergency mode.
When having k=2 replicas, this can be also used for
quick\-and\-simple split\-brain resolution.
In other cases, or when the split\-brain is not resolved by
this command, please use the 'leave\-resource' / 'join\-resource'
method as described in the PDF manual (in the right order as
described there).
\fB join\-cluster\fR
usage: join\-cluster \fIhostname_of_primary\fR
Establishes a new cluster membership.
This must be called once on any new cluster member.
This is a prerequisite for join\-resource.
\fB join\-resource\fR
usage: join\-resource \fIresource_name\fR \fI/dev/lv/mydata\fR
(further syntax variants are described in the PDF manual).
The resource \fIresource_name\fR must have been already created on
another cluster node, and the network must be healthy.
The contents of the local replica disk /dev/lv/mydata will be
overwritten by the initial fast full sync from the currently
designated primary node.
After the initial full sync has finished, the current host will
act in secondary role.
For details on size constraints etc, refer to the PDF manual.
\fB leave\-cluster\fR
usage: leave\-cluster (no parameters)
This can be used for final deconstruction of a cluster member.
Prior to this, all resources must have been left
via leave\-resource.
Notice: this will never destroy the cluster UID on the /mars/
filesystem.
Please read the PDF manual for details.
\fB leave\-resource\fR
usage: leave\-resource \fIresource_name\fR
Precondition: the local host must be in secondary role.
Stop being a member of the resource, and thus stop all
replication activities. The status of the underlying disk
will remain in its current state (whatever it is).
\fB log\-delete\fR
usage: log\-delete \fIresource_name\fR
When possible, globally delete at most one old transaction logfile
which is known to be superfluous, i.e. all secondaries no longer
need to replay it.
Hint: use this only for testing and manual inspection.
For regular maintainance cron jobs, please prefer log\-delete\-all.
\fB log\-delete\-all\fR
usage: log\-delete\-all \fIresource_name\fR
When possible, globally delete all old transaction logfiles which
are known to be superflous, i.e. all secondaries no longer need
to replay them.
This must be regularly called by a cron job or similar, in order
to prevent overflow of the /mars/ directory.
For details and best practices, please refer to the PDF manual.
\fB log\-purge\-all\fR
usage: log\-purge\-all \fIresource_name\fR
This is potentially dangerous.
Use this only if you are really desperate in trying to resolve a
split brain. Use this only after reading the PDF manual!
\fB log\-rotate\fR
usage: log\-rotate \fIresource_name\fR
Only useful at the primary side.
Start writing transaction logs into a new transaction logfile.
This must be regularly called by a cron job or similar.
For details and best practices, please refer to the PDF manual.
\fB pause\-fetch\fR
usage: pause\-fetch \fIresource_name\fR
See pause\-fetch\-local.
\fB pause\-fetch\-global\fR
usage: pause\-fetch\-global \fIresource_name\fR
Like pause\-fetch\-local, but affects all resource members
in the cluster (remotely).
\fB pause\-fetch\-local\fR
usage: pause\-fetch\-local \fIresource_name\fR
Stop fetching transaction logfiles from the current
designated primary.
This is independent from any {pause,resume}\-replay operations.
Only useful on a secondary node.
\fB pause\-replay\fR
usage: pause\-replay \fIresource_name\fR
See pause\-replay\-local.
\fB pause\-replay\-global\fR
usage: pause\-replay\-global \fIresource_name\fR
Like pause\-replay\-local, but affects all resource members
in the cluster (remotely).
\fB pause\-replay\-local\fR
usage: pause\-replay\-local \fIresource_name\fR
Stop replaying transaction logfiles for now.
This is independent from any {pause,resume}\-fetch operations.
This may be used for freezing the state of your replica for some
time, if you have enough space on /mars/.
Only useful on a secondary node.
\fB pause\-sync\fR
usage: pause\-sync \fIresource_name\fR
See pause\-sync\-local.
\fB pause\-sync\-global\fR
usage: pause\-sync\-global \fIresource_name\fR
Like pause\-sync\-local, but affects all resource members
in the cluster (remotely).
\fB pause\-sync\-local\fR
usage: pause\-sync\-local \fIresource_name\fR
Pause the initial data sync at current stage.
This has only an effect if a sync is actually running (i.e.
there is something to be actually synced).
Don't pause too long, because the local replica will remain
inconsistent during the pause.
Use this only for limited reduction of system load.
Only useful on a secondary node.
\fB primary\fR
usage: primary \fIresource_name\fR
Promote the resource into primary role.
This is necessary for /dev/mars/$res to appear on the local host.
Notice: by concept there can be only _one_ designated primary
in a cluster at the same time.
The role change is automatically distributed to the other nodes
in the cluster, provided that the network is healthy.
The old primary node will _automatically_ go
into secondary role first. This is different from DRBD!
With MARS, you don't need an intermediate 'secondary' command
for switching roles.
It is usually better to _directly_ switch the primary roles
between both hosts.
When \-\-force is not given, a planned handover is started:
the local host will only become actually primary _after_ the
old primary is gone, and all old transaction logs have been
fetched and replayed at the new designated priamry.
When \-\-force is given, no handover is attempted. A a consequence,
a split brain situation is likely to emerge.
Thus, use \-\-force only after an ordinary handover attempt has
failed, and when you don't care about the split brain.
For more details, please refer to the PDF manual.
\fB resize\fR
usage: resize \fIresource_name\fR
Prerequisite: all underlying disks (usually /dev/vg/$res) must
have been already increased, e.g. at the LVM layer (cf. lvresize).
Causes MARS to re\-examine all sizing constraints on all members of
the resource, and increase the global logical size of the resource
accordingly.
Shrinking is currently not yet implemented.
When successful, /dev/mars/$res at the primary will be increased
in size. In addition, all secondaries will start an incremental
fast full\-sync to get the enlarged parts from the primary.
\fB resume\-fetch\fR
usage: resume\-fetch \fIresource_name\fR
See resume\-fetch\-local.
\fB resume\-fetch\-global\fR
usage: resume\-fetch\-global \fIresource_name\fR
Like resume\-fetch\-local, but affects all resource members
in the cluster (remotely).
\fB resume\-fetch\-local\fR
usage: resume\-fetch\-local \fIresource_name\fR
Start fetching transaction logfiles from the current
designated primary node, if there is one.
This is independent from any {pause,resume}\-replay operations.
Only useful on a secondary node.
\fB resume\-replay\fR
usage: resume\-replay \fIresource_name\fR
See resume\-replay\-local.
\fB resume\-replay\-global\fR
usage: resume\-replay\-global \fIresource_name\fR
Like resume\-replay\-local, but affects all resource members
in the cluster (remotely).
\fB resume\-replay\-local\fR
usage: resume\-replay\-local \fIresource_name\fR
Restart replaying transaction logfiles, when there is some
data left.
This is independent from any {pause,resume}\-fetch operations.
This should be used for unfreezing the state of your local replica.
Only useful on a secondary node.
\fB resume\-sync\fR
usage: resume\-sync \fIresource_name\fR
See resume\-sync\-local.
\fB resume\-sync\-global\fR
usage: resume\-sync\-global \fIresource_name\fR
Like resume\-sync\-local, but affects all resource members
in the cluster (remotely).
\fB resume\-sync\-local\fR
usage: resume\-sync\-local \fIresource_name\fR
Resume any initial / incremental data sync at the stage where it
had been interrupted by pause\-sync.
Only useful on a secondary node.
\fB secondary\fR
usage: secondary \fIresource_name\fR
Promote all cluster members into secondary role, globally.
In contrast to DRBD, this is not needed as an intermediate step
for planned handover between an old and a new primary node.
The only reasonable usage is before the last leave\-resource of the
last cluster member, immediately before leave\-cluster is executed
for final deconstruction of the cluster.
In all other cases, please prefer 'primary' for direct handover
between cluster nodes.
Notice: 'secondary' sets the global designated primary node
to '(none)' which in turn prevents the execution of 'invalidate'
or 'join\-resource' or 'resize' anywhere in the cluster.
Therefore, don't unnecessarily give 'secondary'!
\fB set\-emergency\-limit\fR
usage: set\-emergency\-limit \fIresource_name\fR \fIvalue\fR
Set a per\-resource emergency limit for disk space in /mars.
See PDF manual for details.
\fB set\-sync\-limit\-value\fR
usage: set\-sync\-limit\-value \fInew_value\fR
Set the maximum number of resources which should by syncing
concurrently.
\fB up\fR
usage: up \fIresource_name\fR
Shortcut for attach + resume\-sync + resume\-fetch + resume\-replay.
\fB wait\-cluster\fR
usage: wait\-resource [\fIresource_name\fR]
Waits until a ping\-pong communication has succeeded in the
whole cluster (or only the members of \fIresource_name\fR).
NOTICE: this is extremely useful for avoiding races when scripting
in a cluster.
\fB wait\-connect\fR
usage: wait\-connect [\fIresource_name\fR]
See wait\-cluster.
\fB wait\-resource\fR
usage: wait\-resource \fIresource_name\fR
[[attach|fetch|replay|sync][\-on|\-off]]
Wait until the given condition is met on the resource, locally.
\fB wait\-umount\fR
usage: wait\-umount \fIresource_name\fR
Wait until /dev/mars/\fIresource_name\fR has disappeared in the
cluster (even remotely).
Useful on both primary and secondary nodes.
.SH EXPERT COMMANDS
\fB delete\-file\fR
usage: delete\-file \fIpath\fR
VERY dangerous!
Only for experts.
\fB fake\-sync\fR
usage: fake\-sync \fIresource_name\fR
Attention: this is potentially dangerous.
Only for experts.
Please read the PDF manual to understand the risks!
\fB get\-link\fR
usage: get\-link \fIpath\fR
Only for experts.
Will disappear in a future MARS release.
\fB set\-link\fR
usage: set\-link \fIpath\fR \fIvalue\fR
Only for experts.
Will disappear in a future MARS release.
\fB set\-replay\fR
usage: set\-replay \fIresource_name\fR
VERY dangerous!
Only for experts.
.SH 1&1 INTERNAL COMMANDS
\fB get\-connect\-pref\-list\fR
usage: get\-connect\-pref\-list \fIresource_name\fR
Provisionary command for internal use at 1&1. Will be replaced by
a better concept somewhen.
Shows the outcome of set\-connect\-pref\-list.
\fB get\-sync\-pref\-list\fR
usage: get\-sync\-pref\-list \fIresource_name\fR
Provisionary command for internal use at 1&1. Will be replaced by
a better concept somewhen. Shows the outcome of set\-sync\-pref\-list.
\fB set\-connect\-pref\-list\fR
usage: set\-connect\-pref\-list \fIresource_name\fR \fIhost_list\fR
Provisionary command for internal use at 1&1. Will be replaced by
a better concept somewhen. The \fIhost_list\fR must be comma\-separated.
\fB set\-sync\-pref\-list\fR
usage: set\-sync\-pref\-list \fIresource_name\fR \fIhost_list\fR
Provisionary command for internal use at 1&1. Will be replaced by
a better concept somewhen. The \fIhost_list\fR must be comma\-separated.
.SH DEPRECATED COMMANDS
\fB create\-uuid\fR
usage: create\-uuid (no parameters)
Deprecated.
This is only needed if you have a very old /mars/
directory structure from MARS version light0.1beta05
or earlier.
\fB role\fR
usage: role \fIresource_name\fR
Deprecated.
Please use the macro command 'view\-role' instead.
For even better summary information, use plain 'view'.
\fB show\fR
usage: show \fIresource_name\fR
Deprecated old low\-level tool. Don't use. Use macros instead.
\fB show\-errors\fR
usage: show\-errors \fIresource_name\fR
Deprecated old low\-level tool. Don't use. Use macros instead.
\fB show\-info\fR
usage: show\-info \fIresource_name\fR
Deprecated old low\-level tool. Don't use. Use macros instead.
\fB show\-state\fR
usage: show\-state \fIresource_name\fR
Deprecated old low\-level tool. Don't use. Use macros instead.
\fB state\fR
usage: state \fIresource_name\fR
Deprecated.
Please use the macro command 'view\-role' instead.
For even better summary information, use plain 'view'.
.SH SELECTED MACROS
This is a small selection of some useful macros for humans. For a full list and for detailed information as well as for scripting instructions, please refer to the PDF manual.
\fB view all\fR
Show standard information about the local state of MARS at the local host.
\fB view\-replstate all\fR
Show only the replication state part of plain view.
\fB view\-flags all\fR
Show only the flags part from plain view.
\fB view\-primarynode all\fR
Display (none) or the hostname of the current designated primary.
\fB view\-the\-pretty\-err\-msg all\fR
Show reported error messages.
\fB view\-the\-pretty\-wrn\-msg all\fR
Show reported warnings.
\fB view\-is\-emergency all\fR
Tell whether emergency mode has been entered.
\fB view\-rest\-space\fR
Show the internal rest space calculations used for calculating emergency mode. This value should not go down to 0.
\fB view\-get\-disk all\fR
Show the underlying disk name for each resource.
.SH AUTHOR
Written by Thomas Schoebel\-Theuer.
.SH COPYRIGHT
Copyright 2010\-2015 Thomas Schoebel\-Theuer
Copyright 2011\-2015 1&1 Internet AG
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE.
.SH NOTES
http://schoebel.github.io/mars/
http://github.com/schoebel/mars/