\begin{verbatim}
Thorough documentation is in mars-manual.pdf. Please use the PDF manual
as the authoritative reference! Here is only a short summary of the most
important sub-commands / options:

marsadm [<global_options>] <command> [<resource_name> | all | <args> ]
marsadm [<global_options>] view[-<macroname>] [<resource_name> | all ]

<global_option> =
  --force
    Skip safety checks.
    Use this only when you really know what you are doing!
    Warning! This is dangerous! First try --dry-run.
    Not combinable with 'all'.
  --dry-run
    Don't modify the symlink tree, but tell what would be done.
    Use this before starting potentially harmful actions such as
    'delete-resource'.
  --verbose
    Increase the verbosity of some commands.
  --logger=/path/to/usr/bin/logger
    Use an alternative syslog messenger.
    When empty, disable syslogging.
  --max-deletions=<number>
    When your network or your firewall rules are defective over a
    longer time, too many deletion links may accumulate at
    /mars/todo-global/delete-* and sibling locations.
    This limit prevents overflow of the filesystem as well
    as overloading of the worker threads.
  --thresh-logfiles=<number>
  --thresh-logsize=<number>
    Prevention of too many small logfiles when secondaries are not
    catching up. When more than thresh-logfiles are already present,
    the next one is only created when the last one has at least
    size thresh-logsize (in units of GB).
  --timeout=<seconds>
    Abort safety checks after timeout with an error.
    When giving 'all' as resource argument, this works for each
    resource independently.
  --window=<seconds>
    Treat other cluster nodes as healthy when some communication has
    occurred during the given time window.
  --threshold=<bytes>
    Some macros like 'fetch-threshold-reached' use this for determining
    their sloppiness.
  --host=<hostname>
    Act as if the command was running on cluster node <hostname>.
    Warning! This is dangerous! First try --dry-run.
  --backup-dir=</absolute_path>
    Only for experts.
    Used by several special commands like merge-cluster, split-cluster
    etc. for creating backups of important data.
  --ip=<ip>
    Override the IP address stored in the symlink tree, as well as
    the default IP determined from the list of network interfaces.
    Usually you will need this only at 'create-cluster' or
    'join-cluster' for resolving ambiguities.
  --ssh-port=<port_nr>
    Override the default ssh port (22) for ssh and rsync.
    Useful for running {join,merge}-cluster on non-standard ssh ports.
  --ssh-opts="<ssh_commandline_options>"
    Override the default ssh commandline options. Also used for rsync.
  --macro=<text>
    Handy for testing short macro evaluations at the command line.

<command> =
  attach
    usage: attach <resource_name>
    Attaches the local disk (backing block device) to the resource.
    The disk must have been previously configured at
    {create,join}-resource.
    When designated as a primary, /dev/mars/$res will also appear.
    This does not change the state of {fetch,replay}.
    For a complete local startup of the resource, use 'marsadm up'.

  cat
    usage: cat <path>
    Print internal debug output in human readable form.
    Numerical timestamps and numerical error codes are replaced
    by more readable means.
    Example: marsadm cat /mars/5.total.status

  connect
    usage: connect <resource_name>
    See resume-fetch-local.

  connect-global
    usage: connect-global <resource_name>
    Like resume-fetch-local, but affects all resource members
    in the cluster (remotely).

  connect-local
    usage: connect-local <resource_name>
    See resume-fetch-local.

  create-cluster
    usage: create-cluster (no parameters)
    This must be called exactly once when creating a new cluster.
    Don't call this again! Use join-cluster on the secondary nodes.
    Please read the PDF manual for details.

  create-resource
    usage: create-resource <resource_name> </dev/lv/mydata>
    (further syntax variants are described in the PDF manual).
    Create a new resource out of a pre-existing disk (backing
    block device) /dev/lv/mydata (or similar).
    The current node will start in primary role, thus
    /dev/mars/<resource_name> will appear after a short time, initially
    showing the same contents as the underlying disk /dev/lv/mydata.
    It is good practice to give the resource <resource_name> and the
    disk the same name.

  cron
    usage: cron (no parameters)
    Do all necessary regular housekeeping tasks.
    This is equivalent to log-rotate all; sleep 5; log-delete-all all.

  delete-resource
    usage: delete-resource <resource_name>
    CAUTION! This is dangerous when the network is somehow
    interrupted, or when damaged nodes are later resurrected
    in any way.

    Precondition: the resource must no longer have any members
    (see leave-resource).
    This is only needed when you _insist_ on re-using a damaged
    resource for re-creating a new one with exactly the same
    old <resource_name>.
    HINT: best practice is to not use this, but just create a _new_
    resource with a new <resource_name> out of your local disks.
    Please read the PDF manual on potential consequences.

  detach
    usage: detach <resource_name>
    Detaches the local disk (backing block device) from the
    MARS resource.
    Caution! You may read data from the local disk afterwards,
    but ensure that no data is written to it!
    Otherwise, you are likely to produce harmful inconsistencies.
    When running in primary role, /dev/mars/$res will also disappear.
    This does not change the state of {fetch,replay}.
    For a complete local shutdown of the resource, use 'marsadm down'.

  disconnect
    usage: disconnect <resource_name>
    See pause-fetch-local.

  disconnect-global
    usage: disconnect-global <resource_name>
    Like pause-fetch-local, but affects all resource members
    in the cluster (remotely).

  disconnect-local
    usage: disconnect-local <resource_name>
    See pause-fetch-local.

  down
    usage: down <resource_name>
    Shortcut for detach + pause-sync + pause-fetch + pause-replay.

  get-emergency-limit
    usage: get-emergency-limit <resource_name>
    Counterpart of set-emergency-limit (per-resource emergency limit).

  get-sync-limit-value
    usage: get-sync-limit-value (no parameters)
    For retrieval of the value set by set-sync-limit-value.

  get-systemd-unit
    usage: get-systemd-unit <resource_name>
    Show the systemd units (for start and stop), or empty when unset.

  invalidate
    usage: invalidate <resource_name>
    Only useful on a secondary node.
    Forces MARS to consider the local replica disk as being
    inconsistent, and therefore starting a fast full-sync from
    the currently designated primary node (which must exist;
    therefore avoid the 'secondary' command).
    This is usually needed for resolving emergency mode.
    When having k=2 replicas, this can also be used for
    quick-and-simple split-brain resolution.
    In other cases, or when the split-brain is not resolved by
    this command, please use the 'leave-resource' / 'join-resource'
    method as described in the PDF manual (in the right order as
    described there).

  join-cluster
    usage: join-cluster <hostname_of_primary>
    Establishes a new cluster membership.
    This must be called once on any new cluster member.
    This is a prerequisite for join-resource.

  join-resource
    usage: join-resource <resource_name> </dev/lv/mydata>
    (further syntax variants are described in the PDF manual).
    The resource <resource_name> must have been already created on
    another cluster node, and the network must be healthy.
    The contents of the local replica disk /dev/lv/mydata will be
    overwritten by the initial fast full sync from the currently
    designated primary node.
    After the initial full sync has finished, the current host will
    act in secondary role.
    For details on size constraints etc., refer to the PDF manual.

  leave-cluster
    usage: leave-cluster (no parameters)
    This can be used for final deconstruction of a cluster member.
    Prior to this, all resources must have been left
    via leave-resource.
    Notice: this will never destroy the cluster UID on the /mars/
    filesystem.
    Please read the PDF manual for details.

  leave-resource
    usage: leave-resource <resource_name>
    Precondition: the local host must be in secondary role.
    Stop being a member of the resource, and thus stop all
    replication activities. The status of the underlying disk
    will remain in its current state (whatever it is).

  log-delete
    usage: log-delete <resource_name>
    When possible, globally delete all old transaction logfiles which
    are known to be superfluous, i.e. all secondaries no longer need
    to replay them.
    This must be regularly called by a cron job or similar, in order
    to prevent overflow of the /mars/ directory.
    For regular maintenance cron jobs, please prefer 'marsadm cron'.
    For details and best practices, please refer to the PDF manual.

  log-delete-all
    usage: log-delete-all <resource_name>
    Alias for log-delete.

  log-delete-one
    usage: log-delete-one <resource_name>
    When possible, globally delete at most one old transaction logfile
    which is known to be superfluous, i.e. all secondaries no longer
    need to replay it.
    Hint: use this only for testing and manual inspection.
    For regular maintenance cron jobs, please prefer cron
    or log-delete-all.

  log-purge-all
    usage: log-purge-all <resource_name>
    This is potentially dangerous.
    Use this only if you are really desperate in trying to resolve a
    split brain. Use this only after reading the PDF manual!

  log-rotate
    usage: log-rotate <resource_name>
    Only useful at the primary side.
    Start writing transaction logs into a new transaction logfile.
    This should be regularly called by a cron job or similar.
    For regular maintenance cron jobs, please prefer 'marsadm cron'.
    For details and best practices, please refer to the PDF manual.

  lowlevel-delete-host
    usage: lowlevel-delete-host <resource_name>
    Delete cluster member.

  lowlevel-ls-host-ips
    usage: lowlevel-ls-host-ips <resource_name>
    List cluster member names and IP addresses.

  lowlevel-set-host-ip
    usage: lowlevel-set-host-ip <resource_name>
    Set IP for host.

  merge-cluster
    usage: merge-cluster <hostname_of_other_cluster>
    Precondition: the resource names of both clusters must be disjoint.
    Create the union of two clusters, consisting of the
    union of all machines, and the union of all resources.
    The members of each resource are _not_ changed by this.
    This is useful for creating a big "virtual LVM cluster" where
    resources can be almost arbitrarily migrated between machines via
    later join-resource / leave-resource operations.

  merge-cluster-check
    usage: merge-cluster-check <hostname_of_other_cluster>
    Check whether the resources of both clusters are disjoint.
    Useful for checking in advance whether merge-cluster would be
    possible.

  merge-cluster-list
    usage: merge-cluster-list
    Determine the local list of resources.
    Useful for checking or analysis of merge-cluster disjointness by hand.

  pause-fetch
    usage: pause-fetch <resource_name>
    See pause-fetch-local.

  pause-fetch-global
    usage: pause-fetch-global <resource_name>
    Like pause-fetch-local, but affects all resource members
    in the cluster (remotely).

  pause-fetch-local
    usage: pause-fetch-local <resource_name>
    Stop fetching transaction logfiles from the current
    designated primary.
    This is independent from any {pause,resume}-replay operations.
    Only useful on a secondary node.

  pause-replay
    usage: pause-replay <resource_name>
    See pause-replay-local.

  pause-replay-global
    usage: pause-replay-global <resource_name>
    Like pause-replay-local, but affects all resource members
    in the cluster (remotely).

  pause-replay-local
    usage: pause-replay-local <resource_name>
    Stop replaying transaction logfiles for now.
    This is independent from any {pause,resume}-fetch operations.
    This may be used for freezing the state of your replica for some
    time, if you have enough space on /mars/.
    Only useful on a secondary node.

  pause-sync
    usage: pause-sync <resource_name>
    See pause-sync-local.

  pause-sync-global
    usage: pause-sync-global <resource_name>
    Like pause-sync-local, but affects all resource members
    in the cluster (remotely).

  pause-sync-local
    usage: pause-sync-local <resource_name>
    Pause the initial data sync at current stage.
    This has only an effect if a sync is actually running (i.e.
    there is something to be actually synced).
    Don't pause too long, because the local replica will remain
    inconsistent during the pause.
    Use this only for limited reduction of system load.
    Only useful on a secondary node.

  primary
    usage: primary <resource_name>
    Promote the resource into primary role.
    This is necessary for /dev/mars/$res to appear on the local host.
    Notice: by concept there can be only _one_ designated primary
    in a cluster at the same time.
    The role change is automatically distributed to the other nodes
    in the cluster, provided that the network is healthy.
    The old primary node will _automatically_ go
    into secondary role first. This is different from DRBD!
    With MARS, you don't need an intermediate 'secondary' command
    for switching roles.
    It is usually better to _directly_ switch the primary roles
    between both hosts.
    When --force is not given, a planned handover is started:
    the local host will only become actually primary _after_ the
    old primary is gone, and all old transaction logs have been
    fetched and replayed at the new designated primary.
    When --force is given, no handover is attempted. As a consequence,
    a split brain situation is likely to emerge.
    Thus, use --force only after an ordinary handover attempt has
    failed, and when you don't care about the split brain.
    For more details, please refer to the PDF manual.

  resize
    usage: resize <resource_name>
    Prerequisite: all underlying disks (usually /dev/vg/$res) must
    have been already increased, e.g. at the LVM layer (cf. lvresize).
    Causes MARS to re-examine all sizing constraints on all members of
    the resource, and increase the global logical size of the resource
    accordingly.
    Shrinking is currently not yet implemented.
    When successful, /dev/mars/$res at the primary will be increased
    in size. In addition, all secondaries will start an incremental
    fast full-sync to get the enlarged parts from the primary.

  resume-fetch
    usage: resume-fetch <resource_name>
    See resume-fetch-local.

  resume-fetch-global
    usage: resume-fetch-global <resource_name>
    Like resume-fetch-local, but affects all resource members
    in the cluster (remotely).

  resume-fetch-local
    usage: resume-fetch-local <resource_name>
    Start fetching transaction logfiles from the current
    designated primary node, if there is one.
    This is independent from any {pause,resume}-replay operations.
    Only useful on a secondary node.

  resume-replay
    usage: resume-replay <resource_name>
    See resume-replay-local.

  resume-replay-global
    usage: resume-replay-global <resource_name>
    Like resume-replay-local, but affects all resource members
    in the cluster (remotely).

  resume-replay-local
    usage: resume-replay-local <resource_name>
    Restart replaying transaction logfiles, when there is some
    data left.
    This is independent from any {pause,resume}-fetch operations.
    This should be used for unfreezing the state of your local replica.
    Only useful on a secondary node.

  resume-sync
    usage: resume-sync <resource_name>
    See resume-sync-local.

  resume-sync-global
    usage: resume-sync-global <resource_name>
    Like resume-sync-local, but affects all resource members
    in the cluster (remotely).

  resume-sync-local
    usage: resume-sync-local <resource_name>
    Resume any initial / incremental data sync at the stage where it
    had been interrupted by pause-sync.
    Only useful on a secondary node.

  secondary
    usage: secondary <resource_name>
    Demote all cluster members into secondary role, globally.
    In contrast to DRBD, this is not needed as an intermediate step
    for planned handover between an old and a new primary node.
    The only reasonable usage is before the last leave-resource of the
    last cluster member, immediately before leave-cluster is executed
    for final deconstruction of the cluster.
    In all other cases, please prefer 'primary' for direct handover
    between cluster nodes.
    Notice: 'secondary' sets the global designated primary node
    to '(none)' which in turn prevents the execution of 'invalidate'
    or 'join-resource' or 'resize' anywhere in the cluster.
    Therefore, don't unnecessarily give 'secondary'!

  set-emergency-limit
    usage: set-emergency-limit <resource_name> <value>
    Set a per-resource emergency limit for disk space in /mars.
    See PDF manual for details.

  set-sync-limit-value
    usage: set-sync-limit-value <new_value>
    Set the maximum number of resources which should be syncing
    concurrently.

  set-systemd-unit
    usage: set-systemd-unit <resource_name> <start_unit_name> [<stop_unit_name>]
    This activates the systemd template engine of marsadm.
    Please read mars-manual.pdf on this.
    When <stop_unit_name> is omitted, it will be treated equal to
    <start_unit_name>.

  split-cluster
    usage: split-cluster (no parameters)
    NOT OFFICIALLY SUPPORTED - ONLY FOR EXPERTS.
    RTFS = Read The Fucking Sourcecode.
    Use this only if you know what you are doing.

  up
    usage: up <resource_name>
    Shortcut for attach + resume-sync + resume-fetch + resume-replay.

  wait-cluster
    usage: wait-cluster [<resource_name>]
    Waits until a ping-pong communication has succeeded in the
    whole cluster (or only the members of <resource_name>).
    NOTICE: this is extremely useful for avoiding races when scripting
    in a cluster.

  wait-connect
    usage: wait-connect [<resource_name>]
    See wait-cluster.

  wait-resource
    usage: wait-resource <resource_name>
                         [[attach|fetch|replay|sync][-on|-off]]
    Wait until the given condition is met on the resource, locally.

  wait-umount
    usage: wait-umount <resource_name>
    Wait until /dev/mars/<resource_name> has disappeared in the
    cluster (even remotely).
    Useful on both primary and secondary nodes.

<resource_name> = name of resource or "all" for all resources


<macroname> = <complex_macroname> | <primitive_macroname>

<complex_macroname> =
  1and1
  comminfo
  commstate
  cstate
  default
  default-global
  diskstate
  diskstate-1and1
  dstate
  fetch-line
  fetch-line-1and1
  flags
  flags-1and1
  outdated-flags
  outdated-flags-1and1
  primarynode
  primarynode-1and1
  replay-line
  replay-line-1and1
  replinfo
  replinfo-1and1
  replstate
  replstate-1and1
  resource-errors
  resource-errors-1and1
  role
  role-1and1
  state
  status
  sync-line
  sync-line-1and1
  syncinfo
  syncinfo-1and1
  todo-role


<primitive_macroname> =
  deletable-size
  device-opened
  errno-text
    Convert errno numbers (positive or negative) into human readable text.
  get-log-status
  get-resource-{fat,err,wrn}{,-count}
  get-{disk,device}
  is-{alive}
  is-{split-brain,consistent,emergency}
  occupied-size
  present-{disk,device}
    (deprecated, use *-present instead)
  replay-basenr
  replay-code
    When negative, this indicates that a replay/recovery error has occurred.
  rest-space
  summary-vector
  systemd-unit
  tree
  uuid
  wait-{is,todo}-{attach,sync,fetch,replay,primary}-{on,off}
  {alive,fetch,replay,work}-{timestamp,age,lag}
  {all,the}-{pretty-,}{global-,}{{err,wrn,inf}-,}msg
  {cluster,resource}-members
  {disk,device}-present
  {disk,resource,device}-size
  {fetch,replay,work}-{lognr,logcount}
  {get,actual}-primary
  {is,todo}-{attach,sync,fetch,replay,primary}
  {my,all}-resources
  {sync,fetch,replay,work,syncpos}-{size,pos}
  {sync,fetch,replay,work}-{rest,{almost-,threshold-,}reached,percent,permille,vector}
  {sync,fetch,replay}-{rate,remain}
  {time,real-time}
\end{verbatim}
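The planned handover described under 'primary' could be scripted roughly as follows. This is a minimal sketch and not part of marsadm: the function name handover and the MARSADM variable are made up for illustration, and MARSADM defaults to an echo prefix so that the commands are only printed. The sketch waits for cluster communication, previews the role switch with the dry-run option, and then requests the switch without force.

```shell
#!/bin/sh
# Sketch only (hypothetical wrapper, not shipped with MARS).
# MARSADM defaults to "echo marsadm", so the commands are printed, not run.
MARSADM="${MARSADM:-echo marsadm}"

# handover <resource_name>
# Planned handover of the primary role to the local host, without --force.
handover() {
    res="$1"
    if [ -z "$res" ]; then
        echo "usage: handover <resource_name>" >&2
        return 1
    fi
    # Avoid races: wait until communication with the resource members works.
    $MARSADM wait-cluster "$res" || return 1
    # Preview first; --dry-run does not modify the symlink tree.
    $MARSADM --dry-run primary "$res" || return 1
    # Request the role switch; no intermediate 'secondary' is needed.
    $MARSADM primary "$res"
}
```

Calling the function with a resource name such as mydata prints the three marsadm command lines in order. Setting MARSADM=marsadm in the environment would run the real commands; consult the PDF manual before doing so.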
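The sequence that the help text gives as equivalent to 'cron' can be spelled out as a small script. Again this is only a sketch under assumptions: the names housekeeping, MARSADM and PAUSE are hypothetical, MARSADM defaults to an echo prefix so that nothing is executed, and for regular maintenance the help text recommends 'marsadm cron' itself.

```shell
#!/bin/sh
# Sketch only (hypothetical wrapper, not shipped with MARS).
# MARSADM defaults to "echo marsadm", so the commands are printed, not run.
MARSADM="${MARSADM:-echo marsadm}"
# Seconds to wait between rotation and deletion; the help text mentions 5.
PAUSE="${PAUSE:-5}"

# housekeeping [<resource_name> | all]
# Rotate the transaction logfile, wait, then delete superfluous logfiles.
housekeeping() {
    res="${1:-all}"
    $MARSADM log-rotate "$res" || return 1
    sleep "$PAUSE"
    $MARSADM log-delete-all "$res"
}
```

Without an argument the function acts on all resources, mirroring the 'all' argument used in the help text.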