\begin{verbatim} Thorough documentation is in mars-user-manual.pdf. Please use the PDF manual as authoritative reference! Here is only a short summary of the most important sub-commands / options: marsadm [] [ | all | ] marsadm [] view[-] [ | all ] = --force Skip safety checks. Use this only when you really know what you are doing! Warning! This is dangerous! First try --dry-run. Not combinable with 'all'. --ignore-sync Allow primary handover even when some sync is running somewhere. This is less rude than --force because it checks for all else preconditions. --dry-run Don't modify the symlink tree, but tell what would be done. Use this before starting potentially harmful actions such as 'delete-resource'. --verbose Increase speakyness of some commands. --parallel Only resonable when combined with "all". For each resource, fork() a sub-process running independently from other resources. May seepd up handover a lot. However, several cluster managers are known to have problems with a high parallelism degree (up to deadlocks). Only use this after thorough testing in combination with your whole operation stack! Turns off --singlestep. --parallel= Like --parallel, but limit the parallelism degree to the given number of parallel processes. Turns off --singlestep. --singlestep Debugging aid for multi-phase commands. Interactively step through the various phases of commands. Turns off --parallel. --delete-method= EXPERIMENTAL! Only for testing! This option will disappear again! == 0: Use new deletion method == 1: Use old deletion method default is 1 for compatibility. --logger=/path/to/usr/bin/logger Use an alternative syslog messenger. When empty, disable syslogging. --max-deletions= When your network or your firewall rules are defective over a longer time, too many deletion links may accumulate at /mars/todo-global/delete-* and sibling locations. This limit is preventing overflow of the filesystem as well as overloading the worker threads. --thresh-logfiles= --thresh-logsize= Prevention of too many small logfiles when secondaries are not catching up. When more than thresh-logfiles are already present, the next one is only created when the last one has at least size thresh-logsize (in units of GB). --timeout= Current default: 600 Abort safety checks and waiting loops after timeout with an error. When giving 'all' as resource agument, this works for each resource independently. The special value -1 means "infinite". --window= Current default: 60 Treat other cluster nodes as healthy when some communcation has occured during the given time window. --threshold= Some macros like 'fetch-threshold-reached' use this for determining their sloppyness. --host= Act as if the command was running on cluster node . Warning! This is dangerous! First try --dry-run --backup-dir= Only for experts. Used by several special commands like merge-cluster, split-cluster etc for creating backups of important data. --ip= Override the IP address stored in the symlink tree, as well as the default IP determined from the list of network interfaces. Usually you will need this only at 'create-cluster' or 'join-cluster' for resolving ambiguities. --ssh-port= Override the default ssh port (22) for ssh and rsync. Useful for running {join,merge}-cluster on non-standard ssh ports. --ssh-opts="" Override the default ssh commandline options. Also used for rsync. --macro= Handy for testing short macro evaluations at the command line. = attach usage: attach Attaches the local disk (backing block device) to the resource. The disk must have been previously configured at {create,join}-resource. When designated as a primary, /dev/mars/$res will also appear. This does not change the state of {fetch,replay}. For a complete local startup of the resource, use 'marsadm up'. cat usage: cat Print internal debug output in human readable form. Numerical timestamps and numerical error codes are replaced by more readable means. Example: marsadm cat /mars/5.total.status connect usage: connect See resume-fetch-local. connect-global usage: connect-global Like resume-fetch-local, but affects all resource members in the cluster (remotely). connect-local usage: connect-local See resume-fetch-local. create-cluster usage: create-cluster (no parameters) This must be called exactly once when creating a new cluster. Don't call this again! Use join-cluster on the secondary nodes. Please read the PDF manual for details. create-resource usage: create-resource (further syntax variants are described in the PDF manual). Create a new resource out of a pre-existing disk (backing block device) /dev/lv/mydata (or similar). The current node will start in primary role, thus /dev/mars/ will appear after a short time, initially showing the same contents as the underlying disk /dev/lv/mydata. It is good practice to name the resource and the disk name identical. cron usage: cron (no parameters) Do all necessary regular housekeeping tasks. This is equivalent to log-rotate all; sleep 5; log-delete-all all. delete-resource usage: delete-resource CAUTION! This is dangerous when the network is somehow interrupted, or when damaged nodes are later re-surrected in any way. Precondition: the resource must no longer have any members (see leave-resource). This is only needed when you _insist_ on re-using a damaged resource for re-creating a new one with exactly the same old . HINT: best practice is to not use this, but just create a _new_ resource with a new out of your local disks. Please read the PDF manual on potential consequences. detach usage: detach Detaches the local disk (backing block device) from the MARS resource. Caution! you may read data from the local disk afterwards, but ensure that no data is written to it! Otherwise, you are likely to produce harmful inconsistencies. When running in primary role, /dev/mars/$res will also disappear. This does not change the state of {fetch,replay}. For a complete local shutdown of the resource, use 'marsadm down'. disconnect usage: disconnect See pause-fetch-local. disconnect-global usage: disconnect-global Like pause-fetch-local, but affects all resource members in the cluster (remotely). disconnect-local usage: disconnect-local See pause-fetch-local. down usage: down Shortcut for detach + pause-sync + pause-fetch + pause-replay. err-purge-all usage: err-purge-all Remove any err message from the given resources. get-emergency-limit usage: get-emergency-limit Counterpart of set-emergency-limit (per-resource emergency limit) get-sync-limit-value usage: get-sync-limit-value (no parameters) For retrieval of the value set by set-sync-limit-value. get-systemd-unit usage: get-systemd-unit Show the system units (for start and stop), or empty when unset. get-systemd-want usage: get-systemd-want Show the current hostname where the complete systemd unit stack between start- and stop-unit should appear. Reports empty when unset, or "(none)" when stopped. invalidate usage: invalidate Only useful on a secondary node. Forces MARS to consider the local replica disk as being inconsistent, and therefore starting a fast full-sync from the currently designated primary node (which must exist; therefore avoid the 'secondary' command). This is usually needed for resolving emergency mode. When having k=2 replicas, this can be also used for quick-and-simple split-brain resolution. In other cases, or when the split-brain is not resolved by this command, please use the 'leave-resource' / 'join-resource' method as described in the PDF manual (in the right order as described there). join-cluster usage: join-cluster Establishes a new cluster membership. This must be called once on any new cluster member. This is a prerequisite for join-resource. join-resource usage: join-resource (further syntax variants are described in the PDF manual). The resource must have been already created on another cluster node, and the network must be healthy. The contents of the local replica disk /dev/lv/mydata will be overwritten by the initial fast full sync from the currently designated primary node. After the initial full sync has finished, the current host will act in secondary role. For details on size constraints etc, refer to the PDF manual. leave-cluster usage: leave-cluster (no parameters) This can be used for final deconstruction of a cluster member. Prior to this, all resources must have been left via leave-resource. Notice: this will never destroy the cluster UID on the /mars/ filesystem. Please read the PDF manual for details. leave-resource usage: leave-resource Precondition: the local host must be in secondary role. Stop being a member of the resource, and thus stop all replication activities. The status of the underlying disk will remain in its current state (whatever it is). link-purge-all usage: link-purge-all Remove any .deleted links. log-delete usage: log-delete When possible, globally delete all old transaction logfiles which are known to be superflous, i.e. all secondaries no longer need to replay them. This must be regularly called by a cron job or similar, in order to prevent overflow of the /mars/ directory. For regular maintainance cron jobs, please prefer 'marsadm cron'. For details and best practices, please refer to the PDF manual. log-delete-all usage: log-delete-all Alias for log-delete log-delete-one usage: log-delete-one When possible, globally delete at most one old transaction logfile which is known to be superfluous, i.e. all secondaries no longer need to replay it. Hint: use this only for testing and manual inspection. For regular maintainance cron jobs, please prefer cron or log-delete-all. log-purge-all usage: log-purge-all This is potentially dangerous. Use this only if you are really desperate in trying to resolve a split brain. Use this only after reading the PDF manual! log-rotate usage: log-rotate Only useful at the primary side. Start writing transaction logs into a new transaction logfile. This should be regularly called by a cron job or similar. For regular maintainance cron jobs, please prefer 'marsadm cron'. For details and best practices, please refer to the PDF manual. lowlevel-delete-host usage: lowlevel-ls-host-ips Delete cluster member. lowlevel-ls-host-ips usage: lowlevel-ls-host-ips List cluster member names and IP addresses. lowlevel-set-host-ip usage: lowlevel-ls-host-ips Set IP for host. merge-cluster usage: merge-cluster Precondition: the resource names of both clusters must be disjoint. Create the union of two clusters, consisting of the union of all machines, and the union of all resources. The members of each resource are _not_ changed by this. This is useful for creating a big "virtual LVM cluster" where resources can be almost arbitrarily migrated between machines via later join-resource / leave-resource operations. merge-cluster-check usage: merge-cluster-check Check whether the resources of both clusters are disjoint. Useful for checking in advance whether merge-cluster would be possible. merge-cluster-list usage: merge-cluster-list Determine the local list of resources. Useful for checking or analysis of merge-cluster disjointness by hand. pause-fetch usage: pause-fetch See pause-fetch-local. pause-fetch-global usage: pause-fetch-global Like pause-fetch-local, but affects all resource members in the cluster (remotely). pause-fetch-local usage: pause-fetch-local Stop fetching transaction logfiles from the current designated primary. This is independent from any {pause,resume}-replay operations. Only useful on a secondary node. pause-replay usage: pause-replay See pause-replay-local. pause-replay-global usage: pause-replay-global Like pause-replay-local, but affects all resource members in the cluster (remotely). pause-replay-local usage: pause-replay-local Stop replaying transaction logfiles for now. This is independent from any {pause,resume}-fetch operations. This may be used for freezing the state of your replica for some time, if you have enough space on /mars/. Only useful on a secondary node. pause-sync usage: pause-sync See pause-sync-local. pause-sync-global usage: pause-sync-global Like pause-sync-local, but affects all resource members in the cluster (remotely). pause-sync-local usage: pause-sync-local Pause the initial data sync at current stage. This has only an effect if a sync is actually running (i.e. there is something to be actually synced). Don't pause too long, because the local replica will remain inconsistent during the pause. Use this only for limited reduction of system load. Only useful on a secondary node. primary usage: primary Promote the resource into primary role. This is necessary for /dev/mars/$res to appear on the local host. Notice: by concept there can be only _one_ designated primary in a cluster at the same time. The role change is automatically distributed to the other nodes in the cluster, provided that the network is healthy. The old primary node will _automatically_ go into secondary role first. This is different from DRBD! With MARS, you don't need an intermediate 'secondary' command for switching roles. It is usually better to _directly_ switch the primary roles between both hosts. When --force is not given, a planned handover is started: the local host will only become actually primary _after_ the old primary is gone, and all old transaction logs have been fetched and replayed at the new designated priamry. When --force is given, no handover is attempted. A a consequence, a split brain situation is likely to emerge. Thus, use --force only after an ordinary handover attempt has failed, and when you don't care about the split brain. For more details, please refer to the PDF manual. resize usage: resize Prerequisite: all underlying disks (usually /dev/vg/$res) must have been already increased, e.g. at the LVM layer (cf. lvresize). Causes MARS to re-examine all sizing constraints on all members of the resource, and increase the global logical size of the resource accordingly. Shrinking is currently not yet implemented. When successful, /dev/mars/$res at the primary will be increased in size. In addition, all secondaries will start an incremental fast full-sync to get the enlarged parts from the primary. resume-fetch usage: resume-fetch See resume-fetch-local. resume-fetch-global usage: resume-fetch-global Like resume-fetch-local, but affects all resource members in the cluster (remotely). resume-fetch-local usage: resume-fetch-local Start fetching transaction logfiles from the current designated primary node, if there is one. This is independent from any {pause,resume}-replay operations. Only useful on a secondary node. resume-replay usage: resume-replay See resume-replay-local. resume-replay-global usage: resume-replay-global Like resume-replay-local, but affects all resource members in the cluster (remotely). resume-replay-local usage: resume-replay-local Restart replaying transaction logfiles, when there is some data left. This is independent from any {pause,resume}-fetch operations. This should be used for unfreezing the state of your local replica. Only useful on a secondary node. resume-sync usage: resume-sync See resume-sync-local. resume-sync-global usage: resume-sync-global Like resume-sync-local, but affects all resource members in the cluster (remotely). resume-sync-local usage: resume-sync-local Resume any initial / incremental data sync at the stage where it had been interrupted by pause-sync. Only useful on a secondary node. secondary usage: secondary Promote all cluster members into secondary role, globally. In contrast to DRBD, this is not needed as an intermediate step for planned handover between an old and a new primary node. The only reasonable usage is before the last leave-resource of the last cluster member, immediately before leave-cluster is executed for final deconstruction of the cluster. In all other cases, please prefer 'primary' for direct handover between cluster nodes. Notice: 'secondary' sets the global designated primary node to '(none)' which in turn prevents the execution of 'invalidate' or 'join-resource' or 'resize' anywhere in the cluster. Therefore, don't unnecessarily give 'secondary'! set-emergency-limit usage: set-emergency-limit Set a per-resource emergency limit for disk space in /mars. See PDF manual for details. set-global-disabled-log-digests usage: set-global-disabled-log-digests Tell the whole cluster which checksumming digests to disable globally for the payload in transaction logfiles. The effective value can be checked via "marsadm view-disabled-log-digests". See "marsadm view-potential-features" and "marsadm --help" for a list of digest feature names, which must be separated by | symbols. set-global-disabled-net-digests usage: set-global-disabled-net-digests Tell the whole cluster which checksumming digests to disable globally for cluster-wide data comparisons, like fast full-sync. The effective value can be checked via "marsadm view-disabled-net-digests". See "marsadm view-potential-features" and "marsadm --help" for a list of digest feature names, which must be separated by | symbols. set-global-enabled-log-compressions usage: set-global-enabled-log-compressions Tell the whole cluster which compression features to use globally for logfile compression. The effective value can be checked via "marsadm view-enabled-log-compressions". See "marsadm view-potential-features" and "marsadm --help" for a list of compression feature names, which must be separated by | symbols. set-global-enabled-net-compressions usage: set-global-enabled-net-compressions Tell the whole cluster which compression features to use globally for network transport compression. This is independent from log compression. The effective value can be checked via "marsadm view-enabled-log-compressions". See "marsadm view-potential-features" and "marsadm --help" for a list of compression feature names, which must be separated by | symbols. set-sync-limit-value usage: set-sync-limit-value Set the maximum number of resources which should by syncing concurrently. set-systemd-unit usage: set-systemd-unit [] This activates the systemd template engine of marsadm. Please read mars-user-manual.pdf on this. When is omitted, it will be treated equal to . set-systemd-want usage: set-systemd-want Override the current location where the complete systemd unit stack should be started. Useful for a _temporary_ stop of the systemd unit stack by supplying the special hostname "(none)". For a _permanent_ stop, use "marsadm set-systemd-unit " instead. split-cluster usage: split-cluster (no parameters) NOT OFFICIALLY SUPPORTED - ONLY FOR EXPERTS. RTFS = Read The Fucking Sourcecode. Use this only if you know what you are doing. up usage: up Shortcut for attach + resume-sync + resume-fetch + resume-replay. update-cluster usage: update-cluster [] Fetch all the links from all joined cluster hosts. Use this between create-resource and join-resource. NOTICE: this is extremely useful for avoiding races when scripting in a cluster. wait-cluster usage: wait-resource [] Waits until a ping-pong communication has succeeded in the whole cluster (or only the members of ). NOTICE: this is extremely useful for avoiding races when scripting in a cluster. wait-connect usage: wait-connect [] See wait-cluster. wait-resource usage: wait-resource [[attach|fetch|replay|sync][-on|-off]] Wait until the given condition is met on the resource, locally. wait-umount usage: wait-umount Wait until /dev/mars/ has disappeared in the cluster (even remotely). Useful on both primary and secondary nodes. = comma-separated list of resource names or "all" for all resources = | = 1and1 comminfo commstate cstate default default-footer default-global default-header default-resource device-info device-stats diskstate diskstate-1and1 dstate fetch-line fetch-line-1and1 flags flags-1and1 outdated-flags outdated-flags-1and1 primarynode primarynode-1and1 replay-line replay-line-1and1 replinfo replinfo-1and1 replstate replstate-1and1 resource-errors resource-errors-1and1 role role-1and1 state status sync-line sync-line-1and1 syncinfo syncinfo-1and1 todo-role = count-{cluster,resource}-members deletable-size device-{opened,nrflying,error} device-{ops-rate,amount-rate,rate} disabled-{log|net}-digests enabled-{log|net}-compressions errno-text Convert errno numbers (positive or negative) into human readable text. get-log-status get-resource-{fat,err,wrn}{,-count} get-{disk,device} is-{alive} is-{split-brain,consistent,emergency,orphan} occupied-size present-{disk,device} (deprecated, use *-present instead) replay-basenr replay-code When negative, this indidates that a replay/recovery error has occurred. resource-possible-size rest-space summary-vector systemd-unit tree used-{log,net}-{digest,compression} uuid wait-{is,todo}-{attach,sync,fetch,replay,primary}-{on,off} writeback-rest {alive,fetch,replay,work}-{timestamp,age,lag} {all,the}-{pretty-,}{global-,}{{err,wrn,inf}-,}msg {cluster,resource}-members {disk,device}-present {disk,resource,device}-size {fetch,replay,work}-{lognr,logcount} {get,actual}-primary {implemented,usable}-{digests,compressions} {is,todo,nr}-{attach,sync,fetch,replay,primary} {my,all}-resources {potential,implemented,usable}-features {sync,fetch,replay,work,syncpos}-{size,pos} {sync,fetch,replay,work}-{rest,{almost-,threshold-,}reached,percent,permille,vector} {sync,fetch,replay}-{ops-rate,amount-rate,rate,remain} {time,real-time} {tree,features}-version = CHKSUM_CRC32 | CHKSUM_CRC32C | CHKSUM_MD5 | CHKSUM_MD5_OLD | CHKSUM_SHA1 | COMPRESS_LZ4 | COMPRESS_LZO | COMPRESS_ZLIB \end{verbatim}