#LyX 2.1 created this file. For more info see http://www.lyx.org/ \lyxformat 474 \begin_document \begin_header \textclass scrreprt \begin_preamble \usepackage[dvipsnames]{xcolor} \usepackage{listings} \end_preamble \options abstracton \use_default_options true \begin_modules customHeadersFooters enumitem fixltx2e \end_modules \maintain_unincluded_children false \language english \language_package default \inputencoding auto \fontencoding global \font_roman default \font_sans default \font_typewriter default \font_math auto \font_default_family rmdefault \use_non_tex_fonts false \font_sc false \font_osf false \font_sf_scale 100 \font_tt_scale 100 \graphics default \default_output_format default \output_sync 0 \bibtex_command default \index_command default \paperfontsize 10 \spacing single \use_hyperref true \pdf_title "MARS Manual" \pdf_author "Thomas Schöbel-Theuer" \pdf_bookmarks true \pdf_bookmarksnumbered false \pdf_bookmarksopen false \pdf_bookmarksopenlevel 1 \pdf_breaklinks true \pdf_pdfborder true \pdf_colorlinks true \pdf_backref false \pdf_pdfusetitle true \papersize a4paper \use_geometry true \use_package amsmath 1 \use_package amssymb 1 \use_package cancel 1 \use_package esint 1 \use_package mathdots 1 \use_package mathtools 1 \use_package mhchem 1 \use_package stackrel 1 \use_package stmaryrd 1 \use_package undertilde 1 \cite_engine basic \cite_engine_type default \biblio_style plain \use_bibtopic false \use_indices false \paperorientation portrait \suppress_date false \justification true \use_refstyle 1 \index Index \shortcut idx \color #008000 \end_index \leftmargin 3.7cm \topmargin 2.7cm \rightmargin 2.8cm \bottommargin 2.3cm \secnumdepth 3 \tocdepth 3 \paragraph_separation indent \paragraph_indentation default \quotes_language english \papercolumns 1 \papersides 2 \paperpagestyle headings \tracking_changes false \output_changes false \html_math_output 0 \html_css_as_file 0 \html_be_strict false \end_header \begin_body \begin_layout Title \family 
typewriter MARS Manual \begin_inset Newline newline \end_inset \begin_inset space ~ \end_inset \end_layout \begin_layout Subtitle Multiversion Asynchronous Replicated Storage \begin_inset Newline newline \end_inset \begin_inset space ~ \end_inset \begin_inset Newline newline \end_inset \begin_inset Graphics filename images/earth-mars-transfer.fig width 70col% \end_inset \end_layout \begin_layout Author Thomas Schöbel-Theuer ( \family typewriter tst@1und1.de \family default ) \end_layout \begin_layout Date Version 0.1-36 \end_layout \begin_layout Lowertitleback \noindent Copyright (C) 2013-16 Thomas Schöbel-Theuer \begin_inset Newline newline \end_inset Copyright (C) 2013-16 1&1 Internet AG (see \begin_inset Flex URL status open \begin_layout Plain Layout http://www.1und1.de \end_layout \end_inset , referred to as 1&1 in the following). \begin_inset Newline newline \end_inset \size footnotesize Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled \begin_inset Quotes eld \end_inset \begin_inset CommandInset ref LatexCommand nameref reference "chap:GNU-FDL" \end_inset \begin_inset Quotes erd \end_inset . \end_layout \begin_layout Abstract \family typewriter \begin_inset ERT status open \begin_layout Plain Layout \backslash sloppy \end_layout \end_inset MARS \family default is a block-level storage replication system for long distances and flaky networks, licensed under the GPL. It runs as a Linux kernel module. The sysadmin interface is similar to DRBD \begin_inset Foot status open \begin_layout Plain Layout Registered trademarks are the property of their respective owners. 
\end_layout \end_inset , but its internal engine is completely different from DRBD: it works with \series bold transaction logging \series default , similar to some database systems. \end_layout \begin_layout Abstract Therefore, MARS can provide stronger \series bold consistency guarantees \series default . In case of network bottlenecks, problems, or failures, the secondaries may become outdated (reflecting an older state), but they never become inconsistent. In contrast to DRBD, MARS preserves the \series bold order of write operations \series default even when the network is flaky ( \series bold Anytime Consistency \series default ). \end_layout \begin_layout Abstract The current version of MARS supports \begin_inset Formula $k>2$ \end_inset replicas and works \series bold asynchronously \series default . Therefore, application performance is completely decoupled from any network problems. Future versions are planned to also support synchronous or near-synchronous modes. \end_layout \begin_layout Abstract \paragraph_spacing double \noindent \begin_inset space ~ \end_inset \begin_inset Newline newline \end_inset \begin_inset space ~ \end_inset \begin_inset Newline newline \end_inset \begin_inset Box Frameless position "c" hor_pos "c" has_inner_box 1 inner_pos "c" use_parbox 0 use_makebox 1 width "100col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \begin_inset Graphics filename images/earth-mars-transfer.fig width 70col% \end_inset \end_layout \end_inset \end_layout \begin_layout Standard \begin_inset CommandInset toc LatexCommand tableofcontents \end_inset \end_layout \begin_layout Chapter Why You should Replicate Big Data at Block Layer \begin_inset CommandInset label LatexCommand label name "chap:Why-You-should" \end_inset \end_layout \begin_layout Section Cost Arguments from Architecture \end_layout \begin_layout Standard Datacenters are not usually operated for fun or as a hobby. 
Costs are therefore a very important argument. \end_layout \begin_layout Standard Many enterprise system architects start with a particular architecture in mind, called \begin_inset Quotes eld \end_inset big cluster \begin_inset Quotes erd \end_inset . There is a common belief that otherwise \series bold scalability \series default could not be achieved: \end_layout \begin_layout Standard \noindent \align center \begin_inset Graphics filename images/Architecure_Big_Cluster.pdf width 100col% \end_inset \end_layout \begin_layout Standard \noindent The crucial point here is the storage network: \begin_inset Formula $n$ \end_inset frontend servers are interconnected with \begin_inset Formula $m=O(n)$ \end_inset storage servers, in order to achieve properties like scalability, failure tolerance, etc. \end_layout \begin_layout Standard Since \emph on any \emph default of the \begin_inset Formula $n$ \end_inset frontends must be able to access \emph on any \emph default of the \begin_inset Formula $m$ \end_inset storage servers in realtime, the storage network must be dimensioned for \begin_inset Formula $O(n\cdot m)=O(n^{2})$ \end_inset network connections running in parallel. Even if the total network throughput scales only with \begin_inset Formula $O(n)$ \end_inset , the network has to \emph on switch \emph default the packets from \begin_inset Formula $n$ \end_inset sources to \begin_inset Formula $m$ \end_inset destinations (and back again) in \series bold realtime \series default . \end_layout \begin_layout Standard This \series bold cross-bar functionality \series default in realtime makes the storage network expensive. Several further factors increase the costs of storage networks: \end_layout \begin_layout Itemize In order to limit error propagation from other networks, the storage network is often built as a \emph on physically separate \emph default / \emph on dedicated \emph default network. 
\end_layout \begin_layout Itemize Because storage networks react strongly to high latencies and packet loss, they often need to be dimensioned for the \series bold worst case \series default (load peaks, packet storms, etc.), which requires the best, and therefore most expensive, components for reducing latency and increasing throughput. Dimensioning for the worst case instead of an average case plus some safety margin is nothing but expensive \series bold overdimensioning \series default / \series bold over-engineering \series default . \end_layout \begin_layout Itemize When multipathing is required for improving fault tolerance of the storage network itself, these efforts will even \series bold double \series default . \end_layout \begin_layout Itemize When geo-redundancy is required, the whole mess may easily more than double once again, because in case of disasters like terrorist attacks the backup datacenter must be prepared to take over for multiple days or weeks. \end_layout \begin_layout Standard Fortunately, there is an alternative called \begin_inset Quotes eld \end_inset sharding architecture \begin_inset Quotes erd \end_inset which does not need a storage network at all, at least when built and dimensioned properly. Instead, it \emph on should have \emph default (but does not always need) a so-called replication network which can, when present, be dimensioned much smaller because it needs neither realtime operation nor scalability to \begin_inset Formula $O(n^{2})$ \end_inset : \end_layout \begin_layout Standard \noindent \align center \begin_inset Graphics filename images/Architecure_Sharding.pdf width 100col% \end_inset \end_layout \begin_layout Standard \noindent Sharding architectures are extremely well suited when both the input traffic and the data are \series bold already partitioned \series default . 
This is the case, for example, when several thousands or even millions of customers operate on disjoint data sets, as in web hosting where each webspace resides in its own home directory, or when each of millions of MySQL database instances has to be isolated from its neighbours. \end_layout \begin_layout Standard Even when any customer may potentially access any of the data items residing in the whole storage pool (e.g. in a search engine), sharding can often be applied. The trick is to create some relatively simple content-based dynamic switching or redirect mechanism in the input network traffic, similar to HTTP load balancers or redirectors. \end_layout \begin_layout Standard Only when partitioning of input traffic plus data is not possible in a reasonable way do big cluster architectures, as implemented for example in Ceph or Swift (and partly even possible with MARS when restricted to the block layer), have their \series bold use case \series default . Only under such a precondition are they really needed. \end_layout \begin_layout Standard When sharding is possible, it is the preferred model for cost and performance reasons. \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset Notice that MARS' new remote device feature from the 0.2 branch series (which is a replacement for iSCSI) \emph on could \emph default be used for implementing the \begin_inset Quotes eld \end_inset big cluster \begin_inset Quotes erd \end_inset model at block layer. \end_layout \begin_layout Standard Nevertheless, this sub-variant is not the preferred model. The following is a super-model which combines both the \begin_inset Quotes eld \end_inset big cluster \begin_inset Quotes erd \end_inset and the sharding model at block layer in a very flexible way. 
The example below shows only two servers from a pool consisting of hundreds or thousands of servers: \end_layout \begin_layout Standard \noindent \align center \begin_inset Graphics filename images/MARS_Cluster_on_Demand.pdf width 100col% \end_inset \end_layout \begin_layout Standard \noindent The idea is to use iSCSI or the MARS remote device \emph on only where necessary \emph default . Preferably, local storage is divided into multiple Logical Volumes (LVs) via LVM, which are \emph on directly \emph default used \emph on locally \emph default by Virtual Machines (VMs), such as KVM or filesystem-based variants like LXC containers. \end_layout \begin_layout Standard In the above example, the left machine has relatively little CPU power or RAM compared to its storage capacity. Therefore, not \emph on all \emph default LVs could be instantiated locally at the same time without causing operational problems, but \emph on some \emph default of them can be run locally. The example solution is to \emph on exceptionally(!) \emph default export LV3 to the right server, which has some otherwise unused CPU and RAM capacity. \end_layout \begin_layout Standard Notice that locally running VMs don't produce any storage network traffic at all. Therefore, this is the preferred runtime configuration. \end_layout \begin_layout Standard Only in cases of resource imbalance, such as (transient) CPU or RAM peaks (e.g. caused by DDoS attacks), \emph on some \emph default containers may be run somewhere else over the network. In a well-balanced and well-dimensioned system, this will be the \series bold vast minority \series default , and should only be used for dealing with temporary load peaks etc. \end_layout \begin_layout Standard Running VMs directly on the same servers as their storage is a \series bold major cost reducer. 
\end_layout \begin_layout Standard You simply don't need to buy and operate \begin_inset Formula $n+m$ \end_inset servers, but only about \begin_inset Formula $\max(n,m)+m\cdot\epsilon$ \end_inset servers, where \begin_inset Formula $\epsilon$ \end_inset corresponds to some relatively small extra resources needed by MARS. \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset In addition to this and to reduced networking costs, there are further cost savings in power consumption, air conditioning, Height Units (HUs), number of HDDs, operating costs, etc., as explained below in section \begin_inset CommandInset ref LatexCommand ref reference "sec:Cost-Arguments-from" \end_inset . \end_layout \begin_layout Standard The sharding model needs a different approach to load balancing of storage space than the big cluster model. There are several possibilities at different layers: \end_layout \begin_layout Itemize Dynamically growing the sizes of LVs via \family typewriter lvresize \family default followed by \family typewriter marsadm resize \family default followed by \family typewriter xfs_growfs \family default or similar operations. \end_layout \begin_layout Itemize Moving customer data at filesystem or database level via \family typewriter rsync \family default or \family typewriter mysqldump \family default or similar. \end_layout \begin_layout Itemize Moving whole LVs via MARS, as shown in the following example: \end_layout \begin_layout Standard \noindent \align center \begin_inset Graphics filename images/MARS_Background_Migration.pdf width 100col% \end_inset \end_layout \begin_layout Standard \noindent The idea is to dynamically create \emph on additional \emph default LV replicas for the sake of background migration. 
Examples: \end_layout \begin_layout Itemize In case you had no redundancy at LV level before, you have \begin_inset Formula $k=1$ \end_inset replicas during ordinary operation. If not yet done, you should transparently introduce MARS into your LVM-based stack by using the so-called \begin_inset Quotes eld \end_inset standalone mode \begin_inset Quotes erd \end_inset of MARS. When necessary, create the first MARS replica with \family typewriter marsadm create-resource \family default on your already-existing LV data, which is retained unmodified, and restart your application. Now, for the sake of migration, you just create an additional replica at another server via \family typewriter marsadm join-resource \family default there and wait until the second mirror has been fully \series bold synced \series default in background, while your application is running and while the contents of the LV are modified \emph on in parallel \emph default by your ordinary applications. Then you do a primary \series bold handover \series default to your mirror. This is usually a matter of minutes, or even seconds. Once the application runs again at the new location, you can delete the old replica via \family typewriter marsadm leave-resource \family default and \family typewriter lvremove \family default . Finally, you may re-use the freed-up space for something else (e.g. \family typewriter lvresize \family default of \emph on another \emph default LV followed by \family typewriter marsadm resize \family default followed by \family typewriter xfs_growfs \family default or similar). For hardware lifecycle purposes, you may run a different strategy: evacuate the original source server completely via the above MARS migration method, and eventually decommission it. 
\end_layout \begin_layout Itemize In case you already have a redundant LV copy somewhere, you should run a similar procedure, but starting with \begin_inset Formula $k=2$ \end_inset replicas, and temporarily increasing the number of replicas either to \begin_inset Formula $k'=3$ \end_inset when moving each replica step by step, or even directly to \begin_inset Formula $k'=4$ \end_inset when moving pairs at once. \end_layout \begin_layout Itemize When starting with \begin_inset Formula $k>2$ \end_inset LV replicas, you can proceed analogously, or you may use a reduced variant. For example, we have some mission-critical servers at 1&1 which are running \begin_inset Formula $k=4$ \end_inset replicas all the time on relatively small but important LVs for greatly increased safety. Only in such a case do you have the freedom to temporarily decrease from \begin_inset Formula $k=4$ \end_inset to \begin_inset Formula $k'=3$ \end_inset and then go back up to \begin_inset Formula $k''=4$ \end_inset again. This has the advantage of requiring less temporary storage space for \emph on swapping \emph default some LVs. \end_layout \begin_layout Section Cost Arguments from Technology \begin_inset CommandInset label LatexCommand label name "sec:Cost-Arguments-from" \end_inset \end_layout \begin_layout Standard A common prejudice is that \begin_inset Quotes eld \end_inset big cluster \begin_inset Quotes erd \end_inset is the cheapest scaling storage technology when built on so-called \begin_inset Quotes eld \end_inset commodity hardware \begin_inset Quotes erd \end_inset . While this is very often true for the \begin_inset Quotes eld \end_inset commodity hardware \begin_inset Quotes erd \end_inset part, it is often not true for the \begin_inset Quotes eld \end_inset big cluster \begin_inset Quotes erd \end_inset part. 
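But let us first look at the \begin_inset Quotes eld \end_inset commodity \begin_inset Quotes erd \end_inset part. \end_layout \begin_layout Standard Here are some rough market prices for basic storage as determined around the end of 2016 / start of 2017: \end_layout \begin_layout Standard \noindent \align center \begin_inset Tabular <lyxtabular version="3" rows="6" columns="3"> <features rotate="0" tabularvalignment="middle"> <column alignment="left" valignment="top"> <column alignment="center" valignment="top"> <column alignment="center" valignment="top">
<row> <cell alignment="left" valignment="top" usebox="none"> \begin_inset Text \begin_layout Plain Layout \size small Technology \end_layout \end_inset </cell> <cell alignment="center" valignment="top" usebox="none"> \begin_inset Text \begin_layout Plain Layout \size small Enterprise-Grade \end_layout \end_inset </cell> <cell alignment="center" valignment="top" usebox="none"> \begin_inset Text \begin_layout Plain Layout \size small Price in € / TB \end_layout \end_inset </cell> </row>
<row> <cell alignment="left" valignment="top" usebox="none"> \begin_inset Text \begin_layout Plain Layout \size small Consumer SATA disks via on-board SATA controllers \end_layout \end_inset </cell> <cell alignment="center" valignment="top" usebox="none"> \begin_inset Text \begin_layout Plain Layout \size small no (small-scale) \end_layout \end_inset </cell> <cell alignment="center" valignment="top" usebox="none"> \begin_inset Text \begin_layout Plain Layout \size small < 30 possible \end_layout \end_inset </cell> </row>
<row> <cell alignment="left" valignment="top" usebox="none"> \begin_inset Text \begin_layout Plain Layout \size small SAS disks via SAS HBAs (e.g. in external 14 \begin_inset Quotes erd \end_inset shelves) \end_layout \end_inset </cell> <cell alignment="center" valignment="top" usebox="none"> \begin_inset Text \begin_layout Plain Layout \size small partly \end_layout \end_inset </cell> <cell alignment="center" valignment="top" usebox="none"> \begin_inset Text \begin_layout Plain Layout \size small < 80 \end_layout \end_inset </cell> </row>
<row> <cell alignment="left" valignment="top" usebox="none"> \begin_inset Text \begin_layout Plain Layout \size small SAS disks via hardware RAID + LVM (+DRBD/MARS) \end_layout \end_inset </cell> <cell alignment="center" valignment="top" usebox="none"> \begin_inset Text \begin_layout Plain Layout \size small yes \end_layout \end_inset </cell> <cell alignment="center" valignment="top" usebox="none"> \begin_inset Text \begin_layout Plain Layout \size small 80 to 150 \end_layout \end_inset </cell> </row>
<row> <cell alignment="left" valignment="top" usebox="none"> \begin_inset Text \begin_layout Plain Layout \size small Commercial storage appliances via iSCSI \end_layout \end_inset </cell> <cell alignment="center" valignment="top" usebox="none"> \begin_inset Text \begin_layout Plain Layout \size small yes \end_layout \end_inset </cell> <cell alignment="center" valignment="top" usebox="none"> \begin_inset Text \begin_layout Plain Layout \size small around 1000 \end_layout \end_inset </cell> </row>
<row> <cell alignment="left" valignment="top" usebox="none"> \begin_inset Text \begin_layout Plain Layout \size small Cloud storage, S3 over 5 years lifetime \end_layout \end_inset </cell> <cell alignment="center" valignment="top" usebox="none"> \begin_inset Text \begin_layout Plain Layout \size small yes \end_layout \end_inset </cell> <cell alignment="center" valignment="top" usebox="none"> \begin_inset Text \begin_layout Plain Layout \size small 3000 to 8000 \end_layout \end_inset </cell> </row>
</lyxtabular> \end_inset \end_layout \begin_layout Standard \noindent You can see that any self-built and self-administered storage (whose price varies with slower high-capacity versus faster low-capacity disks) is much cheaper than any commercial offering, by about a factor of 10 or even more. If you need to operate several petabytes of data, self-built storage is always cheaper than a commercial one, even if additional manpower is needed for commissioning and operation. Here we just assume that the storage is needed permanently for at least 5 years, as is the case in web hosting, databases, backup / archival systems, and many other application areas. \end_layout \begin_layout Standard Cloud storage is heavily overhyped. 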
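From a commercial perspective it usually pays off only when your storage demands vary \emph on extremely \emph default over time, and when you need some \emph on extra \emph default capacity only \emph on temporarily \emph default for a \emph on very \emph default short time. \end_layout \begin_layout Standard In addition to basic storage prices, many further factors come into play when roughly comparing big clusters (BC) versus sharding (SHA) ( \family roman \series medium \shape up \size normal \emph off \bar no \strikeout off \uuline off \uwave off \noun off \color none \begin_inset Formula $\times2$ \end_inset \family default \series default \shape default \size default \emph default \bar default \strikeout default \uuline default \uwave default \noun default \color inherit means with geo-redundancy): \end_layout \begin_layout Standard \noindent \align center \begin_inset Tabular <lyxtabular version="3" rows="5" columns="5"> <features rotate="0" tabularvalignment="middle"> <column alignment="left" valignment="top"> <column alignment="center" valignment="top"> <column alignment="center" valignment="top"> <column alignment="center" valignment="top"> <column alignment="center" valignment="top">
<row> <cell alignment="left" valignment="top" usebox="none"> \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset </cell> <cell alignment="center" valignment="top" usebox="none"> \begin_inset Text \begin_layout Plain Layout BC \end_layout \end_inset </cell> <cell alignment="center" valignment="top" usebox="none"> \begin_inset Text \begin_layout Plain Layout SHA \end_layout \end_inset </cell> <cell alignment="center" valignment="top" usebox="none"> \begin_inset Text \begin_layout Plain Layout BC \begin_inset Formula $\times2$ \end_inset \end_layout \end_inset </cell> <cell alignment="center" valignment="top" usebox="none"> \begin_inset Text \begin_layout Plain Layout SHA \begin_inset Formula $\times2$ \end_inset \end_layout \end_inset </cell> </row>
<row> <cell alignment="left" valignment="top" usebox="none"> \begin_inset Text \begin_layout Plain Layout # of Disks \end_layout \end_inset </cell> <cell alignment="center" valignment="top" usebox="none"> \begin_inset Text \begin_layout Plain Layout >200% \end_layout \end_inset </cell> <cell alignment="center" valignment="top" usebox="none"> \begin_inset Text \begin_layout Plain Layout <120% \end_layout \end_inset </cell> <cell alignment="center" valignment="top" usebox="none"> \begin_inset Text \begin_layout Plain Layout >400% \end_layout \end_inset </cell> <cell alignment="center" valignment="top" usebox="none"> \begin_inset Text \begin_layout Plain Layout <240% \end_layout \end_inset </cell> </row>
<row> <cell alignment="left" valignment="top" usebox="none"> \begin_inset Text \begin_layout Plain Layout # of Servers \end_layout \end_inset </cell> <cell alignment="center" valignment="top" usebox="none"> \begin_inset Text \begin_layout Plain Layout \begin_inset Formula $\approx\times2$ \end_inset \end_layout \end_inset </cell> <cell alignment="center" valignment="top" usebox="none"> \begin_inset Text \begin_layout Plain Layout \begin_inset Formula $\approx\times1.1$ \end_inset possible \end_layout \end_inset </cell> <cell alignment="center" valignment="top" usebox="none"> \begin_inset Text \begin_layout Plain Layout \begin_inset Formula $\approx\times4$ \end_inset \end_layout \end_inset </cell> <cell alignment="center" valignment="top" usebox="none"> \begin_inset Text \begin_layout Plain Layout \begin_inset Formula $\approx\times2.2$ \end_inset \end_layout \end_inset </cell> </row>
<row> <cell alignment="left" valignment="top" usebox="none"> \begin_inset Text \begin_layout Plain Layout Power Consumption \end_layout \end_inset </cell> <cell alignment="center" valignment="top" usebox="none"> \begin_inset Text \begin_layout Plain Layout \begin_inset Formula $\approx\times2$ \end_inset \end_layout \end_inset </cell> <cell alignment="center" valignment="top" usebox="none"> \begin_inset Text \begin_layout Plain Layout ditto \end_layout \end_inset </cell> <cell alignment="center" valignment="top" usebox="none"> \begin_inset Text \begin_layout Plain Layout ditto \end_layout \end_inset </cell> <cell alignment="center" valignment="top" usebox="none"> \begin_inset Text \begin_layout Plain Layout ditto \end_layout \end_inset </cell> </row>
<row> <cell alignment="left" valignment="top" usebox="none"> \begin_inset Text \begin_layout Plain Layout HU Consumption \end_layout \end_inset </cell> <cell alignment="center" valignment="top" usebox="none"> \begin_inset Text \begin_layout Plain Layout \begin_inset Formula $\approx\times2$ \end_inset \end_layout \end_inset </cell> <cell alignment="center" valignment="top" usebox="none"> \begin_inset Text \begin_layout Plain Layout ditto \end_layout \end_inset </cell> <cell alignment="center" valignment="top" usebox="none"> \begin_inset Text \begin_layout Plain Layout ditto \end_layout \end_inset </cell> <cell alignment="center" valignment="top" usebox="none"> \begin_inset Text \begin_layout Plain Layout ditto \end_layout \end_inset </cell> </row>
</lyxtabular> \end_inset \end_layout \begin_layout Standard \noindent The crucial point is not only the number of extra servers needed for dedicated storage boxes, but also the total number of HDDs. While big cluster implementations like Ceph or Swift can \emph on theoretically \emph default use some erasure coding for avoiding full object replicas, their \emph on practice \emph default as seen in our internal 1&1 Ceph clusters is similar to RAID-10, but just on objects instead of block-based sectors. \end_layout \begin_layout Standard Therefore a big cluster typically needs more than 200% of the net capacity in raw disk space, whereas in a sharded cluster hardware RAID-60 with a significantly smaller overhead is typically sufficient for providing adequate failure tolerance at disk level. 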
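\end_layout \begin_layout Standard \noindent The disk percentages discussed above can be reproduced with a short back-of-the-envelope calculation. The following Python sketch is only an illustration: the function names and the RAID-60 geometry (RAID-6 groups of 12 disks, 2 of them for parity) are assumptions chosen for this example, not figures taken from our production systems.

```python
def replication_overhead(copies: int) -> float:
    """Raw-to-net disk ratio when every object or block is stored `copies` times."""
    return float(copies)


def raid6_overhead(disks_per_group: int, parity_disks: int = 2) -> float:
    """Raw-to-net disk ratio of one RAID-6 group.

    Striping several such groups into a RAID-60 does not change the ratio.
    """
    if disks_per_group <= parity_disks:
        raise ValueError("a RAID-6 group needs more disks than parity disks")
    return disks_per_group / (disks_per_group - parity_disks)


def with_geo_redundancy(overhead: float, sites: int = 2) -> float:
    """Each site holds a full copy, so raw disk demand scales with the site count."""
    return overhead * sites


# Assumption for illustration: two full object replicas in the big cluster
# (RAID-10-like), and RAID-6 groups of 12 disks (10 data + 2 parity) for sharding.
big_cluster = replication_overhead(2)
sharding = raid6_overhead(12)

for name, overhead in (("big cluster", big_cluster), ("sharding", sharding)):
    geo = with_geo_redundancy(overhead)
    print(f"{name}: {overhead:.0%} of net capacity, {geo:.0%} with geo-redundancy")
```

Under these assumptions the sketch yields 200% versus 120% of the net capacity, and 400% versus 240% with geo-redundancy. Larger RAID-6 groups push the sharding figure below 120%, while any additional object replica pushes the big cluster figure above 200%. 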
\end_layout \begin_layout Standard There is a surprising consequence from this: geo-redundancy is not as expensive as many people believe. It just needs to be built with the proper architecture. A sharded geo-redundant pool based on hardware RAID-60 costs roughly the same as (or, when taking \begin_inset Formula $O(n^{2})$ \end_inset storage networks into account, possibly even less than) a big cluster with full replicas without geo-redundancy. A geo-redundant sharded pool provides even better failure compensation. \end_layout \begin_layout Standard Notice that geo-redundancy implies by definition that an unforeseeable \series bold full datacenter loss \series default (e.g. caused by \series bold disasters \series default like a terrorist attack or an earthquake) must be compensated for \series bold several days or weeks \series default . Therefore it is \emph on not \emph default sufficient to take a big cluster and just spread it over two different locations. \end_layout \begin_layout Standard In any case, a MARS-based geo-redundant sharding pool is cheaper than using commercial storage appliances, which are much more expensive by their nature. \end_layout \begin_layout Section Performance Arguments from Architecture \end_layout \begin_layout Standard Some people think that replication is easily done at filesystem layer. There exist lots of cluster filesystems and other filesystem-layer solutions which claim to be able to replicate your data, sometimes even over long distances. \end_layout \begin_layout Standard Trying to replicate several petabytes of data, or some billions of inodes, is however a much bigger challenge than many people can imagine. \end_layout \begin_layout Standard Choosing the wrong layer for \series bold mass data replication \series default may get you into trouble. 
Here is an explanation why replication at the block layer is easier and less error-prone: \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/Layers.pdf width 100col% \end_inset \end_layout \begin_layout Standard \noindent The picture shows the main components of a standalone Unix / Linux system. In the late 1970s / early 1980s, a so-called \series bold Buffer Cache \series default was introduced into the architecture of Unix. Today's Linux has refined the concept into various internal caches such as the Page Cache and the Dentry Cache. \end_layout \begin_layout Standard All these caches serve only one purpose: they reduce the load on the storage by exploiting fast RAM. A well-tuned cache can yield high cache hit ratios, typically 99%, and in some cases (as observed in practice) even more than 99.9%. \end_layout \begin_layout Standard Now start distributing the system over long distances. There are two potential cut points A and B. Cutting at A means replication at filesystem level. Cutting at B means replication at block level. \end_layout \begin_layout Standard When replicating at A, you will notice that the caches are \emph on below \emph default your cut point. Thus you will have to re-implement \series bold distributed caches \series default , and you will have to \series bold maintain cache coherence \series default . \end_layout \begin_layout Standard When replicating at B, the Linux caches are \emph on above \emph default your cut point. Thus you will receive much less traffic, typically already reduced by a factor of 100, or even more. This is much easier to cope with. You will also profit from \series bold journalling filesystems \series default like \family typewriter ext4 \family default or \family typewriter xfs \family default . 
In contrast, \emph on truly distributed \begin_inset Foot status open \begin_layout Plain Layout In this context, \begin_inset Quotes eld \end_inset truly \begin_inset Quotes erd \end_inset means that the POSIX semantics would always be guaranteed cluster-wide, even in case of partial failures. In practice, some distributed filesystems like NFS don't even obey the POSIX standard \emph on locally \emph default on a single standalone client. We know of projects which have \emph on failed \emph default precisely because of this. \end_layout \end_inset \emph default journalling is typically not available with distributed cluster filesystems. \end_layout \begin_layout Standard A \emph on potential \emph default drawback of block layer replication is that you are typically limited to active-passive replication. An active-active operation is not impossible at block layer (see combinations of DRBD with \family typewriter ocfs2 \family default ), but it is less common, and less safe to operate. \end_layout \begin_layout Standard This limitation isn't necessarily caused by the choice of layer. It is simply caused by the \series bold laws of physics \series default : communication is always limited by the speed of light. A distributed filesystem is nothing else but a logically \series bold distributed shared memory \series default (DSM). \end_layout \begin_layout Standard Some decades of research on DSM have shown that there exist applications / workloads where the DSM model is \emph on inferior \emph default to the direct communication paradigm, even in short-distance / cluster scenarios. Long-distance DSM is extremely cumbersome. \end_layout \begin_layout Standard Therefore: you simply shouldn't try to solve long-distance communication needs via communication over filesystems. Even simple producer-consumer scenarios (one-way communication) perform worse (e.g. when compared to plain TCP/IP) when it comes to distributed POSIX semantics. 
There is simply too much \series bold synchronisation overhead at metadata level \series default . \end_layout \begin_layout Standard If you have a need for mixed operations at different locations in parallel: just split your data set into disjoint filesystem instances (or database / VM instances, etc.). All you need is careful thought about the \emph on appropriate \emph default \emph on granularity \emph default of your data sets (such as well-chosen \emph on sets \emph default of user home directory subtrees, or database sets logically belonging together, etc.). \end_layout \begin_layout Standard Replication at filesystem level is often at single-file granularity. If you have several millions or even billions of inodes, you may easily find yourself in a snake pit. \end_layout \begin_layout Standard Conclusion: active-passive operation over long distances (such as between continents) is even an advantage. It keeps you from trying bad / almost impossible things. \end_layout \begin_layout Chapter Use Cases for MARS vs DRBD \begin_inset CommandInset label LatexCommand label name "chap:Use-Cases-for" \end_inset \end_layout \begin_layout Standard DRBD has a long history of successfully providing HA features to many users of Linux. With the advent of MARS, many people are wondering what the difference is, and they ask for recommendations. In which use cases should DRBD be recommended, and in which other cases is MARS the better choice? 
\end_layout \begin_layout Standard The following table is a short guide to the most important cases where the decision is rather clear: \end_layout \begin_layout Standard \noindent \align center \begin_inset Tabular \begin_inset Text \begin_layout Plain Layout Use Case \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout Recommendation \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout server pairs, each directly connected via \series bold crossover cables \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout DRBD \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \series bold active-active \series default / dual-primary, e.g. \family typewriter \series bold gfs2 \family default \series default , \family typewriter \series bold ocfs2 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout DRBD \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout distance \series bold > 50km \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout MARS \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \series bold > 100 server pairs \series default over a short-distance \series bold shared \series default line \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout MARS \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout all else / intermediate cases \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout read the following details \end_layout \end_inset \end_inset \end_layout \begin_layout Standard \noindent There exist some use cases where DRBD is clearly better than MARS. 1&1 has long experience with DRBD in setups where it works very well, in particular coupling Linux devices rack-to-rack via crossover cables. DRBD is just \emph on constructed \emph default for that use case (RAID-1 over network). In such a scenario, DRBD is better than MARS because it uses less disk space. 
In addition, newer DRBD versions can run over high-speed but short-distance interconnects like Infiniband (via the SDP protocol). Another use case for DRBD is active-active / dual-primary mode, e.g. \family typewriter ocfs2 \family default \begin_inset Foot status open \begin_layout Plain Layout Notice that \family typewriter ocfs2 \family default is apparently not constructed for long distances. 1&1 has some experience with a specific short distance cluster where the \family typewriter ocfs2 \family default / \family typewriter DRBD \family default combination scaled a little bit better than \family typewriter NFS \family default , but worse than \family typewriter glusterfs \family default (using 2 clients in both cases -- notice that \family typewriter glusterfs \family default showed extremely bad performance when trying to enable active-active \family typewriter glusterfs \family default replication between 2 server instances, therefore we ended up using active-passive DRBD replication below a single \family typewriter glusterfs \family default server). Conclusion: \family typewriter NFS \family default < \family typewriter ocfs2 \family default < \family typewriter glusterfs \family default < sharding. We found that the scalability of \family typewriter glusterfs \family default on top of active-passive DRBD was about 2 times better than that of \family typewriter NFS \family default on top of active-passive DRBD, while \family typewriter ocfs2 \family default on top of \family typewriter DRBD \family default in active-active mode was somewhere in between. All cluster comparisons were made with an increasing workload over time (measured as number of customers which could be safely operated). Each system was replaced by the next one when the respective scalability was at its respective end, each time leading to operational problems. The ultimate solution was to replace all of these clustering concepts by the general concept of \series bold sharding \series default . 
\end_layout \end_inset over short \begin_inset Foot status open \begin_layout Plain Layout Active-active won't work over long distances at all because of high network latencies (cf chapter \begin_inset CommandInset ref LatexCommand ref reference "chap:Why-You-should" \end_inset ). Probably, for replication of whole clusters over long distances DRBD and MARS could be stacked: using DRBD on top of MARS for active-active clustering of \family typewriter gfs2 \family default or \family typewriter ocfs2 \family default , and a MARS instance \emph on below \emph default for failover of \emph on one \emph default of the DRBD replicas over long distances. \end_layout \end_inset distances. \end_layout \begin_layout Standard On the other hand, there exist other use cases where DRBD did not work as expected, leading to incidents and other operational problems. We analyzed them for our specific use cases. The later author of MARS came to the conclusion that they could only be resolved by fundamental changes in the overall architecture of DRBD. The development of MARS started at the personal initiative of the author, first in the form of a personal project during holidays, and was later picked up by 1&1 as an official project. \end_layout \begin_layout Standard MARS and DRBD simply have \series bold different application areas \series default . \end_layout \begin_layout Standard In the following, we will discuss the pros and cons of each system in particular situations and contexts, and we shed some light on their conceptual and operational differences. 
\end_layout \begin_layout Section Network Bottlenecks \begin_inset CommandInset label LatexCommand label name "sec:Network-Bottlenecks" \end_inset \end_layout \begin_layout Subsection Behaviour of DRBD \begin_inset CommandInset label LatexCommand label name "sub:Behaviour-of-DRBD" \end_inset \end_layout \begin_layout Standard In order to describe the most important problem we found when DRBD was used to couple whole datacenters (each encompassing thousands of servers) over metro distances, we strip down that complicated real-life scenario to a simplified laboratory scenario in order to demonstrate the effect with minimal means. \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset Notice that the following DRBD effect does not appear at crossover cables. The following scenario covers a non-standard case of DRBD. DRBD works fine when no network bottleneck appears! \end_layout \begin_layout Standard The following picture illustrates an effect which has been observed in 1&1 datacenters when running masses of DRBD instances through a single network bottleneck. In addition, the effect is also reproducible by an older version of the MARS test suite \begin_inset Foot status open \begin_layout Plain Layout The effect has been demonstrated some years ago with DRBD version 8.3.13. By construction, it is independent of the particular DRBD series 8.3.x, 8.4.x, or 9.0.x. \end_layout \end_inset : \end_layout \begin_layout Standard \noindent \align center \begin_inset Graphics filename images/network-bottleneck-drbd.fig width 80col% \end_inset \end_layout \begin_layout Standard \noindent The simplified scenario is the following: \end_layout \begin_layout Enumerate DRBD is loaded with a low to medium, but constant rate of write operations for the sake of simplicity of the scenario. \end_layout \begin_layout Enumerate The network has some throughput bottleneck, depicted as a red line. 
For the sake of simplicity, we just linearly decrease it over time, starting from full throughput, down to zero. The decrease is very slow (over some minutes, or even hours). \end_layout \begin_layout Standard What will happen in this scenario? \end_layout \begin_layout Standard As long as the actual DRBD write throughput is lower than the network bandwidth (left part of the horizontal blue line), DRBD works as expected. \end_layout \begin_layout Standard Once the maximum network throughput (red line) starts to fall short of the required application throughput (first blue dotted line), we get into trouble. By its very nature, DRBD works \series bold synchronously \series default . Therefore, it \emph on must \emph default transfer all your application writes through the bottleneck, but now it is impossible \begin_inset Foot status open \begin_layout Plain Layout This is independent of the DRBD protocols A through C, because it just depends on an information-theoretic argument which holds independently of any protocol. We have a fundamental conflict between network capabilities and application demands here, which cannot be circumvented due to the \series bold synchronous \series default nature of DRBD. \end_layout \end_inset due to the bottleneck. As a consequence, the application running on top of DRBD will see increasingly higher IO latencies and/or stalls / hangs. We found practical cases (at least with former versions of DRBD) where IO latencies exceeded practical monitoring limits such as \begin_inset Formula $5$ \end_inset s by far, up to the range of \emph on minutes \emph default . As an experienced sysadmin, you know what happens next: your application will run into an incident, and your customers will be dissatisfied. \end_layout \begin_layout Standard In order to deal with such situations, DRBD has lots of tuning parameters. 
In particular, the \family typewriter timeout \family default parameter and/or the \family typewriter ping-timeout \family default parameter will determine when DRBD will give up in such a situation and simply drop the network connection as an emergency measure. Dropping the network connection is roughly equivalent to an automatic \family typewriter disconnect \family default , followed by an automatic re-connect attempt after \family typewriter connect-int \family default seconds. During the dropped connection, the incident will appear as being resolved, but at some hidden cost \begin_inset Foot status open \begin_layout Plain Layout By appropriately tuning various DRBD parameters, such as \family typewriter timeout \family default and/or \family typewriter ping-timeout \family default , you can keep the impact of the incident below some viable limit. However, the automatic disconnect will then happen earlier and more often in practice. Flaky or overloaded networks may easily lead to an enormous number of automatic disconnects. \end_layout \end_inset . \end_layout \begin_layout Standard What happens next in our scenario? During the \family typewriter disconnect \family default , DRBD will record all positions of writes in its bitmap and/or in its activity log. As soon as the automatic re-connect succeeds after \family typewriter connect-int \family default seconds, DRBD has to do a partial re-sync of those blocks which were marked dirty in the meantime. This leads to an \emph on additional \emph default bandwidth demand \begin_inset Foot status open \begin_layout Plain Layout DRBD parameters \family typewriter sync-rate \family default resp \family typewriter resync-rate \family default may be used to tune the height of the additional demand. 
In addition, the newer parameters \family typewriter c-plan-ahead \family default , \family typewriter c-fill-target \family default , \family typewriter c-delay-target \family default , \family typewriter c-min-rate \family default , \family typewriter c-max-rate \family default and friends may be used to dynamically adapt to \emph on some \emph default situations where the application throughput \emph on could \emph default fit through the bottleneck. These newer parameters were developed in a cooperation between 1&1 and Linbit, the maker of DRBD. \end_layout \begin_layout Plain Layout Please note that lowering / dynamically adapting the resync rates may help in lowering the \emph on probability \emph default of occurrences of the above problems in practical scenarios where the bottleneck would recover to viable limits after some time. However, lowering the rates will also increase the \emph on duration \emph default of re-sync operations accordingly. The \emph on total amount of re-sync data \emph default simply does not decrease when lowering \family typewriter resync-rate \family default ; it even tends to increase over time when new requests arrive. Therefore, the \emph on expected value \emph default of problems caused by \emph on strong \emph default network bottlenecks (i.e. when not even the ordinary application rate is fitting through) is \emph on not \emph default improved by lowering or adapting \family typewriter resync-rate \family default , but rather the expected value mostly depends on the \emph on relation \emph default between the amount of holdback data versus the amount of application write data, both measured for the duration of some given strong bottleneck. \end_layout \end_inset as indicated by the upper dotted blue box. 
\end_layout \begin_layout Standard Of course, there is \emph on absolutely no chance \emph default to get the increased amount of data through our bottleneck, since not even the ordinary application load (lower dotted lines) could be transferred. \end_layout \begin_layout Standard Therefore, you run a \series bold very high risk \series default that the re-sync cannot finish before the next \family typewriter timeout \family default / \family typewriter ping-timeout \family default cycle will drop the network connection again. \end_layout \begin_layout Standard What will be the final result when that risk materializes? Simply, your secondary site will be \emph on permanently \emph default in state \family typewriter inconsistent \family default . This means, you have lost your redundancy. In our scenario, there is no chance at all to become consistent again, because the network bottleneck declines more and more, slowly. It is simply \emph on hopeless \emph default , by construction. \end_layout \begin_layout Standard In case you lose your primary site now, you are completely lost. \end_layout \begin_layout Standard Some people may argue that the probability of such a scenario is low. We don't agree with that argument, and not only because it really happens in practice, where it may even take some days until problems are fixed. In case of \series bold rolling disasters \series default , the network is very likely to become flaky and/or overloaded shortly before the final damage. Even in other cases, you can easily end up with inconsistent secondaries. It occurs not only in the lab, but also in practice if you operate some hundreds or even thousands of DRBD instances. \end_layout \begin_layout Standard The point is that you can produce an ill behaviour \emph on systematically \emph default just by overloading the network a bit for some sufficient duration. 
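\end_layout \begin_layout Standard The chain of events described above can be replayed with a small toy model. The following Python sketch is only an illustration under simplifying assumptions and is not taken from DRBD: time advances in steps of one second, the arguments \family typewriter timeout \family default and \family typewriter connect_int \family default are hypothetical stand-ins for the DRBD settings of similar name, and every second of disconnected operation adds the full write rate to the re-sync backlog (no coalescing of repeated writes in the bitmap).

```python
def simulate_sync_replication(write_rate, bandwidth_at, duration,
                              timeout=6, connect_int=10):
    """Toy model of a synchronous mirror behind a network bottleneck.

    write_rate   : application writes per second (assumed constant)
    bandwidth_at : function mapping a second t to the usable bandwidth
    Returns (number of disconnects, re-sync backlog left at the end).
    """
    connected = True
    backlog = 0.0    # data marked dirty while disconnected
    stalled = 0      # consecutive seconds in which the writes did not fit
    wait = 0         # seconds until the next automatic re-connect
    disconnects = 0
    for t in range(duration):
        if connected:
            spare = bandwidth_at(t) - write_rate
            if spare >= 0:
                # Writes fit through; leftover bandwidth serves the re-sync.
                backlog = max(0.0, backlog - spare)
                stalled = 0
            else:
                stalled += 1       # the application sees stalls
            if stalled >= timeout:
                # Emergency measure of the model: drop the connection.
                connected, stalled, wait = False, 0, connect_int
                disconnects += 1
        else:
            backlog += write_rate  # writes are only recorded as dirty
            wait -= 1
            if wait == 0:
                connected = True   # automatic re-connect, re-sync starts
    return disconnects, backlog
```

With a linearly falling bandwidth, for example \family typewriter simulate_sync_replication(50.0, lambda t: 100.0 * (1 - t / 1000), 1000) \family default , the model runs through repeated disconnect cycles and ends with a non-empty backlog, whereas a constant bandwidth above the write rate produces no disconnect at all. A short dip that recovers is survived with a single disconnect, which matches the observation that the trouble is caused by lasting bottlenecks.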
\end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/MatieresCorrosives.png lyxscale 50 scale 17 \end_inset When coupling whole datacenters via some thousands of DRBD connections, any (short) network loss will almost certainly increase the re-sync network load each time the outage appears to be over. As a consequence, overload may be \emph on provoked \emph default by the re-sync repair attempts. This may easily lead to self-amplifying \series bold throughput storms \series default in some resonance frequency (similar to self-destruction of a bridge when an army is marching over it in lockstep). \end_layout \begin_layout Standard The only way for reliable prevention of loss of secondaries is to start any re-connect \emph on only \emph default in such situations where you can \emph on predict in advance \emph default that the re-sync is \emph on guaranteed \emph default to finish before any network bottleneck / loss will cause an automatic disconnect again. We don't know of any method which can reliably predict the future behaviour of a complex network. \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/MatieresToxiques.png lyxscale 50 scale 17 \end_inset Conclusion: in the presence of network bottlenecks, you run a considerable risk that your DRBD mirrors get destroyed just in that moment when you desperately need them. \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset Notice that crossover cables usually never show a behaviour like that depicted by the red line. Crossover cables are \emph on passive components \emph default which normally \begin_inset Foot status open \begin_layout Plain Layout Exceptions might be mechanical jiggling of plugs, or electromagnetic interference. We never noticed any of them. \end_layout \end_inset either work, or not. 
The binary connect / disconnect behaviour of DRBD has no problems to cope with that. \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset or \begin_inset Graphics filename images/MatieresCorrosives.png lyxscale 50 scale 17 \end_inset Linbit recommends a \series bold workaround \series default for the inconsistencies during re-sync: LVM snapshots. We tried it, but found a \emph on performance penalty \emph default which made it prohibitive for our concrete application. A problem seems to be the cost of destroying snapshots. LVM uses by default a BOW strategy (Backup On Write, which is the counterpart of COW = Copy On Write). BOW increases IO latencies during ordinary operation. Retaining snapshots is cheap, but reverting them may be very costly, depending on workload. We didn't fully investigate that effect, and our experience is a few years old. You might come to a different conclusion for a different workload, for newer versions of system software, or for a different strategy if you carefully investigate the field. \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/MatieresCorrosives.png lyxscale 50 scale 17 \end_inset DRBD problems usually arise \emph on only \emph default when the network throughput shows some \begin_inset Quotes eld \end_inset awkward \begin_inset Quotes erd \end_inset analog behaviour, such as overload, or as occasionally produced by various switches / routers / transmitters, or other potential sources of packet loss. 
\end_layout \begin_layout Subsection Behaviour of MARS \begin_inset CommandInset label LatexCommand label name "sub:Behaviour-of-MARS" \end_inset \end_layout \begin_layout Standard The behaviour of MARS in the above scenario: \end_layout \begin_layout Standard \noindent \align center \begin_inset Graphics filename images/network-bottleneck-mars.fig width 80col% \end_inset \end_layout \begin_layout Standard \noindent When the network is restrained, an asynchronous system like MARS will continue to serve the user IO requests (dotted green line) without any impact / incident while the actual network throughput (solid green line) follows the red line. In the meantime, all changes to the block device are recorded in the transaction logfiles. \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/MatieresCorrosives.png lyxscale 50 scale 17 \end_inset Here is one point in favour of DRBD: MARS stores its transaction logs on the filesystem \family typewriter /mars/ \family default . When the network bottleneck is lasting very long (some days or even some weeks), the filesystem will eventually run out of space some day. Section \begin_inset CommandInset ref LatexCommand ref reference "sec:Defending-Overflow" \end_inset discusses countermeasures against that in detail. In contrast to MARS, DRBD allocates its bitmap \emph on statically \emph default at resource creation time. It uses up less space, and you don't have to monitor it for (potential) overflows. The space for transaction logs is the price you have to pay if you want or need anytime consistency, or asynchronous replication in general. 
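\end_layout \begin_layout Standard How long a bottleneck may last before \family typewriter /mars/ \family default fills up can be estimated roughly. The following Python sketch is a back-of-the-envelope calculation and not a MARS tool. It assumes constant rates and that log space is reclaimed as soon as data has been replicated, which ignores the details of logfile handling; the function name is made up for illustration.

```python
import math


def seconds_until_log_full(free_bytes, write_rate, ship_rate):
    """Estimate the time until the transaction log space is exhausted.

    free_bytes : free space for transaction logs, in bytes
    write_rate : application write rate, in bytes per second
    ship_rate  : rate at which the network carries log data away
    """
    growth = write_rate - ship_rate   # net growth of the backlog
    if growth <= 0:
        return math.inf               # the network keeps up, no overflow
    return free_bytes / growth


# Example: 1 TiB free, 20 MiB/s of writes, 5 MiB/s left on the line.
hours = seconds_until_log_full(2**40, 20 * 2**20, 5 * 2**20) / 3600
print(f"about {hours:.1f} hours until the log space is full")
```

Under these assumptions, the example leaves roughly 19 hours before the space is exhausted. Real workloads fluctuate, so such a number is only a starting point for monitoring thresholds.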
\end_layout \begin_layout Standard In order to really grasp the \emph on heart \emph default of the difference between synchronous and asynchronous replication, we look at the following modified scenario: \end_layout \begin_layout Standard \noindent \align center \begin_inset Graphics filename images/network-flaky-mars.fig width 80col% \end_inset \end_layout \begin_layout Standard \noindent This time, the network throughput (red line) is varying \begin_inset Foot status open \begin_layout Plain Layout In real life, many long-distance lines or even some heavily used metro lines usually show fluctuations of their network bandwidth by an order of magnitude, or even higher. We have measured them. The overall behaviour can be characterized as \begin_inset Quotes eld \end_inset \series bold chaotic \series default \begin_inset Quotes erd \end_inset . \end_layout \end_inset in some unpredictable way. As before, the application throughput served by MARS is assumed to be constant (dotted green line, often superseded by the solid green line). The actual replication network throughput is depicted by the solid green line. \end_layout \begin_layout Standard As you can see, a network dropdown undershooting the application demand has no impact on the application throughput, but only on the replication network throughput. Whenever the network throughput is held back due to the flaky network, it simply catches up as soon as possible by overshooting the application throughput. The amount of lag-behind is visualized as shaded area: downward shading (below the application throughput) means an increase of the lag-behind, while the upwards shaded areas (beyond the application throughput) indicate a decrease of the lag-behind (catch-up). Once the lag-behind has been fully caught up, the network throughput suddenly jumps back to the application throughput (here visible in two cases). 
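\end_layout \begin_layout Standard The catch-up behaviour can be reproduced with a few lines of code. The following Python sketch is a simplified model and not MARS code: it works in abstract time steps, treats the transaction log as a single backlog counter, and assumes that the replication always ships as much as the line allows. All names are chosen for illustration only.

```python
def simulate_async_replication(app_rates, net_limits):
    """Model a log-shipping replica over a fluctuating line.

    app_rates[i]  : data written by the application in step i
    net_limits[i] : data the network can carry in step i
    Returns a list of (shipped, lag_behind) tuples, one per step.
    """
    backlog = 0.0
    history = []
    for written, limit in zip(app_rates, net_limits):
        backlog += written             # appended to the transaction log
        shipped = min(backlog, limit)  # use the line as best as possible
        backlog -= shipped             # the remainder is the lag-behind
        history.append((shipped, backlog))
    return history


# Constant application load, network dip in steps 2 and 3.
for shipped, lag in simulate_async_replication([50] * 7,
                                               [100, 100, 20, 20, 100, 100, 100]):
    print(shipped, lag)
```

In this example the shipped amounts are 50, 50, 20, 20, 100, 60, 50: the replication falls behind during the dip, overshoots the application rate while catching up, and then returns to the application rate. The application side is never throttled in this model, which is the essential difference to the synchronous case.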
\end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/MatieresCorrosives.png lyxscale 50 scale 17 \end_inset Note that the existence of lag-behind areas is roughly corresponding to DRBD disconnect states, and in turn to DRBD inconsistent states of the secondary as long as the lag-behind has not been fully caught up. The very rough \begin_inset Foot status open \begin_layout Plain Layout Of course, this visualization is not exact. On one hand, the DRBD inconsistency phase may start later than depicted here, because it only starts \emph on after \emph default the first automatic disconnect, upon the first automatic re-connect. In addition, the amount of resync data may be smaller than the amount of corresponding MARS transaction logfile data, because the DRBD bitmap will coalesce multiple writes to the same block into one single transfer. On the other hand, DRBD will transfer no data at all during its disconnected state, while MARS continues to do its best. This leads to a prolongation of the DRBD inconsistent phase. Depending on properties of the workload and of the network, the real duration of the inconsistency phase may be either shorter or longer. \end_layout \end_inset duration of the corresponding DRBD inconsistency phase is visualized as magenta line at the time scale. \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset MARS utilizes the existing network bandwidth as best as possible in order to pipe through as much data as possible, provided that there exists some data requiring expedition. Conceptually, there exists no better way due to information theoretic limits (besides data compression). 
\end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/MatieresCorrosives.png lyxscale 50 scale 17 \end_inset Note that \emph on on average \emph default during a longer period of time, the network must have enough capacity for transporting all of your data. MARS cannot magically break through information-theoretic limits. It cannot magically transport gigabytes of data over modem lines. Only \emph on relatively short \emph default network problems / packet loss can be compensated. \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset In case of lag-behind, the version of the data replicated to the secondary site corresponds to some time in the past. Since the data is always transferred in the same order as originally submitted at the primary site, the secondary never gets inconsistent. Your mirror always remains usable. Your only potential problem could be the outdated state, corresponding to some state in the past. However, the \begin_inset Quotes eld \end_inset as-best-as-possible \begin_inset Quotes erd \end_inset approach to the network transfer ensures that your version is always \emph on as up-to-date as possible \emph default even under ill-behaving network bottlenecks. \series bold There is simply no better way to do it. \series default In presence of temporary network bottlenecks such as network congestion, there exists no better method than prescribed by the information theoretic limit (red line, neglecting data compression). \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/MatieresCorrosives.png lyxscale 50 scale 17 \end_inset In order to get all of your data through the line, the network must become healthy again at some point. 
Otherwise, data will be recorded until the capacity of the \family typewriter /mars/ \family default filesystem is exhausted, leading to an emergency mode (see section \begin_inset CommandInset ref LatexCommand ref reference "sec:Resolution-of-Emergency" \end_inset ). \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset MARS' property of never sacrificing local data consistency (at the possible cost of actuality, as long as you have enough capacity in \family typewriter /mars/ \family default ) is called \series bold Anytime Consistency \series default . \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset Even when the capacity of \family typewriter /mars/ \family default is exhausted and when emergency mode is entered, the replicas will not become inconsistent by themselves. However, when the emergency mode is later \emph on cleaned up \emph default for a replica, it will become temporarily inconsistent during the fast full sync. Details are in section \begin_inset CommandInset ref LatexCommand ref reference "sec:Resolution-of-Emergency" \end_inset . \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset Conclusion: you can even use \series bold traffic shaping \series default on MARS' TCP connections in order to globally balance your network throughput (of course at the cost of actuality, but without sacrificing local data consistency). If you would try to do the same with DRBD, you could easily provoke a disaster. MARS simply tolerates any network problems, provided that there is enough disk space for transaction logfiles. 
Even in case of completely filling up your disk with transaction logfiles after some days or weeks, you will not lose local consistency anywhere (see section \begin_inset CommandInset ref LatexCommand ref reference "sec:Defending-Overflow" \end_inset ). \end_layout \begin_layout Standard Finally, here is yet another scenario where MARS can cope with the situation: \end_layout \begin_layout Standard \noindent \align center \begin_inset Graphics filename images/network-constant-mars.fig width 80col% \end_inset \end_layout \begin_layout Standard \noindent This time, the network throughput limit (solid red line) is assumed to be constant. However, the application workload (dotted green line) shows some heavy peaks. We know from our 1&1 datacenters that such an application behaviour is very common (e.g. in case of certain kinds of DDOS attacks etc). \end_layout \begin_layout Standard When the peaks are exceeding the network capabilities for some time, the replication network throughput (solid green line) will be limited for a short time, stay a little bit longer at the limit, and finally drop down again to the normal workload. In other words, you get a flexible buffering behaviour, coping with the peaks. \end_layout \begin_layout Standard Similar scenarios (where both the application workload has peaks and the network is flaky to some degree) are rather common. If you used DRBD there, you would be likely to run into regular application performance problems and/or frequent automatic disconnect cycles, depending on the height and on the duration of the peaks, and on network resources. \end_layout \begin_layout Section Long Distances / High Latencies \end_layout \begin_layout Standard In general and in some theories, latencies are conceptually independent from throughput, at least to some degree. There exist all 4 possible combinations: \end_layout \begin_layout Enumerate There exist communication lines with high latencies but also high throughput. 
Examples are raw fibre cables on the floor of the Atlantic. \end_layout \begin_layout Enumerate High latencies on low-throughput lines are very easy to achieve. If you never saw it, you never ran interactive \family typewriter vi \family default over \family typewriter ssh \family default in parallel to downloads on your old-fashioned modem line. \end_layout \begin_layout Enumerate Low latencies need not be incompatible with high throughput. See Myrinet, InfiniBand or high-speed point-to-point interconnects, such as modern RAM busses. \end_layout \begin_layout Enumerate Low latency combined with low throughput is also possible: in an ATM system (or another pre-reservation system for bandwidth), just increase the multiplex factor on low-capacity but short lines, which is only possible at the cost of assigned bandwidth. \end_layout \begin_layout Standard In the \emph on internet \emph default practice, however, it is very likely that high latencies will also lead to worse throughput, because of the \emph on congestion control algorithms \emph default running all over the world. \end_layout \begin_layout Standard We have experimented with extremely large TCP send/receive buffers plus various window sizes and congestion control algorithms over long-distance lines between the USA and Europe. Yes, it is possible to improve the behaviour to some degree. But magic does not happen. Natural laws will always hold. You simply cannot travel faster than the speed of light. \end_layout \begin_layout Standard Our experience leads to the following rule of thumb, not formally proven by anything, but just observed in practice: \end_layout \begin_layout Quotation In general \begin_inset Foot status open \begin_layout Plain Layout We have heard of cases where even less than 50 km were not working with DRBD. It depends on application workload, on properties of the line, and on congestion caused by other traffic. 
Some other people told us that according to \emph on their \emph default experience, only much shorter distances should be considered operable, in the range of a few kilometers. However, they agree that DRBD is rock stable when used on crossover cables. \end_layout \end_inset , synchronous data replication (not limited to applications of DRBD) works reliably only over distances \begin_inset Formula $<50$ \end_inset km, or sometimes even less. \end_layout \begin_layout Standard There may be some exceptions, e.g. when dealing with low-end workstation loads. But when you are responsible for a whole datacenter and/or some centralized storage units, don't waste your time by trying (almost) impossible things. We recommend using MARS in such use cases. \end_layout \begin_layout Section Higher Consistency Guarantees vs Actuality \end_layout \begin_layout Standard We already saw in section \begin_inset CommandInset ref LatexCommand ref reference "sec:Network-Bottlenecks" \end_inset that certain types of network bottlenecks can easily (and reproducibly) destroy the consistency of your DRBD secondary, while MARS will preserve local consistency at the cost of actuality ( \series bold anytime consistency \series default ). \end_layout \begin_layout Standard Some people, often located at database operations, are insistently arguing that actuality is such a high good that it must not be sacrificed under any circumstances. \end_layout \begin_layout Standard Anyone arguing this way has at least the following choices (list may be incomplete): \end_layout \begin_layout Enumerate None of the above use cases for MARS apply. For instance, short distance replication over crossover cables is sufficient (which occurs very often), or the network is reliable enough such that bottlenecks can never occur (e.g. 
because the total load is extremely low, or conversely the network is extremely overengineered / expensive), or the occurrence of bottlenecks can \emph on provably \emph default be taken into account. In such cases, DRBD is clearly a better solution than MARS, because it provides better actuality than the current version of MARS, and it uses fewer disk resources. \end_layout \begin_layout Enumerate In the presence of network bottlenecks, people didn't notice and/or didn't understand and/or underestimated the risk of accidental invalidation of their DRBD secondaries. They should carefully check that risk. They should convince themselves that the risk is \emph on really \emph default bearable. Once they are hit by a systematic chain of events which \emph on reproducibly \emph default provokes the bad effect, it is too late \begin_inset Foot status open \begin_layout Plain Layout Some people seem to need a bad experience before they get the difference between risk caused by reproducible effects and mere bad luck. \end_layout \end_inset . \end_layout \begin_layout Enumerate In the presence of network bottlenecks, people found a solution such that DRBD does not automatically re-connect after the connection has been dropped due to network problems (cf. \family typewriter ko-count \family default parameter). So the risk of inconsistency \emph on appears \emph default to have vanished. In some cases, people did not notice that the risk has \emph on not completely \begin_inset Foot status open \begin_layout Plain Layout Hint: what's the \emph on conceptual \emph default difference between an automatic and a manual re-connect? Yes, you can try to \emph on lower \emph default the risk in some cases by transferring risks to human analysis and human decisions, but did you take into account the possibility of human errors?
\end_layout \end_inset \emph default vanished, and/or they did not notice that the actuality delivered by DRBD is now even drastically worse than that of MARS (in the same situation). It is true that DRBD provides better actuality in \family typewriter connected \family default state, but for a full picture the actuality in \family typewriter disconnected \family default state should not be neglected \begin_inset Foot status open \begin_layout Plain Layout Hint: a potential hurdle may be the fact that the current format of \family typewriter /proc/drbd \family default displays neither the timestamp of the first \emph on relevant \emph default network drop nor the total amount of lag-behind user data (which is \emph on not \emph default the same as the number of dirty bits in the bitmap), while \family typewriter marsadm view \family default can display both. So it is difficult to judge the risks. Inspecting the DRBD messages in the syslog might help, but quantification could remain hard. \end_layout \end_inset . So they didn't notice that their argumentation on the importance of actuality may be fundamentally wrong. A possible way to overcome that may be re-reading section \begin_inset CommandInset ref LatexCommand ref reference "sub:Behaviour-of-MARS" \end_inset and comparing its outcome with the corresponding outcome of DRBD in the same situation. \end_layout \begin_layout Enumerate People are stuck in contradictory requirements because the current version of MARS does not yet support synchronous or pseudo-synchronous operation modes. This should be resolved some day. \end_layout \begin_layout Standard \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset A common misunderstanding is about the actuality guarantees provided by filesystems. The buffer cache / page cache uses a \series bold writeback strategy \series default by default for performance reasons.
Even modern journalling filesystems will (by default) provide only consistency guarantees, but no strong actuality guarantee. In case of power loss, some transactions may even be \emph on rolled back \emph default in order to restore consistency. According to POSIX \begin_inset Foot status open \begin_layout Plain Layout The above argumentation also applies to Windows filesystems in an analogous way. \end_layout \end_inset and other standards, the only \emph on reliable \emph default way to achieve actuality is the use of system calls like \family typewriter sync() \family default , \family typewriter fsync() \family default , \family typewriter fdatasync() \family default , flags like \family typewriter O_SYNC \family default or \family typewriter O_DIRECT \family default , or similar. For performance reasons, the \emph on vast majority of applications \emph default don't use them at all, or use them only sparingly! \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/MatieresCorrosives.png lyxscale 50 scale 17 \end_inset It makes no sense to require strong actuality guarantees from any block layer replication (whether DRBD or future versions of MARS) while higher layers such as filesystems or even applications are already sacrificing them! \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset In summary, the \series bold anytime consistency \series default provided by MARS is an argument you should consider, even if you need an extra hard disk for transaction logfiles. \end_layout \begin_layout Chapter Quick Start Guide \end_layout \begin_layout Standard This chapter is for impatient but experienced sysadmins who already know DRBD. For more complete information, refer to chapter \begin_inset CommandInset ref LatexCommand nameref reference "chap:The-Sysadmin-Interface" \end_inset .
\end_layout \begin_layout Section Preparation: What you Need \begin_inset CommandInset label LatexCommand label name "sec:Preparation:-What-you" \end_inset \end_layout \begin_layout Standard Typically, you will use MARS on servers in a datacenter for replication of large amounts of data. \end_layout \begin_layout Standard Typically, you will use MARS for replication \emph on between \emph default multiple datacenters, when the distances are greater than \begin_inset Formula $\approx50$ \end_inset km. Many other solutions, even from commercial storage vendors, will not work reliably over large distances when your network is not \emph on extremely \emph default reliable, or when you try to push huge amounts of data from high-performance applications through a network bottleneck. If you ever encountered such problems (or try to avoid them in advance), MARS is for you. \end_layout \begin_layout Standard You can use MARS both on dedicated storage servers (e.g. for serving Windows clients) and on standalone Linux servers where CPU and storage are not separated. \end_layout \begin_layout Standard In order to protect your data from low-level disk failures, you should use a hardware RAID controller with a battery backup unit (BBU). Software RAID is explicitly \emph on not \emph default recommended, because it generally provides worse performance due to the lack of a hardware BBU (for some benchmark comparisons with and without BBU, see \begin_inset Flex URL status collapsed \begin_layout Plain Layout https://github.com/schoebel/blkreplay/raw/master/doc/blkreplay.pdf \end_layout \end_inset ). \end_layout \begin_layout Standard Typically, you will need more than one RAID set \begin_inset Foot status open \begin_layout Plain Layout For low-cost storage, RAID-5 is no longer regarded as safe for today's typical storage sizes, because the error rate is regarded as too high. Therefore, use RAID-6.
If you need more than 15 disks in total, create multiple RAID sets (each having at most 15 disks, better about 12 disks) and stripe them via LVM (or via your hardware RAID controller if it supports RAID-60). \end_layout \end_inset for large amounts of data. Therefore, use of LVM is also recommended \begin_inset Foot status open \begin_layout Plain Layout You may also combine MARS with commercial storage boxes connected via Fibre Channel or iSCSI, but we do not yet have operational experience with such setups at 1&1. \end_layout \end_inset for your data. \end_layout \begin_layout Standard MARS' tolerance of networking problems comes with some cost. You will need some extra space for the transaction logfiles of MARS, residing on the \family typewriter /mars/ \family default filesystem. \end_layout \begin_layout Standard The exact space requirements for \family typewriter /mars/ \family default depend on the \emph on average write rate \emph default of your application, not on the size of your data. We found that only a few applications write more than 1 TB per day. Most write even less than 100 GB per day. Usually, you want to dimension \family typewriter /mars/ \family default such that you can survive a network loss lasting 3 days / about one weekend. This can be achieved with current technology rather easily: as a simple rule of thumb, just use one \series bold dedicated disk \series default having a capacity of 4 TB or more. Typically, that will provide you with plenty of headroom even for bigger networking incidents. \end_layout \begin_layout Standard Dedicated disks for \family typewriter /mars/ \family default have another advantage: their mechanical head movement is completely independent of the head movements of your data disks.
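\end_layout \begin_layout Standard As a rough cross-check of the sizing rule of thumb for \family typewriter /mars/ \family default given above, the following minimal shell sketch multiplies an assumed average write rate by an assumed outage duration. The function name \family typewriter mars_space_gb \family default and the sample numbers are illustrative assumptions only (1 TB is taken as 1000 GB); measure the real write rate of your application before dimensioning your disk. \end_layout

```shell
#!/bin/sh
# Rough sizing of the /mars/ filesystem from the average write rate.
# The function name and the sample numbers are illustrative assumptions.

# mars_space_gb WRITE_RATE_GB_PER_DAY OUTAGE_DAYS
# Prints the amount of transaction logfile data (in GB) which accumulates
# while the network is down for the given number of days.
mars_space_gb() {
    echo $(( $1 * $2 ))
}

# Heavy writer (1 TB per day, taken as 1000 GB) during a 3-day network loss:
mars_space_gb 1000 3    # prints 3000, which still fits on a 4 TB disk

# More typical writer (100 GB per day) during the same outage:
mars_space_gb 100 3     # prints 300
```

\begin_layout Standard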
For best performance, attach that dedicated disk to your hardware RAID controller with BBU, building a separate RAID set (even if it consists only of a single disk -- notice that the \series bold hardware BBU \series default is the crucial point). \end_layout \begin_layout Standard If you are concerned about reliability, use two disks combined as a relatively small RAID-1 set. For extremely high performance demands, you may consider (and check) RAID-10. \end_layout \begin_layout Standard Since the transaction logfiles are highly sequential in their access pattern, a cheap but high-capacity SATA disk (or nearline-SAS disk) is usually sufficient. At the time of this writing, standard SATA SSDs have \emph on not \emph default (yet) proven preferable. Although they offer a high random IOPS rate, their sequential throughput is worse, and their long-term stability is questioned by many people. However, as technology evolves and becomes more mature, this could change in the future. \end_layout \begin_layout Standard Use \family typewriter ext4 \family default for \family typewriter /mars/ \family default . Avoid \family typewriter ext3 \family default , and don't use \family typewriter xfs \family default \begin_inset Foot status open \begin_layout Plain Layout It seems that the late internal resource allocation strategy of \family typewriter xfs \family default (or another currently unknown reason) could be the cause of some resource deadlocks which appear only with \family typewriter xfs \family default and only under \emph on extremely \emph default high IO load in combination with high memory pressure. \end_layout \end_inset at all. \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/MatieresCorrosives.png lyxscale 50 scale 17 \end_inset Notice that the filesystem \family typewriter /mars/ \family default has nothing to do with an ordinary filesystem.
It is completely reserved for MARS-internal purposes, namely as a \series bold storage container \series default for MARS' persistent data. It does not obey any userspace rules like the FHS (Filesystem Hierarchy Standard), and it should not be accessed by any userspace tool except the official \family typewriter marsadm \family default tool. You should regard its internal data format as a \series bold black box \series default . The internal data format may change in the future, or the complete \family typewriter /mars/ \family default filesystem may even be replaced by a totally different container format, while the official \family typewriter marsadm \family default interface is supposed to remain stable. \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset That said, you might look into its contents \emph on by hand \emph default out of curiosity or for \emph on debugging purposes \emph default , and only as root. But don't program any tools / monitoring scripts / etc bypassing the official \family typewriter marsadm \family default tool. \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/MatieresToxiques.png lyxscale 50 scale 17 \end_inset Like DRBD, the current version of MARS has \series bold no security \series default built in. MARS assumes that it is running in a \series bold trusted network \series default . Anyone who can connect to the MARS ports (default 7777 to 7779) can potentially break in and become root! Therefore, you \series bold must \series default protect your network by appropriate means, such as firewalling and/or encrypted VPN. \end_layout \begin_layout Standard Currently, MARS provides no shared secret like DRBD, because a simple shared secret is way too weak to provide any real security (potentially misleading people about the real level of security).
Future versions of MARS should provide at least two-factor authentication, and encryption via dynamic session keys. Until that is implemented, use a secured VPN instead! And don't forget to \emph on audit \emph default it for security holes! \end_layout \begin_layout Section Setup Primary and Secondary Cluster Nodes \end_layout \begin_layout Standard If you already use DRBD, you may migrate to MARS (or even back from MARS to DRBD) if you use \emph on external \begin_inset Foot status open \begin_layout Plain Layout \emph on Internal \emph default DRBD metadata should also work as long as the filesystem inside your block device / disk already exists and is not re-created. The latter would destroy the DRBD metadata, but even that will not really hurt you: you can always switch back to DRBD using \emph on external \emph default metadata, as long as you have some small spare space somewhere. \end_layout \end_inset \emph default DRBD metadata (which is not touched by MARS). \end_layout \begin_layout Subsection Kernel and MARS Module \end_layout \begin_layout Standard At the time of this writing, a small pre-patch for the Linux kernel is needed. It is trivial and consists mostly of \family typewriter EXPORT_SYMBOL() \family default statements. The pre-patch must be applied to the kernel source tree before building your (custom) kernel. Future versions of MARS are planned to no longer require a pre-patch.
\end_layout \begin_layout Standard The MARS kernel module can be built in two different ways: \end_layout \begin_layout Enumerate inplace in the kernel source tree: \family typewriter cd block/ && git clone git://github.com/schoebel/mars \end_layout \begin_layout Enumerate as a separate kernel module, only for experienced \begin_inset Foot status open \begin_layout Plain Layout You should be familiar with the problems arising from orthogonal combination of different kernel versions with different MARS module versions and with different \family typewriter marsadm \family default userspace tool versions at the package management level. Hint: \family typewriter modinfo \family default is your friend. \end_layout \end_inset sysadmins: see file \family typewriter Makefile.dist \family default (tested with some older versions of Debian; may need some extra work with other distros). \end_layout \begin_layout Standard Further / more accurate / latest instructions can be found in \family typewriter README \family default and in \family typewriter INSTALL \family default . You must not only install the kernel and the \family typewriter mars.ko \family default kernel module to all of your cluster nodes, but also the \family typewriter marsadm \family default userspace tool. \end_layout \begin_layout Subsection Setup your Cluster Nodes \begin_inset CommandInset label LatexCommand label name "sub:Setup-your-Cluster" \end_inset \end_layout \begin_layout Standard For your cluster, you need at least two nodes. In the following, they will be called A and B. In the beginning, A will have the \family typewriter primary \family default role, while B will be your initial \family typewriter secondary \family default . The roles may change later. \end_layout \begin_layout Enumerate You must be \family typewriter root \family default . \end_layout \begin_layout Enumerate On each of A and B, create the \family typewriter /mars/ \family default mountpoint. 
\end_layout \begin_layout Enumerate On each node, create an \family typewriter ext4 \family default filesystem on your separate disk / RAID set via \family typewriter mkfs.ext4 \family default (for requirements on size etc., see section \begin_inset CommandInset ref LatexCommand nameref reference "sec:Preparation:-What-you" \end_inset ). \end_layout \begin_layout Enumerate On each node, mount that filesystem at \family typewriter /mars/ \family default . It is advisable to add an entry to \family typewriter /etc/fstab \family default . \end_layout \begin_layout Enumerate For security reasons, execute \family typewriter chmod 0700 /mars \family default everywhere after \family typewriter /mars/ \family default has been mounted. If you forget this step, any following \family typewriter marsadm \family default command will print a warning, but will fix the problem for you. \end_layout \begin_layout Enumerate On node A, say \family typewriter marsadm create-cluster \family default . \begin_inset Newline newline \end_inset This must be done \emph on exactly once \emph default , on exactly one node of your cluster. Never do this twice or on different nodes, because that would create two different clusters which would have nothing to do with each other. The \family typewriter marsadm \family default tool protects you against accidentally joining / merging two different clusters. If you accidentally created two different clusters, just umount that \family typewriter /mars/ \family default partition and start over with step 3 at that node. \end_layout \begin_layout Enumerate On node B, you must have a working \family typewriter ssh \family default connection to node A (as \family typewriter root \family default ). Test it by saying \family typewriter ssh A w \family default on node B. It should work without entering a password (otherwise, use \family typewriter ssh-agent \family default to achieve that).
In addition, \family typewriter rsync \family default must be installed. \end_layout \begin_layout Enumerate On node B, say \family typewriter marsadm join-cluster A \end_layout \begin_layout Enumerate Only \emph on after \begin_inset Foot status open \begin_layout Plain Layout In fact, you may already \family typewriter modprobe mars \family default at node A after the \family typewriter marsadm create-cluster \family default . Just don't do any of the \family typewriter *-cluster \family default operations when the kernel module is loaded. All other operations should have no such restriction. \end_layout \end_inset \emph default that, do \family typewriter modprobe mars \family default on each node. \end_layout \begin_layout Section Creating and Maintaining Resources \begin_inset CommandInset label LatexCommand label name "sec:Creating-and-Maintaining" \end_inset \end_layout \begin_layout Standard In the following example session, a block device \family typewriter /dev/lv-x/mydata \family default (called \emph on disk \emph default for short) must already exist on both nodes A and B, having the same \begin_inset Foot status open \begin_layout Plain Layout Actually, the disk at the initially secondary side may be larger than that at the initially primary side. This will waste space and is therefore not recommended. \end_layout \end_inset size. For the sake of simplicity, the disk (underlying block device), its later logical resource name, and its later virtual device name will all share the same suffix \family typewriter mydata \family default . In general, you might name each of them differently, but that is not recommended since it may easily lead to confusion in larger installations. \end_layout \begin_layout Standard You may already have some data inside your disk \family typewriter /dev/lv-x/mydata \family default at the initially primary side A.
Before you use it for MARS, it must not be in use for any other purpose (such as being mounted, or used by DRBD, etc.). MARS will require \series bold exclusive access \series default to it. \end_layout \begin_layout Enumerate On node A, say \family typewriter marsadm create-resource mydata /dev/lv-x/mydata \family default . \begin_inset Newline newline \end_inset As a result, a directory \family typewriter /mars/resource-mydata/ \family default will be created on node A, containing some symlinks. Node A will automatically start in the primary role for this resource. Therefore, a new pseudo-device \family typewriter /dev/mars/mydata \family default will also appear after a few seconds. \begin_inset Newline newline \end_inset Note that the initial contents of \family typewriter /dev/mars/mydata \family default will be exactly the same as in your pre-existing disk \family typewriter /dev/lv-x/mydata \family default . \begin_inset Newline newline \end_inset If you like, you may already use \family typewriter /dev/mars/mydata \family default for mounting your pre-existing data, or for creating a fresh filesystem, or for exporting via iSCSI, and so on. You may even do so before any other cluster node has joined the resource (so-called \begin_inset Quotes eld \end_inset standalone mode \begin_inset Quotes erd \end_inset ). But you can also do so later after setup of (one or many) secondaries. \end_layout \begin_layout Enumerate Wait a few seconds until the directory \family typewriter /mars/resource-mydata/ \family default and its symlink contents also appear on cluster node B. The command \family typewriter marsadm wait-cluster \family default may be helpful. \end_layout \begin_layout Enumerate On node B, say \family typewriter marsadm join-resource mydata /dev/lv-x/mydata \family default . \begin_inset Newline newline \end_inset As a result, the initial full-sync from node A to node B should start automatically.
\begin_inset Newline newline \end_inset \begin_inset Graphics filename images/MatieresCorrosives.png lyxscale 50 scale 17 \end_inset Of course, the old contents of your disk \family typewriter /dev/lv-x/mydata \family default at side B (and \emph on only \emph default there!) are overwritten by the version from side A. Since you are an experienced sysadmin, you knew that, and it was just the effect you deliberately wanted to achieve. If you didn't check whether the old contents contained any valuable data (or if you accidentally provided a wrong disk device argument), it is too late now. The \family typewriter marsadm \family default command checks that the disk device argument is really a block device, and that exclusive access to it is possible (as well as some further safety checks, e.g. matching sizes). However, MARS cannot know the \emph on purpose \emph default of your generic block device. MARS (as well as DRBD) is completely ignorant of the \emph on contents \emph default of a generic block device; it does not interpret them in any way. Therefore, you may use MARS (as well as DRBD) for mirroring Windows filesystems, or raw devices from databases, or virtual machines, or whatever. \begin_inset Newline newline \end_inset \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset Hint: by default, MARS uses the so-called \begin_inset Quotes eld \end_inset fast fullsync \begin_inset Quotes erd \end_inset algorithm. It works similarly to \family typewriter rsync \family default , first reading the data on both sides and computing an MD5 checksum for each block. Heavy-weight data is only transferred over the long-distance network upon checksum mismatch. This is extremely fast if your data is already (almost) identical on both sides.
Conversely, if you know in advance that your initial data is completely different on both sides, you may choose to switch off the fast fullsync algorithm via \family typewriter echo 0 > /proc/sys/mars/do_fast_fullsync \family default in order to save the additional IO overhead and network latencies introduced by the separate checksum comparison steps. \end_layout \begin_layout Enumerate Optionally, only for experienced sysadmins who \emph on really \emph default know what they are doing: if you will create a \emph on new \emph default filesystem on \family typewriter /dev/mars/mydata \family default \emph on after(!) \emph default having created the MARS resource as well as \emph on after \emph default having already joined it on every replica, you may abandon the fast fullsync phase \emph on before \emph default creating the fresh filesystem, because the old content of \family typewriter /dev/mars/mydata \family default will then be just garbage not used by the freshly created filesystem \begin_inset Foot status open \begin_layout Plain Layout It is \emph on vital \emph default that the transaction logfile contents created by \family typewriter mkfs \family default is \emph on fully \emph default propagated to the secondaries and then replayed there. \end_layout \begin_layout Plain Layout Analogously, another exception is also possible, but at your own risk (be careful, really!): when migrating your data from DRBD to MARS, and you have ensured that (1) at the end of using DRBD both your replicas were really equal (you should have checked that), and (2) before and after setting up any side of MARS ( \family typewriter create-resource \family default as well as \family typewriter join-resource \family default ) nothing has been written at all to it (i.e. 
no use of either \family typewriter /dev/lv-x/mydata \family default or \family typewriter /dev/mars/mydata \family default has occurred in any way), the first transaction logfile \family typewriter /mars/resource-mydata/log-000000001-$primary \family default created by MARS will be empty. Check whether this is really true! Then, and only then, you may also issue a \family typewriter fake-sync \family default . \end_layout \end_inset . Then, and only then, you may say \family typewriter marsadm fake-sync mydata \family default in order to abort the sync operation. \begin_inset Newline newline \end_inset \begin_inset Graphics filename images/MatieresToxiques.png lyxscale 50 scale 17 \end_inset Never do a \family typewriter fake-sync \family default unless you are \series bold absolutely sure \series default that you really don't need to sync the data! Otherwise, you are \emph on guaranteed \emph default to have produced harmful inconsistencies. If you accidentally issued \family typewriter fake-sync \family default , you may start over the fast full sync at your secondary side by saying \family typewriter marsadm invalidate mydata \family default (analogously to the corresponding DRBD command). \end_layout \begin_layout Section Keeping Resources Operational \end_layout \begin_layout Subsection Logfile Rotation / Deletion \begin_inset CommandInset label LatexCommand label name "sub:Logfile-Rotation" \end_inset \end_layout \begin_layout Standard As explained in section \begin_inset CommandInset ref LatexCommand nameref reference "sec:The-Transaction-Logger" \end_inset , all changes to your resource data are recorded in transaction logfiles residing on the \family typewriter /mars/ \family default filesystem. These files keep growing over time.
In order to avoid filesystem overflow, the following must be done at regular intervals: \end_layout \begin_layout Enumerate \family typewriter marsadm log-rotate all \family default \begin_inset Newline newline \end_inset This starts appending to a new logfile on all of your resources. The logfiles are automatically numbered by an increasing 9-digit logfile number. This will suffice for many centuries even if you rotated the logfiles once a minute. Practical frequencies for logfile rotation are more like once an hour, or every 10 minutes on highly loaded storage servers. \end_layout \begin_layout Enumerate \family typewriter marsadm log-delete-all all \family default \begin_inset Newline newline \end_inset This determines all logfiles from all resources which are no longer needed (i.e. which are \emph on fully \emph default replayed, on \emph on all \emph default relevant secondaries). All superfluous logfiles are then deleted, including all copies on all secondaries. \begin_inset Newline newline \end_inset \begin_inset Graphics filename images/MatieresCorrosives.png lyxscale 50 scale 17 \end_inset The current version of MARS deletes either \emph on all \emph default replicas of a logfile everywhere, or \emph on none \emph default of the replicas. This is a simple rule, but has the drawback that one node may hinder other nodes from freeing space in \family typewriter /mars/ \family default . In particular, the command \family typewriter marsadm pause-replay $res \family default (as well as \family typewriter marsadm disconnect $res \family default ) will freeze the space reclamation in the whole cluster when the pause lasts very long. \begin_inset Newline newline \end_inset \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset Best practice is to do both \family typewriter log-rotate \family default and \family typewriter log-delete-all \family default in a \family typewriter cron \family default job.
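\end_layout \begin_layout Standard The two steps above can be combined into a small wrapper script for \family typewriter cron \family default . The following is a minimal sketch, not an official tool: the function names, the \family typewriter MARSADM \family default override variable and the \family typewriter run \family default argument are hypothetical and exist only for this illustration; only the two \family typewriter marsadm \family default commands are taken from the text above. The second function merely redoes the arithmetic behind the 9-digit logfile number. \end_layout

```shell
#!/bin/sh
# Sketch of a cron helper for logfile rotation and deletion.
# MARSADM may be overridden (e.g. MARSADM=echo) for a dry run; this
# variable and the function names are assumptions of this example.
MARSADM="${MARSADM:-marsadm}"

mars_rotate_and_delete() {
    # Start appending to a new logfile on all resources ...
    $MARSADM log-rotate all || return 1
    # ... then delete the logfiles which are fully replayed everywhere.
    $MARSADM log-delete-all all
}

# Years until a 9-digit logfile number is exhausted (365-day years).
# $1 = minutes between two rotations
logfile_number_years() {
    echo $(( 1000000000 * $1 / (60 * 24 * 365) ))
}

# Only act when explicitly called with the argument "run".
if [ "${1:-}" = "run" ]; then
    mars_rotate_and_delete
fi
```

\begin_layout Standard Saved under a path of your choice and called with the argument \family typewriter run \family default from a cron entry (for example once an hour, or every 10 minutes), the script performs both steps in the right order. For a rotation interval of one minute, \family typewriter logfile_number_years 1 \family default yields about 1902 years. \end_layout \begin_layout Standard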
In addition, you should establish some regular monitoring of the free space present in the \family typewriter /mars/ \family default filesystem. \end_layout \begin_layout Standard More detailed information about avoiding \family typewriter /mars/ \family default overflow is in section \begin_inset CommandInset ref LatexCommand ref reference "sec:Defending-Overflow" \end_inset . \end_layout \begin_layout Subsection Switch Primary / Secondary Roles \end_layout \begin_layout Standard \noindent \align center \begin_inset Graphics filename images/switching.fig width 90col% \end_inset \end_layout \begin_layout Standard \noindent In contrast to DRBD, MARS distinguishes between \emph on intended \emph default and \emph on forced \emph default switching. This distinction is necessary due to differences in the communication architecture (asynchronous communication vs synchronous communication, see sections \begin_inset CommandInset ref LatexCommand ref reference "sec:The-Lamport-Clock" \end_inset and \begin_inset CommandInset ref LatexCommand ref reference "sec:The-Symlink-Tree" \end_inset ). \end_layout \begin_layout Standard Asynchronous communication means that (in the worst case) a message may take (almost) arbitrary time in a disturbed network to propagate to another node. As a consequence, the risk of accidentally creating an (unintended) split brain is increased (compared to a synchronous system like DRBD). \end_layout \begin_layout Standard In order to minimize this risk, a lot of effort has been invested into an internal handover protocol, which is used when you start an \emph on intended \emph default primary switch.
\end_layout \begin_layout Subsubsection Intended Switching / Planned Handover \begin_inset CommandInset label LatexCommand label name "sub:Intended-Switching" \end_inset \end_layout \begin_layout Standard Before starting a planned handover from your old primary \family typewriter A \family default to a new primary \family typewriter B \family default , you should check the replication of the resource. As a human, use \family typewriter marsadm view mydata \family default . For scripting, use the macros from section \begin_inset CommandInset ref LatexCommand ref reference "sub:Predefined-Trivial-Macros" \end_inset (see also section \begin_inset CommandInset ref LatexCommand ref reference "sec:Scripting-HOWTO" \end_inset ; an example can be found in \begin_inset Flex URL status collapsed \begin_layout Plain Layout contrib/example-scripts/check-mars-switchable.sh \end_layout \end_inset ). The network should be OK, and the amount of replication delay should be as low as possible. Otherwise, handover may take a very long time. \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset Best practice is to \series bold prepare a planned handover \series default by the following steps: \end_layout \begin_layout Enumerate Check the network and the replication lag. It should be low (a few hundred megabytes, or a low number of gigabytes - see also the rough time forecast shown by \family typewriter marsadm view mydata \family default when there is a larger replication delay, or directly access the forecast by \family typewriter marsadm view-replinfo \family default ). \end_layout \begin_layout Enumerate Stop your application, then umount \family typewriter /dev/mars/mydata \family default on host \family typewriter A \family default . 
\end_layout \begin_layout Enumerate When scripting, or when typing extremely fast, or for better safety, say \family typewriter marsadm wait-umount mydata \family default on host \family typewriter B \family default . When your network is OK, the propagation of the device usage state \begin_inset Foot status open \begin_layout Plain Layout Notice that the usage check for \family typewriter /dev/mars/mydata \family default on host \family typewriter B \family default is based on the \emph on open count \emph default transferred from \emph on another \emph default node \family typewriter A \family default . Since MARS is operating asynchronously (in contrast to DRBD), it may take some time until our node \family typewriter B \family default knows that the device is no longer used at \family typewriter A \family default . This can lead to a race condition if you automate an intended takeover with a script like \family typewriter ssh root@A \begin_inset Quotes eld \end_inset umount /dev/mars/mydata \begin_inset Quotes erd \end_inset ; ssh root@B \begin_inset Quotes eld \end_inset marsadm primary mydata \begin_inset Quotes erd \end_inset \family default because your second ssh command may be faster than the internal MARS symlink tree propagation (cf section \begin_inset CommandInset ref LatexCommand ref reference "sec:The-Symlink-Tree" \end_inset ). In order to prevent such races, you are strongly advised to use the command \end_layout \begin_layout Itemize \family typewriter marsadm wait-umount mydata \end_layout \begin_layout Plain Layout on node \family typewriter B \family default before trying to become primary. See also section \begin_inset CommandInset ref LatexCommand ref reference "sec:Scripting-HOWTO" \end_inset . \end_layout \end_inset should take only a few seconds. Otherwise, check for any network problems or any other problems.
\end_layout \begin_layout Enumerate On host \family typewriter B \family default , wait until \family typewriter marsadm view mydata \family default (or \family typewriter view-diskstate \family default ) shows \family typewriter UpToDate \family default . It is possible to omit this step, but then you have no control over the duration of the handover, and in case of any transfer problems, disk space problems, etc., you potentially risk producing a split brain (although \family typewriter marsadm \family default will do its best to avoid it). Doing the wait by yourself, \emph on before \emph default starting \family typewriter marsadm primary \family default , has a big advantage: you can abort the handover cycle at any time, just by re-mounting the device \family typewriter /dev/mars/mydata \family default at the old primary \family typewriter A \family default again, and by re-starting your application. Once you have started \family typewriter marsadm primary \family default on host \family typewriter B \family default , you might have to switch back explicitly, possibly even via \family typewriter primary --force \family default (see sections \begin_inset CommandInset ref LatexCommand ref reference "sub:Forced-Switching" \end_inset and \begin_inset CommandInset ref LatexCommand ref reference "sub:Split-Brain-Resolution" \end_inset ). \end_layout \begin_layout Standard Switching the roles is very similar to DRBD: just issue the command \end_layout \begin_layout Itemize \family typewriter marsadm primary mydata \end_layout \begin_layout Standard on your formerly secondary node \family typewriter B \family default . \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset The most important difference to DRBD: don't use an intermediate \family typewriter marsadm secondary mydata \family default anywhere.
Although it would be possible, it has some \emph on disadvantages \emph default . Always switch \emph on directly \emph default ! \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset In contrast to DRBD, MARS remembers the designated primary, even when your system crashes and reboots. While in case of a crash you have to re-setup DRBD with commands like \family typewriter drbdadm up \begin_inset Formula $\ldots$ \end_inset ; drbdadm primary \begin_inset Formula $\ldots$ \end_inset \family default , MARS will automatically resume its former roles just by saying \family typewriter modprobe mars \family default . \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset Another fundamental difference to DRBD: when the network is healthy, there can only exist \emph on one \emph default designated primary at a time (modulo some communication delays caused by the \begin_inset Quotes eld \end_inset eventually consistent \begin_inset Quotes erd \end_inset communication model, see section \begin_inset CommandInset ref LatexCommand ref reference "sec:The-Lamport-Clock" \end_inset ). By saying \family typewriter marsadm primary mydata \family default on host \family typewriter B \family default , \series bold all other \series default hosts (including \family typewriter A \family default ) will \series bold automatically go into secondary role \series default after a while! \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset You simply \emph on don't need \emph default an intermediate \family typewriter marsadm secondary mydata \family default for planned handover! 
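\end_layout \begin_layout Standard The planned handover steps above can be sketched as a small shell function that is run on the new primary \family typewriter B \family default . This is only a minimal sketch under stated assumptions, not a replacement for the example script in contrib: the names \family typewriter planned_handover \family default , \family typewriter MARSADM \family default , \family typewriter SSH \family default , \family typewriter POLL_SECONDS \family default and \family typewriter MAX_POLLS \family default are hypothetical, root ssh access to the old primary is assumed, and the sketch assumes that the output of \family typewriter marsadm view-diskstate \family default contains the literal word \family typewriter UpToDate \family default once the replica has caught up. \end_layout \begin_layout LyX-Code

```shell
# Hypothetical knobs for this sketch; they are not MARS settings.
MARSADM="${MARSADM:-marsadm}"
SSH="${SSH:-ssh}"
POLL_SECONDS="${POLL_SECONDS:-5}"
MAX_POLLS="${MAX_POLLS:-120}"

# Usage: planned_handover RESOURCE OLD_PRIMARY_HOST   (run on the new primary)
planned_handover() {
    ph_res="$1"
    ph_old="$2"

    # Release the device on the old primary. Stopping the application is
    # site-specific and must already have happened before this point.
    $SSH "root@$ph_old" "umount /dev/mars/$ph_res" || return 1

    # Wait until the released open count has propagated to this node.
    $MARSADM wait-umount "$ph_res" || return 1

    # Poll until the local replica reports UpToDate (assumed output format).
    ph_polls=0
    until $MARSADM view-diskstate "$ph_res" | grep -q UpToDate; do
        ph_polls=$((ph_polls + 1))
        if [ "$ph_polls" -ge "$MAX_POLLS" ]; then
            echo "giving up: $ph_res is not UpToDate, nothing was switched" >&2
            return 2
        fi
        sleep "$POLL_SECONDS"
    done

    # Switch directly, without any intermediate 'marsadm secondary'.
    $MARSADM primary "$ph_res"
}
```

\end_layout \begin_layout Standard A call could look like \family typewriter planned_handover mydata A \family default . When the polling gives up, no role has been changed yet, so the handover can still be aborted by re-mounting the device at the old primary and restarting the application there.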
\end_layout \begin_layout Standard Precondition for \family typewriter marsadm primary \family default is that you are up, that means in attached and connected state (cf. \family typewriter marsadm up \family default ), and that any old primary (in this case \family typewriter A \family default ) does not use its \family typewriter /dev/mars/mydata \family default device any longer, and that the network is healthy. If some (parts of) logfiles are not yet (fully) transferred to the new primary, you will need enough space on \family typewriter /mars/ \family default at the target side. If one of the preconditions described in section \begin_inset CommandInset ref LatexCommand ref reference "sub:Operation-of-the" \end_inset is violated, \family typewriter marsadm primary \family default may refuse to start. \end_layout \begin_layout Standard The preconditions try to protect you from doing silly things, such as accidentally provoking a split brain error state. We try to avoid split brain as best as we can. Therefore, we distinguish between \emph on intended \emph default and \emph on emergency \emph default switching. Intended switching will try to avoid split brain \emph on as best as it can \emph default . \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/MatieresCorrosives.png lyxscale 50 scale 17 \end_inset Don't \emph on rely \emph default on split brain avoidance, in particular when scripting any higher-level applications such as cluster managers (cf. section \begin_inset CommandInset ref LatexCommand ref reference "sec:Scripting-HOWTO" \end_inset ). \family typewriter marsadm \family default does its best, but at least in case of (unnoticed) network outages / partitions (or \emph on extremely, really extremely \emph default slow / overloaded networks), an attempt to become \family typewriter UpToDate \family default may fail.
If you want to \emph on ensure \emph default that no split brain can result from intended primary switching, please obey the best practices from above, and please give the \family typewriter primary \family default command only after your secondary is \emph on known \begin_inset Foot status open \begin_layout Plain Layout As noted in many places in this manual, checking this cannot be done by looking at the local state of a single cluster node. You have to check several nodes. \family typewriter marsadm \family default can only check the \emph on local \emph default node reliably! \end_layout \end_inset \emph default to be \emph on really \emph default \family typewriter UpToDate \family default (see \family typewriter marsadm wait-cluster \family default and \family typewriter marsadm view \family default and other macros described in section \begin_inset CommandInset ref LatexCommand ref reference "sec:Inspecting-the-State" \end_inset ). \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset A \emph on very rough \emph default estimation of the time to become \family typewriter UpToDate \family default is displayed by \family typewriter marsadm view mydata \family default or other macros (e.g. \family typewriter view-replinfo \family default ). However, on very flaky networks, the estimation may not only flicker much, but also be inaccurate. \end_layout \begin_layout Subsubsection Forced Switching \begin_inset CommandInset label LatexCommand label name "sub:Forced-Switching" \end_inset \end_layout \begin_layout Standard In case the connection to the old primary is lost for whatever reason, we just don't know anything about its \emph on current \emph default state (which may deviate from its \emph on last known \emph default state).
The following command sequence will skip many checks and tell your node to become primary forcefully: \end_layout \begin_layout Itemize \family typewriter marsadm pause-fetch mydata \end_layout \begin_deeper \begin_layout Itemize \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset notice that this is similar to \family typewriter drbdadm disconnect mydata \family default as you are probably used to from DRBD. For better compatibility with DRBD, you may use the alternate syntax \family typewriter marsadm disconnect mydata \family default instead. However, there is a subtle difference to DRBD: DRBD will drop \emph on both \emph default sides of its single bi-directional connection and no longer try to re-connect from either side, while \family typewriter pause-fetch \family default is equivalent to \family typewriter pause-fetch-local \family default , which instructs only the \emph on local \emph default host to stop fetching logfiles. Other members of the cluster, including the former primary, are \emph on not \emph default instructed to do so. They may continue fetching logfiles over their own private TCP connections, potentially using many connections in parallel, and potentially even from any \emph on other \emph default member of the resource, if they think they can get the data from there. In order to instruct \begin_inset Foot status open \begin_layout Plain Layout Notice that not all such instructions may arrive at all sites when the network is interrupted (or extremely slow). \end_layout \end_inset \emph on all \emph default members of the resource to stop fetching logfiles, you may use \family typewriter marsadm pause-fetch-global mydata \family default instead (cf section \begin_inset CommandInset ref LatexCommand ref reference "sub:Operation-of-the" \end_inset ).
\end_layout \end_deeper \begin_layout Itemize \family typewriter marsadm primary mydata --force \end_layout \begin_deeper \begin_layout Itemize \begin_inset Graphics filename images/MatieresCorrosives.png lyxscale 50 scale 17 \end_inset this is the forceful switchover. Use \family typewriter --force \family default only if you know what you are doing! \end_layout \end_deeper \begin_layout Itemize \family typewriter marsadm resume-fetch mydata \end_layout \begin_deeper \begin_layout Itemize As such, the new primary does not really need this, because primaries are producing their own logfiles without need for fetching. This is only to undo the previous \family typewriter pause-fetch \family default , in order to avoid future surprises when the new primary will at some point change to secondary mode again (in the far-distant future), and you have forgotten that fetching had been switched off. \end_layout \end_deeper \begin_layout Standard When using \family typewriter --force \family default , many precondition checks and other internal checks are skipped, and in particular the internal handover protocol for split brain avoidance. \end_layout \begin_layout Standard Therefore, use of \family typewriter --force \family default is \emph on likely \emph default to \series bold provoke a split brain \series default . \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/MatieresToxiques.png lyxscale 50 scale 17 \end_inset \series bold Split brain \series default is always an \series bold erroneous state \series default which should never be entered deliberately! Once you have entered it accidentally, you \series bold must \series default resolve it ASAP (see section \begin_inset CommandInset ref LatexCommand ref reference "sub:Split-Brain-Resolution" \end_inset ), otherwise you cannot operate your resource in the long term.
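\end_layout \begin_layout Standard The forced switching sequence above can be wrapped into a shell function so that the final \family typewriter resume-fetch \family default is not forgotten. This is only a cautious sketch under assumptions: the function name \family typewriter forced_switch \family default , the variable \family typewriter MARSADM \family default and the confirmation word \family typewriter --yes-i-accept-split-brain \family default are hypothetical and not part of MARS; only the three \family typewriter marsadm \family default commands are taken from the list above. \end_layout \begin_layout LyX-Code

```shell
# Hypothetical knob for this sketch; it is not a MARS setting.
MARSADM="${MARSADM:-marsadm}"

# Usage: forced_switch RESOURCE --yes-i-accept-split-brain   (run on the new primary)
forced_switch() {
    fs_res="$1"

    # Hypothetical safety catch: refuse unless the operator confirms explicitly.
    if [ "${2:-}" != "--yes-i-accept-split-brain" ]; then
        echo "refusing: forced switching is likely to provoke a split brain" >&2
        return 64
    fi

    # Stop fetching logfiles on the local host only.
    $MARSADM pause-fetch "$fs_res" || return 1

    # The forceful switchover itself; remember its exit status.
    fs_rc=0
    $MARSADM primary "$fs_res" --force || fs_rc=$?

    # Undo the pause-fetch in any case, to avoid surprises later on.
    $MARSADM resume-fetch "$fs_res" || return 1

    return "$fs_rc"
}
```

\end_layout \begin_layout Standard The design choice in this sketch is that \family typewriter resume-fetch \family default is attempted even when the forced switch itself fails, so that the node is not left with fetching silently switched off. The exit status of the \family typewriter primary --force \family default step is passed through to the caller.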
\end_layout \begin_layout Standard In order to impede you from giving an accidental \family typewriter --force \family default , the precondition is different: \family typewriter --force \family default works only in \emph on locally disconnected \emph default state. This is similar to DRBD. \end_layout \begin_layout Standard Remember: \family typewriter marsadm primary \family default without \family typewriter --force \family default tries to prevent split brain as best as it can. Use of the \family typewriter --force \family default option will almost \emph on certainly \emph default provoke a split brain, at least if the old primary continues to operate on its local \family typewriter /dev/mars/mydata \family default device. Therefore, you are \series bold strongly advised \series default to do this \series bold only \series default after \end_layout \begin_layout Enumerate \family typewriter marsadm primary \family default without \family typewriter --force \family default has failed \emph on for no good reason \emph default \begin_inset Foot status open \begin_layout Plain Layout Most reasons will be displayed by \family typewriter marsadm \family default when it is rejecting the switchover. \end_layout \end_inset , and \end_layout \begin_layout Enumerate You are sure you \emph on really \emph default want to switch, even when that eventually leads to a split brain. You also declare that you are willing to do \emph on manual \emph default split-brain resolution as described in section \begin_inset CommandInset ref LatexCommand ref reference "sub:Split-Brain-Resolution" \end_inset , or even destruction / reconstruction of a damaged node as described in section \begin_inset CommandInset ref LatexCommand ref reference "sub:Final-Destroy-of" \end_inset . \end_layout \begin_layout Standard \begin_inset Graphics filename images/MatieresCorrosives.png lyxscale 50 scale 17 \end_inset Notice: in case of \emph on connection loss \emph default (e.g. 
networking problems / network partitions), you may not be able to reliably detect whether a split brain actually resulted, or not. \end_layout \begin_layout Paragraph Some Background \end_layout \begin_layout Standard In contrast to DRBD, split brain situations are handled differently by MARS. When two primaries are accidentally active at the same time, each of them writes into different logfiles \family typewriter /mars/resource-mydata/log-000000001-A \family default and \family typewriter /mars/resource-mydata/log-000000001-B \family default where the \emph on origin \emph default host is always recorded in the filename. Therefore, both nodes \emph on can theoretically \emph default run in primary mode independently from each other, at least for some time. They \emph on might \emph default even \family typewriter log-rotate \family default independently from each other. However, this is really not a good idea. The replication to third nodes will likely get stuck, and your \family typewriter /mars/ \family default filesystem(s) will eventually run out of space. Any further secondary node (when having \begin_inset Formula $k>2$ \end_inset replicas) will certainly get into serious problems: it simply does not know which split-brain version it should follow. Therefore, you will certainly lose the actuality of your redundancy. \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/MatieresCorrosives.png lyxscale 50 scale 17 \end_inset \family typewriter marsadm secondary \family default is \emph on strongly discouraged \emph default . It tells the whole cluster that \emph on nobody \emph default is designated as primary any more. \emph on All \emph default nodes should go into secondary mode, globally. In the current version of MARS, the secondaries will no longer fetch any logfiles, since they don't know which version is the \begin_inset Quotes eld \end_inset right \begin_inset Quotes erd \end_inset one. Syncing is also not possible.
When the device \family typewriter /dev/mars/mydata \family default is in use somewhere, it will remain in \emph on actual \emph default primary mode during that time. As soon as the local \family typewriter /dev/mars/mydata \family default is released, the node will \emph on actually \emph default go into secondary mode if it is no longer designated as primary. You should avoid it in advance by always \emph on directly \emph default switching over from one primary to another one, without intermediate \family typewriter secondary \family default command. This is different from DRBD. \end_layout \begin_layout Standard \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset Split brain situations are detected \emph on passively \emph default by secondaries. Whenever a secondary detects that somewhere a split brain has happened, it refuses to replay any logfiles behind the split point (and also to fetch them when possible), or anywhere where something appears suspect or ambiguous. This tries to keep its local disk state consistent at all times, although outdated with respect to any of the split brain versions. As a consequence, becoming primary may be impossible, because it cannot always know which logfiles are the correct ones to replay before \family typewriter /dev/mars/mydata \family default can appear. The ambiguity must be resolved first. \end_layout \begin_layout Standard \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset If you \emph on really \emph default need the local device \family typewriter /dev/mars/mydata \family default to disappear \emph on everywhere \emph default in a split brain situation, you don't need a \emph on strongly discouraged \emph default \family typewriter marsadm secondary \family default command for this.
\family typewriter marsadm detach \family default or \family typewriter marsadm down \family default can do it also, without destroying knowledge about the former designated primary. \end_layout \begin_layout Subsection Split Brain Resolution \begin_inset CommandInset label LatexCommand label name "sub:Split-Brain-Resolution" \end_inset \end_layout \begin_layout Standard Split brain can naturally occur during a long-lasting network outage (aka network partition) when you (forcefully) switch primaries in between, or due to final loss of your old primary node (fatal node crash) when not all logfile data had been transferred immediately before the final crash. \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/MatieresToxiques.png lyxscale 50 scale 17 \end_inset Remember that split brain is an \series bold erroneous state \series default which must be resolved as soon as possible! \end_layout \begin_layout Standard Whenever split brain occurs for whatever reason, you have two choices for resolution: either destroy one of your versions, or retain it under a different resource name. \end_layout \begin_layout Standard In either case, do the following steps ASAP: \end_layout \begin_layout Enumerate \series bold Manually \series default check which (surviving) version is the \begin_inset Quotes eld \end_inset right \begin_inset Quotes erd \end_inset one. Any error is up to you: destroying the wrong version is \emph on your \emph default fault, not the fault of MARS. \end_layout \begin_layout Enumerate If you did not already switch your primary to the final destination determined in the previous step, do it now (see description in section \begin_inset CommandInset ref LatexCommand ref reference "sub:Forced-Switching" \end_inset ). Don't use an intermediate \family typewriter marsadm secondary \family default command (as known from DRBD): \emph on directly \emph default switch to the new designated primary!
\end_layout \begin_layout Enumerate On each non-right version (which you don't want to retain) which had been primary before, umount your \family typewriter /dev/mars/mydata \family default or otherwise stop using it (e.g. stop iSCSI or other users of the device). Wait until each of them has actually left primary state and until their local logfile(s) have been fully written back to the underlying disk. \end_layout \begin_layout Enumerate Wait until the network works again. All your (surviving) cluster nodes \emph on must \emph default \begin_inset Foot status open \begin_layout Plain Layout If you are a MARS expert and you really know what you are doing (in particular, you can anticipate the effects of the Lamport clock and of the symlink update protocol including the \begin_inset Quotes eld \end_inset eventually consistent \begin_inset Quotes erd \end_inset behaviour including the not-yet-consistent intermediate states, see sections \begin_inset CommandInset ref LatexCommand ref reference "sec:The-Lamport-Clock" \end_inset and \begin_inset CommandInset ref LatexCommand ref reference "sec:The-Symlink-Tree" \end_inset ), you may deviate from this requirement. \end_layout \end_inset be able to communicate with each other. If that is not possible, or if it takes too long, you may fall back to the method described in section \begin_inset CommandInset ref LatexCommand ref reference "sub:Final-Destroy-of" \end_inset , but do this only as far as necessary. \end_layout \begin_layout Standard The next steps are different for different use cases: \end_layout \begin_layout Paragraph Destroying a Wrong Split Brain Version \end_layout \begin_layout Standard Continue with the following steps, each on those cluster node(s) where you do not want to retain its split-brain version. 
In preference, start with the old \begin_inset Quotes eld \end_inset wrong \begin_inset Quotes erd \end_inset primaries first (see advice at the end of this section): \end_layout \begin_layout Standard \begin_inset ERT status open \begin_layout Plain Layout \backslash begin{enumerate} \backslash setcounter{enumi}{4} \end_layout \end_inset \end_layout \begin_layout Standard \begin_inset ERT status open \begin_layout Plain Layout \backslash item \end_layout \end_inset \family typewriter marsadm invalidate mydata \end_layout \begin_layout Standard \begin_inset ERT status open \begin_layout Plain Layout \backslash end{enumerate} \end_layout \end_inset \end_layout \begin_layout Standard \noindent When no split brain is reported anymore after that (via \family typewriter marsadm view all \family default ), you are done. You need to repeat this on other secondaries only when necessary. \end_layout \begin_layout Standard In very rare cases when things are screwed up very heavily (e.g. a partly destroyed \family typewriter /mars/ \family default partition), you may try an alternate method described in appendix \begin_inset CommandInset ref LatexCommand ref reference "chap:Alternative-Methods-for" \end_inset . \end_layout \begin_layout Paragraph Keeping a Split Brain Version \end_layout \begin_layout Standard On those cluster node(s) where you want to retain the version (e.g. 
for inspection purposes): \end_layout \begin_layout Standard \begin_inset ERT status open \begin_layout Plain Layout \backslash begin{enumerate} \backslash setcounter{enumi}{4} \end_layout \end_inset \end_layout \begin_layout Standard \begin_inset ERT status open \begin_layout Plain Layout \backslash item \end_layout \end_inset \family typewriter marsadm leave-resource mydata \end_layout \begin_layout Standard \begin_inset ERT status open \begin_layout Plain Layout \backslash item \end_layout \end_inset After having done this on \emph on all \emph default those cluster nodes, check that the split brain is gone (e.g. by saying \family typewriter marsadm view mydata \family default ), as documented above. In very rare cases, you might also need a \family typewriter log-purge-all \family default (see page \begin_inset CommandInset ref LatexCommand pageref reference "log-purge-all$res" \end_inset ). \end_layout \begin_layout Standard \begin_inset ERT status open \begin_layout Plain Layout \backslash item \end_layout \end_inset Check that each underlying local disk \family typewriter /dev/lv-x/mydata \family default is really usable afterwards, e.g. by test-mounting it (or \family typewriter fsck \family default if you can afford it). If all is OK, don't forget to umount it before proceeding with the next step. \end_layout \begin_layout Standard \begin_inset ERT status open \begin_layout Plain Layout \backslash item \end_layout \end_inset Create a completely new MARS resource out of the underlying disk \family typewriter /dev/lv-x/mydata \family default having a different name, such as \family typewriter mynewdata \family default (see description in section \begin_inset CommandInset ref LatexCommand vref reference "sec:Creating-and-Maintaining" \end_inset ). 
\end_layout \begin_layout Standard \begin_inset ERT status open \begin_layout Plain Layout \backslash end{enumerate} \end_layout \end_inset \end_layout \begin_layout Paragraph Keeping a Good Version \end_layout \begin_layout Standard When you had a secondary which did not participate in the split brain, but just got confused and therefore stopped replaying logfiles immediately before the split-brain point, it may very well happen \begin_inset Foot status open \begin_layout Plain Layout In general, such a \begin_inset Quotes eld \end_inset good \begin_inset Quotes erd \end_inset behaviour cannot be guaranteed for all secondaries. Race conditions in complex networks may asynchronously transfer \begin_inset Quotes eld \end_inset wrong \begin_inset Quotes erd \end_inset logfile data to a secondary much earlier than conflicting \begin_inset Quotes eld \end_inset good \begin_inset Quotes erd \end_inset logfile data which will be marked \begin_inset Quotes eld \end_inset good \begin_inset Quotes erd \end_inset only in the \emph on future. \emph default It is impossible to predict this in advance. \end_layout \end_inset that you don't need to do any action for it. When all wrong versions have disappeared from the cluster (by \family typewriter invalidate \family default or \family typewriter leave-resource \family default as described before), the confusion should be over, and the secondary should automatically resume tracking of the new unique version. \end_layout \begin_layout Standard Please check that \emph on all \emph default of your secondaries are no longer stuck. You need to execute split brain resolution only for \emph on stuck \emph default nodes. 
\end_layout \begin_layout Standard \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset Hint / advice for \begin_inset Formula $k>2$ \end_inset replicas: it is a good idea to start split brain resolution \emph on first \emph default with those (few) nodes which had been (accidentally) primary before, but are not the new designated primary. Usually, you had 2 primaries during split brain, so this will apply only to \emph on one \emph default of them. Leave the other one intact, by not umounting \family typewriter /dev/mars/mydata \family default at all, and keeping your applications running. Even during emergency mode, see section \begin_inset CommandInset ref LatexCommand ref reference "sub:Emergency-Mode" \end_inset . \emph on First \emph default resolve the problem of the \begin_inset Quotes eld \end_inset wrong \begin_inset Quotes erd \end_inset primary(s) via \family typewriter invalidate \family default or \family typewriter leave-resource \family default . Wait for a short while. Then check the rest of your secondaries, whether they now are already following the new (unique) primary, and finally check whether the split brain warning reported by \family typewriter marsadm view all \family default is gone everywhere. This way, you can often skip unnecessary invalidations of replicas. \end_layout \begin_layout Subsection Final Destruction of a Damaged Node \begin_inset CommandInset label LatexCommand label name "sub:Final-Destroy-of" \end_inset \end_layout \begin_layout Standard When a node has eventually died, do the following steps ASAP: \end_layout \begin_layout Enumerate \emph on Physically \emph default remove the dead node from your network. Unplug all network cables! 
Failing to do so might provoke a disaster in case it somehow resurrects in an uncontrolled manner, such as a partly-damaged \family typewriter /mars/ \family default filesystem, a half-defective kernel, RAM / kernel memory corruption, disk corruption, or whatever. Don't risk any such unpredictable behaviour! \end_layout \begin_layout Enumerate \series bold Manually \series default check which of the surviving versions will be the \begin_inset Quotes eld \end_inset right \begin_inset Quotes erd \end_inset one. Any error is up to you: resurrecting an unnecessarily old / outdated version and/or destroying the newest / best version is \emph on your \emph default fault, not the fault of MARS. \end_layout \begin_layout Enumerate If you did not already switch your primary to the final destination determined in the previous step, do it now (see description in section \begin_inset CommandInset ref LatexCommand ref reference "sub:Forced-Switching" \end_inset ). \end_layout \begin_layout Enumerate On a surviving node, but preferably \emph on not \emph default the new designated primary, give the following commands: \end_layout \begin_deeper \begin_layout Enumerate \family typewriter marsadm --host=your-damaged-host down mydata \end_layout \begin_layout Enumerate \family typewriter marsadm --host=your-damaged-host leave-resource mydata \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/MatieresToxiques.png lyxscale 50 scale 17 \end_inset Check for misspellings, in particular the hostname of the dead node, and check the command syntax before typing return! Otherwise, you may forcefully destroy the wrong node! \end_layout \end_deeper \begin_layout Enumerate In case any of the previous commands should fail (which is rather likely), repeat it with an additional \family typewriter --force \family default option. Don't use \family typewriter --force \family default in the first place, always try first without it!
\end_layout \begin_layout Enumerate Repeat the same with \emph on all \emph default resources which were formerly present at \family typewriter your-damaged-host \family default . \end_layout \begin_layout Enumerate Finally, say \family typewriter marsadm --host=your-damaged-host leave-cluster \family default (optionally augmented with \family typewriter --force \family default ). \end_layout \begin_layout Standard Now your surviving nodes should \emph on believe \emph default that the old node \family typewriter your-damaged-host \family default no longer exists, and that it no longer participates in any resource. \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/MatieresToxiques.png lyxscale 50 scale 17 \end_inset Even if your dead node comes to life again in some way: always ensure that the mars kernel module cannot run any more. \emph on Never \emph default do a \family typewriter modprobe mars \family default on a node marked as dead this way! \end_layout \begin_layout Standard Further instructions for complicated cases are in appendix \begin_inset CommandInset ref LatexCommand ref reference "chap:Alternative-De--and" \end_inset and \begin_inset CommandInset ref LatexCommand ref reference "sub:Cleanup-in-case" \end_inset . \end_layout \begin_layout Subsection Online Resizing during Operation \end_layout \begin_layout Standard You should have LVM or some other means of increasing the physical size of your disk (e.g. via firmware of some RAID controllers). The network must be healthy. Do the following steps: \end_layout \begin_layout Enumerate Increase your local disks (usually \family typewriter /dev/vg/mydata \family default ) \emph on everywhere \emph default in the whole cluster. In order to avoid wasting space, increase them \emph on uniformly \emph default to the same size (when possible). The \family typewriter lvresize \family default tool is documented elsewhere.
\end_layout \begin_layout Enumerate Check that all MARS switches are on. If not, say \family typewriter marsadm up mydata \family default everywhere. \end_layout \begin_layout Enumerate At the primary: \family typewriter marsadm resize mydata \end_layout \begin_layout Enumerate If you have intermediate layers such as iSCSI, you may need some \family typewriter iscsiadm \family default update or other command. \end_layout \begin_layout Enumerate Now you may increase your filesystem. This is specific for the filesystem type and documented elsewhere. \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset Hint: the secondaries will start syncing the increased new part of the underlying primary disk. In many cases, this is not really needed, because the new junk data simply does not matter. If you are sure and if you know what you are doing, you may use \family typewriter marsadm fake-sync mydata \family default to abort such unnecessary traffic. \end_layout \begin_layout Section The State of MARS \begin_inset CommandInset label LatexCommand label name "sec:The-State-of" \end_inset \end_layout \begin_layout Standard In general, MARS tries to \emph on hide \emph default any network failures from you as best as it can. After a network problem, MARS \emph on transparently \emph default tries to re-open any internal low-level socket connections ASAP, without need for sysadmin intervention. In contrast to DRBD, network failures will \emph on not \emph default automatically alter the state of MARS, such as switching to \family typewriter disconnected \family default after a \family typewriter ko_timeout \family default or similar. From a high-level sysadmin viewpoint, communication may just take a very long time to succeed. \end_layout \begin_layout Standard When the behaviour of MARS is different from DRBD, it is usually intended as a feature. 
\end_layout \begin_layout Standard MARS is not only an \series bold asynchronous \series default system at block IO level, but also \series bold at control level \series default . \end_layout \begin_layout Standard This is \emph on necessary \emph default because in a widely distributed long-distance system running on slow or even temporarily failing networks, actions may take a long time, and there may be many actions \series bold started in parallel \series default . \end_layout \begin_layout Standard \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset Synchronous concepts are generally not sufficient for expressing that. Because of inherent asynchronicity and of dynamic creation / joining of resources, it is neither possible to comprehensively depict a complex distributed MARS system, nor a comprehensive standalone snippet of MARS, as a finite state transition diagram \begin_inset Foot status open \begin_layout Plain Layout Probably it could be possible to formally model MARS as a Petri net. However, complete Petri nets tend to become very complex, and to describe lots of low-level details. Expressing hierarchy, in a top-down fashion, is cumbersome. We see no point in trying to do so. \end_layout \end_inset . \end_layout \begin_layout Standard Although MARS tries to \emph on approximate \emph default / \emph on emulate \emph default the synchronous control behaviour of DRBD at the interface level ( \family typewriter marsadm \family default ) in many situations as best as it can, the \emph on internal \emph default control model is necessarily asynchronous. As an experienced sysadmin, you will be curious how it works in principle. When you know something about it, you will no longer be surprised when some (detail) behaviour is different from DRBD. 
\end_layout \begin_layout Standard The general principle is an asynchronous 2-edge handshake protocol, which is used almost everywhere in MARS: \end_layout \begin_layout Standard \noindent \align center \begin_inset Graphics filename images/handshake.fig width 80col% \end_inset \end_layout \begin_layout Standard We have a binary todo switch, which can be either in state \begin_inset Quotes eld \end_inset on \begin_inset Quotes erd \end_inset or \begin_inset Quotes eld \end_inset off \begin_inset Quotes erd \end_inset . In addition, we have an actual response indicator, which is similar to an LED indicating the actual status. In our example, we imagine that both are used for controlling a big ventilator, having a huge inert mass. Imagine a big machine from a power plant, which is as tall as a human. \end_layout \begin_layout Standard We start in a situation where the binary switch is off, and the ventilator is stopped. At point 1, we turn on the switch. At that moment, a big contactor will sound like \begin_inset Quotes eld \end_inset zonggg \begin_inset Quotes erd \end_inset , and a big motor will start to hum. At first you won't hear anything else. It will take a while, say 1 minute, until the big wheel will have reached its final operating RPM, due to the huge inert mass. During that spin-up, the lights in your room will become slightly darker. When having reached the full RPM at point 2, your workplace will then be noisier, but in exchange your room lights will be back at ordinary strength, and the actual response LED will light up in order to indicate that the big fan is now operational. \end_layout \begin_layout Standard Assume we want to turn the system off. When turning the todo switch to \begin_inset Quotes eld \end_inset off \begin_inset Quotes erd \end_inset at point 3, at first nothing will seem to happen at all. The big wheel will keep spinning due to its heavy inert mass, and the RPM as well as the sound will go down only slowly. 
During spin-down, the actual response LED will stay illuminated, in order to warn you that you should not touch the wheel, otherwise you may get injured \begin_inset Foot status open \begin_layout Plain Layout Notice that it is only safe to access the wheel when \emph on both \emph default the switch and the LED are off. Conversely, if at least one of them is on, something is going on inside the machine. Transferred to MARS: always look at \emph on both \emph default the todo switch and the corresponding actual indicator in order to not miss something. \end_layout \end_inset . The LED will only go off after, say, 2 minutes, when the wheel has actually stopped at point 4. After that, the cycle may potentially start over again. \end_layout \begin_layout Standard As you can see, all four possible cartesian product combinations between two boolean values are occurring in the diagram. \end_layout \begin_layout Standard The same handshake protocol is used in MARS for communication between userspace and kernelspace, as well as for communication in the widely distributed system. \end_layout \begin_layout Section Inspecting the State of MARS \begin_inset CommandInset label LatexCommand label name "sec:Inspecting-the-State" \end_inset \end_layout \begin_layout Standard The main command for viewing the current state of MARS is \end_layout \begin_layout Itemize \family typewriter marsadm view mydata \end_layout \begin_layout Standard or its more specialized variant \end_layout \begin_layout Itemize \family typewriter marsadm view- \emph on $macroname \emph default mydata \end_layout \begin_layout Standard where \family typewriter \emph on $macroname \family default \emph default is one of the macros described in chapter \begin_inset CommandInset ref LatexCommand ref reference "chap:The-Macro-Processor" \end_inset , or a macro which has been written by yourself. 
\end_layout \begin_layout Standard As always, you may replace the resource name \family typewriter mydata \family default with the special keyword \family typewriter all \family default in order to get the state of all locally joined resources, as well as a list of all those resources. \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset When using the variant \family typewriter marsadm view all \family default , additionally the global communication status will be displayed. This helps humans in diagnosing problems. \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset Hint: use the compound command \family typewriter watch marsadm view all \family default for continuous display of the current state of MARS. When starting this side-by-side in \family typewriter ssh \family default terminal windows for all your cluster nodes, you can easily watch what's going on in the whole cluster. \end_layout \begin_layout Chapter Basic Working Principle \end_layout \begin_layout Standard Even if you are impatient, please read this chapter. At the \emph on surface \emph default , MARS appears to be very similar to DRBD. It almost looks like a drop-in replacement for DRBD. \end_layout \begin_layout Standard When taking this naïvely, you could easily step into some trivial pitfalls, because the internal working principle of MARS is totally different from DRBD. Please forget (almost) anything you already know about the internal working principles of DRBD, and look at the very different working principles of MARS. 
\end_layout \begin_layout Section The Transaction Logger \begin_inset CommandInset label LatexCommand label name "sec:The-Transaction-Logger" \end_inset \end_layout \begin_layout Standard \noindent \align center \begin_inset Graphics filename images/MARS_Data_Flow.pdf lyxscale 60 width 100text% \end_inset \end_layout \begin_layout Standard \noindent The basic idea of MARS is to record all changes made to your block device in a so-called \series bold transaction logfile \series default . \emph on Any \emph default write request is treated like a transaction which changes the contents of your block device. \end_layout \begin_layout Standard This is similar in concept to some database systems, but there exists no separate \begin_inset Quotes eld \end_inset commit \begin_inset Quotes erd \end_inset operation: \emph on any \emph default write request is acting like a commit. \end_layout \begin_layout Standard The picture shows the flow of write requests. Let's start with the primary node. \end_layout \begin_layout Standard Upon submission of a write request on \family typewriter /dev/mars/mydata \family default , it is first buffered in a \emph on temporary \emph default memory buffer. \end_layout \begin_layout Standard The temporary memory buffer serves multiple purposes: \end_layout \begin_layout Itemize It keeps track of the order of write operations. \end_layout \begin_layout Itemize Additionally, it keeps track of the positions in the underlying disk \family typewriter /dev/lv-x/mydata \family default . In particular, it detects when the same block is overwritten multiple times. \end_layout \begin_layout Itemize During pending write operation, any concurrent reads are served from the memory buffer. 
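\end_layout \begin_layout Standard \noindent The three purposes listed above can be illustrated by a minimal Python sketch. It is a toy model under stated assumptions, not the MARS kernel implementation: the class name \family typewriter TempWriteBuffer \family default , its methods, and the dictionary standing in for the underlying disk are all hypothetical. \end_layout \begin_layout Standard

```python
from collections import OrderedDict


class TempWriteBuffer:
    """Hypothetical toy model of the temporary memory buffer (not MARS code)."""

    def __init__(self):
        self.seq = 0                   # running number: order of write operations
        self.pending = OrderedDict()   # block number -> (sequence number, data)
        self.overwrites = 0            # counts overwrites of still-pending blocks

    def write(self, block, data):
        """Buffer a write request and return its sequence number."""
        self.seq += 1
        if block in self.pending:
            # The same block is overwritten while the old version is pending.
            self.overwrites += 1
            del self.pending[block]    # re-insert so ordering reflects the latest write
        self.pending[block] = (self.seq, data)
        return self.seq

    def read(self, block, disk):
        """Serve concurrent reads from the buffer, falling back to the disk."""
        if block in self.pending:
            return self.pending[block][1]
        return disk.get(block)
```

In this model, a read of a block with a pending write never sees the stale on-disk content, which corresponds to the third purpose in the list above.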
\end_layout \begin_layout Standard After the write has been buffered in the temporary memory buffer, the main logger thread of the transaction logger creates a so-called \emph on log entry \emph default and starts an \begin_inset Quotes eld \end_inset append \begin_inset Quotes erd \end_inset operation on the transaction logfile. The log entry contains vital information such as the logical block number in the underlying disk, the length of the data, a timestamp, some header magic in order to detect corruption, the log entry sequence number, of course the data itself, and optional information like a checksum or compression information. \end_layout \begin_layout Standard Once the log entry has been written through to the \family typewriter /mars/ \family default filesystem via fsync(), the application waiting for the write operation at \family typewriter /dev/mars/mydata \family default is signalled that the write was successful. \end_layout \begin_layout Standard This may happen even \emph on before \emph default the writeback to the underlying disk \family typewriter /dev/lv-x/mydata \family default has started. Even when you power off the system right now, the information is not lost: it is present in the logfile, and can be reconstructed from there. \end_layout \begin_layout Standard Notice that the order of log records present in the transaction log defines a total order among the write requests which is \emph on compatible \emph default with the partial order of write requests issued on \family typewriter /dev/mars/mydata \family default . \end_layout \begin_layout Standard Also notice that despite its sequential nature, the transaction logfile is typically \emph on not \emph default the performance bottleneck of the system: since appending to a logfile is almost purely sequential IO, it runs much faster than random IO on typical datacenter workloads. 
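\end_layout \begin_layout Standard \noindent The append-then-fsync rule described above can be sketched in Python as follows. This is only an illustration under assumptions: the header layout, the magic value, and the function names \family typewriter append_entry \family default and \family typewriter read_entries \family default are hypothetical and do not describe the real MARS logfile format. \end_layout \begin_layout Standard

```python
import os
import struct
import time
import zlib

# Hypothetical header: magic, sequence number, block number,
# timestamp (ns), data length, CRC32 checksum. Not the MARS on-disk format.
MAGIC = 0x4D415253
HEADER = struct.Struct("<IQQQII")


def append_entry(fd, seqnr, block, data):
    """Append one log entry and write it through before reporting success."""
    header = HEADER.pack(MAGIC, seqnr, block, time.time_ns(),
                         len(data), zlib.crc32(data))
    os.write(fd, header + data)
    os.fsync(fd)  # only after this returns may the writer be signalled


def read_entries(path):
    """Yield (seqnr, block, data) in log order, stopping at the first bad entry."""
    with open(path, "rb") as f:
        raw = f.read()
    pos = 0
    while pos + HEADER.size <= len(raw):
        magic, seqnr, block, _ts, length, crc = HEADER.unpack_from(raw, pos)
        pos += HEADER.size
        data = raw[pos:pos + length]
        pos += length
        if magic != MAGIC or len(data) != length or zlib.crc32(data) != crc:
            break  # corrupted or truncated tail
        yield seqnr, block, data
```

The position of an entry in the file is what defines the total order; the sequence number and the checksum merely allow a reader to detect gaps and corruption.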
\end_layout \begin_layout Standard In order to reclaim the temporary memory buffer, its content must be written back to the underlying disk \family typewriter /dev/lv-x/mydata \family default at some point. After writeback, the temporary space is freed. The writeback can do the following optimizations: \end_layout \begin_layout Enumerate writeback may be in \emph on any \emph default order; in particular, it may be \emph on sorted \emph default according to ascending sector numbers. This will reduce the average seek distances of magnetic disks in general. \end_layout \begin_layout Enumerate when the same sector is overwritten multiple times, only the \begin_inset Quotes eld \end_inset last \begin_inset Quotes erd \end_inset version needs to be written back, skipping some intermediate versions. \end_layout \begin_layout Standard In case the primary node crashes during writeback, it suffices to replay the log entries from some point in the past until the end of the transaction logfile. It does no harm if you accidentally replay some log entries twice or even more often: since the replay is in the original total order, any temporary inconsistency is \emph on healed \emph default by the logfile application. \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset In mathematics, the property that you can apply your logfile twice to your data (or even as often as you want), is called \series bold idempotence \series default . This is a very desirable property: it ensures that nothing goes wrong when replaying \begin_inset Quotes eld \end_inset too much \begin_inset Quotes erd \end_inset / starting your replay \begin_inset Quotes eld \end_inset too early \begin_inset Quotes erd \end_inset . Idempotence is even more beneficial: in case anything should go wrong with your data on your disk (e.g. 
IO errors), replaying your logfile once more may \begin_inset Foot status open \begin_layout Plain Layout Miracles cannot be guaranteed, but \emph on higher chances \emph default and \emph on improvements \emph default can be expected (e.g. better chances for \family typewriter fsck \family default ). \end_layout \end_inset even \series bold heal \series default some defects. Good news for desperate sysadmins forced to work with flaky hardware! \end_layout \begin_layout Standard The basic idea of the asynchronous replication of MARS is rather simple: just transfer the logfiles to your secondary nodes, and replay them onto their copy of the disk data (also called \emph on mirror \emph default ) in the same order as the total order defined by the primary. \end_layout \begin_layout Standard Therefore, a mirror of your data on any secondary may be outdated, but it always corresponds to some version which was valid in the past. This property is called \series bold anytime consistency \begin_inset Foot status open \begin_layout Plain Layout Your secondary nodes are always consistent in themselves. Notice that this kind of consistency is a \emph on local \emph default consistency model. There exists no global consistency in MARS. Global consistency would be practically impossible in long-distance replication where Einstein's law of the speed of light is limiting global consistency. The front-cover picture showing the planets Earth and Mars tries to lead your imagination away from global consistency models as used in \begin_inset Quotes eld \end_inset DRBD Think(tm) \begin_inset Quotes erd \end_inset , and tries to prepare you mentally for local consistency as in \begin_inset Quotes eld \end_inset MARS Think(tm) \begin_inset Quotes erd \end_inset . \end_layout \end_inset . 
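\end_layout \begin_layout Standard \noindent Idempotence of the replay and anytime consistency of a mirror can be checked on a toy model. The following Python sketch assumes that a disk is a dictionary from block numbers to contents and that a logfile is a list of (block, data) pairs in total order; the function \family typewriter replay \family default is hypothetical and greatly simplified compared to the real replay. \end_layout \begin_layout Standard

```python
def replay(disk, log, start=0):
    """Apply the log entries from position start onwards, in total order."""
    for block, data in log[start:]:
        disk[block] = data
    return disk


# Toy logfile: block 5 is written twice, block 9 once.
log = [(5, "a"), (9, "b"), (5, "c")]

# Full replay onto an empty mirror yields the newest version.
final = replay({}, log)

# Replaying the whole log a second time changes nothing (idempotence).
again = replay(dict(final), log)

# A crash during writeback may leave only some blocks updated, in any order.
# Replaying from an earlier point heals the temporary inconsistency.
healed = replay({5: "c"}, log)

# A secondary that has replayed only a prefix holds an older, but valid version.
outdated = replay({}, log[:2])
```

Note that a replay which starts too early temporarily reverts block 5 to an older value, but since it always runs to the end of the log in the original order, the final result is the same.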
\end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset As you can see in the picture, the process of transferring the logfiles is \emph on independent \emph default from the process which replays the logfiles onto the data at some secondary site. Both processes can be switched on / off separately (see commands \family typewriter marsadm {dis,}connect \family default and \family typewriter marsadm {pause,resume}-replay \family default in section \begin_inset CommandInset ref LatexCommand ref reference "sub:Operation-of-the" \end_inset ). This may be \emph on exploited \emph default : for example, you may replicate your logfiles as soon as possible (to protect against catastrophic failures), but deliberately wait one hour until they are replayed (under regular circumstances). If your data inside your filesystem \family typewriter /mydata/ \family default at the primary site is accidentally destroyed by \family typewriter rm -rf /mydata/ \family default , you have an old copy at the secondary site. This way, you can substitute \emph on some parts \begin_inset Foot status open \begin_layout Plain Layout Please note that MARS cannot \emph on fully \emph default substitute a backup system, because it can keep only \emph on physical \emph default copies, and does not create logical copies. \end_layout \end_inset \emph default of conventional backup functionality by MARS. In case you need the current version, just replay in \begin_inset Quotes eld \end_inset fast-forward \begin_inset Quotes erd \end_inset mode (similar to old-fashioned video tapes). \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset Future versions of MARS Full are planned to also allow \begin_inset Quotes eld \end_inset fast-backward \begin_inset Quotes erd \end_inset rewinding, of course at some cost. 
\end_layout \begin_layout Section The Lamport Clock \begin_inset CommandInset label LatexCommand label name "sec:The-Lamport-Clock" \end_inset \end_layout \begin_layout Standard MARS is always \emph on asynchronously \emph default communicating in the distributed system on \emph on any \emph default topics, even strategic decisions. \end_layout \begin_layout Standard If there were a \emph on strict \emph default global consistency model, which would be roughly equivalent to a standalone model, we would need \emph on locking \emph default in order to serialize conflicting requests. It has been known for many decades that \emph on distributed locks \emph default do not only suffer from performance problems, but they are also cumbersome to get working reliably in scenarios where nodes or network links may fail at any time. \end_layout \begin_layout Standard Therefore, MARS uses a very different consistency model: \series bold Eventually Consistent \series default . \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset Notice that the network bottleneck problems described in section \begin_inset CommandInset ref LatexCommand ref reference "sec:Network-Bottlenecks" \end_inset are \emph on demanding \emph default an \begin_inset Quotes eld \end_inset eventually consistent \begin_inset Quotes erd \end_inset model. You have \series bold no chance \series default against natural laws, like Einstein's laws. In order to cope with the problem area, you have to \emph on invest some additional effort \emph default . Unfortunately, asynchronous communication models are more tricky to program and to debug than simple strictly consistent models. 
In particular, you \emph on have to cope with \emph default additional \series bold race conditions \series default \emph on inherent \emph default \emph on to \emph default the \begin_inset Quotes eld \end_inset eventually consistent \begin_inset Quotes erd \end_inset model. In the face of the laws of the universe, motivate yourself by looking at the graphics at the cover page: the planets are a \emph on symbol \emph default for what you have to do! \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/MatieresCorrosives.png lyxscale 50 scale 17 \end_inset Example: the asynchronous communication protocol of MARS leads to a different behaviour from DRBD in case of \series bold network partitions \series default (temporary interruption of communication between some cluster nodes), because MARS \emph on remembers \emph default the old state of remote nodes over long periods of time, while DRBD knows absolutely nothing about its peers in disconnected state. Sysadmins familiar with DRBD might find the following behaviour unusual: \end_layout \begin_layout Standard \noindent \align center \size tiny \begin_inset Tabular \begin_inset Text \begin_layout Plain Layout \size tiny Event \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size tiny DRBD Behaviour \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size tiny MARS Behaviour \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size tiny 1. the network partitions \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size tiny automatic disconnect \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size tiny nothing happens, but replication lags behind \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size tiny 2. 
on A: \family typewriter umount $device \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size tiny works \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size tiny works \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size tiny 3. on A: \family typewriter {drbd,mars}adm secondary \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size tiny works \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size tiny works \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size tiny 4. on B: \family typewriter {drbd,mars}adm primary \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size tiny works, split brain happens \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \series bold \size tiny refused \series default because B believes that A is primary \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size tiny 5. the network resumes \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size tiny automatic connect attempt fails \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size tiny communication automatically resumes \end_layout \end_inset \end_inset \end_layout \begin_layout Standard \noindent If you intentionally want to switch over (and to produce a split brain as a side effect), the following variant must be used with MARS: \end_layout \begin_layout Standard \noindent \align center \size tiny \begin_inset Tabular \begin_inset Text \begin_layout Plain Layout \size tiny Event \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size tiny DRBD Behaviour \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size tiny MARS Behaviour \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size tiny 1. 
the network partitions \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size tiny automatic disconnect \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size tiny nothing happens, but replication lags behind \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size tiny 2. on A: \family typewriter umount $device \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size tiny works \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size tiny works \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size tiny 3. on A: \family typewriter {drbd,mars}adm secondary \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size tiny works \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size tiny works (but \emph on not recommended! \emph default ) \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size tiny 4. on B: \family typewriter {drbd,mars}adm primary \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size tiny split brain, but nobody knows \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \series bold \size tiny refused \series default because B believes that A is primary \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size tiny 5. on B: \family typewriter marsadm disconnect \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size tiny - \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size tiny works, nothing happens \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size tiny 6. 
on B: \family typewriter marsadm primary --force \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size tiny - \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size tiny works, split brain happens on B, but A doesn't know \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size tiny 7. on B: \family typewriter marsadm connect \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size tiny - \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size tiny works, nothing happens \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size tiny 8. the network resumes \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size tiny automatic connect attempt fails \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size tiny communication resumes, A now detects the split brain \end_layout \end_inset \end_inset \end_layout \begin_layout Standard \noindent In order to implement the consistency model \begin_inset Quotes eld \end_inset eventually consistent \begin_inset Quotes erd \end_inset , MARS uses a so-called Lamport \begin_inset Foot status open \begin_layout Plain Layout Published in the late 1970s by Leslie Lamport, also known as inventor of \begin_inset ERT status open \begin_layout Plain Layout \backslash LaTeX \end_layout \end_inset . \end_layout \end_inset clock. MARS uses a special variant called \begin_inset Quotes eld \end_inset physical Lamport clock \begin_inset Quotes erd \end_inset . \end_layout \begin_layout Standard The physical Lamport clock is another almost-realtime clock which \emph on can \emph default run independently from the Linux kernel system clock. However, the Lamport clock tries to remain as near as possible to the system clock. \end_layout \begin_layout Standard Both clocks can be queried at any time via \family typewriter cat /proc/sys/mars/lamport_clock \family default . 
The result will show both clocks in parallel, in units of seconds since the Unix epoch, with nanosecond resolution. \end_layout \begin_layout Standard When there are no network messages at all, both the system clock and the Lamport clock will show almost the same time (except some minor differences of a few nanoseconds resulting from the finite processor clock speed). \end_layout \begin_layout Standard The physical Lamport clock works rather simply: \emph on any \emph default message on the network is augmented with a Lamport time stamp telling when the message was \emph on sent \emph default according to the local Lamport clock of the sender. Whenever that message is received by some receiver, it checks whether the time ordering relation would be violated: whenever the Lamport timestamp in the message would claim that the sender had sent it \emph on after \emph default it arrived at the receiver (according to drifts in their respective local clocks), something must be wrong. In this case, the local Lamport clock of the \emph on receiver \emph default is advanced shortly after the sender Lamport timestamp, such that the time ordering relation is no longer violated. \end_layout \begin_layout Standard As a consequence, any local Lamport clock may run ahead of the corresponding local system clock. In order to avoid accumulation of deltas between the Lamport and the system clock, the Lamport clock will run slower after that, possibly until it reaches the system clock again (if no other message arrives which sets it forward again). After having reached the system clock, the Lamport clock will continue with \begin_inset Quotes eld \end_inset normal \begin_inset Quotes erd \end_inset speed. \end_layout \begin_layout Standard MARS uses the local Lamport clock for anything where other systems would use the local system clock: for example, timestamp generation in the \family typewriter /mars/ \family default filesystem. 
Even symlinks created there are timestamped according to the Lamport clock. Both the kernel module and the userspace tool \family typewriter marsadm \family default are always operating in the timescale of the Lamport clock. Most importantly, all timestamp comparisons are always carried out with respect to Lamport time. \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/MatieresCorrosives.png lyxscale 50 scale 17 \end_inset Bigger differences between the Lamport and the system clock can be annoying from a human point of view: when typing \family typewriter ls -l /mars/resource-mydata/ \family default many timestamps may appear as if they were created in the \begin_inset Quotes eld \end_inset future \begin_inset Quotes erd \end_inset , because the \family typewriter ls \family default command compares the output formatting against the system clock (it does not even know of the existence of the MARS Lamport clock). \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/MatieresToxiques.png lyxscale 50 scale 17 \end_inset Always use \family typewriter ntp \family default (or another clock synchronization service) in order to pre-synchronize your system clocks as close as possible. Bigger differences are not only annoying, but may lead some people to wrong conclusions and therefore even lead to bad human decisions! \end_layout \begin_layout Standard In a professional datacenter, you should use \family typewriter ntp \family default anyway, and you should monitor its effectiveness anyway. \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset Hint: many internal logfiles produced by the MARS kernel module contain Lamport timestamps written as numerical values. In order to convert them into human-readable form, use the command \family typewriter marsadm cat /mars/5.total.status \family default or similar. 
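\end_layout \begin_layout Standard \noindent The receive rule of the physical Lamport clock described earlier in this section can be sketched in Python. This is a simplified model under assumptions, not the kernel code: the class \family typewriter PhysicalLamportClock \family default and its methods are hypothetical, time is an integer number of nanoseconds, and the slowed-down phase is modelled crudely by advancing one nanosecond per query until the system clock has caught up. \end_layout \begin_layout Standard

```python
class PhysicalLamportClock:
    """Hypothetical toy model of a physical Lamport clock (not MARS code)."""

    def __init__(self, system_now):
        self.system_now = system_now  # callable returning system time in ns
        self.last = 0                 # last Lamport time handed out

    def now(self):
        """Follow the system clock, but never go backwards."""
        # While ahead of the system clock, advance by only 1 ns per query,
        # so the system clock can catch up again.
        self.last = max(self.system_now(), self.last + 1)
        return self.last

    def send_stamp(self):
        """Timestamp to attach to an outgoing message."""
        return self.now()

    def receive(self, sender_stamp):
        """Advance shortly after the sender stamp if ordering would be violated."""
        self.last = max(self.now(), sender_stamp + 1)
        return self.last
```

With this rule, the receive time of a message is always later than its send stamp, whatever the drift between the two system clocks is.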
\end_layout \begin_layout Section The Symlink Tree \begin_inset CommandInset label LatexCommand label name "sec:The-Symlink-Tree" \end_inset \end_layout \begin_layout Standard \begin_inset Graphics filename images/MatieresCorrosives.png lyxscale 50 scale 17 \end_inset The symlink tree as described here will be replaced by another representation in future versions of MARS. Therefore, don't do any scripting by directly accessing symlinks! Use the primitive macros described in section \begin_inset CommandInset ref LatexCommand ref reference "sub:Predefined-Trivial-Macros" \end_inset . \end_layout \begin_layout Standard The current \family typewriter /mars/ \family default filesystem container format contains not only transaction logfiles, but also acts as a generic storage for (persistent) state information. Both configuration information and runtime state information are currently stored in symlinks. Symlinks are \begin_inset Quotes eld \end_inset misused \begin_inset Foot status open \begin_layout Plain Layout This means, the symlink targets need not be other files or directories, but just any values like integers or strings. \end_layout \end_inset \begin_inset Quotes erd \end_inset in order to represent some \family typewriter key -> value \family default pairs. \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset It is not yet clear / decided, but there is a \emph on chance \emph default that the \emph on concept \emph default of \family typewriter key -> value \family default pairs will be retained in future versions of MARS. Instead of being represented by symlinks, another representation will be used, such that hopefully the \family typewriter key \family default part will remain in the form of a pathname, even if there were no longer a physical representation in an actual filesystem. 
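\end_layout \begin_layout Standard \noindent The idea of storing \family typewriter key -> value \family default pairs in symlink targets can be demonstrated in a scratch directory. The Python helpers \family typewriter set_link \family default and \family typewriter get_link \family default below are hypothetical illustrations of the concept only; they are not how \family typewriter marsadm \family default is implemented, and they must not be pointed at a real \family typewriter /mars/ \family default directory. \end_layout \begin_layout Standard

```python
import os
import tempfile


def set_link(root, key, value):
    """Store value as the target of the symlink root/key, replacing atomically."""
    path = os.path.join(root, key)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    tmp = path + ".tmp"
    if os.path.lexists(tmp):
        os.unlink(tmp)
    os.symlink(value, tmp)   # the target is an arbitrary string, not a real file
    os.rename(tmp, path)     # rename() replaces an existing symlink in one step


def get_link(root, key):
    """Read the value back without following the (dangling) symlink."""
    return os.readlink(os.path.join(root, key))


with tempfile.TemporaryDirectory() as root:   # scratch directory, never /mars/
    set_link(root, "userspace/myname-A", "42")
    set_link(root, "userspace/myname-A", "43")
    value = get_link(root, "userspace/myname-A")
```

The key is a pathname and the value is a plain string, so the same interface could be kept even if the physical representation changes.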
\end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset A fundamental difference in behaviour from DRBD: when your DRBD primary crashed some time ago, and now comes up again, you have to set up DRBD again by a sequence of commands like \family typewriter modprobe drbd; drbdadm up all; drbdadm primary all \family default or similar. In contrast, MARS needs only \family typewriter modprobe mars \family default (after \family typewriter /mars/ \family default has been mounted by \family typewriter /etc/fstab \family default ). The \emph on persistence \emph default of the symlinks residing in \family typewriter /mars/ \family default will automatically remember your previous state, even if some of your resources were primary while others were secondary (mixed operations). You don't need to take any action in order to \begin_inset Quotes eld \end_inset restore \begin_inset Quotes erd \end_inset a previous state, no matter how \begin_inset Quotes eld \end_inset complex \begin_inset Quotes erd \end_inset it was. \end_layout \begin_layout Standard (Almost) all symlinks appearing in the \family typewriter /mars/ \family default directory tree are automatically replicated throughout the whole cluster, provided that the cluster \family typewriter uuid \family default s are equal \begin_inset Foot status open \begin_layout Plain Layout This is protection against accidental \begin_inset Quotes eld \end_inset merging \begin_inset Quotes erd \end_inset of two unrelated clusters which had been created at different times with different \family typewriter uuids \family default . \end_layout \end_inset at all sites. Thus the \family typewriter /mars/ \family default directory forms some kind of \emph on global namespace \emph default .
\end_layout \begin_layout Standard In order to avoid name clashes, each pathname created at node A follows a convention: the node name A should be a suffix of the pathname. Typically, internal MARS names follow the scheme \family typewriter /mars/ \emph on something \emph default /myname-A \family default . When using the expert command \family typewriter marsadm {get,set}-link \family default (which will likely be replaced by something else in future MARS releases), you should follow the best practice of systematically using pathnames like \family typewriter /mars/userspace/myname-A \family default or similar. As a result, each node will automatically get informed about the state at any other node, like B when the corresponding information is recorded on node B under the name \family typewriter /mars/userspace/myname-B \family default (context-dependent names). \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset Experts only: the symlink replication works generically. You might use the \family typewriter /mars/userspace/ \family default directory in order to place your own symlink there (for whatever purpose, which need not have to do with MARS). However, the symlinks are likely to disappear. Use \family typewriter marsadm {get,set}-link \family default instead. There is a chance that these abstract commands (or variants thereof) will be retained, by acting on the new data representation in future, even if the old symlink format will vanish some day. \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset Important: the convention of placing the \series bold creator host name \series default inside your pathnames should be used wherever possible. The name part is a kind of \begin_inset Quotes eld \end_inset ownership indicator \begin_inset Quotes erd \end_inset . 
It is crucial that no other host writes any symlink not \begin_inset Quotes eld \end_inset belonging \begin_inset Quotes erd \end_inset to him. Other hosts may read foreign information as often as they want, but never modify them. This way, your cluster nodes are able to \emph on communicate \emph default with each other via symlink / information updates. \end_layout \begin_layout Standard Although experts might create (and change) the current symlinks with userspace tools like \family typewriter ln -s \family default , you should use the following marsadm commands instead: \end_layout \begin_layout Itemize \family typewriter marsadm set-link myvalue /mars/userspace/mykey-A \end_layout \begin_layout Itemize \family typewriter marsadm delete-file /mars/userspace/mykey-A \end_layout \begin_layout Standard There are many reasons for this: first, the \family typewriter marsadm set-link \family default command will automatically use the Lamport clock for symlink creation, and therefore will avoid any errors resulting from a \begin_inset Quotes eld \end_inset wrong \begin_inset Quotes erd \end_inset system clock (as in \family typewriter ln -s \family default ). Second, the \family typewriter marsadm delete-file \family default (which also deletes symlinks) works on the \emph on whole cluster \emph default . And finally, there is a chance that this will work in future versions of MARS even after the symlinks have vanished. \end_layout \begin_layout Standard What's the difference? If you would try to remove your symlink locally by hand via \family typewriter rm -f \family default , you will be surprised: since the symlink has been replicated to the other cluster nodes, it will be re-transferred from there and will be resurrected locally after some short time. 
This way, you cannot delete any object reliably, because your whole cluster (which may consist of many nodes) remembers all your state information and will \begin_inset Quotes eld \end_inset correct \begin_inset Quotes erd \end_inset it whenever \begin_inset Quotes eld \end_inset necessary \begin_inset Quotes erd \end_inset . \end_layout \begin_layout Standard In order to solve the deletion problem, MARS uses an internal deletion protocol based on auxiliary symlinks residing in \family typewriter /mars/todo-global/. \family default The deletion protocol ensures that all replicas get deleted in the whole cluster, and only thereafter the auxiliary symlinks in \family typewriter /mars/todo-global/ \family default are also deleted eventually. \end_layout \begin_layout Standard You may update your already existing symlink via \family typewriter marsadm set-link some-other-value /mars/userspace/mykey-A \family default . The new value will be propagated throughout the cluster according to a \series bold timestamp comparison protocol \series default : whenever node B notices that A has a \emph on newer \emph default version of some symlink (according to the Lamport timestamp), it will replace its older version by the newer one. The opposite does \emph on not \emph default work: if B notices that A has an older version, nothing happens. This way, the timestamps of symlinks can only progress in forward direction, but never backwards in time. \end_layout \begin_layout Standard As a consequence, symlink updates made \begin_inset Quotes eld \end_inset by hand \begin_inset Quotes erd \end_inset via \family typewriter ln -sf \family default may get lost when the local system clock is much earlier than the Lamport clock. \end_layout \begin_layout Standard When your cluster is fully connected by the network, the latest timestamp will eventually win everywhere.
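\end_layout \begin_layout Standard The timestamp comparison protocol described above can be sketched in a few lines of Python. This is a simplified model and not the actual MARS implementation: each node is represented by a dictionary mapping a pathname to a pair of Lamport timestamp and value, the timestamps are plain integers, and the function name \family typewriter update_from \family default is a hypothetical illustration. The deletion protocol is not modelled:

```python
def update_from(local, remote):
    """Pull links from `remote` into `local`, keeping only newer versions.

    Both arguments map pathname -> (lamport_timestamp, value).
    A remote version replaces the local one only if it is strictly newer,
    so timestamps can never move backwards.
    """
    for key, (stamp, value) in remote.items():
        if key not in local or stamp > local[key][0]:
            local[key] = (stamp, value)


KEY = "/mars/userspace/mykey-A"

node_a = {KEY: (10, "myvalue")}
node_b = {}

update_from(node_b, node_a)             # B learns the link from A
node_a[KEY] = (12, "some-other-value")  # A updates its own link
update_from(node_b, node_a)             # newer version replaces the older one

stale = {KEY: (11, "stale-value")}      # e.g. an update stamped too early
update_from(node_b, stale)              # older version: nothing happens
```

In this model the update stamped 11 is silently ignored, which corresponds to the lost hand-made update mentioned above. \end_layout \begin_layout Standard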
Only in case of network outages leading to \emph on network partitions \emph default , some information may be \emph on temporarily inconsistent \emph default , but only for the duration of the network outage. The timestamp comparison protocol in combination with the Lamport clock and with the persistence of the \family typewriter /mars/ \family default filesystem will automatically heal any temporary inconsistencies as soon as possible, even in case of temporary node shutdown. \end_layout \begin_layout Standard The meaning of some internal MARS symlinks residing in \family typewriter /mars/ \family default will be hopefully documented in section \begin_inset CommandInset ref LatexCommand ref reference "sec:Documentation-of-the" \end_inset some day. \end_layout \begin_layout Section Defending Overflow of \family typewriter /mars/ \begin_inset CommandInset label LatexCommand label name "sec:Defending-Overflow" \end_inset \end_layout \begin_layout Standard This section describes an important difference to DRBD. The metadata of DRBD is allocated \emph on statically \emph default at \emph on creation \emph default \emph on time \emph default of the resource. In contrast, the MARS transaction logfiles are allocated \emph on dynamically \emph default at \emph on runtime \emph default . \end_layout \begin_layout Standard This leads to a potential risk from the perspective of a sysadmin: what happens if the \family typewriter /mars/ \family default filesystem runs out of space? \end_layout \begin_layout Standard No risk, no fun. If you want a system which survives long-lasting network outages while keeping your replicas always consistent (anytime consistency), you \emph on need \emph default dynamic memory for that. It is \emph on impossible \emph default to solve that problem using static memory \begin_inset Foot status open \begin_layout Plain Layout The bitmaps used by DRBD don't preserve the \emph on order \emph default of write operations. 
They cannot do that, because their space is \begin_inset Formula $O(k)$ \end_inset for some constant \begin_inset Formula $k$ \end_inset . In contrast, MARS preserves the order. Preserving the order as such (even when only \emph on facts \emph default about the order were recorded without recording the actual data contents) requires \begin_inset Formula $O(n)$ \end_inset space where \begin_inset Formula $n$ \end_inset is infinitely growing over time. \end_layout \end_inset . \end_layout \begin_layout Standard Therefore, DRBD and MARS have different application areas. If you just want a simple system for mirroring your data over short distances like a crossover cable, DRBD will be a suitable choice. However, if you need to replicate over longer distances, or if you need higher levels of reliability even when multiple failures may accumulate (such as network loss during a \emph on re \emph default sync of DRBD), the transaction logs of MARS can solve that, but at some \emph on cost \emph default . \end_layout \begin_layout Subsection Countermeasures \end_layout \begin_layout Subsubsection Dimensioning of \family typewriter /mars/ \begin_inset CommandInset label LatexCommand label name "sub:Dimensioning-of-/mars/" \end_inset \end_layout \begin_layout Standard The first (and most important) measure against overflow of \family typewriter /mars/ \family default is simply to dimension it large enough to survive longer-lasting problems, at least one weekend. \end_layout \begin_layout Standard Recommended size is at least one dedicated disk, residing at a hardware RAID controller with BBU (see section \begin_inset CommandInset ref LatexCommand ref reference "sec:Preparation:-What-you" \end_inset ). During normal operation, that size is needed only for a small fraction, typically a few percent or even less than one percent. However, it is your \series bold safety margin \series default . Keep it high enough! 
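\end_layout \begin_layout Standard A rough back-of-the-envelope estimate for the dimensioning can be sketched as follows. It assumes the worst case that no logfile can be deleted during the whole outage, and that the transaction logs grow about as fast as the application writes (any per-record overhead is ignored). The write rate and the outage duration are made-up illustration values, not measurements or recommendations:

```python
def required_log_space_gib(avg_write_mib_per_s, outage_hours):
    """Estimate the log volume (GiB) accumulated during an outage.

    Assumption: every written MiB ends up once in the transaction log
    and nothing can be deleted until the outage is over.
    """
    seconds = outage_hours * 3600
    return avg_write_mib_per_s * seconds / 1024


# Hypothetical example: 2 MiB/s average write load over a 60 hour weekend.
weekend_gib = required_log_space_gib(avg_write_mib_per_s=2.0, outage_hours=60)
```

With these assumed numbers the estimate is roughly 422 GiB. Measure your own average write rate before relying on such a figure.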
\end_layout \begin_layout Subsubsection Monitoring \end_layout \begin_layout Standard The next (equally important) measure is \series bold monitoring in userspace \series default . \end_layout \begin_layout Standard Following is a list of countermeasures both in userspace and in kernelspace, in the order of \begin_inset Quotes eld \end_inset defensive walling \begin_inset Quotes erd \end_inset : \end_layout \begin_layout Enumerate Regular userspace monitoring must throw an INFO if a certain freespace limit \begin_inset Formula $l_{1}$ \end_inset of \family typewriter /mars/ \family default is undershot. Typical values for \begin_inset Formula $l_{1}$ \end_inset are 30%. Typical actions are automated calls of \family typewriter marsadm log-rotate all \family default followed by \family typewriter marsadm log-delete-all all \family default . You have to implement that yourself in sysadmin space. \end_layout \begin_layout Enumerate Regular userspace monitoring must throw a WARNING if a certain freespace limit \begin_inset Formula $l_{2}$ \end_inset of \family typewriter /mars/ \family default is undershot. Typical values for \begin_inset Formula $l_{2}$ \end_inset are 20%. Typical actions are (in addition to \family typewriter log-rotate \family default and \family typewriter log-delete-all \family default ) alarming human supervisors via SMS and/or further stronger automated actions. \begin_inset Newline newline \end_inset \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset Frequently large space is occupied by files stemming from debugging output, or from other programs or processes. A hot candidate is \begin_inset Quotes eld \end_inset forgotten \begin_inset Quotes erd \end_inset removal of debugging output to \family typewriter /mars/ \family default . 
Sometimes, an \family typewriter rm -rf $(find /mars/ -name \begin_inset Quotes eld \end_inset *.log \begin_inset Quotes erd \end_inset ) \family default can work miracles. \begin_inset Newline newline \end_inset \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset Another source of space hogging is a \begin_inset Quotes eld \end_inset forgotten \begin_inset Quotes erd \end_inset \family typewriter pause-sync \family default or \family typewriter disconnect \family default . Therefore, a simple \family typewriter marsadm connect-global all \family default followed by \family typewriter marsadm resume-replay-global all \family default may also work miracles (if you didn't want to freeze some mirror deliberately). \begin_inset Newline newline \end_inset \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset If you just wanted to freeze a mirror at an outdated state for a very long time, you simply \emph on cannot \emph default do that without causing infinite growth of space consumption in \family typewriter /mars/ \family default . Therefore, a \family typewriter marsadm leave-resource $res \family default at \emph on exactly that(!) \emph default secondary site where the mirror is frozen, can also work miracles. If you want to automate this in userspace, be careful. It is easy to get unintended effects when choosing the wrong site for \family typewriter leave-resource \family default . \begin_inset Newline newline \end_inset \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset Hint: you can / should start some of these measures even earlier at the INFO level (see item 1), or even earlier. \end_layout \begin_layout Enumerate Regular userspace monitoring must throw an ERROR if a certain freespace limit \begin_inset Formula $l_{3}$ \end_inset of \family typewriter /mars/ \family default is undershot.
Typical values for \begin_inset Formula $l_{3}$ \end_inset are 10%. Typical actions are alarming the CEO via SMS and/or even stronger automated actions. For example, you may choose to automatically call \family typewriter marsadm leave-resource $res \family default on some or all secondary nodes, such that the primary will be left alone and now has a chance to really delete its logfiles because no one else is any longer potentially needing it. \end_layout \begin_layout Enumerate First-level kernelspace action, automatically executed when \family typewriter \begin_inset Flex URL status open \begin_layout Plain Layout /proc/sys/mars/required_free_space_4_gb \end_layout \end_inset \family default + \family typewriter \begin_inset Flex URL status open \begin_layout Plain Layout /proc/sys/mars/required_free_space_3_gb \end_layout \end_inset \family default + \family typewriter \begin_inset Flex URL status open \begin_layout Plain Layout /proc/sys/mars/required_free_space_2_gb \end_layout \end_inset \family default + \family typewriter \begin_inset Flex URL status open \begin_layout Plain Layout /proc/sys/mars/required_free_space_1_gb \end_layout \end_inset \family default is undershot: \begin_inset Newline newline \end_inset a warning will be issued. 
\end_layout \begin_layout Enumerate Second-level kernelspace action, automatically executed when \family typewriter \begin_inset Flex URL status open \begin_layout Plain Layout /proc/sys/mars/required_free_space_3_gb \end_layout \end_inset \family default + \family typewriter \begin_inset Flex URL status open \begin_layout Plain Layout /proc/sys/mars/required_free_space_2_gb \end_layout \end_inset \family default + \family typewriter \begin_inset Flex URL status open \begin_layout Plain Layout /proc/sys/mars/required_free_space_1_gb \end_layout \end_inset \family default is undershot: \begin_inset Newline newline \end_inset all locally secondary resources will delete local copies of transaction logfiles which are no longer needed locally. This is a desperate action of the kernel module. \end_layout \begin_layout Enumerate Third-level kernelspace action, automatically executed when \family typewriter \begin_inset Flex URL status open \begin_layout Plain Layout /proc/sys/mars/required_free_space_2_gb \end_layout \end_inset \family default + \family typewriter \begin_inset Flex URL status open \begin_layout Plain Layout /proc/sys/mars/required_free_space_1_gb \end_layout \end_inset \family default is undershot: \begin_inset Newline newline \end_inset all locally secondary resources will stop fetching transaction logfiles. This is a more desperate action of the kernel module. You don't want to get there (except for testing). \end_layout \begin_layout Enumerate Last desperate kernelspace action when all else has failed and \family typewriter \begin_inset Flex URL status open \begin_layout Plain Layout /proc/sys/mars/required_free_space_1_gb \end_layout \end_inset \family default is undershot: \begin_inset Newline newline \end_inset all locally primary resources will enter \series bold emergency mode \series default (see description below in section \begin_inset CommandInset ref LatexCommand ref reference "sub:Emergency-Mode" \end_inset ). 
This is the most desperate action of the kernel module. You don't want to get there (except for testing). \end_layout \begin_layout Standard In addition, the kernel module obeys a general global limit \family typewriter \begin_inset Flex URL status open \begin_layout Plain Layout /proc/sys/mars/required_total_space_0_gb \end_layout \end_inset + \family default the sum of all of the above limits. When the \emph on total size \emph default of \family typewriter /mars/ \family default undershoots that sum, the kernel module refuses to start at all, because it assumes that it is senseless to try to operate MARS on a system with such low memory resources. \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset The current level of emergency kernel actions may be viewed at any time via \family typewriter \begin_inset Flex URL status collapsed \begin_layout Plain Layout /proc/sys/mars/mars_emergency_mode \end_layout \end_inset \family default . \end_layout \begin_layout Subsubsection Throttling \end_layout \begin_layout Standard The last measure for defense of overflow is \series bold throttling your performance pigs \series default . \end_layout \begin_layout Standard Motivation: in rare cases, some users with \family typewriter ssh \family default access can do \emph on very \emph default silly things. For example, some of them are creating their own backups via user-cron jobs, and they do it every 5 minutes. Some example guy created a zip archive (almost 1GB) by regularly copying his old zip archive into a new one, then appending deltas to the new one, and finally deleting the old archive. Every 5 minutes. Yes, every 5 minutes, although almost never any new files were added to the archive. Essentially, he copied over his archive, for nothing. This led to massive bulk write requests, for ridiculous reasons.
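\end_layout \begin_layout Standard Coming back to the userspace freespace limits \begin_inset Formula $l_{1}$ \end_inset , \begin_inset Formula $l_{2}$ \end_inset and \begin_inset Formula $l_{3}$ \end_inset from the Monitoring subsection: since that part has to be implemented in sysadmin space, the following Python fragment sketches one possible classification step. The thresholds are simply the typical values quoted there (30%, 20%, 10%); the function names are hypothetical, and the alarming itself is left out:

```python
import shutil

# Assumed limits in percent of free space, most severe first.
LIMITS = [(10.0, "ERROR"), (20.0, "WARNING"), (30.0, "INFO")]


def free_percent(path):
    """Return the free space of the filesystem containing `path` in percent."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.free / usage.total


def classify(percent_free):
    """Map a free-space percentage to a monitoring level."""
    for limit, level in LIMITS:
        if percent_free < limit:
            return level
    return "OK"
```

A cron job or monitoring plugin could call \family typewriter classify(free_percent("/mars/")) \family default and, starting at the INFO level, trigger \family typewriter marsadm log-rotate all \family default followed by \family typewriter marsadm log-delete-all all \family default as described in the Monitoring list.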
\end_layout \begin_layout Standard In general, your hard disks (or even RAID systems) allow much higher write IO rates than you can ever transport over a standard TCP network from your primary site to your secondary, at least over longer distances (see use cases for MARS in chapter \begin_inset CommandInset ref LatexCommand ref reference "chap:Use-Cases-for" \end_inset ). Therefore, it is easy to create such a high write load that it will be \emph on impossible \emph default to replicate it over the network, \emph on by construction \emph default . \end_layout \begin_layout Standard Therefore, we \emph on need \emph default some mechanism for throttling bulk writers whenever the network is weaker than your IO subsystem. \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset Notice that DRBD will \emph on always \emph default throttle your writes whenever the network forms a bottleneck, due to its synchronous operation mode. In contrast, MARS allows for buffering of performance peaks in the transaction logfiles. \emph on Only when \emph default your buffer in \family typewriter /mars/ \family default runs short (cf subsection \begin_inset CommandInset ref LatexCommand ref reference "sub:Dimensioning-of-/mars/" \end_inset ), MARS will start to throttle your application writes. \end_layout \begin_layout Standard There are a lot of screws named \family typewriter /proc/sys/mars/write_throttle_* \family default with the following meaning: \end_layout \begin_layout Description \family typewriter write_throttle_start_percent \family default Whenever the used space in \family typewriter /mars/ \family default is below this threshold, no throttling will occur at all. Only when this threshold is exceeded, throttling will start \emph on slowly \emph default . Typical values for this are 60%.
\end_layout \begin_layout Description \family typewriter write_throttle_end_percent \family default Maximum throttling will occur once this space threshold is reached, i.e. the throttling is now at its maximum effect. Typical values for this are 90%. When the actual space in \family typewriter /mars/ \family default lies between \family typewriter write_throttle_start_percent \family default and \family typewriter write_throttle_end_percent \family default , the strength of throttling will be interpolated linearly between the extremes. In practice, this should lead to an equilibrium between new input flow into \family typewriter /mars/ \family default and output flow over the network to secondaries. \end_layout \begin_layout Description \family typewriter write_throttle_size_threshold_kb \family default (readonly) This parameter shows the internal strength calculation of the throttling. Only write \begin_inset Foot status open \begin_layout Plain Layout Read requests are never throttled at all. \end_layout \end_inset requests exceeding this size (in KB) are throttled at all. Typically, this will hurt the bulk performance pigs first, while leaving ordinary users (issuing small requests) unaffected. \end_layout \begin_layout Description \family typewriter write_throttle_ratelimit_kb \family default Set the global IO rate in KB/s for those write requests which are throttled. In case of strongest \begin_inset Foot status open \begin_layout Plain Layout In case of lighter throttling, the input flow into \family typewriter /mars/ \family default may be higher because small requests are not throttled. \end_layout \end_inset throttling, this parameter determines the input flow into \family typewriter /mars/ \family default . The default value is 5000 KB/s. Please adjust this value to your application needs and to your environment.
\end_layout \begin_layout Description \family typewriter write_throttle_rate_kb \family default (readonly) Shows the current rate of exactly those requests which are actually throttled (in contrast to \emph on all \emph default requests). \end_layout \begin_layout Description \family typewriter write_throttle_cumul_kb \family default (logically readonly) Same as before, but the cumulative sum of all throttled requests since startup / reset. This value can be reset from userspace in order to prevent integer overflow. \end_layout \begin_layout Description \family typewriter write_throttle_count_ops \family default (logically readonly) Shows the cumulative number of throttled requests. This value can be reset from userspace in order to prevent integer overflow. \end_layout \begin_layout Description \family typewriter write_throttle_maxdelay_ms \family default Each request is delayed at most for this timespan. Smaller values will improve the responsiveness of your userspace application, but at the cost of potentially retarding the requests not sufficiently. \end_layout \begin_layout Description \family typewriter write_throttle_minwindow_ms \family default Set the minimum length of the measuring window. The measuring window is the timespan for which the average (throughput) rate is computed (see \family typewriter write_throttle_rate_kb \family default ). Lower values can increase the responsiveness of the controller algorithm, but at the cost of accuracy. \end_layout \begin_layout Description \family typewriter write_throttle_maxwindow_ms \family default This parameter must be set sufficiently much greater than \family typewriter write_throttle_minwindow_ms \family default . In case the flow of throttled operations pauses for some natural reason (e.g. 
switched off, low load, etc), this parameter determines when a completely new rate calculation should be started over \begin_inset Foot status open \begin_layout Plain Layout Motivation: if requests would pause for one hour, the measuring window could become also an hour. Of course, that would lead to completely meaningless results. Two requests in one hour is \begin_inset Quotes eld \end_inset incorrect \begin_inset Quotes erd \end_inset from a human point of view: we just have to ensure that averages are computed with respect to a reasonable maximum time window in the magnitude of 10s. \end_layout \end_inset . \end_layout \begin_layout Subsection Emergency Mode and its Resolution \begin_inset CommandInset label LatexCommand label name "sub:Emergency-Mode" \end_inset \end_layout \begin_layout Standard When \family typewriter /mars/ \family default is almost full and there is really absolutely no chance of getting rid of any local transaction logfile (or free some space in any other way), there is only one exit strategy: stop creating new logfile data. \end_layout \begin_layout Standard This means that the ability for replication gets lost. \end_layout \begin_layout Standard When entering emergency mode, the kernel module will execute the following steps for all resources where the affected host is acting as a primary: \end_layout \begin_layout Enumerate Do a kind of \begin_inset Quotes eld \end_inset logrotate \begin_inset Quotes erd \end_inset , but create a \emph on hole \emph default in the sequence of transaction logfile numbers. The \begin_inset Quotes eld \end_inset new \begin_inset Quotes erd \end_inset logfile is left empty, i.e. no data is written to it (for now). The hole in the numbering will prevent any secondaries from replaying any logfiles behind the hole (should they ever contain some data, e.g. because the emergency mode has been left again).
This works because the secondaries are regularly checking the logfile numbers for contiguity, and they will refuse to replay anything which is not contiguous. As a result, the secondaries will be left in a consistent, but outdated state (at least if they already were consistent before that). \end_layout \begin_layout Enumerate The kernel module writes back all data present in the temporary memory buffer (see figure in section \begin_inset CommandInset ref LatexCommand ref reference "sec:The-Transaction-Logger" \end_inset ). This may lead to a (short) delay of user write requests until that has finished (typically fractions of a second or a few seconds). The reason is that the temporary memory buffer must not be increased in parallel during this phase (race conditions). \end_layout \begin_layout Enumerate After the temporary memory buffer is empty, all local IO requests (whether reads or writes) are directly going to the underlying disk. This has the same effect as if MARS were no longer present. Transaction logging no longer takes place. \end_layout \begin_layout Enumerate Any sync from any secondary is stopped ASAP. In case they are resuming their sync sometime later, they will start over from the beginning (position \begin_inset Formula $0$ \end_inset ). \end_layout \begin_layout Standard In order to leave emergency mode, the sysadmin should do the following steps: \end_layout \begin_layout Enumerate Free enough space. For example, delete any foreign files on \family typewriter /mars/ \family default which have nothing to do with MARS, or resize the \family typewriter /mars/ \family default filesystem, or whatever. \end_layout \begin_layout Enumerate If \family typewriter \begin_inset Flex URL status open \begin_layout Plain Layout /proc/sys/mars/mars_reset_emergency \end_layout \end_inset \family default is not set, now it is time to set it. Normally, it should be already set.
\end_layout \begin_layout Enumerate Notice: as long as not enough space has been freed, a message containing \family typewriter \begin_inset Quotes eld \end_inset EMEGENCY MODE HYSTERESIS \begin_inset Quotes erd \end_inset \family default (or similar) will be displayed by \family typewriter marsadm view all \family default . As a consequence, any sync will be automatically halted. This applies to freshly invoked syncs also, for example created by \family typewriter invalidate \family default or \family typewriter join-resource \family default . \end_layout \begin_layout Enumerate On the secondaries, use \family typewriter marsadm invalidate $res \family default in order to request updating your outdated mirrors. \end_layout \begin_layout Enumerate On the primary: \family typewriter marsadm log-delete-all all \end_layout \begin_layout Enumerate As soon as enough space has been freed everywhere to leave the \family typewriter EMEGENCY MODE HYSTERESIS \family default , sync should really start. Until then, it remains halted. \end_layout \begin_layout Standard Alternatively, there is another method by roughly following the instructions from appendix \begin_inset CommandInset ref LatexCommand ref reference "chap:Alternative-Methods-for" \end_inset , but in a slightly different order. In this case, do \family typewriter leave-resource \family default everywhere on \emph on all \emph default secondaries, but \emph on don't \emph default start the \family typewriter join-resource \family default phase \emph on for now \emph default . Then clean up all your secondaries via \family typewriter log-purge-all \family default , and finally \family typewriter log-delete-all all \family default at the primary, and wait until the emergency has vanished everywhere. Only after that, re- \family typewriter join-resource \family default your secondaries.
\begin_inset Newline newline \end_inset \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset Expert advice for \begin_inset Formula $k=2$ \end_inset replicas: this means you had only 1 mirror per resource before the overflow happened. Provided that you have enough space on your LVMs and on \family typewriter /mars/ \family default , and provided that transaction logging has automatically restarted after \family typewriter leave-resource \family default and \family typewriter log-purge-all \family default , you can recover redundancy by creating a \emph on new \emph default replica via \family typewriter marsadm join-resource $res \family default on a \emph on third \emph default node. Only after the initial full sync has finished there, run \family typewriter join-resource \family default at your original mirror. This way, you will always retain at least one \series bold consistent mirror \series default somewhere. After all is up-to-date, you can delete the superfluous mirror by \family typewriter marsadm leave-resource $res \family default and reclaim the disk space from its underlying LVM disk. \begin_inset Newline newline \end_inset \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset If you already have \begin_inset Formula $k>2$ \end_inset replicas in total, it may be a wise idea to prefer the \family typewriter leave-resource ; log-purge-all ; join-resource \family default method in front of \family typewriter invalidate \family default because it does not invalidate \emph on all \emph default your replicas at the same time (when handled properly in the right order). \end_layout \begin_layout Chapter The Macro Processor \begin_inset CommandInset label LatexCommand label name "chap:The-Macro-Processor" \end_inset \end_layout \begin_layout Standard \family typewriter marsadm \family default comes with a customizable macro processor. 
It can be used for high-level complex display of the state of MARS (so-called \emph on complex macros \emph default ), as well as for low-level display of lots of individual state values (so-called \emph on primitive macros \emph default ). \end_layout \begin_layout Standard From the command line, any macro can be called via \family typewriter marsadm view- \emph on $macroname \emph default mydata \family default . The short form \family typewriter marsadm view mydata \family default is equivalent to \family typewriter marsadm view-default mydata \family default . \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset In general, the command \family typewriter marsadm view- \emph on $macroname \emph default all \family default will first call the macro \family typewriter \emph on $macroname \family default \emph default in a loop for \emph on all \emph default resources of which the local host is a \emph on member \emph default . Finally, a trailing macro \family typewriter \emph on $macroname \emph default -global \family default will be called with an empty \family typewriter %{res} \family default argument, provided that such a macro is defined. This way, you can produce per-resource output followed by global output which does not depend on a particular resource. \end_layout \begin_layout Section Predefined Macros \end_layout \begin_layout Standard The macro processor is a very flexible and versatile tool for \series bold customization \series default . You can create your own macros, but the rich set of predefined macros is probably already sufficient for your needs. \end_layout \begin_layout Subsection Predefined Complex and High-Level Macros \begin_inset CommandInset label LatexCommand label name "sub:Predefined-Complex-and" \end_inset \end_layout \begin_layout Standard The following predefined complex macros try to address the information needs of humans.
Use them in scripts only if you are prepared for the output format to change during the development of MARS. \end_layout \begin_layout Standard Notice: the definitions of predefined complex macros may be updated in the course of the MARS project. However, the primitive macros recursively called by the complex ones will hopefully remain rather stable in the future (with the exception of bugfixes). If you want to retain an old / outdated version of a complex macro, just check it out from git, follow the instructions in section \begin_inset CommandInset ref LatexCommand ref reference "sub:Creating-your-own" \end_inset , and preferably give it a different name in order to avoid confusion with the newer version. In general, it should be possible to use old macros with newer versions of \family typewriter marsadm \family default \begin_inset Foot status open \begin_layout Plain Layout You might also need to check out old versions of further macros and adapt their names, whenever complex macros call each other. \end_layout \end_inset . \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter default \family default This is equivalent to \family typewriter marsadm view mydata \family default without a \family typewriter \emph on -macroname \family default \emph default suffix. It shows a one-line status summary for each resource, optionally followed by informational lines such as progress bars whenever a sync or a fetch of logfiles is currently running. The status line has the following fields: \end_layout \begin_deeper \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter %{res} \family default resource name. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter %include{diskstate} \family default see \family typewriter diskstate \family default macro below.
\end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter %include{replstate} \family default see \family typewriter replstate \family default macro below. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter %include{flags} \family default see \family typewriter flags \family default macro below. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter %include{role} \family default see \family typewriter role \family default macro below. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter %include{primarynode} \family default see \family typewriter primarynode \family default macro below. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter %include{commstate} \family default see \family typewriter commstate \family default macro below. \end_layout \end_deeper \begin_layout Labeling \labelwidthstring 00.00.0000 \begin_inset space ~ \end_inset After that, optional lines such as progress bars appear only when something unusual is happening. These lines are subject to future changes. For example, wasted disk space due to a missing \family typewriter resize \family default is reported when \family typewriter %{threshold} \family default is exceeded. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter 1and1 \family default \begin_inset space ~ \end_inset or \begin_inset space ~ \end_inset \family typewriter default-1and1 \family default A variant of \family typewriter default \family default for internal use by 1&1 Internet AG. You may call this complex macro by saying \family typewriter marsadm view-1and1 all \family default .
\end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset Note: the \family typewriter marsadm view-1and1 \family default command was intensively tested in Spring 2014 to produce exactly the same output as the 1&1 internal \begin_inset Foot status open \begin_layout Plain Layout In addition to allowing customization, the macro processor is also meant as an exit strategy for removing dependencies on non-free software. \series bold Please put your future macros also under GPL! \end_layout \end_inset tool \family typewriter marsview \family default \begin_inset Foot status open \begin_layout Plain Layout There are some subtle differences: numbers are displayed with a different precision, and bug fixes in the \emph on predefined \emph default macros (which might have occurred \emph on in the meantime \emph default ) may lead to different output as a side effect, because the original \family typewriter marsview \family default command is currently not actively maintained. Documentation of \family typewriter marsview \family default can be found in the corresponding manpage, see \family typewriter man marsview \family default . By construction, this is also the (unmaintained) documentation of \family typewriter marsadm view-1and1 \family default and other \family typewriter -1and1 \family default macros. Notice that all \family typewriter *-1and1 \family default macros are not officially supported by the developer of MARS, and they may disappear in a future major release. However, they could be useful for your own customization macros.
\end_layout \end_inset \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset Customization via your own macros (see section \begin_inset CommandInset ref LatexCommand ref reference "sub:Creating-your-own" \end_inset ) is explicitly encouraged by the developer. It would be nice if a vibrant user community would emerge, helping each other by exchange of macros. \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset Hint: in order to produce your own customized inspection / monitoring tools, you may ask the author for an official reservation of a macro sub-namespace such as \family typewriter *- \emph on yourcompanyname \family default \emph default . You will be fully responsible for your own reserved namespace and can do with it whatever you want. The official MARS release will guarantee that \emph on no name clashes \emph default with your reserved sub-namespace will occur in future. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter default-global \family default Currently, this just calls \family typewriter comminfo \family default (see below). May be extended in future. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter diskstate \family default Shows the status of the underlying disk device, in the following order of precedence \begin_inset Foot status open \begin_layout Plain Layout When an earlier list item is displayed, no combinations with following items are possible. This kind of \begin_inset Quotes eld \end_inset hiding effect \begin_inset Quotes erd \end_inset can lead to an \emph on information loss \emph default . 
In order to get a non-lossy picture from the state of your system, please look at the \family typewriter flags \family default which are able to display cartesian combinations of more detailed internal states. \end_layout \end_inset : \end_layout \begin_deeper \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter NotJoined \family default (cf \family typewriter %get-disk{} \family default ) No underlying disk device is configured. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter NotPresent \family default (cf \family typewriter %disk-present{} \family default ) The underlying disk device (as configured, see \family typewriter marsadm view-get-disk \family default ) does not exist or the device node is not accessible. Therefore MARS cannot work. Check that LVM or other software is properly configured and running. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter Detached \family default (cf \family typewriter InConsistent \family default , \family typewriter NeedsReplay \family default , \family typewriter %todo-attach{} \family default , \family typewriter %is-attach{} \family default ) The underlying disk is willingly switched off (see \family typewriter marsadm detach \family default ), and it actually is no longer opened by MARS. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter Detaching \family default (cf \family typewriter %todo-attach{} \family default and \family typewriter %is-attach{} \family default ) Access to the underlying disk is switched off, but actually not yet \family typewriter close() \family default d by MARS. This can happen for a long time on a primary when other secondaries are accessing the disk remotely for syncing. 
\end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter DefectiveLog[ \emph on description-text \emph default ] \family default (cf \family typewriter %replay-code{} \family default ) Typically this indicates an \family typewriter md5 \family default checksum error in a transaction logfile, or another (hardware / filesystem) defect. This occurs extremely rarely in practice, but has been observed more frequently during a massive failure of air conditioning in a datacenter, when disk temperatures rose to more than 80° Celsius. Notice that a secondary \series bold refuses \series default to apply any knowingly defective logfile data to the disk. Although this message does \emph on not directly \emph default refer to the underlying disk, it is mentioned here because of its superior \series bold relevance \series default for the diskstate. A damaged transaction logfile will always affect the \emph on actuality \emph default of the disk, but not its \emph on integrity \emph default (by itself). What to do in such a case? \end_layout \begin_deeper \begin_layout Enumerate When the damage is only at one of your secondaries, you should first ensure that the primary has a good logfile after a \family typewriter marsadm log-rotate \family default , then try \family typewriter marsadm invalidate \family default at the damaged secondary. It is crucial that the primary has a fresh correct logfile behind the error position, and that it is continuing to operate correctly. \end_layout \begin_layout Enumerate When \emph on all \emph default of your secondaries are reporting \family typewriter DefectiveLog \family default , the primary could have \emph on produced \emph default a damaged logfile (e.g. in RAM, in a DMA channel, etc) while continuing to operate, and all of your secondaries got that defective logfile.
After \family typewriter marsadm log-delete-all all \family default , you can check this by comparing the \family typewriter md5sum \family default of the first primary logfile (having the lowest serial number) with the versions on your replicas. The problem is that you don't know whether the primary side has a silent corruption on any of its disks, or not. You will need to take an operational decision whether to switch over to a secondary via \family typewriter primary --force \family default , or whether to continue operation at the primary and \family typewriter invalidate \family default your secondaries. \end_layout \begin_layout Enumerate When the original primary is affected in a very bad way, such that it crashed badly and afterwards even recovery of the \emph on primary \emph default is impossible \begin_inset Foot status open \begin_layout Plain Layout In such a rare case, the \emph on original primary \emph default (but not any other host) \series bold refuses \series default to come up during recovery with \emph on its own \emph default logfile originally produced by \emph on itself \emph default . This is not a bug, but saves you from incorrectly assuming that your original primary disk is consistent: it is \emph on known \emph default to be inconsistent, but recovery is impossible due to the damaged logfile. Thus \emph on this one \emph default replica is trapped by defective hardware. The other replicas should not be. \end_layout \end_inset due to this error (which typically occurs extremely rarely, observed twice during 7 million operating hours on defective hardware), you need to take an operational decision between the following alternatives: \end_layout \begin_deeper \begin_layout Enumerate switch over to a former secondary via \family typewriter primary --force \family default , producing a split brain, and producing some (typically small) data loss. However, integrity is more important than actuality in such an extreme case.
\end_layout \begin_layout Enumerate deconstruction of the resource at \emph on all \emph default replicas via \family typewriter leave-resource --force \family default , running \family typewriter fsck \family default or similar tools by hand at the underlying disks, selecting the best replica out of them, and finally re-constructing the resource again. \end_layout \begin_layout Enumerate restore your backup. \end_layout \end_deeper \end_deeper \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter NoAttach \family default (cf \family typewriter %is-attach{} \family default ) The underlying disk is currently not opened by MARS. Reasons may be that the kernel module is not loaded, or an exclusive \family typewriter open() \family default is currently not possible because somebody else has already opened it. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter InConsistent \family default (cf \family typewriter %is-consistent{} \family default ) A logfile replay and/or sync is known to be needed, or still has to complete (e.g. after \family typewriter invalidate \family default has started), in order to restore local consistency (for details, look at \family typewriter flags \family default ). \begin_inset Newline newline \end_inset \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset Hint: in the current implementation of MARS, this will never happen on secondaries during ordinary replay (but only when either sync has not yet finished, or when the \emph on initial \emph default logfile replay after the sync has not yet finished), because the ordinary logfile replay always maintains anytime consistency once a consistent state has been reached.
\begin_inset Newline newline \end_inset \begin_inset Graphics filename images/MatieresToxiques.png lyxscale 50 scale 17 \end_inset \emph on Only \emph default in case of a primary node crash, and \emph on only \emph default after attempts have failed to become primary again (e.g. IO errors, etc), this \emph on can \emph default (but need not) mean that something went wrong. Even in such an extremely unlikely event, chances are high that \family typewriter fsck \family default can fix any remaining problems (and, of course, you can also switch over to a former secondary). \begin_inset Newline newline \end_inset \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset When this message appears, simply start MARS again (e.g. \family typewriter modprobe mars; marsadm up all \family default ), in whatever role you are intending. This will \emph on automatically \emph default try to replay any necessary transaction logfile(s) in order to fix the inconsistency. Only if the automatic fix fails and this message persists for a long time without progress, you \emph on might \emph default have a problem. As observed at a large installation at 1&1, this happens extremely rarely, and then typically indicates that your hardware is likely to be defective. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter OutDated[FR] \family default (cf \family typewriter %work-reached{} \family default ) Only at secondaries. Tells whether it is \emph on currently known \emph default that the disk lags behind when compared to the \emph on currently known \emph default state of the current designated primary (if there exists one). Only meaningful if a current designated primary exists. Notice that this kind of status display is subject to \emph on natural races \emph default , for example when new logfile data has been produced in parallel, or network propagation is very slow.
Additional information is in brackets: \end_layout \begin_deeper \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter [F] \family default Fetch is known to be needed. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter [R] \family default Replay is known to be needed. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter [FR] \family default Both are known to be needed. \end_layout \end_deeper \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter WriteBack \family default (cf \family typewriter %is-primary{} \family default ) Appears only at actual primaries (whether designated or not), when the writeback from the RAM buffer is active (see section \begin_inset CommandInset ref LatexCommand ref reference "sec:The-Transaction-Logger" \end_inset ) \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter Recovery \family default (cf \family typewriter %todo-primary{} \family default ) Appears only at the designated primary before it actually has become primary. Similar to database recovery, this indicates the recovery phase after a crash \begin_inset Foot status open \begin_layout Plain Layout In some cases, \family typewriter primary --force \family default may also trigger this message. \end_layout \end_inset . \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter EmergencyMode \family default (cf \family typewriter %is-emergency{} \family default ) A current designated primary exists, and it is known that this host has entered emergency mode. See section \begin_inset CommandInset ref LatexCommand ref reference "sub:Emergency-Mode" \end_inset . \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter UpToDate \family default Displayed when none of the above has been detected. 
\end_layout \end_deeper \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter diskstate-1and1 \family default A variant for internal use by 1&1 Internet AG. See above note. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter replstate \family default Shows the status of the replication in the following order of precedence: \end_layout \begin_deeper \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter ModuleNotLoaded \family default (cf \family typewriter %is-module-loaded{} \family default ) No kernel module is loaded, and as a consequence \family typewriter /proc/sys/mars/ \family default does not exist. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter UnResponsive \family default (cf \family typewriter %is-alive{%{host}} \family default ) The main thread \family typewriter mars_light \family default did not do any noticeable work for more than \family typewriter %{window} \family default (default 30) seconds. Notice that this may happen when deleting \emph on extremely \emph default large logfiles (up to hundreds of gigabytes or terabytes). If this happens for a \emph on very \emph default long time, you should check whether you might need a reboot in order to fix the hang. The time window may be changed by \family typewriter --window=$seconds \family default . \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter NotJoined \family default (cf \family typewriter %get-disk{} \family default ) No underlying disk device is configured for this resource. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter NotStarted \family default (cf \family typewriter %todo-attach{} \family default ) Replication has not been started.
\end_layout \begin_layout Itemize When the current host is designated as a primary, the rest of the precedence list looks as follows: \end_layout \begin_deeper \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter EmergencyMode \family default (cf. \family typewriter %is-emergency{} \family default ) See section \begin_inset CommandInset ref LatexCommand ref reference "sub:Emergency-Mode" \end_inset . \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter Replicating \family default (cf. \family typewriter %is-primary{} \family default ) Primary mode has been entered. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter NotYetPrimary \family default (catchall) This means the current host \emph on should \emph default act as a primary (see \family typewriter marsadm primary \family default or \family typewriter marsadm primary --force \family default ), but currently doesn't (yet). This happens during logfile replay, before primary mode is actually entered. Notice that replay of very big logfiles may take a long time. \end_layout \end_deeper \begin_layout Itemize When the current host is \emph on not \emph default designated as a primary: \end_layout \begin_deeper \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter PausedSync \family default (cf. \family typewriter %sync-rest{} \family default and \family typewriter %todo-sync{} \family default ) Some data needs to be synced, but sync is currently switched off. See \family typewriter marsadm {pause,resume}-sync \family default . \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter Syncing \family default (cf. \family typewriter %is-sync{} \family default ) Sync is currently running. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter PausedFetch \family default (cf. \family typewriter %todo{fetch} \family default ) Fetch is currently switched off. 
See \family typewriter marsadm {pause,resume}-fetch \family default . \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter PausedReplay \family default (cf. \family typewriter %todo{replay} \family default ) Replay is currently switched off. See \family typewriter marsadm {pause,resume}-replay \family default . \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter NoPrimaryDesignated \family default (cf. \family typewriter %get-primary{} \family default ) A \family typewriter secondary \family default command has been given somewhere in the cluster. Thus no designated primary exists. All resource members are in state \family typewriter Secondary \family default or try to approach it. Sync and other operations are not possible. This state is therefore not recommended. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter PrimaryUnreachable \family default (cf. \family typewriter %is-alive{} \family default ) A current designated primary has been set, but this host has not been remotely updated for more than 30 seconds (see also \family typewriter --window=$seconds \family default ). \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter Replaying \family default (catchall) None of the previous conditions have triggered. \end_layout \end_deeper \end_deeper \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter replstate-1and1 \family default A variant for internal use by 1&1 Internet AG. See above note. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter flags \family default For each of disk, consistency, attach, sync, fetch, and replay, show exactly one character. Each character is either a capital one, or the corresponding lowercase one, or a dash. 
The meaning is as follows: \end_layout \begin_deeper \begin_layout Labeling \labelwidthstring 00.00.0000 disk/device: \family typewriter D \family default = the device \family typewriter /dev/mars/mydata \family default is present, \family typewriter d \family default = only the underlying disk \family typewriter /dev/lv-x/mydata \family default is present, \family typewriter - \family default = none present / configured. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 consistency: this relates to the \emph on underlying disk \emph default , not to \family typewriter /dev/mars/mydata \family default ! \family typewriter C \family default = locally consistent, \family typewriter c \family default = maybe inconsistent (no guarantee), \family typewriter - \family default = cannot determine. Notice: this does not tell anything about \emph on actuality \emph default . Notice: like the other flags, this flag is subject to races and therefore should be relied on only in \emph on detached \emph default state! See also the description of macro \family typewriter is-consistent \family default below. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 attach: \family typewriter A \family default = attached, \family typewriter a \family default = currently trying to attach/detach but not yet ready (intermediate state), \family typewriter - \family default = attach is switched off. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 sync: \family typewriter S \family default = sync finished, \family typewriter s \family default = currently syncing, \family typewriter - \family default = sync is switched off. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 fetch: \family typewriter F \family default = according to current knowledge, fetched logfiles are up-to-date, \family typewriter f \family default = currently fetching (some parts of) a logfile, \family typewriter - \family default = fetch is switched off.
\end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 replay: \family typewriter R \family default = all fetched logfiles are replayed, \family typewriter r \family default = currently replaying, \family typewriter - \family default = replay is switched off. \end_layout \end_deeper \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter flags-1and1 \family default A variant for internal use by 1&1 Internet AG. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter todo-role \family default Shows the \emph on designated \emph default state: \family typewriter None \family default , \family typewriter Primary \family default or \family typewriter Secondary \family default . \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter role \family default Shows the \emph on actual \emph default state: \family typewriter None \family default , \family typewriter NotYetPrimary \family default , \family typewriter Primary \family default , \family typewriter RemainsPrimary \family default , or \family typewriter Secondary \family default . Any differences to the designated state are indicated by a prefix to the keyword \family typewriter Primary \family default : \family typewriter NotYet \family default means that it \emph on should \emph default become primary, but actually hasn't. Vice versa, \family typewriter Remains \family default means that it \emph on should \emph default leave primary state in order to become secondary, but actually cannot do that because the \family typewriter /dev/mars/mydata \family default device is currently in use . 
\begin_inset Newline newline \end_inset \begin_inset Tabular \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter %todo-primary{} == 0 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter %todo-primary{} == 1 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter %is-primary{} == 0 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter None \family default / \family typewriter Secondary \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter NotYetPrimary \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter %is-primary{} == 1 \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter RemainsPrimary \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter Primary \end_layout \end_inset \end_inset \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter role-1and1 \family default A variant for internal use by 1&1 Internet AG. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter primarynode \family default Display \family typewriter (none) \family default or the hostname of the designated primary. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter primarynode-1and1 \family default A variant for internal use by 1&1 Internet AG. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter commstate \family default When the last metadata communication to the designated primary is longer ago than \family typewriter ${window} \family default (see also \family typewriter --window= \emph on seconds \family default \emph default option), display that age in human readable form. See also primitive macro \family typewriter %alive-age{} \family default . 
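\end_layout

\begin_layout Standard
\noindent
The
\family typewriter
flags
\family default
 and
\family typewriter
role
\family default
 descriptions above can be turned into a small decoder for monitoring scripts.
 The following Python code is only a minimal sketch and not part of
\family typewriter
marsadm
\family default
: the helper names
\family typewriter
decode_flags
\family default
 and
\family typewriter
role_name
\family default
 are hypothetical, and it assumes that the six flag characters arrive as one string in the order disk, consistency, attach, sync, fetch, replay (for example
\family typewriter
DCASFR
\family default
).
\begin_inset listings
lstparams "language=Python"
inline false
status open

\begin_layout Plain Layout
# Hypothetical helpers for illustration only, not part of marsadm.
\end_layout

\begin_layout Plain Layout
FLAG_FIELDS = ("disk", "consistency", "attach", "sync", "fetch", "replay")
\end_layout

\begin_layout Plain Layout
FLAG_LETTERS = "DCASFR"  # assumed order of the six flag characters
\end_layout

\begin_layout Plain Layout

\end_layout

\begin_layout Plain Layout
def decode_flags(flags):
\end_layout

\begin_layout Plain Layout
    """Map each flag field to 'upper', 'lower' or 'dash'."""
\end_layout

\begin_layout Plain Layout
    if len(flags) != len(FLAG_FIELDS):
\end_layout

\begin_layout Plain Layout
        raise ValueError("expected exactly 6 flag characters")
\end_layout

\begin_layout Plain Layout
    result = {}
\end_layout

\begin_layout Plain Layout
    for field, letter, char in zip(FLAG_FIELDS, FLAG_LETTERS, flags):
\end_layout

\begin_layout Plain Layout
        if char == letter:
\end_layout

\begin_layout Plain Layout
            result[field] = "upper"
\end_layout

\begin_layout Plain Layout
        elif char == letter.lower():
\end_layout

\begin_layout Plain Layout
            result[field] = "lower"
\end_layout

\begin_layout Plain Layout
        elif char == "-":
\end_layout

\begin_layout Plain Layout
            result[field] = "dash"
\end_layout

\begin_layout Plain Layout
        else:
\end_layout

\begin_layout Plain Layout
            raise ValueError("unexpected character " + repr(char))
\end_layout

\begin_layout Plain Layout
    return result
\end_layout

\begin_layout Plain Layout

\end_layout

\begin_layout Plain Layout
def role_name(todo_primary, is_primary):
\end_layout

\begin_layout Plain Layout
    """Combine the 0/1 values of todo-primary and is-primary as in the role table."""
\end_layout

\begin_layout Plain Layout
    if is_primary:
\end_layout

\begin_layout Plain Layout
        return "Primary" if todo_primary else "RemainsPrimary"
\end_layout

\begin_layout Plain Layout
    return "NotYetPrimary" if todo_primary else "None / Secondary"
\end_layout

\end_inset

Keeping the result as a per-field dictionary avoids the information loss mentioned for
\family typewriter
diskstate
\family default
, because every flag stays individually visible.
 The sketch does not distinguish
\family typewriter
None
\family default
 from
\family typewriter
Secondary
\family default
, since the table above does not say how these two are told apart.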
\end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter syncinfo \family default Shows an informational progress bar when sync is running. Intended for humans. Scripts should not rely on any details from this. Scripts may use this only as an \emph on approximate \emph default means for detecting progress (when comparing the \emph on full \emph default output text to a prior version and finding \emph on any \emph default difference, they may conclude that some progress has happened, however small). \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter syncinfo-1and1 \family default A variant for internal use by 1&1 Internet AG. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter replinfo \family default Shows an informational progress bar when fetch is running. This should not be used for scripting at all, because it contains realtime information in human-readable form. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter replinfo-1and1 \family default A variant for internal use by 1&1 Internet AG. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter fetch-line \family default Additional details, called by \family typewriter replinfo \family default . Shows the amount of data to be fetched, as well as the current transfer rate and a very rough estimate of the remaining duration. When primitive macros \family typewriter %fetch-age{} \family default or \family typewriter %fetch-lag{} \family default exceed \family typewriter ${window} \family default , their values are also displayed for human informational purposes. See the description of these primitive macros. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter replay-line \family default Additional details, called by \family typewriter replinfo \family default .
Shows the amount of data to be replayed, as well as the current replay rate and a very rough estimation of the future duration. When primitive macro \family typewriter %replay-age{} \family default exceeds \family typewriter ${window} \family default , it is also displayed for human informational purposes. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter comminfo \family default When the network communication is in an unusual condition, display it. Otherwise, don't produce any output. \end_layout \begin_layout Subsection Predefined Primitive Macros \begin_inset CommandInset label LatexCommand label name "sub:Predefined-Trivial-Macros" \end_inset \end_layout \begin_layout Subsubsection Intended for Humans \end_layout \begin_layout Standard In the following, shell glob notation \family typewriter {a,b} \family default is used to document similar variants of similar macros in a single place. When you actually call the macro, you must choose one of the possible variants (excluding the braces). \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter the-err-msg \family default Show reported errors for a resource. When the resource argument is missing or empty, show global error information. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter all-err-msg \family default Like before, but show all information including those which are \family typewriter OK \family default . This way, you get a list \begin_inset Foot status open \begin_layout Plain Layout The list may be extended in future versions of MARS. \end_layout \end_inset of \emph on all \emph default potential error information present in the system. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter {all,the}-wrn-msg \family default Show all / reported warnings in the system. 
\end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter {all,the}-inf-msg \family default Show all / reported informational messages in the system. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter {all,the}-msg \family default Show all / reported messages regardless of their classification. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter {all,the}-global-msg \family default Show global messages not associated with any resource (the resource argument of the \family typewriter marsadm \family default command is ignored in this case). \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter {all,the}-global-{inf,wrn,err}-msg \family default Ditto, but more specific. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter {all,the}-pretty-{global-,}{inf-,wrn-,err-,}msg \family default Ditto, but show numerical timestamps in a human readable form. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter {all,the}-{global-,}{inf-,wrn-,err-,}count \family default Instead of showing the messages, show their count (number of lines). \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter errno-text \family default This macro takes 1 argument, which must represent a Linux \family typewriter errno \family default number, and converts it to human readable form (similar to the C \family typewriter strerror() \family default function). \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter todo-{attach,sync,fetch,replay,primary} \family default Shows a boolean value (0 or 1) indicating the current state of the corresponding todo switch (whether on or off). The meaning of todo switches is illustrated in section \begin_inset CommandInset ref LatexCommand ref reference "sec:The-State-of" \end_inset .
\end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter get-resource-{fat,err,wrn} \family default Access to the internal error status files. This is not an official interface and may thus change at any time without notice. Use this only for human inspection, not for scripting! \begin_inset Newline newline \end_inset \begin_inset Graphics filename images/MatieresCorrosives.png lyxscale 50 scale 17 \end_inset These macros, as well as the error status files, are likely to disappear in future versions of MARS. They should be used for debugging only. At least when merging into the upstream Linux kernel, only the \family typewriter *-msg \family default macros will likely survive. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter get-resource-{fat,err,wrn}-count \family default Ditto, but get the number of lines instead of the text. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter replay-code \family default Indicate the current state of logfile replay / recovery: \end_layout \begin_deeper \begin_layout Labeling \labelwidthstring 00.00.0000 (empty) Unknown. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 0 No replay is currently running. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 1 Replay is currently running. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 2 Replay has successfully stopped. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 <0 See Linux \family typewriter errno \family default code. Typically this indicates a damaged logfile, or another filesystem error at \family typewriter /mars \family default .
\end_layout \end_deeper \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter is-{attach,sync,fetch,replay,primary,module-loaded} \family default Shows a boolean value (0 or 1) indicating the \emph on actual \emph default state, whether the corresponding action has been actually carried out, or not (yet). Notice that the values indicated by \family typewriter is-* \family default may differ from the \family typewriter todo-* \family default values when something is not (yet) working. More explanations can be found in section \begin_inset CommandInset ref LatexCommand ref reference "sec:The-State-of" \end_inset . \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter is-split-brain \family default Shows whether split brain (see section \begin_inset CommandInset ref LatexCommand ref reference "sub:Split-Brain-Resolution" \end_inset ) has been detected, or not. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter is-consistent \family default Shows whether the \emph on underlying disk \emph default is in a locally consistent state, i.e. whether it \emph on could \emph default be (potentially) detached and then used for read-only test-mounting \begin_inset Foot status open \begin_layout Plain Layout Notice that the \emph on writeback \emph default at the primary side is out-of-order by default, for performance reasons. Therefore, the underlying disk is only guaranteed to be consistent when there is no data left to be written back. Notice that this condition is racy by construction. When your primary node crashes during writeback and then comes up again, you must do a \family typewriter modprobe mars \family default first in order to automatically replay the transaction logfiles, which will automatically heal such temporary inconsistencies. \end_layout \end_inset . 
Don't confuse this with the consistency of \family typewriter /dev/mars/mydata \family default , which is by construction \emph on always \emph default locally consistent once it has appeared \begin_inset Foot status open \begin_layout Plain Layout Exceptions are possible when using \family typewriter marsadm fake-sync \family default . Even in split brain situations, \family typewriter marsadm primary --force \family default tries to prevent any further potential exception as best as it can, by not letting \family typewriter /dev/mars/mydata \family default appear and by insisting on split brain resolution first. In future implementations, this might change if more pressure is put on the developer to sacrifice consistency in preference to not waiting for a full logfile replay. \end_layout \end_inset . By construction of MARS, the disk of secondaries will \emph on always \emph default remain in a locally consistent state once the initial sync has finished as well as the initial logfile replay. Notice that local consistency does not necessarily imply actuality (see high-level explanation in section \begin_inset CommandInset ref LatexCommand ref reference "sub:Behaviour-of-MARS" \end_inset ). \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter is-emergency \family default Shows whether emergency mode (see section \begin_inset CommandInset ref LatexCommand ref reference "sub:Emergency-Mode" \end_inset ) has been entered for the named resource, or not. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter rest-space \family default (global, no resource argument necessary) Shows the \emph on logically \emph default available space in \family typewriter /mars/ \family default , which may deviate from the physically available space as indicated by the \family typewriter df \family default command.
\end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter get-{disk,device} \family default Show the name of the underlying disk, or of the \family typewriter /dev/mars/mydata \family default device (if it is available). \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter {disk,device}-present \family default Show (as a boolean value) whether the underlying disk, or the \family typewriter /dev/mars/mydata \family default device, is available. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter device-opened \family default Show (as a number) how often \family typewriter /dev/mars/mydata \family default has been actually opened, e.g. by \family typewriter mount \family default or by some processes like \family typewriter dd \family default , or by iSCSI, etc. \end_layout \begin_layout Subsubsection Intended for Scripting \end_layout \begin_layout Standard While complex macros may output a whole bunch of information, the following primitive macros are outputting exactly one value. They are intended for script use (cf. section \begin_inset CommandInset ref LatexCommand ref reference "sec:Scripting-HOWTO" \end_inset ). Of course, curious humans may also try them :) \end_layout \begin_layout Standard In the following, shell glob notation \family typewriter {a,b} \family default is used to document similar variants of similar macros in a single place. When you actually call the macro, you must choose one of the possible variants (excluding the braces). \end_layout \begin_layout Paragraph Name Querying \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter cluster-members \family default Show a newline-separated list of all host names participating in the cluster.
\end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter resource-members \family default Show a newline-separated list of all host names participating in the particular resource \family typewriter %{res} \family default . Notice that this may be a subset of \family typewriter %cluster-members{} \family default . \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter {my,all}-resources \family default Show a newline-separated list of either all resource names existing in the cluster, or only those where the current host \family typewriter %{host} \family default is member. Optionally, you may specify the hostname as a parameter, e.g. \family typewriter %my-resources{ \emph on otherhost \emph default } \family default . \end_layout \begin_layout Paragraph Amounts of Data Inquiry \end_layout \begin_layout Standard \begin_inset Float figure placement h wide false sideways false status open \begin_layout Plain Layout \noindent \align center \begin_inset Graphics filename images/fetch-replay-total.fig width 80col% \end_inset \end_layout \begin_layout Plain Layout \begin_inset Caption Standard \begin_layout Plain Layout overview on amounts / cursors \begin_inset CommandInset label LatexCommand label name "fig:overview-on-amounts" \end_inset \end_layout \end_inset \end_layout \end_inset \end_layout \begin_layout Standard \noindent The following macros are meaningful for both primary and secondary nodes: \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter deletable-size \family default Show the total amount of \emph on locally present \emph default logfile data which \emph on could \emph default be deleted by \family typewriter marsadm log-delete-all mydata \family default . This differs almost always from both \family typewriter replay-pos \family default and \family typewriter occupied-size \family default due to granularity reasons (only whole logfiles can be deleted). 
Units are \emph on bytes \emph default , not kilobytes. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter occupied-size \family default Show the total amount of \emph on locally present \emph default logfile data (sum of all file sizes). This is often roughly approximate to \family typewriter fetch-pos \family default , but it may differ vastly (in both directions) when logfiles are not completely transferred, when some are damaged, during split brain, after a \family typewriter join-resource \family default / \family typewriter invalidate \family default , or when the resource is in emergency mode (see section \begin_inset CommandInset ref LatexCommand ref reference "sub:Emergency-Mode" \end_inset ). \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter disk-size \family default Show the size of the underlying local disk in bytes. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter resource-size \family default Show the logical size of the resource in bytes. When this value is lower than \family typewriter disk-size \family default , you are wasting space. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter device-size \family default At a primary node, this may differ from \family typewriter resource-size \family default only for a very short time during the \family typewriter resize \family default operation. At secondaries, there will be no difference. \end_layout \begin_layout Standard \noindent The following macros are only meaningful for secondary nodes. By information theoretic limits, they can only tell what is \emph on locally known \emph default . 
They \series bold cannot \series default reflect the \begin_inset Quotes eld \end_inset true (global) state \begin_inset Foot status open \begin_layout Plain Layout Notice that according to Einstein's law, and according to observations by Lamport, the concept of \begin_inset Quotes eld \end_inset true state \begin_inset Quotes erd \end_inset does not exist at all in a distributed system. Anything you can know in a distributed system is always local knowledge, which races with other (remote) knowledge, and may be outdated at \emph on any \emph default time. \end_layout \end_inset \begin_inset Quotes erd \end_inset of a cluster, in particular during network partitions. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter {sync,fetch,replay,work}-size \family default Show the total amount of data which is / was to be processed by either sync, fetch, or replay. \family typewriter work-size \family default is equivalent to \family typewriter fetch-size \family default . \family typewriter replay-size \family default is equivalent to \family typewriter fetch-pos \family default (see below). Units are \emph on bytes \emph default , not kilobytes. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter {sync,fetch,replay,work}-pos \family default Show the total amount of data which is already processed (current \begin_inset Quotes eld \end_inset cursor \begin_inset Quotes erd \end_inset position). \family typewriter work-pos \family default is equivalent to \family typewriter replay-pos \family default .
\end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset The 0% point is the \emph on locally contiguous \emph default amount of data since the last \family typewriter create-resource \family default , \family typewriter join-resource \family default , or \family typewriter invalidate \family default , or since the last emergency mode, but possibly shortened by \family typewriter log-delete \family default s. Notice that the 0% point may be different on different cluster nodes, because their resource history may be different or non-contiguous during split brain, or after a \family typewriter join-resource \family default , or after \family typewriter invalidate \family default , or during / after emergency mode. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter {sync,fetch,replay,work}-rest \family default Shows the difference between \family typewriter *-size \family default and \family typewriter *-pos \family default (amount of work to do). \family typewriter work-rest \family default is therefore the difference between \family typewriter fetch-size \family default and \family typewriter replay-pos \family default , which is the \emph on total \emph default amount of work to do (regardless whether to be fetched and/or to be replayed). \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter {sync,fetch,replay,work}-reached \family default Boolean value indicating whether \family typewriter *-rest \family default dropped down to zero \begin_inset Foot status open \begin_layout Plain Layout Recall from chapter \begin_inset CommandInset ref LatexCommand ref reference "chap:Use-Cases-for" \end_inset that MARS (in its current stage of development) does only guarantee local consistency, but cannot guarantee actuality in all imaginable situations. 
Notice that a general notion of \begin_inset Quotes eld \end_inset actuality \begin_inset Quotes erd \end_inset is \emph on undefinable \emph default in a widely distributed system at all, according to Einstein's laws. \end_layout \begin_layout Plain Layout Let's look at an example. In case of a node crash, and after the node is up again, a \family typewriter modprobe mars \family default has to occur, in order to replay the transaction logs of MARS again. However, at the recovery phase before, the journalling \family typewriter ext4 \family default filesystem \family typewriter /mars/ \family default \emph on may \emph default have rolled back some internal symlink updates which have occurred immediately before the crash. MARS is relying on the fact that journalling filesystems like \family typewriter ext4 \family default should do their recovery in a consistent way, possibly by sacrificing actuality a little bit. Therefore, the above macros cannot guarantee to deliver true information about what is persisted at the moment. \end_layout \begin_layout Plain Layout Notice that there are further potential caveats. \end_layout \begin_layout Plain Layout In case of \family typewriter {sync,fetch}-reached \family default , MARS uses \family typewriter bio \family default callbacks resp. \family typewriter fdatasync() \family default by default, thus the underlying storage layer has \emph on told \emph default us that it \emph on believes \emph default it has committed the data in a reboot-safe way. Whether this is \emph on really \emph default true does not depend on MARS, but on the lower layers of the storage hierarchy. There exists hardware where this claim is known to be wrong under certain circumstances, such as certain hard disk drives in certain modes of operation. Please check the hardware for any violations of storage semantics under certain circumstances such as power loss, and check information sources like magazines about the problem area.
Please notice that such a problem, if it exists at all, is independent from MARS. It would also exist if you didn't use MARS on the same system. \end_layout \end_inset . \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter {fetch,replay,work}-threshold-reached \family default Boolean value indicating whether \family typewriter *-rest \family default dropped down to \family typewriter %{threshold} \family default , which is pre-settable by the \family typewriter --threshold= \emph on size \family default \emph default command line option (default is 10 MiB). In asynchronous use cases of MARS, this should be preferred over \family typewriter *-reached \family default for \emph on human display \emph default , because it produces less flickering by the inevitable replication delay. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter {fetch,replay,work}-almost-reached \family default Boolean value indicating whether \family typewriter *-rest \family default \emph on almost \emph default / \emph on approximately \emph default dropped down to zero. The default is that at least 990 permille are reached. In asynchronous use cases of MARS, this can be preferred over \family typewriter *-reached \family default for \emph on human display \emph default only, because it produces less flickering by the inevitable replication delay. However, don't base any decisions on this! \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter {sync,fetch,replay,work}-percent \family default The cursor position \family typewriter *-pos \family default as a percentage of \family typewriter *-size \family default . \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter {sync,fetch,replay,work}-permille \family default The cursor position \family typewriter *-pos \family default as permille of \family typewriter *-size \family default .
\end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter {sync,fetch,replay,work}-rate \family default Show the current throughput in bytes \begin_inset Foot status open \begin_layout Plain Layout Notice that the internal granularity reported by the kernel may be coarser, such as KiB. This interface abstracts away from kernel internals and thus presents everything in byte units. \end_layout \end_inset per second. \family typewriter work-rate \family default is the \emph on maximum \emph default of \family typewriter fetch-rate \family default and \family typewriter replay-rate \family default . \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter {sync,fetch,replay,work}-remain \family default Show the \emph on estimated \emph default remaining time for completion of the respective operation. This is just a very rough guess. Units are seconds. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter summary-vector \family default Show the colon-separated CSV value \family typewriter %replay-pos{}:%fetch-pos{}:%fetch-size{} \family default . \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter replay-basenr \family default Get the currently first reachable logfile number (see figure \begin_inset CommandInset ref LatexCommand vref reference "fig:overview-on-amounts" \end_inset ). Only for curious humans or for debugging / monitoring - don't base any decisions on this. Use the \family typewriter *-{pos,size} \family default macros instead. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter {replay,fetch,work}-lognr \family default Get current logfile number of replay or fetch position, or of the currently known last reachable number (see figure \begin_inset CommandInset ref LatexCommand vref reference "fig:overview-on-amounts" \end_inset ). Only for curious humans or for debugging / monitoring - don't base any decisions on this.
Use the \family typewriter *-{pos,size} \family default macros instead. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter {replay,fetch,work}-logcount \family default Get current number of logfiles which are already replayed, or are already fetched, or are to be applied in total (see figure \begin_inset CommandInset ref LatexCommand vref reference "fig:overview-on-amounts" \end_inset ). Only for curious humans or for debugging / monitoring - don't base any decisions on this. Use the \family typewriter *-{rest} \family default macros instead. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter alive-timestamp \family default Tell the Lamport Unix timestamp (seconds since 1970) of the last metadata communication to the designated primary (or to any other host given by the first argument). Returns \begin_inset Formula $-1$ \end_inset if no such host exists. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter {fetch,replay,work}-timestamp \family default Tell the Lamport Unix timestamp (seconds since 1970) when the last progress has been made. When no such action exists, \begin_inset Formula $-1$ \end_inset is returned. \family typewriter %work-timestamp{ \emph on hostname \emph default } \family default is the maximum of \family typewriter %fetch-timestamp{ \emph on hostname \emph default } \family default and \family typewriter %replay-timestamp{ \emph on hostname \emph default } \family default . When the parameter \family typewriter \emph on hostname \family default \emph default is empty, the local host will be reported (default). 
Example usage: \family typewriter marsadm view all --macro= \begin_inset Quotes erd \end_inset %replay-timestamp{%todo-primary{}} \begin_inset Quotes erd \end_inset \family default shows the timestamp of the last reported \begin_inset Foot status open \begin_layout Plain Layout Updates of this information are occurring with lower frequency than actual writebacks, for performance reasons. The metadata network update protocol will add further delays. Therefore, the accuracy is only in the range of minutes. \end_layout \end_inset writeback action at the designated primary. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter {alive,fetch,replay,work}-age \family default Tell the number of seconds since the last respective action, or \begin_inset Formula $-1$ \end_inset if none exists. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter {alive,fetch,replay,work}-lag \family default Report the time difference (in seconds) between the last \emph on known \emph default action at the local host and at the designated primary (or between any other hosts when 2 parameters are given). Returns \begin_inset Formula $-1$ \end_inset if no such action exists at any of the two hosts. Attention! This need not reflect the \emph on actual \emph default state in case of networking problems. Don't draw wrong conclusions from a high \family typewriter {fetch,replay}-lag \family default value: it could also mean that simply no write operation at all has occurred at the primary side for a long time. Conversely, a low lag value does not imply that the replication is recent: it may refer to \emph on different \emph default write operations at each of the hosts; therefore it only tells that \emph on some \emph default progress has been made, but says nothing about the amount of the progress. 
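\end_layout \begin_layout Standard \noindent The relationships between the amount macros described above can be sketched in a few lines of Python. This is only an illustration of the documented arithmetic, not code taken from \family typewriter marsadm \family default . The names \family typewriter Amounts \family default and \family typewriter parse_summary_vector \family default are hypothetical, the input string is assumed to be the output of the \family typewriter summary-vector \family default macro obtained beforehand, and the use of \begin_inset Quotes eld \end_inset less than or equal \begin_inset Quotes erd \end_inset for the threshold comparison is an assumption. \end_layout \begin_layout Standard

```python
from dataclasses import dataclass

# Documented default of the --threshold=size option (10 MiB).
DEFAULT_THRESHOLD = 10 * 1024 * 1024


@dataclass(frozen=True)
class Amounts:
    """Hypothetical container for the three values of summary-vector (bytes)."""
    replay_pos: int
    fetch_pos: int
    fetch_size: int

    @property
    def fetch_rest(self) -> int:
        # *-rest is the difference between *-size and *-pos.
        return self.fetch_size - self.fetch_pos

    @property
    def replay_rest(self) -> int:
        # replay-size is documented as equivalent to fetch-pos.
        return self.fetch_pos - self.replay_pos

    @property
    def work_rest(self) -> int:
        # work-size == fetch-size and work-pos == replay-pos.
        return self.fetch_size - self.replay_pos

    def work_threshold_reached(self, threshold: int = DEFAULT_THRESHOLD) -> bool:
        # Assumption: "dropped down to the threshold" is read as <=.
        return self.work_rest <= threshold


def parse_summary_vector(text: str) -> Amounts:
    """Split 'replay-pos:fetch-pos:fetch-size' into integers."""
    replay_pos, fetch_pos, fetch_size = (int(x) for x in text.strip().split(":"))
    return Amounts(replay_pos, fetch_pos, fetch_size)


if __name__ == "__main__":
    amounts = parse_summary_vector("100:150:200")
    print(amounts.fetch_rest, amounts.replay_rest, amounts.work_rest)
```

Such a helper only restates what is locally known at the moment the macro was evaluated, so all caveats given above about local knowledge apply to the derived values as well.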
\end_layout \begin_layout Paragraph Misc Informational Status \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter get-primary \family default Return the name of the current designated primary node as locally known. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter actual-primary \family default (deprecated) Try to determine the name of the node which \emph on appears \emph default to be the actual primary. This is only a \series bold \emph on guess \series default \emph default , because it is not generally unique in split brain situations! Don't use this macro. Instead, use \family typewriter is-primary \family default on those nodes you are interested in. The explanations from section \begin_inset CommandInset ref LatexCommand ref reference "sec:The-State-of" \end_inset also apply to \family typewriter get-primary \family default versus \family typewriter actual-primary \family default analogously. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter is-alive \family default Boolean value indicating whether all other nodes participating in \family typewriter mydata \family default are reachable / healthy. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter uuid \family default (global) Show the unique identifier created by \family typewriter create-cluster \family default or by \family typewriter create-uuid \family default . Hint: this is immutable, and it is firmly bound to the \family typewriter /mars/ \family default filesystem. It can only be destroyed by deleting the whole filesystem (see section \begin_inset CommandInset ref LatexCommand ref reference "leave-cluster" \end_inset ). \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter tree \family default (global) Indicate symlink tree version (see section \begin_inset CommandInset ref LatexCommand ref reference "sec:The-Symlink-Tree" \end_inset ).
\end_layout \begin_layout Paragraph Experts Only \end_layout \begin_layout Standard The following is for hackers who know what they are doing. It is not officially supported. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter wait-{is,todo}-{attach,sync,fetch,replay,primary}-{on,off} \family default This may be used to program some useful waiting conditions in advanced macro scripts. Use at your own risk! \end_layout \begin_layout Section Creating your own Macros \begin_inset CommandInset label LatexCommand label name "sub:Creating-your-own" \end_inset \end_layout \begin_layout Standard In order to create your own macros, you could start writing them from scratch with your favorite ASCII text editor. However, it is much easier to take an existing macro and to customize it to your needs. In addition, you can learn something about macro programming by looking at the existing macro code. \end_layout \begin_layout Standard Go to a new empty directory and say \end_layout \begin_layout Itemize \family typewriter marsadm dump-macros \end_layout \begin_layout Standard in order to get the most interesting complex macros, or say \end_layout \begin_layout Itemize \family typewriter marsadm dump-all-macros \end_layout \begin_layout Standard in order to additionally get some primitive macros which could be customized if needed. This will write lots of files \family typewriter *.tpl \family default into your current working directory. \end_layout \begin_layout Standard Any modified or new macro file should be placed either into the current working directory \family typewriter ./ \family default , or into \family typewriter $HOME/.marsadm/ \family default , or into \family typewriter /etc/marsadm/ \family default . They will be searched in this order, and the first match will win. When no macro file is found, the built-in version will be used if it exists. This way, you may override built-in macros.
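\end_layout \begin_layout Standard The search order just described can be sketched as follows. This is a minimal Python illustration of the documented lookup rule, not the actual implementation inside \family typewriter marsadm \family default . The function names \family typewriter default_search_dirs \family default and \family typewriter find_macro_file \family default are hypothetical, and the mapping of a macro name to a file called \family typewriter \emph on name \emph default .tpl \family default is assumed from the \family typewriter *.tpl \family default files mentioned above. \end_layout \begin_layout Standard

```python
import os
from pathlib import Path
from typing import Iterable, List, Optional


def default_search_dirs() -> List[Path]:
    """The three documented locations, in documented order."""
    return [
        Path("."),                                    # current working directory
        Path(os.path.expanduser("~")) / ".marsadm",   # $HOME/.marsadm/
        Path("/etc/marsadm"),                         # /etc/marsadm/
    ]


def find_macro_file(name: str,
                    search_dirs: Optional[Iterable[Path]] = None) -> Optional[Path]:
    """Return the first existing '<name>.tpl', or None if no file exists.

    A result of None stands for "fall back to the built-in macro, if any".
    """
    dirs = default_search_dirs() if search_dirs is None else search_dirs
    for directory in dirs:
        candidate = Path(directory) / (name + ".tpl")
        if candidate.is_file():
            return candidate  # first match wins
    return None


if __name__ == "__main__":
    print(find_macro_file("mymacro"))
```

Because the first match wins, a file in the current working directory shadows a file of the same name in the other two locations.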
\end_layout \begin_layout Standard Example: if you have a file \family typewriter ./mymacro.tpl \family default you just need to say \family typewriter marsadm view-mymacro mydata \family default in order to invoke it in the resource context \family typewriter mydata \family default . \end_layout \begin_layout Subsection General Macro Syntax \end_layout \begin_layout Standard Macros are simple ASCII text, enriched with calls to other macros. \end_layout \begin_layout Standard ASCII text outside of comments is copied to the output verbatim. Comments are skipped. Comments may have one of the following well-known forms: \end_layout \begin_layout Itemize \family typewriter # skipped text until / including next newline character \end_layout \begin_layout Itemize \family typewriter // skipped text until / including next newline character \end_layout \begin_layout Itemize \family typewriter /* skipped text including any newline characters */ \end_layout \begin_layout Itemize denoted as Perl regex: \family typewriter \backslash \backslash \backslash n \backslash s* \family default (single backslash directly followed by a newline character, and eating up any whitespace characters at the beginning of the next line) Hint: this may be fruitfully used to structure macros in a more readable form / indentation. \end_layout \begin_layout Standard Special characters are always initiated by a backslash.
The following pre-defined special character sequences are recognized: \end_layout \begin_layout Itemize \family typewriter \backslash n \family default newline \end_layout \begin_layout Itemize \family typewriter \backslash r \family default return (useful for DOS compatibility) \end_layout \begin_layout Itemize \family typewriter \backslash t \family default tab \end_layout \begin_layout Itemize \family typewriter \backslash f \family default formfeed \end_layout \begin_layout Itemize \family typewriter \backslash b \family default backspace \end_layout \begin_layout Itemize \family typewriter \backslash a \family default alarm (bell) \end_layout \begin_layout Itemize \family typewriter \backslash e \family default escape (e.g. for generating ANSI escape sequences) \end_layout \begin_layout Itemize \family typewriter \backslash \family default followed by anything else: assure that the next character is taken verbatim. Although possible, please don't use this for escaping letters, because further escape sequences might be pre-defined in future. Best practice is to use this only for escaping the backslash itself, or for escaping the percent sign when you don't want to call a macro (protect against evaluation), or to escape a brace directly after a macro call (verbatim brace not to be interpreted as a macro parameter). \end_layout \begin_layout Itemize All other characters stand for themselves. If you like, you should be able to produce XML, HTML, JSON and other ASCII-based output formats this way. \end_layout \begin_layout Standard Macro calls have the following syntax: \end_layout \begin_layout Itemize \family typewriter % \emph on macroname \emph default { \emph on arg1 \emph default }{ \emph on arg2 \emph default }{ \emph on argn \emph default } \end_layout \begin_layout Itemize Of course, arguments may be empty, denoted as \family typewriter {} \end_layout \begin_layout Itemize It is possible to supply more arguments than required. These are simply ignored. 
\end_layout \begin_layout Itemize There must always be at least 1 argument, even for parameterless macros. In such a case, it is good style to leave it empty (even if it is actually ignored). Just write \family typewriter %parameterlessmacro{} \family default in such a case. \end_layout \begin_layout Itemize \family typewriter %{ \emph on varname \emph default } \family default syntax: As a special case, the macro name may be empty, but then the first argument must denote a previously defined variable (such as assigned via \family typewriter %let{varname}{myvalue} \family default , or a pre-defined standard variable like \family typewriter %{res} \family default for the current resource name, see later paragraph \begin_inset CommandInset ref LatexCommand ref reference "par:Predefined-Variables" \end_inset ). \end_layout \begin_layout Itemize Of course, parameter calls may be (almost) arbitrarily nested. \end_layout \begin_layout Itemize Of course, the \emph on correctness \emph default of nesting of braces must be generally obeyed, as usual in any other macro processor language. General rule: for each opening brace, there must be exactly one closing brace somewhere afterwards. \end_layout \begin_layout Standard These rules are hopefully simple and intuitive. There are currently no exceptions. In particular, there is no special infix operator syntax for arithmetic expressions, and therefore no operator precedence rules are necessary. You have to write nested arithmetic expressions always in the above prefix syntax, like \family typewriter %*{7}{%+{2}{3}} \family default (similar to non-inverse Polish notation). \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset When deeply nesting macros and their braces, you may easily feel like you are back in the good old days of Lisp. Use the above backslash-newline syntax to indent your macros in a readable and structured way. 
Fortunately, modern text editors like (x)emacs or vim have modes for dealing with the correctness of nested braces. \end_layout \begin_layout Subsection Calling Builtin / Primitive Macros \end_layout \begin_layout Standard Primitive macros can be called in two alternate forms: \end_layout \begin_layout Itemize \family typewriter %primitive- \emph on macroname \emph default { \emph on something \emph default } \end_layout \begin_layout Itemize \family typewriter % \emph on macroname \emph default { \emph on something \emph default } \end_layout \begin_layout Standard When using the \family typewriter %primitive-*{} \family default form, you \emph on explicitly disallow \emph default interception of the call by a \family typewriter *.tpl \family default file. Otherwise, you may override the standard definition even of primitive macros by your own template files. \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/MatieresCorrosives.png lyxscale 50 scale 17 \end_inset Notice that \family typewriter %call{} \family default conventions are used in such a case. The parameters are passed via \family typewriter %{0} \family default \begin_inset Formula $\ldots$ \end_inset \family typewriter %{n} \family default variables (see description below). \end_layout \begin_layout Paragraph Standard MARS State Inspection Macros \end_layout \begin_layout Standard These are already described in section \begin_inset CommandInset ref LatexCommand ref reference "sub:Predefined-Trivial-Macros" \end_inset . When calling one of them, the call will simply expand to the corresponding value. \end_layout \begin_layout Standard Example: \family typewriter %get-primary{} \family default will expand to the hostname of the current designated primary node. 
\end_layout \begin_layout Paragraph Further MARS State Inspection Macros \end_layout \begin_layout Paragraph Variable Access Macros \end_layout \begin_layout Itemize \family typewriter %let{ \emph on varname \emph default }{ \emph on expression \emph default } \family default Evaluates both \family typewriter \emph on varname \family default \emph default and the \family typewriter \emph on expression \family default \emph default . The \family typewriter \emph on expression \family default \emph default is then assigned to \family typewriter varname \family default . \end_layout \begin_layout Itemize \family typewriter %append{ \emph on varname \emph default }{ \emph on expression \emph default } \family default Evaluates both \family typewriter \emph on varname \family default \emph default and the \family typewriter \emph on expression \family default \emph default . The \family typewriter \emph on expression \family default \emph default is then appended to \family typewriter varname \family default (concatenation). \end_layout \begin_layout Itemize \family typewriter %{ \emph on varname \emph default } \family default Evaluates \family typewriter \emph on varname \family default \emph default , and outputs the value of the corresponding variable. When the variable does not exist, the empty string is returned. \end_layout \begin_layout Itemize \family typewriter %{++}{ \emph on varname \emph default } \family default or \family typewriter %{ \emph on varname \emph default }{++} \family default Has the obvious side effect known e.g. from C or Java. You may also use \family typewriter -- \family default instead of \family typewriter ++ \family default . This is handy for programming loops (see below). \end_layout \begin_layout Itemize \family typewriter %dump-vars{} \family default Writes all currently defined variables (from the currently active scope) to \family typewriter stderr \family default . This is handy for debugging. 
\end_layout \begin_layout Paragraph CSV Array Macros \end_layout \begin_layout Itemize \family typewriter %{ \emph on varname \emph default }{ \emph on delimiter \emph default }{ \emph on index \emph default } \family default Evaluates all arguments. The contents of \family typewriter \emph on varname \family default \emph default is interpreted as a comma-separated list, delimited by \family typewriter \emph on delimiter \family default \emph default . The \family typewriter \emph on index \family default \emph default 'th list element is returned. \end_layout \begin_layout Itemize \family typewriter %set{ \emph on varname \emph default }{ \emph on delimiter \emph default }{ \emph on index \emph default }{ \emph on expression \emph default } \family default Evaluates all arguments. The contents of the old \family typewriter \emph on varname \family default \emph default is interpreted as a comma-separated list, delimited by \family typewriter \emph on delimiter \family default \emph default . The \family typewriter \emph on index \family default \emph default 'th list element is then assigned to, or substituted by, \family typewriter \emph on expression \family default \emph default . \end_layout \begin_layout Paragraph Arithmetic Expression Macros \end_layout \begin_layout Standard The following macros can also take more than two arguments, carrying out the corresponding arithmetic operation in sequence (it depends on the operator whether this accords to the associative law). \end_layout \begin_layout Itemize \family typewriter %+{ \emph on arg1 \emph default }{ \emph on arg2 \emph default } \family default Evaluates the arguments, interprets them as numbers, and adds them together. \end_layout \begin_layout Itemize \family typewriter %-{ \emph on arg1 \emph default }{ \emph on arg2 \emph default } \family default Subtraction. 
\end_layout \begin_layout Itemize \family typewriter %*{ \emph on arg1 \emph default }{ \emph on arg2 \emph default } \family default Multiplication. \end_layout \begin_layout Itemize \family typewriter %/{ \emph on arg1 \emph default }{ \emph on arg2 \emph default } \family default Division. \end_layout \begin_layout Itemize \family typewriter %%{ \emph on arg1 \emph default }{ \emph on arg2 \emph default } \family default Modulus. \end_layout \begin_layout Itemize \family typewriter %&{ \emph on arg1 \emph default }{ \emph on arg2 \emph default } \family default Bitwise Binary And. \end_layout \begin_layout Itemize \family typewriter %|{ \emph on arg1 \emph default }{ \emph on arg2 \emph default } \family default Bitwise Binary Or. \end_layout \begin_layout Itemize \family typewriter %^{ \emph on arg1 \emph default }{ \emph on arg2 \emph default } \family default Bitwise Binary Exclusive Or. \end_layout \begin_layout Itemize \family typewriter %<<{ \emph on arg1 \emph default }{ \emph on arg2 \emph default } \family default Binary Shift Left. \end_layout \begin_layout Itemize \family typewriter %>>{ \emph on arg1 \emph default }{ \emph on arg2 \emph default } \family default Binary Shift Right. \end_layout \begin_layout Itemize \family typewriter %min{ \emph on arg1 \emph default }{ \emph on arg2 \emph default } \family default Compute the arithmetic minimum of the arguments. \end_layout \begin_layout Itemize \family typewriter %max{ \emph on arg1 \emph default }{ \emph on arg2 \emph default } \family default Compute the arithmetic maximum of the arguments. \end_layout \begin_layout Paragraph Boolean Condition Macros \end_layout \begin_layout Itemize \family typewriter %=={ \emph on arg1 \emph default }{ \emph on arg2 \emph default } \family default Numeral Equality. \end_layout \begin_layout Itemize \family typewriter %!={ \emph on arg1 \emph default }{ \emph on arg2 \emph default } \family default Numeral Inequality. 
\end_layout \begin_layout Itemize \family typewriter %<{ \emph on arg1 \emph default }{ \emph on arg2 \emph default } \family default Numeral Less Than. \end_layout \begin_layout Itemize \family typewriter %<={ \emph on arg1 \emph default }{ \emph on arg2 \emph default } \family default Numeral Less or Equal. \end_layout \begin_layout Itemize \family typewriter %>{ \emph on arg1 \emph default }{ \emph on arg2 \emph default } \family default Numeral Greater Than. \end_layout \begin_layout Itemize \family typewriter %>={ \emph on arg1 \emph default }{ \emph on arg2 \emph default } \family default Numeral Greater or Equal. \end_layout \begin_layout Itemize \family typewriter %eq{ \emph on arg1 \emph default }{ \emph on arg2 \emph default } \family default \begin_inset space ~ \end_inset String Equality. \end_layout \begin_layout Itemize \family typewriter %ne{ \emph on arg1 \emph default }{ \emph on arg2 \emph default } \family default String Inequality. \end_layout \begin_layout Itemize \family typewriter %lt{ \emph on arg1 \emph default }{ \emph on arg2 \emph default } \family default String Less Than. \end_layout \begin_layout Itemize \family typewriter %le{ \emph on arg1 \emph default }{ \emph on arg2 \emph default } \family default String Less or Equal. \end_layout \begin_layout Itemize \family typewriter %gt{ \emph on arg1 \emph default }{ \emph on arg2 \emph default } \family default String Greater Than. \end_layout \begin_layout Itemize \family typewriter %ge{ \emph on arg1 \emph default }{ \emph on arg2 \emph default } \family default String Greater or Equal. 
\end_layout \begin_layout Itemize \family typewriter %=~{ \emph on string \emph default }{ \emph on regex \emph default }{ \emph on opts \emph default } \family default or \family typewriter %match{ \emph on string \emph default }{ \emph on regex \emph default }{ \emph on opts \emph default } \family default Checks whether \family typewriter \emph on string \family default \emph default matches the Perl regular expression \family typewriter \emph on regex \family default \emph default . Modifiers can be given via \family typewriter \emph on opts \family default \emph default . \end_layout \begin_layout Paragraph Shortcut Evaluation Operators \end_layout \begin_layout Standard The following operators evaluate their arguments only when needed (like in C). \end_layout \begin_layout Itemize \family typewriter %&&{ \emph on arg1 \emph default }{ \emph on arg2 \emph default } \family default Logical And. \end_layout \begin_layout Itemize \family typewriter %and{ \emph on arg1 \emph default }{ \emph on arg2 \emph default } \family default Alias for \family typewriter %&&{} \family default . \end_layout \begin_layout Itemize \family typewriter %||{ \emph on arg1 \emph default }{ \emph on arg2 \emph default } \family default Logical Or. \end_layout \begin_layout Itemize \family typewriter %or{ \emph on arg1 \emph default }{ \emph on arg2 \emph default } \family default Alias for \family typewriter %||{} \family default . \end_layout \begin_layout Paragraph Unary Operators \end_layout \begin_layout Itemize \family typewriter %!{ \emph on arg \emph default } \family default Logical Not. \end_layout \begin_layout Itemize \family typewriter %not{ \emph on arg \emph default } \family default Alias for \family typewriter %!{} \family default . \end_layout \begin_layout Itemize \family typewriter %~{ \emph on arg \emph default } \family default Bitwise Negation. 
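\end_layout \begin_layout Standard To make the prefix notation of the operators above more tangible, here is a small Python model which expands nested calls such as \family typewriter %*{7}{%+{2}{3}} \family default . It is a sketch under assumptions, not the parser used by \family typewriter marsadm \family default : it only knows a handful of arithmetic operators, treats all arguments as integers, and ignores comments and backslash escapes. The names \family typewriter OPS \family default , \family typewriter split_args \family default and \family typewriter expand \family default are invented for this example. \end_layout

```python
import re
from functools import reduce

# Assumed subset of the operators from the manual; the real tool has more.
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "min": min,
    "max": max,
}
NAME = re.compile(r"[-+*]|[a-z]+")


def split_args(text, pos):
    """Collect consecutive {...} groups starting at pos."""
    args = []
    while pos < len(text) and text[pos] == "{":
        depth, start = 0, pos + 1
        while True:
            if pos >= len(text):
                raise ValueError("unbalanced braces")
            if text[pos] == "{":
                depth += 1
            elif text[pos] == "}":
                depth -= 1
                if depth == 0:
                    break
            pos += 1
        args.append(text[start:pos])
        pos += 1
    return args, pos


def expand(text):
    """Expand prefix arithmetic calls; copy all other text verbatim."""
    out, pos = [], 0
    while pos < len(text):
        if text[pos] != "%":
            out.append(text[pos])
            pos += 1
            continue
        match = NAME.match(text, pos + 1)
        if not match or match.group() not in OPS:
            raise ValueError(f"unsupported macro at offset {pos}")
        args, pos = split_args(text, match.end())
        # Inner calls are expanded first, then folded from left to right.
        values = [int(expand(arg)) for arg in args]
        if len(values) < 2:
            raise ValueError("binary operators need at least two arguments")
        out.append(str(reduce(OPS[match.group()], values)))
    return "".join(out)
```

\begin_layout Standard With this model, \family typewriter expand("%*{7}{%+{2}{3}}") \family default yields \family typewriter 35 \family default , and a call with more than two arguments such as \family typewriter %-{10}{3}{2} \family default is folded from left to right, yielding \family typewriter 5 \family default .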
\end_layout \begin_layout Paragraph String Functions \end_layout \begin_layout Itemize \family typewriter %length{ \emph on string \emph default } \family default Return the number of ASCII characters present in \family typewriter \emph on string \family default \emph default . \end_layout \begin_layout Itemize \family typewriter %toupper{ \emph on string \emph default } \family default Return all ASCII characters converted to uppercase. \end_layout \begin_layout Itemize \family typewriter %tolower{ \emph on string \emph default } \family default Return all ASCII characters converted to lowercase. \end_layout \begin_layout Itemize \family typewriter %append{ \emph on varname \emph default }{ \emph on string \emph default } \family default Equivalent to \family typewriter %let{ \emph on varname \emph default }{%{ \emph on varname \emph default } \emph on string \emph default } \family default . \end_layout \begin_layout Itemize \family typewriter %subst{ \emph on string \emph default }{ \emph on regex \emph default }{ \emph on subst \emph default }{ \emph on opts \emph default } \family default Perl regex substitution. \end_layout \begin_layout Itemize \family typewriter %sprintf{ \emph on fmt \emph default }{ \emph on arg1 \emph default }{ \emph on arg2 \emph default }{ \emph on argn \emph default } \family default Perl \family typewriter sprintf() \family default operator. Details see Perl manual. 
\end_layout \begin_layout Itemize \family typewriter %human-number{ \emph on unit \emph default }{ \emph on delim \emph default }{ \emph on unit-sep \emph default }{ \emph on number \emph default 1}{ \emph on number \emph default 2} \begin_inset Formula $\ldots$ \end_inset \family default Convert a number or a list of numbers into human-readable \family typewriter B \family default , \family typewriter KiB \family default , \family typewriter MiB \family default , \family typewriter GiB \family default , \family typewriter TiB \family default , as given by \family typewriter \emph on unit \family default \emph default . When \family typewriter \emph on unit \family default \emph default is empty, a reasonable unit will be guessed automatically from the maximum of all given numbers. A single result string is produced, where multiple numbers are separated by \family typewriter \emph on delim \family default \emph default when necessary. When \family typewriter \emph on delim \family default \emph default is empty, the slash symbol \family typewriter / \family default is used by default (the most obvious use case is result strings like \family typewriter \begin_inset Quotes eld \end_inset 17/32 KiB \begin_inset Quotes erd \end_inset \family default ). The final unit text is separated from the previous number(s) by \family typewriter \emph on unit-sep \family default \emph default . When \family typewriter \emph on unit-sep \family default \emph default is empty, a single blank is used by default. \end_layout \begin_layout Itemize \family typewriter %human-seconds{ \emph on number \emph default } \family default Convert the given number of seconds into \family typewriter hh:mm:ss \family default format. \end_layout \begin_layout Paragraph Complex Helper Macros \end_layout \begin_layout Itemize \family typewriter %progress{20} \family default Return a string containing a progress bar showing the values from \family typewriter %summary-vector{} \family default . 
The default width is 20 characters plus two braces. \end_layout \begin_layout Itemize \family typewriter %progress{20}{ \emph on minvalue \emph default }{ \emph on midvalue \emph default }{ \emph on maxvalue \emph default } \family default Instead of taking the values from \family typewriter %summary-vector{} \family default , use the supplied values. \family typewriter minvalue \family default and \family typewriter midvalue \family default indicate two different intermediate points, while \family typewriter maxvalue \family default will determine the 100% point. \end_layout \begin_layout Paragraph Control Flow Macros \end_layout \begin_layout Itemize \family typewriter %if{ \emph on expression \emph default }{ \emph on then-part \emph default } \family default or \family typewriter %if{ \emph on expression \emph default }{ \emph on then-part \emph default }{ \emph on else-part \emph default } \family default Like in any other macro or programming language, this evaluates the \family typewriter expression \family default once, not copying its outcome to the output. If the result is non-empty and is not a string denoting the number \family typewriter 0 \family default , the \family typewriter \emph on then-part \family default \emph default is evaluated and copied to the output. Otherwise, the \family typewriter else-part \family default is evaluated and copied, provided that one exists. \end_layout \begin_layout Itemize \family typewriter %unless{ \emph on expression \emph default }{ \emph on then-part \emph default } \family default or \family typewriter %unless{ \emph on expression \emph default }{ \emph on then-part \emph default }{ \emph on else-part \emph default } \family default Like \family typewriter %if{} \family default , but the expression is logically negated. Essentially, this is a shorthand for \family typewriter %if{%not{expression}}{...} \family default or similar. 
\end_layout \begin_layout Itemize \family typewriter %elsif{ \emph on expr1 \emph default }{ \emph on then1 \emph default }{ \emph on expr2 \emph default }{ \emph on then2 \emph default } \family default \begin_inset Formula $\ldots$ \end_inset or \family typewriter %elsif{ \emph on expr1 \emph default }{ \emph on then1 \emph default }{ \emph on expr2 \emph default }{ \emph on then2 \emph default } \family default \begin_inset Formula $\ldots$ \end_inset \family typewriter { \emph on odd-else-part \emph default } \family default This is for simplification of boring if-else-if chains. The classical if-syntax (as shown above) has the drawback that inner if-parts need to be nested into outer else-parts, so rather deep nestings may occur when you are programming longer chains. This is an alternate syntax for avoidance of deep nesting. When giving an odd number of arguments, the last argument is taken as final else-part. \end_layout \begin_layout Itemize \family typewriter %elsunless \family default \begin_inset Formula $\ldots$ \end_inset Like \family typewriter %elsif \family default , but \emph on all \emph default conditions are negated. \end_layout \begin_layout Itemize \family typewriter %while{ \emph on expression \emph default }{ \emph on body \emph default } \family default Evaluates the \family typewriter \emph on expression \family default \emph default in a while loop, like in any other macro or programming language. The \family typewriter \emph on body \family default \emph default is evaluated exactly as many times as the \family typewriter \emph on expression \family default \emph default holds. Notice that endless loops can only be avoided by calling a non-pure macro inspecting external state information, or by creating (and checking) another side effect somewhere, like assigning to a variable somewhere. 
\end_layout \begin_layout Itemize \family typewriter %until{ \emph on expression \emph default }{ \emph on body \emph default } \family default Like \family typewriter %while{ \emph on expression \emph default }{ \emph on body \emph default } \family default , but negate the expression. \end_layout \begin_layout Itemize \family typewriter %for{ \emph on expr \emph default 1}{ \emph on expr \emph default 2}{ \emph on expr \emph default 3}{ \emph on body \emph default } \family default As you will expect from the corresponding C, Perl, Java, or (add your favorite language) construct. Only the syntactic sugar is a little bit different. \end_layout \begin_layout Itemize \family typewriter %foreach{ \emph on varname \emph default }{ \emph on CSV-delimited-string \emph default }{ \emph on delimiter \emph default }{ \emph on body \emph default } \family default As you can expect from similar \family typewriter foreach \family default constructs in other languages like Perl. Currently, the macro processor has no arrays, but can use comma-separated strings as a substitute. \end_layout \begin_layout Itemize \family typewriter %eval{ \emph on count \emph default }{ \emph on body \emph default } \family default Evaluates the \family typewriter \emph on body \family default \emph default exactly as many times as indicated by the numeric argument \family typewriter \emph on count \family default \emph default . This may be used to re-evaluate the output of other macros once again. 
\end_layout \begin_layout Itemize \family typewriter %protect{ \emph on body \emph default } \family default Equivalent to \family typewriter %eval{0}{ \emph on body \emph default } \family default , which means that the body is not evaluated at all, but copied to the output verbatim \begin_inset Foot status open \begin_layout Plain Layout \begin_inset ERT status open \begin_layout Plain Layout \backslash TeX \end_layout \end_inset \begin_inset space ~ \end_inset or \begin_inset ERT status open \begin_layout Plain Layout \backslash LaTeX \end_layout \end_inset \begin_inset space ~ \end_inset fans usually know what this is good for ;) \end_layout \end_inset . \end_layout \begin_layout Itemize \family typewriter %eval-down{ \emph on body \emph default } \family default Evaluates the \family typewriter \emph on body \family default \emph default in a loop until the result does not change any more \begin_inset Foot status open \begin_layout Plain Layout Mathematicians knowing Banach's fixed-point theorem will know what this is good for ;) \end_layout \end_inset . \end_layout \begin_layout Itemize \family typewriter %tmp{ \emph on body \emph default } \family default Evaluates the \family typewriter \emph on body \family default \emph default once in a temporary scope which is thrown away afterwards. \end_layout \begin_layout Itemize \family typewriter %call{ \emph on macroname \emph default }{ \emph on arg1 \emph default }{ \emph on arg2 \emph default }{ \emph on argn \emph default } \family default Like in many other macro languages, this evaluates the named macro in a new scope. This means that any side effects produced by the called macro, such as variable assignments, will be reverted after the call, and therefore not influence the old scope. 
However, notice that the arguments \family typewriter \emph on arg1 \family default \emph default to \family typewriter \emph on argn \family default \emph default are evaluated in the \emph on old \emph default scope before the call actually happens (possibly producing side effects if they contain some), and their result is respectively assigned to \family typewriter %{1} \family default until \family typewriter %{ \emph on n \emph default } \family default in the new scope, analogously to the Shell or to Perl. In addition, the new \family typewriter %{0} \family default gets the \family typewriter \emph on macroname \family default \emph default . Notice that the argument evaluation happens non-lazily in the old scope and therefore differs from other macro processors like \begin_inset ERT status open \begin_layout Plain Layout \backslash TeX \end_layout \end_inset . \end_layout \begin_layout Itemize \family typewriter %include{ \emph on macroname \emph default }{ \emph on arg1 \emph default }{ \emph on arg2 \emph default }{ \emph on argn \emph default } \family default Like \family typewriter %call{} \family default , but evaluates the named macro in the \emph on current \emph default scope (similar to the \family typewriter source \family default command of the Bourne shell). This means that any side effects produced by the called macro, such as variable assignments, will \emph on not \emph default be reverted after the call. Even the \family typewriter %{0} \family default until \family typewriter %{ \emph on n \emph default } \family default variables will continue to exist (and may lead to confusion if you aren't aware of that). \end_layout \begin_layout Itemize \family typewriter %callstack{} \family default Useful for debugging: show the current chain of macro invocations. 
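\end_layout \begin_layout Standard The truth rule used by \family typewriter %if{} \family default and the argument pairing of \family typewriter %elsif{} \family default can be modelled in Python as follows. This is a sketch under assumptions, not the implementation: the function names are invented, all arguments are passed in as already-evaluated strings (whereas the real macros evaluate only the parts they need), and only the literal string \family typewriter 0 \family default is treated as denoting the number 0. \end_layout

```python
def is_true(value):
    """Rule from the manual: non-empty and not denoting the number 0.

    Assumption: only the literal string "0" counts as the number 0;
    how strings such as "0.0" or "00" are treated is not modelled.
    """
    return value != "" and value != "0"


def if_macro(condition, then_part, else_part=""):
    """Model of %if{expression}{then-part}{else-part}."""
    return then_part if is_true(condition) else else_part


def unless_macro(condition, then_part, else_part=""):
    """Model of %unless{...}: like %if{} with a negated condition."""
    return else_part if is_true(condition) else then_part


def elsif_macro(*args, negate=False):
    """Model of %elsif{...} (or %elsunless{...} when negate is set)."""
    for i in range(0, len(args) - 1, 2):
        if is_true(args[i]) != negate:
            return args[i + 1]
    # An odd number of arguments means the last one is the else-part.
    return args[-1] if len(args) % 2 == 1 else ""
```

\begin_layout Standard For example, \family typewriter elsif_macro("0", "a", "x", "b", "c") \family default selects \family typewriter b \family default , because the first condition is false and the second one is a non-empty string other than \family typewriter 0 \family default .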
\end_layout \begin_layout Paragraph Time Handling Macros \end_layout \begin_layout Itemize \family typewriter %time{} \family default Return the current Lamport timestamp (see section \begin_inset CommandInset ref LatexCommand ref reference "sec:The-Lamport-Clock" \end_inset ), in units of seconds since the Unix epoch. \end_layout \begin_layout Itemize \family typewriter %sleep{ \emph on seconds \emph default } \family default Pause the given number of seconds. \end_layout \begin_layout Itemize \family typewriter %timeout{ \emph on seconds \emph default } \family default Like \family typewriter %sleep{ \emph on seconds \emph default } \family default , but abort the \family typewriter marsadm \family default command after the total waiting time has exceeded the timeout given by the \family typewriter --timeout= \family default parameter. \end_layout \begin_layout Paragraph Misc Macros \end_layout \begin_layout Itemize \family typewriter %warn{ \emph on text \emph default } \family default Show a WARNING: \end_layout \begin_layout Itemize \family typewriter %die{ \emph on text \emph default } \family default Abort execution with an error message. \end_layout \begin_layout Paragraph Experts Only - Risky \end_layout \begin_layout Standard The following macros are unstable and may change at any time without notice. \end_layout \begin_layout Itemize \family typewriter %get-msg{ \emph on name \emph default } \family default Low-level access to system messages. You should not use this, since this is not extensible (you must know the name in advance). \end_layout \begin_layout Itemize \family typewriter %readlink{ \emph on path \emph default } \family default Low-level access to symlinks. Don't misuse this for circumvention of the abstraction macros from the symlink tree! \end_layout \begin_layout Itemize \family typewriter %setlink{ \emph on value \emph default }{ \emph on path \emph default } \family default Low-level creation of symlinks. 
Don't misuse this for circumvention of the abstraction macros for the symlink tree! \end_layout \begin_layout Itemize \family typewriter %fetch-info{} \family default etc. Low-level access to internal symlink formats. Don't use this in scripts! Only for curious humans. \end_layout \begin_layout Itemize \family typewriter %is-almost-consistent{} \family default Whatever you guess what this could mean, don't use it, at least never in place of \family typewriter %is-consistent{} \family default - it is risky to base decisions on this. Mostly for historical reasons. \end_layout \begin_layout Itemize \family typewriter %does{ \emph on name \emph default } \family default Equivalent to \family typewriter %is- \emph on name \emph default {} \family default (just more handy for computing the macro name). Use with care! \end_layout \begin_layout Subsection Predefined Variables \begin_inset CommandInset label LatexCommand label name "par:Predefined-Variables" \end_inset \end_layout \begin_layout Itemize \family typewriter %{cmd} \family default The command argument of the invoked \family typewriter marsadm \family default command. \end_layout \begin_layout Itemize \family typewriter %{res} \family default The resource name given to the \family typewriter marsadm \family default command as a command line parameter (or, possibly expanded from \family typewriter all \family default ). \end_layout \begin_layout Itemize \family typewriter %{resdir} \family default The corresponding resource directory. The current version of MARS uses \family typewriter /mars/resource-%{res}/ \family default , but this may change in future. Normally, you should not need this, since anything should be already abstracted for you. In case you \emph on really \emph default need low-level access to something, please prefer this variable over \family typewriter %{mars}/resource-%{res} \family default because it is a bit more abstracted. 
\end_layout \begin_layout Itemize \family typewriter %{mars} \family default Currently the fixed string \family typewriter /mars \family default . This may change in future, probably with the advent of MARS Full. \end_layout \begin_layout Itemize \family typewriter %{host} \family default The hostname of the local node. \end_layout \begin_layout Itemize \family typewriter %{ip} \family default The IP address of the local node. \end_layout \begin_layout Itemize \family typewriter %{timeout} \family default The value given by the \family typewriter --timeout= \family default option, or the corresponding default value. \end_layout \begin_layout Itemize \family typewriter %{threshold} \family default The value given by the \family typewriter --threshold= \family default option, or the corresponding default value. \end_layout \begin_layout Itemize \family typewriter %{window} \family default The value given by the \family typewriter --window= \family default option, or the corresponding default value. \end_layout \begin_layout Itemize \family typewriter %{force} \family default The number of times the \family typewriter --force \family default option has been given. \end_layout \begin_layout Itemize \family typewriter %{dry-run} \family default The number of times the \family typewriter --dry-run \family default option has been given. \end_layout \begin_layout Itemize \family typewriter %{verbose} \family default The number of times the \family typewriter --verbose \family default option has been given. \end_layout \begin_layout Itemize \family typewriter %{callstack} \family default Same as the \family typewriter %callstack{} \family default macro. The latter gives you an opportunity for overriding, while the former is firmly built in. 
\end_layout \begin_layout Section Scripting HOWTO \begin_inset CommandInset label LatexCommand label name "sec:Scripting-HOWTO" \end_inset \end_layout \begin_layout Standard Both the \series bold asynchronous communication model \series default of MARS (cf section \begin_inset CommandInset ref LatexCommand ref reference "sec:The-Lamport-Clock" \end_inset ) including the Lamport clock, and the \series bold state model \series default (cf section \begin_inset CommandInset ref LatexCommand ref reference "sec:The-State-of" \end_inset ) are things you \emph on definitely \emph default should keep in mind when you want to do some scripting. Here is some further concrete advice: \end_layout \begin_layout Itemize Don't access anything on \family typewriter /mars/ \family default directly, except for debugging purposes. Use \family typewriter marsadm \family default . \end_layout \begin_layout Itemize Avoid running scripts in parallel, other than for inspection / monitoring purposes. When you give two \family typewriter marsadm \family default commands in parallel (whether on the same host, or on different hosts belonging to the same cluster), it is very likely to produce a mess. \family typewriter marsadm \family default has no internal locking. There is no cluster-wide locking at all. Unfortunately, some systems like Pacemaker violate this in many cases (depending on their configuration). It is best to have a dedicated / more or less centralized \series bold control machine \series default which controls masses of your georedundant working servers. This reduces the risk of running interfering actions in parallel. Of course, you need backup machines for your control machines, and in different locations. Not obeying this advice can easily lead to problems such as complex races which are very difficult to solve in long-distance distributed systems, even in general (not limited to MARS).
\end_layout \begin_layout Itemize \family typewriter marsadm wait-cluster \family default is your friend. Whenever your (near-)central script has to switch between different hosts \family typewriter A \family default and \family typewriter B \family default (of the same cluster), use it in the following way: \begin_inset Newline newline \end_inset \family typewriter ssh A \begin_inset Quotes eld \end_inset marsadm action1 \begin_inset Quotes erd \end_inset ; ssh B \begin_inset Quotes eld \end_inset marsadm wait-cluster; marsadm action2 \begin_inset Quotes erd \end_inset \begin_inset Newline newline \end_inset \family default \begin_inset Graphics filename images/MatieresCorrosives.png lyxscale 50 scale 17 \end_inset Don't ignore this advice! Interference is almost \emph on sure \emph default ! As a rule of thumb, precede almost any action command with some appropriate waiting command! \end_layout \begin_layout Itemize Further friends are any \family typewriter marsadm wait-* \family default commands, such as \family typewriter wait-umount \family default . \end_layout \begin_layout Itemize In some places, busy-wait loops might be needed, e.g. for waiting until a specific resource is \family typewriter UpToDate \family default or matches some other condition. Examples of waiting conditions can be found under \family typewriter github.com/schoebel/test-suite \family default in subdirectory \family typewriter mars/modules/ \family default , specifically \family typewriter 02_predicates.sh \family default or similar. \end_layout \begin_layout Itemize In case of network problems, some command may hang (forever), if you don't set the \family typewriter --timeout= \family default option. Don't forget to check the return status of any failed / timed-out commands, and to take appropriate measures! \end_layout \begin_layout Itemize Test your scripts in failure scenarios!
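\end_layout

\begin_layout Standard
The host-switching pattern and the advice on timeouts and return statuses can be sketched in Python as follows. This is only a minimal sketch under stated assumptions, not part of MARS: the helper names
\family typewriter
marsadm_argv
\family default
,
\family typewriter
switch_hosts_plan
\family default
and
\family typewriter
run_plan
\family default
are hypothetical,
\family typewriter
action1
\family default
and
\family typewriter
action2
\family default
are the placeholders from the example above, and it is assumed that
\family typewriter
--timeout=
\family default
is accepted by each of the commands involved. Unlike the one-line example, the sketch issues
\family typewriter
wait-cluster
\family default
and the following action as separate ssh calls, so that each return status can be checked on its own.

```python
import shlex
import subprocess


def marsadm_argv(command, *args, timeout=None):
    """Build the argument vector for one marsadm call (nothing is executed)."""
    argv = ["marsadm"]
    if timeout is not None:
        # Assumption: the common --timeout= option applies to this command.
        argv.append(f"--timeout={timeout}")
    argv.append(command)
    argv.extend(args)
    return argv


def switch_hosts_plan(host_a, action_a, host_b, action_b, timeout=60):
    """Return ssh command lines: action on A, then wait-cluster and action on B."""
    return [
        ["ssh", host_a, shlex.join(marsadm_argv(*action_a, timeout=timeout))],
        ["ssh", host_b, shlex.join(marsadm_argv("wait-cluster", timeout=timeout))],
        ["ssh", host_b, shlex.join(marsadm_argv(*action_b, timeout=timeout))],
    ]


def run_plan(plan, runner=subprocess.run):
    """Run the steps strictly one after another; stop at the first failure."""
    for argv in plan:
        result = runner(argv)
        if result.returncode != 0:
            raise RuntimeError(
                f"step failed with status {result.returncode}: {argv}"
            )


if __name__ == "__main__":
    # Print the plan only; call run_plan(plan) to actually execute it.
    plan = switch_hosts_plan("A", ("action1",), "B", ("action2",), timeout=30)
    for step in plan:
        print(shlex.join(step))
```

Running the steps sequentially from a single control machine follows the advice above to avoid parallel
\family typewriter
marsadm
\family default
invocations. What the appropriate measures after a failed step are depends on your setup and is deliberately left out here.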
\end_layout \begin_layout Chapter The Sysadmin Interface ( \family typewriter marsadm \family default and \family typewriter /proc/sys/mars/ \family default ) \family typewriter \begin_inset CommandInset label LatexCommand label name "chap:The-Sysadmin-Interface" \end_inset \end_layout \begin_layout Standard In general, the term \begin_inset Quotes eld \end_inset after a while \begin_inset Quotes erd \end_inset means that other cluster nodes will take notice of your actions according to the \begin_inset Quotes eld \end_inset eventually consistent \begin_inset Quotes erd \end_inset propagation protocol described in sections \begin_inset CommandInset ref LatexCommand ref reference "sec:The-Lamport-Clock" \end_inset and \begin_inset CommandInset ref LatexCommand ref reference "sec:The-Symlink-Tree" \end_inset . Please be aware that this \begin_inset Quotes eld \end_inset while \begin_inset Quotes erd \end_inset may last very long in case of network outages or bad firewall rules. \end_layout \begin_layout Standard In the following tables, column \begin_inset Quotes eld \end_inset Cmp \begin_inset Quotes erd \end_inset means compatibility with DRBD. Please note that 100% exact compatibility is not possible, because of the asynchronous communication paradigm. 
\end_layout \begin_layout Standard The following table documents common options which work with (almost) any command: \end_layout \begin_layout Standard \size scriptsize \begin_inset Tabular \begin_inset Text \begin_layout Plain Layout \size scriptsize Option \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize Cmp \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize Description \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize --dry-run \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Run the command without actually creating symlinks or touching files or executing rsync. This option \emph on should \emph default be used first at any dangerous command, in order to check what would happen. \end_layout \begin_layout Plain Layout \begin_inset Graphics filename images/MatieresCorrosives.png lyxscale 50 scale 17 \end_inset Don't use in scripts! Only use by hand! \end_layout \begin_layout Plain Layout \size scriptsize This option does not change the waiting logic. Many commands are waiting until the desired effect has taken place. However, with \family typewriter --dry-run \family default the desired effect will never happen, so the command may wait forever (or abort with a timeout). 
\end_layout \begin_layout Plain Layout \size scriptsize In addition, this option can lead to additional aborts of the commands due to unmet conditions, which cannot be met because the symlinks are not actually created / altered. \end_layout \begin_layout Plain Layout \size scriptsize Thus this option can give only a \series bold rough estimate \series default of what would happen later! \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize --force \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize almost \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Some preconditions are skipped, i.e. the command will / should work although some (more or less) vital preconditions are violated. \end_layout \begin_layout Plain Layout \size scriptsize Instead of giving \family typewriter --force \family default , you may alternatively prefix your command with \family typewriter force- \end_layout \begin_layout Plain Layout \begin_inset Graphics filename images/MatieresToxiques.png lyxscale 50 scale 17 \end_inset THIS OPTION IS DANGEROUS! \end_layout \begin_layout Plain Layout \size scriptsize Use it only when you are absolutely sure that you know what you are doing! 
\end_layout \begin_layout Plain Layout \size scriptsize Use it only as a last resort if the same command without \family typewriter --force \family default has failed \emph on for no good reason \emph default ! \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize --verbose \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Some (few) commands will become more verbose.
\end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize --timeout=$seconds \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Some commands require a response from either the local kernel module, or from other cluster nodes. In order to prevent infinite waiting in case of network outages or other problems, the command will fail after the given timeout has been reached. \end_layout \begin_layout Plain Layout \size scriptsize When $seconds is -1, the command will wait forever. \end_layout \begin_layout Plain Layout \size scriptsize When $seconds is 0, the command will not wait in case any precondition is not met, and abort without performing an action. \end_layout \begin_layout Plain Layout \size scriptsize The default timeout is 5s.
\end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize --window=$seconds \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize The time window for checking the aliveness of other nodes in the network. When no symlink updates have occurred during the last window, the node is considered dead. Default is 30s \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize --threshold=$size \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize The macros containing the substring \family typewriter -threshold- \family default or \family typewriter -almost- \family default are using this as a 
default value for deciding whether something has been approximately reached. Default is 10MiB. \end_layout \begin_layout Plain Layout \size scriptsize The $size argument may be a number optionally followed by one of the lowercase characters k m g t p for indicating kilo mega giga tera or peta bytes as multiples of 1000. When using the corresponding uppercase character, multiples of 1024 are formed instead. \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize --host=$host \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize The command acts as if it were executed on another host $host. This option should not be used regularly, because the local information in the symlink tree may be outdated or even wrong. Additionally, some local information like remote sizes of physical devices (e.g. remote disks) is not present in the symlink tree at all, or is wrong (reflecting only the \emph on local \emph default state). \end_layout \begin_layout Plain Layout \begin_inset Graphics filename images/MatieresToxiques.png lyxscale 50 scale 17 \end_inset THIS OPTION IS DANGEROUS!
\end_layout \begin_layout Plain Layout \size scriptsize Use it only for final destruction of dead cluster nodes, see section \begin_inset CommandInset ref LatexCommand ref reference "sub:Final-Destroy-of" \end_inset . \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize --ip=$ip \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize By default, \family typewriter marsadm \family default always uses the IP for \family typewriter $host \family default as stored in the symlink tree (directory \family typewriter /mars/ips/ \family default ). When such an IP entry does not (yet) exist (e.g. \family typewriter create-cluster \family default or \family typewriter join-cluster \family default ), all local network interfaces are automatically scanned for IPv4 addresses, and the first one is taken. This may lead to wrong decisions if you have multiple network interfaces. \end_layout \begin_layout Plain Layout \size scriptsize In order to override the automatic IP detection and to explicitly specify the IP address of your storage network, use this option.
\end_layout \begin_layout Plain Layout \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset \size scriptsize Usually you will need this only at \family typewriter {create,join}-cluster \family default . \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize --verbose \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Some (few) commands will become more verbose.
\end_layout \end_inset \end_layout \end_inset \end_inset \end_layout \begin_layout Section Cluster Operations \begin_inset CommandInset label LatexCommand label name "sec:Cluster-Operations" \end_inset \end_layout \begin_layout Standard \size scriptsize \begin_inset Tabular \begin_inset Text \begin_layout Plain Layout \size scriptsize Command / Params \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize Cmp \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize Description \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize create-cluster \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Precondition: the \family typewriter /mars/ \family default filesystem must be mounted and it must be empty ( \family typewriter mkfs.ext4 \family default , see instructions in section \begin_inset CommandInset ref LatexCommand ref reference "sub:Setup-your-Cluster" \end_inset ). The kernel module must \emph on not \emph default be loaded. \end_layout \begin_layout Plain Layout \size scriptsize Postcondition: the initial symlink tree is created in \family typewriter /mars/ \family default . Additionally, the \family typewriter /mars/uuid \family default symlink is created for later distribution in the cluster. 
It uniquely identifies the cluster in the world. \end_layout \begin_layout Plain Layout \size scriptsize This must be called exactly once at the initial primary. \end_layout \begin_layout Plain Layout Hint: use the \family typewriter --ip= \family default option if you have multiple interfaces. \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize join-cluster \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $host \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Precondition: the \family typewriter /mars/ \family default filesystem must be mounted and it must be empty ( \family typewriter mkfs.ext4 \family default , see instructions in section \begin_inset CommandInset ref LatexCommand ref reference "sub:Setup-your-Cluster" \end_inset ). The kernel module must \emph on not \emph default be loaded. The cluster must have been already created at another node \family typewriter $host \family default . A working ssh connection to $host as root must exist (without password). \family typewriter rsync \family default must be installed on all cluster nodes.
\end_layout \begin_layout Plain Layout \size scriptsize Postcondition: the initial symlink tree \family typewriter /mars/ \family default is replicated from the remote host \family typewriter $host \family default , and the local host has been added as another cluster member. \end_layout \begin_layout Plain Layout \size scriptsize This must be called exactly once at every initial secondary node. \end_layout \begin_layout Plain Layout Hint: use the \family typewriter --ip= \family default option if you have multiple interfaces. \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize leave-cluster \begin_inset CommandInset label LatexCommand label name "leave-cluster" \end_inset \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Precondition: the \family typewriter /mars/ \family default filesystem must be mounted and it must contain a valid MARS symlink tree produced by the other \family typewriter marsadm \family default commands. The local node must no longer be a member of any resource (see \family typewriter marsadm leave-resource \family default ). The kernel module should be loaded and the network should be operating in order to also propagate the effect to the other nodes.
\end_layout \begin_layout Plain Layout \size scriptsize Postcondition: the local node is removed from the replicated symlink tree \family typewriter /mars/ \family default such that other nodes will cease to communicate with it after a while. The converse is not true: the local node may continue \begin_inset Foot status open \begin_layout Plain Layout \size scriptsize Reason: \family typewriter leave-cluster \family default removes only its \emph on own \emph default IP address from \family typewriter /mars/ips/ \family default , but does not destroy the usual symmetry of the symlink tree by leaving the other IPs intact. Therefore, the local node will continue fetching updates from all nodes present in \family typewriter /mars/ips/ \family default . As a result, the local node will \emph on passively \emph default mirror the symlinks of other cluster members, but not vice versa. There is no communication from the local node to the other ones, turning the local node into a \series bold witness \series default according to some terminology from Distributed Systems. This is a feature, not a bug. It could be used for post-mortem analysis, or for monitoring purposes. However, \emph on deletions \emph default of symlinks are not guaranteed to take place, so your witness may \emph on accumulate \emph default thousands of old symlinks over a long time. If you want to eventually stop all communication to the local node, just run \family typewriter rmmod \family default . \end_layout \end_inset passively fetching the symlink tree. In order to really stop all communication, the kernel module should be unloaded afterwards. The local \family typewriter /mars/ \family default filesystem may be manually destroyed after that (at least if you need to reuse it). \end_layout \begin_layout Plain Layout \size scriptsize In case of an eventual node loss (e.g. fire, water, ...)
this command should be used on another node $helper in order to finally remove $damaged from the cluster via the command \family typewriter marsadm leave-cluster --host=$damaged --force \family default . \end_layout \begin_layout Plain Layout \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset \size scriptsize In case you cannot use \family typewriter leave-resource \family default for any reason, you may do the following: just destroy the \family typewriter /mars/ \family default filesystem on the host \family typewriter $deadhost \family default you want to remove (e.g. by \family typewriter mkfs \family default ), or take other measures to \emph on ensure \emph default that it cannot be accidentally re-used in any way (e.g. physical destruction of the underlying RAID, \family typewriter lvremove \family default , etc). On all other hosts, do \family typewriter rmmod mars \family default , then delete the symlink \family typewriter /mars/ips/ip-$deadhost \family default everywhere by hand, and finally \family typewriter modprobe mars \family default again. \end_layout \begin_layout Plain Layout \begin_inset Graphics filename images/MatieresCorrosives.png lyxscale 50 scale 17 \end_inset \size scriptsize Notice that the last \family typewriter leave-resource \family default operation does not delete the cluster as such. It just creates an \emph on empty \emph default cluster which no longer has any members. In particular, the cluster ID \family typewriter /mars/uuid \family default is \emph on not \emph default removed, deliberately \begin_inset Foot status open \begin_layout Plain Layout \size scriptsize This is a feature, not a bug. The \family typewriter uuid \family default is created once, but never altered anywhere. The only way to get rid of it is \emph on external \emph default deletion (not by \family typewriter marsadm \family default ) \emph on together(!)
\emph default with all other contents of \family typewriter /mars/ \family default . This prevents you from accidentally merging half-dead remains which could have survived a disaster for any reason, such as snapshotting filesystems / VMs or whatever. \end_layout \end_inset . \end_layout \begin_layout Plain Layout \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset \size scriptsize Before you can re-use \emph on any \emph default left-over \family typewriter /mars/ \family default filesystem for creating / joining a new / different cluster, you \emph on must \emph default obey the instructions in section \begin_inset CommandInset ref LatexCommand ref reference "sub:Setup-your-Cluster" \end_inset and use \family typewriter mkfs.ext4 \family default accordingly. \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize wait-cluster \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize See section \begin_inset CommandInset ref LatexCommand ref reference "sub:Waiting" \end_inset . 
\end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize create-uuid \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout Deprecated. Only for compatibility with old version light0.1beta05 or earlier. \end_layout \begin_layout Plain Layout \size scriptsize Precondition: the \family typewriter /mars/ \family default filesystem must be mounted. A \family typewriter uuid \family default (such as automatically created by recent versions of \family typewriter marsadm create-cluster \family default ) must not already exist; i.e. you have a very old and outdated symlink tree. \end_layout \begin_layout Plain Layout \size scriptsize Postcondition: the \family typewriter /mars/uuid \family default symlink is created for later distribution in the cluster. It uniquely identifies the cluster in the world. \end_layout \begin_layout Plain Layout \size scriptsize This must be called at most once at the current primary.
\end_layout \end_inset \end_layout \end_inset \end_inset \end_layout \begin_layout Section Resource Operations \begin_inset CommandInset label LatexCommand label name "sec:Resource-Operations" \end_inset \end_layout \begin_layout Standard Common precondition for all resource operations is that the \family typewriter /mars/ \family default filesystem is mounted, that it contains a valid MARS symlink tree produced by other \family typewriter marsadm \family default commands (including a unique \family typewriter uuid \family default ), that your current node is a valid member of the cluster, and that the kernel module is loaded. When communication is impossible due to network outages or bad firewall rules, most commands will succeed, but other cluster nodes may take a long time to notice your changes. \end_layout \begin_layout Standard Instead of executing \family typewriter marsadm \family default commands several times for each resource argument, you may give the special resource argument \family typewriter all \family default . This works even when combined with \family typewriter --force \family default , but be cautious when giving dangerous command combinations like \family typewriter marsadm delete-resource --force all \family default . \end_layout \begin_layout Standard \begin_inset Graphics filename images/MatieresCorrosives.png lyxscale 50 scale 17 \end_inset Beware when combining this with \family typewriter --host=somebody \family default . In some very rare cases, like final destruction of a whole datacenter after an earthquake, you might need a combination like \family typewriter marsadm --host=defective delete-resource --force all \family default . Don't use such combinations unless you \emph on really \emph default need them! You can easily shoot yourself in the head if you do not operate such commands carefully!
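\end_layout

\begin_layout Standard
Several
\family typewriter
marsadm
\family default
arguments, such as
\family typewriter
--threshold=$size
\family default
and the optional
\family typewriter
$size
\family default
argument of
\family typewriter
create-resource
\family default
, take a number with an optional suffix: lowercase
\family typewriter
k m g t p
\family default
for multiples of 1000, uppercase
\family typewriter
K M G T P
\family default
for multiples of 1024. When a script needs to compute with such values, the rule can be sketched in Python as below. This is an illustration of the documented rule, not the parser used by
\family typewriter
marsadm
\family default
itself; the function name
\family typewriter
parse_size
\family default
is hypothetical, and the restriction to non-negative whole numbers is an assumption.

```python
# Size factors as documented: lowercase = powers of 1000, uppercase = powers of 1024.
DECIMAL_FACTORS = {s: 1000 ** (i + 1) for i, s in enumerate("kmgtp")}
BINARY_FACTORS = {s: 1024 ** (i + 1) for i, s in enumerate("KMGTP")}


def parse_size(text):
    """Convert a size string such as '10M' or '500k' into a number of bytes."""
    text = text.strip()
    if not text:
        raise ValueError("empty size argument")
    suffix = text[-1]
    if suffix in DECIMAL_FACTORS:
        factor, digits = DECIMAL_FACTORS[suffix], text[:-1]
    elif suffix in BINARY_FACTORS:
        factor, digits = BINARY_FACTORS[suffix], text[:-1]
    else:
        factor, digits = 1, text
    # Assumption: only plain non-negative integers are accepted here.
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"not a valid size: {text!r}")
    return int(digits) * factor


if __name__ == "__main__":
    for example in ("4096", "10m", "10M"):
        print(example, parse_size(example))
```

For example, the documented default threshold of 10MiB corresponds to
\family typewriter
parse_size("10M")
\family default
, while
\family typewriter
parse_size("10m")
\family default
yields the slightly smaller decimal value.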
\end_layout \begin_layout Subsection Resource Creation / Deletion / Modification \begin_inset CommandInset label LatexCommand label name "sub:Resource-Creation" \end_inset \end_layout \begin_layout Standard \size scriptsize \begin_inset Tabular \begin_inset Text \begin_layout Plain Layout \size scriptsize Command / Params \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize Cmp \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize Description \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize create-resource \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $res \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $disk_dev \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset [$mars_name] \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset [$size] \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Precondition: the resource 
argument \family typewriter $res \family default must not denote an already existing resource name in the cluster. The argument \family typewriter $disk_dev \family default must denote an absolute path to a usable local block device, and its size must be greater than zero. When the optional \family typewriter $mars_name \family default is given, that name must not already exist on the local node; when not given, \family typewriter $mars_name \family default defaults to \family typewriter $res \family default . When the optional \family typewriter $size \family default argument is given, it must be a number, optionally followed by a lowercase suffix \family typewriter k \family default , \family typewriter m \family default , \family typewriter g \family default , \family typewriter t \family default , or \family typewriter p \family default (denoting size factors as multiples of 1000), or an uppercase suffix \family typewriter K \family default , \family typewriter M \family default , \family typewriter G \family default , \family typewriter T \family default or \family typewriter P \family default (denoting size factors as multiples of 1024). The given size must not exceed the actual size of \family typewriter $disk_dev \family default . It will specify the future resource size as shown by \family typewriter marsadm view-resource-size $res \family default . \end_layout \begin_layout Plain Layout \size scriptsize Postcondition: the resource \family typewriter $res \family default is created, the initial role of the current node is primary. The corresponding symlink tree information is asynchronously distributed in the cluster (in the background). The device \family typewriter /dev/mars/$mars_name \family default should appear after a while. \end_layout \begin_layout Plain Layout \size scriptsize Notice: when \family typewriter $size \family default is strictly smaller than the size of \family typewriter $disk_dev \family default , you will unnecessarily waste some space. 
\end_layout \begin_layout Plain Layout \size scriptsize This must be called exactly once for any new resource. \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize join-resource \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $res \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $disk_dev \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset [$mars_name] \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Precondition: the resource argument \family typewriter $res \family default must denote an already existing resource in the cluster (i.e. its symlink tree information must have been received). The resource must have a designated primary, and it must not be in emergency mode. There must not exist a split brain in the cluster. The local node must not already be a member of that resource. 
The argument \family typewriter $disk_dev \family default must denote an absolute path to a usable (but currently unused) local block device, and its size must be greater than or equal to the logical size of the resource. When the optional \family typewriter $mars_name \family default is given, that name must not already exist on the local node; when not given, \family typewriter $mars_name \family default defaults to \family typewriter $res \family default . \end_layout \begin_layout Plain Layout \size scriptsize Postcondition: the current node becomes a member of resource \family typewriter $res \family default , the initial role is secondary. The initial full sync should start after a while. \end_layout \begin_layout Plain Layout \size scriptsize Notice: when the size of \family typewriter $disk_dev \family default is strictly greater than the size of the resource, you will unnecessarily waste some space. \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize leave-resource \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $res \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Precondition: the local node must be a member of the resource \family typewriter $res \family default ; its current role must be 
secondary. Sync, fetch and replay must be paused (see commands \family typewriter pause-{sync,fetch,replay} \family default or their abbreviation \family typewriter down \family default ). The disk must be detached (see commands \family typewriter detach \family default or \family typewriter down \family default ). The kernel module should be loaded and the network should be operating in order to also propagate the effect to the other nodes. \end_layout \begin_layout Plain Layout \size scriptsize Postcondition: the local node is no longer a member of \family typewriter $res \family default . \end_layout \begin_layout Plain Layout \size scriptsize Notice: as a side effect for other nodes, their \family typewriter log-delete \family default may now become possible, since the current node no longer counts as a candidate for logfile application. In addition, a split brain situation may be (partly) resolved by this. \end_layout \begin_layout Plain Layout \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset \size scriptsize Please notice that this command \emph on may \emph default lead to (but does not guarantee) split-brain resolution. \end_layout \begin_layout Plain Layout \begin_inset Graphics filename images/MatieresCorrosives.png lyxscale 50 scale 17 \end_inset \size scriptsize The contents of the disk are not changed by this command. Before issuing this command, check whether the disk appears to be locally consistent (see \family typewriter view-is-consistent \family default )! After giving this command, any internal information indicating the consistency state will be gone, and you will no longer be able to guess consistency properties. 
\end_layout \begin_layout Plain Layout \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset \size scriptsize When you are \emph on sure \emph default that the disk was consistent before (or is now by manually checking it), you may re-create a new resource out of it via \family typewriter create-resource \family default . \end_layout \begin_layout Plain Layout \size scriptsize In case of a node loss (e.g. fire, water, ...) this command may be used on another node $helper in order to finally remove all the resources $damaged from the cluster via the command \family typewriter marsadm leave-resource $res --host=$damaged --force \family default . \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize delete-resource \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $res \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Precondition: the resource must be empty (i.e. all members must have left via \family typewriter leave-resource \family default ). This precondition is overridable by \family typewriter --force \family default , increasing the danger to maximum! 
It is even possible to combine \family typewriter --force \family default with an invalid resource argument and an invalid \family typewriter --host=somebodyelse \family default argument in order to desperately try to destroy remains of incomplete or physically damaged hardware. \end_layout \begin_layout Plain Layout \size scriptsize Postcondition: all cluster members will eventually be forcefully removed from \family typewriter $res \family default . In case of network interruptions, the forced removal may take place far in the future. \end_layout \begin_layout Plain Layout \begin_inset Graphics filename images/MatieresToxiques.png lyxscale 50 scale 17 \end_inset THIS COMMAND IS \emph on VERY \emph default DANGEROUS! \end_layout \begin_layout Plain Layout \size scriptsize Use this only in desperate situations, and only manually. Don't call this from scripts. You are forcefully using a sledgehammer, even without \family typewriter --force \family default ! The danger is that the \emph on true \emph default state of other cluster nodes need not be known in case of network problems. Even if it were known, it could be compromised by \series bold byzantine failures \series default . \end_layout \begin_layout Plain Layout \size scriptsize It is strongly advised to try this command with \family typewriter --dry-run \family default first. \end_layout \begin_layout Plain Layout \size scriptsize When combined with \family typewriter --force \family default , this command will definitely \series bold murder \series default other cluster nodes, possibly after a long while, and even when they are operating in primary mode / having split brains / etc. However, there is no guarantee that other cluster nodes will be \emph on really \emph default dead -- it is (theoretically) possible that they remain only \emph on half \emph default \emph on dead \emph default . 
For example, a half dead node may continue to write data to \family typewriter /mars/ \family default and thus eventually lead to an overflow. \end_layout \begin_layout Plain Layout \begin_inset Graphics filename images/MatieresToxiques.png lyxscale 50 scale 17 \end_inset This command implies a forceful detach, possibly destroying consistency. \size scriptsize It is similar in spirit to a \series bold STONITH \series default . In particular, when a cluster node was operating in primary mode ( \family typewriter /dev/mars/mydata \family default being continuously in use), the forceful detach cannot be carried out until the device is completely unused. In the meantime, the current transaction logfile will be appended to, but the file \emph on might \emph default be already unlinked (orphan file filling up the disk). After the forceful detach, the underlying disk need not be consistent (although MARS does its best). Since this command deletes any symlinks which normally would indicate the consistency state, no guarantees about consistency can be given after this \emph on in general \emph default ! Always check consistency by hand! \end_layout \begin_layout Plain Layout \size scriptsize When possible / as soon as possible, check the local state on the other nodes in order to \emph on really \emph default shut down the resource everywhere (e.g. to \emph on really \emph default unuse the \family typewriter /dev/mars/mydata \family default device, etc). \end_layout \begin_layout Plain Layout \size scriptsize After this command, you \emph on should \emph default rebuild the resource under a different name, in order to avoid any clashes caused by unexpected resurrection of \begin_inset Quotes eld \end_inset dead \begin_inset Quotes erd \end_inset or \begin_inset Quotes eld \end_inset half-dead \begin_inset Quotes erd \end_inset nodes (beware of snapshot / restores on virtual machines!!). 
MARS does its best to avoid problems even in case the new resource name should equal the old one, but there can be \emph on no guarantee \emph default in all possible failure scenarios / usage scenarios. \end_layout \begin_layout Plain Layout \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset \size scriptsize When possible, prefer \family typewriter leave-resource \family default over this! \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize wait-resource \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $res \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset {is-,}{attach, \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset primary, \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset device}{-off,} \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize See section \begin_inset CommandInset ref LatexCommand ref reference 
"sub:Waiting" \end_inset . \end_layout \end_inset \end_layout \end_inset \end_inset \end_layout \begin_layout Subsection Operation of the Resource \begin_inset CommandInset label LatexCommand label name "sub:Operation-of-the" \end_inset \end_layout \begin_layout Standard Common preconditions are the preconditions from section \begin_inset CommandInset ref LatexCommand ref reference "sec:Resource-Operations" \end_inset , plus the respective resource \family typewriter $res \family default must exist, and the local node must be a member of it. With the single exception of \family typewriter attach \family default itself, all other operations must be started in \family typewriter attached \family default state. \end_layout \begin_layout Standard When \family typewriter $res \family default has the special reserved value \family typewriter all \family default , the following operations will work on all resources where the current node is a member (analogously to DRBD). \end_layout \begin_layout Standard \size scriptsize \begin_inset Tabular \begin_inset Text \begin_layout Plain Layout \size scriptsize Command / Params \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize Cmp \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize Description \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize attach \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $res \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize yes \end_layout \end_inset \begin_inset Text 
\begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Precondition: the local disk belonging to $res is not in use by anyone else. Its contents have not been altered in the meantime since the last \family typewriter detach \family default . \end_layout \begin_layout Plain Layout \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset \size scriptsize Mounting \emph on read-only \emph default is allowed during the detached phase. \end_layout \begin_layout Plain Layout \begin_inset Graphics filename images/MatieresToxiques.png lyxscale 50 scale 17 \end_inset \size scriptsize However, be careful! If you \emph on accidentally \emph default forget to give the right readonly-mount flags, if you use \family typewriter fsck \family default in repair mode in between, or alter the disk content in any other way (beware of LVM snapshots / restores etc), you will almost certainly produce an \series bold unnoticed inconsistency \series default (not reported by \family typewriter view-is-consistent \family default )! MARS has \emph on no chance \emph default to notice such changes! \end_layout \begin_layout Plain Layout \size scriptsize Postcondition: MARS uses the local disk and is able to work with it (e.g. replay logfiles on it). \end_layout \begin_layout Plain Layout \size scriptsize Note: the local disk is opened in exclusive read-write mode. This should protect against most common misuse, such as opening the disk in parallel to MARS. \end_layout \begin_layout Plain Layout \begin_inset Graphics filename images/MatieresToxiques.png lyxscale 50 scale 17 \end_inset \size scriptsize However, this does not necessarily protect against non-exclusive openers. 
\end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize detach \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $res \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize yes \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Precondition: the local \family typewriter /dev/mars/mydata \family default device (when present) is no longer opened by anybody. \end_layout \begin_layout Plain Layout \size scriptsize Postcondition: the local disk belonging to $res is no longer in use. \end_layout \begin_layout Plain Layout \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset \size scriptsize In contrast to DRBD, you need not explicitly pause syncing, fetching, or replaying \emph on to \emph default (as opposed to \emph on from \emph default ) the local disk. These processes are automatically paused. As another contrast to DRBD, the respective processes will usually \emph on automatically \emph default resume after re-attach, as far as possible in the respective new situation. This will usually work even over \family typewriter rmmod \family default or reboot cycles, since the internal symlink tree will automatically persist all todo switches for you (cf. 
section \begin_inset CommandInset ref LatexCommand ref reference "sec:The-State-of" \end_inset ). \end_layout \begin_layout Plain Layout \begin_inset Graphics filename images/MatieresCorrosives.png lyxscale 50 scale 17 \end_inset \size scriptsize Notice: only \emph on local \emph default transfer operations \emph on to \emph default the local disk are paused by a detach. When another node is remotely running a sync \emph on from \emph default your local disk, it will likely remain in use for remote reading. The reason is that the server part of MARS is operating purely passively, in order to serve all remote requests as best as possible (similar to the original Unix philosophy). In order to really stop all accesses, do a \family typewriter pause-sync \family default on all other resource members where a sync is currently running. You may also try \family typewriter pause-sync-global \family default . \end_layout \begin_layout Plain Layout \begin_inset Graphics filename images/MatieresToxiques.png lyxscale 50 scale 17 \end_inset \size scriptsize WARNING! After this, and after having paused any remote data access, you might use the underlying disk for your own purposes, such as test-mounting it in \emph on readonly \emph default mode. \series bold Don't modify \series default its contents in any way! Not even by an \family typewriter fsck \family default \begin_inset Foot status open \begin_layout Plain Layout \size scriptsize Some (but not all) \family typewriter fsck \family default tools for some filesystems have options to start only a test repair / verify mode / dry run, without doing actual modifications to the data. Of course, these modes \emph on can \emph default be used. But be really sure! Double-check for the right options! \end_layout \end_inset ! Otherwise, you will have inconsistencies \emph on guaranteed \emph default . MARS has no way of knowing about any modifications to your disk when bypassing \family typewriter /dev/mars/* \family default . 
\end_layout \begin_layout Plain Layout \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset \size scriptsize In case you accidentally modified the underlying disk at the \emph on primary \emph default side, you may choose to resolve the inconsistencies by \family typewriter marsadm invalidate $res \family default on \emph on each \emph default secondary. \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize pause-sync \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $res \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize partly \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Equivalent to \family typewriter pause-sync-local \family default . 
\end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize pause-sync-local \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $res \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize partly \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Precondition: none additionally. \end_layout \begin_layout Plain Layout \size scriptsize Postcondition: any sync operation targeting the local disk (when not yet completed) is paused after a while (cf. section \begin_inset CommandInset ref LatexCommand ref reference "sec:The-State-of" \end_inset ). When successfully completed, this operation will remember the switch state forever and automatically become relevant if a sync is needed again (e.g. \family typewriter invalidate \family default or \family typewriter resize \family default ). 
\end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize pause-sync-global \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $res \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize partly \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Like \family typewriter *-local \family default , but operates on all members of the resource. 
\end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize resume-sync \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $res \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize partly \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Equivalent to \family typewriter resume-sync-local \family default . 
\end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize resume-sync-local \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $res \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize partly \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Precondition: additionally, a primary must be designated, and it must not be in emergency mode. \end_layout \begin_layout Plain Layout \size scriptsize Postcondition: any sync operation targeting the local disk (when not yet completed) is resumed after a while. When completed, this operation will remember the switch state forever and become relevant if a sync is needed again (e.g. \family typewriter invalidate \family default or \family typewriter resize \family default ). 
\end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize resume-sync-global \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $res \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize partly \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Like \family typewriter *-local \family default , but operates on all members of the resource. 
\end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize pause-fetch \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $res \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize partly \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Equivalent to \family typewriter pause-fetch-local \family default . 
\end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize pause-fetch-local \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $res \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize partly \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Precondition: none additionally. The resource \emph on should \emph default be in secondary role. Otherwise the switch has \emph on no \emph default \emph on immediate \emph default effect, but will come (possibly unexpectedly) into effect whenever secondary role is entered later for whatever reason. \end_layout \begin_layout Plain Layout \size scriptsize Postcondition: any transfers of (parts of) transaction logfiles which are present at another primary host to the local \family typewriter /mars/ \family default storage are paused at their current stage. \end_layout \begin_layout Plain Layout \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset \size scriptsize This switch works independently from \family typewriter {pause,resume}-replay \family default . 
\end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize pause-fetch-global \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $res \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize partly \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Like \family typewriter *-local \family default , but operates on all members of the resource. 
\end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize resume-fetch \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $res \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize partly \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Equivalent to \family typewriter resume-fetch-local \family default . 
\end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize resume-fetch-local \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $res \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize partly \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Precondition: none additionally. The resource \emph on should \emph default be in secondary role. Otherwise the switch has \emph on no \emph default \emph on immediate \emph default effect, but will come (possibly unexpectedly) into effect whenever secondary role is entered later for whatever reason. \end_layout \begin_layout Plain Layout \size scriptsize Postcondition: any (parts of) transaction logfiles which are present at another primary host should be transferred to the local \family typewriter /mars/ \family default storage as far as they are not yet locally present. \end_layout \begin_layout Plain Layout \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset \size scriptsize This works independently from \family typewriter {pause,resume}-replay \family default . 
\end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize resume-fetch-global \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $res \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize partly \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Like \family typewriter *-local \family default , but operates on all members of the resource. 
\end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize pause-replay \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $res \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize partly \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Equivalent to \family typewriter pause-replay-local \family default . 
\end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize pause-replay-local \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $res \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize partly \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Precondition: none additionally. The resource \emph on should \emph default be in secondary role. Otherwise the switch has \emph on no \emph default \emph on immediate \emph default effect, but will come (possibly unexpectedly) into effect whenever secondary role is entered later for whatever reason. \end_layout \begin_layout Plain Layout \size scriptsize Postcondition: any local replay operations of transaction logfiles to the local disk are paused at their current stage. \end_layout \begin_layout Plain Layout \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset \size scriptsize This works independently from \family typewriter {pause,resume}-fetch \family default resp. \family typewriter {dis,}connect \family default . 
\end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize pause-replay-global \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $res \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize partly \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Like \family typewriter *-local \family default , but operates on all members of the resource. 
\end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize resume-replay \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $res \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize partly \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Equivalent to \family typewriter resume-replay-local \family default . 
\end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize resume-replay-local \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $res \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize partly \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status collapsed \begin_layout Plain Layout \size scriptsize Precondition: must be in secondary role. \end_layout \begin_layout Plain Layout \size scriptsize Postcondition: any (parts of) locally existing transaction logfiles (whether replicated from other hosts or produced locally) are started for replay to the local disk, as far as they have not yet been applied. 
\end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize resume-replay-global \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $res \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize partly \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Like \family typewriter *-local \family default , but operates on all members of the resource. 
\end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize connect \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $res \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize partly \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Equivalent to \family typewriter connect-local \family default and to \family typewriter resume-fetch-local \family default . \end_layout \begin_layout Plain Layout \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset \size scriptsize Note: although this sounds similar to DRBD's \family typewriter drbdadm connect \family default , there are subtle differences. DRBD has exactly one connection per resource, which is associated with \emph on pairs \emph default of nodes. In contrast, MARS may create multiple connections per resource at runtime, and these are associated with the \emph on target \emph default host (not with \emph on pairs \emph default of hosts). 
As a consequence, the fetch may \emph on potentially \emph default occur from any other source host which happens to be reachable (the current implementation prefers the current designated primary, but this may change in the future). In addition, \family typewriter marsadm disconnect \family default does not stop \emph on all \emph default communication. It only stops fetching logfiles. The symlink update running in background is \emph on not \emph default stopped, in order to always propagate as much metadata as possible in the cluster. In case of a later incident, this improves the chances of knowing the \emph on real \emph default state of the cluster. \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize connect-local \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $res \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize partly \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Equivalent to \family typewriter resume-fetch-local \family default . 
\end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize connect-global \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $res \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize partly \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Equivalent to \family typewriter resume-fetch-global \family default . 
\end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize disconnect \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $res \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize partly \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Equivalent to \family typewriter disconnect-local \family default and to \family typewriter pause-fetch-local \family default . \end_layout \begin_layout Plain Layout \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset \size scriptsize See above note at \family typewriter connect \family default . 
\end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize disconnect-local \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $res \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize partly \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Equivalent to \family typewriter pause-fetch-local \family default . 
\end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize disconnect-global \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $res \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize partly \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Equivalent to \family typewriter pause-fetch-global \family default . 
\end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize up \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $res \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize yes \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Equivalent to \family typewriter attach \family default followed by \family typewriter resume-fetch \family default followed by \family typewriter resume-replay \family default followed by \family typewriter resume-sync \family default . 
\end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize down \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $res \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize yes \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Equivalent to \family typewriter pause-sync \family default followed by \family typewriter pause-fetch \family default followed by \family typewriter pause-replay \family default followed by \family typewriter detach \family default . \end_layout \begin_layout Plain Layout \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset \size scriptsize Hint: consider preferring plain \family typewriter detach \family default over this, because \family typewriter detach \family default will remember the last state of all switches, while \family typewriter down \family default will \emph on not \emph default . 
\end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize primary \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $res \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize almost \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Precondition: sync must have finished at every resource member. All relevant transaction logfiles must be either already locally present, or be fetchable (see \family typewriter resume-fetch \family default and \family typewriter resume-replay \family default ). When some logfile data is locally missing, there must be enough space on \family typewriter /mars/ \family default to fetch it. Any replay must not have been interrupted by a replay error (see macro %replay-code{} or diskstate \family typewriter DefectiveLog \family default ). The current designated primary must be reachable over the network. When there is no designated primary (i.e. 
\family typewriter marsadm secondary \family default had been executed before, which is explicitly \emph on not recommended \emph default ), \emph on all \emph default other members of the resource must be reachable (since there is no record of which host was the old primary), and then they must also match the same preconditions. When another host is currently primary (whether designated or not), it must match the preconditions of \family typewriter marsadm secondary \family default (that means, its local \family typewriter /dev/mars/mydata \family default device must not be in use any more). A split brain must not already exist. \end_layout \begin_layout Plain Layout \size scriptsize Postcondition: \family typewriter /dev/mars/$dev_name \family default appears locally and is usable; the current host is in primary role. \end_layout \begin_layout Plain Layout \size scriptsize Switches the \series bold designated primary \series default . There are two variants: \end_layout \begin_layout Plain Layout \size scriptsize 1) \series bold Handover \series default when \emph on not \emph default giving \family typewriter --force \family default : when another host is currently primary, it is first asked to leave its primary role, and the command waits until it has actually become secondary. After that, the local host is asked to become primary. Before actually becoming primary, all relevant logfiles are transferred over the network and replayed, in order to avoid accidental creation of split brain as best as possible \begin_inset Foot status open \begin_layout Plain Layout \size scriptsize Note that split brain avoidance is \series bold best effort \series default and cannot be guaranteed in general. For example, it may be impossible to avoid split brain in case of long-lasting network outages. \end_layout \end_inset . Only after that, \family typewriter /dev/mars/$dev_name \family default will appear. 
When network transfers of the symlink tree are very slow (or currently impossible), this command may take a very long time. \end_layout \begin_layout Plain Layout \size scriptsize In case a split brain is already detected at the initial situation, the local host will refuse to switch the designated primary without \family typewriter --force \family default . \end_layout \begin_layout Plain Layout \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset \size scriptsize In case of \begin_inset Formula $k>2$ \end_inset replicas: if you want to handover between host \family typewriter A \family default and \family typewriter B \family default while a sync is currently running at host \family typewriter C \family default , you have the following options: \end_layout \begin_layout Enumerate \size scriptsize wait until the sync has finished (see macro \family typewriter sync-rest \family default , or \family typewriter marsadm view \family default in general). \end_layout \begin_layout Enumerate \size scriptsize do a \family typewriter leave-resource \family default on host \family typewriter C \family default , and later \family typewriter join-resource \family default after the handover completed successfully. \end_layout \begin_layout Plain Layout \size scriptsize 2) \series bold Forced switching \series default : by giving --force while \family typewriter pause-fetch \family default is active (but not \family typewriter pause-replay \family default ), most preconditions are ignored, and MARS does its best to actually become primary even if some logfiles are missing or incomplete or even defective. 
\end_layout \begin_layout Plain Layout \begin_inset Graphics filename images/MatieresToxiques.png lyxscale 50 scale 17 \end_inset \family typewriter \size scriptsize primary --force \family default is a potentially harmful variant, because it will provoke a split brain in many cases, and therefore in turn will lead to \series bold data loss \series default because one of your split brain versions must be discarded later in order to resolve the split brain (see section \begin_inset CommandInset ref LatexCommand ref reference "sub:Split-Brain-Resolution" \end_inset ). \end_layout \begin_layout Plain Layout \begin_inset Graphics filename images/MatieresToxiques.png lyxscale 50 scale 17 \end_inset \series bold \size scriptsize Never \series default call \family typewriter primary --force \family default when \family typewriter primary \family default without \family typewriter --force \family default is sufficient! If \family typewriter primary \family default without \family typewriter --force \family default complains that the device is in use at the former primary side, take it seriously! Don't override with \family typewriter --force \family default , but rather umount \begin_inset Foot status open \begin_layout Plain Layout \size scriptsize A common misconception is when people think that they can keep their filesystem mounted without provoking a split brain, because they have their application stopped and thus don't write any data into the filesystem. This is a wrong idea, because filesystems may write some metadata, like booking information, even after hours or days of inactivity. Therefore MARS insists that the device is no longer in use before any handover can take place. \end_layout \end_inset the device at the other side! 
\end_layout \begin_layout Plain Layout \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset \size scriptsize Only use \family typewriter primary --force \family default when something is \emph on already broken \emph default , such as a network outage, or a node crash, etc. During ordinary operations (network OK, nodes OK), you should never need \family typewriter primary --force \family default ! \end_layout \begin_layout Plain Layout \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset \size scriptsize If you umount \family typewriter /dev/mars/mydata \family default on the old primary \family typewriter A \family default , and then wait until \family typewriter marsadm view \family default (or another suitable macro) on the target host \family typewriter B \family default shows that everything is \family typewriter UpToDate \family default , you can prevent a split brain by yourself even when giving \family typewriter primary --force \family default afterwards. However, checking / assuring this is \emph on your \emph default responsibility! \end_layout \begin_layout Plain Layout \begin_inset Graphics filename images/MatieresCorrosives.png lyxscale 50 scale 17 \end_inset \family typewriter \size scriptsize primary --force \family default switches the \emph on designated \emph default primary. In some extremely rare cases, when \emph on multiple \emph default faults have accumulated in a \emph on weird \emph default situation, it \emph on might \emph default be impossible to become an actual primary. Typically you may be \emph on already \emph default in a split brain situation. This has not been observed over a long period of operation with recent versions of MARS, but in general, becoming primary via \family typewriter --force \family default cannot always be guaranteed, although MARS does its best. 
In split brain situations, or if you ever encounter such a problem, you \emph on must \emph default resolve the split brain immediately after giving this command (see section \begin_inset CommandInset ref LatexCommand ref reference "sub:Split-Brain-Resolution" \end_inset ). \end_layout \begin_layout Plain Layout \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset \size scriptsize Hint in case of \begin_inset Formula $k>2$ \end_inset replicas: \family typewriter marsadm invalidate \family default cannot always resolve a split brain at other secondaries (which are neither the old nor the new designated primary). Therefore, prefer the \family typewriter leave-resource \family default method described in section \begin_inset CommandInset ref LatexCommand ref reference "sub:Split-Brain-Resolution" \end_inset , starting with a \family typewriter leave-resource \family default phase at the old primary, and proceeding to \begin_inset Quotes eld \end_inset unrelated \begin_inset Quotes erd \end_inset secondaries step by step, until the split brain is gone. Don't \family typewriter join-resource \family default again before the split brain is gone! This way, all these replicas will remain consistent for now, but of course outdated (or potentially even a \begin_inset Quotes eld \end_inset wrong \begin_inset Quotes erd \end_inset split-brain version, but \emph on potentially usable \emph default in case you get under pressure in some way). 
In the hopefully unlikely case that you should later discover that you accidentally forced the \emph on wrong \emph default replica via \family typewriter primary --force \family default , you will have a chance to recover by either forcing the \begin_inset Quotes eld \end_inset correct \begin_inset Quotes erd \end_inset host to primary (if it did not already leave the resource), or by creating a completely fresh resource out of the \begin_inset Quotes eld \end_inset correct \begin_inset Quotes erd \end_inset local disk. \end_layout \begin_layout Plain Layout \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset \size scriptsize Generally: in case of \family typewriter primary --force \family default , the preconditions are different. The fetch \emph on must \emph default be switched off (see \family typewriter pause-fetch \family default ), in order to get stable logfile positions. See section \begin_inset CommandInset ref LatexCommand ref reference "sub:Forced-Switching" \end_inset . 
\end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize secondary \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $res \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize almost \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Precondition: the local \family typewriter /dev/mars/$dev_name \family default is no longer in use (e.g. umounted). \end_layout \begin_layout Plain Layout \size scriptsize Postcondition: There exists no designated primary any more. During split brain and when the network is OK (again), all actual primaries (including the local host) will leave primary ASAP (i.e. when their \family typewriter /dev/mars/mydata \family default is no longer in use). Any secondary will start following (old) logfiles (even from backlogs) by replaying transaction logs if it is \emph on uniquely \emph default possible (which is often violated during split brain). On any secondary, \family typewriter /dev/mars/$dev_name \family default will have disappeared. 
\end_layout \begin_layout Plain Layout \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset \size scriptsize Notice: in contrast to DRBD, you \series bold don't need \series default this command during normal operation, including handover. Any resource member which is \emph on not \emph default designated as primary will \emph on automatically \emph default go into secondary role. For example, if you have \begin_inset Formula $k=4$ \end_inset replicas, only \emph on one of them \emph default can be designated as a primary. When the network is OK, all other 3 nodes will know this fact, and they will \emph on automatically \emph default go into secondary mode, following the transaction logs from the (new) primary. \end_layout \begin_layout Plain Layout \begin_inset Graphics filename images/MatieresCorrosives.png lyxscale 50 scale 17 \end_inset \size scriptsize Hint: avoid this command. It turns off \emph on any \emph default primary, \series bold globally \series default \begin_inset Foot status open \begin_layout Plain Layout \size scriptsize A serious \series bold misconception \series default among some people is the belief that they can switch \begin_inset Quotes eld \end_inset a certain node to secondary \begin_inset Quotes erd \end_inset . It is not possible to switch individual nodes to secondary without affecting other nodes! The concept of \begin_inset Quotes eld \end_inset designated primary \begin_inset Quotes erd \end_inset is \series bold global \series default throughout a resource! \end_layout \end_inset . You cannot start a sync after that (e.g. \family typewriter invalidate \family default or \family typewriter join-resource \family default or \family typewriter resume-sync \family default ), because it is \emph on not unique \emph default from where the data shall be fetched. In split brain situations (when the network is OK again), this may have further drawbacks. 
It is much better / easier to \series bold \emph on directly \emph default switch the designated primary \series default from one node to another via the \family typewriter primary \family default command. See also section \begin_inset CommandInset ref LatexCommand ref reference "sub:Forced-Switching" \end_inset . \end_layout \begin_layout Plain Layout \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset \size scriptsize There is only one valid use case where you \emph on really \emph default need this command: it is required before finally destroying a resource via the \emph on last \emph default \family typewriter leave-resource \family default (or the dangerous \family typewriter delete-resource \family default ). \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize wait-umount \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $res \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize See 
section \begin_inset CommandInset ref LatexCommand ref reference "sub:Waiting" \end_inset . \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize log-purge-all \begin_inset CommandInset label LatexCommand label name "log-purge-all$res" \end_inset \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $res \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Precondition: none additionally. \end_layout \begin_layout Plain Layout \size scriptsize Postcondition: all locally known logfiles and version links are removed, whenever they are not / no longer reachable by any split brain version. \end_layout \begin_layout Plain Layout Rationale: remove hindering split-brain / \family typewriter leave-resource \family default leftovers. 
\end_layout \begin_layout Plain Layout \size scriptsize Use this only when split brain does not go away by means of \family typewriter leave-resource \family default (which \emph on could \emph default happen in very weird scenarios such as MARS running on virtual machines doing a restore of their snapshots, or otherwise unexpected resurrection of dead or half-dead nodes). \end_layout \begin_layout Plain Layout \begin_inset Graphics filename images/MatieresToxiques.png lyxscale 50 scale 17 \end_inset THIS IS POTENTIALLY DANGEROUS! \end_layout \begin_layout Plain Layout \size scriptsize This command \emph on might \emph default destroy some valuable logfiles / other information in case the local information is outdated or otherwise incorrect. MARS does its best to check everything, but there is no guarantee. \end_layout \begin_layout Plain Layout \size scriptsize Hint: use \family typewriter --dry-run \family default beforehand for checking! \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize resize \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $res \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset [$size] \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize almost \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special 
"none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Precondition: The local host must be primary. All disks in the cluster participating in \family typewriter $res \family default must be physically larger than the logical resource size (e.g, by use of \family typewriter lvm \family default ; can be checked by macros \family typewriter %disk-size{} \family default and \family typewriter %resource-size{} \family default ). When the optional \family typewriter $size \family default argument is present, it must be smaller than the minimum of all physical sizes, but larger than the current logical size of the resource. \end_layout \begin_layout Plain Layout \size scriptsize Postcondition: the logical size of \family typewriter /dev/mars/$dev_name \family default will reflect the new size after a while. \end_layout \end_inset \end_layout \end_inset \end_inset \end_layout \begin_layout Subsection Logfile Operations \end_layout \begin_layout Standard \size scriptsize \begin_inset Tabular \begin_inset Text \begin_layout Plain Layout \size scriptsize Command / Params \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize Cmp \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize Description \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize log-rotate \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $res \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text 
\begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Precondition: the local node \family typewriter $host \family default must be primary at \family typewriter $res \family default . \end_layout \begin_layout Plain Layout \size scriptsize Postcondition: after a while, a new transaction logfile \family typewriter /mars/resource-$res/log-$new_nr-$host \family default will be used instead of \family typewriter /mars/resource-$res/log-$old_nr-$host \family default where \family typewriter $new_nr \family default = \family typewriter $old_nr \family default + 1. Without \family typewriter --force \family default , this will only carry out actions at the primary side since it makes no sense on secondaries. With \family typewriter --force \family default , secondaries are \emph on trying \emph default to \emph on remotely \emph default trigger a log-rotate, but without any guarantee (likely even a split-brain may result instead, so use this only if you are \emph on really \emph default desperate). 
\end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize log-delete \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $res \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Precondition: the local node must be a member of \family typewriter $res \family default . \end_layout \begin_layout Plain Layout \size scriptsize Postcondition: when there exists an old transaction logfile \family typewriter /mars/resource-$res/log-$old_nr-$some_host \family default where \family typewriter $old_nr \family default is the minimum existing number and that logfile is no longer referenced by any of the symlinks \family typewriter /mars/resource-$res/replay-* \family default , that logfile is marked for deletion in the whole cluster. When no such logfile exists, nothing will happen. 
\end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize log-delete-all \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $res \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Like \family typewriter log-delete \family default , but mark \emph on all \emph default currently unreferenced logfiles for deletion. 
\end_layout \end_inset \end_layout \end_inset \end_inset \end_layout \begin_layout Subsection Consistency Operations \end_layout \begin_layout Standard \size scriptsize \begin_inset Tabular \begin_inset Text \begin_layout Plain Layout \size scriptsize Command / Params \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize Cmp \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize Description \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize invalidate \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $res \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Precondition: the local node must be in secondary role at \family typewriter $res \family default . A \emph on designated \emph default primary must exist. 
When having \begin_inset Formula $k>2$ \end_inset replicas, no split brain must exist (otherwise, or when \family typewriter invalidate \family default does not work in case of \begin_inset Formula $k=2$ \end_inset , use the \family typewriter leave-resource \family default ; \family typewriter join-resource \family default method described in section \begin_inset CommandInset ref LatexCommand ref reference "sub:Split-Brain-Resolution" \end_inset ). \end_layout \begin_layout Plain Layout \size scriptsize Postcondition: the local disk is marked as inconsistent, and a fast fullsync from the designated primary will start after a while. Notice that \family typewriter marsadm {pause,resume}-sync \family default will influence whether the sync really starts. When the fullsync has finished successfully, the local node will be consistent again. \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize fake-sync \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $res \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Precondition: the local node must be in secondary role at \family typewriter $res \family default . 
\end_layout \begin_layout Plain Layout \size scriptsize Postcondition: when a fullsync is running, it will stop after a while, and the local node will be \emph on marked \emph default as consistent as if it were consistent again. \end_layout \begin_layout Plain Layout \begin_inset Graphics filename images/MatieresToxiques.png lyxscale 50 scale 17 \end_inset \size scriptsize ONLY USE THIS IF YOU REALLY KNOW WHAT YOU ARE DOING! \begin_inset Newline newline \end_inset See the WARNING in section \begin_inset CommandInset ref LatexCommand ref reference "sec:Creating-and-Maintaining" \end_inset \begin_inset Newline newline \end_inset Use this only \emph on before \emph default creating a fresh filesystem inside \family typewriter /dev/mars/$res \family default . \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize set-replay \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \begin_inset Graphics filename images/MatieresToxiques.png lyxscale 50 scale 17 \end_inset \size scriptsize ONLY FOR ADVANCED HACKERS WHO KNOW WHAT THEY ARE DOING! \begin_inset Newline newline \end_inset This command is deliberately not documented. You need the competence level RTFS ( \begin_inset Quotes eld \end_inset read the fucking sources \begin_inset Quotes erd \end_inset ). 
\end_layout \end_inset \end_layout \end_inset \end_inset \end_layout \begin_layout Section Further Operations \end_layout \begin_layout Subsection Inspection Commands \end_layout \begin_layout Standard \size scriptsize \begin_inset Tabular \begin_inset Text \begin_layout Plain Layout \size scriptsize Command / Params \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize Cmp \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize Description \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize view- \emph on macroname \begin_inset Newline newline \end_inset \emph default \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $res \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Display the output of a macro evaluation. See section \begin_inset CommandInset ref LatexCommand ref reference "sec:Inspecting-the-State" \end_inset for a thorough description. 
\end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize view \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $res \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Equivalent to \family typewriter view-default \family default . 
\end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize role \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $res \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Deprectated. Use \family typewriter view-role \family default instead. 
\end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize state \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $res \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Deprectated. Use \family typewriter view-state \family default instead. 
\end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize cstate \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $res \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Deprectated. Use \family typewriter view-cstate \family default instead. 
\end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize dstate \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $res \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Deprecated. Use \family typewriter view-dstate \family default instead. 
\end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize status \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $res \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Deprecated. Use \family typewriter view-status \family default instead. 
\end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize show-state \end_layout \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $res \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Deprecated. Don't use it. Use \family typewriter view-state \family default instead, or other macros. 
\end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize show-info \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $res \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Deprecated. Don't use it. Use \family typewriter view-info \family default instead, or other macros. 
\end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize show \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $res \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Deprecated. Don't use it. Use or implement some macros instead. 
\end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize show-errors \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $res \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Deprecated. Use \family typewriter view-the-err-msg \family default , \family typewriter view-resource-err \family default , or similar macros instead. 
\end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize cat \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $file \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Write the file content to stdout, with all occurrences of numeric timestamps converted to a human-readable format. This is most useful for inspection of status and log files, e.g. 
\family typewriter marsadm cat /mars/5.total.log \end_layout \end_inset \end_layout \end_inset \end_inset \end_layout \begin_layout Subsection Setting Parameters \begin_inset CommandInset label LatexCommand label name "sub:Setting-Parameters" \end_inset \end_layout \begin_layout Subsubsection Per-Resource Parameters \end_layout \begin_layout Standard \size scriptsize \begin_inset Tabular \begin_inset Text \begin_layout Plain Layout \size scriptsize Command / Params \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize Cmp \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize Description \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize set-emergency-limit $res \emph on n \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize The argument \emph on n \emph default must be a percentage between 0 and 100. When the remaining storage space in \family typewriter /mars/ \family default falls below the given percentage, the resource will go into emergency mode \emph on earlier \emph default than under the global computation described in section \begin_inset CommandInset ref LatexCommand ref reference "sec:Defending-Overflow" \end_inset . 0 means unlimited. 
\end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize get-emergency-limit $res \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Inquiry of the preceding value. \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \end_inset \end_layout \begin_layout Subsubsection Global Parameters \end_layout \begin_layout Standard \size scriptsize \begin_inset Tabular \begin_inset Text \begin_layout Plain Layout \size scriptsize Command / Params \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize Cmp \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize Description \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize set-sync-limit-value \emph on n \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout 
Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Limit the concurrency of sync operations to some maximum number. 0 means unlimited. \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize get-sync-limit-value \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Inquiry of the preceding value. 
\end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize set-sync-pref-list res1,res2,resn \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Set the order of preferences for syncing. The argument must be a comma-separated list of resource names. \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize get-sync-pref-list \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Inquiry of the preceding value. 
\end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize set-connect-pref-list host1,host2,hostn \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Set the order of preferences for connections when there are more than 2 hosts participating in a cluster. The argument must be a comma-separated list of node names. \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize get-connect-pref-list \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Inquiry of the preceding value. 
\end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \end_inset \end_layout \begin_layout Subsection Waiting \begin_inset CommandInset label LatexCommand label name "sub:Waiting" \end_inset \end_layout \begin_layout Standard \size scriptsize \begin_inset Tabular \begin_inset Text \begin_layout Plain Layout \size scriptsize Command / Params \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize Cmp \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize Description \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize wait-cluster \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Precondition: the \family typewriter /mars/ \family default filesystem must be mounted and it must contain a valid MARS symlink tree produced by the other \family typewriter marsadm \family default commands. The kernel module must be loaded. \end_layout \begin_layout Plain Layout \size scriptsize Postcondition: none. 
\end_layout \begin_layout Plain Layout \size scriptsize Wait until \emph on all \emph default nodes in the cluster have sent a message, or until timeout. The default timeout is 30 s (exceptionally) and \size default \begin_inset Graphics filename images/MatieresCorrosives.png lyxscale 50 scale 17 \end_inset \size scriptsize may be changed by \family typewriter --timeout=$seconds \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize wait-resource \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $res \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset {is-,}{attach, \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset primary, \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset device}{-off,} \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Precondition: the local node must be a member of the resource \family typewriter $res \family default . 
\end_layout \begin_layout Plain Layout \size scriptsize Postcondition: none. \end_layout \begin_layout Plain Layout \size scriptsize Wait until the local node reaches a specified condition on \family typewriter $res \family default , or until timeout. The default timeout of 60 s may be changed by \family typewriter --timeout=$seconds \family default . The last argument denotes the condition. The condition is inverted if suffixed by \family typewriter -off \family default . When preceded by \family typewriter is- \family default (which is the most useful case), it is checked whether the condition is actually reached. When the \family typewriter is- \family default prefix is left off, the check is whether another \family typewriter marsadm \family default command has already been given which \emph on tries \emph default to achieve the intended result (typically, you may use this after the \family typewriter is- \family default variant has failed). \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize wait-connect \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $res \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize almost \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize This is an alias for \family typewriter 
wait-cluster \family default waiting until only those nodes are reachable which belong to \family typewriter $res \family default (instead of waiting for the \emph on full \emph default cluster). \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize wait-umount \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $res \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Precondition: none additionally. \end_layout \begin_layout Plain Layout \size scriptsize Postcondition: the local \family typewriter /dev/mars/$dev_name \family default is no longer in use (e.g. umounted). \end_layout \end_inset \end_layout \end_inset \end_inset \end_layout \begin_layout Subsection Low-Level Expert Commands \end_layout \begin_layout Standard These commands are for experts and advanced sysadmins only. The interface is not stable, i.e. the meaning may change at any time. Use at your own risk! 
\end_layout \begin_layout Standard \size scriptsize \begin_inset Tabular \begin_inset Text \begin_layout Plain Layout \size scriptsize Command / Params \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize Cmp \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize Description \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize set-link \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize RTFS. \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize get-link \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize RTFS. 
\end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize delete-file \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize RTFS. \end_layout \end_inset \end_layout \end_inset \end_inset \end_layout \begin_layout Standard The following commands are for manual setup / repair of cluster membership. Only to be used by experts who know what they are doing! In general, cluster-wide operations on IP addresses may need to be repeated at all hosts in the cluster iff the communication is not (yet) possible and/or not (yet) actually working (e.g. firewalling problems etc). 
\end_layout \begin_layout Standard \size scriptsize \begin_inset Tabular \begin_inset Text \begin_layout Plain Layout \size scriptsize Command / Params \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize Cmp \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize Description \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "30col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize lowlevel-ls-host-ips \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "50col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize List all configured cluster members together with their currently configured IP addresses, as known \emph on locally \emph default . 
\end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "30col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize lowlevel-set-host-ip \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $hostname \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $ip \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "50col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Change the assignment of IP addresses \emph on locally \emph default . May be used when hosts are moved to different network locations, or when different network interfaces are to be used for replication (e.g. dedicated replication IPs). Notice that the names of hosts must not change at all, only their IP addresses may be changed. Check active connections with \family typewriter netstat \family default & friends. Updates may need some time to proceed (socket timeouts etc). 
\end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "30col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize lowlevel-delete-host \begin_inset Newline newline \end_inset \begin_inset ERT status open \begin_layout Plain Layout \backslash strut \backslash hfill \end_layout \end_inset $hostname \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "50col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Remove a host from the cluster membership \emph on locally \emph default , together with its IP address assignment. This does not remove any further information. In particular, resource memberships are untouched. 
\end_layout \end_inset \end_layout \end_inset \end_inset \end_layout \begin_layout Subsection Senseless Commands (from DRBD) \end_layout \begin_layout Standard \size scriptsize \begin_inset Tabular \begin_inset Text \begin_layout Plain Layout \size scriptsize Command / Params \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize Cmp \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize Description \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize syncer \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize new-current-uuid \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize create-md \end_layout \end_inset \end_layout \end_inset \begin_inset Text 
\begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize dump-md \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize dump \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize get-gi \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" 
height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize show-gi \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize outdate \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize adjust \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize yes \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize Implemented as NOP (not necessary with MARS). 
\end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize hidden-commands \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \end_layout \end_inset \end_inset \end_layout \begin_layout Subsection Forbidden Commands (from DRBD) \end_layout \begin_layout Standard These commands are not implemented because they would be dangerous in MARS context: \end_layout \begin_layout Standard \size scriptsize \begin_inset Tabular \begin_inset Text \begin_layout Plain Layout \size scriptsize Command / Params \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize Cmp \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize Description \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize invalidate-remote \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize This would be too dangerous in case you have 
multiple secondaries. A similar effect can be achieved with the \family typewriter --host= \family default option. \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \family typewriter \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "20col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \family typewriter \size scriptsize verify \end_layout \end_inset \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize no \end_layout \end_inset \begin_inset Text \begin_layout Plain Layout \size scriptsize \begin_inset Box Frameless position "t" hor_pos "c" has_inner_box 1 inner_pos "t" use_parbox 0 use_makebox 0 width "60col%" special "none" height "1in" height_special "totalheight" status open \begin_layout Plain Layout \size scriptsize This would cause unintended side effects due to races between logfile transfer / application and block-wise comparison of the underlying disks. However, \family typewriter marsadm join-resource \family default or \family typewriter invalidate \family default will do the same as DRBD verify followed by DRBD resync, i.e. this will automatically correct any found errors. Note that the fast-fullsync algorithm of MARS will minimize network traffic. \end_layout \end_inset \end_layout \end_inset \end_inset \end_layout \begin_layout Section The \family typewriter /proc/sys/mars/ \family default and other Expert Tweaks \begin_inset CommandInset label LatexCommand label name "sec:The-/proc/sys/mars/-Expert" \end_inset \end_layout \begin_layout Standard In general, you shouldn't need to deal with any tweaks in \family typewriter /proc/sys/mars/ \family default because everything should already default to reasonable predefined values. 
This interface allows access to some internal kernel variables of the \family typewriter mars.ko \family default kernel module at runtime. Thus it is \emph on not \emph default a stable interface. It is not only specific for MARS, but may also change between releases without notice. \end_layout \begin_layout Standard This section describes only those tweaks intended for sysadmins, not those for developers / very deep internals. \end_layout \begin_layout Subsection Syslogging \end_layout \begin_layout Standard All internal messages produced by the kernel module belong to one of the following classes: \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 0 debug messages \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 1 info messages \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 2 warnings \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 3 error messages \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 4 fatal error messages \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 5 any message (summary of 0 to 4) \end_layout \begin_layout Subsubsection Logging to Files \end_layout \begin_layout Standard These classes are used to produce status files \family typewriter $class.*.status \family default in the \family typewriter /mars/ \family default and/or in the \family typewriter /mars/resource- \emph on mydata \emph default / \family default directory / directories. \end_layout \begin_layout Standard When you create a file \family typewriter $class.*.log \family default in parallel to any \family typewriter $class.*.status \family default , the \family typewriter *.log \family default file will be appended forever with the same messages as in \family typewriter *.status \family default . The difference is that *.status is regenerated anew from an empty starting point, while *.log can (potentially) increase indefinitely unless you remove it, or rename it to something else. 
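\end_layout

\begin_layout Standard
The naming scheme described above can be turned into two small helpers for switching file logging on and off. This is only a sketch: the variable \family typewriter MARS_DIR \family default and both function names are hypothetical, and the only assumption about MARS itself is the \family typewriter $class.*.status \family default / \family typewriter $class.*.log \family default pairing stated above.

```shell
#!/bin/sh
# Sketch only: the "$class.*.status" / "$class.*.log" naming follows the
# description above; MARS_DIR and both helper names are hypothetical.
MARS_DIR=${MARS_DIR:-/mars}

# Create an empty *.log next to every *.status file of the given class.
enable_class_log() {
    for status in "$MARS_DIR"/"$1".*.status; do
        [ -e "$status" ] || continue    # the glob did not match anything
        : >> "${status%.status}.log"    # create if missing, never truncate
    done
}

# Remove the *.log files of the given class again to stop the growth.
disable_class_log() {
    rm -f "$MARS_DIR"/"$1".*.log
}
```

For example, calling \family typewriter enable_class_log 3 \family default before reproducing a problem and \family typewriter disable_class_log 3 \family default afterwards keeps the additional error logging limited in time. Setting \family typewriter MARS_DIR \family default to a resource directory applies the same helpers there.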
\end_layout \begin_layout Standard \begin_inset Graphics filename images/MatieresCorrosives.png lyxscale 50 scale 17 \end_inset Beware, any permanently present \family typewriter *.log \family default file can easily fill up your \family typewriter /mars/ \family default partition until the problems described in section \begin_inset CommandInset ref LatexCommand ref reference "sec:Defending-Overflow" \end_inset will appear. Use \family typewriter *.log \family default only for a \series bold limited time \series default , and \series bold only for debugging! \end_layout \begin_layout Subsubsection Logging to Syslog \end_layout \begin_layout Standard The classes also play a role in the following \family typewriter /proc/sys/mars/ \family default tweaks: \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter syslog_min_class \family default (rw) The \emph on minimum \emph default class number for \emph on permanent \emph default syslogging. By default, this is set to -1 in order to switch off permanent logging completely. Permanent logging can easily flood your syslog with such huge amounts of messages (in particular when class=0), that your system as a whole may become unusable (because vital kernel threads may be blocked too long or too often by the userspace syslog daemon). Instead, please use the flood-protected syslogging described below! \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter syslog_max_class \family default (rw) The \emph on maximum \emph default class number for \emph on permanent \emph default syslogging. Please use the flood-protected version instead. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter syslog_flood_class \family default (rw) The minimum class of flood-protected syslogging. The maximum class is always 4. 
\end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter syslog_flood_limit \family default (rw) The maximum number of messages after which the flood protection will start. This is a hard limit for the number of messages written to the syslog. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter syslog_flood_recovery_s \family default (rw) The number of seconds after which the internal flood counter is reset (after flood protection state has been reached). When no new messages appear after this time, the flood protection will start over at count 0. \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset The rationale behind flood-protected syslogging: sysadmins are usually only interested in the point in time where some problems / incidents / etc have \emph on started \emph default . They are usually not interested in capturing \emph on each \emph default and \emph on every \emph default single error message (in particular when they are flooding the system logs). \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset If you \emph on really \emph default need complete error information, use the \family typewriter *.log \family default files described above, compress them and save them somewhere else \emph on regularly \emph default by a cron job. This bears much less overhead than filtering via the syslog daemon, or even remote syslogging in real time, which will almost surely screw up your system in case of network problems coinciding with flood messages, such as those caused in turn by these problems. Don't rely on real-time concepts, just do it the old-fashioned batch job way. 
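\end_layout

\begin_layout Standard
The flood-protected tweaks listed above could be set at runtime roughly as follows. This is a sketch under assumptions: the tweak names are the ones documented above, while \family typewriter MARS_PROC \family default , the two helper names and all example values are purely illustrative. Writing to \family typewriter /proc/sys/mars/ \family default is assumed to require root privileges and a loaded \family typewriter mars.ko \family default module.

```shell
#!/bin/sh
# Sketch only: the tweak names are the ones listed above; the helper names,
# MARS_PROC and any values passed in are illustrative assumptions.
MARS_PROC=${MARS_PROC:-/proc/sys/mars}

# Write one value into one tweak file.
set_tweak() {
    printf '%s\n' "$2" > "$MARS_PROC/$1"
}

# Keep permanent syslogging switched off (-1 is the documented default)
# and configure the flood-protected variant instead.
# $1 = minimum class, $2 = message limit, $3 = recovery time in seconds
use_flood_protected_syslog() {
    set_tweak syslog_min_class -1 &&
    set_tweak syslog_flood_class "$1" &&
    set_tweak syslog_flood_limit "$2" &&
    set_tweak syslog_flood_recovery_s "$3"
}
```

As a hypothetical example, \family typewriter use_flood_protected_syslog 2 100 300 \family default would select classes 2 to 4 (warnings up to fatal errors), a limit of 100 messages, and a recovery time of 300 seconds. Suitable values depend on your installation.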
\end_layout \begin_layout Subsubsection Tuning Verbosity of Logging \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter show_debug_messages \family default Boolean switch, 0 or 1. Mostly useful only for developers. This can easily flood your logs if you are not careful. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter show_log_messages \family default Boolean switch, 0 or 1. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter show_connections \family default Boolean switch, 0 or 1. Show detailed internal statistics on sockets. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter show_statistics_local \begin_inset space ~ \end_inset / \begin_inset space ~ \end_inset show_statistics_global \family default Only useful for kernel developers. Shows some internal information on internal brick instances, memory usage, etc. \end_layout \begin_layout Subsection Tuning the Sync \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter sync_flip_interval_sec \family default (rw) The sync process must not run in parallel to logfile replay, in order to easily guarantee consistency of your disk. If logfile replay were paused for the full duration of very large or long-lasting syncs (which could take some days over very slow networks), your \family typewriter /mars/ \family default filesystem could overflow because no replay would be possible in the meantime. Therefore, MARS regularly flips between actually syncing and actually replaying, if both are enabled. You can set the time interval for flipping here. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter sync_limit \family default (rw) When > 0, this limits the maximum number of sync processes actually running in parallel. This is useful if you have a large number of resources, and you don't want to overload the network with sync processes. 
\end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter sync_nr \family default (ro) Passive indicator for the number of sync processes currently running. \end_layout \begin_layout Labeling \labelwidthstring 00.00.0000 \family typewriter sync_want \family default (ro) Passive indicator for the number of sync processes which \emph on demand \emph default running. \end_layout \begin_layout Chapter Tips and Tricks \end_layout \begin_layout Section Avoiding Inappropriate Clustermanager Types for Medium and Long-Distance Replication \end_layout \begin_layout Standard This section addresses some widespread misconceptions. Its main target audience is developers, but sysadmins will profit from \series bold detailed explanations of problems and pitfalls \series default . When the problems described in this section are solved at some point in the future, this section will be shortened and some relevant parts moved to the appendix. \end_layout \begin_layout Standard Doing \series bold High Availability (HA) \series default wrong at \emph on concept level \emph default may easily get you into trouble, and may cost you several millions of € or $ in larger installations, or even knock you out of business when disasters are badly dealt with at higher levels such as clustermanagers. \end_layout \begin_layout Subsection General Cluster Models \end_layout \begin_layout Standard The most commonly known cluster model is called \series bold shared-disk \series default , and is typically controlled by clustermanagers like \family typewriter PaceMaker \family default : \end_layout \begin_layout Standard \noindent \align center \begin_inset Graphics filename images/shared-disk-model.fig width 50col% \end_inset \end_layout \begin_layout Standard \noindent The most important property of shared-disk is that there exists only a single disk instance. Nowadays, this disk often has some \emph on internal \emph default redundancy such as RAID. 
At \emph on system \emph default architecture layer / network level, there exists no redundant disk at all. Only the application cluster is built redundantly. \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset It should be immediately clear that shared-disk clusters are only suitable for short-distance operations in the same datacenter. Although running one of the data access lines over short distances between very nearby datacenters (e.g. 1 km) would be theoretically possible, there would be insufficient protection against failure of a whole datacenter. \end_layout \begin_layout Standard Both DRBD and MARS belong to a different architectural model called \series bold shared-nothing \series default : \end_layout \begin_layout Standard \noindent \align center \begin_inset Graphics filename images/shared-nothing-model.fig width 50col% \end_inset \end_layout \begin_layout Standard \noindent The characteristic feature of a shared-nothing model is (additional) \series bold redundancy at network level \series default . \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset Shared-nothing \begin_inset Quotes eld \end_inset clusters \begin_inset Foot status open \begin_layout Plain Layout Notice that the term \begin_inset Quotes eld \end_inset cluster computing \begin_inset Quotes erd \end_inset usually refers to short-distance only. Long-distance coupling should be called \begin_inset Quotes eld \end_inset grid computing \begin_inset Quotes erd \end_inset in preference. As known from the scientific literature, grid computing requires different concepts and methods in general. Only for the sake of simplicity, we use \begin_inset Quotes eld \end_inset cluster \begin_inset Quotes erd \end_inset and \begin_inset Quotes eld \end_inset grid \begin_inset Quotes erd \end_inset interchangeably. 
\end_layout \end_inset \begin_inset Quotes erd \end_inset could theoretically be built for \emph on any \emph default distances, from short to medium to long distances. However, concrete technologies of disk coupling such as synchronous operation may pose practical limits on the distances (see chapter \begin_inset CommandInset ref LatexCommand ref reference "chap:Use-Cases-for" \end_inset ). \end_layout \begin_layout Standard In general, clustermanagers must fit the model. Some clustermanagers can be configured to fit multiple models. If so, this must be done properly, or you may get into serious trouble. \end_layout \begin_layout Standard Some people don't know, or they don't believe, that different architectural models like shared-disk or shared-nothing will \emph on require \emph default an \emph on appropriate \emph default type of clustermanager and/or a different configuration. Failing to provide this, by selecting an inappropriate clustermanager type and/or an inappropriate configuration, may be hazardous. \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/MatieresCorrosives.png lyxscale 50 scale 17 \end_inset Selection of the right model alone is not sufficient. Some, if not many, clustermanagers have not been designed for long distances. As explained in section \begin_inset CommandInset ref LatexCommand ref reference "sub:Special-Requirements-for" \end_inset , long distances have further \series bold hard requirements \series default . Disregarding them may also be hazardous! 
\end_layout \begin_layout Subsection Handover / Failover Reasons and Scenarios \end_layout \begin_layout Standard From a sysadmin perspective, there exist a number of different \series bold reasons \series default why the application workload must be switched from the currently active side A to the currently passive side B: \end_layout \begin_layout Enumerate Some \series bold defect \series default has occurred at cluster side A or at some corresponding part of the network. \end_layout \begin_layout Enumerate Some \series bold maintenance \series default has to be done at side A which would cause a longer downtime (e.g. security kernel update or replacement of core network equipment or maintenance of UPS or of the BBU cache etc - hardware isn't 24/7/365 in practice, although some vendors \emph on claim \emph default it - it is either not really true, or it becomes \emph on extremely \emph default expensive). \end_layout \begin_layout Standard Both reasons are valid and must be automatically handled in larger installations. In order to deal with both of these reasons, the following basic mechanisms can be used in either model: \end_layout \begin_layout Enumerate \series bold Failover \series default (triggered either manually or automatically) \end_layout \begin_layout Enumerate \series bold Handover \series default (triggered manually \begin_inset Foot status open \begin_layout Plain Layout Automatic triggering could be feasible for prophylactic treatments. \end_layout \end_inset ) \end_layout \begin_layout Standard It is important not to confuse handover with failover at concept level. Not only are the reasons / preconditions very different, but also the \emph on requirements \emph default . Example: precondition for handover is that \emph on both \emph default cluster sides are healthy, while precondition for failover is that \emph on some relevant(!) 
\emph default failure has been \emph on detected \emph default somewhere (whether this is \emph on really \emph default true is another matter). Typically, failover must be able to run in masses, while planned handover often has lower scaling requirements. \end_layout \begin_layout Standard Not all existing clustermanagers deal with all of these cases (or their variants) equally well, and some do not deal with some of these cases / variants \emph on at all \emph default . \end_layout \begin_layout Standard Some clustermanagers cannot easily express the concept of \begin_inset Quotes eld \end_inset automatic triggering \begin_inset Quotes erd \end_inset versus \begin_inset Quotes eld \end_inset manual triggering \begin_inset Quotes erd \end_inset of an action. There simply exists no cluster-global switch which selects either \begin_inset Quotes eld \end_inset manual mode \begin_inset Quotes erd \end_inset or \begin_inset Quotes eld \end_inset automatic mode \begin_inset Quotes erd \end_inset (except when you start to hack the code and/or write new plugins; then you might notice that there is almost no architectural layering / sufficient separation between mechanism and strategy). Being forced to permanently use an automatic mode for several hundreds or even thousands of clusters is not only boring, but bears a considerable risk when automatics make a wrong decision at hundreds of instances in parallel. \end_layout \begin_layout Subsection Granularity and Layering Hierarchy for Long Distances \end_layout \begin_layout Standard Many existing clustermanager solutions deal with a single cluster instance, as the term \begin_inset Quotes eld \end_inset \emph on cluster \emph default manager \begin_inset Quotes erd \end_inset suggests. However, when running several hundreds or thousands of cluster instances, you likely will not want to manage each of them individually. 
In addition, failover should \emph on not only \emph default be \emph on triggered \emph default (not to be confused with \emph on executed \emph default ) individually at cluster level, but likely \emph on also \emph default at a higher granularity such as a room, or a whole datacenter. Otherwise, some chaos is likely to happen. \end_layout \begin_layout Standard Here is what you probably will \series bold need \series default , possibly in contrast to what you may find on the market (whether OpenSource or not). For simplicity, the following diagram shows only two levels of granularity, but can be easily extended to multiple layers of granularity, or to some concept of various \emph on subsets of clusters \emph default : \end_layout \begin_layout Standard \noindent \align center \begin_inset Graphics filename images/clustermanager-hierarchy.fig width 70col% \end_inset \end_layout \begin_layout Standard \noindent Notice that many existing clustermanager solutions do not address the datacenter granularity at all. Typically, they use concepts like \series bold quorums \series default for determining failures \emph on at cluster level \emph default solely, and then immediately execute failover of the cluster, sometimes without clean architectural distinction between trigger and execution (similar to the \begin_inset Quotes eld \end_inset separation of concerns \begin_inset Quotes erd \end_inset between \series bold mechanism \series default and \series bold strategy \series default in Operating Systems). Sometimes there is even no internal software layering / modularization according to this separation of concerns at all. \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/MatieresCorrosives.png lyxscale 50 scale 17 \end_inset When there is no distinction between different levels of granularity, you are hopelessly bound to a non-extensible and thus non-adaptable system when you need to operate masses of clusters. 
\end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/MatieresCorrosives.png lyxscale 50 scale 17 \end_inset A missing distinction between automatic mode and manual mode, and/or lack of corresponding \series bold architectural software layers \series default is not only a blatant disregard of well-established best practices of \series bold software engineering \series default , but will bind you even more firmly to an inflexible system. \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset Terminology: for practical reasons, we use the general term \begin_inset Quotes eld \end_inset clustermanager \begin_inset Quotes erd \end_inset also for speaking about layers dealing with higher granularity, such as datacenter layers, and also for long-distance replication scenarios, although some terminology from grid computing would be more appropriate in a scientific context. \end_layout \begin_layout Standard Please consider the following: when it comes to long-distance HA, the above layering architecture is also motivated by vastly different numbers of instances for each layer. Ideally, the topmost automatics layer should be able to oversee several datacenters in parallel, in order to cope with (almost) global network problems such as network partitions. Additionally, it should also detect single cluster failures, or intermediate problems like \begin_inset Quotes eld \end_inset rack failure \begin_inset Quotes erd \end_inset or \begin_inset Quotes eld \end_inset room failure \begin_inset Quotes erd \end_inset , as well as various types of (partial / intermediate) (replication) network failures. Incompatible decisions at each of the different granularities would be a no-go in practice. 
Somewhere and somehow, you need one single \begin_inset Foot status open \begin_layout Plain Layout If you have \emph on logical pairs of datacenters \emph default which are firmly bound together, you could also have several topmost automatics instances, e.g. for each \emph on pair \emph default of datacenters. However, that would be very \series bold inflexible \series default , because then you cannot easily mix locations or migrate your servers between datacenters. Using \begin_inset Formula $k>2$ \end_inset replicas with MARS would also become a nightmare. In your own interest, please don't create any concepts where masses of hardware are firmly bound to fixed constants at some software layers. \end_layout \end_inset topmost \emph on logical \emph default problem detection / ranking instance, which should be \emph on internally distributed \emph default of course, typically using some \series bold distributed consensus protocol \series default ; but in contrast to many published distributed consensus algorithms it should be able to work with multiple granularities at the same time. \end_layout \begin_layout Subsection Methods and their Appropriateness \end_layout \begin_layout Subsubsection Failover Methods \begin_inset CommandInset label LatexCommand label name "sub:Failover-Methods" \end_inset \end_layout \begin_layout Standard Failover methods are only needed in case of an incident. They should not be used for regular handover. \end_layout \begin_layout Paragraph STONITH-like Methods \end_layout \begin_layout Standard STONITH = Shoot The Other Node In The Head \end_layout \begin_layout Standard These methods are widely known, although they have several serious drawbacks. Some people even believe that \emph on any \emph default clustermanager must \emph on always \emph default have some STONITH-like functionality. This is wrong. There \emph on exist \emph default alternatives, as shown in the next paragraph. 
\end_layout \begin_layout Standard The most obvious drawback is that STONITH will always cause \series bold damage \series default , by definition. \end_layout \begin_layout Standard Example: a typical contemporary STONITH implementation uses IPMI for automatically powering off your servers, or at least pushes the (virtual) reset button. This will \emph on always \emph default cause a certain type of damage: the affected systems will definitely not be available, at least for some time until they have been (manually) rebooted. \end_layout \begin_layout Standard This is a conceptual contradiction: the reason for starting failover is that you want to restore availability as soon as possible, but in order to do so you will first \emph on destroy \emph default the availability of a particular \emph on component \emph default . This may be counter-productive. \end_layout \begin_layout Standard Example: when your hot standby node B does not work as expected, or if it works even \emph on worse \emph default than A before, you will lose some time until you \emph on can \emph default become operational again at the old side A. \end_layout \begin_layout Standard Here is an example method for handling a failure scenario. The old active side A is assumed to be no longer healthy. The method uses a sequential state transition chain with a STONITH-like step: \end_layout \begin_layout Description Phase1 Check whether the hot standby B is currently usable. If this is violated (which may happen during certain types of disasters), abort the failover for any affected resources. \end_layout \begin_layout Description Phase2 \emph on Try \emph default to shut down the damaged side A (in the \emph on hope \emph default that there is no \emph on serious \emph default damage). \end_layout \begin_layout Description Phase3 In case phase2 did not work during a grace period / after a timeout, assume that A is badly damaged and therefore STONITH it. 
\end_layout \begin_layout Description Phase4 Start the application at the hot standby B. \end_layout \begin_layout Standard Notice: any cleanup actions, such as \series bold repair \series default of defective hardware or software etc., are outside the scope of failover processes. Typically, they are executed much later when restoring redundancy. \end_layout \begin_layout Standard Also notice: this method is a \emph on heavily \emph default distributed one, in the sense that sequential actions alternate multiple times between different hosts. This is known to be cumbersome in distributed systems, in particular in the presence of network problems. \end_layout \begin_layout Standard \begin_inset CommandInset label LatexCommand label name "Phase4-in-more" \end_inset Phase4 in more detail for DRBD, augmented with some pseudo code for application control: \end_layout \begin_layout Enumerate at side B: \family typewriter drbdadm disconnect all \end_layout \begin_layout Enumerate at side B: \family typewriter drbdadm primary --force all \end_layout \begin_layout Enumerate at side B: \family typewriter applicationmanager start all \end_layout \begin_layout Standard The same phase4 using MARS: \end_layout \begin_layout Enumerate at side B: \family typewriter marsadm pause-fetch all \end_layout \begin_layout Enumerate at side B: \family typewriter marsadm primary --force all \end_layout \begin_layout Enumerate at side B: \family typewriter applicationmanager start all \end_layout \begin_layout Standard This sequential 4-phase method is far from optimal, for the following reasons: \end_layout \begin_layout Itemize The method tries to handle both failover and handover scenarios with one single sequential recipe. In case of a true failover scenario where it is \emph on already known for sure \emph default that side A is badly damaged, this method will unnecessarily waste time on phase 2.
This could be fixed by introducing a conceptual distinction between handover and failover, but it would not fix the following problems. \end_layout \begin_layout Itemize Before phase4 is started (which will re-establish the service from a user's perspective), a lot of time is wasted by \emph on both \emph default phases 2 \emph on and \emph default 3. Even if phase 2 were skipped, phase 3 would unnecessarily cost some time. In the next paragraph, an alternative method is explained which eliminates all unnecessary waiting time. \end_layout \begin_layout Itemize The above method is adapted to the shared-disk model. It does not take advantage of the shared-nothing model, where further possibilities for better solutions exist. \end_layout \begin_layout Itemize In case of long-distance network partitions and/or sysadmin / system management subnetwork outages, you may not even be able to (remotely) start STONITH at all. Thus the above method misses an important failure scenario. \end_layout \begin_layout Standard Some people seem to have a \emph on binary \emph default view of the healthiness of a system: in their view, a system is either operational, or it is damaged. This kind of view ignores the fact that some systems may be half-alive, showing only \emph on minor \emph default problems, or problems occurring only from time to time. \end_layout \begin_layout Standard It is obvious that damaging a healthy system is a bad idea by itself.
Even \emph on generally \emph default damaging a half-alive system in order to \begin_inset Quotes eld \end_inset fix \begin_inset Quotes erd \end_inset problems is not a good idea, because it may increase the damage when you don't know the \emph on real \emph default reason \begin_inset Foot status open \begin_layout Plain Layout Example, occurring in masses: an incorrectly installed bootloader, or a wrong BIOS boot priority order which unexpectedly leads to hangs or infinite reboot cycles once the DHCP or BOOTP servers are no longer available / reachable. \end_layout \end_inset . \end_layout \begin_layout Standard Even worse: in a distributed system \begin_inset Foot status open \begin_layout Plain Layout Notice: the STONITH concept is more or less associated with short-distance scenarios where \series bold crossover cables \series default or similar equipment is used. The assumption is that crossover cables can't become defective, or at least it would be an extremely unlikely scenario. For long-distance replication, this assumption is simply not true. \end_layout \end_inset you sometimes \emph on cannot(!) \emph default know whether a system is healthy, or to what degree it is healthy. Typical STONITH methods as used in some contemporary clustermanagers are \series bold assuming a worst case \series default , even if that worst case has not actually occurred. \end_layout \begin_layout Standard Therefore, avoid the following \series bold fundamental flaws \series default in failover concepts and healthiness models, which apply to implementors / configurators of clustermanagers: \end_layout \begin_layout Itemize Don't mix up knowledge with conclusions about a (sub)system, and also don't mix this up with the real state of that (sub)system. In reality, you don't have any knowledge about a complex distributed system.
You may only have \emph on some \emph default knowledge about \emph on some \emph default parts of the system, but you cannot \begin_inset Quotes eld \end_inset see \begin_inset Quotes erd \end_inset a complex distributed system as a whole. What you think is your knowledge isn't knowledge in reality: in many cases, it is \emph on conclusion \emph default , not knowledge. Don't mix this up! \end_layout \begin_layout Itemize Some systems are more complex than your model of them. Don't neglect important parts (such as networks, routers, switches, cables, plugs): neglecting them may lead you to wrong conclusions! \end_layout \begin_layout Itemize Don't restrict your mind to boolean models of healthiness. Doing so can easily create unnecessary damage by construction, and even at concept level. You should know from software engineering that defects in concepts or models are much more serious than simple bugs in implementations. Choosing the wrong model cannot be fixed as easily as a typical bug or a typo. \end_layout \begin_layout Itemize Try to deduce the state of a system as \series bold reliably \series default as possible. If you don't know something for sure, don't generally assume that it has gone wrong. Don't confuse missing knowledge with the conclusion that something is bad. Boolean algebra restricts your mind to either \begin_inset Quotes eld \end_inset good \begin_inset Quotes erd \end_inset or \begin_inset Quotes eld \end_inset bad \begin_inset Quotes erd \end_inset . Use at least \series bold tri-state algebra \series default which has a means for expressing \series bold \begin_inset Quotes eld \end_inset unknown \begin_inset Quotes erd \end_inset \series default . Even better: attach a probability to anything you (believe you) know. Errare humanum est: nothing is absolutely sure.
\end_layout \begin_layout Itemize Oversimplification: don't report an \begin_inset Quotes eld \end_inset unknown \begin_inset Quotes erd \end_inset or even a \begin_inset Quotes eld \end_inset broken \begin_inset Quotes erd \end_inset state for a complex system whenever a smaller subsystem exists for which you have some knowledge (or you can conclude something about it with reasonable evidence). Otherwise, your users / sysadmins may draw wrong conclusions, and assume that the whole system is broken, while in reality only some minor part has some minor problem. Users are then likely to make wrong decisions, which may easily lead to greater damage. \end_layout \begin_layout Itemize Murphy's law: \series bold never assume that something can't go wrong! \series default Doing so is a blatant misconception at topmost level: the \emph on purpose \emph default of a clustermanager is creating High Availability (HA) out of more or less \begin_inset Quotes eld \end_inset unreliable \begin_inset Quotes erd \end_inset components. It is the damn duty of both a clustermanager and its configurator to try to compensate for \emph on any \emph default failures, \emph on regardless of their probability \emph default \begin_inset Foot status open \begin_layout Plain Layout Never claim that something has only low probability (and is therefore not relevant). In the HA area, you simply \series bold cannot know \series default that, because you typically have \emph on sporadic \emph default incidents. In extreme cases, the \emph on purpose \emph default of your HA solution is protection against 1 failure per 10 years. You simply don't have the time to wait until you have collected incident statistics about that! \end_layout \end_inset , as best as possible. \end_layout \begin_layout Itemize Never confuse \series bold probability \series default with \series bold expected value!
\series default If you don't know the mathematical term \begin_inset Quotes eld \end_inset expected value \begin_inset Quotes erd \end_inset , or if you don't know what this means \emph on in practice \emph default , don't take responsibility for millions of € or $. \end_layout \begin_layout Itemize When operating masses of hardware and software: never assume that a particular failure can occur only in a small number of instances. There are \series bold \emph on unknown(!) \emph default systematic errors \series default which may pop up at the wrong time and in huge masses when you don't expect them. \end_layout \begin_layout Itemize Multiple layers of fallback: \emph on any \emph default action can fail. Be prepared to have a plan B, and even a plan C, and even better a plan D, wherever possible. \end_layout \begin_layout Itemize Never increase any damage anywhere, unnecessarily! Always try to \emph on minimize \emph default any damage! It can be mathematically proven that in deterministic probabilistic systems having finite state, increases of a damage level \emph on at the wrong place \emph default will \emph on introduce \emph default an \emph on additional \emph default \emph on risk \emph default of getting into an \series bold endless loop \series default . This is also true for nondeterministic systems, as known from formal language theory \begin_inset Foot status open \begin_layout Plain Layout Finite automata are known to be transformable to deterministic ones, usually by an exponential increase in the number of states. \end_layout \end_inset . \end_layout \begin_layout Itemize Use the \series bold best effort principle \series default . You should be aware of the following fact: in general, it is impossible to create an \emph on absolutely reliable system \emph default out of unreliable components.
You can \emph on lower \emph default the risk of failures to any \begin_inset Formula $\epsilon>0$ \end_inset by investing a lot of resources and money, but whatever you do: \begin_inset Formula $\epsilon=0$ \end_inset is impossible. Therefore, be careful with boolean algebra. Prefer approximation methods / optimizing methods instead. Always do \emph on your \emph default best, instead of trying to reach a \emph on global \emph default optimum which likely does not exist at all (because the \begin_inset Formula $\epsilon$ \end_inset can only \emph on converge \emph default to an optimum, but will never actually reach it). The best effort principle means the following: if you discover a method for improving your operating state by reducing (potential) damage in a reasonable time and with reasonable effort, then \series bold simply do it \series default . Don't argue that a particular step is not a 100% solution for all of your problems. \emph on Any \emph default \emph on improvement \emph default is valuable. \series bold Don't miss any valuable step \series default having reasonable costs with respect to your budget. Missing valuable low-cost measures is certainly a violation of the best effort principle, because you are not doing \emph on your \emph default best. Keep that in mind. \begin_inset Newline newline \end_inset If you have \emph on understood \emph default this (e.g. think deeply about it for at least one day), you will no longer advocate STONITH methods \emph on in general \emph default , when there are alternatives. STONITH methods are only valuable when you \emph on know in advance \emph default that the final outcome (after reboot) will most likely be better, and that waiting for reboot will most likely \emph on pay off \emph default . In general, this condition is \emph on not true \emph default if you have a healthy hot standby system. This should be easy to see.
But there exist well-known clustermanager solutions / configurations blatantly ignoring \begin_inset Foot status open \begin_layout Plain Layout For some \emph on special(!) \emph default cases of the shared-disk model, there exist some justifications for doing STONITH \emph on before \emph default starting the application at the hot standby. Under certain circumstances, it can happen that system A running amok could destroy the data on your single shared disk (example: a filesystem doubly mounted \emph on in parallel \emph default , which will certainly destroy your data, unless you are using \family typewriter ocfs2 \family default or the like). This argument is only valid for \emph on passive \emph default disks which are \emph on directly \emph default attached to \emph on both \emph default systems A and B, such that there is no \emph on external \emph default means for fencing the disk. In case of iSCSI running over ordinary network equipment such as routers or switches, the argument \begin_inset Quotes eld \end_inset fencing the disk is otherwise not possible \begin_inset Quotes erd \end_inset does not apply. You can interrupt the iSCSI connection at the network gear, or you can often do it at cluster A or at the iSCSI target. Even commercial storage appliances speaking iSCSI can be remotely controlled for forcefully aborting iSCSI sessions. In modern times, the STONITH method no longer has such a justification. The justification stems from ancient times when a disk was a purely passive mechanical device, and its disk controller was part of the server system. \end_layout \end_inset this. Only when the former standby system does not work as expected (this means that \emph on all \emph default of your redundant systems are not healthy enough for your application), \emph on only then \begin_inset Foot status open \begin_layout Plain Layout Notice that STONITH may be needed for (manual or partially automatic) \emph on repair \emph default in some cases, e.g.
when you know that a system has suffered a kernel crash. Don't mix up the repair phase with failover or handover phases. Typically, they are executed at different times. The repair phase is outside the scope of this section. \end_layout \end_inset \emph default STONITH is inevitable as a \emph on last resort \emph default option. \begin_inset Newline newline \end_inset In short: blindly using STONITH without true need during failover is a violation of the best effort principle. You are simply not doing your best. \end_layout \begin_layout Itemize When your budget is limited, carefully select those improvements which make your system \series bold as reliable as possible \series default , given your fixed budget. \end_layout \begin_layout Itemize Create statistics on the duration of your actions. Based on this, try to get a \emph on balanced \emph default optimum between time and costs. \end_layout \begin_layout Itemize Whatever actions you can \series bold start in parallel \series default to save time, do so. Otherwise you are disregarding the best effort principle, and your solution will be sub-optimal. You will require deep knowledge of parallel systems, as well as experience in dealing with problems like (distributed) races. Notice that \emph on any \emph default distributed system is \emph on inherently parallel \emph default . Don't believe that sequential methods can deliver an optimum solution in such a difficult area. \end_layout \begin_layout Itemize If you don't have the \series bold necessary skills \series default for (a) recognizing already existing parallelism, (b) dealing with parallelism at concept level, (c) programming and/or configuring parallelism race-free and deadlock-free (or if you don't even know what a race condition is and where it may occur in practice), then don't take responsibility for millions of € or $. \end_layout \begin_layout Itemize Avoid hard timeouts wherever possible. Use \series bold adaptive timeouts \series default instead.
Reason: depending on hardware or workload, the same action A may take a very short time on cluster 1, but take a very long time on cluster 2. If you need to guard action A against hanging (which is almost always the case because of Murphy's law), don't configure any fixed timeout for it. When you have several hundred clusters, you would need to use the \emph on worst case value \emph default , which is the longest time occurring somewhere at the very slow clusters / slow parts of the network. This wastes a lot of time in case one of the fast clusters is hanging. Adaptive timeouts work differently: they use a kind of \begin_inset Quotes eld \end_inset progress bar \begin_inset Quotes erd \end_inset to monitor the \emph on progress \emph default of an action. They will abort only if there is \emph on no progress \emph default for a certain amount of time. Hint: among others, \family typewriter marsadm view-*-rest \family default commands or macros are your friends. \end_layout \begin_layout Paragraph ITON = Ignore The Other Node \end_layout \begin_layout Standard This means \series bold fencing from application traffic \series default , and can be used as an alternative to STONITH when done properly. \end_layout \begin_layout Standard \noindent \align center \begin_inset Graphics filename images/fencing-hierarchy.fig width 60col% \end_inset \end_layout \begin_layout Standard \noindent Fencing from application traffic is best suited for the shared-nothing model, but can also be adapted to the shared-disk model with some quirks. \end_layout \begin_layout Standard The idea is simple: always route your application network traffic to the current (logically) active side, whether it is currently A or B. Just don't route any application requests to the current (logically) passive side at all.
\end_layout \begin_layout Standard For failover (and \emph on only \emph default for that), you \emph on should not care about \emph default any split brain occurring at the low-level generic block device: \end_layout \begin_layout Standard \noindent \align center \begin_inset Graphics filename images/split-brain-history.fig width 50col% \end_inset \end_layout \begin_layout Standard \noindent Although there is a split brain at the generic low-level block device, you now define the \begin_inset Quotes eld \end_inset logically active \begin_inset Quotes erd \end_inset and \begin_inset Quotes eld \end_inset logically passive \begin_inset Quotes erd \end_inset sides yourself, by \emph on logically ignoring \emph default the side which you have defined as \begin_inset Quotes eld \end_inset wrong \begin_inset Quotes erd \end_inset : \end_layout \begin_layout Standard \noindent \align center \begin_inset Graphics filename images/split-brain-resolved.fig width 50col% \end_inset \end_layout \begin_layout Standard \noindent This is possible because the generic block devices provided by DRBD or MARS are completely \series bold agnostic \series default of the \begin_inset Quotes eld \end_inset meaning \begin_inset Quotes erd \end_inset of either version A or B. Higher levels such as clustermanagers (or humans like sysadmins) can assign them a meaning like \begin_inset Quotes eld \end_inset relevant \begin_inset Quotes erd \end_inset or \begin_inset Quotes eld \end_inset not relevant \begin_inset Quotes erd \end_inset , or \begin_inset Quotes eld \end_inset logically active \begin_inset Quotes erd \end_inset or \begin_inset Quotes eld \end_inset logically passive \begin_inset Quotes erd \end_inset .
\end_layout \begin_layout Standard As a result of fencing from application traffic, the \begin_inset Quotes eld \end_inset logically passive \begin_inset Quotes erd \end_inset side will \emph on logically \emph default cease any actions such as updating user data, even if it is \begin_inset Quotes eld \end_inset physically active \begin_inset Quotes erd \end_inset during split-brain (when two primaries exist in the DRBD or MARS sense \begin_inset Foot status open \begin_layout Plain Layout Hint: some clustermanagers and/or some people seem to define the term \begin_inset Quotes eld \end_inset split-brain \begin_inset Quotes erd \end_inset differently from DRBD or MARS. In the context of generic block devices, split brain means that the \emph on history \emph default of both versions has been split into a Y-like \series bold fork \series default (for whatever reason), such that re-joining them \emph on incrementally \emph default by ordinary write operations is no longer guaranteed to be possible. As a slightly simplified alternative, you might use the definition \begin_inset Quotes eld \end_inset two incompatible primaries exist in parallel \begin_inset Quotes erd \end_inset , which means almost the same in practice. Details of formal semantics are outside the scope of this treatment. \end_layout \end_inset ). \end_layout \begin_layout Standard If you already have some load balancing, or BGP, or another \emph on mechanism \emph default for dynamic routing, you already have an important part of the ITON method.
Additionally, ensure by an appropriate \emph on strategy \emph default that your balancer status / BGP announcement etc. always coincides with the \begin_inset Quotes eld \end_inset logically active \begin_inset Quotes erd \end_inset side (recall that even during split-brain \emph on you \emph default must define \begin_inset Quotes eld \end_inset logically active \begin_inset Quotes erd \end_inset \series bold uniquely \series default \begin_inset Foot status open \begin_layout Plain Layout A possible strategy is to use a Lamport clock for route changes: the change with the most recent Lamport timestamp will always win over previous changes. \end_layout \end_inset yourself). \end_layout \begin_layout Standard Example: \end_layout \begin_layout Description Phase1 Check whether the hot standby B is currently usable. If this is not the case (which may happen during certain types of disasters), abort the failover for any affected resources. \end_layout \begin_layout Description Phase2 Do the following \emph on in parallel \begin_inset Foot status open \begin_layout Plain Layout For database applications where no transactions should get lost, you should slightly modify the order of operations: first fence the old side A, then start the application at standby side B. However, be warned that even this cannot guarantee that no transaction is lost. When the network between A and B is interrupted \emph on before \emph default the incident happens, DRBD will automatically disconnect, and MARS will show a lagbehind. In order to fully eliminate this possibility, you can either use DRBD and configure it to hang forever during network outages (such that users will be unable to commit any transactions at all), or you can use the shared-disk model instead. But in the latter case, you are introducing a SPOF at the single shared disk. The former case is logically almost equivalent to shared-disk, but avoids some parts of the physical SPOF.
In a truly distributed system, the famous CAP theorem limits your possibilities. Therefore, no general solution exists fulfilling all requirements at the same time. \end_layout \end_inset : \end_layout \begin_deeper \begin_layout Itemize Start all affected applications at the hot standby B. This can be done with the same DRBD or MARS procedure as described \begin_inset CommandInset ref LatexCommand vpageref reference "Phase4-in-more" \end_inset . \end_layout \begin_layout Itemize Fence A by fixedly routing all affected application traffic to B. \end_layout \end_deeper \begin_layout Standard That is all that has to be done for a shared-nothing model. Of course, this will likely produce a split-brain (even when using DRBD in place of MARS), but that will not matter from a user's perspective, because the users will no longer \begin_inset Quotes eld \end_inset see \begin_inset Quotes erd \end_inset the \begin_inset Quotes eld \end_inset logically passive \begin_inset Quotes erd \end_inset side A through their network. Only during the relatively short period in which application traffic was still going to the old side A without being replicated to B due to the incident, a very small number of updates \emph on could \emph default have been lost. In fields like webhosting, this is accepted. Users will usually not complain when a small amount of data is lost due to split-brain. They will complain when the service is unavailable. \end_layout \begin_layout Standard This method is the fastest for restoring availability, because it doesn't try to execute any (remote) action at side A. Only from a sysadmin's perspective do some cleanup tasks remain for the following repair phase, such as split-brain resolution; these are outside the scope of this treatment.
\end_layout \begin_layout Standard By running the application fencing step \emph on sequentially \emph default (including waiting until it has succeeded at least to the extent that the old side A can no longer be reached by any users) before the failover step, you may minimize the amount of lost data, but at the cost of a longer total duration. Your service will take longer to be available again, while the amount of lost data is typically somewhat smaller. \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset A few people might clamour when some data is lost. In long-distance replication scenarios with high update traffic, there is \emph on simply no way at all \emph default to guarantee that no data can ever be lost. According to the laws of Einstein and the laws of Distributed Systems like the famous CAP theorem, this isn't the fault of DRBD+proxy or MARS, but simply the \emph on consequence \emph default of having long distances. If you want to protect against data loss as best as possible, then don't use \begin_inset Formula $k=2$ \end_inset replicas. Use \begin_inset Formula $k\geq4$ \end_inset , and spread them over different distances, such as mixed small + medium + long distances. Future versions of MARS will support adaptive pseudo-synchronous modes, which will allow individual adaptation to network latencies / distances. \end_layout \begin_layout Standard The ITON method can be adapted to shared-disk by additionally fencing the common disk from the (presumably) failed cluster node A. \end_layout \begin_layout Subsubsection Handover Methods \end_layout \begin_layout Standard Planned handover is conceptually simpler, because both sides must be (almost) healthy as a \emph on precondition \emph default . There are simply no pre-existing failures to deal with.
\end_layout \begin_layout Standard Here is an example using DRBD, with the application commands denoted as pseudo code: \end_layout \begin_layout Enumerate at side A: \family typewriter applicationmanager stop all \end_layout \begin_layout Enumerate at side A: \family typewriter drbdadm secondary all \end_layout \begin_layout Enumerate at side B: \family typewriter drbdadm primary all \end_layout \begin_layout Enumerate at side B: \family typewriter applicationmanager start all \end_layout \begin_layout Standard MARS already has a conceptual distinction between handover and failover. With MARS, it becomes even simpler, because a generic handover procedure is already built in: \end_layout \begin_layout Enumerate at side A: \family typewriter applicationmanager stop all \end_layout \begin_layout Enumerate at side B: \family typewriter marsadm primary all \end_layout \begin_layout Enumerate at side B: \family typewriter applicationmanager start all \end_layout \begin_layout Subsubsection Hybrid Methods \end_layout \begin_layout Standard In general, a planned handover may fail at any stage. Notice that such a failure is a real failure too, albeit one (partially) caused by the planned handover. You have the following alternatives for automatically dealing with such cases: \end_layout \begin_layout Enumerate In case of a failure, switch back to the old side A. \end_layout \begin_layout Enumerate Instead, forcefully switch to the new side B, similar to the methods described in section \begin_inset CommandInset ref LatexCommand ref reference "sub:Failover-Methods" \end_inset . \end_layout \begin_layout Standard Similar options exist for a failed failover (at least in theory), but the chances of actually recovering are lower if you have only \begin_inset Formula $k=2$ \end_inset replicas in total.
\end_layout \begin_layout Standard Whatever you decide to do in which case and in whatever priority order, whether you decide it in advance or during the course of a failing action: according to the best effort principle, you should \series bold never leave your system in a broken state \series default when there exists a chance to recover availability with any method. \end_layout \begin_layout Standard Therefore, you should \emph on implement \emph default neither handover nor failover in their pure forms. Always implement hybrid forms following the best effort principle. \end_layout \begin_layout Subsection Special Requirements for Long Distances \begin_inset CommandInset label LatexCommand label name "sub:Special-Requirements-for" \end_inset \end_layout \begin_layout Standard Most contemporary clustermanagers have been constructed for short-distance shared-nothing clusters, or even for \emph on local \emph default shared-nothing clusters (cf. DRBD over crossover cables), or even for shared-disk clusters ( \emph on originally \emph default , when their \emph on concepts \emph default were developed). Blindly using them for long-distance replication without modification / adaptation carries some additional risks. \end_layout \begin_layout Itemize Notice that long-distance replication always \emph on requires \emph default a \series bold shared-nothing \series default model. \end_layout \begin_layout Itemize As a consequence, \series bold split brain \series default can appear \emph on regularly \emph default during failover. There is no way of preventing it! This is an \emph on inherent property \emph default of distributed systems, not limited to MARS (e.g. also occurring with DRBD if you try to use it over long distances). Therefore, you \emph on must \emph default deal with occurrences of split-brain as a \emph on requirement \emph default .
\end_layout \begin_layout Itemize The probability of \series bold network partitions \series default is much higher: although Murphy's law should already have compelled you to deal with network partitions in short-distance scenarios, it now becomes \emph on mandatory \emph default . \end_layout \begin_layout Itemize Be prepared that in case of certain types of (more or less global) internet partitions, you may not be able to trigger STONITH actions \emph on at all \emph default . Therefore, \series bold fencing of application traffic \series default is \emph on mandatory \emph default . \end_layout \begin_layout Section Creating Backups via Pseudo Snapshots \end_layout \begin_layout Standard When all your secondaries are homogeneously located in a standby datacenter, they will be almost idle all the time. This is a waste of computing resources. \end_layout \begin_layout Standard Since MARS is no substitute for a full-fledged backup system, and since backups may put a high system load onto your active side, you may want to utilize your passive hardware resources in a better way. \end_layout \begin_layout Standard MARS supports this thanks to its ability to switch \family typewriter pause-replay \family default \emph on independently \emph default from \family typewriter pause-fetch \family default . \end_layout \begin_layout Standard The basic idea is simple: just use \family typewriter pause-replay \family default at your secondary site, but leave the replication of transaction logfiles intact by deliberately \emph on not \emph default saying \family typewriter pause-fetch \family default . This way, your secondary replica (block device) will stay frozen for a limited time, without losing your redundancy: since the transaction logs will continue to replicate in the meantime, you can start \family typewriter resume-replay \family default at any time, in particular when a primary-side incident should happen unexpectedly.
The former secondary will just catch up by replaying the outstanding parts of the transaction logs in order to become up to date. \end_layout \begin_layout Standard However, some \emph on details \emph default must be observed. In particular, the current version of MARS needs an additional \family typewriter detach \family default operation, in order to release exclusive access to the underlying disk \family typewriter /dev/vg/$res \family default . Future versions of MARS are planned to support this more directly, without the need for an intermediate \family typewriter detach \family default operation. \end_layout \begin_layout Standard \begin_inset Graphics filename images/MatieresCorrosives.png lyxscale 50 scale 17 \end_inset Beware: \family typewriter mount -o ro /dev/vg/$res \family default can lead to \series bold unnoticed write operations \series default if you are not careful! Some journalling filesystems like \family typewriter xfs \family default or \family typewriter ext4 \family default may replay their journals onto the disk, leading to \emph on binary \emph default differences and thus \series bold destroying your consistency \series default later when you re-enable \family typewriter resume-replay \family default ! \end_layout \begin_layout Standard \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset Therefore, you may use small LVM snapshots (only in such cases). Typically, \family typewriter xfs \family default journal replay will require only a few megabytes. Therefore you typically don't need much temporary space for this. 
Here is a more detailed description of steps: \end_layout \begin_layout Enumerate \family typewriter marsadm pause-replay $res \end_layout \begin_layout Enumerate \family typewriter marsadm detach $res \end_layout \begin_layout Enumerate \family typewriter lvcreate --size 100m --snapshot --name ro-$res /dev/vg/$res \end_layout \begin_layout Enumerate \family typewriter mount -o ro /dev/vg/ro-$res /mnt/tmp \end_layout \begin_layout Enumerate Now draw your backup from \family typewriter /mnt/tmp/ \end_layout \begin_layout Enumerate \family typewriter umount /mnt/tmp \end_layout \begin_layout Enumerate \family typewriter lvremove -f /dev/vg/ro-$res \end_layout \begin_layout Enumerate \family typewriter marsadm up $res \end_layout \begin_layout Standard Hint: during the backup, the transaction logs will accumulate on \family typewriter /mars/ \family default . In order to avoid overflow of \family typewriter /mars/ \family default (c.f. section \begin_inset CommandInset ref LatexCommand ref reference "sec:Defending-Overflow" \end_inset ), don't unnecessarily prolong the backup duration. \end_layout \begin_layout Chapter MARS for Developers \end_layout \begin_layout Standard This chapter is organized strictly top-down. \end_layout \begin_layout Standard If you are a sysadmin and want to inform yourself about internals (useful for debugging), the relevant information is at the beginning, and you don't need to dive into all technical details at the end. \end_layout \begin_layout Standard If you are a kernel developer and want to contribute code to the emerging MARS community, please read it (almost) all. Due to the top-down organization, sometimes you will need to follow some forward references in order to understand details. 
Therefore I recommend reading this chapter twice in two different reading modes: in the first reading pass, you just get a raw network of principles and structures in your brain (you don't want to grasp details, therefore don't strive for a full understanding). In the second pass, you will exploit your knowledge from the first pass for a deeper understanding of the details. \end_layout \begin_layout Standard Alternatively, you may first read the sections about general architecture, and then start a bottom-up scan by first reading the last section about generic objects and aspects, and working in reverse \emph on section \emph default order (but read \emph on sub \emph default sections in-order) until you finally reach the kernel interfaces / symlink trees. \end_layout \begin_layout Section Motivation / Politics \end_layout \begin_layout Standard MARS is not yet upstream in the Linux kernel. This section tries to clear up some potential doubts. Some people have asked why MARS uses its own internal framework instead of \emph on directly \emph default \begin_inset Foot status open \begin_layout Plain Layout Notice that \emph on indirect \emph default use of pre-existing Linux infrastructure is not only possible, but actually implemented, by using it \emph on internally \emph default in brick \emph on implementations \emph default (black-box principle). However, such bricks are not portable to other environments like userspace. \end_layout \end_inset being based on some already existing Linux kernel infrastructures like the device mapper. Here is a list of technical reasons: \end_layout \begin_layout Enumerate The existing device mapper infrastructure is based on \family typewriter struct bio \family default . In contrast, the new XIO personality of the generic brick infrastructure is based on the concept of AIO (Asynchronous IO), which is a \series bold true superset \series default of block IO. 
\end_layout \begin_layout Enumerate In particular, \family typewriter struct bio \family default firmly references \family typewriter struct page \family default (via intermediate \family typewriter struct bio_vec \family default ), using types like \family typewriter sector_t \family default in the field \family typewriter bi_sector \family default . Basic transfer units are blocks, or sectors, or pages, or the like. In contrast, \family typewriter struct aio_object \family default used by the XIO personality can address \series bold arbitrary granularity \series default memory with byte resolution even at odd \begin_inset Foot status open \begin_layout Plain Layout Some brick \emph on implementations \emph default (as opposed to the capabilities of the \emph on interface \emph default ) may be (and, in fact, \emph on are \emph default ) restricted to \family typewriter PAGE_SIZE \family default operations or the like. This is no general problem, because IOP can automatically insert some translator bricks extending the capabilities to universal granularity (of course at some performance costs). \end_layout \end_inset positions in (virtual) files / devices, similar to classical Unix file IO, but \emph on asynchronously \emph default . Practical experience shows that even non-functional properties like performance of many datacenter workloads are profiting from that \begin_inset Foot status open \begin_layout Plain Layout The current transaction logger uses variable-sized headers at \begin_inset Quotes eld \end_inset odd \begin_inset Quotes erd \end_inset addresses. Although this increases \family typewriter memcpy() \family default load due to \begin_inset Quotes eld \end_inset misalignment \begin_inset Quotes erd \end_inset , the \emph on overall performance \emph default was provably better than in variants where sector / page alignment was strictly obeyed, but space was wasted for alignments. 
Such functionality is only possible if the XIO infrastructure \emph on allows \emph default \emph on for \emph default (but doesn't force) \begin_inset Quotes eld \end_inset mis-aligned \begin_inset Quotes erd \end_inset IO operations. In future, many different transaction logfile formats showing different runtime behaviour (e.g. optimized for high-throughput SSD loads) may co-exist in parallel. Note that properly aligned XIO operations bear no noticeable overhead compared to classical block IO, at least in typical datacenter RAID scenarios. \end_layout \end_inset . The AIO/XIO abstraction contains no fixed link to kernel abstractions and should be \series bold easily portable \series default to other environments. In summary, the new personality provides a uniform abstraction which abstracts away from multiple different kernel interfaces; it is designed to be useful even in userspace. \end_layout \begin_layout Enumerate Kernel infrastructures for the concept of \emph on direct IO \emph default are different from those for \emph on buffered IO \emph default . The XIO personality used by MARS subsumes both concepts as use case \emph on variants \emph default . \series bold Buffering \series default is an optional internal property of XIO bricks (almost non-functional property with support for consistency guarantees). \end_layout \begin_layout Enumerate The AIO/XIO personality is generically designed for remote operations over networks, at arbitrary places in the IO stack, with (almost \begin_inset Foot status open \begin_layout Plain Layout By default, automatic network connection re-establishment and infinite network retries are already implemented in the \family typewriter xio_client \family default and \family typewriter xio_server \family default bricks to provide fully transparent semantics. However, this may be undesirable in case of fatal crashes. 
Therefore, abort operations are also configurable, as well as network timeouts which are then mapped to classical IO errors. \end_layout \end_inset ) no semantic differences to local operations (built-in \series bold network transparency \series default ). There are universal provisions for mixed operation of different versions ( \series bold rolling software updates \series default in clusters / grids). \end_layout \begin_layout Enumerate The generic brick infrastructure (as well as its personalities like XIO or any other future personality) supports \series bold dynamic re-wiring / re-configuration \series default \emph on during \emph default operation (even while parallel IO requests are flying, some of them taking different paths in the IO stack in parallel). This is absolutely needed for MARS logfile rotation. In the long term, this would be useful for many advanced new features and products, not limited to multipathing. \end_layout \begin_layout Enumerate The generic brick infrastructure (and in turn all personalities) provide \series bold additional comfort \series default to the programmer while enabling \series bold increased functionality \series default : by use of a generalization of \series bold aspect orientation \series default \begin_inset Foot status open \begin_layout Plain Layout Similar to AOP, insertion of IOP bricks for checking / debugging etc is one of the key advantages of the generic brick infrastructure. In contrast to AOP where debugging is usually {en,dis}abled statically at compile time, IOP allows for \emph on dynamic \emph default (re-)configuration of debugging bricks, automatic repair, and many more features promoted by \emph on organic computing \emph default . \end_layout \end_inset , the programmer need no longer worry about dynamic memory allocations for \emph on local state \emph default in a brick instance. 
MARS is \series bold automating local state \series default even when dynamically instantiating new bricks (possibly having the same brick type) at runtime. Specifically, XIO is automating \series bold request stacking \series default at the completion path this way, even while dynamically reconfiguring the IO stack \begin_inset Foot status open \begin_layout Plain Layout The generic aspect orientation approach leads to better \series bold separation of concerns \series default : local state needed by brick implementations is not visible from outside by default. In other words, local state is also \series bold private state \series default . Accidental hampering of internal operations is impeded. \end_layout \begin_layout Plain Layout Example from the kernel: in \family typewriter include/linux/blkdev.h \family default the definition of \family typewriter struct request \family default contains the following comment: \family typewriter /* the following two fields are internal, NEVER access directly */ \family default . It appears that \family typewriter struct request \family default contains not only fields relevant for the caller, but also \series bold internal fields \series default needed only in \emph on some \emph default \emph on specific \emph default callees. For example, \family typewriter rb_node \family default is documented to be used only in IO schedulers. \end_layout \begin_layout Plain Layout XIO goes one step further: there need not exist exactly one IO scheduler instance in the IO stack for a single device. Future \family typewriter xio_scheduler_{deadline,cfq,...} \family default brick types could be each instantiated many times, and in arbitrary places, even for the same (logical) device. The equivalent of \family typewriter rb_node \family default would then be automatically instantiated multiple times for the same IO request, by automatically instantiating the right local aspect instances. \end_layout \end_inset . 
A similar automation \begin_inset Foot status open \begin_layout Plain Layout DM can achieve stacking and dynamic routing by a workaround called \emph on request cloning \emph default , potentially leading to mass creation of temporary / intermediate object instances. \end_layout \end_inset does not exist in the rest of the Linux kernel. \end_layout \begin_layout Enumerate The generic brick infrastructure, together with personalities like XIO, enables \series bold new long-term functional and non-functional opportunities \series default by use of concepts from instance-oriented programming (IOP \begin_inset Foot status open \begin_layout Plain Layout See \begin_inset Flex URL status collapsed \begin_layout Plain Layout http://athomux.net/papers/paper_inst2.pdf \end_layout \end_inset \end_layout \end_inset ). The application area is \series bold not limited to device drivers \series default . For example, a new personality for \emph on stackable filesystems \emph default could be developed in future. \end_layout \begin_layout Standard In summary, anyone who would insist that MARS should be \emph on directly \begin_inset Foot status open \begin_layout Plain Layout Notice that kernel-specific structures like \family typewriter struct bio \family default are of course used by MARS, but only \emph on inside \emph default the blackbox implementation of bricks like \family typewriter mars_bio \family default or \family typewriter mars_if \family default which act as \series bold adaptors \series default to/from that structure. It is possible to write further adaptors, e.g. for direct interfacing to the device mapper infrastructure. \end_layout \end_inset \emph default based on pre-existing kernel structures / frameworks instead of contributing a new framework would cause a \emph on massive regression of functionality \emph default . 
\end_layout \begin_layout Itemize On one hand, all code contributed by the MARS project is \series bold non-intrusive \series default into the rest of the Linux kernel. From the viewpoint of other parts of the kernel, the whole addition \emph on behaves \emph default \emph on like \emph default a driver (although its infrastructure is much more than a driver). \end_layout \begin_layout Itemize On the other hand, if people are interested, the contributed infrastructure \emph on may \emph default be used to \emph on add \emph default to the power of the Linux kernel. It is designed to be \series bold open for contributions \series default . \end_layout \begin_layout Itemize A \emph on possible \emph default (but not the only possible) way to do this is giving the generic brick framework / the XIO personality as well as future personalities / the MARS application the status of a \emph on subsystem \emph default inside the kernel (in the long term), similar to the SCSI subsystem or the network subsystem. No one is forced to use it, but anybody may use it if he/she likes. \end_layout \begin_layout Itemize Politically, the author is a FOSS advocate willing to collaborate and to support anyone interested in contributions. The author's personal interest is long-term and is open for both in-tree and out-of-tree extensions of both the framework and MARS by any other party obeying the GPL and not hazarding FOSS by patents (instead supporting organizations like the Open Invention Network). The author is open to closer relationships with the Linux Foundation and other parts of the Linux ecosystem. 
\end_layout \begin_layout Section Architecture Overview \end_layout \begin_layout Standard \begin_inset Graphics filename images/MARS_Framework_Architecture.pdf width 100col% \end_inset \end_layout \begin_layout Section Some Architectural Details \end_layout \begin_layout Standard The following pictures show some \begin_inset Quotes eld \end_inset zones of responsibility \begin_inset Quotes erd \end_inset , not necessarily a strict hierarchy (although Dijkstra's famous layering rules from THE are respected as far as possible). The construction principle follows the concept of \series bold Instance Oriented Programming \series default (IOP) described in \begin_inset Flex URL status collapsed \begin_layout Plain Layout http://athomux.net/papers/paper_inst2.pdf \end_layout \end_inset . Please note that MARS is only instance- \emph on based \emph default \begin_inset Foot status open \begin_layout Plain Layout Similar to OOP, where \begin_inset Quotes eld \end_inset object-based \begin_inset Quotes erd \end_inset means a weaker form of \begin_inset Quotes eld \end_inset object-oriented \begin_inset Quotes erd \end_inset , the term \begin_inset Quotes eld \end_inset instance-based \begin_inset Quotes erd \end_inset means that the \emph on strategy \emph default brick layer need not be fully modularized according to the IOP principles, but the \emph on worker \emph default brick layer already is. \end_layout \end_inset , while MARS Full is planned to be fully instance- \emph on oriented \emph default . 
\end_layout \begin_layout Subsection MARS Architecture \end_layout \begin_layout Standard \noindent \align center \begin_inset Graphics filename images/mars-light-architecture.fig width 40col% \end_inset \end_layout \begin_layout Subsection MARS Full Architecture (planned) \end_layout \begin_layout Standard \noindent \align center \begin_inset Graphics filename images/mars-full-architecture.fig width 80col% \end_inset \end_layout \begin_layout Section Documentation of the Symlink Trees \begin_inset CommandInset label LatexCommand label name "sec:Documentation-of-the" \end_inset \end_layout \begin_layout Standard The \family typewriter /mars/ \family default symlink tree is serving the following purposes, all at the same time: \end_layout \begin_layout Enumerate For \series bold communication \series default between cluster nodes, see sections \begin_inset CommandInset ref LatexCommand ref reference "sec:The-Lamport-Clock" \end_inset and \begin_inset CommandInset ref LatexCommand ref reference "sec:The-Symlink-Tree" \end_inset . This communication is even the \emph on only \emph default communication between cluster nodes (apart from the \emph on contents \emph default of transaction logfiles and sync data). \end_layout \begin_layout Enumerate \series bold \emph on Internal \emph default interface \series default between the kernel module and the userspace tool \family typewriter marsadm \family default . \end_layout \begin_layout Enumerate \series bold \emph on Internal \emph default persistent repository \series default which keeps state information between reboots (also in case of node crashes). It is even the \emph on only \emph default place where state information is kept. There is no other place like \family typewriter /etc/drbd.conf \family default . 
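\end_layout \begin_layout Standard Because all of this state is stored as plain symlinks, it can be looked at with ordinary tools. The following is only a minimal read-only sketch for human inspection, not an official interface: the helper name \family typewriter list_symlinks \family default is hypothetical, and nothing in it assumes a particular layout below \family typewriter /mars/ \family default .

```shell
#!/bin/sh
# Hypothetical helper (not part of MARS): print every symlink below a
# directory as "path -> target". It only reads; nothing is modified.
list_symlinks() {
    find "$1" -type l | sort | while IFS= read -r link; do
        printf '%s -> %s\n' "$link" "$(readlink "$link")"
    done
}
```

A possible call would be \family typewriter list_symlinks /mars/ | less \family default .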
\end_layout \begin_layout Standard \begin_inset Graphics filename images/MatieresCorrosives.png lyxscale 50 scale 17 \end_inset Because of its internal character, its representation and semantics may change at any time without notice (e.g. via an \emph on internal \emph default upgrade procedure between major releases). It is \emph on not \emph default an external interface to the outer world. Don't build anything on it. \end_layout \begin_layout Standard However, knowledge of the symlink tree is useful for advanced sysadmins, for \series bold human inspection \series default and for \series bold debugging \series default . And, of course, for developers. \end_layout \begin_layout Standard As an \begin_inset Quotes eld \end_inset official \begin_inset Quotes erd \end_inset interface from outside, only the \family typewriter marsadm \family default command should be used. \end_layout \begin_layout Subsection Documentation of the MARS Symlink Tree \end_layout \begin_layout Section XIO Worker Bricks \end_layout \begin_layout Section StrategY Worker Bricks \end_layout \begin_layout Standard NYI \end_layout \begin_layout Section The XIO Brick Personality \end_layout \begin_layout Section The Generic Brick Infrastructure Layer \end_layout \begin_layout Section The Generic Object and Aspect Infrastructure \end_layout \begin_layout Chapter \start_of_appendix Technical Data MARS \end_layout \begin_layout Standard MARS has some built-in limitations which should be overcome \begin_inset Foot status open \begin_layout Plain Layout Some internal algorithms are quadratic. The reason is that MARS evolved from a lab prototype which wasn't originally intended for enterprise grade usage, but should have been succeeded by the fully instance-oriented MARS Full much earlier. \end_layout \end_inset by the future MARS Full. 
Please don't exceed the following limits: \end_layout \begin_layout Itemize maximum 10 nodes per cluster \end_layout \begin_layout Itemize maximum 10 resources per cluster \end_layout \begin_layout Itemize maximum 100 logfiles per resource \end_layout \begin_layout Chapter Handout for Midnight Problem Solving \end_layout \begin_layout Standard Here are generic instructions for the generic \family typewriter marsadm \family default and commandline level. Other levels (e.g. different types of cluster managers, Pacemaker, control scripts / \family typewriter rc \family default scripts / \family typewriter upstart \family default scripts, etc) should be described elsewhere. \end_layout \begin_layout Section Inspecting the State of MARS \end_layout \begin_layout Standard For manual inspection, please prefer the new \family typewriter marsadm view all \family default over the old \family typewriter marsadm view-1and1 all \family default . It shows more appropriate / detailed information. \end_layout \begin_layout Standard Hint: this might change in future when somebody programs better macros for the \family typewriter view-1and1 \family default variant, or creates even better other macros. 
\end_layout \begin_layout Quotation \family typewriter \begin_inset listings inline false status open \begin_layout Plain Layout # watch marsadm view all \end_layout \end_inset \end_layout \begin_layout Standard Checking the low-level network connections at runtime: \end_layout \begin_layout Quotation \family typewriter \begin_inset listings inline false status open \begin_layout Plain Layout # watch "netstat --tcp | grep 777" \end_layout \end_inset \end_layout \begin_layout Standard Meaning of the port numbers (as currently configured into the kernel module, may change in future): \end_layout \begin_layout Itemize 7777 = metadata / symlink propagation \end_layout \begin_layout Itemize 7778 = transfer of transaction logfiles \end_layout \begin_layout Itemize 7779 = transfer of sync traffic \end_layout \begin_layout Standard 7777 must always be active on a healthy cluster. 7778 and 7779 will appear only on demand, when some data is transferred. \end_layout \begin_layout Standard Hint: when one of the columns Send-Q or Recv-Q is constantly at a high value, you might have a network bottleneck. \end_layout \begin_layout Section Replication is Stuck \end_layout \begin_layout Standard Indications of stuck replication: \end_layout \begin_layout Itemize One of the flags shown by \family typewriter marsadm view all \family default or \family typewriter marsadm view-flags all \family default contains a symbol \family typewriter "-" \family default (dash). This means that some switch is currently switched off (deliberately). Please check whether there is a valid reason why somebody else switched it off. If it was switched off just by accident, use the following command to get replication going again: \family typewriter \begin_inset listings inline false status open \begin_layout Plain Layout # marsadm up all \end_layout \end_inset \family default (or replace \family typewriter all \family default by a particular resource name if you want to start only a specific one). 
\begin_inset Newline newline \end_inset Note: \family typewriter up \family default is equivalent to the sequence \family typewriter attach; resume-fetch; resume-replay; resume-sync \family default . Instead of switching each individual knob, use \family typewriter up \family default as a shortcut for switching on anything which is currently off. \end_layout \begin_layout Itemize \family typewriter netstat --tcp | grep 7777 \family default does not show anything. Please check the following: \end_layout \begin_deeper \begin_layout Itemize Is the kernel module loaded? Check \family typewriter lsmod | grep mars \family default . When necessary, run \family typewriter modprobe mars \family default . \end_layout \begin_layout Itemize Is the network interface down? Check \family typewriter ifconfig \family default , and/or \family typewriter ethtool \family default and friends, and fix it when necessary. \end_layout \begin_layout Itemize Is a \family typewriter ping \family default possible? If not, fix the network / routing / firewall / etc. When fixed, the MARS connections should automatically appear after about 1 minute. \end_layout \begin_layout Itemize When \family typewriter ping \family default is possible, but a MARS connection to port 7777 does not appear after a few minutes, try to connect to remote port 7777 by hand via \family typewriter telnet \family default . But don't type anything, just abort the connection immediately when it works! Typing anything will almost certainly throw a harsh error message at the other server, which could unnecessarily alarm other people. \end_layout \end_deeper \begin_layout Itemize Check whether \family typewriter marsadm view all \family default shows some progress bars somewhere. 
Example: \family typewriter \size scriptsize \begin_inset listings inline false status open \begin_layout Plain Layout istore-test-bap1:~# marsadm view all \end_layout \begin_layout Plain Layout --------- resource lv-0 \end_layout \begin_layout Plain Layout lv-0 OutDated[F] PausedReplay dCAS-R Secondary istore-test-bs1 \end_layout \begin_layout Plain Layout replaying: [>...................] 1.21% (12/1020)MiB logs: [2..3] \end_layout \begin_layout Plain Layout > fetch: 1008.198 MiB rate: 0 B/sec remaining: --:--:-- hrs \end_layout \begin_layout Plain Layout > replay: 0 B rate: 0 B/sec remaining: 00:00:00 hrs \end_layout \end_inset \family default \size default At least one of the \family typewriter rate: \family default values should be greater than 0. When none of the \family typewriter rate: \family default values indicate any progress for a longer time, try \family typewriter marsadm up all \family default again. If it doesn't help, check and repair the network. If even this does not help, check the hardware for any IO hangups, or kernel hangups. First, check the RAID controllers. Often (but not certainly), a stuck kernel can be recognized when many processes are \emph on permanently \emph default in state "D", for a long time: \family typewriter ps ax | grep " D" | grep -v grep \family default or similar. Please check whether there is just an overload, or \emph on really \emph default a true kernel problem. Discrimination is not easy, and requires experience (as with any other system; not limited to MARS). A truly stuck kernel can only be resurrected by rebooting. The same holds for any hardware problems. \end_layout \begin_layout Itemize Check whether \family typewriter marsadm view all \family default reports any lines like \family typewriter WARNING: SPLIT BRAIN at '' detected \family default . 
In such a case, check that there is \emph on really \emph default a split brain, before following the instructions in section \begin_inset CommandInset ref LatexCommand ref reference "sec:Resolution-of-Split" \end_inset . Notice that network outages or a missing \family typewriter marsadm log-delete-all all \family default may cause an old split brain to be reported even though it has gone in the meantime. \end_layout \begin_layout Itemize Check whether \family typewriter /mars/ \family default is too full. For a rough impression, \family typewriter df /mars/ \family default may be used. For getting authoritative values as internally used by the MARS emergency-mode computations, use \family typewriter marsadm view-rest-space \family default (the unit is GiB). In practice, the differences are only marginal, at least on bigger \family typewriter /mars/ \family default partitions. When there is only little remaining space (or none at all), please follow the instructions in section \begin_inset CommandInset ref LatexCommand ref reference "sec:Resolution-of-Emergency" \end_inset . \end_layout \begin_layout Section Resolution of Emergency Mode \begin_inset CommandInset label LatexCommand label name "sec:Resolution-of-Emergency" \end_inset \end_layout \begin_layout Standard Emergency mode occurs when \family typewriter /mars/ \family default runs out of space, such that no new logfile data can be written anymore. \end_layout \begin_layout Standard In emergency mode, the primary will pass any write requests \emph on directly \emph default to the underlying disk, as if MARS were not present at all. Thus, your application will continue to run. Only the \emph on replication \emph default as such is stopped. 
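\end_layout \begin_layout Standard Since emergency mode is triggered by \family typewriter /mars/ \family default running out of space, a rough early warning can be scripted around \family typewriter df \family default . The following is a minimal sketch under stated assumptions: the helper names \family typewriter free_gib \family default and \family typewriter warn_if_low \family default as well as any limit you pass are hypothetical, and the result is only the rough \family typewriter df \family default impression, not the authoritative value from \family typewriter marsadm view-rest-space \family default .

```shell
#!/bin/sh
# Hypothetical helpers (not part of MARS) for a rough early warning.

# Print the space available on the filesystem holding $1, in whole GiB.
# "df -P -k" prints one header line, then one data line whose 4th column
# is the available space in units of 1024 bytes.
free_gib() {
    df -P -k "$1" | awk 'NR == 2 { printf "%d\n", $4 / (1024 * 1024) }'
}

# Print a warning and return non-zero when $1 (GiB) is below the limit $2.
warn_if_low() {
    if [ "$1" -lt "$2" ]; then
        echo "WARNING: only $1 GiB left, limit is $2 GiB"
        return 1
    fi
    return 0
}
```

A possible call would be \family typewriter warn_if_low "$(free_gib /mars/)" 20 \family default , where the limit of 20 GiB is an arbitrary example and should be chosen to fit your own \family typewriter /mars/ \family default partition.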
\end_layout \begin_layout Standard \begin_inset Note Greyedout status open \begin_layout Plain Layout Notice: emergency mode means that your secondary nodes are usually in a \emph on consistent \emph default , but \emph on outdated \emph default state (exception: when a sync was running in parallel to the emergency mode, then the sync will be automatically started over again). You can check consistency via \family typewriter marsadm view-flags all \family default . Only when a local disk shows a lower-case letter \family typewriter "d" \family default instead of an uppercase \family typewriter "D" \family default , it is known to be inconsistent (e.g. during a sync). When there is a dash instead, it usually means that the disk is detached or misconfigured or the kernel module is not started. Please fix these problems first before believing that your local disk is unusable. Even if it is really inconsistent (which is very unlikely, typically occurring only as a consequence of hardware failures, or of the above-mentioned exception), you have a good chance to recover most of the data via \family typewriter fsck \family default and friends. \end_layout \end_inset \end_layout \begin_layout Standard A currently active emergency mode can be detected by \begin_inset listings inline false status open \begin_layout Plain Layout primary:~# marsadm view-is-emergency all \end_layout \begin_layout Plain Layout secondary:~# marsadm view-is-emergency all \end_layout \end_inset Notice: this delivers the current state, telling nothing about the past. \end_layout \begin_layout Standard Currently, emergency mode will also show something like \family typewriter WARNING: SPLIT BRAIN at '' detected \family default . This ambiguity will be resolved in a future MARS release. It is however not crucial: the resolution methods for both cases are very similar. If in doubt, start emergency resolution first, and only proceed to split brain resolution if it did not help. 
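\end_layout \begin_layout Standard Because both conditions currently produce the same warning, a script can only tell that the warning is present, not which of the two caused it. The following minimal sketch does just that; the helper name \family typewriter has_split_brain_warning \family default is hypothetical, and the matched text is only the fixed beginning of the warning line quoted above.

```shell
#!/bin/sh
# Hypothetical helper (not part of MARS): succeeds when the text read from
# standard input contains the beginning of the split brain warning line.
has_split_brain_warning() {
    grep -q 'WARNING: SPLIT BRAIN'
}
```

A possible call would be \family typewriter marsadm view all | has_split_brain_warning && echo "start with emergency resolution" \family default , following the order recommended above.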
\end_layout \begin_layout Standard Preconditions: \end_layout \begin_layout Itemize Only current version of MARS: the space at the primary side should already have been released, and the emergency mode should already have been left. Otherwise, you might need the split-brain resolution method from section \begin_inset CommandInset ref LatexCommand ref reference "sec:Resolution-of-Split" \end_inset . \end_layout \begin_layout Itemize The network \series bold must \series default be working. Check that the following gives an entry for each secondary: \begin_inset listings inline false status open \begin_layout Plain Layout primary:~# netstat --tcp | grep 7777 \end_layout \end_inset When necessary, fix the network first (see instructions above). \end_layout \begin_layout Standard Emergency mode should now be resolved via the following instructions: \begin_inset listings inline false status open \begin_layout Plain Layout primary:~# marsadm view-is-emergency all \end_layout \begin_layout Plain Layout primary:~# du -s /mars/resource-* | sort -n \end_layout \end_inset Remember the affected resources. Best practice is to do the following, starting with the \emph on biggest \emph default resource as shown by the \family typewriter du | sort \family default output in reverse order, but \emph on starting \emph default the following only with the \emph on affected \emph default resources in the first place: \begin_inset listings inline false status open \begin_layout Plain Layout secondary1:~# marsadm invalidate \end_layout \begin_layout Plain Layout secondary1:~# marsadm log-delete-all all \end_layout \begin_layout Plain Layout ... ditto with all resources showing emergency mode \end_layout \begin_layout Plain Layout ...
ditto on all other secondaries \end_layout \begin_layout Plain Layout primary:~# marsadm log-delete-all all \end_layout \end_inset \end_layout \begin_layout Standard Hint: during the resolution process, some other resources might have gone into emergency mode concurrently. In addition, it is possible that some secondaries are stuck at particular resources while the corresponding primary has \emph on not yet \emph default entered emergency mode. Please repeat the steps in such a case, and look for emergency modes at secondaries additionally. When necessary, extend your list of \emph on affected \emph default resources. \end_layout \begin_layout Standard Hint: be patient. Deleting large bulks of logfile data may take a long time, at least on highly loaded systems. You should give the cleanup processes at least 5 minutes before concluding that an \family typewriter invalidate \family default followed by \family typewriter log-delete-all \family default had no effect! Don't forget to give the \family typewriter log-delete-all \family default at all cluster nodes, even when seemingly unaffected. \end_layout \begin_layout Standard In very complex scenarios, when the primary roles of different resources are spread over different hosts (aka mixed operation), you may need to repeat the whole cycle iteratively for a few cycles until the jam is resolved. \end_layout \begin_layout Standard If it does not go away, you have another chance by the following split-brain resolution process, which will also clean up emergency mode as a side effect. \end_layout \begin_layout Section Resolution of Split Brain and of Emergency Mode \begin_inset CommandInset label LatexCommand label name "sec:Resolution-of-Split" \end_inset \end_layout \begin_layout Standard Hint: in many cases (but not guaranteed), the previous recipe for resolution of emergency mode will also clean up split brain. Good chances are in case of \begin_inset Formula $k=2$ \end_inset total replicas.
Please collect your own experiences which method works better for you! \end_layout \begin_layout Standard Precondition: the network must be working. Check that the following gives an entry for each secondary: \begin_inset listings inline false status open \begin_layout Plain Layout primary:~# netstat --tcp | grep 7777 \end_layout \end_inset When necessary, fix the network first (see instructions above). \end_layout \begin_layout Standard Inspect the split brain situation: \begin_inset listings inline false status open \begin_layout Plain Layout primary:~# marsadm view all \end_layout \begin_layout Plain Layout primary:~# du -s /mars/resource-* | sort -n \end_layout \end_inset Remember those resources where a message like \family typewriter WARNING: SPLIT BRAIN at '' detected \family default appears. Do the following only for \emph on affected \emph default resources, starting with the biggest one (before proceeding to the next one). \end_layout \begin_layout Standard Do the following with only \emph on one \emph default resource at a time (before proceeding to the next one), and repeat the actions on that resource at every secondary (if there are multiple secondaries): \begin_inset listings inline false status open \begin_layout Plain Layout secondary1:~# marsadm leave-resource $res1 \end_layout \begin_layout Plain Layout secondary1:~# marsadm log-delete-all all \end_layout \end_inset Check whether the split brain has vanished everywhere. Start over with other resources at their secondaries when necessary. \end_layout \begin_layout Standard Finally, when no split brain is reported at any (former) secondary, do the following on the primary: \begin_inset listings inline false status open \begin_layout Plain Layout primary:~# marsadm log-delete-all all \end_layout \begin_layout Plain Layout primary:~# sleep 30 \end_layout \begin_layout Plain Layout primary:~# marsadm view all \end_layout \end_inset Now, the split brain should be gone even at the primary.
If not, repeat this step. \end_layout \begin_layout Standard In case even this should fail on some \family typewriter $res \family default (which is very unlikely), read the PDF manual before using \family typewriter marsadm log-purge-all $res \family default . \end_layout \begin_layout Standard Finally, when the split brain is gone everywhere, rebuild the redundancy at every secondary via \begin_inset listings inline false status open \begin_layout Plain Layout secondary1:~# marsadm join-resource $res1 /dev//$res1 \end_layout \end_inset \end_layout \begin_layout Standard \noindent If even this method does not help, set up the whole cluster afresh by \family typewriter rmmod mars \family default everywhere, and creating a fresh \family typewriter /mars/ \family default filesystem everywhere, followed by the same procedure as installing MARS for the first time (which is outside the scope of this handout). \end_layout \begin_layout Section Handover of Primary Role \end_layout \begin_layout Standard When there exists a method for primary handover in higher layers such as cluster managers, please prefer that method (e.g. \family typewriter cm3 \family default or other tools). \end_layout \begin_layout Standard If suchalike doesn't work, or if you need to hand over some resource \family typewriter $res1 \family default by hand, do the following: \end_layout \begin_layout Itemize Stop the load / application corresponding to \family typewriter $res1 \family default on the old primary side. \end_layout \begin_layout Itemize \family typewriter umount /dev/mars/$res1 \family default , or otherwise close any openers such as iSCSI. \end_layout \begin_layout Itemize At the new primary: \family typewriter marsadm primary $res1 \end_layout \begin_layout Itemize Restart the application at the new site (in reverse order to above).
In case you want to switch \emph on all \emph default resources which are not yet at the new side, you may use \family typewriter marsadm primary all \family default . \end_layout \begin_layout Section Emergency Switching of Primary Role \end_layout \begin_layout Standard Emergency switching is necessary when your primary is no longer reachable over the network for a \emph on longer \emph default time, or when the hardware is defective. \end_layout \begin_layout Standard Emergency switching will very often lead to a split brain, which requires lots of manual actions to resolve (see above). Therefore, try to avoid emergency switching when possible! \end_layout \begin_layout Standard Hint: MARS can automatically recover after a primary crash / reboot, as well as after secondary crashes, just by executing \family typewriter modprobe mars \family default after \family typewriter /mars/ \family default had been mounted. Please consider waiting until your system comes up again, instead of risking a split brain. \end_layout \begin_layout Standard The decision between emergency switching and continuing operation at the same primary side is an operational one. MARS can support your decision by the following information at the potentially new primary side (which was in secondary mode before): \family typewriter \size scriptsize \begin_inset listings inline false status open \begin_layout Plain Layout istore-test-bap1:~# marsadm view all \end_layout \begin_layout Plain Layout --------- resource lv-0 \end_layout \begin_layout Plain Layout lv-0 InConsistent Syncing dcAsFr Secondary istore-test-bs1 \end_layout \begin_layout Plain Layout syncing: [====>..............]
27.84% (567/2048)MiB rate: 72583.00 KiB/sec remaining: 00:00:20 hrs \end_layout \begin_layout Plain Layout > sync: 567.293/2048 MiB rate: 72583 KiB/sec remaining: 00:00:20 hrs \end_layout \begin_layout Plain Layout replaying: [>:::::::::::::::::::] 0.00% (0/12902)KiB logs: [1..1] \end_layout \begin_layout Plain Layout > fetch: 0 B rate: 38 KiB/s remaining: 00:00:00 \end_layout \begin_layout Plain Layout > replay: 12902.047 KiB rate: 0 B/s remaining: --:--:-- \end_layout \end_inset \family default \size default When your target is syncing (like in this example), you cannot switch to it (same as with DRBD). When you had an emergency mode before, you should first resolve that (whenever possible). When a split brain is reported, try to resolve it first (same as with DRBD). Only in case you \emph on know \emph default that the primary is really damaged, or it is really impossible to run the application there for some reason, emergency switching is desirable. \end_layout \begin_layout Standard Hint: in case the secondary is inconsistent for some reason, e.g. because of an incremental fast full-sync, you have a last chance to recover most data after forceful switching by using a filesystem check or suchalike. This might be even faster than restoring data from the backup. But use it only if you are \emph on really \emph default desperate! \end_layout \begin_layout Standard The amount of data which is \emph on known \emph default to be missing at your secondary is shown after the \family typewriter > fetch: \family default in human-readable form. However, in cases of networking problems this information may be outdated. You \emph on always \emph default need to consider further facts which cannot be known by MARS. \end_layout \begin_layout Standard When there exists a method for emergency switching of the primary in higher layers such as cluster managers, please prefer that method over the following one.
\end_layout \begin_layout Standard If suchalike doesn't work, or when a handover attempt has failed several times, or if you \emph on really need \emph default forceful switching of some resource \family typewriter $res1 \family default by hand, you can do the following: \end_layout \begin_layout Itemize When possible, stop the load / application corresponding to \family typewriter $res1 \family default on the old primary side. \end_layout \begin_layout Itemize When possible, \family typewriter umount /dev/mars/$res1 \family default , or otherwise close any openers such as iSCSI. \end_layout \begin_layout Itemize When possible (if you have some time), wait until as much data has been propagated to the new primary as possible (watch the \family typewriter fetch: \family default indicator). \end_layout \begin_layout Itemize At the new primary: \family typewriter marsadm disconnect $res1; marsadm primary --force $res1 \end_layout \begin_layout Itemize Restart the application at the new site (in reverse order to above). \end_layout \begin_layout Itemize After the application is known to run reliably, check for split brains and clean them up when necessary. \end_layout \begin_layout Chapter Alternative Methods for Split Brain Resolution \begin_inset CommandInset label LatexCommand label name "chap:Alternative-Methods-for" \end_inset \end_layout \begin_layout Standard Instead of \family typewriter marsadm invalidate \family default , the following steps may be used. In preference, start with the old \begin_inset Quotes eld \end_inset wrong \begin_inset Quotes erd \end_inset primaries first: \end_layout \begin_layout Enumerate \family typewriter marsadm leave-resource mydata \end_layout \begin_layout Enumerate After having done this on one cluster node, check whether the split brain is already gone (e.g. by saying \family typewriter marsadm view mydata \family default ). There are chances that you don't need this on all of your nodes.
Only in very rare \begin_inset Foot status open \begin_layout Plain Layout When your network had partitioned in a very awkward way for a long time, and when your partitioned primaries did several \family typewriter log-rotate \family default operations independently from each other, there is a small chance that \family typewriter leave-resource \family default does not clean up \emph on all \emph default remains of such an awkward situation. Only in such a case, try \family typewriter log-purge-all \family default . \end_layout \end_inset cases, it might happen that the preceding \family typewriter leave-resource \family default operations were not able to clean up all logfiles produced in parallel by the split brain situation. \end_layout \begin_layout Enumerate Read the documentation about \family typewriter log-purge-all \family default (see page \begin_inset CommandInset ref LatexCommand pageref reference "log-purge-all$res" \end_inset ) and use it. \end_layout \begin_layout Enumerate If you want to restore redundancy, you can follow up with a \family typewriter join-resource \family default phase for the old resource name (using the correct device name, double-check it!). This will restore your redundancy by overwriting your bad split brain version with the correct one. \end_layout \begin_layout Standard \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset It is important to resolve the split brain \emph on before \emph default you can start the \family typewriter join-resource \family default reconstruction phase! In order to keep as many \begin_inset Quotes eld \end_inset good \begin_inset Quotes erd \end_inset versions as possible (e.g. for emergency cases), don't re-join them all in parallel, but rather start with the oldest / most outdated / worst / inconsistent version first. It is recommended to start the next one only when the previous one has successfully finished.
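\end_layout \begin_layout Standard As a hedged illustration of this ordering, a re-join of two former secondaries could be carried out one after the other as sketched below. The host names \family typewriter secondary1 \family default and \family typewriter secondary2 \family default are hypothetical, \family typewriter secondary1 \family default is assumed to hold the worse version, and \family typewriter <device> \family default is only a placeholder for the correct underlying disk device, which must be determined and double-checked by hand: \begin_inset listings inline false status open
\begin_layout Plain Layout secondary1:~# marsadm join-resource mydata <device> \end_layout
\begin_layout Plain Layout secondary1:~# marsadm view mydata \end_layout
\begin_layout Plain Layout ... repeat the view until the sync has finished \end_layout
\begin_layout Plain Layout secondary2:~# marsadm join-resource mydata <device> \end_layout
\begin_layout Plain Layout secondary2:~# marsadm view mydata \end_layout
\end_inset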
\end_layout \begin_layout Chapter Alternative De- and Reconstruction of a Damaged Resource \begin_inset CommandInset label LatexCommand label name "chap:Alternative-De--and" \end_inset \end_layout \begin_layout Standard In case \family typewriter leave-resource --host= \family default does not work, you may use the following fallback. On the surviving new designated primary, give the following commands: \end_layout \begin_layout Enumerate \family typewriter marsadm disconnect-all mydata \end_layout \begin_layout Enumerate \family typewriter marsadm down mydata \end_layout \begin_layout Enumerate Check by hand whether your local disk is consistent, e.g. by test-mounting it readonly, \family typewriter fsck \family default , etc. \end_layout \begin_layout Enumerate \family typewriter marsadm delete-resource mydata \end_layout \begin_layout Enumerate Check whether the other vital cluster nodes don't report the dead resource any more, e.g. \family typewriter marsadm view all \family default at \emph on each \emph default of them. In case the resource has not disappeared anywhere (which may happen during network problems), do the \family typewriter down ; delete-resource \family default steps also there (optionally again with \family typewriter --force \family default ). \end_layout \begin_layout Enumerate Be sure that the resource has disappeared \emph on everywhere \emph default . When necessary, repeat the \family typewriter delete-resource \family default with \family typewriter --force \family default . \end_layout \begin_layout Enumerate \family typewriter marsadm create-resource newmydata ... \family default at the \emph on correct \emph default node using the \emph on correct \emph default disk device containing the \emph on correct \emph default version, and further steps to setup your resource from scratch, preferably under a different name to minimize any risk. 
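\end_layout \begin_layout Standard For orientation only, the command part of the steps above could look like the following session. The prompts \family typewriter primary \family default and \family typewriter othernode \family default are hypothetical host names and \family typewriter mydata \family default is a placeholder resource name; the manual consistency check in between must not be skipped, and \family typewriter --force \family default should only be added when the plain command turns out to be insufficient: \begin_inset listings inline false status open
\begin_layout Plain Layout primary:~# marsadm disconnect-all mydata \end_layout
\begin_layout Plain Layout primary:~# marsadm down mydata \end_layout
\begin_layout Plain Layout ... check the local disk by hand (readonly test mount, fsck, etc) \end_layout
\begin_layout Plain Layout primary:~# marsadm delete-resource mydata \end_layout
\begin_layout Plain Layout othernode:~# marsadm view all \end_layout
\end_inset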
\end_layout \begin_layout Standard \noindent In any case, \series bold manually check \series default whether a split brain is reported for any resource on any of your \emph on surviving \emph default cluster nodes. If you find one there (and only then), please (re-)execute the split brain resolution steps on the affected node(s). \end_layout \begin_layout Chapter Cleanup in case of Complicated Cascading Failures \begin_inset CommandInset label LatexCommand label name "sub:Cleanup-in-case" \end_inset \end_layout \begin_layout Standard MARS does its best to recover even from multiple failures (e.g. \series bold rolling disasters \series default ). Chances are high that the instructions from sections \begin_inset CommandInset ref LatexCommand ref reference "sub:Split-Brain-Resolution" \end_inset \begin_inset CommandInset ref LatexCommand ref reference "sub:Final-Destroy-of" \end_inset or appendix \begin_inset CommandInset ref LatexCommand ref reference "chap:Alternative-Methods-for" \end_inset \begin_inset CommandInset ref LatexCommand ref reference "chap:Alternative-De--and" \end_inset will work even in case of multiple failures, such as a network failure plus local node failure at only 1 node (even if that node is the former primary node). \end_layout \begin_layout Standard However, in general (e.g. when more than 1 node is damaged and/or when the filesystem \family typewriter /mars/ \family default is badly damaged) there is no guarantee that recovery will \emph on always \emph default succeed under \emph on any \emph default (weird) circumstances. That said, your chances for recovery are \emph on very \emph default high when some disk remains usable at least at one of your surviving secondaries.
\end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset It should be very hard to finally trash a secondary, because the transaction logfiles contain \family typewriter md5 \family default checksums for all data records. Any attempt to replay corrupted logfiles is refused by MARS. In addition, the sequence numbers of \family typewriter log-rotate \family default d logfiles are checked for contiguity. Finally, the \emph on sequence path \emph default of logfile applications (consisting of logfile names plus their respective length) is additionally secured by a \family typewriter git \family default -like incremental checksum over the whole path history (so-called \begin_inset Quotes eld \end_inset version links \begin_inset Quotes erd \end_inset ). This should detect split brains even if logfiles are appended / modified \emph on after \emph default a (forceful) switchover has already taken place. \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/MatieresToxiques.png lyxscale 50 scale 17 \end_inset That said, your risk of final data loss is very high if you remove the \series bold BBU \series default from your hardware RAID controller before all hot data has been flushed to the physical disks. Therefore, never try to \begin_inset Quotes eld \end_inset repair \begin_inset Quotes erd \end_inset a seemingly dead node before your replication is up again somewhere else! Only unplug the network cables when advised, but never try to repair the hardware instantly! \end_layout \begin_layout Standard In case of desperate situations where none of the previous instructions have succeeded, your last chance is rebuilding all your resources from intact disks as follows: \end_layout \begin_layout Enumerate Do \family typewriter rmmod mars \family default on all your cluster nodes and/or reboot them.
Note: if you are less desperate, chances are high that the following will also work when the kernel module remains active and everywhere a \family typewriter marsadm down \family default is given instead, but for an \emph on ultimate \emph default instruction you should eliminate \emph on potential \emph default kernel problems by \family typewriter rmmod \family default / \family typewriter reboot \family default , at least if you can afford the downtime on concurrently operating resources. \end_layout \begin_layout Enumerate For safety, physically remove the storage network cables on \emph on all \emph default your cluster nodes. Note: the same disclaimer holds. MARS really does its best, even when \family typewriter delete-resource \family default is given while the network is fully active and multiple split-brain primaries are actively using their local device in parallel (approved by some testcases from the automatic test suite, but note that it is impossible to catch all possible failure scenarios). Don't challenge your fate if you are desperate! Don't \emph on rely \emph default on this! Nothing is absolutely fail-safe! \end_layout \begin_layout Enumerate \series bold Manually \series default check which surviving disk is usable, and which is the \begin_inset Quotes eld \end_inset best \begin_inset Quotes erd \end_inset one for your purpose. \end_layout \begin_layout Enumerate Do \family typewriter modprobe mars \family default \emph on only \emph default on that node. If that fails, \family typewriter rmmod \family default and/or reboot again, and start over with a completely fresh \family typewriter /mars/ \family default partition ( \family typewriter mkfs.ext4 /mars/ \family default or similar) \emph on everywhere \emph default on \emph on all \emph default cluster nodes, and continue with step 7. 
\end_layout \begin_layout Enumerate If your old \family typewriter /mars/ \family default works, and you did not already (forcefully) switch your designated primary to the final destination, do it now (see description in section \begin_inset CommandInset ref LatexCommand ref reference "sub:Forced-Switching" \end_inset ). Wait until any old logfile data has been replayed. \end_layout \begin_layout Enumerate Say \family typewriter marsadm delete-resource mydata --force \family default . This will clean up all internal symlink tree information for the resource, but will leave your disk data intact. \end_layout \begin_layout Enumerate Locally build up the new resource(s) as usual, out of the underlying disks. \end_layout \begin_layout Enumerate Check whether the new resource(s) work in standalone mode. \end_layout \begin_layout Enumerate When necessary, repeat these steps with other resources. \end_layout \begin_layout Standard Now you can choose how to rebuild your cluster. If you rebuilt \family typewriter /mars/ \family default anywhere, you \emph on must \emph default rebuild it on \emph on all \emph default new cluster nodes and start over with a fresh \family typewriter join-cluster \family default on each of them, from scratch. It is not possible to mix the old cluster with the new one. \end_layout \begin_layout Standard \begin_inset ERT status open \begin_layout Plain Layout \backslash begin{enumerate} \backslash setcounter{enumi}{9} \end_layout \end_inset \end_layout \begin_layout Standard \begin_inset ERT status open \begin_layout Plain Layout \backslash item \end_layout \end_inset Finally, do all the necessary \family typewriter join-resource \family default s on the respective cluster nodes, according to your new redundancy scenario after the failures (e.g. after activating spare nodes, etc).
If you have \begin_inset Formula $k>2$ \end_inset replicas, start \family typewriter join-resource \family default on the worst / most damaged version first, and start the next preferably only after the previous sync has completed successfully. This way, you will be permanently retaining some (old and outdated, but hopefully potentially usable) replicas while a sync is running. Don't start too many syncs in parallel. \end_layout \begin_layout Standard \begin_inset ERT status open \begin_layout Plain Layout \backslash end{enumerate} \end_layout \end_inset \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/MatieresCorrosives.png lyxscale 50 scale 17 \end_inset Never use \family typewriter delete-resource \family default twice on the same resource name, after you have already a working standalone primary \begin_inset Foot status open \begin_layout Plain Layout Of course, when you have not created the \emph on same \emph default resource anew, you may repeat \family typewriter delete-resource \family default on other cluster nodes in order to get rid of local files / symlinks which had not been propagated to other nodes before. \end_layout \end_inset . You might accidentally destroy your again-working copy! You \emph on can \emph default issue \family typewriter delete-resource \family default multiple times on different nodes, e.g. when the network has problems, but doing so \emph on after \emph default re-establishment of the initial primary bears some risk. Therefore, the safest way is first deleting the resources everywhere, and then starting over afresh. \end_layout \begin_layout Standard Before re-connecting any network cable on any non-primary (new secondaries), ensure that all \family typewriter /dev/mars/mydata \family default devices are no longer in use (e.g. from an old primary role before the incident happened), and that each local disk is detached. Only after that, you should be able to safely re-connect the network.
The \family typewriter delete-resource \family default given at the new primary should propagate now to each of your secondaries, and your local disk should be usable for a re- \family typewriter join-resource \family default . \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset When you did not rebuild your cluster from scratch with fresh \family typewriter /mars/ \family default filesystems, and one of the old cluster nodes is supposed to be removed permanently, use \family typewriter leave-resource \family default (optionally with \family typewriter --host= \family default and/or \family typewriter --force \family default ) and finally \family typewriter leave-cluster \family default . \end_layout \begin_layout Chapter Experts only: Special Trick Switching and Rebuild \begin_inset CommandInset label LatexCommand label name "chap:Experts-only:-Special" \end_inset \end_layout \begin_layout Standard The following is a further alternative for \series bold experts \series default who really know what they are doing. The method is very simple and therefore well-suited for coping with mass failures, e.g. \series bold power blackout of whole datacenters \series default . \end_layout \begin_layout Standard In case a primary datacenter fails as a whole for whatever reason and you have a backup datacenter, do the following steps in the backup datacenter: \end_layout \begin_layout Enumerate Fencing step: by means of firewalling, \series bold ensure \series default that the (virtually) damaged datacenter nodes \series bold cannot \series default be reached over the network. For example, you may place REJECT rules into all of your local iptables firewalls at the backup datacenter. Alternatively / additionally, you may block the routes at the appropriate central router(s) in your network. 
\end_layout \begin_layout Enumerate Run the sequence \family typewriter marsadm disconnect all; marsadm primary --force all \family default on all nodes in the backup datacenter. \end_layout \begin_layout Enumerate Restart your services in the backup datacenter (as far as necessary). Depending on your network setup, further steps like switching BGP routes etc may be necessary. \end_layout \begin_layout Enumerate Check that \emph on all \emph default your services are \emph on really \emph default up and running, before you try to repair anything! Failing to do so may result in data loss when you execute the following restore method for \emph on experts \emph default . \end_layout \begin_layout Standard Now your backup datacenter should continue servicing your clients. The final reconstruction of the originally primary datacenter works as follows: \end_layout \begin_layout Enumerate At the damaged primary datacenter, ensure that nowhere the MARS kernel module is running. In case of a power blackout, you shouldn't have executed an automatic \family typewriter modprobe mars \family default anywhere during reboot, so you should be already done when all your nodes are up again. In case some nodes had no reboot, execute \family typewriter rmmod mars \family default everywhere. If \family typewriter rmmod \family default refuses to run, you may need to umount the \family typewriter /dev/mars/mydata \family default device first. When nothing else helps, you may just mass reboot your hanging nodes. \end_layout \begin_layout Enumerate At the failed side, do \family typewriter rm -rf /mars/resource-$mydata/ \family default for all those resources which had been primary before the blackout. Do this \emph on only \emph default for those cases, otherwise you will need unnecessary \family typewriter leave-resource \family default s or \family typewriter invalidate \family default s later (e.g. when half of your nodes were already running at the surviving side).
In order to avoid unnecessary traffic, please do this only as far as really necessary. Don't remove any other directories. In particular, \family typewriter /mars/ips/ \family default \emph on must \emph default remain intact. In case you accidentally deleted them, or you had to re-create \family typewriter /mars/ \family default from scratch, try \family typewriter rsync \family default with the correct options. \begin_inset Newline newline \end_inset \begin_inset Graphics filename images/MatieresCorrosives.png lyxscale 50 scale 17 \end_inset Caution! Before doing this, check that the corresponding directory exists at the backup datacenter, and that it is \emph on really \emph default healthy! \end_layout \begin_layout Enumerate Un-Fencing: restore your network firewall / routes and check that they work ( \family typewriter ping \family default etc). \end_layout \begin_layout Enumerate Do \family typewriter modprobe mars \family default everywhere. All missing directories and their missing symlinks should be automatically fetched from the backup datacenter. \end_layout \begin_layout Enumerate Run \family typewriter marsadm join-resource $res \family default , but only at those places where the directory was removed previously, while using the same disk devices as before. This will minimize actual traffic thanks to the fast full sync algorithm. \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset It is \series bold crucial \series default that the fencing step \series bold must \series default be executed \emph on before \emph default any \family typewriter primary --force \family default ! This way, no split brain will be \emph on visible \emph default at the backup datacenter side, because there is simply no chance for transferring different versions over the network.
It is also crucial to remove any (potentially diverging) resource directories \emph on before \emph default the \family typewriter modprobe \family default ! This way, the backup datacenter never runs into split brain. This saves you a lot of detail work for split brain resolution when you have to restore bulks of nodes in a short time. \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset In case the repair of a full datacenter should take so extremely long that some \family typewriter /mars/ \family default partitions are about to run out of space at the surviving side, you may use the \family typewriter leave-resource --host=failed-node \family default trick described earlier, followed by \family typewriter log-delete-all \family default . Best if you have prepared a fully automatic script long before the incident, which executes suchalike only as far as necessary in each individual case. \end_layout \begin_layout Standard \noindent \begin_inset Graphics filename images/lightbulb_brightlit_benj_.png lyxscale 12 scale 7 \end_inset Even better: train such scenarios in advance, and prepare scripts for mass automation. Look into section \begin_inset CommandInset ref LatexCommand ref reference "sec:Scripting-HOWTO" \end_inset . \end_layout \begin_layout Chapter GNU Free Documentation License \begin_inset CommandInset label LatexCommand label name "chap:GNU-FDL" \end_inset \end_layout \begin_layout Standard \noindent \family typewriter \size footnotesize \begin_inset ERT status open \begin_layout Plain Layout \backslash lstinputlisting{fdl.txt} \end_layout \end_inset \end_layout \end_body \end_document