diff --git a/docu/images/Architecure_Big_Cluster.pdf b/docu/images/Architecure_Big_Cluster.pdf
new file mode 100644
index 00000000..2616f7dd
Binary files /dev/null and b/docu/images/Architecure_Big_Cluster.pdf differ
diff --git a/docu/images/Architecure_Sharding.pdf b/docu/images/Architecure_Sharding.pdf
new file mode 100644
index 00000000..45870cb2
Binary files /dev/null and b/docu/images/Architecure_Sharding.pdf differ
diff --git a/docu/images/MARS_Background_Migration.pdf b/docu/images/MARS_Background_Migration.pdf
new file mode 100644
index 00000000..56c6ee1e
Binary files /dev/null and b/docu/images/MARS_Background_Migration.pdf differ
diff --git a/docu/images/MARS_Cluster_on_Demand.pdf b/docu/images/MARS_Cluster_on_Demand.pdf
new file mode 100644
index 00000000..8b5f41dc
Binary files /dev/null and b/docu/images/MARS_Cluster_on_Demand.pdf differ
diff --git a/docu/mars-manual.lyx b/docu/mars-manual.lyx
index 46d19a68..e3cdecb6 100644
--- a/docu/mars-manual.lyx
+++ b/docu/mars-manual.lyx
@@ -328,6 +328,1292 @@ name "chap:Why-You-should"
\end_layout
+\begin_layout Section
+Cost Arguments from Architecture
+\end_layout
+
+\begin_layout Standard
+Datacenters aren't usually operated for fun or as a hobby.
+ Costs are therefore a very important argument.
+\end_layout
+
+\begin_layout Standard
+Many enterprise system architects start with a particular architecture
+ in mind, called
+\begin_inset Quotes eld
+\end_inset
+
+big cluster
+\begin_inset Quotes erd
+\end_inset
+
+.
+ There is a common belief that
+\series bold
+scalability
+\series default
+ could not be achieved otherwise:
+\end_layout
+
+\begin_layout Standard
+\noindent
+\align center
+\begin_inset Graphics
+ filename images/Architecure_Big_Cluster.pdf
+ width 100col%
+
+\end_inset
+
+
+\end_layout
+
+\begin_layout Standard
+\noindent
+The crucial point here is the storage network:
+\begin_inset Formula $n$
+\end_inset
+
+ frontend servers are interconnected with
+\begin_inset Formula $m=O(n)$
+\end_inset
+
+ storage servers, in order to achieve properties like scalability, failure
+ tolerance, etc.
+\end_layout
+
+\begin_layout Standard
+Since
+\emph on
+any
+\emph default
+ of the
+\begin_inset Formula $n$
+\end_inset
+
+ frontends must be able to access
+\emph on
+any
+\emph default
+ of the
+\begin_inset Formula $m$
+\end_inset
+
+ storages in realtime, the storage network must be dimensioned for
+\begin_inset Formula $O(n\cdot m)=O(n^{2})$
+\end_inset
+
+ network connections running in parallel.
+ Even if the total network throughput scales only with
+\begin_inset Formula $O(n)$
+\end_inset
+
+, the network has to
+\emph on
+switch
+\emph default
+ the packets from
+\begin_inset Formula $n$
+\end_inset
+
+ sources to
+\begin_inset Formula $m$
+\end_inset
+
+ destinations (and back again) in
+\series bold
+realtime
+\series default
+.
+\end_layout
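+
+\begin_layout Standard
+For a rough illustration with made-up numbers: with
+\begin_inset Formula $n=1000$
+\end_inset
+
+ frontends and
+\begin_inset Formula $m=500$
+\end_inset
+
+ storage servers, the switching fabric must be prepared for up to
+\begin_inset Formula $n\cdot m=500000$
+\end_inset
+
+ simultaneous connection pairs, although the useful payload throughput
+ grows only with
+\begin_inset Formula $O(n)$
+\end_inset
+
+.
+\end_layout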
+
+\begin_layout Standard
+This
+\series bold
+cross-bar functionality
+\series default
+ in realtime makes the storage network expensive.
+ Several further factors increase the cost of storage networks:
+\end_layout
+
+\begin_layout Itemize
+In order to limit error propagation from other networks, the storage network
+ is often built as a
+\emph on
+physically separate
+\emph default
+ /
+\emph on
+dedicated
+\emph default
+ network.
+
+\end_layout
+
+\begin_layout Itemize
+Because storage networks react badly to high latencies and packet
+ loss, they often need to be dimensioned for the
+\series bold
+worst case
+\series default
+ (load peaks, packet storms, etc), requiring some of the best and therefore
+ most expensive components for reducing latency and increasing throughput.
+ Dimensioning to the worst case instead of an average case plus some safety
+ margins is nothing but expensive
+\series bold
+overdimensioning
+\series default
+ /
+\series bold
+over-engineering
+\series default
+.
+\end_layout
+
+\begin_layout Itemize
+When multipathing is required for improving fault tolerance of the storage
+ network itself, these efforts will even
+\series bold
+double
+\series default
+.
+\end_layout
+
+\begin_layout Itemize
+When geo-redundancy is required, the whole mess may easily more than double
+ again, because in case of disasters like terrorist attacks the backup
+ datacenter must be prepared to take over for multiple days or weeks.
+\end_layout
+
+\begin_layout Standard
+Fortunately, there is an alternative called
+\begin_inset Quotes eld
+\end_inset
+
+sharding architecture
+\begin_inset Quotes erd
+\end_inset
+
+ which does not need a storage network at all, at least when built and dimension
+ed properly.
+ Instead, it
+\emph on
+should have
+\emph default
+ (but not always needs) a so-called replication network which can, when
+ present, be dimensioned much smaller because it needs neither realtime
+ operations nor scalability to
+\begin_inset Formula $O(n^{2})$
+\end_inset
+
+:
+\end_layout
+
+\begin_layout Standard
+\noindent
+\align center
+\begin_inset Graphics
+ filename images/Architecure_Sharding.pdf
+ width 100col%
+
+\end_inset
+
+
+\end_layout
+
+\begin_layout Standard
+\noindent
+Sharding architectures are extremely well suited when both the input traffic
+ and the data are
+\series bold
+already partitioned
+\series default
+.
+ For example, several thousands or even millions of customers may operate
+ on disjoint data sets, like in web hosting where each webspace resides
+ in its own home directory, or where each of millions of MySQL database instances
+ has to be isolated from its neighbours.
+\end_layout
+
+\begin_layout Standard
+Even in cases when any customer may potentially access any of the data items
+ residing in the whole storage pool (e.g.
+ in a search engine), sharding can often be applied.
+ The trick is to create some relatively simple content-based dynamic switching
+ or redirect mechanism in the input network traffic, similar to HTTP load
+ balancers or redirectors.
+\end_layout
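+
+\begin_layout Standard
+As a deliberately oversimplified sketch (not part of MARS; the shard count,
+ naming scheme and hash method are made up for illustration), such a redirector
+ could be as trivial as a few lines of shell, while production setups would
+ typically use an HTTP load balancer rule or a small lookup database instead:
+\end_layout
+
+\begin_layout LyX-Code
+#!/bin/sh
+\end_layout
+
+\begin_layout LyX-Code
+# map a customer identifier to one of N shards by hashing (sketch only)
+\end_layout
+
+\begin_layout LyX-Code
+N_SHARDS=16
+\end_layout
+
+\begin_layout LyX-Code
+HASH=$(printf '%s' "$1" | cksum | cut -d' ' -f1)
+\end_layout
+
+\begin_layout LyX-Code
+echo "redirect to shard-$(( HASH % N_SHARDS )).example.com"
+\end_layout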
+
+\begin_layout Standard
+Only when partitioning of input traffic plus data is not possible in a reasonabl
+e way, big cluster architectures as implemented for example in Ceph or Swift
+ (and partly even possible with MARS when restricted to the block layer)
+ have their
+\series bold
+use case
+\series default
+.
+ Only under such a precondition are they really needed.
+\end_layout
+
+\begin_layout Standard
+When sharding is possible, it is the preferred model for cost and performance
+ reasons.
+\end_layout
+
+\begin_layout Standard
+\noindent
+\begin_inset Graphics
+ filename images/lightbulb_brightlit_benj_.png
+ lyxscale 12
+ scale 7
+
+\end_inset
+
+Notice that MARS' new remote device feature from the 0.2 branch series (which
+ is a replacement for iSCSI)
+\emph on
+could
+\emph default
+ be used for implementing the
+\begin_inset Quotes eld
+\end_inset
+
+big cluster
+\begin_inset Quotes erd
+\end_inset
+
+ model at block layer.
+\end_layout
+
+\begin_layout Standard
+Nevertheless, this sub-variant is not the preferred model.
+ The following is a super-model which combines both the
+\begin_inset Quotes eld
+\end_inset
+
+big cluster
+\begin_inset Quotes erd
+\end_inset
+
+ and the sharding model at the block layer in a very flexible way.
+ The following example shows only two servers from a pool consisting of
+ hundreds or thousands of servers:
+\end_layout
+
+\begin_layout Standard
+\noindent
+\align center
+\begin_inset Graphics
+ filename images/MARS_Cluster_on_Demand.pdf
+ width 100col%
+
+\end_inset
+
+
+\end_layout
+
+\begin_layout Standard
+\noindent
+The idea is to use iSCSI or the MARS remote device
+\emph on
+only where necessary
+\emph default
+.
+ Preferably, local storage is divided into multiple Logical Volumes (LVs)
+ via LVM, which are
+\emph on
+directly
+\emph default
+ used
+\emph on
+locally
+\emph default
+ by Virtual Machines (VMs), such as KVM or filesystem-based variants like
+ LXC containers.
+\end_layout
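+
+\begin_layout Standard
+For illustration only (volume group, LV and container names are placeholders,
+ and the filesystem is just an example), carving out such a locally used
+ LV might look like this:
+\end_layout
+
+\begin_layout LyX-Code
+lvcreate -L 200G -n lv-customer1 vg00
+\end_layout
+
+\begin_layout LyX-Code
+mkfs.xfs /dev/vg00/lv-customer1
+\end_layout
+
+\begin_layout LyX-Code
+mount /dev/vg00/lv-customer1 /containers/customer1
+\end_layout
+
+\begin_layout LyX-Code
+lxc-start -n customer1
+\end_layout
+
+\begin_layout Standard
+A KVM guest would use the LV directly as its virtual disk instead of mounting
+ a filesystem on the host.
+\end_layout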
+
+\begin_layout Standard
+In the above example, the left machine has relatively little CPU power and
+ RAM compared to its storage capacity.
+ Therefore, not
+\emph on
+all
+\emph default
+ LVs could be instantiated locally at the same time without causing operational
+ problems, but
+\emph on
+some
+\emph default
+ of them can be run locally.
+ The example solution is to
+\emph on
+exceptionally(!)
+\emph default
+ export LV3 to the right server, which has some otherwise unused CPU and
+ RAM capacity.
+\end_layout
+
+\begin_layout Standard
+Notice that locally running VMs don't produce any storage network traffic
+ at all.
+ Therefore, this is the preferred runtime configuration.
+\end_layout
+
+\begin_layout Standard
+Only in cases of resource imbalance, such as (transient) CPU or RAM peaks
+ (e.g.
+ caused by DDOS attacks),
+\emph on
+some
+\emph default
+ containers may be run somewhere else over the network.
+ In a well-balanced and well-dimensioned system, this will be the
+\series bold
+vast minority
+\series default
+, and should only be used for dealing with temporary load peaks etc.
+\end_layout
+
+\begin_layout Standard
+Running VMs directly on the same servers as their storage is a
+\series bold
+major cost reducer.
+\end_layout
+
+\begin_layout Standard
+You simply don't need to buy and operate
+\begin_inset Formula $n+m$
+\end_inset
+
+ servers, but only about
+\begin_inset Formula $\max(n,m)+m\cdot\epsilon$
+\end_inset
+
+ servers, where
+\begin_inset Formula $\epsilon$
+\end_inset
+
+ corresponds to some relatively small extra resources needed by MARS.
+\end_layout
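+
+\begin_layout Standard
+For a made-up example: with
+\begin_inset Formula $n=600$
+\end_inset
+
+ frontend-sized workloads,
+\begin_inset Formula $m=500$
+\end_inset
+
+ storage-sized workloads and
+\begin_inset Formula $\epsilon\approx0.05$
+\end_inset
+
+, the big cluster model needs about
+\begin_inset Formula $600+500=1100$
+\end_inset
+
+ servers, while the sharding model needs only about
+\begin_inset Formula $\max(600,500)+500\cdot0.05=625$
+\end_inset
+
+ servers.
+\end_layout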
+
+\begin_layout Standard
+\noindent
+\begin_inset Graphics
+ filename images/lightbulb_brightlit_benj_.png
+ lyxscale 12
+ scale 7
+
+\end_inset
+
+In addition to this and to reduced networking costs, there are further cost
+ savings in power consumption, air conditioning, Height Units (HUs), number
+ of HDDs, operating costs, etc., as explained below in section
+\begin_inset CommandInset ref
+LatexCommand ref
+reference "sec:Cost-Arguments-from"
+
+\end_inset
+
+.
+\end_layout
+
+\begin_layout Standard
+The sharding model needs a different approach to load balancing of storage
+ space than the big cluster model.
+ There are several possibilities at different layers:
+\end_layout
+
+\begin_layout Itemize
+Dynamically growing the sizes of LVs via
+\family typewriter
+lvresize
+\family default
+ followed by
+\family typewriter
+marsadm resize
+\family default
+ followed by
+\family typewriter
+xfs_growfs
+\family default
+ or similar operations.
+\end_layout
+
+\begin_layout Itemize
+Moving customer data at filesystem or database level via
+\family typewriter
+rsync
+\family default
+ or
+\family typewriter
+mysqldump
+\family default
+ or similar.
+\end_layout
+
+\begin_layout Itemize
+Moving whole LVs via MARS, as shown in the following example:
+\end_layout
+
+\begin_layout Standard
+\noindent
+\align center
+\begin_inset Graphics
+ filename images/MARS_Background_Migration.pdf
+ width 100col%
+
+\end_inset
+
+
+\end_layout
+
+\begin_layout Standard
+\noindent
+The idea is to dynamically create
+\emph on
+additional
+\emph default
+ LV replicas for the sake of background migration.
+ Examples:
+\end_layout
+
+\begin_layout Itemize
+In case you had no redundancy at LV level before, you have
+\begin_inset Formula $k=1$
+\end_inset
+
+ replicas during ordinary operation.
+ If not yet done, you should transparently introduce MARS into your LVM-based
+ stack by using the so-called
+\begin_inset Quotes eld
+\end_inset
+
+standalone mode
+\begin_inset Quotes erd
+\end_inset
+
+ of MARS.
+ When necessary, create the first MARS replica with
+\family typewriter
+marsadm create-resource
+\family default
+ on your already-existing LV data, which is retained unmodified, and restart
+ your application again.
+ Now, for the sake of migration, you just create an additional replica at
+ another server via
+\family typewriter
+marsadm join-resource
+\family default
+ there and wait until the second mirror has been fully
+\series bold
+synced
+\series default
+ in the background, while your application is running and while the contents
+ of the LV are modified
+\emph on
+in parallel
+\emph default
+ by your ordinary applications.
+ Then you do a primary
+\series bold
+handover
+\series default
+ to your mirror.
+ This is usually a matter of minutes, or even seconds.
+ Once the application runs again at the new location, you can delete the
+ old replica via
+\family typewriter
+marsadm leave-resource
+\family default
+ and
+\family typewriter
+lvremove
+\family default
+.
+ Finally, you may re-use the freed-up space for something else (e.g.
+
+\family typewriter
+lvresize
+\family default
+ of
+\emph on
+another
+\emph default
+ LV followed by
+\family typewriter
+marsadm resize
+\family default
+ followed by
+\family typewriter
+xfs_growfs
+\family default
+ or similar).
+ As part of a hardware lifecycle, you may follow a different strategy:
+ evacuate the original source server completely via the above MARS migration
+ method, and eventually decommission it.
+ A rough command-line sketch of this procedure follows after this list.
+\end_layout
+
+\begin_layout Itemize
+In case you already have a redundant LV copy somewhere, you should run a
+ similar procedure, but starting with
+\begin_inset Formula $k=2$
+\end_inset
+
+ replicas, and temporarily increasing the number of replicas to either
+\begin_inset Formula $k'=3$
+\end_inset
+
+ when moving each replica step-by-step, or you may even directly go up to
+
+\begin_inset Formula $k'=4$
+\end_inset
+
+ when moving pairs at once.
+\end_layout
+
+\begin_layout Itemize
+When already starting with
+\begin_inset Formula $k>2$
+\end_inset
+
+ LV replicas, you can do the same analogously,
+ or you may then use a lesser variant.
+ For example, we have some mission-critical servers at 1&1 which are running
+
+\begin_inset Formula $k=4$
+\end_inset
+
+ replicas all the time on relatively small but important LVs for greatly
+ increased safety.
+ Only in such a case, you may have the freedom to temporarily decrease from
+
+\begin_inset Formula $k=4$
+\end_inset
+
+ to
+\begin_inset Formula $k'=3$
+\end_inset
+
+ and then going up to
+\begin_inset Formula $k''=4$
+\end_inset
+
+ again.
+ This has the advantage of requiring less temporary storage space for
+\emph on
+swapping
+\emph default
+ some LVs.
+\end_layout
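+
+\begin_layout Standard
+For orientation only, the first scenario above might look roughly as follows
+ on the command line, where A denotes the old server and B the new one;
+ resource name, volume group names and sizes are placeholders, and the exact
+ syntax and options of each
+\family typewriter
+marsadm
+\family default
+ command are described in detail later in this manual:
+\end_layout
+
+\begin_layout LyX-Code
+A# marsadm create-resource lv3 /dev/vg00/lv3 # standalone mode, k=1
+\end_layout
+
+\begin_layout LyX-Code
+B# lvcreate -L 200G -n lv3 vg01
+\end_layout
+
+\begin_layout LyX-Code
+B# marsadm join-resource lv3 /dev/vg01/lv3 # background sync starts, k=2
+\end_layout
+
+\begin_layout LyX-Code
+B# marsadm view lv3 # wait until fully synced
+\end_layout
+
+\begin_layout LyX-Code
+A# marsadm secondary lv3 # prepare handover
+\end_layout
+
+\begin_layout LyX-Code
+B# marsadm primary lv3 # handover, restart application on B
+\end_layout
+
+\begin_layout LyX-Code
+A# marsadm leave-resource lv3 # drop the old replica
+\end_layout
+
+\begin_layout LyX-Code
+A# lvremove /dev/vg00/lv3 # re-use the freed-up space
+\end_layout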
+
+\begin_layout Section
+Cost Arguments from Technology
+\begin_inset CommandInset label
+LatexCommand label
+name "sec:Cost-Arguments-from"
+
+\end_inset
+
+
+\end_layout
+
+\begin_layout Standard
+A common prejudice is that
+\begin_inset Quotes eld
+\end_inset
+
+big cluster
+\begin_inset Quotes erd
+\end_inset
+
+ is the cheapest scaling storage technology when built on so-called
+\begin_inset Quotes eld
+\end_inset
+
+commodity hardware
+\begin_inset Quotes erd
+\end_inset
+
+.
+ While this is very often true for the
+\begin_inset Quotes eld
+\end_inset
+
+commodity hardware
+\begin_inset Quotes erd
+\end_inset
+
+ part, it is often not true for the
+\begin_inset Quotes eld
+\end_inset
+
+big cluster
+\begin_inset Quotes erd
+\end_inset
+
+ part.
+ But let us first look at the
+\begin_inset Quotes eld
+\end_inset
+
+commodity
+\begin_inset Quotes erd
+\end_inset
+
+ part.
+\end_layout
+
+\begin_layout Standard
+Here are some rough market prices for basic storage as determined around
+ end of 2016 / start of 2017:
+\end_layout
+
+\begin_layout Standard
+\noindent
+\align center
+\begin_inset Tabular
+<lyxtabular version="3" rows="6" columns="3">
+<features tabularvalignment="middle">
+<column alignment="left" valignment="top">
+<column alignment="center" valignment="top">
+<column alignment="center" valignment="top">
+<row>
+<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+
+\size small
+Technology
+\end_layout
+
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+
+\size small
+Enterprise-Grade
+\end_layout
+
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+
+\size small
+Price in € / TB
+\end_layout
+
+\end_inset
+</cell>
+</row>
+<row>
+<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+
+\size small
+Consumer SATA disks via on-board SATA controllers
+\end_layout
+
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+
+\size small
+no (small-scale)
+\end_layout
+
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+
+\size small
+< 30 possible
+\end_layout
+
+\end_inset
+</cell>
+</row>
+<row>
+<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+
+\size small
+SAS disks via SAS HBAs (e.g.
+ in external 14
+\begin_inset Quotes erd
+\end_inset
+
+ shelves)
+\end_layout
+
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+
+\size small
+partly
+\end_layout
+
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+
+\size small
+< 80
+\end_layout
+
+\end_inset
+</cell>
+</row>
+<row>
+<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+
+\size small
+SAS disks via hardware RAID + LVM (+DRBD/MARS)
+\end_layout
+
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+
+\size small
+yes
+\end_layout
+
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+
+\size small
+80 to 150
+\end_layout
+
+\end_inset
+</cell>
+</row>
+<row>
+<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+
+\size small
+Commercial storage appliances via iSCSI
+\end_layout
+
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+
+\size small
+yes
+\end_layout
+
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+
+\size small
+around 1000
+\end_layout
+
+\end_inset
+</cell>
+</row>
+<row>
+<cell alignment="left" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+
+\size small
+Cloud storage, S3 over 5 years lifetime
+\end_layout
+
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+
+\size small
+yes
+\end_layout
+
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+
+\size small
+3000 to 8000
+\end_layout
+
+\end_inset
+</cell>
+</row>
+</lyxtabular>
+
+\end_inset
+
+
+\end_layout
+
+\begin_layout Standard
+\noindent
+You can see that any self-built and self-administered storage (whose price
+ varies with slower high-capacity versus faster low-capacity disks) is much
+ cheaper than any commercial offering by about a factor of 10 or even more.
+ If you need to operate several petabytes of data, self-built storage is
+ always cheaper than commercial offerings, even if additional manpower is
+ needed for commissioning and operation.
+ Here we just assume that the storage is needed permanently for at least
+ 5 years, as is the case in web hosting, databases, backup / archival systems,
+ and many other application areas.
+\end_layout
+
+\begin_layout Standard
+Cloud storage is heavily over-hyped.
+ From a commercial perspective it usually pays off only when your storage
+ demands are
+\emph on
+extremely
+\emph default
+ varying over time, and when you need some
+\emph on
+extra
+\emph default
+ capacity only
+\emph on
+temporarily
+\emph default
+ for a
+\emph on
+very
+\emph default
+ short time.
+\end_layout
+
+\begin_layout Standard
+In addition to basic storage prices, many further factors come into play
+ when roughly comparing big clusters versus sharding (
+\family roman
+\series medium
+\shape up
+\size normal
+\emph off
+\bar no
+\strikeout off
+\uuline off
+\uwave off
+\noun off
+\color none
+
+\begin_inset Formula $\times2$
+\end_inset
+
+
+\family default
+\series default
+\shape default
+\size default
+\emph default
+\bar default
+\strikeout default
+\uuline default
+\uwave default
+\noun default
+\color inherit
+ means with geo-redundancy):
+\end_layout
+
+\begin_layout Standard
+\noindent
+\align center
+\begin_inset Tabular
+<lyxtabular version="3" rows="5" columns="5">
+<features tabularvalignment="middle">
+<column alignment="left" valignment="top">
+<column alignment="center" valignment="top">
+<column alignment="center" valignment="top">
+<column alignment="center" valignment="top">
+<column alignment="center" valignment="top">
+<row>
+<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+
+\end_layout
+
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+BC
+\end_layout
+
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+SHA
+\end_layout
+
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+BC
+\begin_inset Formula $\times2$
+\end_inset
+
+
+\end_layout
+
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+SHA
+\begin_inset Formula $\times2$
+\end_inset
+
+
+\end_layout
+
+\end_inset
+</cell>
+</row>
+<row>
+<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+# of Disks
+\end_layout
+
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+>200%
+\end_layout
+
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+<120%
+\end_layout
+
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+>400%
+\end_layout
+
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+<240%
+\end_layout
+
+\end_inset
+</cell>
+</row>
+<row>
+<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+# of Servers
+\end_layout
+
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+\begin_inset Formula $\approx\times2$
+\end_inset
+
+
+\end_layout
+
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+\begin_inset Formula $\approx\times1.1$
+\end_inset
+
+ possible
+\end_layout
+
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+\begin_inset Formula $\approx\times4$
+\end_inset
+
+
+\end_layout
+
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+\begin_inset Formula $\approx\times2.2$
+\end_inset
+
+
+\end_layout
+
+\end_inset
+</cell>
+</row>
+<row>
+<cell alignment="left" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+Power Consumption
+\end_layout
+
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+\begin_inset Formula $\approx\times2$
+\end_inset
+
+
+\end_layout
+
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+ditto
+\end_layout
+
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+ditto
+\end_layout
+
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+ditto
+\end_layout
+
+\end_inset
+</cell>
+</row>
+<row>
+<cell alignment="left" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+HU Consumption
+\end_layout
+
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+\begin_inset Formula $\approx\times2$
+\end_inset
+
+
+\end_layout
+
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+ditto
+\end_layout
+
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+ditto
+\end_layout
+
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" rightline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+ditto
+\end_layout
+
+\end_inset
+</cell>
+</row>
+</lyxtabular>
+
+\end_inset
+
+
+\end_layout
+
+\begin_layout Standard
+\noindent
+The crucial point is not only the number of extra servers needed for dedicated
+ storage boxes, but also the total number of HDDs.
+ While big cluster implementations like Ceph or Swift can
+\emph on
+theoretically
+\emph default
+ use some erasure coding for avoiding full object replicas, their
+\emph on
+practice
+\emph default
+ as seen in our internal 1&1 Ceph clusters is similar to RAID-10, but just
+ on objects instead of block-based sectors.
+\end_layout
+
+\begin_layout Standard
+Therefore a big cluster typically needs >200% disks to reach the same net
+ capacity as a sharded cluster, where typically hardware RAID-60 with a
+ significantly smaller overhead is enough to provide sufficient failure
+ tolerance at disk level.
+\end_layout
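+
+\begin_layout Standard
+As a rough, idealized illustration (ignoring hot spares and filesystem overhead):
+ a RAID-60 array built from two RAID-6 groups of 12 disks each loses 4 out
+ of 24 disks to parity, i.e.
+ it needs about
+\begin_inset Formula $24/20=120\%$
+\end_inset
+
+ raw disk capacity per unit of net capacity, whereas keeping 2 to 3 full
+ object replicas needs 200% to 300%.
+\end_layout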
+
+\begin_layout Standard
+There is a surprising consequence from this: geo-redundancy is not as expensive
+ as many people believe.
+ It just needs to be built with the proper architecture.
+ A sharded geo-redundant pool based on hardware RAID-60 costs roughly
+ the same as (or when taking
+\begin_inset Formula $O(n^{2})$
+\end_inset
+
+ storage networks into account it is possibly even cheaper than) a big cluster
+ with full replicas without geo-redundancy.
+ A geo-redundant sharded pool provides even better failure compensation.
+\end_layout
+
+\begin_layout Standard
+Notice that geo-redundancy implies by definition that an unforeseeable
+\series bold
+full datacenter loss
+\series default
+ (e.g.
+ caused by
+\series bold
+disasters
+\series default
+ like a terrorist attack or an earthquake) must be compensated for
+\series bold
+several days or weeks
+\series default
+.
+ Therefore it is
+\emph on
+not
+\emph default
+ sufficient to take a big cluster and just spread it to two different locations.
+\end_layout
+
+\begin_layout Standard
+In any case, a MARS-based geo-redundant sharding pool is cheaper than using
+ commercial storage appliances, which are inherently much more expensive.
+\end_layout
+
+\begin_layout Section
+Performance Arguments from Architecture
+\end_layout
+
\begin_layout Standard
Some people think that replication is easily done at filesystem layer.
There exist lots of cluster filesystems and other filesystem-layer solutions
@@ -346,7 +1632,7 @@ Choosing the wrong layer for
mass data replication
\series default
may get you into trouble.
- Here is an explaination why replication at the block layer is more easy
+ Here is an explanation why replication at the block layer is more easy
and less error prone:
\end_layout
@@ -29189,7 +30475,7 @@ This section addresses some wide-spread misconceptions.
Its main target audience is developers, but sysadmins will profit from
\series bold
-detailed explainations of problems and pitfalls
+detailed explanations of problems and pitfalls
\series default
.
When the problems described in this section are solved somewhen in future,