RepoMirrors/mars: Asynchronous Block-Level Storage Replication - mars

Thomas Schoebel-Theuer dab60da817 marsadm: allow calling multiple functions in phases Add infrastructure for splitting commands in multiple phases. Usually, phase0 will check for some preconditions, while phase1 will execute the command. The final result will only be committed if nothing fails. The difference to the old behaviour will only show up when combined with 'all' resources. If anything fails in phase0, nothing will be touched in phase1. The old behaviour could touch some resources, but omit others when something failed. The new behaviour is more transactional-like.		2013-06-03 09:05:46 +02:00
docu	doc: add presentation slides from LCA2013	2013-01-29 22:32:22 +01:00
kernel	marsadm: check attach state	2013-05-13 12:50:29 +02:00
pre-patches	all: move kernel source into separate directory	2013-04-08 17:01:37 +02:00
scripts	infra: move script 'gen_config.pl' to scripts/	2013-04-12 08:46:58 +02:00
testing	add some small testscripts	2013-01-23 20:05:37 +01:00
userspace	marsadm: allow calling multiple functions in phases	2013-06-03 09:05:46 +02:00
.gitattributes	infra: add .gitignore	2013-01-08 15:53:47 +01:00
.gitignore	all: preparations for out-of-tree build	2013-04-11 11:01:25 +02:00
AUTHORS	all: prepare publication at github	2013-01-25 11:58:46 +01:00
COPYING	all: prepare publication at github	2013-01-25 11:58:46 +01:00
ChangeLog	all: prepare publication at github	2013-01-25 11:58:46 +01:00
INSTALL	all: prepare publication at github	2013-01-25 11:58:46 +01:00
Makefile.dist	infra: Makefile.dist fix GITHEAD initialization	2013-04-15 18:34:44 +02:00
NEWS	all: prepare publication at github	2013-01-25 11:58:46 +01:00
README	doc: improve README	2013-01-31 21:33:46 +01:00

README

GPLed software AS IS, sponsored by 1&1 Internet AG (www.1und1.de).

Contact: tst@1und1.de

--------------------------------

Abstract:

MARS Light is almost a drop-in replacement for DRBD
(that is, block-level storage replication).

In contrast to plain DRBD, it works _asynchronously_ and over
arbitrary distances. My regular testing runs between datacenters
in the US and Europe. MARS uses very different technology under the
hood, similar to transaction logging of database systems.

Reliability: application and replication are completely decoupled.
Networking problems (e.g. packet loss, bottlenecks) have no
impact on your application at the primary side.
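To illustrate the decoupling, here is a toy model in Python. It is a sketch under stated assumptions, not MARS code: the class `Primary`, its `write`/`ship` methods and the in-memory log are hypothetical, and the real write path differs in detail. An application write only updates the local device and appends a record to a local log; a separate shipping step sends records to the secondary, so a network error merely leaves a backlog instead of failing the write.

```python
from __future__ import annotations

from collections import deque


class Primary:
    """Hypothetical toy primary node (not MARS code).

    Writes go to a local device and an append-only log; the application
    path never touches the network.
    """

    def __init__(self, size: int) -> None:
        self.device = bytearray(size)
        self.log: list[tuple[int, bytes]] = []   # (offset, data) records
        self.unshipped: deque[int] = deque()      # log indices not yet sent

    def write(self, offset: int, data: bytes) -> None:
        # Application write: update the local device and append to the log.
        # Returns immediately, regardless of the network state.
        self.device[offset:offset + len(data)] = data
        self.log.append((offset, data))
        self.unshipped.append(len(self.log) - 1)

    def ship(self, send) -> int:
        """Send pending log records in order; stop at the first network error.

        Returns the number of records sent. A failure only leaves a backlog.
        """
        sent = 0
        while self.unshipped:
            idx = self.unshipped[0]
            try:
                send(self.log[idx])
            except ConnectionError:
                break  # keep the record for the next attempt
            self.unshipped.popleft()
            sent += 1
        return sent


if __name__ == "__main__":
    def link_down(record):
        raise ConnectionError("link down")

    p = Primary(16)
    p.write(0, b"abcd")               # succeeds although the link is down
    print(p.ship(link_down))          # nothing sent, backlog kept
    received = []
    print(p.ship(received.append))    # backlog drained once the link is back
```

The design choice illustrated here is that the only thing a network outage can grow is the backlog of unshipped records; the application's write latency stays independent of the link.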

Anytime consistency: on a secondary node, its version of the
block device is always consistent in itself, but may be outdated
(representing a former state of the primary side). Thanks to
incremental replication of the transaction logfiles, the lag is
usually only a few seconds, or even a fraction of a second.
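The following toy model shows why replaying a log strictly in order yields anytime consistency. The names `Secondary`, `replay`, `lag` and `primary_state_after` are hypothetical, and the `(offset, data)` records are an assumption, not the MARS logfile format. After applying the first k records, the secondary's device equals the state the primary had after its first k writes, never a mixture of old and new data; the lag is simply the number of records not yet replayed.

```python
from __future__ import annotations

Record = tuple[int, bytes]  # hypothetical (offset, data) log record


def apply(device: bytearray, record: Record) -> None:
    """Apply one logged write to a block device image."""
    offset, data = record
    device[offset:offset + len(data)] = data


class Secondary:
    """Toy secondary that replays log records strictly in log order."""

    def __init__(self, size: int) -> None:
        self.device = bytearray(size)
        self.applied = 0  # number of log records replayed so far

    def replay(self, log: list[Record], max_records: int) -> None:
        # Apply up to max_records further records, in order, never skipping.
        end = min(len(log), self.applied + max_records)
        for record in log[self.applied:end]:
            apply(self.device, record)
        self.applied = end

    def lag(self, log: list[Record]) -> int:
        # How many records the secondary is behind the primary.
        return len(log) - self.applied


def primary_state_after(log: list[Record], k: int, size: int) -> bytearray:
    """The device state the primary had after its first k writes."""
    device = bytearray(size)
    for record in log[:k]:
        apply(device, record)
    return device


if __name__ == "__main__":
    log = [(0, b"AA"), (2, b"BB"), (0, b"CC")]
    s = Secondary(4)
    s.replay(log, 2)  # partially caught up: outdated but consistent
    print(bytes(s.device), s.lag(log))
```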

Synchronous or near-synchronous operating modes are planned for
the future, but are expected to _reliably_ work only over short
distances (less than 50 km), due to fundamental properties
of the network.
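A rough worked calculation shows why distance matters for synchronous modes: a synchronous write cannot complete before at least one network round trip to the secondary. This sketch assumes a signal speed in optical fiber of about 200,000 km/s and ignores routing detours and switching delays, so the figures are lower bounds only; 6000 km is an assumed order of magnitude for a US-Europe link, and `min_rtt_ms` is a hypothetical helper.

```python
# Assumption: signals travel through optical fiber at roughly 200,000 km/s
# (about two thirds of the speed of light in vacuum). Real paths are longer
# than the straight-line distance and add switching delays.
FIBER_SPEED_KM_PER_S = 200_000


def min_rtt_ms(distance_km: float) -> float:
    """Lower bound (in ms) for one round trip over distance_km of fiber."""
    return 2 * distance_km / FIBER_SPEED_KM_PER_S * 1000


if __name__ == "__main__":
    # A synchronous write waits for at least one round trip.
    for km in (50, 6000):
        print(f"{km} km: >= {min_rtt_ms(km):.1f} ms per synchronous write")
```

Even this idealized bound grows from about half a millisecond at 50 km to tens of milliseconds at intercontinental distances, added to every single synchronous write.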

WARNING! The current stage is BETA. Don't put production data on it!

Documentation: currently very rudimentary, and partly even in German.
This will be fixed soon.

Concepts:

There is a two-year-old concept paper in German, but it is so outdated
that I don't want to publish it. Please be patient until I write a
comprehensive concept-level paper in English.

In the meantime, please look at my presentation about MARS at LCA2013
(see linux.conf.au, or look into ./docu/).

History:

As you can see in the git log, MARS evolved from a very experimental
concept study, starting in the summer of 2010.
At that time, I was working on it in my spare time.

In summer 2011, an "official" internal 1&1 project started, which aimed
to deliver a proof of concept.
In February 2012, a pilot system was rolled out to an internal statistics
server, which collects statistics data from thousands of other servers
and thus produces a very heavy random-access write load. It was formerly
replicated with DRBD, which led to performance problems due to the massive
randomness. After switching to MARS, the performance was provably
better.
This server was selected because a potential loss of statistics data
would not be as critical as with other production data, but it
nevertheless operates on production data and loads.

After some small teething problems were cured, this server has been running
without problems until today (end of January 2013). Our sysadmins even
switched the primary side a few times without informing me, so I could
sleep better at night without knowing what they did ;)

In summer 2012, the next "official" internal 1&1 project started. Its goal
is to reach enterprise grade, and therefore to roll out MARS Light on
about 10 production servers, starting with less critical systems such as
those for test webspaces. This project will continue until summer 2013.

Hopefully, there will be a follow-up project for a mass rollout to
several thousand servers.

In December 2012 (shortly before Christmas), I received official permission
from our CTO Henning Kettler to publish MARS under the GPL on GitHub.

Many thanks to him!

Before that point, I was bound by my employment contract, which keeps
internal software secret by default (unless there is explicit permission).

Now there is a chance to build up an open-source
community for MARS, partially outside of 1&1.

Please contribute! I am open to contributions.

I also try to respect Linus's guidelines, but this will probably
need more work. Help is always welcome!