Asynchronous Block-Level Storage Replication
Thomas Schoebel-Theuer 45f462026f marsadm: fix leave-resource cleanup
Now the sequence leave-resource ; join-resource should work.

When the last member of the resource has gone and create-resource is
tried anew, there is a new safety measure: the old resource directory
is left over deliberately, thus the new create-resource will deny creation
because some unreachable cluster node may have existed, such that we
didn't even know of its resource membership. This very special
case requires --force and some handwork cleanup.
2013-07-04 07:21:00 +02:00
docu doc: add presentation slides from LCA2013 2013-01-29 22:32:22 +01:00
kernel light: allow remote deletion of directories 2013-06-29 21:15:18 +02:00
pre-patches all: update pre-patches 2013-06-29 21:15:17 +02:00
scripts infra: move script 'gen_config.pl' to scripts/ 2013-04-12 08:46:58 +02:00
testing add some small testscripts 2013-01-23 20:05:37 +01:00
userspace marsadm: fix leave-resource cleanup 2013-07-04 07:21:00 +02:00
.gitattributes infra: add .gitignore 2013-01-08 15:53:47 +01:00
.gitignore all: preparations for out-of-tree build 2013-04-11 11:01:25 +02:00
AUTHORS all: prepare publication at github 2013-01-25 11:58:46 +01:00
COPYING all: prepare publication at github 2013-01-25 11:58:46 +01:00
ChangeLog all: prepare publication at github 2013-01-25 11:58:46 +01:00
INSTALL all: prepare publication at github 2013-01-25 11:58:46 +01:00
Makefile.dist infra: Makefile.dist fix GITHEAD initialization 2013-04-15 18:34:44 +02:00
NEWS all: prepare publication at github 2013-01-25 11:58:46 +01:00
README doc: update README 2013-06-29 21:15:17 +02:00

README

GPLed software AS IS, sponsored by 1&1 Internet AG (www.1und1.de).

Contact: tst@1und1.de

--------------------------------

Abstract:

MARS Light is almost a drop-in replacement for DRBD
(that is, block-level storage replication).

In contrast to plain DRBD, it works _asynchronously_ and over
arbitrary distances. Our internal 1&1 testing runs between datacenters
in the US and Europe. MARS uses very different technology under the
hood, similar to transaction logging of database systems.

Reliability: application and replication are completely decoupled.
Networking problems (e.g. packet loss, bottlenecks) have no
impact on your application at the primary side.

Anytime Consistency: on a secondary node, its version of the underlying
disk device is always consistent in itself, but may be outdated
(representing a former state of the primary side). Thanks to
incremental replication of the transaction logfiles, the lag will
usually be only a few seconds or fractions of a second.
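
The idea of anytime consistency can be illustrated with a toy model.
This is only a sketch and not MARS code: the classes Primary and
Secondary, the in-memory "disk" list and the replay() method are all
hypothetical names made up for this example. It assumes that the
secondary replays log records strictly in log order, so that its disk
always equals some former state of the primary.

```python
# Toy model of log-based asynchronous replication (illustration only,
# NOT MARS code). All class and method names are hypothetical.


class Primary:
    def __init__(self, nblocks):
        self.disk = [0] * nblocks
        self.log = []                       # append-only transaction log
        self.history = [tuple(self.disk)]   # every state the disk went through

    def write(self, block, value):
        self.log.append((block, value))     # record the write in the log
        self.disk[block] = value            # apply it to the local disk
        self.history.append(tuple(self.disk))


class Secondary:
    def __init__(self, nblocks):
        self.disk = [0] * nblocks
        self.applied = 0                    # number of log records replayed

    def replay(self, log, max_records):
        # Apply at most max_records records, strictly in log order.
        for block, value in log[self.applied:self.applied + max_records]:
            self.disk[block] = value
            self.applied += 1


primary = Primary(4)
secondary = Secondary(4)

for i, (blk, val) in enumerate([(0, 7), (2, 9), (0, 8), (3, 1), (1, 5)]):
    primary.write(blk, val)                 # never waits for the secondary
    if i % 2 == 1:
        secondary.replay(primary.log, 1)    # slow link: one record per round
    # The secondary is outdated, but equals some former primary state.
    assert tuple(secondary.disk) in primary.history

lag = len(primary.log) - secondary.applied
print("secondary lags behind by", lag, "log records")
```

In this model the primary never waits, and the secondary is allowed to
fall behind by any number of records without ever becoming
inconsistent in itself.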

Synchronous or near-synchronous operating modes are planned for
the future, but are expected to work _reliably_ only over short
distances (less than 50 km), due to fundamental properties
of distributed systems.

Although many people ask for synchronous modes and although they
would be very easy to implement (basically just add some additional
wait conditions to turn asynchronous IO into synchronous IO), I don't
want to implement them for now.
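
What "adding a wait condition" could mean can be sketched as follows.
This is a minimal illustration in Python and not the MARS
implementation: ReplicatedLog, write_async(), write_sync(), ack() and
the replica() thread are hypothetical names, and the "network" is just
an in-process queue. The only difference between the two write paths
is that the synchronous one blocks until the replica has acknowledged
the record.

```python
# Sketch: asynchronous vs. synchronous writes on top of the same log.
# Illustration only, NOT MARS code. All names are hypothetical.
import queue
import threading


class ReplicatedLog:
    def __init__(self):
        self.records = []
        self.acked = 0                      # highest sequence number acked
        self.cond = threading.Condition()
        self.outbox = queue.Queue()         # stands in for the network

    def write_async(self, data):
        # Append locally and return at once; replication happens later.
        with self.cond:
            self.records.append(data)
            seq = len(self.records)
        self.outbox.put(seq)
        return seq

    def write_sync(self, data, timeout=None):
        # Same as write_async, plus one additional wait condition.
        seq = self.write_async(data)
        with self.cond:
            return self.cond.wait_for(lambda: self.acked >= seq, timeout)

    def ack(self, seq):
        with self.cond:
            self.acked = max(self.acked, seq)
            self.cond.notify_all()


def replica(log):
    # Acknowledge records in order until a None sentinel arrives.
    while True:
        seq = log.outbox.get()
        if seq is None:
            break
        log.ack(seq)


log = ReplicatedLog()

seq = log.write_async("a")                  # returns although no replica runs
acked_before = log.acked

# With the replica unreachable, the synchronous writer stalls until timeout.
timed_out = not log.write_sync("b", timeout=0.05)

worker = threading.Thread(target=replica, args=(log,))
worker.start()
ok = log.write_sync("c", timeout=5)         # now the acknowledgement arrives
log.outbox.put(None)
worker.join()
```

The second call shows the price of the wait condition: as long as the
replica does not answer, the synchronous writer is blocked, while the
asynchronous writer is not affected.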

One reason is DRBD, which already does a good job at that ("RAID-1 over
network", which works extremely well on crossover cables).
MARS is no RAID. The transaction logging of MARS is fundamentally
different from that.

The other reason is that I personally am not convinced by our experiences
with synchronous replication in the presence of network bottlenecks.
Even relatively short bundled 10Gbit lines between datacenters form
a bottleneck where unexpected jitter or packet loss may suddenly occur,
leading to effects similar to a traffic jam.

MARS simply has a different application area than DRBD.

WARNING! Current stage is BETA. Don't put production data on it!

Documentation: currently under construction, see docu/mars-manual.pdf

Concepts:

See later chapters in docu/mars-manual.pdf .

For a very short intro, see my LCA2013 presentation docu/MARS_LCA2013.pdf .

There is also an internal two-year-old concept paper which is so outdated
that I don't want to publish it.

The fundamental construction principle of the planned MARS Full
is called Instance Oriented Programming (IOP) and is described in
the following paper:

http://athomux.net/papers/paper_inst2.pdf

History:

As you can see in the git log, it evolved from a very experimental
concept study, starting in the summer of 2010.
At that time, I was working on it in my spare time.

In Summer 2011, an "official" internal 1&1 project started, which aimed
to deliver a proof of concept.

In February 2012, a pilot system was rolled out to an internal statistics
server, which collects statistics data from thousands of other servers,
and thus produces a very heavy random-access write load, formerly
replicated with DRBD (which led to performance problems due to massive
randomness). After switching to MARS, the performance was provably
better.
That server was selected because a potential loss of statistics data
would not be as critical as with other production data, but it
nevertheless operates on production data and loads.

After curing some small teething problems, that server has been running
without problems to this day. It was upgraded to newer versions of MARS several
times (indicated by some of the git tags). Our sysadmins switched the
primary side a few times, without informing me, so I could
sleep better at night without knowing what they did ;)

In Summer 2012, the next "official" internal 1&1 project started. Its goal
is to reach enterprise grade, and therefore to roll out MARS Light on
~15 production servers, starting with less critical systems like ones
for test webspaces etc. This project will continue until Summer 2013.

In December 2012 (shortly before Christmas), I got the official permission
from our CTO Henning Kettler to publish MARS under GPL on github.

Many thanks to him!

Before that point, I was bound by my employment contract, which kept internal
software secret by default (unless there was explicit permission).

Now there is a chance to build up an open source
community for MARS, partially outside of 1&1.

Please contribute! I will be open.

I also try to respect the guidelines from Linus, but probably this
will need more work. I am already planning to invest some time into
community revision of the source code, but there is not yet any schedule.

In May 2013, I got help from my new colleague Frank Liepold. He is currently
creating a fully automatic test suite which automates regression tests
(goal: rolling releases). That test suite is based on the internal
test suite of blkreplay and will also be published soon.

Hopefully, there will be an internal 1&1 follow-up project for
mass rollout to some thousands of servers.