GPLed software AS IS, sponsored by 1&1 Internet AG (www.1und1.de).

Contact: tst@1und1.de

--------------------------------

Abstract:

MARS Light is almost a drop-in replacement for DRBD
(that is, block-level storage replication).

In contrast to plain DRBD, it works _asynchronously_ and over
arbitrary distances. Our internal 1&1 testing runs between datacenters
in the US and Europe. MARS uses very different technology under the
hood, similar to transaction logging of database systems.

Reliability: application and replication are completely decoupled.
Networking problems (e.g. packet loss, bottlenecks) have no
impact on your application at the primary side.

Anytime Consistency: on a secondary node, its version of the underlying
disk device is always consistent in itself, but may be outdated
(representing a former state of the primary side). Thanks to
incremental replication of the transaction logfiles, the lag-behind
will usually be only a few seconds, or fractions of a second.
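
To illustrate the two properties above, here is a minimal sketch in
Python. It is NOT MARS code; the class names Primary and Secondary, the
in-memory "disk" and the list-based log are assumptions made up for this
example. It only shows the general idea of log-based asynchronous
replication: the primary never waits for the network, and a secondary
that replays log records strictly in order always holds some former
state of the primary.

```python
class Primary:
    """Hypothetical primary: applies writes locally and appends them to a log."""

    def __init__(self, size):
        self.disk = bytearray(size)
        self.log = []  # append-only transaction log of (offset, data) records

    def write(self, offset, data):
        # The write completes locally; nothing here waits for the network.
        self.disk[offset:offset + len(data)] = data
        self.log.append((offset, bytes(data)))


class Secondary:
    """Hypothetical secondary: replays log records strictly in log order."""

    def __init__(self, size):
        self.disk = bytearray(size)
        self.applied = 0  # number of log records replayed so far

    def replay(self, log, max_records):
        # Apply at most max_records further records, e.g. whatever part
        # of the log has arrived over a slow link so far.
        end = min(len(log), self.applied + max_records)
        for offset, data in log[self.applied:end]:
            self.disk[offset:offset + len(data)] = data
        self.applied = end


primary = Primary(8)
secondary = Secondary(8)

# Remember every state the primary disk goes through.
history = [bytes(primary.disk)]
for offset, data in [(0, b"AAAA"), (2, b"BB"), (4, b"CCCC")]:
    primary.write(offset, data)
    history.append(bytes(primary.disk))

# Only two of the three records have been replayed: the secondary lags.
secondary.replay(primary.log, max_records=2)
lagging = bytes(secondary.disk)

# Later the rest of the log arrives and the secondary catches up.
secondary.replay(primary.log, max_records=10)
caught_up = bytes(secondary.disk)
```

In this sketch, the lagging secondary equals the primary's state after
its second write: outdated, but a state the primary really went through.
Once the remaining record is replayed, both disks are identical.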

Synchronous or near-synchronous operating modes are planned for
the future, but are expected to work _reliably_ only over short
distances (less than 50km), due to fundamental properties
of distributed systems.

Although many people ask for synchronous modes and although they
would be very easy to implement (basically just add some additional
wait conditions to turn asynchronous IO into synchronous IO), I don't
want to implement them for now.

One reason is DRBD, which already does a good job at that ("RAID-1 over
network", which works extremely well on crossover cables).
MARS is no RAID. The transaction logging of MARS is fundamentally
different from that.

The other reason is that I personally am not convinced by our experiences
with synchronous replication in the presence of network bottlenecks.
Even relatively short bundled 10Gbit lines between datacenters form
a bottleneck where unexpected jitter / packet loss may suddenly occur,
leading to effects similar to a "traffic jam".

MARS simply has a different application area than DRBD.

WARNING! Current stage is BETA. Don't put production data on it!

Documentation: currently under construction, see docu/mars-manual.pdf

Concepts:

See later chapters in docu/mars-manual.pdf .

For a very short intro, see my LCA2013 presentation docu/MARS_LCA2013.pdf .

There is also an internal two-year-old concept paper which is so outdated
that I don't want to publish it.

The fundamental construction principle of the planned MARS Full
is called Instance Oriented Programming (IOP) and is described in
the following paper:

http://athomux.net/papers/paper_inst2.pdf

History:

As you can see in the git log, MARS evolved from a very experimental
concept study, starting in the Summer of 2010.
At that time, I was working on it in my spare time.

In Summer 2011, an "official" internal 1&1 project started, which aimed
to deliver a proof of concept.

In February 2012, a pilot system was rolled out to an internal statistics
server, which collects statistics data from thousands of other servers,
and thus produces a very heavy random-access write load, formerly
replicated with DRBD (which led to performance problems due to massive
randomness). After switching to MARS, the performance was provably
better.
That server was selected because potential loss of statistics data
would not be as critical as with other production data, but
nevertheless it operates on production data and loads.

After curing some small teething problems, that server has been running
without problems to this day. It was upgraded to newer versions of MARS
several times (indicated by some of the git tags). Our sysadmins switched
the primary side a few times, without informing me, so I could
sleep better at night without knowing what they did ;)

In Summer 2012, the next "official" internal 1&1 project started. Its goal
is to reach enterprise grade, and therefore to roll out MARS Light on
~15 production servers, starting with less critical systems like ones
for test webspaces etc. This project will continue until Summer 2013.

In December 2012 (shortly before Christmas), I got the official permission
from our CTO Henning Kettler to publish MARS under GPL on github.

Many thanks to him!

Before that point, I was bound by my working contract, which kept internal
software secret by default (when there was no explicit permission).

Now there is a chance to build up an open source
community for MARS, partially outside of 1&1.

Please contribute! I will be open.

I also try to respect the guidelines from Linus, but probably this
will need more work. I am already planning to invest some time into
community revision of the sourcecode, but there is not yet any schedule.

In May 2013, I got help from my new colleague Frank Liepold. He is
currently creating a fully automatic test suite which automates
regression tests (goal: rolling releases). That test suite is based on
the internal test suite of blkreplay and will also be published soon.

Hopefully, there will be an internal 1&1 followup project for
mass rollout to some thousands of servers.