From eacc8361aeb6a38a8815951ae51ee33c6a1af44e Mon Sep 17 00:00:00 2001
From: Thomas Schoebel-Theuer
Date: Sun, 23 Jun 2013 09:24:36 +0200
Subject: [PATCH] doc: update README

---
 README | 73 +++++++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 52 insertions(+), 21 deletions(-)

diff --git a/README b/README
index 7c085f71..59180793 100644
--- a/README
+++ b/README
@@ -10,7 +10,7 @@ MARS Light is almost a drop-in replacement for DRBD (that is,
 block-level storage replication).
 
 In contrast to plain DRBD, it works _asynchronously_ and over
-arbitrary distances. My regular testing runs between datacenters
+arbitrary distances. Our internal 1&1 testing runs between datacenters
 in the US and Europe. MARS uses very different technology under the
 hood, similar to transaction logging of database systems.
 
@@ -18,8 +18,8 @@ Reliability: application and replication are completely decoupled.
 Networking problems (e.g. packet loss, bottlenecks) have no impact
 onto your application at the primary side.
 
-Anytime consistency: on a secondary node, its version of the
-block device is always consistent in itself, but may be outdated
+Anytime Consistency: on a secondary node, its version of the underlying
+disk device is always consistent in itself, but may be outdated
 (represent a former state from the primary side). Thanks to
 incremental replication of the transaction logfiles, usually the
 lag-behind will be only a few seconds, or parts of a second.
@@ -27,21 +27,44 @@ lag-behind will be only a few seconds, or parts of a second.
 Synchronous or near-synchronous operating modes are planned
 for the future, but are expected to _reliably_ work only
 over short distances (less than 50km), due to fundamental properties
-of the network.
+of distributed systems.
+
+Although many people ask for synchronous modes and although they
+would be very easy to implement (basically just add some additional
+wait conditions to turn asynchronous IO into a synchronous one), I don't
+want to implement them for now.
+
+One reason is DRBD, which already does a good job at that ("RAID-1 over
+network", which works extremely well on crossover cables).
+MARS is not a RAID. The transaction logging of MARS is fundamentally
+different from that.
+
+The other reason is that I personally am not convinced by our experiences
+with synchronous replication in the presence of network bottlenecks.
+Even relatively short bundled 10Gbit lines between datacenters form
+a bottleneck where unexpected jitter / packet loss may suddenly occur,
+leading to effects similar to a "traffic jam".
+
+MARS simply has a different application area than DRBD.
 
 WARNING! Current stage is BETA. Don't put productive data on it!
 
-Documentation: currently very rudimentary, some even in German.
-This will be fixed soon.
+Documentation: currently under construction, see docu/mars-manual.pdf.
 
 Concepts:
 
-There is a 2-years old concept paper in German which is so much outdated,
-that I don't want to publish it. Please be patient until I write a
-comprehensive paper at the concept level in English.
+See the later chapters in docu/mars-manual.pdf.
 
-For the meantime, please look at my presentation about MARS at LCA2013
-(linux.conf.au or look into ./docu/).
+For a very short intro, see my LCA2013 presentation in docu/MARS_LCA2013.pdf.
+
+There is also an internal 2-year-old concept paper which is so outdated
+that I don't want to publish it.
+
+The fundamental construction principle of the planned MARS Full
+is called Instance Oriented Programming (IOP) and is described in
+the following paper:
+
+http://athomux.net/papers/paper_inst2.pdf
 
 History:
 
@@ -51,36 +74,35 @@ At this time, I was working on it in my spare time.
 
 In Summer 2011, an "official" internal 1&1 project started, which aimed
 to deliver a proof of concept.
+
 In February 2012, a pilot system was rolled out to an internal
 statistics server, which collects statistics data from thousands of
 other servers, and thus produces a very heavy random-access write load,
 formerly replicated with DRBD (which led to performance problems due
 to massive randomness). After switching to MARS, the performance was
 provably better.
-This server was selected because potential loss of statistics data
+That server was selected because potential loss of statistics data
 would not be as critical as with other productive data, but
 nevertheless it operates on productive data and loads.
 
-After curing some small infancy problems, this server runs until today
-(end of January 2013) without problems. Our sysadmins even switched the
+After curing some small teething problems, that server has been running
+without problems until today. It was upgraded to newer versions of MARS
+several times (indicated by some of the git tags). Our sysadmins switched the
 primary side a few times, without informing me, so I could sleep better
 at night without knowing what they did ;)
 
 In Summer 2012, the next "official" internal 1&1 project started. Its
 goal is to reach enterprise grade, and therefore to rollout MARS Light on
-~10 productive servers, starting with less critical systems like ones
+~15 productive servers, starting with less critical systems like ones
 for test webspaces etc. This project will continue until Summer 2013.
 
-Hopefully, there will be a followup project for mass rollout to some
-thousands of servers.
-
 In December 2012 (shortly before Christmas), I got the official
 permission from our CTO Henning Kettler to publish MARS under GPL on
 github.
 Many thanks to him!
 
-Before that point, I was bound to my working contract which keeps internal
-software as secret by default (when there is no explicit permission).
+Before that point, I was bound to my working contract, which kept internal
+software secret by default (when there was no explicit permission).
 
 Now there is a chance to build up an opensource
 community for MARS, partially outside of 1&1.
@@ -88,4 +110,13 @@ community for MARS, partially outside of 1&1.
 Please contribute! I will be open.
 
 I also try to respect the guidelines from Linus, but probably this
-will need more work. Help is always welcome!
+will need more work. I am already planning to invest some time into
+community review of the source code, but there is no schedule yet.
+
+In May 2013, I got help from my new colleague Frank Liepold. He is
+currently creating a fully automatic test suite for regression tests
+(goal: rolling releases). That test suite is based on the internal
+test suite of blkreplay and will also be published soon.
+
+Hopefully, there will be an internal 1&1 follow-up project for a
+mass rollout to some thousands of servers.
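
Note on the "wait conditions" remark in the third hunk above: the following
is a minimal userspace sketch, not MARS code, of how a single additional
wait condition turns an asynchronous replication write into a synchronous
one. All names (repl_request, replica_side, submit_write) are made up for
illustration; the sketch only assumes POSIX threads.
Build with: gcc -pthread -o sync_sketch sync_sketch.c

/* sync_sketch.c -- illustrative only, not part of MARS.
 * One request is handed to a "replica" thread that acknowledges it after a
 * simulated network delay. In asynchronous mode completion is signalled to
 * the caller immediately; in synchronous mode the caller additionally waits
 * for the acknowledgment -- the "additional wait condition" mentioned above. */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

struct repl_request {
	pthread_mutex_t lock;
	pthread_cond_t  done;
	bool            acked;  /* true once the (simulated) replica has acknowledged */
};

static void *replica_side(void *arg)
{
	struct repl_request *req = arg;

	usleep(100 * 1000);             /* simulate network + remote write latency */
	pthread_mutex_lock(&req->lock);
	req->acked = true;              /* acknowledgment arrives */
	pthread_cond_signal(&req->done);
	pthread_mutex_unlock(&req->lock);
	return NULL;
}

static void submit_write(bool synchronous)
{
	struct repl_request req = {
		.lock  = PTHREAD_MUTEX_INITIALIZER,
		.done  = PTHREAD_COND_INITIALIZER,
		.acked = false,
	};
	pthread_t worker;

	pthread_create(&worker, NULL, replica_side, &req);

	if (synchronous) {
		pthread_mutex_lock(&req.lock);
		while (!req.acked)      /* the additional wait condition */
			pthread_cond_wait(&req.done, &req.lock);
		pthread_mutex_unlock(&req.lock);
	}

	pthread_mutex_lock(&req.lock);
	printf("completion signalled to caller, replica acked=%d (%s mode)\n",
	       req.acked, synchronous ? "synchronous" : "asynchronous");
	pthread_mutex_unlock(&req.lock);

	pthread_join(worker, NULL);     /* cleanup only, not part of the IO path */
}

int main(void)
{
	submit_write(false);    /* asynchronous: caller does not wait for the replica */
	submit_write(true);     /* synchronous: caller blocks until the replica acks */
	return 0;
}

In asynchronous mode the caller typically sees acked=0 at completion time,
in synchronous mode always acked=1. In MARS itself the corresponding decision
point would presumably sit inside the kernel IO path; this sketch only mirrors
the control flow.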