Update index.md

Thomas Schöbel-Theuer 2017-04-16 11:11:06 +02:00 committed by GitHub
parent a1f2e81c06
commit 304834318b
1 changed file with 34 additions and 79 deletions

index.md

or https://github.com/schoebel/mars
GPLed software AS IS, sponsored by 1&1 Internet SE (www.1und1.de). Contact: tst@1und1.de
## What is MARS?
MARS can be used to replicate Linux-based storage devices, and even whole datacenters, over arbitrary distances (geo-redundancy).
Another use case is cost-efficient virtual pools of sharded storage.
At 1&1 Internet SE, it runs on more than 2000 servers, handling about 2x8 petabytes of customer data.
Main features:
* Anytime Consistency
* Arbitrary Distances
* Tolerates Flaky Networks
* Allows _background_ migration of block devices (optionally combined with traffic shaping)
MARS is almost a drop-in replacement for DRBD (block-level storage replication). It runs as a Linux kernel module.
In contrast to plain DRBD, it works _asynchronously_ and over arbitrary distances. Our internal 1&1 testing runs between datacenters in the US and Europe. MARS uses very different technology under the hood, similar to transaction logging of database systems.
Reliability: application and replication are completely decoupled.
Networking problems (e.g. packet loss, bottlenecks) have no impact on your application at the primary side.
Anytime Consistency: on a secondary node, its version of the underlying disk device is always consistent in itself, but may be outdated (representing a former state of the primary side). Thanks to incremental replication of the transaction logfiles, the lag behind the primary is usually only a few seconds, or fractions of a second.
Synchronous or near-synchronous operating modes are planned for the future, but are expected to work _reliably_ only over short
distances (less than 50km), due to fundamental properties of distributed systems.
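To make the preceding description more concrete, here is a minimal conceptual sketch in plain Python (not MARS code; all names are invented for illustration). It models the core idea of log-based asynchronous replication: the primary appends every write to a transaction log, and a secondary replays that log strictly in order, so its copy always equals some former state of the primary device, merely lagging behind, never inconsistent.

```python
# Conceptual illustration only -- not MARS code and not its on-disk format.
# It shows why a strictly ordered log replay yields "anytime consistency":
# the secondary is always a valid former state of the primary, just older.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class LogRecord:
    seqno: int    # strictly increasing sequence number
    offset: int   # block offset on the replicated device
    data: bytes   # payload written at that offset

@dataclass
class Primary:
    log: List[LogRecord] = field(default_factory=list)
    disk: Dict[int, bytes] = field(default_factory=dict)

    def write(self, offset: int, data: bytes) -> None:
        # The application write completes locally; shipping the log record
        # to remote sites happens later, asynchronously.
        rec = LogRecord(seqno=len(self.log) + 1, offset=offset, data=data)
        self.log.append(rec)
        self.disk[offset] = data

@dataclass
class Secondary:
    applied_upto: int = 0
    disk: Dict[int, bytes] = field(default_factory=dict)

    def replay(self, records: List[LogRecord]) -> None:
        # Records are applied strictly in log order, so after every record
        # the local disk equals some former primary state -- outdated at
        # worst, but never internally inconsistent.
        for rec in records:
            assert rec.seqno == self.applied_upto + 1, "gap in the log"
            self.disk[rec.offset] = rec.data
            self.applied_upto = rec.seqno

if __name__ == "__main__":
    p, s = Primary(), Secondary()
    p.write(0, b"a")
    p.write(8, b"b")
    s.replay(p.log[:1])   # flaky network: only part of the log arrived
    print("secondary lags at seqno", s.applied_upto, "but is consistent")
    s.replay(p.log[1:])   # catch up once the network recovers
    print("secondary caught up at seqno", s.applied_upto)
```

The real MARS of course works on the block layer inside the kernel and handles log rotation, switchover, and many other details; the sketch only conveys the ordering argument.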
## Documentation / Manual
See https://github.com/schoebel/mars/blob/master/docu/mars-manual.pdf
Intro: the use cases of MARS vs. DRBD can be found in chapter 1.
COST SAVINGS using MARS are described at https://github.com/schoebel/mars/blob/master/docu/MARS_GUUG2017_en.pdf
## Concepts
For a short intro, see my GUUG2016 presentation https://github.com/schoebel/mars/blob/master/docu/MARS_GUUG2016.pdf . An older and very short intro is my LCA2013 presentation https://github.com/schoebel/mars/blob/master/docu/MARS_LCA2013.pdf .
There is also an older internal concept paper, but it is so outdated that I don't want to publish it.
The fundamental construction principle of the planned future MARS Full is called Instance Oriented Programming (IOP) and is described in the following paper: http://athomux.net/papers/paper_inst2.pdf
## History
As you can see in the git log, it evolved from a very experimental concept study, starting in the Summer of 2010.
At that time, I was working on it in my spare time.
Around Christmas 2010, my boss and shortly thereafter the CTO became aware of MARS, and I started working on it more or less "officially".
In Summer 2011, an "official" internal 1&1 project started, which aimed to deliver a proof of concept.
In February 2012, a pilot system was rolled out to an internal statistics server, which collects statistics data from thousands of other servers and thus produces a very heavy random-access write load, formerly replicated with DRBD (which led to performance problems due to massive randomness). After switching to MARS, the performance was provably better.
After curing some small infancy problems, that server has been running without problems ever since. It was upgraded to newer versions of MARS several times (indicated by some of the git tags). Our sysadmins switched the primary side a few times without informing me, so I could sleep better at night without knowing what they did ;)
In Summer 2012, the next "official" internal 1&1 project started. Its goal was to reach enterprise grade, and therefore to roll out MARS onto ~15 productive servers, starting with less critical systems such as test webspaces.
In December 2012 (shortly before Christmas), I got the official permission from our CTO Henning Kettler to publish MARS under GPL on github. Many thanks to him!
Before that point, I was probably bound by my working contract, which kept internal software secret by default (when there was no explicit permission).
Now there is a chance to build up an opensource community for MARS, partially outside of 1&1.
I am trying to respect the guidelines from Linus, but getting MARS upstream into the Linux kernel will need much more work.
In May 2013, I got help from my new colleague Frank Liepold. He worked on a fully automatic test suite which automates regression tests (goal: rolling releases). That test suite is based on the internal test suite of blkreplay and can be found in the test_suite/ subdirectory.
In November 2013, internal 1&1 projects started for the mass rollout to several thousand servers at Shared Hosting Linux (ShaHoLin).
More than 15 pilot clusters serving real customers had been running for several months since Summer 2013. Some of them were known "performance pigs". There were no issues worth mentioning (besides collecting operational experience, working out HOWTOs for doing things the right way, and finding the best monitoring strategies).
Some other teams, in particular the ePages and Mail&Media teams, were the first to use MARS in real production in Spring 2014, at about 10 clusters each, with great success. I did not have much work with those teams: they just rolled it out, and it worked for them. I got only one bug report from them, which I had to fix.
Unfortunately, the ShaHoLin team was different. Their rollout process to several thousand servers took extremely long. After more than a year, only about 50 clusters had been migrated to MARS. Eventually, I managed to get MARS fully onto the street in April 2015 by developing a fully automatic rollout script, rolling it out myself during two nights, and personally taking full responsibility for the rollout (there was no incident). Otherwise, it likely would have taken a few years longer, according to some sysadmins.
Although the software was still labelled "beta" at that time, it had reached enterprise grade according to our internal rating process.
Since then, MARS has been running on several thousand servers and several petabytes of customer data, and it has collected several million operating hours. It is now considered more stable than the hardware.
## Future Plans / Roadmap
At the moment, cost savings are very important for 1&1. In 2017, MARS will be improved for scalability and load balancing according to the GUUG2017 slides.
Sketch: the traditional pairwise clusters (originally from DRBD pairs) will be merged into one "big cluster" for EU and for US (with respect to _metadata_ updates only, while the data IO paths continue to follow the sharding principle), such that any resource can be migrated via MARS fast fullsync to any other server. Background migration of VMs / LXC containers then becomes easily possible for optimizing density and load balancing.
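As a rough illustration of that sketch (hypothetical Python bookkeeping only, not MARS or marsadm code; the server and resource names are invented), the following shows the kind of rebalancing such a pool enables: whenever a shard runs too full, one of its resources is migrated in the background to an emptier shard and the primary role is handed over there.

```python
# Hypothetical planning sketch, not MARS code: rebalancing a sharded pool
# by migrating whole resources between servers in the background.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Server:
    name: str
    capacity_tb: float
    resources: Dict[str, float] = field(default_factory=dict)  # name -> TB

    @property
    def used_tb(self) -> float:
        return sum(self.resources.values())

def migrate(resource: str, src: Server, dst: Server) -> None:
    """Move one resource: replicate it to dst in the background (fast
    fullsync plus log replay), hand over the primary role, then drop the
    copy on src. Only the capacity bookkeeping is modeled here."""
    dst.resources[resource] = src.resources.pop(resource)

def rebalance(servers: List[Server], high_water: float = 0.85) -> None:
    # Whenever a server exceeds the high-water mark, shift its largest
    # resource to the emptiest server; because the migration runs in the
    # background, the VMs / containers on top keep running.
    for src in servers:
        while src.used_tb / src.capacity_tb > high_water:
            dst = min(servers, key=lambda s: s.used_tb / s.capacity_tb)
            if dst is src:
                break
            migrate(max(src.resources, key=src.resources.get), src, dst)

if __name__ == "__main__":
    pool = [Server("istore01", 10.0, {"lv-a": 6.0, "lv-b": 3.5}),
            Server("istore02", 10.0, {"lv-c": 1.0})]
    rebalance(pool)
    for s in pool:
        print(s.name, sorted(s.resources), f"{s.used_tb:.1f} TB used")
```

Real placement decisions additionally involve network bandwidth, traffic shaping, and customer constraints; the point is only that once any resource can live on any server of the big cluster, load balancing reduces to moving resources around.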
Kernel upstream development is planned to resume later.