mirror of https://github.com/schoebel/mars
doc: clarify Cloud Storage
parent
7c0a61d435
commit
ae81e36816
|
@ -483,12 +483,195 @@ eventually consistent
|
||||||
\end_layout
|
\end_layout
|
||||||
|
|
||||||
\begin_layout Standard
|
\begin_layout Standard
|
||||||
There are some consequences from this definition:
|
\noindent
|
||||||
|
\begin_inset Graphics
|
||||||
|
filename images/lightbulb_brightlit_benj_.png
|
||||||
|
lyxscale 12
|
||||||
|
scale 7
|
||||||
|
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
Notice that the term
|
||||||
|
\begin_inset Quotes eld
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
network
|
||||||
|
\begin_inset Quotes erd
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
does not occur in this definition.
|
||||||
|
However, the term
|
||||||
|
\begin_inset Quotes eld
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
distributed resources
|
||||||
|
\begin_inset Quotes erd
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
implies
|
||||||
|
\emph on
|
||||||
|
some(!)
|
||||||
|
\emph default
|
||||||
|
kind of network.
|
||||||
\end_layout
|
\end_layout
|
||||||
|
|
||||||
\begin_layout Enumerate
|
\begin_layout Standard
|
||||||
Distributed Storage, in particular BigCluster architectures (see section
|
\noindent
|
||||||
|
\begin_inset Graphics
|
||||||
|
filename images/lightbulb_brightlit_benj_.png
|
||||||
|
lyxscale 12
|
||||||
|
scale 7
|
||||||
|
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
Important! The definition does
|
||||||
|
\emph on
|
||||||
|
not
|
||||||
|
\emph default
|
||||||
|
imply some
|
||||||
|
\emph on
|
||||||
|
specific
|
||||||
|
\emph default
|
||||||
|
type of network, such as a
|
||||||
|
\series bold
|
||||||
|
storage network
|
||||||
|
\series default
|
||||||
|
which must be capable of transporting masses of IO operations in
|
||||||
|
\series bold
|
||||||
|
realtime
|
||||||
|
\series default
|
||||||
|
.
|
||||||
|
We are free to use other types of networks, such as
|
||||||
|
\series bold
|
||||||
|
replication networks
|
||||||
|
\series default
|
||||||
|
, which need not be dimensioned for realtime IO traffic, but are usable
|
||||||
|
for
|
||||||
|
\series bold
|
||||||
|
background data migration
|
||||||
|
\series default
|
||||||
|
, and even over long distances, where the network typically has some bottlenecks.
|
||||||
|
\end_layout
|
||||||
|
|
||||||
|
\begin_layout Standard
|
||||||
|
\noindent
|
||||||
|
\begin_inset Graphics
|
||||||
|
filename images/lightbulb_brightlit_benj_.png
|
||||||
|
lyxscale 12
|
||||||
|
scale 7
|
||||||
|
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
Notice that the definition says nothing about the
|
||||||
|
\series bold
|
||||||
|
time scale
|
||||||
|
\series default
|
||||||
|
of operations
|
||||||
|
\begin_inset Foot
|
||||||
|
status open
|
||||||
|
|
||||||
|
\begin_layout Plain Layout
|
||||||
|
Notice: go down to a time scale of microseconds.
|
||||||
|
You will then notice that typical IO operations will require several hundred
|
||||||
|
of machine instructions between IO request
|
||||||
|
\emph on
|
||||||
|
submission
|
||||||
|
\emph default
|
||||||
|
and the corresponding IO request
|
||||||
|
\emph on
|
||||||
|
completion
|
||||||
|
\emph default
|
||||||
|
.
|
||||||
|
This is not only true for local IO.
|
||||||
|
In network clusters like Ceph, it will even involve creation of network
|
||||||
|
packets, and lead to additional IO latencies implied by the network packet
|
||||||
|
transfer latencies.
|
||||||
|
\end_layout
|
||||||
|
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
.
|
||||||
|
We are free to implement certain operations, such as background data migrations
|
||||||
|
, in a rather long timescale (from a human point of view).
|
||||||
|
Example: increasing the number of replicas in an operational Ceph cluster,
|
||||||
|
already containing a few hundred terabytes of data, will not only require
|
||||||
|
additional storage hardware, but also take a rather long time, implied
|
||||||
|
by the very nature of such reorganisational tasks.
|
||||||
|
\end_layout
|
||||||
|
|
||||||
|
\begin_layout Standard
|
||||||
|
\noindent
|
||||||
|
\begin_inset Graphics
|
||||||
|
filename images/lightbulb_brightlit_benj_.png
|
||||||
|
lyxscale 12
|
||||||
|
scale 7
|
||||||
|
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
The famous CAP theorem is one of the motivations behind requirement (4)
|
||||||
|
|
||||||
|
\begin_inset Quotes eld
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
eventually consistent
|
||||||
|
\begin_inset Quotes erd
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
.
|
||||||
|
This is not an accident.
|
||||||
|
There is a
|
||||||
|
\emph on
|
||||||
|
reason
|
||||||
|
\emph default
|
||||||
|
for it, although it is not a
|
||||||
|
\emph on
|
||||||
|
hard
|
||||||
|
\emph default
|
||||||
|
requirement.
|
||||||
|
Strict consistency is not needed for many applications running on top of
|
||||||
|
cloud storage.
|
||||||
|
In addition, the CAP theorem and some other theorems cited at
|
||||||
|
\begin_inset Flex URL
|
||||||
|
status open
|
||||||
|
|
||||||
|
\begin_layout Plain Layout
|
||||||
|
|
||||||
|
https://en.wikipedia.org/wiki/CAP_theorem
|
||||||
|
\end_layout
|
||||||
|
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
tell us that Strict Consistency would be
|
||||||
|
\series bold
|
||||||
|
difficult and expensive
|
||||||
|
\series default
|
||||||
|
to achieve at global level in a bigger Distributed System, and at the cost
|
||||||
|
of other properties.
|
||||||
|
More detailed explanations are in section
|
||||||
|
\begin_inset CommandInset ref
|
||||||
|
LatexCommand vref
|
||||||
|
reference "sec:Explanation-via-CAP"
|
||||||
|
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
.
|
||||||
|
\end_layout
|
||||||
|
|
||||||
|
\begin_layout Standard
|
||||||
|
There are some consequences from this definition of Cloud Storage, for each
|
||||||
|
of our high-level storage architectures:
|
||||||
|
\end_layout
|
||||||
|
|
||||||
|
\begin_layout Description
|
||||||
|
Distributed
|
||||||
|
\begin_inset space ~
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
Storage, in particular
|
||||||
|
\family typewriter
|
||||||
|
BigCluster
|
||||||
|
\family default
|
||||||
|
architectures (see section
|
||||||
\begin_inset CommandInset ref
|
\begin_inset CommandInset ref
|
||||||
LatexCommand ref
|
LatexCommand ref
|
||||||
reference "sec:Distributed-vs-Local:"
|
reference "sec:Distributed-vs-Local:"
|
||||||
|
@ -501,8 +684,12 @@ s.
|
||||||
of data.
|
of data.
|
||||||
\end_layout
|
\end_layout
|
||||||
|
|
||||||
\begin_layout Enumerate
|
\begin_layout Description
|
||||||
Centralized Storage: does not conform to (1) and to (4) by definition
|
Centralized
|
||||||
|
\begin_inset space ~
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
Storage: does not conform to (1) and to (4) by definition
|
||||||
\begin_inset Foot
|
\begin_inset Foot
|
||||||
status open
|
status open
|
||||||
|
|
||||||
|
@ -533,11 +720,11 @@ almost
|
||||||
sub-component
|
sub-component
|
||||||
\emph default
|
\emph default
|
||||||
).
|
).
|
||||||
Typical granularity is replication of whole storage pools, or of LVs, or
|
Typical granularity is replication of whole internal storage pools, or
|
||||||
of filesystem data.
|
of LVs, or of filesystem data.
|
||||||
\end_layout
|
\end_layout
|
||||||
|
|
||||||
\begin_layout Enumerate
|
\begin_layout Description
|
||||||
LocalStorage, and some further models like RemoteSharding (see section
|
LocalStorage, and some further models like RemoteSharding (see section
|
||||||
\begin_inset CommandInset ref
|
\begin_inset CommandInset ref
|
||||||
LatexCommand ref
|
LatexCommand ref
|
||||||
|
@ -590,7 +777,7 @@ locally: Strict local consistency at LV granularity, also
|
||||||
\emph on
|
\emph on
|
||||||
within
|
within
|
||||||
\emph default
|
\emph default
|
||||||
any LV replica.
|
each of the LV replicas.
|
||||||
\end_layout
|
\end_layout
|
||||||
|
|
||||||
\begin_layout Description
|
\begin_layout Description
|
||||||
|
@ -603,6 +790,52 @@ between
|
||||||
|
|
||||||
\end_deeper
|
\end_deeper
|
||||||
\end_deeper
|
\end_deeper
|
||||||
|
\begin_layout Standard
|
||||||
|
\noindent
|
||||||
|
\begin_inset Graphics
|
||||||
|
filename images/lightbulb_brightlit_benj_.png
|
||||||
|
lyxscale 12
|
||||||
|
scale 7
|
||||||
|
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
Notice:
|
||||||
|
\family typewriter
|
||||||
|
BigCluster
|
||||||
|
\family default
|
||||||
|
architectures are creating
|
||||||
|
\emph on
|
||||||
|
virtual
|
||||||
|
\emph default
|
||||||
|
storage pools out of physically distributed storage servers.
|
||||||
|
For fairness reasons, creation of a big virtual LVM pool must be considered
|
||||||
|
as
|
||||||
|
\emph on
|
||||||
|
another
|
||||||
|
\emph default
|
||||||
|
valid Cloud Storage
|
||||||
|
\emph on
|
||||||
|
model
|
||||||
|
\emph default
|
||||||
|
, matching the above definition of Cloud Storage.
|
||||||
|
The main architectural difference is granularity, as explained in section
|
||||||
|
|
||||||
|
\begin_inset CommandInset ref
|
||||||
|
LatexCommand ref
|
||||||
|
reference "sec:Granularity-at-Architecture"
|
||||||
|
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
, and the stacking order of sub-components.
|
||||||
|
Notice that Football is creating
|
||||||
|
\series bold
|
||||||
|
location transparency
|
||||||
|
\series default
|
||||||
|
inside of the distributed virtual LVM pool.
|
||||||
|
This is an important (though not always required) basic property of any
|
||||||
|
type of cluster and/or grid.
|
||||||
|
\end_layout
|
||||||
|
|
||||||
\begin_layout Section
|
\begin_layout Section
|
||||||
Granularity at Architecture
|
Granularity at Architecture
|
||||||
\begin_inset CommandInset label
|
\begin_inset CommandInset label
|
||||||
|
@ -10885,12 +11118,19 @@ does not exist
|
||||||
|
|
||||||
\end_inset
|
\end_inset
|
||||||
|
|
||||||
Conclusion from the CAP theorem: don't use DRBD (or other
|
Conclusion from the CAP theorem: when P is a
|
||||||
|
\emph on
|
||||||
|
hard
|
||||||
|
\emph default
|
||||||
|
|
||||||
|
\emph on
|
||||||
|
requirement
|
||||||
|
\emph default
|
||||||
|
, don't use DRBD (or other
|
||||||
\emph on
|
\emph on
|
||||||
synchronous
|
synchronous
|
||||||
\emph default
|
\emph default
|
||||||
replication implementations) for long-distance and/or Cloud Storage scenarios
|
replication implementations) for long-distance and/or Cloud Storage scenarios.
|
||||||
where P is a hard requirement.
|
|
||||||
The red X is in particular problematic during re-sync, after the network
|
The red X is in particular problematic during re-sync, after the network
|
||||||
has become healthy again (cf section
|
has become healthy again (cf section
|
||||||
\begin_inset CommandInset ref
|
\begin_inset CommandInset ref
|
||||||
|
@ -10912,6 +11152,49 @@ local
|
||||||
of its regular operation.
|
of its regular operation.
|
||||||
\end_layout
|
\end_layout
|
||||||
|
|
||||||
|
\begin_layout Standard
|
||||||
|
\noindent
|
||||||
|
\begin_inset Graphics
|
||||||
|
filename images/MatieresCorrosives.png
|
||||||
|
lyxscale 50
|
||||||
|
scale 17
|
||||||
|
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
Another conclusion from the CAP theorem: when A+C is a
|
||||||
|
\emph on
|
||||||
|
hard requirement
|
||||||
|
\emph default
|
||||||
|
, and when P can be faithfully assumed as already given by passive crossover
|
||||||
|
cables, then don't use the current version of MARS.
|
||||||
|
Use DRBD instead.
|
||||||
|
\end_layout
|
||||||
|
|
||||||
|
\begin_layout Standard
|
||||||
|
\noindent
|
||||||
|
\begin_inset Graphics
|
||||||
|
filename images/MatieresToxiques.png
|
||||||
|
lyxscale 50
|
||||||
|
scale 17
|
||||||
|
|
||||||
|
\end_inset
|
||||||
|
|
||||||
|
If you think that you require all three properties C+A+P, but you don't
|
||||||
|
have passive crossover cables over short distances, you are requiring something
|
||||||
|
which is
|
||||||
|
\series bold
|
||||||
|
impossible
|
||||||
|
\series default
|
||||||
|
.
|
||||||
|
There exists no solution, with whatever component, or from whatever commercial
|
||||||
|
storage vendor.
|
||||||
|
The CAP theorem is as hard as Einstein's natural laws are.
|
||||||
|
Rethink your complete concept, from end to end.
|
||||||
|
Something is wrong, somewhere.
|
||||||
|
Ignoring this on enterprise-critical use cases can endanger a company and/or
|
||||||
|
your career.
|
||||||
|
\end_layout
|
||||||
|
|
||||||
\begin_layout Section
|
\begin_layout Section
|
||||||
Higher Consistency Guarantees vs Actuality
|
Higher Consistency Guarantees vs Actuality
|
||||||
\end_layout
|
\end_layout
|
||||||
|
|