mirror of https://github.com/schoebel/mars
# test_suite.txt Version 0.03
#
# description of the tests to execute before a mars release
#
# author: Frank Liepold frank.liepold@1und1.de
#
# Can be printed with a2ps -R --rows=1 --columns=1 -l 130 -L101 test_suite.txt

abbreviations:
  data_dev_writer     : process writing to the data device on the primary and producing a log containing
                        statistics about runtime and written data.
  device cksum        : checking that cksum primary = cksum secondary

  recovery procedures : Some testcases cause more or less serious crashes or standstills (e.g. A4.1 below).
                        If there are documented repair strategies, they will be tested, too.

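The two helpers named in the abbreviations could look roughly like the
following sketch. It is not the actual test tooling: the function names,
their arguments and the log format are assumptions made for illustration,
and the writer stops after a fixed number of blocks instead of running
until it is stopped.

```shell
#!/bin/sh
# Sketch only: function names, arguments and log format are hypothetical.

# Write COUNT blocks of BLOCK_SIZE bytes to TARGET and print a one-line
# log with runtime and bytes written.
data_dev_writer() {
    target=$1                 # data device (or a plain file for a dry run)
    count=$2                  # number of blocks to write
    block_size=${3:-4096}     # bytes per block
    start=$(date +%s)
    i=0
    while [ "$i" -lt "$count" ]; do
        # conv=notrunc keeps existing data outside the written block.
        dd if=/dev/zero of="$target" bs="$block_size" count=1 \
            seek="$i" conv=notrunc 2>/dev/null || return 1
        i=$((i + 1))
    done
    end=$(date +%s)
    echo "runtime_s=$((end - start)) bytes_written=$((count * block_size))"
}

# "device cksum": succeed only if both devices have the same POSIX cksum
# (CRC and byte count). Returns 2 if one of them cannot be read.
device_cksum_equal() {
    primary_dev=$1
    secondary_dev=$2
    # Reading via stdin keeps the file name out of the cksum output.
    primary_sum=$(cksum < "$primary_dev") || return 2
    secondary_sum=$(cksum < "$secondary_dev") || return 2
    [ "$primary_sum" = "$secondary_sum" ]
}
```

In a real cluster the two devices are on different hosts, so the secondary
checksum would have to be collected remotely (for example over ssh) and
compared on one side. The comparison is only meaningful after the writer
has stopped and fetch and apply have finished on the secondary.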
category  id   testname       description             testcase and steps                                  to check
=====================================================================================================================
basic     B    marsadm        testing of pre and      the scope of tests is specified by the
                              post conditions of      documents resource_states.txt and
                              all marsadm cmds        states_and_actions.txt and comprises at the
                                                      moment about 20 tests of the most important
                                                      marsadm commands by checking their pre and
                                                      post conditions
---------------------------------------------------------------------------------------------------------------------
basic     B1   wait_role      marsadm secondary       B1.1    - marsadm secondary                         marsadm ROLE must
                              resp. primary                   - marsadm ROLE                              return secondary
                              may only return                 - ls /dev/mars/...                          ls must fail
                              with success after
                              the device has          B1.2    - marsadm primary                           marsadm ROLE must
                              disappeared resp.               - marsadm ROLE                              return primary
                              appeared                        - ls /dev/mars/...                          ls must succeed
---------------------------------------------------------------------------------------------------------------------
admin     A1   growing        growing the data        A1.1    - start data_dev_writer                     - device cksum
                              device in a running             - lvresize on primary and secondary
                              mars connection                 - pause-sync on primary and secondary
                                                              - marsadm resize on primary
                                                              - resume-sync on primary and secondary
                                                              - wait for sync end
                                                              - stop data_dev_writer
                                                              - wait for fetch and apply end
--------------------------------------------------------------------------------------------------------------------
admin     A2   secon2prima    host a: primary         A2.1    - start data_dev_writer                     - device cksum
                              host b: secondary               - marsadm primary on host b (must fail)
                              switch secondary ->             - stop data_dev_writer
                              primary on host b               - umount data device
                                                              - marsadm primary on host b
--------------------------------------------------------------------------------------------------------------------
admin     A3   apply_fetch    independence of apply   A3.1    - start data_dev_writer                     apply must run to
                              and fetch                       - pause-apply on secondary                  (nearly) end of
                                                              - pause-fetch on secondary                  fetched logfile
                                                              - resume-apply on secondary

                                                      A3.2    - start data_dev_writer                     the whole logfile
                                                              - pause-apply on secondary                  must be fetched
                                                              - pause-fetch on secondary
                                                              - stop data_dev_writer
                                                              - resume-fetch on secondary
--------------------------------------------------------------------------------------------------------------------
hardcore  H2   mars_dir_full  /mars full              H2.1    running full because of logfiles            device cksum
                              is regarded as an               generated by data_dev_writer
                              admin error                     - start data_dev_writer
                                                                until /mars full
                                                              - rmmod mars on all cluster hosts
                                                              - resize /mars
                                                              - modprobe mars on all cluster hosts
                                                              - start second data_dev_writer
                                                              - stop all data_dev_writers
                                                      H2.2    running full because another process
                                                              is filling /mars

                                                              mars provides different levels
                                                              of emergency strategies, which will be
                                                              documented and then tested
--------------------------------------------------------------------------------------------------------------------
admin     A5   datadev_full   data device full        A5.1    - start data_dev_writer                     device cksum
                                                              - wait for data device full
                                                              see A1.1
--------------------------------------------------------------------------------------------------------------------
admin     A6   logrotate      looping logrotate       A6.1    - start data_dev_writer                     - device cksum
                                                              - endless loop logrotate                    - impact of logfile
                                                              - stop loop after n minutes                   sizes on write
                                                              - stop data_dev_writer                        performance
                                                              - wait for fetch and apply end              - impact of
                                                                                                            logrotate
                                                                                                            frequency on write
                                                                                                            performance
--------------------------------------------------------------------------------------------------------------------
admin     A7   logdelete      looping logrotate       A7.1    - start data_dev_writer                     see A6.1
                              and logdelete                   - endless loop logrotate
                                                                and logdelete
                                                              - stop loop after n minutes
                                                              - stop data_dev_writer
                                                              - wait for fetch and apply end
--------------------------------------------------------------------------------------------------------------------
admin     A8   compatibel     compatibility of        these testcases are to be implemented when
                              mars versions           there are different versions in production
                              userspace versions
                              kernel versions
--------------------------------------------------------------------------------------------------------------------
admin     A9   standstill     recognizing,            A9.1    - logfile damage on secondary               - error indicator
                              indicating and                    (still to specify)
                              repair of                                                                   - repair (if
                              exceptional                                                                   automatable)
                              standstills                                                                 - device cksum
--------------------------------------------------------------------------------------------------------------------
admin     A10  mult_device    multiple data           A10.*   run several tests in parallel               - given by the single
                              devices (resources)             on multiple mars connections where            tests
                              per host                        the data devices are in some cases          - impact on write
                                                              located on the same host                      performance
                                                              still to specify
                                                      A10.1   - for i in 1 2 3; do
                                                                  start data_dev_writer on $i resources
                                                                  stop data_dev_writer on resources
                                                                  take write rate of each resource
                                                                done
                                                      A10.2   - like A10.1 but with regular log-rotate
                                                                and log-delete
--------------------------------------------------------------------------------------------------------------------
admin     A11  small_sec_dev  secondary data          A11.1   - primary create resource (100 MB)          - device cksum
                              device smaller at cmd           - secondary join resource (80 MB)
                              marsadm join-resource           - start data_dev_writer
                              see mail uli 06/23/13           - stop data_dev_writer
                                                              - switch primary -> secondary
                                                              - wait for fetch and apply end
                                                                on secondary
--------------------------------------------------------------------------------------------------------------------
admin     A12  casc_resize    cascades of resize      A12.1   to specify                                  amount of synced
                              operations                                                                  data
--------------------------------------------------------------------------------------------------------------------
admin     A13  sync_pos       testing new symlink     A13.1   to specify
                              syncpos
                              (commit: "light: add syncpos symlink")
--------------------------------------------------------------------------------------------------------------------
admin     A14  filesys        all tests on different
                              filesystems
--------------------------------------------------------------------------------------------------------------------
perf      P1   fullsync       performance             P1.1    - data on both data devices nearly          - device cksum
                                                                matching (= secondary data device         - sync time
                                                                patched with some bytes at some           - transfer rate
                                                                offsets)
                                                              - default mars sync (fast fullsync)
                                                              - 2 GB data device
                                                              - secondary down
                                                              - secondary invalidate
                                                              - secondary up
                                                              - wait for sync end
                                                      P1.2    - similar to P1.1 but:
                                                              - "slow" mars sync
                                                      P1.3    - similar to P1.1 but:
                                                              - data on both with strong differences
                                                      P1.4    - similar to P1.1 but:
                                                              - data on both with strong differences
                                                              - "slow" mars sync
                                                      P1.11   equal to P1.1 - P1.4 but with               - see P1.1
                                                      ...     data_dev_writer                             - impact of sync
                                                      P1.14                                                 on write
                                                                                                            performance
---------------------------------------------------------------------------------------------------------------------
stabil    S1   net_failure    network broken          S1.1    - start data_dev_writer                     - device cksum
                                                              - manipulation=total cut of connection      - impact on write
                                                              - restore network connection                  performance
                                                              - stop data_dev_writer
                                                      S1.2ff  similar to S1.1 but with different
                                                              network connection manipulations
                                                              still to specify
---------------------------------------------------------------------------------------------------------------------
stabil    S2   crash_prim     reboot of primary       S2.1    - start data_dev_writer                     device cksum
                              while writing                   - reboot primary (ipmitool)
---------------------------------------------------------------------------------------------------------------------
stabil    S3   crash_sec      reboot of secondary     S3.1    - start data_dev_writer                     device cksum
                              while applying and              - reboot secondary
                              fetching
---------------------------------------------------------------------------------------------------------------------
hardcore  H1   gap_in_log     create and repair       H1.1    - pause-apply on secondary
                              gap in logfile                  - start data_dev_writer
                                                              - stop data_dev_writer after n minutes
                                                              - wait until fetch complete
                                                              - create gap in logfile
                                                              - resume-apply
                                                              - wait until apply stops                    apply must stop
                                                                                                          at gap
                                                              - repair gap (apply must continue)          device cksum
---------------------------------------------------------------------------------------------------------------------
hardcore  H3   late_log_comp  belatedly completed     H3.1    - pause-apply on secondary
                              logfile after new
                              logfile has already