mars/test_suite/test_suite.txt
2013-09-17 13:36:27 +02:00

# test_suite.txt Version 0.03
#
# description of the tests to execute before a mars release
#
# author: Frank Liepold frank.liepold@1und1.de
#
# Can be printed with a2ps -R --rows=1 --columns=1 -l 130 -L101 test_suite.txt
abbreviations:
  data_dev_writer     : process writing to the data device on the primary and producing a log
                        containing statistics about runtime and written data.
  device cksum        : check that cksum primary = cksum secondary
  recovery procedures : some testcases cause more or less serious crashes or standstills (e.g. A4.1
                        below). If there are documented repair strategies they will be tested, too.
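
The "device cksum" check defined above can be sketched as follows. This is a minimal sketch, not
the test suite's own implementation: it assumes SHA-256 from Python's hashlib in place of the
cksum(1) CRC, and the names device_digest and devices_match are hypothetical. Each host would
compute the digest of its own data device; only the two digests are compared.

```python
import hashlib


def device_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file or block device, read in chunks.

    Assumption: SHA-256 stands in for the cksum(1) CRC named in the test plan.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as dev:
        # Read fixed-size chunks so that large devices never have to fit in memory.
        for chunk in iter(lambda: dev.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def devices_match(primary_digest, secondary_digest):
    """The 'device cksum' check passes only if both digests are identical."""
    return primary_digest == secondary_digest
```

The comparison is only meaningful once the data_dev_writer has stopped and fetch and apply have
finished on the secondary, as the individual testcases below require.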
category id  testname      description           testcase and steps                                 to check
=====================================================================================================================
basic    B   marsadm       testing of pre and           the scope of the tests is specified by the
                           post conditions of           documents resource_states.txt and
                           all marsadm cmds             states_and_actions.txt; at the moment it
                                                        comprises about 20 tests of the most
                                                        important marsadm commands, checking
                                                        their pre and post conditions
---------------------------------------------------------------------------------------------------------------------
basic    B1  wait_role     marsadm secondary     B1.1   - marsadm secondary                         marsadm ROLE must
                           resp. primary                - marsadm ROLE                              return secondary
                           may only return              - ls /dev/mars/...                          ls must fail
                           with success after
                           the device has        B1.2   - marsadm primary                           marsadm ROLE must
                           disappeared resp.            - marsadm ROLE                              return primary
                           appeared                     - ls /dev/mars/...                          ls must succeed
---------------------------------------------------------------------------------------------------------------------
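
The pass criterion of B1.1 and B1.2 can be written down as a small predicate. This is only a
sketch: the function name wait_role_ok is hypothetical, and how the two observations are
collected (running marsadm ROLE and ls on the /dev/mars/... entry directly after the role
command has returned) is deliberately left out.

```python
def wait_role_ok(requested_role, reported_role, device_exists):
    """Return True if the observations match the B1 expectations.

    requested_role -- "primary" or "secondary", the role just requested
    reported_role  -- the role reported afterwards
    device_exists  -- whether ls on the device entry succeeded
    """
    if requested_role not in ("primary", "secondary"):
        raise ValueError("unknown role: %r" % (requested_role,))
    if reported_role != requested_role:
        return False
    # The device must exist on a primary and must be gone on a secondary.
    return device_exists == (requested_role == "primary")
```

No polling is involved on purpose: the point of the test is that the role command may only
return with success once the device has already appeared or disappeared.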
admin    A1  growing       growing the data      A1.1   - start data_dev_writer                     - device cksum
                           device in a running          - lvresize on primary and secondary
                           mars connection              - pause-sync on primary and secondary
                                                        - marsadm resize on primary
                                                        - resume-sync on primary and secondary
                                                        - wait for sync end
                                                        - stop data_dev_writer
                                                        - wait for fetch and apply end
--------------------------------------------------------------------------------------------------------------------
admin    A2  secon2prima   host a: primary       A2.1   - start data_dev_writer                     - device cksum
                           host b: secondary            - marsadm primary on host b (must fail)
                           switch secondary ->          - stop data_dev_writer
                           primary on host b            - umount data device
                                                        - marsadm primary on host b
--------------------------------------------------------------------------------------------------------------------
admin    A3  apply_fetch   independence of apply A3.1   - start data_dev_writer                     apply must run to
                           and fetch                    - pause-apply on secondary                  (nearly) end of
                                                        - pause-fetch on secondary                  fetched logfile
                                                        - resume-apply on secondary
                                                 A3.2   - start data_dev_writer                     the whole logfile
                                                        - pause-apply on secondary                  must be fetched
                                                        - pause-fetch on secondary
                                                        - stop data_dev_writer
                                                        - resume-fetch on secondary
--------------------------------------------------------------------------------------------------------------------
hardcore H2  mars_dir_full /mars full            H2.1   running full because of logfiles            device cksum
                           is regarded as an            generated by data_dev_writer
                           admin error                  - start data_dev_writer
                                                          until /mars full
                                                        - rmmod mars on all cluster hosts
                                                        - resize /mars
                                                        - modprobe mars on all cluster hosts
                                                        - start second data_dev_writer
                                                        - stop all data_dev_writers
                                                 H2.2   running full because another process
                                                        is filling /mars
                                                        mars provides different levels
                                                        of emergency strategies, which will be
                                                        documented and then tested
--------------------------------------------------------------------------------------------------------------------
admin    A5  datadev_full  data device full      A5.1   - start data_dev_writer                     device cksum
                                                        - wait for data device full
                                                        see A1.1
--------------------------------------------------------------------------------------------------------------------
admin    A6  logrotate     looping logrotate     A6.1   - start data_dev_writer                     - device cksum
                                                        - endless loop logrotate                    - impact of logfile
                                                        - stop loop after n minutes                   sizes on write
                                                        - stop data_dev_writer                        performance
                                                        - wait for fetch and apply end              - impact of
                                                                                                      logrotate
                                                                                                      frequency on write
                                                                                                      performance
--------------------------------------------------------------------------------------------------------------------
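
The "endless loop ... stop loop after n minutes" pattern of A6.1 (and A7.1) can be sketched as a
time-bounded loop. This is a minimal sketch under assumptions: loop_for and its parameters are
hypothetical names, and the action is a placeholder for whatever triggers a log rotation (and,
for A7.1, a log deletion) on the primary. Clock and sleep are injectable so that the loop can be
exercised without real waiting.

```python
import time


def loop_for(action, duration_s, pause_s=1.0, clock=time.monotonic, sleep=time.sleep):
    """Call action() repeatedly until duration_s seconds have elapsed.

    Returns the number of calls made, which gives the effective rotation
    frequency when divided by the duration.
    """
    deadline = clock() + duration_s
    calls = 0
    while clock() < deadline:
        action()
        calls += 1
        sleep(pause_s)  # pause between two rotations
    return calls
```

Varying pause_s between runs is one way to vary the logrotate frequency whose impact on write
performance is listed in the "to check" column.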
admin    A7  logdelete     looping logrotate     A7.1   - start data_dev_writer                     see A6.1
                           and logdelete                - endless loop logrotate
                                                          and logdelete
                                                        - stop loop after n minutes
                                                        - stop data_dev_writer
                                                        - wait for fetch and apply end
--------------------------------------------------------------------------------------------------------------------
admin    A8  compatibel    compatibility of             these testcases are to be implemented when
                           mars versions                there are different versions in production
                           userspace versions
                           kernel versions
--------------------------------------------------------------------------------------------------------------------
admin    A9  standstill    recognizing,          A9.1   - logfile damage on secondary               - error indicator
                           indicating and                                                             (still to specify)
                           repair of                                                                - repair (if
                           exceptional                                                                automatable)
                           standstills                                                              - device cksum
--------------------------------------------------------------------------------------------------------------------
admin    A10 mult_device   multiple data         A10.*  run several tests in parallel               - given by the single
                           devices (resources)          on multiple mars connections where            tests
                           per host                     the data devices are in some cases          - impact on write
                                                        located on the same host                      performance
                                                        still to specify
                                                 A10.1  - for i in 1 2 3; do
                                                            start data_dev_writer on $i resources
                                                            stop data_dev_writer on the resources
                                                            take write rate of each resource
                                                          done
                                                 A10.2  - like A10.1 but with regular log-rotate
                                                          and log-delete
--------------------------------------------------------------------------------------------------------------------
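
"take write rate of each resource" in A10.1 amounts to a small computation over the statistics
that each data_dev_writer logs. The sketch below assumes that these statistics are available as
bytes written and runtime in seconds per resource; the function names and the input layout are
hypothetical.

```python
def write_rate_mb_s(bytes_written, seconds):
    """Write rate in MiB per second for one data_dev_writer run."""
    if seconds <= 0:
        raise ValueError("runtime must be positive")
    return bytes_written / seconds / (1024 * 1024)


def rates_per_resource(stats):
    """Map each resource name to its write rate.

    stats -- dict of resource name -> (bytes_written, seconds)
    """
    return {name: write_rate_mb_s(written, secs) for name, (written, secs) in stats.items()}
```

Comparing the result for 1, 2 and 3 resources shows how the write rate per resource changes as
more resources are written in parallel on the same host.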
admin    A11 small_sec_dev secondary data        A11.1  - primary create resource (100 MB)          - device cksum
                           device smaller at cmd        - secondary join resource (80 MB)
                           marsadm join-resource        - start data_dev_writer
                           see mail uli 06/23/13        - stop data_dev_writer
                                                        - switch primary -> secondary
                                                        - wait for fetch and apply end
                                                          on secondary
--------------------------------------------------------------------------------------------------------------------
admin    A12 casc_resize   cascades of resize    A12.1  to specify                                  amount of synced
                           operations                                                               data
--------------------------------------------------------------------------------------------------------------------
admin    A13 sync_pos      testing new symlink   A13.1  to specify
                           syncpos
                           (commit: "light: add syncpos symlink")
--------------------------------------------------------------------------------------------------------------------
admin    A14 filesys       all tests on different
                           filesystems
--------------------------------------------------------------------------------------------------------------------
perf     P1  fullsync      performance           P1.1   - data on both data devices nearly          - device cksum
                                                          matching (= secondary data device         - sync time
                                                          patched with some bytes at some           - transfer rate
                                                          offsets)
                                                        - default mars sync (fast fullsync)
                                                        - 2 GB data device
                                                        - secondary down
                                                        - secondary invalidate
                                                        - secondary up
                                                        - wait for sync end
                                                 P1.2   - similar to P1.1 but:
                                                          - "slow" mars sync
                                                 P1.3   - similar to P1.1 but:
                                                          - data on both with strong differences
                                                 P1.4   - similar to P1.1 but:
                                                          - data on both with strong differences
                                                          - "slow" mars sync
                                                 P1.11  equal to P1.1 - P1.4 but with               - see P1.1
                                                 ...    data_dev_writer                             - impact of sync
                                                 P1.14                                                on write
                                                                                                      performance
---------------------------------------------------------------------------------------------------------------------
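
The P1.1 precondition "secondary data device patched with some bytes at some offsets" can be
produced with a few lines of Python. This is a sketch under assumptions: flip_bytes_at is a
hypothetical helper, the offsets are chosen by the tester, and the path would be the secondary's
underlying data device (or an image file when trying it out). Each selected byte is inverted, so
it is guaranteed to differ from the primary's copy afterwards.

```python
def flip_bytes_at(path, offsets):
    """Invert one byte at each given offset of a file or block device.

    Duplicate offsets are patched only once, so every listed position
    ends up different from its previous content.
    """
    with open(path, "r+b") as dev:
        for offset in sorted(set(offsets)):
            dev.seek(offset)
            original = dev.read(1)
            if not original:
                raise ValueError("offset %d is beyond the end of the device" % offset)
            dev.seek(offset)
            dev.write(bytes([original[0] ^ 0xFF]))
```

For P1.3 and P1.4 ("strong differences") the same helper could be called with a much denser
list of offsets.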
stabil   S1  net_failure   network broken        S1.1   - start data_dev_writer                     - device cksum
                                                        - manipulation=total cut of connection      - impact on write
                                                        - restore network connection                  performance
                                                        - stop data_dev_writer
                                                 S1.2ff similar to S1.1 but with different
                                                        network connection manipulations
                                                        still to specify
---------------------------------------------------------------------------------------------------------------------
stabil   S2  crash_prim    reboot of primary     S2.1   - start data_dev_writer                     device cksum
                           while writing                - reboot primary (ipmitool)
---------------------------------------------------------------------------------------------------------------------
stabil   S3  crash_sec     reboot of secondary   S3.1   - start data_dev_writer                     device cksum
                           while applying and           - reboot secondary
                           fetching
---------------------------------------------------------------------------------------------------------------------
hardcore H1  gap_in_log    create and repair     H1.1   - pause-apply on secondary
                           gap in logfile               - start data_dev_writer
                                                        - stop data_dev_writer after n minutes
                                                        - wait until fetch complete
                                                        - create gap in logfile
                                                        - resume-apply
                                                        - wait until apply stops                    apply must stop
                                                                                                    at gap
                                                        - repair gap (apply must continue)          device cksum
---------------------------------------------------------------------------------------------------------------------
hardcore H3  late_log_comp belatedly completed   H3.1   - pause-apply on secondary
                           logfile after new
                           logfile has already
---------------------------------------------------------------------------------------------------------------------
perf     P2  perf_general  fetch only            P2.1   - pause-replay                              - time + rate
                           apply only            P2.2   - disconnect                                  "
                           sync                  P2.3   - invalidate                                  "
                           fetch & apply         P2.4   - = normal mode                             - time + rate
                                                                                                    - time + rate
                                                                                                      per action
                           write                 P2.5   - writing data device                       - write rate
                                                 P2.6   equal to P2.1 to P2.4                       as P2.1 to P2.4
                                                 ...    with parallel writing                       write rate
                                                 P2.9
                           with n devices               P2.1 to P2.9 with
                                                        2 4 8 devices