# test_suite.txt Version 0.03
#
# description of the tests to execute before a mars release
#
# author: Frank Liepold frank.liepold@1und1.de
#
# Can be printed with a2ps -R --rows=1 --columns=1 -l 130 -L101 test_suite.txt


abbreviations:

  data_dev_writer     : process writing to the data device on the primary and
                        producing a protocol containing statistics about runtime
                        and written data
  device cksum        : checking that cksum primary = cksum secondary
  recovery procedures : Some testcases cause more or less serious crashes or
                        standstills (e.g. A4.1 below). If there are documented
                        repair strategies, they will be tested, too.


category  id    testname        description             testcase and steps                          to check
==========================================================================================================================
basic     B     marsadm         testing of pre and      the scope of tests is specified by the
                                post conditions of      documents resource_states.txt and
                                all marsadm cmds        states_and_actions.txt and comprises at
                                                        the moment about 20 tests of the most
                                                        important marsadm commands by checking
                                                        their pre and post conditions
--------------------------------------------------------------------------------------------------------------------------
basic     B1    wait_role       marsadm secondary       B1.1 - marsadm secondary                    marsadm ROLE must
                                resp. primary                - marsadm ROLE                         return secondary
                                may only return              - ls /dev/mars/...                     ls must fail
                                with success after
                                the device has          B1.2 - marsadm primary                      marsadm ROLE must
                                disappeared resp.            - marsadm ROLE                         return primary
                                appeared                     - ls /dev/mars/...                     ls must succeed
--------------------------------------------------------------------------------------------------------------------------
admin     A1    growing         growing the data        A1.1 - start data_dev_writer                - device cksum
                                device in a running          - lvresize on primary and secondary
                                mars connection              - pause-sync on primary and secondary
                                                             - marsadm resize on primary
                                                             - resume-sync on primary and secondary
                                                             - wait for sync end
                                                             - stop data_dev_writer
                                                             - wait for fetch and apply end
--------------------------------------------------------------------------------------------------------------------------
admin     A2    secon2prima     host a: primary         A2.1 - start data_dev_writer                - device cksum
                                host b: secondary            - marsadm primary on host b (must fail)
                                switch secondary ->          - stop data_dev_writer
                                primary on host b            - umount data device
                                                             - marsadm primary on host b
--------------------------------------------------------------------------------------------------------------------------
admin     A3    apply_fetch     independence of apply   A3.1 - start data_dev_writer                apply must run to
                                and fetch                    - pause-apply on secondary             (nearly) end of
                                                             - pause-fetch on secondary             fetched logfile
                                                             - resume-apply on secondary
                                                        A3.2 - start data_dev_writer                the whole logfile
                                                             - pause-apply on secondary             must be fetched
                                                             - pause-fetch on secondary
                                                             - stop data_dev_writer
                                                             - resume-fetch on secondary
--------------------------------------------------------------------------------------------------------------------------
hardcore  H2    mars_dir_full   /mars full              H2.1 running full because of logfiles       device cksum
                                is regarded as an            generated by data_dev_writer
                                admin error                  - start data_dev_writer until /mars full
                                                             - rmmod mars on all cluster hosts
                                                             - resize /mars
                                                             - modprobe mars on all cluster hosts
                                                             - start second data_dev_writer
                                                             - stop all data_dev_writers
                                                        H2.2 running full because another process
                                                             is filling /mars
                                mars provides different levels of emergency strategies, which
                                will be documented and then tested
--------------------------------------------------------------------------------------------------------------------------
admin     A5    datadev_full    data device full        A5.1 - start data_dev_writer                device cksum
                                                             - wait for data device full            see A1.1
--------------------------------------------------------------------------------------------------------------------------
admin     A6    logrotate       looping logrotate       A6.1 - start data_dev_writer                - device cksum
                                                             - endless loop logrotate               - impact of logfile
                                                             - stop loop after n minutes              sizes on write
                                                             - stop data_dev_writer                   performance
                                                             - wait for fetch and apply end         - impact of logrotate
                                                                                                      frequency on write
                                                                                                      performance
--------------------------------------------------------------------------------------------------------------------------
admin     A7    logdelete       looping logrotate       A7.1 - start data_dev_writer                see A6.1
                                and logdelete                - endless loop logrotate and logdelete
                                                             - stop loop after n minutes
                                                             - stop data_dev_writer
                                                             - wait for fetch and apply end
--------------------------------------------------------------------------------------------------------------------------
admin     A8    compatibel      compatibility of        these testcases are to be implemented
                                mars versions:          once different versions are in
                                - userspace versions    production
                                - kernel versions
--------------------------------------------------------------------------------------------------------------------------
admin     A9    standstill      recognizing,            A9.1 - logfile damage on secondary          - error indicator
                                indicating and               (still to specify)
                                repair of                                                           - repair (if
                                exceptional                                                           automatable)
                                standstills                                                         - device cksum
--------------------------------------------------------------------------------------------------------------------------
admin     A10   mult_device     multiple data           A10.* run several tests in parallel         - given by the single
                                devices (resources)           on multiple mars connections where      tests in some cases
                                per host                      the data devices are located on       - impact on write
                                                              the same host                           performance
                                                              (still to specify)
                                                        A10.1 - for i in 1 2 3; do
                                                                  start data_dev_writer on $i resources
                                                                  stop data_dev_writer on resources
                                                                  take write rate of each resource
                                                                done
                                                        A10.2 - like A10.1 but with regular
                                                                log-rotate and log-delete
--------------------------------------------------------------------------------------------------------------------------
admin     A11   small_sec_dev   secondary data          A11.1 - primary create resource (100 MB)    - device cksum
                                device smaller at cmd         - secondary join resource (80 MB)
                                marsadm join-resource         - start data_dev_writer
                                see mail uli 06/23/13         - stop data_dev_writer
                                                              - switch primary -> secondary
                                                              - wait for fetch and apply end on
                                                                secondary
--------------------------------------------------------------------------------------------------------------------------
admin     A12   casc_resize     cascades of resize      A12.1 to specify                            amount of synced
                                operations                                                          data
--------------------------------------------------------------------------------------------------------------------------
admin     A13   sync_pos        testing new symlink     A13.1 to specify
                                syncpos (commit:
                                "light: add syncpos
                                symlink")
--------------------------------------------------------------------------------------------------------------------------
admin     A14   filesys         all tests on
                                different filesystems
--------------------------------------------------------------------------------------------------------------------------
perf      P1    fullsync        fullsync performance    P1.1 - data on both data devices nearly     - device cksum
                                                               matching (= secondary data device    - sync time
                                                               patched with some bytes at some      - transfer rate
                                                               offsets)
                                                             - default mars sync (fast fullsync)
                                                             - 2 GB data device
                                                             - secondary down
                                                             - secondary invalidate
                                                             - secondary up
                                                             - wait for sync end
                                                        P1.2 - similar to P1.1 but:
                                                             - "slow" mars sync
                                                        P1.3 - similar to P1.1 but:
                                                             - data on both with strong differences
                                                        P1.4 - similar to P1.1 but:
                                                             - data on both with strong differences
                                                             - "slow" mars sync
                                                        P1.11 - P1.14: equal to P1.1 - P1.4         - see P1.1 ...
                                                                       but with data_dev_writer     - impact of sync
                                                                                                      on write performance
--------------------------------------------------------------------------------------------------------------------------
stabil    S1    net_failure     network broken          S1.1 - start data_dev_writer                - device cksum
                                                             - manipulation=total cut of connection - impact on write
                                                             - restore network connection             performance
                                                             - stop data_dev_writer
                                                        S1.2ff similar to S1.1 but with different
                                                               network connection manipulations
                                                               (still to specify)
--------------------------------------------------------------------------------------------------------------------------
stabil    S2    crash_prim      reboot of primary       S2.1 - start data_dev_writer                device cksum
                                while writing                - reboot primary (ipmitool)
--------------------------------------------------------------------------------------------------------------------------
stabil    S3    crash_sec       reboot of secondary     S3.1 - start data_dev_writer                device cksum
                                while applying and           - reboot secondary
                                fetching
--------------------------------------------------------------------------------------------------------------------------
hardcore  H1    gap_in_log      create and repair       H1.1 - pause-apply on secondary
                                gap in logfile               - start data_dev_writer
                                                             - stop data_dev_writer after n minutes
                                                             - wait until fetch complete
                                                             - create gap in logfile
                                                             - resume-apply
                                                             - wait until apply stops               apply must stop at gap
                                                             - repair gap                           (apply must continue)
                                                                                                    device cksum
--------------------------------------------------------------------------------------------------------------------------
hardcore  H3    late_log_comp   belatedly completed     H3.1 - pause-apply on secondary
                                logfile after new
                                logfile has already