mars/test_suite/test_suite.txt

# test_suite.txt Version 0.03
#
# description of the tests to execute before a mars release
#
# author: Frank Liepold frank.liepold@1und1.de
#
# Can be printed with a2ps -R --rows=1 --columns=1 -l 130 -L101 test_suite.txt


abbreviations:
    data_dev_writer : process that writes to the data device on the primary and produces a log with statistics
                      about runtime and written data.
    device cksum    : checking that the cksum of the data device is the same on primary and secondary
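
The "device cksum" check can be sketched as follows. This is only an illustration, not the test suite's
implementation: the function names are hypothetical, SHA-256 is used in place of the cksum utility, and both
devices (or copies of them) are assumed to be readable from the host running the check. In a real cluster the
digest would be computed on each host and only the results compared.

```python
import hashlib


def device_checksum(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file or block device, read in chunks.

    Hypothetical helper: SHA-256 stands in for the cksum utility here.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as dev:
        # Read fixed-size chunks so that large devices never have to fit into memory.
        for chunk in iter(lambda: dev.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def devices_match(primary_path, secondary_path):
    """True if both devices have identical content (assumes both paths are readable locally)."""
    return device_checksum(primary_path) == device_checksum(secondary_path)
```

The check is only meaningful once the secondary has finished fetching and applying, which is why several
testcases wait for "fetch and apply end" before comparing.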

recovery procedures : Some testcases cause more or less serious crashes or standstills (e.g. A4.1 below). If
                      there are documented repair strategies they will be tested, too.

category id  testname      description           testcase and steps                              to check
=====================================================================================================================
basic    B   marsadm       testing of pre and    the scope of the tests is specified by the
                           post conditions of    documents resource_states.txt and
                           all marsadm cmds      states_and_actions.txt; it currently comprises
                                                 about 20 tests that check the pre and post
                                                 conditions of the most important marsadm
                                                 commands

---------------------------------------------------------------------------------------------------------------------
basic    B1 wait_role      marsadm secondary     B1.1  - marsadm secondary                      marsadm ROLE must
                           resp. primary               - marsadm ROLE                           return secondary
                           may only return             - ls /dev/mars/...                       ls must fail
                           with success after
                           the device has        B1.2  - marsadm primary                        marsadm ROLE must
                           disappeared resp.           - marsadm ROLE                           return primary
                           appeared                    - ls /dev/mars/...                       ls must succeed
---------------------------------------------------------------------------------------------------------------------
admin    A1  growing       growing the data      A1.1  - start data_dev_writer                  - device cksum
                           device in a running         - lvresize on primary and secondary
                           mars connection             - pause-sync on primary and secondary
                                                       - marsadm resize on primary
                                                       - resume-sync on primary and secondary
                                                       - wait for sync end
                                                       - stop data_dev_writer
                                                       - wait for fetch and apply end
--------------------------------------------------------------------------------------------------------------------
admin    A2  secon2prima   host a: primary       A2.1  - start data_dev_writer                  - device cksum
                           host b: secondary           - marsadm primary on host b (must fail)
                           switch secondary ->         - stop data_dev_writer
                             primary on host b         - umount data device
                                                       - marsadm primary on host b
--------------------------------------------------------------------------------------------------------------------
admin    A3  apply_fetch   independence of apply A3.1  - start data_dev_writer                  apply must run to
                           and fetch                   - pause-apply on secondary               (nearly) end of
                                                       - pause-fetch on secondary               fetched logfile
                                                       - resume-apply on secondary

                                                 A3.2  - start data_dev_writer                  the whole logfile
                                                       - pause-apply on secondary               must be fetched
                                                       - pause-fetch on secondary
                                                       - stop data_dev_writer
                                                       - resume-fetch on secondary
--------------------------------------------------------------------------------------------------------------------
hardcore H2  mars_dir_full /mars full            H2.1 running full because of logfiles          device cksum
                           is regarded as an          generated by data_dev_writer
                           admin error                - start data_dev_writer
                                                        until /mars full
                                                      - rmmod mars on all cluster hosts
                                                      - resize /mars
                                                      - modprobe mars on all cluster hosts
                                                      - start second data_dev_writer
                                                      - stop all data_dev_writers
                                                 H2.2 running full because another process
                                                      is filling /mars

                                                      mars provides different levels
                                                      of emergency strategies, which will be
                                                      documented and then tested

--------------------------------------------------------------------------------------------------------------------
admin    A5  datadev_full  data device full      A5.1  - start data_dev_writer                  device cksum
                                                       - wait for data device full
                                                       see A1.1
--------------------------------------------------------------------------------------------------------------------
admin    A6  logrotate     looping logrotate     A6.1  - start data_dev_writer                  - device cksum
                                                       - endless loop logrotate                 - impact of logfile
                                                       - stop loop after n minutes                sizes on write
                                                       - stop data_dev_writer                     performance
                                                       - wait for fetch and apply end           - impact of
                                                                                                  logrotate
                                                                                                  frequency on write
                                                                                                  performance
--------------------------------------------------------------------------------------------------------------------
admin    A7  logdelete     looping logrotate     A7.1  - start data_dev_writer                  see A6.1
                           and logdelete               - endless loop logrotate
                                                            and logdelete
                                                       - stop loop after n minutes
                                                       - stop data_dev_writer
                                                       - wait for fetch and apply end
--------------------------------------------------------------------------------------------------------------------
admin    A8  compatibel    compatibility of      these testcases are to be implemented, when
                           mars versions         there are different versions in production
                           userspace versions
                           kernel versions
--------------------------------------------------------------------------------------------------------------------
admin    A9  standstill    recognizing,          A9.1 - logfile damage on secondary              - error indicator
                           indicating and                                                         (still to specify)
                           repair of                                                             - repair (if
                           exceptional                                                             automatable)
                           standstills                                                           - device cksum
--------------------------------------------------------------------------------------------------------------------
admin    A10 mult_device   multiple data         A10.* run several tests in parallel             - given by the single
                           devices (resources)         on multiple mars connections where          tests
                           per host                    the data devices are in some cases        - impact on write
                                                       located on the same host                    performance
                                                       still to specify
                                                 A10.1 - for i in 1 2 3; do
                                                             start data_dev_writer on $i resources
                                                             stop  data_dev_writer on resources
                                                             take write rate of each resource
                                                         done
                                                 A10.2 - like A10.1 but with regular log-rotate
                                                         and log-delete

--------------------------------------------------------------------------------------------------------------------
admin    A11 small_sec_dev secondary data        A11.1 - primary create resource (100 MB)        - device cksum
                          device smaller at cmd        - secondary join resource (80 MB)
                          marsadm join-resource        - start data_dev_writer
                          see mail uli 06/23/13        - stop data_dev_writer
                                                       - switch primary -> secondary
                                                       - wait for fetch and apply end
                                                         on secondary
--------------------------------------------------------------------------------------------------------------------
admin    A12 casc_resize  cascades of resize     A12.1 to specify                                amount of synced
                          operations                                                             data
--------------------------------------------------------------------------------------------------------------------
admin    A13 sync_pos     testing new symlink    A13.1 to specify
                          syncpos
                          (commit: "light: add syncpos symlink")
--------------------------------------------------------------------------------------------------------------------
admin    A14 filesys      all tests on different
                          filesystems
--------------------------------------------------------------------------------------------------------------------
perf     P1  fullsync      performance           P1.1  - data on both data devices nearly        - device cksum
                                                         matching (= secondary data device       - sync time
                                                         patched with some bytes at some         - transfer rate
                                                         offsets)
                                                       - default mars sync (fast fullsync)
                                                       - 2 GB data device
                                                       - secondary down
                                                       - secondary invalidate
                                                       - secondary up
                                                       - wait for sync end
                                                 P1.2  - similar to P1.1 but:
                                                       - "slow" mars sync
                                                 P1.3  - similar to P1.1 but:
                                                       - data on both with strong differences
                                                 P1.4  - similar to P1.1 but:
                                                       - data on both with strong differences
                                                       - "slow" mars sync
                                                 P1.11 equal to P1.1 - P1.4 but with             - see P1.1
                                                  ...  data_dev_writer                           - impact of sync
                                                 P1.14                                             on write
                                                                                                   performance
---------------------------------------------------------------------------------------------------------------------
stabil   S1  net_failure   network broken        S1.1  - start data_dev_writer                   - device cksum
                                                       - manipulation=total cut of connection    - impact on write
                                                       - restore network connection                performance
                                                       - stop data_dev_writer
                                                 S1.2ff similar to S1.1 but with different
                                                        network connection manipulations
                                                        still to specify
---------------------------------------------------------------------------------------------------------------------
stabil   S2  crash_prim    reboot of primary     S2.1  - start data_dev_writer                    device cksum
                           while writing               - reboot primary (ipmitool)
---------------------------------------------------------------------------------------------------------------------
stabil   S3  crash_sec     reboot of secondary   S3.1  - start data_dev_writer                    device cksum
                           while applying and          - reboot secondary
                           fetching
---------------------------------------------------------------------------------------------------------------------
hardcore H1  gap_in_log    create and repair     H1.1  - pause-apply on secondary
                           gap in logfile              - start data_dev_writer
                                                       - stop data_dev_writer after n minutes
                                                       - wait until fetch complete
                                                       - create gap in logfile
                                                       - resume-apply
                                                       - wait until apply stops                   apply must stop
                                                                                                  at gap
                                                       - repair gap (apply must continue)         device cksum
---------------------------------------------------------------------------------------------------------------------
hardcore H3  late_log_comp belatedly completed   H3.1  - pause-apply on secondary
                           logfile after new
                           logfile has already
---------------------------------------------------------------------------------------------------------------------
perf     P2  perf_general  fetch only            P2.1  - pause-replay                             - time + rate
                           apply only            P2.2  - disconnect                                   "
                           sync                  P2.3  - invalidate                                   "
                           fetch & apply         P2.4  - = normal mode                            - time + rate
                                                                                                  - time + rate
                                                                                                    per action
                           write                 P2.5  - writing data device                      - write rate
                                                 P2.6  equal to P2.1 to P2.4                      as P2.1 to P2.4
                                                 ...   with parallel writing                      write rate
                                                 P2.9
                           with n devices              P2.1 to P2.9 with
                                                       2 4 8 devices
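
The testcases above repeatedly start and stop a data_dev_writer and check write rates (e.g. A10.1, P2.5). A
minimal sketch of such a writer is given below. It is an assumption-laden illustration rather than the writer
used by the test suite: the function name, its parameters and the returned statistics are hypothetical, and it
writes a fixed number of bytes instead of running until it is stopped.

```python
import os
import time


def data_dev_writer(path, total_bytes, block_size=4096):
    """Write total_bytes of random data to path and return runtime statistics.

    Hypothetical sketch: a real writer would run until stopped and log periodically.
    """
    block = os.urandom(block_size)
    written = 0
    start = time.monotonic()
    with open(path, "wb") as dev:
        while written < total_bytes:
            # The last block may be shorter than block_size.
            n = min(block_size, total_bytes - written)
            dev.write(block[:n])
            written += n
        # Force the data out of the page cache so the measured time includes the actual write.
        dev.flush()
        os.fsync(dev.fileno())
    elapsed = time.monotonic() - start
    rate = written / elapsed if elapsed > 0 else float("inf")
    return {"bytes_written": written, "seconds": elapsed, "bytes_per_second": rate}
```

For the multi-device cases, one such writer per resource could be run in parallel and the per-resource
"bytes_per_second" values compared with the single-device result.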