mars/test_suite
Andrea Gelmini dd1e4e1323 Fix typos
[small adaptations by Thomas Schoebel-Theuer, and
some problems with LyX-specific file format fixed]
2023-04-05 13:30:38 +02:00
..
example
mars test_suite: docu of email transmission 2014-04-30 10:10:54 +02:00
scripts test_suite: fix: check for program pstree 2014-04-25 07:46:23 +02:00
README Fix typos 2023-04-05 13:30:38 +02:00
default-main.conf

README

GPLed software AS IS, sponsored by 1&1 Internet AG (www.1und1.de).

The test suite is work in progress.

The test suite was developed by Frank Liepold during his stay at 1&1.

The email address frank.liepold@1und1.de will no longer work, since Frank has
left 1&1 since May 2014.

Currently, there is no dedicated successor for him :(

Also note that Frank developed the test suite on his personal workstation,
which had a slow network connection to the datacenters where the test hosts
were running. Thus it happeend that ssh commands were given slowly. When
you have very fast ssh connections, some races may show up which did not
occur in the environment where Frank was working.

As a workaround, you may place something like "sleep 3;" in front of any
ssh command, but this will of course lead to a large slowdown (anyway,
a near-full run of the testsuite takes almost 24h when started
on my workstation). This should be fixed when a successor of Frank
is available.

=============================================================================

Contents
--------
1. Framework
1.1. Global settings
1.2. Running a test
1.2.1. Test output
1.2.2. Help to understand a test
1.3. Programming hints and conventions
1.4. Error handling
1.5. Signal handling and unexpected termination
1.6. Example
2. mars specific settings
2.1. Configuration of the local build environment
2.2. Configuration of the test hosts on which the tests are running
2.2.1. Access requirements
2.2.2. Installation kernel and mars module on the test hosts
2.3. Configuration of logical volumes
2.4. Resources
2.5. Starting the whole test suite via cronjob
2.6. Concurrent test runs
2.7. Firewalls

1. Framework
------------

1.1. Global settings
------------------
The directory where this README resides (normally <git-repo>/test_suite) is
called base directory in the following. If relative paths are given they refer to
this base directory.

The frame work of the test suite consists of the files README,
default-main.conf and the subdirectory scripts.

You specify your use case of the test suite by creating a subdirectory
- e.g. my_use_case_dir - of the base directory.

In this subdirectory a file global.conf must be created, containing at least
the following two variables:
global_user_dir must be set to .../test_suite/my_use_case_dir
global_user_module_dir must be set to a subdirectory of global_user_dir.
In global_user_module_dir the use case specific shell scripts (called modules)
are to be located.
We recommend global_user_module_dir=$global_user_dir/modules.

1.2. Running a test
-------------------

Tests are executed by a call to scripts/start_test.sh from a subdirectory of
$global_user_dir.

The scope and configuration of a test is completely described by this
subdirectory (we call it start directory) as follows:

- All subdirectories of the start directory containing a (usually empty)
  file named i_am_a_testdirectory define one test case. In the following
  we call such a subdirectory a test directory.
  If a test for one test directory below the start directory fails, the tests
  belonging to the other test directories of the start directory will not be
  executed.
  Though only tests which are likely not to fail should be executed by only
  one call to start_test.sh. The simple and recommended way is to call
  start_test.sh for each test directory once (which means that the
  start directory consists of only one test directory and coincides with it).

- The function set usable is defined in the following set of shell scripts:
  -- scripts/lib/lib*.sh
  -- scripts/modules/*.sh
  -- $global_user_module_dir/*.sh

- The configuration of the test case is defined by a set of *.conf files which
  are included by start_test.sh in the following order:

  -- default-*.conf:
        These are the configuration files belonging to the scripts
        scripts/modules/*.sh which are also included bei start_test.sh and
        which define some library functions which are independent of the
        specific use case.

  -- $global_user_dir/global.conf:
        Configuration file for use case specific global variables which cannot be
        assigned to a certain user module.

  -- $global_user_dir/default-*.conf:
        These are the configuration files belonging to the scripts
        $global_user_module_dir/*.sh which are also included bei start_test.sh and
        which define all functions necessary for your use case.


  -- <subdirname>.conf where subdirname runs top down through the parent
     directories starting with the first subdirectory of a directory called
     root config directory to the test directory.
     This config root directory is set to the start directory (on the directory
     branch start directory -> test directory) by default. It may be changed by
     the option --config_root_dir=<my dir>.
     If the start directory coincides with the test directory the file 
     <test directory name>.conf is included for convenience (in the strict
     sense there are no true subdirectories of the start directory residing
     above the test directory).
     The <subdirnam>.conf files may reside in the test directory or any of
     its parent directories (up to 20 levels).


     Examples: 

         1. start directory = test_cases/perf
            test directory  = test_cases/perf/apply/no_parallel_writer
            Option --config_root_dir not given

            leads to including of apply.conf and no_parallel_writer.conf (in
            this order!) which must reside as mentioned above in the test
            directory or any of it's parent directories (up to 20 levels).

         2. start directory = test_cases/perf/apply/no_parallel_writer
            test directory  = test_cases/perf/apply/no_parallel_writer
            Option --config_root_dir=test_cases/perf

            leads to including the same *.conf files as above.

         3. start directory = test_cases/perf/apply/no_parallel_writer
            test directory  = test_cases/perf/apply/no_parallel_writer
            Option --config_root_dir given or not

            leads to including of no_parallel_writer.conf

- In at least one of the included <subdirname>.conf files of a test you must
  set one of the variables prepare_list, setup_list, run_list, cleanup_list or
  finish_list.
  Each of these variables may be empty or contain a list of shell function
  names of functions defined in one of the set of shell scripts mentioned
  above.
  These functions are called by start_test.sh in the order they appear in the
  mentioned *list variables. The *list variables are evaluated in the order
  prepare_list, setup_list, run_list, cleanup_list, finish_list.

1.2.1. Test output
------------------

The output of start_test.sh is very verbose to ease the research in case of any
errors. You are encouraged to use grep & Co. to shrink it as desired.
The output consists of the following sections:

- List titled "Sourcing libraries in <library directory>" of included scripts
  from <libraries directory>/lib*.sh

- List titled "Sourcing modules and default configuration" of included scripts
  (= the set of shell scripts mentioned above)  and corresponding included
  default-*.conf files.

- Line titled "Scanning subdirectories of <start directory>" followed by a
  list of ignored or skipped subdirectories

- Per test directory:
    - Line titled "Test directory <test directory> <date and time>"
    - List  titled "Sourcing config files between <config_root_dir> and
      <test directory>" of included *.conf files corresponding to the directory
      structure.

    - List titled "Configuration variables" of all configuration variables set
      via all included *.conf file

    - Section titled "Creating lock files"

    - Section titled "Starting <test directory> <date and time>"
      In this section lines starting with "calling ..." mark the call to one of
      the functions listed in the aforementioned variables prepare_list,
      setup_list, run_list, cleanup_list and finish_list.
      The main part of this section consists of output lines of the function
      lib_vmsg. These lines have the following format:
      <date time> [[<callstack>]] <message>
       The stack level which the bash call stack is printed from (<callstack>)
       is configurable via variable main_min_stack_level.

      - In case of an error: Subsection titled "Callstack" (to stderr)

      - Subsection titled "Deleting lock files"

      - Subsection titled "General checks of error and log files"
    - Line titled "Failure <return code> <test directory> <date and time>" 
      (to stderr) in case of failure,
      "Finished <test directory> <date and time>" (to stdout) otherwise

- If all tests in all test directories of the start directory terminated 
  successfully: Line titled "Finished start directory <start directory>"

1.2.2. Help to understand a test
--------------------------------------
Calling start_test.sh --help from a test directory doesn't start a test but
gives you the output mentioned in 1.2.1 without the sections produced during
real test execution.
In particular the section "Configuration variables" is printed. So you can
determine which functions are called via the variable run_list. These functions
should be commented extensively enough to be able to understand the test's 
purpose.
        
1.3. Programming hints and conventions
--------------------------------------

- all global variables should be defined and explained in a
   default-<module_name>.conf file

- the names of all global variables resp. all functions should have as prefix
  the module name of the default-*.conf file resp. of the *.sh file which they
  are defined in.
 
- in case of an error certain cleanup functions may be called (e.g. if
  a network connection is cut during a test case it should be restored).
  The associative array main_error_recovery_functions has function names as
  index and function parameters as values. In lib_exit each function contained
  in main_error_recovery_functions is called with it's corresponding arguments.


1.4. Error handling
-------------------

Instead of directly calling exit you should use lib_exit from lib/lib.sh, which
prints a call stack and does further checks.
If the variable lib_general_checks_after_every_test_function is set in
global_user_dir/global.conf to a shell function contained in the use case
specific modules, this function is called by lib_exit.

1.5. Signal handling and unexpected termination
-----------------------------------------------

Normally signals given to start_test.sh should terminate all child processes but
this can not be guaranteed. Thus you have to kill left over child processes
manually. 
Thus we recommend that all scripts and programs you are starting within a test
case should have a significant part of name - e.g. in the use case mars we used 
the string MARS as part of all script names.

1.6. Example
------------

In the base directory's subdirectory example you find a simple use case. The
directory tree looks as follows:

example/blue.conf
example/color_tests.conf
example/default-colors.conf
example/default-lib_err.conf
example/global.conf
example/green.conf
example/red.conf
example/color_tests
example/color_tests/blue
example/color_tests/blue/i_am_a_testdirectory
example/color_tests/green
example/color_tests/green/i_am_a_testdirectory
example/color_tests/red
example/color_tests/red/i_am_a_testdirectory
example/modules
example/modules/colors.sh
example/modules/lib_err.sh

Three test case are defined because there are three files i_am_a_testdirectory.
The used shell functions are defined in the scripts colors.sh and lib_err.sh
with corresponding *.conf files default-colors.conf resp. default-lib_err.conf.
The test case configurations are given via the directory tree and the *.conf 
files red.conf, green.conf and blue.conf.
In global.conf we defined global_user_dir, global_user_module_dir,
main_min_stack_level and lib_general_checks_after_every_test_function.

To start all three tests do:

cd base directory/example
../scripts/start_test.sh 2>&1 | tee /tmp/test_all.out

Thoroughly studying the output file /tmp/my_test.out will clarify the working of
the test frame work.
The third test (red) is of special interest because it fails.
Thus in this case you can study the call stack, the call to
lib_general_checks_after_every_test_function (in our example set to
lib_err_general_checks_after_every_test_case) and the use of
main_error_recovery_functions (used in colors.sh).

To start only the test case red do:

cd base directory/example/colors/red
../../../scripts/start_test.sh --config_root_dir=../.. 2>&1 | tee /tmp/test_red.out

Note that the option --config_root_dir is important in this case because it
tells start_test.sh up to which level it should regard parent directories and
the corresponding *.conf files (see section 1.2).


2. mars specific settings
-------------------------

2.1. Configuration of the local build environment
-------------------------------------------------
To build the mars module and userspace tools you need:

- a directory containing a git repository which contains the kernel sources
                       variable : checkout_mars_kernel_src_directory
    to be specified in file     : default-checkout_mars.conf

- the branch from which to take the kernel sources
                       variable : checkout_mars_kernel_git_branch
    to be specified in file     : default-checkout_mars.conf

- a directory containing the kernel sources against which the mars module
  should be built (may be the same as checkout_mars_kernel_src_directory)
                       variable : make_mars_kernel_src_directory
    to be specified in file     : default-make_mars.conf

- a directory containing a git repository which contains the mars sources
                       variable : checkout_mars_src_directory
    to be specified in file     : default-checkout_mars.conf

- the branch from which to take the mars sources
                       variable : checkout_mars_git_branch
    to be specified in file     : default-checkout_mars.conf

- a directory containing the mars sources which is linked in to the
  kernel source directory make_mars_kernel_src_directory for making the mars
  module.
  May coincide with checkout_mars_src_directory
                       variable : make_mars_src_directory
    to be specified in file     : default-make_mars.conf


2.2. Configuration of the test hosts on which the tests are running
-------------------------------------------------------------------

The test hosts are given in global_host_list (global.conf).
The two values can be overwritten by the environment variables
MARS_INITIAL_PRIMARY_HOST and MARS_INITIAL_SECONDARY_HOST

2.2.1. Access requirements
--------------------------
The tests are executed by ssh commands from your work station to the
test hosts. These commands must not prompt for a password. Though a
id file (for ssh -i <id-file>) with null password must be specified in
main_ssh_idfile_opt (default-main.conf) or given via the environment variable
MARS_SSH_KEYFILE

All test hosts must have ssh access to each other without prompt for passwords
(this is required by marsadm join-cluster).


On your work station you must have sudo permissions for "make install", "cp".

And root of your work station must have ssh access (as root) without prompt for
password to all test hosts (this is required, because local files belonging to
root must be copied to the test hosts).


2.2.2. Installation kernel and mars module on the test hosts
------------------------------------------------------------
This is done by executing the tests

build_test_environment/checkout
build_test_environment/make/make_mars/grub
build_test_environment/install_mars

It's a must to read default-checkout_mars.conf, default-make_mars.conf,
default-install_mars.conf and the *.conf files in the directories above
otherwise you risk damage of the boot information on your work station and/or
the test hosts.

After these three tests (which you should call one after the other!) you can
reboot the test hosts with the new kernel.

2.3. Configuration of logical volumes
-------------------------------------
This is done by the test

build_test_environment/lv_config

The configuration options are contained in default-lv_config.conf


2.4. Resources
--------------
Cluster and Resources are created by

build_test_environment/cluster
build_test_environment/resource/create_resource

Initially the first host in global_host_list is the primary, the following hosts
secondaries.
If you do not change any configuration variables after the
build_test_environment/resource/create_resource Test you will have created one
resource lv-1-2 on all hosts with underlying data devices /dev/vg-mars/lv-1-2.
/mars will be mounted on /dev/vg-mars/lv-6-100.

For further information please read default-cluster.conf and
default-resource.conf.


2.5. Starting the whole test suite via cronjob
----------------------------------------------
This can be done mars_test_cronjob.sh. The variable tests_to_execute contains
all tests to be executed. See the documentation in the header.

2.6. Concurrent test runs
-------------------------
To avoid concurrent run of tests on the same host each run creates temporary
files (/tmp/test-suite_on.<host>, see main_lock_file_list in global.conf) and
deletes them afterwards. It may happen that these files are left over
unintentionally in some cases of unexpected termination. Then you must remove
them manually before starting the next test run.

2.7. Firewalls
--------------
Certain tests cut temporarily network connections by defining firewall rules.
Normally even in case of failure these connections are restored. To be sure
that no firewall rules prevent tests from running the flag
net_clear_iptables_in_prepare_phase is set to 1 in default-net.conf.
This flag leads to deletion of *all* iptable chains on the test hosts.
If you do not want this behaviour set net_clear_iptables_in_prepare_phase=0.