light: new systematics for emergency modes (filesystem full)

Thomas Schoebel-Theuer 2013-04-05 13:04:55 +02:00
parent a6aaa93da7
commit c275bec28d
7 changed files with 268 additions and 118 deletions

View File

@ -186,54 +186,152 @@ config MARS_LOGROT
---help---
Normally ON. Switch off only for EXPERIMENTS!
config MARS_MIN_SPACE_4
int "absolutely necessary free space in /mars/ (hard limit in GB)"
depends on MARS
default 2
---help---
HARDEST EMERGENCY LIMIT
When free space in /mars/ drops under this limit,
transaction logging to /mars/ will stop entirely,
even at all primary resources. All IO will directly go to the
underlying raw devices. The transaction logfile sequence numbers
will be disrupted, deliberately leaving holes in the sequence.
This is a last-resort desperate action of the kernel.
As a consequence, all secondaries will have no chance to
replay at that gap, even if they got the logfiles.
The secondaries will stop at the gap, left in an outdated,
but logically consistent state.
After the problem has been fixed, the secondaries must
start a full-sync in order to continue replication at the
recent state.
This is the hardest measure the kernel can take in order
to TRY to continue undisrupted operation at the primary side.
In general, you should avoid such situations at the admin level.
Please implement your own monitoring at the admin level, which warns
you and/or takes appropriate countermeasures much earlier.
Never rely on this emergency feature!
config MARS_MIN_SPACE_3
int "free space in /mars/ for primary logfiles (additional limit in GB)"
depends on MARS
default 2
---help---
MEDIUM EMERGENCY LIMIT
When free space in /mars/ drops under
MARS_MIN_SPACE_4 + MARS_MIN_SPACE_3,
older transaction logfiles will be deleted at primary resources.
As a consequence, the secondaries may no longer be able to
get a consecutive series of logfile copies.
As a result, they may get stuck somewhere in between at an
outdated, but logically consistent state.
This is a desperate action of the kernel.
After the problem has been fixed, some secondaries may need to
start a full-sync in order to continue replication at the
recent state.
In general, you should avoid such situations at the admin level.
Please implement your own monitoring at the admin level, which warns
you and/or takes appropriate countermeasures much earlier.
Never rely on this emergency feature!
config MARS_MIN_SPACE_2
int "free space in /mars/ for secondary logfiles (additional limit in GB)"
depends on MARS
default 2
---help---
MEDIUM EMERGENCY LIMIT
When free space in /mars/ drops under
MARS_MIN_SPACE_4 + MARS_MIN_SPACE_3 + MARS_MIN_SPACE_2,
older transaction logfiles will be deleted at secondary resources.
As a consequence, some local secondary resources
may get stuck somewhere in between at an
outdated, but logically consistent state.
This is a desperate action of the kernel.
After the problem has been fixed and the free space becomes
larger than MARS_MIN_SPACE_4 + MARS_MIN_SPACE_3 + MARS_MIN_SPACE_2
+ MARS_MIN_SPACE_1, the secondary tries to fetch the missing
logfiles from the primary again.
However, if the necessary logfiles have been deleted at the
primary side in the meantime, this may fail.
In general, you should avoid such situations at the admin level.
Please implement your own monitoring at the admin level, which warns
you and/or takes appropriate countermeasures much earlier.
Never rely on this emergency feature!
config MARS_MIN_SPACE_1
int "free space in /mars/ for replication (additional limit in GB)"
depends on MARS
default 2
---help---
LOWEST EMERGENCY LIMIT
When free space in /mars/ drops under MARS_MIN_SPACE_4
+ MARS_MIN_SPACE_3 + MARS_MIN_SPACE_2 + MARS_MIN_SPACE_1,
fetching of transaction logfiles will stop at local secondary
resources.
As a consequence, some local secondary resources
may get stuck somewhere in between at an
outdated, but logically consistent state.
This is a desperate action of the kernel.
After the problem has been fixed and the free space becomes
larger than MARS_MIN_SPACE_4 + MARS_MIN_SPACE_3 + MARS_MIN_SPACE_2
+ MARS_MIN_SPACE_1, the secondary will continue fetching its
copy of logfiles from the primary side.
In general, you should avoid such situations at the admin level.
Please implement your own monitoring at the admin level, which warns
you and/or takes appropriate countermeasures much earlier.
Never rely on this emergency feature!
config MARS_MIN_SPACE_0
int "total space needed in /mars/ (additional limit in GB)"
depends on MARS
default 12
---help---
Operational prerequisite.
In order to use MARS, the total space available in /mars/ must
be at least MARS_MIN_SPACE_4 + MARS_MIN_SPACE_3 + MARS_MIN_SPACE_2
+ MARS_MIN_SPACE_1 + MARS_MIN_SPACE_0.
If you cannot afford that amount of storage space, please use
DRBD in place of MARS.
config MARS_LOGROT_AUTO
int "automatic logrotate when logfile exceeds size (in GB)"
depends on MARS_LOGROT
default 32
---help---
You could switch this off by setting to 0. However, deletion
of really huge logfile can take several minutes, or even substantial
of really huge logfiles can take several minutes, or even substantial
fractions of hours (depending on the underlying filesystem).
Thus it is highly recommended to limit the logfile size to some
reasonable maximum size. Switch off only for experiments!
config MARS_MIN_SPACE_BASE
int "free space in /mars/ (hard limit in gigabytes)"
depends on MARS
default 8
---help---
when this limit is exceeded, transaction logging to /mars/
will stop. This affects not only write IO to /dev/mars/*,
but also logfile transfers etc. As a consequence, writes
on the primary will directly go to the device, and the secondaries
will be outdated.
In order to retain full operations, you _need_ to implement
your own monitoring which _must_ warn you long before this
hard limit catches you. If it already has caught you, it
will be too late: you have to recover by hand, e.g. by
starting a full sync.
config MARS_MIN_SPACE_PERCENT
int "free space in /mars/ (hard limit in percent)"
depends on MARS
default 0
---help---
this limit is in addition to CONFIG_MARS_MIN_SPACE_BASE.
config MARS_LOGDELETE_AUTO
int "automatic log-delete when space in /mars/ gets short (in GB)"
depends on MARS
default 8
---help---
This limit is in addition to CONFIG_MARS_MIN_SPACE_BASE
and CONFIG_MARS_MIN_SPACE_PERCENT.
You can switch this off by setting to 0.
When the limit is hit, MARS tries to delete the oldest logfile,
when possible, in order to free some space.
Notice: deleting logfiles on a primary which have not yet
been transferred to some secondary is possible, but as a
consequence the secondary may need a full-sync afterwards.
config MARS_PREFER_SIO
bool "prefer sio bricks instead of aio"
depends on MARS
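
The help texts above describe cumulative limits: each emergency level triggers when free space drops under the sum of its own limit and all harder ones. The following userspace sketch works through that arithmetic. The function name `cumulative_limits_gb` and its parameters are hypothetical and not part of MARS; skipping non-positive limits mirrors the `LIMIT_VAR > 0` test in the `CHECK_LIMIT` macro of this commit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of MARS.
 * limits_4_to_1: values of MARS_MIN_SPACE_4, _3, _2, _1 (in GB), in that order.
 * out[i]: free-space level (in GB) under which emergency mode 4 - i is entered.
 * Returns the minimum total size of /mars/ in GB, i.e. the sum of all
 * four limits plus MARS_MIN_SPACE_0.
 */
static long cumulative_limits_gb(const int limits_4_to_1[4], int space_0, long out[4])
{
	long sum = 0;
	size_t i;

	assert(limits_4_to_1 != NULL && out != NULL);
	for (i = 0; i < 4; i++) {
		/* non-positive limits contribute nothing */
		if (limits_4_to_1[i] > 0)
			sum += limits_4_to_1[i];
		out[i] = sum;
	}
	return sum + space_0;
}
```

With the defaults from this Kconfig (2, 2, 2, 2 and 12), the thresholds come out as 2, 4, 6 and 8 GB of free space, and /mars/ needs at least 20 GB in total.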

View File

@ -51,17 +51,30 @@
#define inline __attribute__((__noinline__))
#endif
loff_t global_total_space = 0;
EXPORT_SYMBOL_GPL(global_total_space);
loff_t global_remaining_space = 0;
EXPORT_SYMBOL_GPL(global_remaining_space);
int global_logrot_auto = CONFIG_MARS_LOGROT_AUTO;
EXPORT_SYMBOL_GPL(global_logrot_auto);
int global_logdel_auto = CONFIG_MARS_LOGDELETE_AUTO;
EXPORT_SYMBOL_GPL(global_logdel_auto);
int global_free_space_0 = CONFIG_MARS_MIN_SPACE_0;
EXPORT_SYMBOL_GPL(global_free_space_0);
int global_free_space_base = CONFIG_MARS_MIN_SPACE_BASE;
EXPORT_SYMBOL_GPL(global_free_space_base);
int global_free_space_1 = CONFIG_MARS_MIN_SPACE_1;
EXPORT_SYMBOL_GPL(global_free_space_1);
int global_free_space_percent = CONFIG_MARS_MIN_SPACE_PERCENT;
EXPORT_SYMBOL_GPL(global_free_space_percent);
int global_free_space_2 = CONFIG_MARS_MIN_SPACE_2;
EXPORT_SYMBOL_GPL(global_free_space_2);
int global_free_space_3 = CONFIG_MARS_MIN_SPACE_3;
EXPORT_SYMBOL_GPL(global_free_space_3);
int global_free_space_4 = CONFIG_MARS_MIN_SPACE_4;
EXPORT_SYMBOL_GPL(global_free_space_4);
int mars_rollover_interval = CONFIG_MARS_ROLLOVER_INTERVAL;
EXPORT_SYMBOL_GPL(mars_rollover_interval);
@ -84,6 +97,78 @@ int mars_fast_fullsync =
;
EXPORT_SYMBOL_GPL(mars_fast_fullsync);
int mars_emergency_mode = 0;
EXPORT_SYMBOL_GPL(mars_emergency_mode);
int mars_reset_emergency = 1;
EXPORT_SYMBOL_GPL(mars_reset_emergency);
#define IS_EXHAUSTED() (mars_emergency_mode > 0)
#define IS_EMERGENCY_SECONDARY() (mars_emergency_mode > 1)
#define IS_EMERGENCY_PRIMARY() (mars_emergency_mode > 2)
#define IS_JAMMED() (mars_emergency_mode > 3)
static
void _make_alivelink(const char *name, loff_t val)
{
char *src = path_make("%lld", val);
char *dst = path_make("/mars/%s-%s", name, my_id());
if (!src || !dst) {
MARS_ERR("cannot make alivelink paths\n");
goto err;
}
MARS_DBG("'%s' -> '%s'\n", src, dst);
mars_symlink(src, dst, NULL, 0);
err:
brick_string_free(dst);
brick_string_free(src);
}
static
int compute_emergency_mode(void)
{
loff_t rest = 0;
loff_t limit = 0;
int mode = 4;
mars_remaining_space("/mars", &global_total_space, &rest);
#define CHECK_LIMIT(LIMIT_VAR) \
if (LIMIT_VAR > 0) \
limit += (loff_t)LIMIT_VAR * 1024 * 1024; \
if (rest < limit) { \
mars_emergency_mode = mode; \
goto done; \
} \
mode--;
CHECK_LIMIT(global_free_space_4);
CHECK_LIMIT(global_free_space_3);
CHECK_LIMIT(global_free_space_2);
CHECK_LIMIT(global_free_space_1);
/* No limit has hit.
* Decrease the emergency mode only in single steps.
*/
if (mars_reset_emergency && mars_emergency_mode > 0) {
mars_emergency_mode--;
}
done:
_make_alivelink("emergency", mars_emergency_mode);
global_remaining_space = rest - limit;
_make_alivelink("rest-space", global_remaining_space / (1024 * 1024));
limit += (loff_t)global_free_space_0 * 1024 * 1024; /* GB -> KiB, same unit as the other limits */
if (unlikely(global_total_space < limit)) {
return -ENOMEM;
}
return 0;
}
///////////////////////////////////////////////////////////////////
static struct task_struct *main_thread = NULL;
typedef int (*light_worker_fn)(void *buf, struct mars_dent *dent);
@ -190,11 +275,6 @@ EXPORT_SYMBOL_GPL(mars_mem_percent);
//#define COPY_APPEND_MODE 1 // FIXME: does not work yet
#define COPY_PRIO MARS_PRIO_LOW
#define EXHAUSTED_LIMIT(max) ((long long)(max) * global_free_space_percent / 100 + (long long)global_free_space_base * 1024 * 1024)
#define EXHAUSTED(x,max) ((x) <= EXHAUSTED_LIMIT(max))
#define JAMMED(x) ((x) <= 1024 * 1024)
static
int _set_trans_params(struct mars_brick *_brick, void *private)
{
@ -1081,7 +1161,7 @@ int __make_copy(
(const struct generic_brick_type*)&copy_brick_type,
(const struct generic_brick_type*[]){NULL,NULL,NULL,NULL},
"%s",
(global->exhausted || !switch_path[0]) ? -1 : 0,
(!switch_path[0] || IS_EXHAUSTED()) ? -1 : 0,
"%s",
(const char *[]){"%s", "%s", "%s", "%s"},
4,
@ -2696,9 +2776,11 @@ int make_log_finalize(struct mars_global *global, struct mars_dent *dent)
/* Handle jamming (a very exceptional state)
*/
if (global->jammed) {
if (rot->todo_primary || rot->is_primary)
if (IS_JAMMED()) {
brick_say_logging = 0;
if (rot->todo_primary || rot->is_primary) {
trans_brick->cease_logging = true;
}
} else if (!rot->todo_primary && !rot->is_primary) {
trans_brick->cease_logging = false;
}
@ -2727,14 +2809,13 @@ int make_log_finalize(struct mars_global *global, struct mars_dent *dent)
rot->copy_next_is_available = 0;
}
#define LIMIT1 ((loff_t)EXHAUSTED_LIMIT(rot->total_space))
#define LIMIT2 ((loff_t)global_logdel_auto * 1024 * 1024)
if (rot->remaining_space <= LIMIT1 + LIMIT2) {
MARS_WRN("filesystem space = %lld kiB is lower than %lld + %lld = %lld\n", rot->remaining_space, LIMIT1, LIMIT2, LIMIT1 + LIMIT2);
if (IS_EMERGENCY_PRIMARY() || (!rot->todo_primary && IS_EMERGENCY_SECONDARY())) {
if (rot->first_log && rot->first_log != rot->relevant_log) {
MARS_DBG("freeing old logfile '%s'\n", rot->first_log->d_path);
MARS_WRN("freeing old logfile '%s'\n", rot->first_log->d_path);
mars_unlink(rot->first_log->d_path);
rot->first_log->d_killme = true;
// give it a chance to cease deleting next time
compute_emergency_mode();
}
}
@ -2974,7 +3055,7 @@ int make_dev(void *buf, struct mars_dent *dent)
(rot->todo_primary &&
!rot->trans_brick->replay_mode &&
rot->trans_brick->power.led_on);
if (!global->global_power.button || global->exhausted) {
if (!global->global_power.button) {
switch_on = false;
}
@ -3404,8 +3485,7 @@ enum {
CL_IPS,
CL_PEERS,
CL_ALIVE,
CL_EXHAUSTED,
CL_JAMMED,
CL_EMERGENCY,
CL_REST_SPACE,
// resource definitions
CL_RESOURCE,
@ -3519,8 +3599,8 @@ static const struct light_class light_classes[] = {
},
/* Indicate whether filesystem is full
*/
[CL_EXHAUSTED] = {
.cl_name = "exhausted-",
[CL_EMERGENCY] = {
.cl_name = "emergency-",
.cl_len = 10,
.cl_type = 'l',
.cl_father = CL_ROOT,
@ -3533,12 +3613,6 @@ static const struct light_class light_classes[] = {
.cl_type = 'l',
.cl_father = CL_ROOT,
},
[CL_JAMMED] = {
.cl_name = "jammed-",
.cl_len = 7,
.cl_type = 'l',
.cl_father = CL_ROOT,
},
/* Directory containing all items of a resource
*/
@ -3954,22 +4028,6 @@ static int light_worker(struct mars_global *global, struct mars_dent *dent, bool
return 0;
}
static
void _make_alivelink(const char *name, loff_t val)
{
char *src = path_make("%lld", val);
char *dst = path_make("/mars/%s-%s", name, my_id());
if (!src || !dst) {
MARS_ERR("cannot make symlink paths\n");
goto err;
}
MARS_DBG("'%s' -> '%s'\n", src, dst);
mars_symlink(src, dst, NULL, 0);
err:
brick_string_free(dst);
brick_string_free(src);
}
static struct mars_global _global = {
.dent_anchor = LIST_HEAD_INIT(_global.dent_anchor),
.brick_anchor = LIST_HEAD_INIT(_global.brick_anchor),
@ -3998,9 +4056,6 @@ static int light_thread(void *data)
while (_global.global_power.button || !list_empty(&_global.brick_anchor)) {
int status;
loff_t rest_space;
bool exhausted;
bool jammed;
MARS_DBG("-------- NEW ROUND %d ---------\n", atomic_read(&server_handler_count));
@ -4018,22 +4073,7 @@ static int light_thread(void *data)
}
_make_alivelink("alive", _global.global_power.button ? 1 : 0);
mars_remaining_space("/mars", &_global.total_space, &_global.remaining_space);
exhausted = EXHAUSTED(_global.remaining_space, _global.total_space);
_global.exhausted = exhausted;
_make_alivelink("exhausted", exhausted ? 1 : 0);
if (exhausted)
MARS_WRN("EXHAUSTED filesystem space = %lld\n", _global.remaining_space);
jammed = JAMMED(_global.remaining_space);
_global.jammed = jammed;
_make_alivelink("jammed", jammed ? 1 : 0);
if (jammed)
MARS_WRN("JAMMED filesystem space = %lld, STOPPING TRANSACTION LOGGING\n", _global.remaining_space);
rest_space = _global.remaining_space - EXHAUSTED_LIMIT(_global.total_space);
_make_alivelink("rest-space", rest_space);
compute_emergency_mode();
MARS_DBG("-------- start worker ---------\n");
_global.deleted_min = 0;
@ -4182,6 +4222,12 @@ static int __init init_light(void)
brick_mem_reserve(&global_reserve);
status = compute_emergency_mode();
if (unlikely(status < 0)) {
MARS_ERR("Sorry, your /mars/ filesystem is too small!\n");
goto done;
}
main_thread = brick_thread_create(light_thread, NULL, "mars_light");
if (unlikely(!main_thread)) {
status = -ENOENT;
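
For readers who want to reason about the mode selection outside the kernel, here is a minimal userspace sketch of the logic in `compute_emergency_mode()`. It is not the kernel code: `next_emergency_mode` and its parameters are hypothetical names, the symlink updates are left out, and the assumption that free space is counted in KiB is inferred from the `* 1024 * 1024` scaling of the GB limits.

```c
#include <assert.h>

/* Userspace sketch only, not the kernel implementation.
 * rest_kb:   free space in /mars/ (assumed KiB)
 * limits_gb: MARS_MIN_SPACE_4, _3, _2, _1 in GB, in that order
 * prev_mode: emergency mode from the previous round (0..4)
 * reset:     plays the role of mars_reset_emergency
 */
static int next_emergency_mode(long long rest_kb, const int limits_gb[4],
			       int prev_mode, int reset)
{
	long long limit_kb = 0;
	int mode = 4;
	int i;

	assert(prev_mode >= 0 && prev_mode <= 4);
	for (i = 0; i < 4; i++, mode--) {
		if (limits_gb[i] > 0)
			limit_kb += (long long)limits_gb[i] * 1024 * 1024;
		/* a limit is hit: the mode is set directly */
		if (rest_kb < limit_kb)
			return mode;
	}
	/* no limit is hit: relax by at most one step per round */
	if (reset && prev_mode > 0)
		return prev_mode - 1;
	return prev_mode;
}
```

Note the asymmetry: the mode can jump to any level at once when space runs short, but it falls back by only one step per round once space is available again, and only if the reset switch is enabled.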

View File

@ -222,9 +222,13 @@ ctl_table mars_table[] = {
INT_ENTRY("sync_flip_interval_sec", mars_sync_flip_interval, 0600),
INT_ENTRY("do_fast_fullsync", mars_fast_fullsync, 0600),
INT_ENTRY("logrot_auto_gb", global_logrot_auto, 0600),
INT_ENTRY("logdel_auto_gb", global_logdel_auto, 0600),
INT_ENTRY("required_free_space_mb", global_free_space_base, 0600),
INT_ENTRY("required_free_space_percent", global_free_space_percent, 0600),
INT_ENTRY("required_total_space_0_gb", global_free_space_0, 0600),
INT_ENTRY("required_free_space_1_gb", global_free_space_1, 0600),
INT_ENTRY("required_free_space_2_gb", global_free_space_2, 0600),
INT_ENTRY("required_free_space_3_gb", global_free_space_3, 0600),
INT_ENTRY("required_free_space_4_gb", global_free_space_4, 0600),
INT_ENTRY("mars_emergency_mode", mars_emergency_mode, 0600),
INT_ENTRY("mars_reset_emergency", mars_reset_emergency, 0600),
#ifdef CONFIG_MARS_LOADAVG_LIMIT
INT_ENTRY("loadavg_limit", mars_max_loadavg, 0600),
#endif

View File

@ -10,18 +10,24 @@
#define MARS_ARGV_MAX 4
#define MARS_PATH_MAX 256
extern loff_t global_total_space;
extern loff_t global_remaining_space;
extern int global_logrot_auto;
extern int global_logdel_auto;
extern int global_free_space_base;
extern int global_free_space_percent;
extern int global_free_space_0;
extern int global_free_space_1;
extern int global_free_space_2;
extern int global_free_space_3;
extern int global_free_space_4;
extern int mars_rollover_interval;
extern int mars_scan_interval;
extern int mars_propagate_interval;
extern int mars_sync_flip_interval;
extern int mars_emergency_mode;
extern int mars_reset_emergency;
extern int mars_fast_fullsync;
extern char *my_id(void);
#define MARS_DENT(TYPE) \
@ -66,14 +72,10 @@ struct mars_global {
struct list_head dent_anchor;
struct list_head brick_anchor;
wait_queue_head_t main_event;
loff_t total_space;
loff_t remaining_space;
int global_version;
int deleted_border;
int deleted_min;
bool main_trigger;
bool exhausted;
bool jammed;
};
extern void bind_to_dent(struct mars_dent *dent, struct say_channel **ch);

View File

@ -1667,7 +1667,7 @@ void show_statistics(struct mars_global *global, const char *class)
}
up_read(&global->dent_mutex);
MARS_STAT("==================== %s STATISTICS: %d dents, %d bricks, %lld KB free\n", class, dent_count, brick_count, global->remaining_space);
MARS_STAT("==================== %s STATISTICS: %d dents, %d bricks, %lld KB free\n", class, dent_count, brick_count, global_remaining_space);
}
EXPORT_SYMBOL_GPL(show_statistics);

View File

@ -794,9 +794,9 @@ sub info_version {
#########################################################################################
### avg_limit
sub check_jammed {
my $jammed = check_link "$mars_dir/jammed-$himself";
my $jammed = check_link "$mars_dir/emergency-$himself";
print_screen "-> Mars-Transaction ", 'bold';
if (( !$jammed ) || ( $jammed ne 0 )) {
if (!$jammed) {
print_screen "running normally\n", "$Color_green";
} else {
print_screen "and Replication not running !!!\n", 'red';

View File

@ -949,7 +949,7 @@ sub mars_state_cmd {
sub show_cmd {
my ($cmd, $res) = @_;
$res = "*" if !$res || $res eq "all";
my $glob = "$mars/{ips/ip-$host,alive-$host,exhausted-$host,jammed-$host,rest-space-$host,resource-$res/{device,primary,size,actsize-$host,syncstatus-$host,replay-$host,actual-$host/*,todo-$host/*}}";
my $glob = "$mars/{ips/ip-$host,alive-$host,emergency-$host,rest-space-$host,resource-$res/{device,primary,size,actsize-$host,syncstatus-$host,replay-$host,actual-$host/*,todo-$host/*}}";
foreach my $link (glob($glob)) {
next unless -l $link;
my $res = readlink($link);
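
The scripts above read status values by resolving symlinks whose target is a plain number, which is what `_make_alivelink()` writes on the kernel side. A monitoring tool written in C could do the same with `readlink(2)`. The sketch below is an assumption-laden illustration: `read_value_link` is a hypothetical helper, not part of MARS or its userspace tools.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper, not part of MARS.
 * Resolves a symlink whose target is a decimal number and stores it in *val.
 * Returns 0 on success, -1 if the link is missing or not numeric.
 */
static int read_value_link(const char *path, long long *val)
{
	char buf[64];
	char *end;
	ssize_t len;

	assert(path != NULL && val != NULL);
	len = readlink(path, buf, sizeof(buf) - 1);
	if (len <= 0)
		return -1;
	buf[len] = '\0';	/* readlink() does not terminate the string */
	*val = strtoll(buf, &end, 10);
	if (end == buf || *end != '\0')
		return -1;
	return 0;
}
```

A caller would pass a path of the form `/mars/emergency-<host>` or `/mars/rest-space-<host>`, following the `"/mars/%s-%s"` pattern used by `_make_alivelink()`. The value is returned through a pointer rather than as the return value because the rest-space value may be negative.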