Commit Graph

26 Commits

Author SHA1 Message Date
Julius Volz
dcfe55d7e6 Rename alert_manager to alertmanager. 2013-08-05 11:49:56 +02:00
Julius Volz
571931a052 Make config reloadable during runtime. 2013-08-02 17:25:39 +02:00
Julius Volz
1233f671c7 Persist silences to a local JSON file.
Load silences at startup from a local JSON file, write them out every 10
seconds.

It's not perfect (writes may be interrupted or left inconsistent if the
program is terminated while writing), but this is a temporary solution
to keep people from going crazy about lost silences until we have
proper storage management. If the JSON file gets corrupted, the alert
manager simply starts up without any silences loaded.
2013-07-31 17:58:39 +02:00
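The persistence scheme described in commit 1233f671c7 (load at startup, write out periodically, fall back to no silences if the file is corrupted) could look roughly like the sketch below. The `Silence` type, its fields, the file name, and all function names are assumptions made for illustration and are not taken from the alertmanager source.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// Silence is a hypothetical, simplified silence record.
type Silence struct {
	ID      int               `json:"id"`
	Filters map[string]string `json:"filters"`
	EndsAt  time.Time         `json:"endsAt"`
}

// loadSilences reads silences from path. A missing or corrupted file
// yields an empty list, so startup never fails because of it.
func loadSilences(path string) []Silence {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil
	}
	var silences []Silence
	if err := json.Unmarshal(data, &silences); err != nil {
		return nil
	}
	return silences
}

// saveSilences overwrites path with the given silences. As the commit
// message notes, a crash in the middle of this write can leave a broken file.
func saveSilences(path string, silences []Silence) error {
	data, err := json.Marshal(silences)
	if err != nil {
		return err
	}
	return os.WriteFile(path, data, 0o644)
}

// persistLoop saves a fresh snapshot on every tick until stop is closed.
func persistLoop(path string, interval time.Duration, snapshot func() []Silence, stop <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			if err := saveSilences(path, snapshot()); err != nil {
				fmt.Fprintln(os.Stderr, "saving silences:", err)
			}
		case <-stop:
			return
		}
	}
}

func main() {
	path := filepath.Join(os.TempDir(), "silences-sketch.json")
	defer os.Remove(path)

	silences := []Silence{{ID: 1, Filters: map[string]string{"service": "foo-service"}}}
	if err := saveSilences(path, silences); err != nil {
		fmt.Println("save failed:", err)
		return
	}
	fmt.Println(len(loadSilences(path))) // 1

	// Simulate a corrupted file: loading falls back to no silences.
	if err := os.WriteFile(path, []byte("{not json"), 0o644); err != nil {
		fmt.Println("write failed:", err)
		return
	}
	fmt.Println(len(loadSilences(path))) // 0
}
```

In a long-running program, `persistLoop` would be started in its own goroutine with an interval of `10 * time.Second` to match the commit message. The `snapshot` callback must return a copy taken under whatever lock protects the live silences.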
Julius Volz
65f83e973e Rename suppressions to silences everywhere.
This makes internal code consistent with the API and user interface.
2013-07-31 14:39:01 +02:00
Julius Volz
02ab1f904a PR comment fixups. 2013-07-30 16:36:43 +02:00
Julius Volz
70e67b920c Implement PagerDuty notifications. 2013-07-30 13:49:55 +02:00
Julius Volz
f431335c69 Add more required fields to Event.
This adds mandatory Summary and Description fields to Event.

As for the alert name, there were two options: keep it a separate field and
treat it separately everywhere (including in silence Filter matching), or
make it a required field in the event's labels. The latter was causing far
less trouble, so I went with that. The alertname label still doesn't have
a special meaning to most parts of the code, except that the API checks its
presence and the web UI displays it differently.
2013-07-30 13:18:11 +02:00
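The choice made in commit f431335c69 (alert name as a required label rather than a separate field) might be sketched as follows. The `Event` struct layout and the `validateEvent` function are hypothetical; only the `alertname` label and the mandatory Summary and Description fields come from the commit message.

```go
package main

import (
	"errors"
	"fmt"
)

// alertNameLabel is the label the commit message says the API checks for.
const alertNameLabel = "alertname"

// Event is a hypothetical, simplified event type.
type Event struct {
	Summary     string
	Description string
	Labels      map[string]string
}

// validateEvent rejects events that lack a mandatory field or the
// alertname label. Everywhere else, alertname is just another label.
func validateEvent(e Event) error {
	if e.Summary == "" {
		return errors.New("missing summary")
	}
	if e.Description == "" {
		return errors.New("missing description")
	}
	if e.Labels[alertNameLabel] == "" {
		return fmt.Errorf("missing %q label", alertNameLabel)
	}
	return nil
}

func main() {
	good := Event{
		Summary:     "Disk almost full",
		Description: "Less than 5% free on /var",
		Labels:      map[string]string{"alertname": "DiskFull", "service": "foo-service"},
	}
	fmt.Println(validateEvent(good)) // <nil>

	bad := good
	bad.Labels = map[string]string{"service": "foo-service"}
	fmt.Println(validateEvent(bad)) // missing "alertname" label
}
```

Because the name lives in the label set, silence filters can match on it with the same code path that handles any other label.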
Julius Volz
a64c37bb03 Add missing format string and separators in event fingerprinting. 2013-07-30 13:12:24 +02:00
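Commit a64c37bb03 does not show its code here, so the following is only one plausible illustration of why separators matter when fingerprinting a label set. Without a separator between names and values, the label sets `{a="bc"}` and `{ab="c"}` would feed identical bytes to the hash. The `fingerprint` function, the FNV hash, and the `0xff` separator byte are all assumptions.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// separator is a byte that cannot occur in valid UTF-8 text, so it
// cannot be confused with part of a label name or value.
const separator = 0xff

// fingerprint hashes a label set in sorted name order, writing a
// separator after every name and every value.
func fingerprint(labels map[string]string) uint64 {
	names := make([]string, 0, len(labels))
	for name := range labels {
		names = append(names, name)
	}
	sort.Strings(names)

	h := fnv.New64a()
	for _, name := range names {
		h.Write([]byte(name))
		h.Write([]byte{separator})
		h.Write([]byte(labels[name]))
		h.Write([]byte{separator})
	}
	return h.Sum64()
}

func main() {
	a := fingerprint(map[string]string{"a": "bc"})
	b := fingerprint(map[string]string{"ab": "c"})
	fmt.Println(a != b) // the separators keep these two label sets apart
}
```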
Julius Volz
5d4f9f7e11 Add notification options to configuration. 2013-07-29 18:43:01 +02:00
juliusv
3f9cc9e3e3 Merge pull request #5 from prometheus/refactor/editable-silences
Implement silence create/read/update/delete API and UI workflow.
2013-07-26 07:08:31 -07:00
Julius Volz
9b1a5aaf40 PR comment fixups. 2013-07-26 12:40:53 +02:00
Julius Volz
0c3c75edb3 Change Aggregator from channel-based to mutex-based.
This removes >100 lines of boilerplate code in the Aggregator alone.
2013-07-26 02:13:11 +02:00
Julius Volz
ba2247857d Implement silence create/read/update/delete API and UI workflow. 2013-07-26 00:23:13 +02:00
Julius Volz
b49b7bba6f Change Suppressor from channel-based to mutex-based, add tests.
Start with the simplest possible locking scheme: lock the object-global
mutex at the beginning of each user-facing method. This is equivalent to
implicit locking provided by the reactor.

The reasoning behind this change is the incredible overhead of the
previous reactor request/response code:

Overhead for current model for every user-facing method:

- 2 struct type definitions (req/resp)
- 1 channel
  - 1 struct member definition site
  - 1 channel init site
  - 1 struct population site
  - 1 struct servicing site
  - 1 struct closing site
- 1 actual execution method

New lock-based code:

Per object:
- 1 lock
Per method:
- 1 lock acquisition
- 1 actual execution method
2013-07-22 18:32:45 +02:00
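The locking scheme described in commit b49b7bba6f (one object-global mutex, taken at the top of each user-facing method) can be sketched as below. The `Suppressor` here is a hypothetical, stripped-down stand-in: its fields and methods are invented for illustration and do not reproduce the real type.

```go
package main

import (
	"fmt"
	"sync"
)

// Suppressor is a hypothetical, simplified version of the type named in
// the commit. One mutex guards all of its state.
type Suppressor struct {
	mu           sync.Mutex
	nextID       int
	suppressions map[int]string
}

func NewSuppressor() *Suppressor {
	return &Suppressor{suppressions: make(map[int]string)}
}

// Add stores a suppression and returns its ID. Per method, the only
// overhead is taking the lock before the actual work.
func (s *Suppressor) Add(description string) int {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.nextID++
	s.suppressions[s.nextID] = description
	return s.nextID
}

// Del removes a suppression and reports whether it existed.
func (s *Suppressor) Del(id int) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	_, ok := s.suppressions[id]
	delete(s.suppressions, id)
	return ok
}

// Len returns the number of stored suppressions.
func (s *Suppressor) Len() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.suppressions)
}

func main() {
	s := NewSuppressor()
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			s.Add(fmt.Sprintf("suppression %d", n))
		}(i)
	}
	wg.Wait()
	fmt.Println(s.Len()) // 100
}
```

Compared with the request/response model listed in the commit message, each method here needs no extra struct types, channel, or servicing code in a central loop.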
Julius Volz
ed289d58f0 Remove crufty logging statement. 2013-07-22 16:26:54 +02:00
Julius Volz
71a9d4af35 Ensure minimum repeat rate for events. 2013-07-22 11:12:25 +02:00
Julius Volz
436643f94e Cleanup: rename "element" to "event". 2013-07-22 11:12:25 +02:00
Julius Volz
24d9977c95 Run go fmt. 2013-07-22 11:02:45 +02:00
Julius Volz
ca1eb66df4 Add BUG comment about aggregator draining. 2013-07-19 18:10:40 +02:00
Julius Volz
606d120541 Move aggregator scenario tests to separate type. 2013-07-19 15:26:51 +02:00
Julius Volz
f9bca4ba2b Remove unused timer. 2013-07-19 15:15:53 +02:00
Julius Volz
bc57fa4936 Add initial aggregator tests. 2013-07-19 15:05:45 +02:00
Julius Volz
a8bd98b7e1 Fix regex filters to match complete string.
If someone specifies

  service = "foo-service"

...they probably don't want it to match:

  service = "foo-servicebar"
2013-07-19 13:39:15 +02:00
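The fix in commit a8bd98b7e1 amounts to anchoring the filter pattern so that it must match the complete string. A minimal sketch with Go's standard `regexp` package follows. The helper name `compileFullMatch` is hypothetical, and the actual commit may anchor the pattern differently.

```go
package main

import (
	"fmt"
	"regexp"
)

// compileFullMatch wraps pattern in ^(?:...)$ so that it has to match
// the whole input. The non-capturing group keeps alternations such as
// "foo|bar" from escaping the anchors.
func compileFullMatch(pattern string) (*regexp.Regexp, error) {
	return regexp.Compile("^(?:" + pattern + ")$")
}

func main() {
	unanchored := regexp.MustCompile("foo-service")
	fmt.Println(unanchored.MatchString("foo-servicebar")) // true: substring match

	anchored, err := compileFullMatch("foo-service")
	if err != nil {
		fmt.Println("bad pattern:", err)
		return
	}
	fmt.Println(anchored.MatchString("foo-service"))    // true
	fmt.Println(anchored.MatchString("foo-servicebar")) // false
}
```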
Julius Volz
648a79a3e1 Synchronize Close(), fix race conditions.
Close() was not synced through the main dispatcher loop, so it could close
channels that were currently being written to by methods called from said
dispatcher loop. This leads to a crash. Instead, Close() now writes a
closeRequest, which is handled in the dispatcher.
2013-07-19 13:39:05 +02:00
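The race fixed in commit 648a79a3e1 and its remedy can be illustrated with the sketch below: `Close()` no longer closes channels itself but sends a `closeRequest` to the dispatcher loop, which is then the only goroutine that writes to or closes them. Apart from the `closeRequest` name, which appears in the commit message, the `Dispatcher` type, its fields, and its methods are assumptions.

```go
package main

import "fmt"

// closeRequest asks the dispatcher loop to shut down. The loop closes
// done once shutdown is complete.
type closeRequest struct {
	done chan struct{}
}

// Dispatcher is a hypothetical, simplified reactor-style object.
type Dispatcher struct {
	in       chan string       // events submitted by callers
	out      chan string       // written and closed only by run
	closeReq chan closeRequest // shutdown requests
}

func NewDispatcher() *Dispatcher {
	d := &Dispatcher{
		in:       make(chan string),
		out:      make(chan string, 16),
		closeReq: make(chan closeRequest),
	}
	go d.run()
	return d
}

// run is the main dispatcher loop. Because it both writes to out and
// closes it, a write can never hit an already closed channel.
func (d *Dispatcher) run() {
	for {
		select {
		case ev := <-d.in:
			d.out <- "handled " + ev
		case req := <-d.closeReq:
			close(d.out)
			close(req.done)
			return
		}
	}
}

// Submit hands an event to the loop. This sketch does not guard against
// calls made after Close; such a call would block forever.
func (d *Dispatcher) Submit(ev string) {
	d.in <- ev
}

// Close is synchronized through the loop instead of closing out directly.
func (d *Dispatcher) Close() {
	req := closeRequest{done: make(chan struct{})}
	d.closeReq <- req
	<-req.done
}

func main() {
	d := NewDispatcher()
	d.Submit("a")
	d.Submit("b")
	d.Close()
	for msg := range d.out {
		fmt.Println(msg) // "handled a", then "handled b"
	}
}
```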
Julius Volz
cf78397107 Change model to be more state- and less event-focussed. 2013-07-19 10:52:04 +02:00
Julius Volz
44c69920f4 Preliminary web interface and pairing refactorings. 2013-07-17 17:45:01 +02:00