Commit Graph

2494 Commits

Author SHA1 Message Date
Julius Volz
1233f671c7 Persist silences to a local JSON file.
Load silences at startup from a local JSON file, write them out every 10
seconds.

It's not perfect (writes may possibly be interrupted/inconsistent if the
program is terminated while writing), but this is a temporary solution
to keep people from going crazy about lost silences until we have a
proper storage management. If the JSON file gets corrupted, the alert
manager simply starts up without any silences loaded.
2013-07-31 17:58:39 +02:00
juliusv
0480321010 Merge pull request #9 from prometheus/rename-suppressions
Rename suppressions to silences everywhere.
2013-07-31 07:17:40 -07:00
Julius Volz
65f83e973e Rename suppressions to silences everywhere.
This makes internal code consistent with the API and user interface.
2013-07-31 14:39:01 +02:00
juliusv
854f5ef47e Merge pull request #8 from prometheus/feature/notifications
First support for alert notifications; minor other features
2013-07-30 12:17:51 -07:00
Julius Volz
02ab1f904a PR comment fixups. 2013-07-30 16:36:43 +02:00
Julius Volz
d3f08b760b Minor UI cleanup. 2013-07-30 13:50:00 +02:00
Julius Volz
70e67b920c Implement PagerDuty notifications. 2013-07-30 13:49:55 +02:00
Julius Volz
f431335c69 Add more required fields to Event.
This adds mandatory Summary and Description fields to Event.

As for the alert name, there were two options: keep it a separate field and
treat it separately everywhere (including in silence Filter matching), or
make it a required field in the event's labels. The latter was causing far
less trouble, so I went with that. The alertname label still doesn't have
a special meaning to most parts of the code, except that the API checks its
presence and the web UI displays it differently.
2013-07-30 13:18:11 +02:00
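
The design choice in this commit, where the alert name is an ordinary but required label, can be illustrated with the sketch below. The struct layout, the `validate` and `matches` helpers, and the error messages are assumptions made for illustration; only the `Summary` and `Description` fields and the `alertname` label come from the commit message.

```go
package main

import (
	"errors"
	"fmt"
)

const alertNameLabel = "alertname"

// Event is a hypothetical, simplified version of the event type.
type Event struct {
	Summary     string
	Description string
	Labels      map[string]string
}

// validate is the kind of check an API handler might run on incoming events.
func validate(e Event) error {
	if e.Summary == "" {
		return errors.New("missing summary")
	}
	if e.Description == "" {
		return errors.New("missing description")
	}
	if e.Labels[alertNameLabel] == "" {
		return fmt.Errorf("missing %q label", alertNameLabel)
	}
	return nil
}

// matches treats every label the same way, so a silence filter on
// alertname needs no special case.
func matches(filter, labels map[string]string) bool {
	for name, want := range filter {
		if labels[name] != want {
			return false
		}
	}
	return true
}

func main() {
	e := Event{
		Summary:     "Disk almost full",
		Description: "Less than 5% free space left.",
		Labels:      map[string]string{alertNameLabel: "DiskFull", "service": "db"},
	}
	fmt.Println(validate(e) == nil)                                         // true
	fmt.Println(matches(map[string]string{"alertname": "DiskFull"}, e.Labels)) // true
	fmt.Println(validate(Event{Summary: "s", Description: "d"}))            // missing "alertname" label
}
```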
Julius Volz
a64c37bb03 Add missing format string and separators in event fingerprinting. 2013-07-30 13:12:24 +02:00
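
Why separators matter when fingerprinting can be shown with a small sketch. It assumes a fingerprint computed by hashing sorted label names and values; the hash function (FNV-1a), the NUL separator, and the `fingerprint` function itself are illustrative choices, not the repository's code.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"io"
	"sort"
)

// sep is written after every name and every value so that adjacent
// strings cannot run together.
const sep = "\x00"

// fingerprint hashes the labels in sorted name order.
func fingerprint(labels map[string]string) uint64 {
	names := make([]string, 0, len(labels))
	for name := range labels {
		names = append(names, name)
	}
	sort.Strings(names)

	h := fnv.New64a()
	for _, name := range names {
		io.WriteString(h, name+sep+labels[name]+sep)
	}
	return h.Sum64()
}

func main() {
	// Without separators both label sets would feed the bytes "abc"
	// into the hash and produce the same fingerprint.
	a := fingerprint(map[string]string{"a": "bc"})
	b := fingerprint(map[string]string{"ab": "c"})
	fmt.Println(a != b)
}
```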
Julius Volz
dbdd7aef16 Indicate silence status for each alert. 2013-07-29 18:45:33 +02:00
Julius Volz
669f5ef916 Fix /metrics endpoint name (was /metrics.json). 2013-07-29 18:44:06 +02:00
Julius Volz
5d4f9f7e11 Add notification options to configuration. 2013-07-29 18:43:01 +02:00
juliusv
55719e9d3e Merge pull request #7 from prometheus/feature/config-loading
Add loading configuration from file.
2013-07-27 14:32:37 -07:00
Julius Volz
db599b6d26 PR comments fixups. 2013-07-26 17:39:46 +02:00
Julius Volz
24ac73af5d Add loading configuration from file. 2013-07-26 16:12:11 +02:00
juliusv
3f9cc9e3e3 Merge pull request #5 from prometheus/refactor/editable-silences
Implement silence create/read/update/delete API and UI workflow.
2013-07-26 07:08:31 -07:00
juliusv
3ddabd6fca Merge pull request #6 from prometheus/refactor/mutex-aggregator
Change Aggregator from channel-based to mutex-based.
2013-07-26 03:45:30 -07:00
Julius Volz
9b1a5aaf40 PR comment fixups. 2013-07-26 12:40:53 +02:00
Julius Volz
0c3c75edb3 Change Aggregator from channel-based to mutex-based.
This removes >100 lines of boilerplate code in the Aggregator alone.
2013-07-26 02:13:11 +02:00
Julius Volz
ba2247857d Implement silence create/read/update/delete API and UI workflow. 2013-07-26 00:23:13 +02:00
juliusv
00efa4a4a5 Merge pull request #4 from prometheus/refactor/mutex-based
Change Suppressor from channel-based to mutex-based, add tests.
2013-07-22 09:51:23 -07:00
Julius Volz
b49b7bba6f Change Suppressor from channel-based to mutex-based, add tests.
Start with the simplest possible locking scheme: lock the object-global
mutex at the beginning of each user-facing method. This is equivalent to
implicit locking provided by the reactor.

The reasoning behind this change is the incredible overhead of the
previous reactor request/response code:

Overhead for current model for every user-facing method:

- 2 struct type definitions (req/resp)
- 1 channel
  - 1 struct member definition site
  - 1 channel init site
  - 1 struct population site
  - 1 struct servicing site
  - 1 struct closing site
- 1 actual execution method

New lock-based code:

Per object: 1 lock
Per method:
- 1 taking the lock
- 1 actual execution method
2013-07-22 18:32:45 +02:00
juliusv
19e1ad7096 Merge pull request #3 from prometheus/refactor/show-alerts
Show alerts in UI; add no-op silence dialog; cleanups.
2013-07-22 07:28:27 -07:00
Julius Volz
ed289d58f0 Remove crufty logging statement. 2013-07-22 16:26:54 +02:00
Julius Volz
827d3c3710 Show actual alerts in UI and add no-op silence dialog. 2013-07-22 11:12:25 +02:00
Julius Volz
71a9d4af35 Ensure minimum repeat rate for events. 2013-07-22 11:12:25 +02:00
Julius Volz
436643f94e Cleanup: rename "element" to "event". 2013-07-22 11:12:25 +02:00
Julius Volz
13b106bb2b Ensure flag parsing on startup. 2013-07-22 11:12:25 +02:00
juliusv
8182f1bbc0 Merge pull request #2 from prometheus/refactor/aggregator
Fix bugs in aggregator and filters, add initial tests
2013-07-22 02:11:33 -07:00
Julius Volz
24d9977c95 Run go fmt. 2013-07-22 11:02:45 +02:00
Julius Volz
ca1eb66df4 Add BUG comment about aggregator draining. 2013-07-19 18:10:40 +02:00
Julius Volz
606d120541 Move aggregator scenario tests to separate type. 2013-07-19 15:26:51 +02:00
Julius Volz
f9bca4ba2b Remove unused timer. 2013-07-19 15:15:53 +02:00
Julius Volz
bc57fa4936 Add initial aggregator tests. 2013-07-19 15:05:45 +02:00
Julius Volz
a8bd98b7e1 Fix regex filters to match complete string.
If someone specifies

  service = "foo-service"

...they probably don't want it to match:

  service = "foo-servicebar"
2013-07-19 13:39:15 +02:00
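
One way to get complete-string matching with Go's `regexp` package is to wrap the user's pattern in anchors. The sketch below shows the difference; the `anchored` helper is a hypothetical name, and the commit may implement the fix differently.

```go
package main

import (
	"fmt"
	"log"
	"regexp"
)

// anchored compiles pattern so that it must match the entire input.
// The non-capturing group keeps alternations such as "a|b" inside the anchors.
func anchored(pattern string) (*regexp.Regexp, error) {
	return regexp.Compile("^(?:" + pattern + ")$")
}

func main() {
	// An unanchored pattern matches anywhere in the string.
	loose := regexp.MustCompile("foo-service")
	fmt.Println(loose.MatchString("foo-servicebar")) // true

	strict, err := anchored("foo-service")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(strict.MatchString("foo-servicebar")) // false
	fmt.Println(strict.MatchString("foo-service"))    // true
}
```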
Julius Volz
648a79a3e1 Synchronize Close(), fix race conditions.
Close() was not synced through the main dispatcher loop, so it could close
channels that were currently being written to by methods called from said
dispatcher loop. This leads to a crash. Instead, Close() now writes a
closeRequest, which is handled in the dispatcher.
2013-07-19 13:39:05 +02:00
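
The idea of routing Close() through the dispatcher loop can be sketched as below. Only the `closeRequest` name comes from the commit message; the `Dispatcher` type, its channels, and the buffer size are assumptions made for illustration.

```go
package main

import "fmt"

// closeRequest asks the dispatcher loop to shut down; done is closed
// once the loop has finished.
type closeRequest struct {
	done chan struct{}
}

// Dispatcher is a hypothetical reactor-style object with one loop goroutine.
type Dispatcher struct {
	events   chan string
	closeReq chan closeRequest
	out      chan string
}

func NewDispatcher() *Dispatcher {
	d := &Dispatcher{
		events:   make(chan string),
		closeReq: make(chan closeRequest),
		out:      make(chan string, 16), // small buffer, enough for this sketch
	}
	go d.run()
	return d
}

// run is the only goroutine that writes to or closes d.out, so a close
// can never race with a write.
func (d *Dispatcher) run() {
	for {
		select {
		case e := <-d.events:
			d.out <- e
		case req := <-d.closeReq:
			close(d.out)
			close(req.done)
			return
		}
	}
}

// Send hands an event to the loop. It must not be called after Close.
func (d *Dispatcher) Send(e string) { d.events <- e }

// Close does not touch d.out itself; it asks the loop to close it and
// waits for confirmation.
func (d *Dispatcher) Close() {
	req := closeRequest{done: make(chan struct{})}
	d.closeReq <- req
	<-req.done
}

func main() {
	d := NewDispatcher()
	d.Send("a")
	d.Send("b")
	d.Close()
	for e := range d.out {
		fmt.Println(e)
	}
}
```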
juliusv
f362b04f61 Merge pull request #1 from prometheus/event-model-reworking
Change model to be more state- and less event-focussed.
2013-07-19 02:17:43 -07:00
Julius Volz
cf78397107 Change model to be more state- and less event-focussed. 2013-07-19 10:52:04 +02:00
Julius Volz
44c69920f4 Preliminary web interface and pairing refactorings. 2013-07-17 17:45:01 +02:00
Matt T. Proud
48d0ba2b9b Initial interface cleanups. 2013-07-16 23:17:42 +02:00
Matt T. Proud
ebbef79014 Run gofmt. 2013-07-16 19:18:05 +02:00
Julius Volz
116175a054 Add LICENSE file. 2013-07-16 17:10:34 +02:00
juliusv
f59d8fb2fc Create README.md 2013-07-16 17:09:56 +02:00
Julius Volz
f86966a0e7 Initial transliteration of https://gist.github.com/matttproud-soundcloud/fcb5153716eed0816863 2013-07-16 17:03:56 +02:00