Building a hash over an entire set of alerts causes problems, because
the hash changes on any modification, whereas we only want to send
notifications if an alert and its state have changed. This therefore
introduces a list of alerts that are active and a list of alerts that
are resolved. If the currently active alerts of a group are a subset of
those that have already been notified about, they are deduplicated.
Resolved notifications work the same way, with a separate list of
resolved notifications that have already been sent.
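A minimal sketch of this subset check, with hypothetical names
(`notifiedState`, `needsNotify`) rather than the actual Alertmanager types:

```go
package main

import "fmt"

// notifiedState tracks which alert fingerprints a group has already
// sent notifications for, split into firing and resolved sets.
type notifiedState struct {
	firing   map[string]struct{}
	resolved map[string]struct{} // maintained the same way for resolved alerts
}

// needsNotify reports whether a new firing notification is required:
// only if at least one alert was not part of the previous notification.
func (s *notifiedState) needsNotify(firing []string) bool {
	for _, fp := range firing {
		if _, ok := s.firing[fp]; !ok {
			return true
		}
	}
	return false
}

func main() {
	s := &notifiedState{firing: map[string]struct{}{"a": {}, "b": {}}}
	fmt.Println(s.needsNotify([]string{"a"}))      // false: subset of the last notification
	fmt.Println(s.needsNotify([]string{"a", "c"})) // true: "c" is new
}
```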
* Vendor dependencies.
This updates several old dependencies, removes some that are no longer
needed, and adds `pkg/labels` from the Prometheus `dev-2.0` branch.
* Add metrics selector parsing code
This is a temporary, simplified re-implementation
of PromQL's metric selector parsing.
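As a rough illustration only, a regex-based sketch of such a simplified
selector parser; the names and the regex approach are assumptions, not
the actual `pkg/parse` code:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// matcherRe captures name, operator, and quoted value of a single matcher.
var matcherRe = regexp.MustCompile(`(\w+)\s*(=~|!~|!=|=)\s*"([^"]*)"`)

type matcher struct {
	Name, Op, Value string
}

// parseSelector turns `{foo="bar",job=~"api.*"}` into a matcher list.
func parseSelector(s string) []matcher {
	s = strings.Trim(strings.TrimSpace(s), "{}")
	var ms []matcher
	for _, m := range matcherRe.FindAllStringSubmatch(s, -1) {
		ms = append(ms, matcher{Name: m[1], Op: m[2], Value: m[3]})
	}
	return ms
}

func main() {
	fmt.Println(parseSelector(`{foo="bar",job=~"api.*"}`))
	// [{foo = bar} {job =~ api.*}]
}
```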
* Add alerts filtering
Filter alerts through the `?filter=` query string.
* Add silences filtering
Filter silences through the `?filter=` query string.
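A hedged usage sketch for both endpoints, assuming Alertmanager's
default port 9093; `net/url` takes care of encoding the matcher
expression:

```go
package main

import (
	"fmt"
	"net/url"
)

func main() {
	base := "http://localhost:9093/api/v1"
	q := url.Values{}
	// A label-matcher expression as the filter value (assumed syntax).
	q.Set("filter", `{severity="critical",env=~"prod.*"}`)
	fmt.Println(base + "/alerts?" + q.Encode())
	fmt.Println(base + "/silences?" + q.Encode())
}
```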
* Move `parse` to `pkg/parse`
* api: Expose mesh status
The Weaveworks mesh package provides information about the current
status of the mesh network between Alertmanager instances. This commit
exposes the current address and connection status of each instance
connected to the targeted Alertmanager instance via the `/status` API
endpoint.
* api: Replace LocalConnectionStatus with PeerStatus
Now `meshStatus` contains all peers (Name, NickName, UID) of the
network. Additionally, the Name and NickName of the current target are
added to the top level of the `meshNetwork` object.
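A sketch of what the exposed structures might look like, derived from
the fields named above; the exact struct and JSON tag names are
assumptions:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// PeerStatus describes one peer in the mesh network.
type PeerStatus struct {
	Name     string `json:"name"`
	NickName string `json:"nickName"`
	UID      uint64 `json:"uid"`
}

// MeshStatus is the top-level object: the current instance's identity
// plus all peers it knows about.
type MeshStatus struct {
	Name     string       `json:"name"`
	NickName string       `json:"nickName"`
	Peers    []PeerStatus `json:"peers"`
}

func main() {
	s := MeshStatus{
		Name:     "7a:5d:00:00:00:01",
		NickName: "am-0",
		Peers:    []PeerStatus{{Name: "de:ad:00:00:00:02", NickName: "am-1", UID: 1}},
	}
	b, _ := json.MarshalIndent(s, "", "  ")
	fmt.Println(string(b))
}
```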
* Find MAC address if mesh.hardware-addr not given
Defaulting to the machine's MAC address sometimes fails and causes a
panic. Allow the user to specify a custom address to skip this step so
they can run Alertmanager.
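A minimal sketch of this fallback, using a hypothetical `peerID` helper
that returns an error instead of panicking when no usable MAC address
is found:

```go
package main

import (
	"fmt"
	"net"
)

// peerID returns the user-supplied value if given, otherwise the MAC
// address of the first interface that has one.
func peerID(flagValue string) (string, error) {
	if flagValue != "" {
		return flagValue, nil // user override skips MAC discovery entirely
	}
	ifaces, err := net.Interfaces()
	if err != nil {
		return "", err
	}
	for _, iface := range ifaces {
		if len(iface.HardwareAddr) > 0 {
			return iface.HardwareAddr.String(), nil
		}
	}
	return "", fmt.Errorf("no interface with a MAC address found; set -mesh.peer-id")
}

func main() {
	id, err := peerID("")
	fmt.Println(id, err)
}
```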
* -mesh.hardware-address -> -mesh.peer-id
* Fix command-line invocation
* Name Stuart as the maintainer
This implements the suggestion to name Stuart as the maintainer for
now. We can later revisit whether this solution works and is
sustainable for everyone involved.
* Wait for test server to be ready before running tests
This fixes problems when running the acceptance tests on slow or
CPU-starved machines, as mentioned in #472.
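One common shape for such a readiness check is to poll the listener
before the tests start; this helper is a hypothetical sketch, not the
actual test harness code:

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// waitForServer dials the address repeatedly until it accepts a
// connection or the timeout expires.
func waitForServer(addr string, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		conn, err := net.DialTimeout("tcp", addr, 100*time.Millisecond)
		if err == nil {
			conn.Close()
			return nil
		}
		time.Sleep(50 * time.Millisecond)
	}
	return fmt.Errorf("server %s not ready after %s", addr, timeout)
}

func main() {
	fmt.Println(waitForServer("localhost:9093", 5*time.Second))
}
```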
* Export parsed configuration as JSON in `/api/v1/status`.
* Avoid exporting the XXX fields.
* Drop js-yaml library and use already parsed configuration.
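A sketch of the presumed mechanism for the XXX fields: a `json:"-"` tag
keeps the YAML catch-all field out of the JSON status output (the field
layout here is illustrative, not the actual config structs):

```go
package main

import (
	"encoding/json"
	"fmt"
)

type GlobalConfig struct {
	ResolveTimeout string `yaml:"resolve_timeout" json:"resolve_timeout"`

	// XXX catches unknown YAML keys so parsing can reject them; the
	// json:"-" tag keeps it out of the exported status output.
	XXX map[string]interface{} `yaml:",inline" json:"-"`
}

func main() {
	c := GlobalConfig{
		ResolveTimeout: "5m",
		XXX:            map[string]interface{}{"typo": 1},
	}
	b, _ := json.Marshal(c)
	fmt.Println(string(b)) // {"resolve_timeout":"5m"}
}
```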
* Go fmt + go-bindata
* Fix #559 by removing concurrent map writes to the matcher cache
The cache was guarded by the Silence's main lock, which only used a
read-lock on queries. The cache's get method lazily loads data into
the cache, thus causing concurrent writes.
We just change the main lock to always write-lock, as we don't expect
high lock contention at this point and would have it in a dedicated
cache lock anyway.
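A simplified sketch of the change; the cache contents are stand-ins,
but the core point holds: a getter that lazily populates a map must
take the write lock:

```go
package main

import (
	"fmt"
	"sync"
)

type matcherCache struct {
	mtx   sync.Mutex // was an RWMutex with RLock on queries
	cache map[string]int
}

// get takes the full lock because a cache miss writes to the map.
func (c *matcherCache) get(key string) int {
	c.mtx.Lock()
	defer c.mtx.Unlock()
	if v, ok := c.cache[key]; ok {
		return v
	}
	v := len(key) // stand-in for the expensive matcher construction
	c.cache[key] = v
	return v
}

func main() {
	c := &matcherCache{cache: map[string]int{}}
	fmt.Println(c.get("severity"))
}
```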
* Improve API error messages in logs
Previously errors would look like:
    ERRO[0008] api error: {bad_data 0xc4201caae0} source=api.go:534
Now they look like:
    ERRO[0001] api error: bad_data: json: cannot unmarshal object into Go value of type []*types.Alert source=api.go:534
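A sketch of why the output changes, with hypothetical types mirroring
the log lines above: once the error type implements `Error()`, `%v`
prints a readable message instead of a struct containing a pointer:

```go
package main

import "fmt"

type errorType string

type apiError struct {
	typ errorType
	err error
}

// Error makes apiError satisfy the error interface, so formatting it
// with %v yields "bad_data: ..." rather than "{bad_data 0xc42...}".
func (e *apiError) Error() string {
	return fmt.Sprintf("%s: %s", e.typ, e.err)
}

func main() {
	e := &apiError{
		typ: "bad_data",
		err: fmt.Errorf("json: cannot unmarshal object into Go value of type []*types.Alert"),
	}
	fmt.Printf("api error: %v\n", e)
}
```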
* Fix routing tree highlighting of `continue: true` nodes
Nodes with `continue: true` were being ignored, and only the first
match in the routing tree was highlighted. Additionally, each node was
incorrectly set to `continue=true` if nothing was received from the
API; the API sends nothing if `continue=false`.
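A sketch of the intended semantics with hypothetical types: a matching
node stops the walk among its siblings unless it sets `continue: true`,
so several siblings can match and all of them must be highlighted:

```go
package main

import "fmt"

type route struct {
	name     string
	match    func(labels map[string]string) bool
	cont     bool // `continue: true`; defaults to false when absent
	children []*route
}

// matchingRoutes collects every route that matches, honoring continue:
// a matching child stops the sibling walk unless cont is set.
func matchingRoutes(r *route, labels map[string]string, out *[]string) {
	if !r.match(labels) {
		return
	}
	*out = append(*out, r.name)
	for _, c := range r.children {
		if !c.match(labels) {
			continue
		}
		matchingRoutes(c, labels, out)
		if !c.cont {
			break // first match wins unless continue is set
		}
	}
}

func main() {
	all := func(map[string]string) bool { return true }
	root := &route{name: "root", match: all, children: []*route{
		{name: "team-a", match: all, cont: true},
		{name: "team-b", match: all},
		{name: "team-c", match: all}, // never reached: team-b has continue=false
	}}
	var out []string
	matchingRoutes(root, map[string]string{"severity": "page"}, &out)
	fmt.Println(out) // [root team-a team-b]
}
```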