Alarm#
- class lsst.ts.watcher.Alarm(name, log=None)#
Bases: object
A Watcher alarm.
- Parameters:
name (str) – Name of alarm. This must be unique among all alarms and should be of the form system.[subsystem….]_name so that groups of related alarms can be acknowledged.
log (logging.Logger, optional) – Parent logger.
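The naming convention above exists so that related alarms can be selected together. As a minimal sketch of that idea, assuming alarms are selected by matching a regular expression against their names (the alarm names and the `select_alarms` helper below are hypothetical, not part of `lsst.ts.watcher`):

```python
import re

# Hypothetical alarm names that loosely follow the documented convention.
alarm_names = [
    "dome.shutter_position",
    "dome.azimuth_motor_temperature",
    "mount.elevation_limit",
]


def select_alarms(pattern, names):
    """Return the names that fully match a regular expression."""
    regex = re.compile(pattern)
    return [name for name in names if regex.fullmatch(name)]


# Select every alarm that belongs to the hypothetical "dome" system.
dome_alarms = select_alarms(r"dome\..*", alarm_names)
print(dome_alarms)
```

A shared prefix makes one pattern cover a whole group, which is why unique but structured names are useful.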
- acknowledged_by#
The user argument when the alarm is acknowledged. “” if not acknowledged.
- Type:
- auto_acknowledge_delay#
The delay (seconds) after which an alarm will be automatically acknowledged. Never if 0 (the default).
- Type:
- auto_unacknowledge_delay#
The delay (seconds) after which an alarm will be automatically unacknowledged. Never if 0 (the default).
- Type:
- do_escalate#
Should the alarm be escalated? The value is set by this class and is intended to be read by the alarm callback.
- Type:
- escalation_delay#
If an alarm goes to critical state and remains unacknowledged for this period of time (seconds), the alarm should be escalated. If 0, the alarm will not be escalated.
- Type:
- escalation_responder#
Who or what to escalate the alarm to. If blank, the alarm will not be escalated.
- Type:
- escalated_id#
ID of the SquadCast escalation alert. “” if not escalated. Set to “Failed: {reason}” if escalation failed. This is set to “” by reset, and is intended to be set to non-empty values by the alarm callback.
- Type:
- severity_queue#
Intended only for unit tests. Defaults to None. If a unit test sets this to an asyncio.Queue, set_severity will queue the severity every time it returns True.
- Type:
- auto_acknowledge_task#
A task that monitors the automatic acknowledge timer.
- Type:
- auto_unacknowledge_task#
A task that monitors the automatic unacknowledge timer.
- Type:
- escalating_task#
A task that monitors the process of escalating an alarm to a notification service such as SquadCast. This timer is managed by WatcherCsc, because it knows how to communicate with the notification service.
- Type:
- escalation_timer_task#
A task that monitors the escalation timer. When this timer fires, it sets do_escalate to True and calls the callback. It is then up to the CSC to actually escalate the alarm (see escalating_task).
- Type:
- unmute_task#
A task that monitors the unmute timer.
- Type:
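The escalation timer described above can be pictured as an asyncio task that sleeps for escalation_delay, then sets do_escalate and calls the callback, and that is cancelled when the alarm is acknowledged. The following is only a sketch under those assumptions: `EscalationSketch`, `start_timer`, and `_timer` are hypothetical names, and the real class handles more state than this.

```python
import asyncio


class EscalationSketch:
    """Hypothetical stand-in that models only the escalation timer."""

    def __init__(self, escalation_delay, callback):
        self.escalation_delay = escalation_delay
        self.callback = callback
        self.do_escalate = False
        self.escalation_timer_task = None

    def start_timer(self):
        # Assumed trigger: the alarm has just gone to critical state.
        if self.escalation_delay > 0:
            self.escalation_timer_task = asyncio.create_task(self._timer())

    async def _timer(self):
        await asyncio.sleep(self.escalation_delay)
        self.do_escalate = True
        self.callback(self)

    def acknowledge(self):
        # Halt the escalation timer, if running, and clear do_escalate.
        if self.escalation_timer_task is not None:
            self.escalation_timer_task.cancel()
        self.do_escalate = False


async def escalation_demo():
    escalated = []
    unacked = EscalationSketch(0.01, escalated.append)
    acked = EscalationSketch(0.01, escalated.append)
    unacked.start_timer()
    acked.start_timer()
    acked.acknowledge()  # cancels the timer before it fires
    await asyncio.sleep(0.2)
    return unacked.do_escalate, acked.do_escalate, len(escalated)


escalation_result = asyncio.run(escalation_demo())
```

Only the unacknowledged alarm reaches the callback with do_escalate set; the callback owner (the CSC, per the text above) would then perform the actual escalation.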
Attributes Summary
callback – Get the callback function.
muted – Is this alarm muted?
nominal – severity = max severity = NONE.
Methods Summary
acknowledge(severity, user) – Acknowledge the alarm.
assert_equal(other[, ignore_attrs]) – Assert that this alarm equals another alarm.
assert_next_severity(expected_severity[, ...]) – Wait for and check the next severity.
close() – Cancel pending tasks.
configure_basics([callback, ...]) – Configure the callback function and auto ack/unack delays.
configure_escalation(escalation_delay, ...) – Configure escalation.
Remove all items from the severity queue.
init_severity_queue() – Initialize the severity queue.
make_log_entry(log_server_url) – Post message to narrative log entry in response to alarm.
mute(duration, severity, user) – Mute this alarm for a specified duration and severity.
reset() – Reset the alarm to nominal state.
run_callback() – Run the callback function, if present.
set_severity(severity, reason) – Set the severity.
unacknowledge([escalate]) – Unacknowledge the alarm.
unmute() – Unmute this alarm.
Attributes Documentation
- callback#
Get the callback function.
- muted#
Is this alarm muted?
- nominal#
True if alarm is in nominal state: severity = max severity = NONE.
When the alarm is in nominal state it should not be displayed in the Watcher GUI.
Methods Documentation
- async acknowledge(severity, user)#
Acknowledge the alarm.
Halt the escalation timer, if running, and set do_escalate to False. Restart the auto unacknowledge timer, if configured (self.auto_unacknowledge_delay > 0).
- Parameters:
- Returns:
updated – True if the alarm state changed (any fields were modified other than tasks being cancelled), False otherwise.
- Return type:
- Raises:
ValueError – If severity < self.max_severity. In this case the acknowledge method does not change the alarm state.
Notes
The reason severity is an argument is to handle the case where a user acknowledges an alarm just as the alarm severity increases. To avoid the danger of accidentally acknowledging an alarm at a higher severity than intended, the acknowledgement is rejected.
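The race described in the note can be illustrated with a small stand-in. This is a sketch, not the library's implementation: `Severity` stands in for AlarmSeverity with assumed numeric values, and `AckSketch` is a hypothetical class that models only the severity check.

```python
import enum


class Severity(enum.IntEnum):
    """Stand-in for AlarmSeverity; the numeric values are assumed."""

    NONE = 1
    WARNING = 2
    SERIOUS = 3
    CRITICAL = 4


class AckSketch:
    """Hypothetical, stripped-down alarm that models only the severity check."""

    def __init__(self, max_severity):
        self.max_severity = max_severity
        self.acknowledged_by = ""

    def acknowledge(self, severity, user):
        # Reject an acknowledgement issued for a lower severity than the
        # alarm has reached; the alarm state is left unchanged.
        if severity < self.max_severity:
            raise ValueError(
                f"severity {severity!r} < max_severity {self.max_severity!r}"
            )
        self.acknowledged_by = user
        return True


# The operator saw WARNING, but the alarm rose to CRITICAL before the
# acknowledgement arrived.
alarm = AckSketch(max_severity=Severity.CRITICAL)
try:
    alarm.acknowledge(Severity.WARNING, user="operator")
    rejected = False
except ValueError:
    rejected = True
```

After the rejection the operator can review the new severity and acknowledge again with the current value.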
- assert_equal(other, ignore_attrs=())#
Assert that this alarm equals another alarm.
Compares all attributes except tasks and those specified in ignore_attrs.
- async assert_next_severity(expected_severity, check_empty=True, flush=False, timeout=10)#
Wait for and check the next severity.
Only intended for tests. In order to call this you must first call init_severity_queue (once) to set up a severity queue.
- Parameters:
expected_severity (AlarmSeverity) – The expected severity.
check_empty (bool, optional) – If true (the default): check that the severity queue is empty, after getting the severity.
flush (bool, optional) – If true (not the default): flush all existing values from the queue, then wait for the next severity. This is useful for polling alarms.
timeout (float, optional) – Maximum time to wait (seconds).
- Raises:
AssertionError – If the severity is not as expected, or if check_empty is true and there are additional queued severities.
asyncio.TimeoutError – If no new severity is seen in time.
RuntimeError – If you never called init_severity_queue.
Notes
Here is the typical way to use this method:
* Create a rule.
* Call rule.alarm.init_severity_queue().
* Write SAL messages that are expected to change the alarm severity.
* After writing each such message, call: await rule.alarm.assert_next_severity(expected_severity)
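The workflow above can be sketched without a real rule or SAL messages. `QueueingAlarm` below is a hypothetical stand-in that models only the severity queue. Its `set_severity` takes a single argument and queues every value, which is a simplification of the documented method, and the `flush` option is omitted.

```python
import asyncio


class QueueingAlarm:
    """Hypothetical stand-in with only the severity-queue behavior."""

    def __init__(self):
        self.severity_queue = None

    def init_severity_queue(self):
        self.severity_queue = asyncio.Queue()

    def set_severity(self, severity):
        # Simplified: queue every severity once a queue exists.
        if self.severity_queue is not None:
            self.severity_queue.put_nowait(severity)

    async def assert_next_severity(
        self, expected_severity, check_empty=True, timeout=10
    ):
        if self.severity_queue is None:
            raise RuntimeError("Call init_severity_queue first")
        severity = await asyncio.wait_for(
            self.severity_queue.get(), timeout=timeout
        )
        assert severity == expected_severity
        if check_empty:
            assert self.severity_queue.empty()


async def queue_demo():
    alarm = QueueingAlarm()
    alarm.init_severity_queue()
    # Stands in for "write a SAL message that changes the severity".
    alarm.set_severity("WARNING")
    await alarm.assert_next_severity("WARNING")
    alarm.set_severity("NONE")
    await alarm.assert_next_severity("NONE")
    return alarm.severity_queue.empty()


queue_drained = asyncio.run(queue_demo())
```

Checking after each message, rather than once at the end, keeps a failure close to the message that caused it.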
- close()#
Cancel pending tasks.
- configure_basics(callback=None, auto_acknowledge_delay=0, auto_unacknowledge_delay=0)#
Configure the callback function and auto ack/unack delays.
- Parameters:
callback (callable, optional) – Function or coroutine to call whenever the alarm changes state, or None if no callback wanted. The function receives one argument: this alarm.
auto_acknowledge_delay (float, optional) – Delay (in seconds) before a stale alarm is automatically acknowledged, or 0 for no automatic acknowledgement. A stale alarm is one that has not yet been acknowledged, but its severity has gone to NONE.
auto_unacknowledge_delay (float, optional) – Delay (in seconds) before an acknowledged alarm is automatically unacknowledged, or 0 for no automatic unacknowledgement. Automatic unacknowledgement only occurs if the alarm persists, because an acknowledged alarm is reset if severity goes to NONE.
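Because the callback may be either a function or a coroutine, whatever invokes it has to handle both. One common way to do that, shown here as an assumption rather than as the library's actual code, is to call the callback and await the result only if it is awaitable. The free function `run_callback` and the two example callbacks are hypothetical.

```python
import asyncio
import inspect


async def run_callback(callback, alarm):
    """Call ``callback(alarm)``, awaiting the result if it is awaitable."""
    if callback is None:
        return
    result = callback(alarm)
    if inspect.isawaitable(result):
        await result


seen = []


def sync_callback(alarm):
    seen.append(("sync", alarm))


async def async_callback(alarm):
    await asyncio.sleep(0)
    seen.append(("async", alarm))


async def callback_demo():
    # The string stands in for the alarm passed as the single argument.
    for callback in (sync_callback, async_callback, None):
        await run_callback(callback, "hypothetical alarm")


asyncio.run(callback_demo())
```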
- configure_escalation(escalation_delay, escalation_responder)#
Configure escalation.
Set the following attributes:
escalation_delay
escalation_responder
- Parameters:
- Raises:
ValueError – If escalation_delay < 0, if escalation_delay > 0 and escalation_responder is empty, or if escalation_delay = 0 and escalation_responder is not empty.
TypeError – If escalation_responder is not a str.
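The argument rules listed under Raises can be written out as a standalone check. This is a sketch of the documented conditions only: `check_escalation_config` is a hypothetical helper, and the order in which the checks run is an assumption.

```python
def check_escalation_config(escalation_delay, escalation_responder):
    """Raise if the arguments violate the documented rules."""
    if not isinstance(escalation_responder, str):
        raise TypeError(
            f"escalation_responder={escalation_responder!r} must be a str"
        )
    if escalation_delay < 0:
        raise ValueError(f"escalation_delay={escalation_delay} must be >= 0")
    if escalation_delay > 0 and not escalation_responder:
        raise ValueError(
            "escalation_responder must not be empty if escalation_delay > 0"
        )
    if escalation_delay == 0 and escalation_responder:
        raise ValueError(
            "escalation_responder must be empty if escalation_delay = 0"
        )


# Valid combinations: escalation disabled, or fully configured.
check_escalation_config(0, "")
check_escalation_config(30.0, "hypothetical-responder")
```

The two valid cases mirror the attribute descriptions: a delay of 0 or a blank responder both mean the alarm is never escalated, so they must be set or cleared together.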
- init_severity_queue()#
Initialize the severity queue.
You must call this once before calling assert_next_severity. You may call it again to reset the queue, but that is uncommon.
Warning
Only tests should call this method. Calling this in production code will cause a memory leak.
- async make_log_entry(log_server_url)#
Post message to narrative log entry in response to alarm.
- async mute(duration, severity, user)#
Mute this alarm for a specified duration and severity.
Muting also cancels the escalation timer.
- Parameters:
- Raises:
ValueError – If duration <= 0, severity == AlarmSeverity.NONE or severity is not a valid AlarmSeverity enum value.
Notes
An alarm cannot have multiple mute levels and durations. If mute is called multiple times, the most recent call overwrites information from earlier calls.
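The validation rules and the "most recent call wins" behavior can be sketched as follows. `Severity` stands in for AlarmSeverity with assumed numeric values, and `MuteSketch` with its `muted_severity`, `muted_by`, and `mute_duration` attributes is hypothetical. The unmute timer and the escalation timer are left out.

```python
import enum


class Severity(enum.IntEnum):
    """Stand-in for AlarmSeverity; the numeric values are assumed."""

    NONE = 1
    WARNING = 2
    SERIOUS = 3
    CRITICAL = 4


class MuteSketch:
    """Hypothetical stand-in that models only the mute bookkeeping."""

    def __init__(self):
        self.muted_severity = Severity.NONE
        self.muted_by = ""
        self.mute_duration = 0.0

    def mute(self, duration, severity, user):
        if duration <= 0:
            raise ValueError(f"duration={duration} must be positive")
        # Enum lookup raises ValueError for a value that is not a member.
        severity = Severity(severity)
        if severity == Severity.NONE:
            raise ValueError("severity must not be NONE")
        # A later call simply overwrites the earlier mute information.
        self.mute_duration = duration
        self.muted_severity = severity
        self.muted_by = user


mute_state = MuteSketch()
mute_state.mute(duration=60, severity=Severity.WARNING, user="first")
mute_state.mute(duration=10, severity=Severity.SERIOUS, user="second")
```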
- reset()#
Reset the alarm to nominal state.
Do not call the callback function. This is designed to be called by Model.enable, which first resets alarms and then feeds them data before writing alarm state.
It sets too many fields to be called by set_severity.
- async run_callback()#
Run the callback function, if present.
- async set_severity(severity, reason)#
Set the severity.
Call the callback function unless the alarm was nominal and remains nominal. Put the new severity on the severity queue (if it exists), regardless of whether the alarm was nominal.
- Parameters:
- Returns:
updated – True if the alarm state changed (i.e. if any fields were modified), False otherwise.
- Return type:
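The callback rule for set_severity (call it unless the alarm was nominal and remains nominal) follows from the definition of nominal as severity = max severity = NONE. The sketch below models only that rule. `Severity` and `CallbackRuleSketch` are hypothetical stand-ins, the method is synchronous for brevity, and the reason argument, the return value, and the severity queue are omitted.

```python
import enum


class Severity(enum.IntEnum):
    """Stand-in for AlarmSeverity; the numeric values are assumed."""

    NONE = 1
    WARNING = 2
    SERIOUS = 3
    CRITICAL = 4


class CallbackRuleSketch:
    """Hypothetical stand-in that models only the callback rule."""

    def __init__(self, callback):
        self.severity = Severity.NONE
        self.max_severity = Severity.NONE
        self.callback = callback

    @property
    def nominal(self):
        return (
            self.severity == Severity.NONE
            and self.max_severity == Severity.NONE
        )

    def set_severity(self, severity):
        was_nominal = self.nominal
        self.severity = severity
        self.max_severity = max(self.max_severity, severity)
        # Skip the callback only if the alarm was and still is nominal.
        if not (was_nominal and self.nominal):
            self.callback(self)


calls = []
sketch = CallbackRuleSketch(callback=lambda alarm: calls.append(alarm.severity))
sketch.set_severity(Severity.NONE)     # nominal stays nominal: no callback
sketch.set_severity(Severity.WARNING)  # leaves nominal: callback runs
sketch.set_severity(Severity.NONE)     # max severity is still WARNING: callback runs
```

The last call shows why a severity of NONE alone does not make an alarm nominal: the maximum severity persists until the alarm is reset.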
- async unacknowledge(escalate=True)#
Unacknowledge the alarm. Basically a no-op if nominal or not acknowledged.
- async unmute()#
Unmute this alarm.