Alarm

class lsst.ts.watcher.Alarm(name, log=None)

Bases: object

A Watcher alarm.

Parameters:
name: str

Name of alarm. This must be unique among all alarms and should be of the form system.[subsystem….]_name so that groups of related alarms can be acknowledged.

log: logging.Logger, optional

Parent logger.
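To illustrate why the dotted naming scheme is useful, the sketch below selects a group of related alarms by matching their names against a regular expression. The alarm names and the `select_alarms` helper are hypothetical, and matching with `re.fullmatch` is an assumption made for illustration, not a statement about how the Watcher selects alarms.

```python
import re

# Hypothetical alarm names that follow a system.subsystem style convention.
alarm_names = [
    "Enabled.ATDome",
    "Enabled.ATMCS",
    "Heartbeat.ATDome",
]


def select_alarms(names, pattern):
    """Return the names that fully match the regular expression."""
    compiled = re.compile(pattern)
    return [name for name in names if compiled.fullmatch(name)]


# Select every alarm in the hypothetical "Enabled" group.
enabled_alarms = select_alarms(alarm_names, r"Enabled\..*")
```

With names structured this way, one pattern can address a whole family of alarms while leaving unrelated alarms untouched.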

Attributes:
name: str

Name of alarm.

acknowledged_by: str

The user argument when the alarm is acknowledged. “” if not acknowledged.

auto_acknowledge_delay: float

The delay (seconds) after which an alarm will be automatically acknowledged. Never if 0 (the default).

auto_unacknowledge_delay: float

The delay (seconds) after which an alarm will be automatically unacknowledged. Never if 0 (the default).

do_escalate: bool

Should the alarm be escalated? The value is set by this class and is intended to be read by the alarm callback.

escalation_delay: float

If an alarm goes to critical state and remains unacknowledged for this period of time (seconds), the alarm should be escalated. If 0, the alarm will not be escalated.

escalation_responder: str

Who or what to escalate the alarm to. If blank, the alarm will not be escalated.

escalated_id: str

ID of the SquadCast escalation alert. “” if not escalated. Set to “Failed: {reason}” if escalation failed. This is set to “” by reset, and intended to be set to non-empty values by the alarm callback.

severity_queue: asyncio.Queue or None

Intended only for unit tests. Defaults to None. If a unit test sets this to an asyncio.Queue, set_severity will queue the severity every time it returns True.

auto_acknowledge_task: asyncio.Future

A task that monitors the automatic acknowledge timer.

auto_unacknowledge_task: asyncio.Future

A task that monitors the automatic unacknowledge timer.

escalating_task: asyncio.Future

A task that monitors the process of escalating an alarm to a notification service such as SquadCast. This task is managed by WatcherCsc, because it knows how to communicate with the notification service.

escalation_timer_task: asyncio.Future

A task that monitors the escalation timer. When this timer fires, it sets do_escalate to true and calls the callback. It is then up to the CSC to actually escalate the alarm (see escalating_task).

unmute_task: asyncio.Future

A task that monitors the unmute timer.
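The interplay between escalation_timer_task, do_escalate and the callback can be sketched as follows. `SketchAlarm` is a hypothetical stand-in that models only the timer, not the real Alarm class; its method names and the synchronous callback are assumptions chosen to keep the example short.

```python
import asyncio


class SketchAlarm:
    """Hypothetical stand-in that models only the escalation timer."""

    def __init__(self, escalation_delay, callback):
        self.escalation_delay = escalation_delay
        self.callback = callback
        self.do_escalate = False
        self.escalation_timer_task = None

    def start_escalation_timer(self):
        # Assumed helper: start the timer as a background task.
        self.escalation_timer_task = asyncio.create_task(self._escalation_timer())

    async def _escalation_timer(self):
        await asyncio.sleep(self.escalation_delay)
        # The timer fired: flag the alarm and notify the callback,
        # which is responsible for the actual escalation.
        self.do_escalate = True
        self.callback(self)

    def acknowledge(self):
        # Acknowledging halts the escalation timer and clears the flag.
        if self.escalation_timer_task is not None:
            self.escalation_timer_task.cancel()
        self.do_escalate = False


async def demo():
    calls = []
    escalated = SketchAlarm(escalation_delay=0.01, callback=calls.append)
    escalated.start_escalation_timer()

    acknowledged = SketchAlarm(escalation_delay=0.01, callback=calls.append)
    acknowledged.start_escalation_timer()
    acknowledged.acknowledge()  # cancels the timer before it fires

    await asyncio.sleep(0.1)
    return escalated.do_escalate, acknowledged.do_escalate, len(calls)


result = asyncio.run(demo())
```

Only the alarm that was left unacknowledged ends up with do_escalate set, and the callback runs once, for that alarm.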

Attributes Summary

callback

Get the callback function.

muted

Is this alarm muted?

nominal

True if alarm is in nominal state: severity = max severity = NONE.

Methods Summary

acknowledge(severity, user)

Acknowledge the alarm.

assert_equal(other[, ignore_attrs])

Assert that this alarm equals another alarm.

assert_next_severity(expected_severity[, ...])

Wait for and check the next severity.

close()

Cancel pending tasks.

configure_basics([callback, ...])

Configure the callback function and auto ack/unack delays.

configure_escalation(escalation_delay, ...)

Configure escalation.

flush_severity_queue()

Remove all items from the severity queue.

init_severity_queue()

Initialize the severity queue.

make_log_entry(log_server_url)

Post a message to the narrative log in response to the alarm.

mute(duration, severity, user)

Mute this alarm for a specified duration and severity.

reset()

Reset the alarm to nominal state.

run_callback()

Run the callback function, if present.

set_severity(severity, reason)

Set the severity.

unacknowledge([escalate])

Unacknowledge the alarm.

unmute()

Unmute this alarm.

Attributes Documentation

callback

Get the callback function.

muted

Is this alarm muted?

nominal

True if alarm is in nominal state: severity = max severity = NONE.

When the alarm is in nominal state it should not be displayed in the Watcher GUI.

Methods Documentation

async acknowledge(severity, user)

Acknowledge the alarm.

Halt the escalation timer, if running, and set do_escalate False. Restart the auto unacknowledge timer, if configured (self.auto_unacknowledge_delay > 0).

Parameters:
severity: lsst.ts.idl.enums.Watcher.AlarmSeverity or int

Severity to acknowledge. Must be >= self.max_severity. If the severity goes above this level the alarm will unacknowledge itself.

user: str

Name of user; used to set acknowledged_by.

Returns:
updated: bool

True if the alarm state changed (any fields were modified other than tasks being cancelled), False otherwise.

Raises:
ValueError

If severity < self.max_severity. In this case the acknowledge method does not change the alarm state.

Notes

The reason severity is an argument is to handle the case that a user acknowledges an alarm just as the alarm severity increases. To avoid the danger of accidentally acknowledging an alarm at a higher severity than intended, the acknowledgement is rejected.
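The race described in these notes can be sketched with a small stand-in. `SketchAlarm` and its `AlarmSeverity` enum are hypothetical (the enum values are assumed), the method is synchronous for brevity although the real one is a coroutine, and only the severity check and the acknowledged_by update are modeled.

```python
import enum


class AlarmSeverity(enum.IntEnum):
    """Stand-in for lsst.ts.idl.enums.Watcher.AlarmSeverity (values assumed)."""

    NONE = 1
    WARNING = 2
    SERIOUS = 3
    CRITICAL = 4


class SketchAlarm:
    """Hypothetical stand-in that models only the acknowledge check."""

    def __init__(self):
        self.max_severity = AlarmSeverity.NONE
        self.acknowledged = False
        self.acknowledged_by = ""

    def acknowledge(self, severity, user):
        severity = AlarmSeverity(severity)
        if severity < self.max_severity:
            # The alarm became worse than the user believed: reject,
            # leaving the alarm state unchanged.
            raise ValueError(
                f"severity {severity!r} < max_severity {self.max_severity!r}"
            )
        if self.acknowledged:
            return False
        self.acknowledged = True
        self.acknowledged_by = user
        return True


alarm = SketchAlarm()
# The user saw WARNING, but the alarm has since risen to SERIOUS.
alarm.max_severity = AlarmSeverity.SERIOUS
try:
    alarm.acknowledge(AlarmSeverity.WARNING, user="operator")
    rejected = False
except ValueError:
    rejected = True

# Acknowledging at the current max severity succeeds.
accepted = alarm.acknowledge(AlarmSeverity.SERIOUS, user="operator")
```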

assert_equal(other, ignore_attrs=())

Assert that this alarm equals another alarm.

Compares all attributes except tasks and those specified in ignore_attrs.

Parameters:
other: Alarm

Alarm to compare.

ignore_attrs: list [str], optional

Sequence of attribute names to ignore (in addition to task attributes, which are always ignored).

async assert_next_severity(expected_severity, check_empty=True, flush=False, timeout=10)

Wait for and check the next severity.

Only intended for tests. In order to call this you must first call init_severity_queue (once) to set up a severity queue.

Parameters:
expected_severity: AlarmSeverity

The expected severity.

check_empty: bool, optional

If true (the default), check that the severity queue is empty after getting the severity.

flush: bool, optional

If true (not the default): flush all existing values from the queue, then wait for the next severity. This is useful for polling alarms.

timeout: float, optional

Maximum time to wait (seconds).

Raises:
AssertionError

If the severity is not as expected, or if check_empty true and there are additional queued severities.

asyncio.TimeoutError

If no new severity is seen in time.

RuntimeError

If you never called init_severity_queue.

Notes

Here is the typical way to use this method:

  • Create a rule.

  • Call rule.alarm.init_severity_queue().

  • Write SAL messages that are expected to change the alarm severity.

  • After writing each such message, call:

await rule.alarm.assert_next_severity(expected_severity)

close()

Cancel pending tasks.

configure_basics(callback=None, auto_acknowledge_delay=0, auto_unacknowledge_delay=0)

Configure the callback function and auto ack/unack delays.

Parameters:
callback: callable, optional

Function or coroutine to call whenever the alarm changes state, or None if no callback wanted. The function receives one argument: this alarm.

auto_acknowledge_delay: float, optional

Delay (in seconds) before a stale alarm is automatically acknowledged, or 0 for no automatic acknowledgement. A stale alarm is one that has not yet been acknowledged, but its severity has gone to NONE.

auto_unacknowledge_delay: float, optional

Delay (in seconds) before an acknowledged alarm is automatically unacknowledged, or 0 for no automatic unacknowledgement. Automatic unacknowledgement only occurs if the alarm persists, because an acknowledged alarm is reset if severity goes to NONE.

configure_escalation(escalation_delay, escalation_responder)

Configure escalation.

Set the following attributes:

  • escalation_delay

  • escalation_responder

Parameters:
escalation_delay: float

Delay before escalating a critical unacknowledged alarm (sec). If 0 the alarm is not escalated.

escalation_responder: str

Who or what to escalate the alarm to. If blank, the alarm will not be escalated.

Raises:
ValueError

If escalation_delay < 0, if escalation_delay > 0 and escalation_responder is empty, or if escalation_delay = 0 and escalation_responder is not empty.

TypeError

If escalation_responder is not a str.
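The documented argument checks can be sketched as a standalone function. `validate_escalation` is a hypothetical helper, not part of the Alarm API, and the order in which the checks run is an assumption.

```python
def validate_escalation(escalation_delay, escalation_responder):
    """Hypothetical helper that mirrors the documented argument checks."""
    if not isinstance(escalation_responder, str):
        raise TypeError(
            f"escalation_responder={escalation_responder!r} must be a str"
        )
    if escalation_delay < 0:
        raise ValueError(f"escalation_delay={escalation_delay} must be >= 0")
    if escalation_delay > 0 and not escalation_responder:
        raise ValueError("escalation_delay > 0 requires an escalation_responder")
    if escalation_delay == 0 and escalation_responder:
        raise ValueError("escalation_responder requires escalation_delay > 0")


# Both of these combinations are valid:
validate_escalation(0, "")  # escalation disabled
validate_escalation(30.0, "some_responder")  # escalation enabled

# Collect the exception type raised by each invalid combination.
outcomes = []
for args in [(-1, "some_responder"), (30.0, ""), (0, "some_responder"), (30.0, None)]:
    try:
        validate_escalation(*args)
        outcomes.append(None)
    except (ValueError, TypeError) as e:
        outcomes.append(type(e))
```

The delay and the responder must be specified together: either both are set, which enables escalation, or neither is.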

flush_severity_queue() → None

Remove all items from the severity queue.

init_severity_queue()

Initialize the severity queue.

You must call this once before calling assert_next_severity. You may call it again to reset the queue, but that is uncommon.

Warning

Only tests should call this method. Calling this in production code will cause a memory leak.
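The sketch below shows why the queue is suitable only for tests: once it exists, every reported severity is queued and stays there until something reads it. `SketchAlarm` is a hypothetical stand-in that models only the severity queue. Its set_severity queues every value unconditionally, which is a simplification, and integer severities are used in place of AlarmSeverity.

```python
import asyncio


class SketchAlarm:
    """Hypothetical stand-in that models only the severity queue."""

    def __init__(self):
        self.severity_queue = None

    def init_severity_queue(self):
        self.severity_queue = asyncio.Queue()

    def flush_severity_queue(self):
        while not self.severity_queue.empty():
            self.severity_queue.get_nowait()

    async def set_severity(self, severity):
        # Simplification: queue every severity if the queue exists.
        if self.severity_queue is not None:
            self.severity_queue.put_nowait(severity)

    async def assert_next_severity(
        self, expected_severity, check_empty=True, timeout=10
    ):
        if self.severity_queue is None:
            raise RuntimeError("call init_severity_queue first")
        severity = await asyncio.wait_for(
            self.severity_queue.get(), timeout=timeout
        )
        assert severity == expected_severity
        if check_empty:
            assert self.severity_queue.empty()


async def demo():
    alarm = SketchAlarm()
    alarm.init_severity_queue()
    await alarm.set_severity(2)
    await alarm.assert_next_severity(2)

    # Unread severities accumulate without bound, which is the leak
    # that production code would suffer from.
    for _ in range(3):
        await alarm.set_severity(3)
    backlog = alarm.severity_queue.qsize()
    alarm.flush_severity_queue()
    return backlog, alarm.severity_queue.qsize()


result = asyncio.run(demo())
```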

async make_log_entry(log_server_url)

Post a message to the narrative log in response to the alarm.

Parameters:
log_server_url: str

URL of the narrativelog service.

Returns:
response: dict

JSON response from the POST request.

async mute(duration, severity, user)

Mute this alarm for a specified duration and severity.

Muting also cancels the escalation timer.

Parameters:
duration: float

How long to mute the alarm (sec).

severity: lsst.ts.idl.enums.Watcher.AlarmSeverity or int

Severity to mute. If the alarm’s current or max severity goes above this level the alarm should be displayed.

user: str

Name of user who muted this alarm. Used to set muted_by.

Raises:
ValueError

If duration <= 0, severity == AlarmSeverity.NONE or severity is not a valid AlarmSeverity enum value.

Notes

An alarm cannot have multiple mute levels and durations. If mute is called multiple times, the most recent call overwrites information from earlier calls.
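The argument checks and the overwrite behavior can be sketched as follows. `SketchAlarm` is a hypothetical stand-in, the `AlarmSeverity` values are assumed, and the attribute names `muted_severity` and `mute_duration` are invented for the example (only muted_by appears in the text above). The unmute timer is omitted and the method is synchronous for brevity.

```python
import enum


class AlarmSeverity(enum.IntEnum):
    """Stand-in for lsst.ts.idl.enums.Watcher.AlarmSeverity (values assumed)."""

    NONE = 1
    WARNING = 2
    SERIOUS = 3
    CRITICAL = 4


class SketchAlarm:
    """Hypothetical stand-in that models only the mute bookkeeping."""

    def __init__(self):
        self.muted_by = ""
        self.muted_severity = AlarmSeverity.NONE  # hypothetical attribute name
        self.mute_duration = 0.0  # hypothetical attribute name

    def mute(self, duration, severity, user):
        if duration <= 0:
            raise ValueError(f"duration={duration} must be > 0")
        # Enum lookup raises ValueError for values that are not members.
        severity = AlarmSeverity(severity)
        if severity == AlarmSeverity.NONE:
            raise ValueError("severity must not be NONE")
        # A new call simply overwrites whatever an earlier call stored.
        self.mute_duration = duration
        self.muted_severity = severity
        self.muted_by = user


alarm = SketchAlarm()
alarm.mute(60, AlarmSeverity.WARNING, "alice")
alarm.mute(10, AlarmSeverity.SERIOUS, "bob")  # replaces alice's mute
```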

reset()

Reset the alarm to nominal state.

Do not call the callback function. This is designed to be called by Model.enable, which first resets alarms and then feeds them data before writing alarm state.

It sets too many fields to be called by set_severity.

async run_callback()

Run the callback function, if present.
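Because the callback may be either a plain function or a coroutine, one way to run it is to call it and await the result only when it is awaitable. The free function `run_callback` below is a hypothetical sketch of that pattern, not the method's actual implementation; plain strings stand in for alarm objects.

```python
import asyncio
import inspect


async def run_callback(callback, alarm):
    """Hypothetical helper: call callback(alarm), awaiting the result if needed."""
    if callback is None:
        return
    result = callback(alarm)
    if inspect.isawaitable(result):
        await result


async def demo():
    seen = []

    def sync_callback(alarm):
        seen.append(("sync", alarm))

    async def async_callback(alarm):
        seen.append(("async", alarm))

    await run_callback(sync_callback, "alarm-1")
    await run_callback(async_callback, "alarm-2")
    await run_callback(None, "alarm-3")  # no callback configured: nothing happens
    return seen


seen = asyncio.run(demo())
```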

async set_severity(severity, reason)

Set the severity.

Call the callback function unless the alarm was nominal and remains nominal. Put the new severity on the severity queue (if it exists), regardless of whether the alarm was nominal.

Parameters:
severity: lsst.ts.idl.enums.Watcher.AlarmSeverity or int

New severity.

reason: str

The reason for this state; this should be a brief message explaining what is wrong. Ignored if severity is NONE.

Returns:
updated: bool

True if the alarm state changed (i.e. if any fields were modified), False otherwise.
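The rule for when the callback runs can be sketched with a small stand-in. `SketchAlarm` and its `AlarmSeverity` enum are hypothetical (the enum values are assumed), and the sketch assumes that max_severity records the highest severity seen so far. Acknowledgement, reset, timers and the reason argument are all left out, and the method is synchronous for brevity.

```python
import enum


class AlarmSeverity(enum.IntEnum):
    """Stand-in for lsst.ts.idl.enums.Watcher.AlarmSeverity (values assumed)."""

    NONE = 1
    WARNING = 2
    SERIOUS = 3
    CRITICAL = 4


class SketchAlarm:
    """Hypothetical stand-in that models only the callback rule."""

    def __init__(self, callback=None):
        self.severity = AlarmSeverity.NONE
        self.max_severity = AlarmSeverity.NONE
        self.callback = callback

    @property
    def nominal(self):
        return (
            self.severity == AlarmSeverity.NONE
            and self.max_severity == AlarmSeverity.NONE
        )

    def set_severity(self, severity):
        was_nominal = self.nominal
        self.severity = AlarmSeverity(severity)
        # Assumption: max_severity tracks the highest severity seen so far.
        self.max_severity = max(self.max_severity, self.severity)
        if was_nominal and self.nominal:
            return  # nominal before and after: stay silent
        if self.callback is not None:
            self.callback(self)


calls = []
alarm = SketchAlarm(callback=lambda a: calls.append((a.severity, a.max_severity)))
alarm.set_severity(AlarmSeverity.NONE)  # nominal stays nominal: no callback
alarm.set_severity(AlarmSeverity.WARNING)  # callback runs
alarm.set_severity(AlarmSeverity.NONE)  # not nominal (max is WARNING): callback runs
```

In this sketch the last call leaves the alarm with severity NONE but max severity WARNING, so it is not nominal and the callback still runs.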

async unacknowledge(escalate=True)

Unacknowledge the alarm. Basically a no-op if nominal or not acknowledged.

Parameters:
escalate: bool, optional

Escalate the alarm, if max_severity is critical? Defaults to true. Only set false if automatically unacknowledging the alarm.

Returns:
updated: bool

True if the alarm state changed (i.e. if any fields were modified), False otherwise.

async unmute()

Unmute this alarm.