Automatic Server Recovery (ASR) is configured in the ROM-Based Setup Utility (RBSU), which is available during the initial boot of the server by pressing the F9 key when prompted.
The Server Availability menu in RBSU includes options that configure the ASR features:
• ASR Status
• ASR Timeout
ASR Status
The ASR Status option is a toggle setting that either enables or disables ASR. When set to Disabled, no ASR features function.
ASR Timeout
The ASR Timeout option sets a timeout limit for resetting a server that is not responding. When the server has not responded in the selected amount of time, the server automatically resets.
The available time increments are:
• 5 minutes
• 10 minutes
• 15 minutes
• 20 minutes
• 30 minutes
This ASR feature is implemented using a “heartbeat” timer that continually counts down. The Health Monitor frequently reloads the counter to prevent it from counting down to zero.
If the ASR timer counts down to zero, the operating system is assumed to have locked up, and the system automatically attempts to reboot.
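The countdown-and-reload behavior described above can be illustrated with a small simulation. This is a minimal sketch, not the actual ASR firmware or Health Monitor code: the class `HeartbeatTimer`, the function `simulate`, the one-second tick, and the reload interval are all assumptions made for illustration.

```python
class HeartbeatTimer:
    """Toy model of a countdown watchdog (hypothetical, not the ASR hardware)."""

    def __init__(self, timeout_minutes: int) -> None:
        # Assumption: one tick equals one second.
        self.timeout_ticks = timeout_minutes * 60
        self.remaining = self.timeout_ticks
        self.reset_triggered = False

    def reload(self) -> None:
        """Restart the countdown; a healthy monitor calls this regularly."""
        self.remaining = self.timeout_ticks

    def tick(self, seconds: int = 1) -> None:
        """Advance the countdown and flag a reset once it reaches zero."""
        if self.reset_triggered:
            return
        self.remaining = max(0, self.remaining - seconds)
        if self.remaining == 0:
            self.reset_triggered = True


def simulate(timeout_minutes: int, reload_every: int,
             hang_at: int, duration: int) -> bool:
    """Return True if the timer expires within `duration` seconds.

    The simulated monitor reloads the timer every `reload_every` seconds
    until second `hang_at`, after which the operating system is treated
    as locked up and no further reloads occur.
    """
    timer = HeartbeatTimer(timeout_minutes)
    for second in range(1, duration + 1):
        os_alive = second < hang_at
        if os_alive and second % reload_every == 0:
            timer.reload()
        timer.tick()
        if timer.reset_triggered:
            return True
    return False


if __name__ == "__main__":
    # Healthy system: the counter is reloaded before it can reach zero.
    print(simulate(10, reload_every=30, hang_at=10**9, duration=3600))
    # System hangs after two minutes: the counter eventually expires.
    print(simulate(10, reload_every=30, hang_at=120, duration=3600))
```

In this model, a reset occurs only after a full timeout interval has passed since the last successful reload, so a shorter ASR Timeout value recovers a hung server sooner.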
Events that may contribute to the operating system locking up include:
• A peripheral device, such as a Peripheral Component Interconnect (PCI) adapter, that generates numerous spurious interrupts when it fails.
• A high-priority software application consumes all the available central processing unit (CPU) cycles and does not allow the operating system scheduler to run the ASR timer reset process.
• A software or kernel application consumes all available memory, including the virtual memory space (for example, swap). This may cause the operating system scheduler to cease functioning.
• A critical operating system component, such as a file system, fails and causes the operating system scheduler to cease functioning.
• Any other event besides an ASR timeout that causes a Non-Maskable Interrupt (NMI) to be generated.
The ASR feature is a hardware-based timer. If a true hardware failure occurs, the Health Monitor might not be called, but the server is reset as if the power switch were pressed. The ProLiant ROM code may log an event to the Integrated Management Log (IML) when the server reboots.
The Health Monitor is notified of an ASR timeout through an NMI. If possible, the driver attempts to perform the following actions:
• Display a message on the console stating the problem.
• Make an entry in the IML.
• Attempt to gracefully shut down the operating system to close the file systems.
However, there is no guarantee that the operating system will shut down gracefully. Whether the shutdown succeeds depends on the type of error condition (software or hardware) and its severity. The Health Monitor logs a series of messages when an ASR event occurs. The presence or absence of these messages can provide some insight into the reason for the ASR event. The order of the messages is important, because the ASR event is always a symptom of another error condition.
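The best-effort nature of the timeout handling can be sketched as follows. This is an illustrative model only, not the Health Monitor driver: the function `handle_asr_timeout` and the step names are hypothetical. Each step is attempted in order, a failure in one step does not prevent the next from being tried, and the recorded outcomes show which steps completed.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], None]]


def handle_asr_timeout(steps: List[Step]) -> List[Tuple[str, bool]]:
    """Attempt every step in order and record whether each one succeeded.

    Hypothetical helper: a step that raises is marked as failed, and the
    remaining steps are still attempted.
    """
    results: List[Tuple[str, bool]] = []
    for name, action in steps:
        try:
            action()
            results.append((name, True))
        except Exception:
            results.append((name, False))
    return results


if __name__ == "__main__":
    def show_console_message() -> None:
        print("ASR timeout detected")

    def write_log_entry() -> None:
        # Simulate a step that cannot complete under the error condition.
        raise OSError("log unavailable")

    def shut_down_gracefully() -> None:
        print("closing file systems")

    outcome = handle_asr_timeout([
        ("console message", show_console_message),
        ("log entry", write_log_entry),
        ("graceful shutdown", shut_down_gracefully),
    ])
    for name, ok in outcome:
        print(f"{name}: {'done' if ok else 'failed'}")
```

As with the logged messages, the pattern of completed and failed steps in this sketch hints at how severe the underlying condition was.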