Critical Errors

The Critical Error Log records non-correctable memory errors, as well as catastrophic hardware and software errors that cause a system to fail. This information helps you quickly identify and correct the problem, minimizing downtime.

This section describes each critical error. Each entry shows the date and time of the error, followed by a brief description. The time shown is rounded to the nearest hour.
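As an illustration of what rounding to the nearest hour means for the displayed time, here is a minimal Python sketch. The function name is hypothetical, and the handling of an error logged exactly on the half hour (rounded up here) is an assumption; the log itself does not document that case.

```python
from datetime import datetime, timedelta


def round_to_nearest_hour(ts: datetime) -> datetime:
    """Round a timestamp to the nearest whole hour.

    Assumption: a time exactly on the half hour rounds up.
    """
    floor = ts.replace(minute=0, second=0, microsecond=0)
    if ts - floor >= timedelta(minutes=30):
        return floor + timedelta(hours=1)
    return floor


# An error that occurred at 09:47:12 would be displayed with a time of 10:00.
shown = round_to_nearest_hour(datetime(2001, 3, 14, 9, 47, 12))
print(shown.strftime("%Y-%m-%d %H:%M"))
```

Because of this rounding, two errors that occurred up to an hour apart can appear with the same time in the log.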

Critical errors marked with an exclamation point (!) require corrective action; while any entry is marked this way, the log condition is degraded. To remove the exclamation point and indicate that an entry has been corrected, select the entries you wish to clear and click the Correct Marked Entries button, or run Diagnostics on the device. An asterisk (*) indicates the log entry to which the Last Failure Message applies.
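The relationship between the markers and the log condition can be sketched in Python as follows. This is only a model for illustration: the class, field, and function names are hypothetical, and the label "ok" for a log with no uncorrected entries is an assumption, not a term used by the product.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class CriticalEntry:
    when: datetime
    description: str
    needs_action: bool = False   # displayed with "!"
    last_failure: bool = False   # displayed with "*"


def marker(entry: CriticalEntry) -> str:
    """Return the marker column for an entry: '!' and/or '*', or blanks."""
    return ("!" if entry.needs_action else " ") + ("*" if entry.last_failure else " ")


def log_condition(entries: list[CriticalEntry]) -> str:
    """The log is degraded while any entry still requires corrective action."""
    return "degraded" if any(e.needs_action for e in entries) else "ok"


def correct_marked(entries: list[CriticalEntry], selected: list[int]) -> None:
    """Model of Correct Marked Entries: clear '!' on the selected entries."""
    for i in selected:
        entries[i].needs_action = False


log = [
    CriticalEntry(datetime(2001, 3, 14, 10), "Fan Failure", needs_action=True),
    CriticalEntry(datetime(2001, 3, 15, 2), "ASR Reset Occurred", last_failure=True),
]
print(log_condition(log))   # degraded
correct_marked(log, [0])
print(log_condition(log))   # ok
```

Note that clearing the exclamation point only changes the entry's status; the entry itself remains in the log.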

--------------------------------------------------------------------------------

IMPORTANT: To mark entries as corrected, the agents must have SNMP sets enabled and you must have the correct SNMP community string.

--------------------------------------------------------------------------------

The following list shows the errors that may be logged. If you receive any of these errors, run Diagnostics on your system or consult your software documentation.

Abnormal Program Termination - The device has detected a fatal software error resulting in a device failure.

ASR Base Memory Parity Error - The system detected a data error in base memory following a reset due to an ASR timeout.

ASR Extended Memory Parity Error - The system detected a data error in extended memory following a reset due to an ASR timeout.

ASR Memory Parity Error - The system ROM was unable to allocate enough memory to create a stack. It was unable to put a message on the screen or continue booting the server.

ASR Reset Limit Reached - The maximum number of system resets has been reached. The System Configuration Utilities will be loaded.

ASR Reset Occurred - No error data is logged.

ASR Test Event - An ASR Test event was generated by the user through the system utilities. No action is required since the event was user-generated to test the ASR configuration.

ASR Timeout NMI - The server has generated an ASR NMI because the ASR timer has not been refreshed. This generally indicates that a driver has not relinquished control of the processor, causing a server failure. The ASR NMI was generated to log this event. Note the module that was executing.

CPU Internal Corrected Error Threshold Exceeded - The system has detected that a CPU has exceeded the threshold for the number of internal ECC cache errors.

CPU Processor Power Module Failed - The system has detected that a processor’s power module has failed.

Critical Temperature - The system's critical temperature has been exceeded and auto shutdown has been initiated.

Error Detected On Bootup - The system detected an error during the Power-On Self-Test.

Exception - The processor has detected a critical exception resulting in a device failure.

Fan Failure - The system or processor fan failed.

NMI - CPU Local Error - The processor experienced a fatal error resulting in a device failure.

NMI - Expansion Board Error - A board on the expansion bus indicated an error condition causing a device failure.

NMI - Expansion Bus Arbitration Error - Memory refresh cycles were delayed, potentially leading to data loss. The error results in a system failure.

NMI - Expansion Bus Master Time-out - A bus master expansion board in the indicated slot did not release the bus after its maximum time resulting in a device failure.

NMI - Expansion Bus Slave Time-out - A board on the expansion bus delayed a bus cycle beyond the maximum time resulting in a device failure.

NMI - Failsafe Timer Expiration - The software was unable to reset the system failsafe timer, resulting in a system failure.

NMI - Processor Address Error 1 - A processor internal address parity checking error occurred, resulting in a device failure.

NMI - Processor Address Error 2 - The processor detected an address parity error during an inquire cycle.

NMI - Processor Cache Parity Error - A data error occurred within the processor cache, resulting in a system failure.

NMI - Processor Internal Error 1 - A processor internal parity error occurred, resulting in a device failure.

NMI - Processor Internal Error 2 - The processor detected an internal parity error or a functional redundancy error.

NMI - Processor Parity Error - The processor detected a data error resulting in a device failure.

NMI - Software Generated Interrupt - Software indicated a system error resulting in a system failure.

NMI - System Concurrency Error - A potential error condition was detected within the Data Flow Manager, resulting in a system failure.

NMI - Uncorrectable Memory Error - The device experienced an uncorrectable memory parity error resulting in a device failure.

NMI - Unknown Error Type - The device driver does not recognize this NMI. You may need to upgrade your health driver.

Processor Failure - The processor failed during the Power-On Self-Test.

Server Manager Failure - An error occurred in the server interface with the Server Manager/R.

UPS A/C Line Failure/Shutdown or Battery Low - The device has initiated a UPS or operating system shutdown, or the battery is almost depleted after an AC line failure.

The Last Failure Message field in this window displays the most recent failure message associated with a critical error.