Invention Grant
- Patent Title: Reliable fault resolution in a cluster
- Patent Title (中): 群集中可靠的故障解决方案
- Application No.: US11773707
- Application Date: 2007-07-05
- Publication No.: US07941690B2
- Publication Date: 2011-05-10
- Inventor: Sudhir G. Rao, Bruce M. Jackson, Mark C. Davis, Srikanth N. Sridhara
- Applicant: Sudhir G. Rao, Bruce M. Jackson, Mark C. Davis, Srikanth N. Sridhara
- Applicant Address: US NY Armonk
- Assignee: International Business Machines Corporation
- Current Assignee: International Business Machines Corporation
- Current Assignee Address: US NY Armonk
- Agency: Lieberman & Brandsdorfer, LLC
- Main IPC: G06F11/00
- IPC: G06F11/00

Abstract:
A method and system for localizing and resolving a fault in a cluster environment. The cluster is configured with at least one multi-homed node, and at least one gateway for each network interface. Heartbeat messages are sent between peer nodes and the gateway at predefined periodic intervals. In the event of loss of a heartbeat message by any node or gateway, an ICMP echo is issued to each node and gateway in the cluster for each network interface. If neither a node loss nor a network loss is validated in response to the ICMP echo, an application-level ping is issued to determine whether the fault associated with the absence of the heartbeat message is a transient error condition or an application software fault.
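The decision sequence in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `Fault`, `classify_fault`, `icmp_echo`, and `app_ping` are hypothetical, and the rules used to validate a node loss or a network loss are assumptions, since the abstract does not spell them out. The probes are passed in as callables because a real ICMP echo typically needs raw-socket privileges.

```python
from enum import Enum
from typing import Callable, Iterable, Tuple

# (host name, network interface) pair; hosts include peer nodes and gateways.
Endpoint = Tuple[str, str]


class Fault(Enum):
    NETWORK_LOSS = "network loss"
    NODE_LOSS = "node loss"
    TRANSIENT = "transient error condition"
    APPLICATION = "application software fault"


def classify_fault(
    suspect: str,
    endpoints: Iterable[Endpoint],
    icmp_echo: Callable[[Endpoint], bool],
    app_ping: Callable[[str], bool],
) -> Fault:
    """Classify the fault behind a missed heartbeat from `suspect`."""
    # Step 1: issue an ICMP echo to every node and gateway on every interface.
    replies = {ep: icmp_echo(ep) for ep in endpoints}

    # Assumed rule: if nothing on an interface answers, that network is lost.
    for iface in sorted({i for _, i in replies}):
        if not any(ok for (_, i), ok in replies.items() if i == iface):
            return Fault.NETWORK_LOSS

    # Assumed rule: if the suspect is silent on all of its interfaces
    # while the networks themselves are reachable, the node is lost.
    suspect_replies = [ok for (host, _), ok in replies.items() if host == suspect]
    if suspect_replies and not any(suspect_replies):
        return Fault.NODE_LOSS

    # Step 2: neither loss was validated, so fall back to an
    # application-level ping. Assumed rule: a reply means the missed
    # heartbeat was transient; no reply points at the application software.
    return Fault.TRANSIENT if app_ping(suspect) else Fault.APPLICATION
```

In a test harness the two probes can be simple lambdas over a table of reachable endpoints; in a deployment they would wrap whatever ICMP and application-level ping facilities the cluster software provides.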
Public/Granted literature
- US20100115338A1, Reliable Fault Resolution In A Cluster, Public/Granted day: 2010-05-06