Invention Grant
- Patent Title: Recording a communication pattern and replaying messages in a parallel computing system
- Patent Title (中): 记录通信模式并在并行计算系统中重播消息
- Application No.: US12500715
- Application Date: 2009-07-10
- Publication No.: US08407376B2
- Publication Date: 2013-03-26
- Inventor: Philip Heidelberger, Sameer Kumar
- Applicant: Philip Heidelberger, Sameer Kumar
- Applicant Address: US NY Armonk
- Assignee: International Business Machines Corporation
- Current Assignee: International Business Machines Corporation
- Current Assignee Address: US NY Armonk
- Agency: Ryan, Mason & Lewis, LLP
- Main IPC: G06F13/28
- IPC: G06F13/28

Abstract:
A parallel computer system includes a plurality of compute nodes. Each of the compute nodes includes at least one processor, at least one memory, and a direct memory address engine coupled to the at least one processor and the at least one memory. The system also includes a network interconnecting the plurality of compute nodes. The network operates a global message-passing application for performing communications across the network. Local instances of the global message-passing application operate at each of the compute nodes to carry out local processing operations independent of processing operations carried out at another one of the compute nodes. The direct memory address engines are configured to interact with the local instances of the global message-passing application via injection FIFO metadata describing an injection FIFO in a corresponding one of the memories. The local instances of the global message passing application are configured to record, in the injection FIFO in the corresponding one of the memories, message descriptors associated with messages of an arbitrary communication pattern in an iteration of an executing application program. The local instances of the global message passing application are configured to replay the message descriptors during a subsequent iteration of the executing application program.
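The record-and-replay idea in the abstract can be illustrated with a minimal sketch. This is not the patented implementation: the names MessageDescriptor, InjectionFifo, and run_iterations, and the descriptor fields, are hypothetical and chosen only for illustration. The sketch assumes the communication pattern is identical in every iteration, so descriptors built once in the first iteration can be reused unchanged afterwards.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class MessageDescriptor:
    """Hypothetical descriptor: what an engine would need to send one message."""
    dest_node: int
    buffer_offset: int
    length: int


class InjectionFifo:
    """Toy stand-in for an injection FIFO held in a node's memory."""

    def __init__(self) -> None:
        self.descriptors: List[MessageDescriptor] = []

    def record(self, desc: MessageDescriptor) -> None:
        # Recording pass: append each descriptor in the order it is issued.
        self.descriptors.append(desc)

    def replay(self, inject: Callable[[MessageDescriptor], None]) -> int:
        # Replay pass: hand the stored descriptors to the engine in their
        # recorded order, without rebuilding them.
        for desc in self.descriptors:
            inject(desc)
        return len(self.descriptors)


def run_iterations(
    pattern: List[Tuple[int, int, int]], iterations: int
) -> List[Tuple[int, MessageDescriptor]]:
    """Record the pattern in iteration 0, then replay it in later iterations.

    Returns a log of (iteration, descriptor) pairs standing in for messages
    that a real engine would inject into the network.
    """
    fifo = InjectionFifo()
    sent: List[Tuple[int, MessageDescriptor]] = []
    for it in range(iterations):
        if it == 0:
            # First iteration: build, record, and send each descriptor.
            for dest, offset, length in pattern:
                desc = MessageDescriptor(dest, offset, length)
                fifo.record(desc)
                sent.append((it, desc))
        else:
            # Subsequent iterations: replay the recorded descriptors.
            fifo.replay(lambda d, it=it: sent.append((it, d)))
    return sent
```

Calling run_iterations([(1, 0, 64), (3, 64, 128)], 3) logs the same two descriptors once per iteration. Only the first iteration constructs them, which is the saving the abstract describes.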
Public/Granted literature
- US20110010471A1 Recording A Communication Pattern and Replaying Messages in a Parallel Computing System, Public/Granted day: 2011-01-13