Invention Grant
US07975175B2 Risk indices for enhanced throughput in computing systems (In force)

Risk indices for enhanced throughput in computing systems
Abstract:
Embodiments of a system that adjusts a checkpointing frequency in a distributed computing system that executes multiple jobs are described. During operation, the system receives signals associated with the operation of computing nodes in the distributed computing system. Then, the system determines risk metrics for the computing nodes by using a pattern-recognition technique to identify anomalous signals among the received signals. Next, the system adjusts a checkpointing frequency of a given checkpoint for a given computing node based on a comparison of the risk metric associated with that computing node and a threshold. This implements holistic fault tolerance, in which prediction and prevention of potential faults occur across the distributed computing system.
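The flow described in the abstract (receive signals, derive a per-node risk metric from anomalous signals, compare it with a threshold, adjust the checkpointing frequency) can be illustrated with a minimal Python sketch. This is not the patented method: the abstract does not name the pattern-recognition technique, so a simple z-score outlier test stands in for it here. The function names `risk_metric` and `adjust_interval`, the threshold value, the scaling factor, and the interval bounds are all hypothetical and chosen only for illustration.

```python
from statistics import mean, stdev


def risk_metric(baseline, recent, z_limit=3.0):
    """Return the fraction of recent samples flagged as anomalous.

    Assumption: a z-score test against a healthy baseline stands in for
    the unspecified pattern-recognition technique in the abstract.
    """
    if not recent:
        return 0.0
    mu = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation; needs >= 2 points
    if sigma == 0:
        flagged = [x for x in recent if x != mu]
    else:
        flagged = [x for x in recent if abs(x - mu) / sigma > z_limit]
    return len(flagged) / len(recent)


def adjust_interval(interval_s, risk, threshold=0.2, factor=2.0,
                    min_interval_s=60.0, max_interval_s=3600.0):
    """Return a new checkpoint interval in seconds for one node.

    A risk above the (hypothetical) threshold shortens the interval, so
    the node checkpoints more often; otherwise the interval is lengthened
    to reduce checkpointing overhead. The result is clamped to the bounds.
    """
    if risk > threshold:
        new_interval = interval_s / factor
    else:
        new_interval = interval_s * factor
    return max(min_interval_s, min(max_interval_s, new_interval))


# Hypothetical telemetry for one node, e.g. temperature readings.
baseline = [50.0, 51.0, 49.0, 50.5, 49.5]
recent = [50.2, 49.8, 58.0, 61.0]

risk = risk_metric(baseline, recent)
interval = adjust_interval(600.0, risk)
```

In this sketch a node whose recent readings drift away from its baseline gets a shorter checkpoint interval, while a node with a low risk metric checkpoints less often, which is one plausible way such a scheme could improve overall throughput.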