Chicago Journal of Theoretical Computer Science

Volume 1998

Article 4

Published by MIT Press. Copyright 1998 Massachusetts Institute of Technology.

Multitolerance in Distributed Reset

Sandeep S. Kulkarni (The Ohio State University) and Anish Arora (The Ohio State University)
7 December 1998

A reset of a distributed system is safe if it does not complete "prematurely," i.e., without having reset some process in the system. Safe resets are possible in the presence of certain faults, such as process fail-stops and repairs, but are not always possible in the presence of more general faults, such as arbitrary transients. In this paper, we design a bounded-memory distributed-reset program that possesses two tolerances: (1) in the presence of fail-stops and repairs, it always executes resets safely, and (2) in the presence of a finite number of transient faults, it eventually executes resets safely. Designing this multitolerance in the reset program introduces the novel concern of designing a safety detector that is itself multitolerant. A broad application of our multitolerant safety detector is to make any total program likewise multitolerant.

DOI: 10.4086/cjtcs.1998.004

Last modified: Tue Feb 9 20:50:58 CST 1999