FRASystem: fault tolerant system using agents in distributed computing systems
- Authors
- Lee, HwaMin; Park, DooSoon; Yu, HeonChang; Lee, Giyeol
- Issue Date
- March 2011
- Publisher
- SPRINGER
- Keywords
- Fault tolerance; Multi-agent system; Distributed computing system; Rollback-recovery; Garbage-collection
- Citation
- CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, v.14, no.1, pp.15 - 25
- Indexed
- SCIE; SCOPUS
- Journal Title
- CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS
- Volume
- 14
- Number
- 1
- Start Page
- 15
- End Page
- 25
- URI
- https://scholar.korea.ac.kr/handle/2021.sw.korea/112891
- DOI
- 10.1007/s10586-009-0095-x
- ISSN
- 1386-7857
- Abstract
- In this paper, we present FRASystem (Fault Tolerant & Recovery Agent System), a fault tolerance and recovery system that uses multiple agents in distributed computing systems. Previous rollback-recovery protocols depended on the inherent communication mechanism and the underlying operating system, which degraded computing performance. We propose a rollback-recovery protocol that works independently of the operating system, which improves portability and extensibility. We define four types of agents: (1) a recovery agent, which performs the rollback-recovery protocol after a failure; (2) an information agent, which constructs domain knowledge as fault tolerance rules and information during failure-free operation; (3) a facilitator agent, which controls communication between agents; and (4) a garbage collection agent, which performs garbage collection of useless fault tolerance information. Since agent failures may lead to inconsistent system states and a domino effect, we propose an agent recovery algorithm. A garbage collection protocol addresses the performance degradation caused by the growth of fault tolerance information saved in stable storage. We implemented a prototype of FRASystem using Java and CORBA and evaluated the proposed rollback-recovery protocol experimentally. The simulation results indicate that our protocol performs better than previous rollback-recovery protocols that use independent checkpointing and pessimistic message logging without agents. Our contributions are as follows: (1) this is the first rollback-recovery protocol using agents, (2) FRASystem does not depend on the operating system, and (3) FRASystem provides portability and extensibility.
- Appears in Collections
- Graduate School > Department of Computer Science and Engineering > 1. Journal Articles
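The abstract refers to independent checkpointing, pessimistic message logging, and garbage collection of fault tolerance information that is no longer needed. The following is a minimal Java sketch of those three ideas for a single process, not the FRASystem implementation: the class name `LoggedProcess`, its methods, and the use of a running sum as the process state are all assumptions made for illustration, and stable storage is simulated by fields that survive a crash.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; these names do not come from FRASystem.
class LoggedProcess {
    // Simulated stable storage: these fields survive crash().
    private long checkpointState = 0;
    private final List<Long> messageLog = new ArrayList<>();

    // Volatile state: lost when the process crashes.
    private long state = 0;

    // Pessimistic logging: the message is logged before it is applied.
    void deliver(long message) {
        messageLog.add(message);
        state += message;
    }

    // Independent checkpoint: save the current state, then garbage-collect
    // the log entries that the new checkpoint makes useless.
    // Returns the number of log entries collected.
    int checkpoint() {
        checkpointState = state;
        int collected = messageLog.size();
        messageLog.clear();
        return collected;
    }

    // Simulated failure: volatile state is wiped.
    void crash() {
        state = 0;
    }

    // Rollback-recovery: restore the last checkpoint, then replay the
    // messages logged after it, in their original order.
    void recover() {
        state = checkpointState;
        for (long message : messageLog) {
            state += message;
        }
    }

    long state() {
        return state;
    }

    int logSize() {
        return messageLog.size();
    }

    public static void main(String[] args) {
        LoggedProcess p = new LoggedProcess();
        p.deliver(5);
        p.deliver(7);
        System.out.println("collected at checkpoint: " + p.checkpoint());
        p.deliver(3);
        p.crash();
        p.recover();
        System.out.println("state after recovery: " + p.state());
    }
}
```

In this sketch, recovery restores the pre-failure state (15 in the example run) because every message was logged before delivery, and the log stays short because each checkpoint discards the entries it supersedes. In the system the abstract describes, recovery and garbage collection are handled by separate agents; the sketch collapses them into one class for brevity.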