Fault Tolerance Technique Offlining Faulty Blocks by Heap Memory Management

Authors
Jun, Jaeyung; Paik, Yoonah; Min, Gyeong Il; Kim, Seon Wook; Han, Youngsun
Issue Date
July 2019
Publisher
Association for Computing Machinery (ACM)
Keywords
DRAM fault recovery
Citation
ACM Transactions on Design Automation of Electronic Systems, v.24, no.4
Indexed
SCIE
SCOPUS
Journal Title
ACM Transactions on Design Automation of Electronic Systems
Volume
24
Number
4
URI
https://scholar.korea.ac.kr/handle/2021.sw.korea/64646
DOI
10.1145/3329079
ISSN
1084-4309
Abstract
As dynamic random access memory (DRAM) cells continue to be scaled down for higher density and capacity, they become more prone to faults, so DRAM reliability has become a major concern in computer systems. Previous studies have proposed many techniques for preserving reliability at various system levels, such as DRAM internals, memory controllers, caches, and operating systems. Reviewing these techniques, we identified two considerations. First, faults can be recovered with reasonable overhead at high fault rates only if the recovery unit is fine-grained. Second, because hardware modifications add cost to deploying a technique, a pure software-based recovery technique is preferable. However, the existing software-based recovery technique uses a recovery unit (a whole page) that is too coarse-grained to tolerate high fault rates. In this article, we propose a pure software-based recovery technique with fine granularity. Our key idea builds on the fact that the system library manages heap segments as variable-sized chunks to serve dynamic allocations in user applications. In our technique, faulty blocks within pages are offlined by marking them as allocated chunks. Thus, not only fault-free pages but also the remaining clean blocks in faulty pages remain usable. We implemented the technique by modifying the operating system and the system library. Because no hardware assistance is required, we evaluated our method on a real machine. Our evaluation results show that our technique has negligible performance overhead at a high bit error rate (BER) of 5.12e-5, a rate that a hardware-based recovery technique could not tolerate without unacceptable area overhead. At the same BER, our method also provides 5.22x the usable space of page-offline, the state-of-the-art pure software-based technique.
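The key idea, offlining a faulty block by recording it as an already-allocated heap chunk, can be illustrated with a small model. The following is a minimal Python sketch, not the authors' implementation (which modifies the operating system and the system library). Every name in it (`PAGE_SIZE`, `BLOCK_SIZE`, `Chunk`, `build_heap`, `toy_malloc`, `usable_bytes_block_offline`, `usable_bytes_page_offline`, `clean_fraction`) is hypothetical. The 4 KiB page and 64-byte block sizes are assumptions, and real allocators also keep per-chunk headers and free-list bins that this model ignores.

```python
from dataclasses import dataclass

# Hypothetical parameters for illustration; the abstract does not state them.
PAGE_SIZE = 4096   # bytes per page (assumed)
BLOCK_SIZE = 64    # bytes per fault-tracking block (assumed)


@dataclass
class Chunk:
    offset: int           # byte offset of the chunk within the heap region
    size: int             # chunk size in bytes
    allocated: bool       # True for live allocations and for offlined blocks
    faulty: bool = False  # True if the chunk only exists to fence off a fault


def build_heap(num_pages, faulty_blocks):
    """Carve num_pages pages into chunks, turning every faulty block into a
    permanently allocated chunk so the allocator can never hand it out."""
    total = num_pages * PAGE_SIZE
    chunks, cursor = [], 0
    for blk in sorted(set(faulty_blocks)):
        start = blk * BLOCK_SIZE
        if not 0 <= start < total:
            raise ValueError(f"block {blk} lies outside the heap")
        if start > cursor:  # clean bytes before this fault stay free
            chunks.append(Chunk(cursor, start - cursor, allocated=False))
        chunks.append(Chunk(start, BLOCK_SIZE, allocated=True, faulty=True))
        cursor = start + BLOCK_SIZE
    if cursor < total:  # clean tail after the last fault
        chunks.append(Chunk(cursor, total - cursor, allocated=False))
    return chunks


def toy_malloc(chunks, size):
    """First-fit allocation with splitting. Offlined chunks are skipped
    simply because they are already marked allocated."""
    for i, c in enumerate(chunks):
        if not c.allocated and c.size >= size:
            if c.size > size:  # split off the unused remainder as a free chunk
                chunks.insert(i + 1, Chunk(c.offset + size, c.size - size, allocated=False))
                c.size = size
            c.allocated = True
            return c.offset
    return None  # no free chunk is large enough


def usable_bytes_block_offline(chunks):
    """With block-level offlining, every byte except the faulty blocks is usable."""
    return sum(c.size for c in chunks if not c.faulty)


def usable_bytes_page_offline(num_pages, faulty_blocks):
    """Page-offline retires any page that contains at least one faulty block."""
    bad_pages = {blk * BLOCK_SIZE // PAGE_SIZE for blk in faulty_blocks}
    return (num_pages - len(bad_pages)) * PAGE_SIZE


def clean_fraction(unit_bytes, ber):
    """Probability that a unit has no bit errors, assuming independent,
    uniformly distributed bit errors at rate ber."""
    return (1.0 - ber) ** (unit_bytes * 8)
```

For example, `build_heap(2, [3, 70])` models two pages with one faulty block in each. Page-offline would retire both pages (`usable_bytes_page_offline(2, [3, 70])` is 0), while block-level offlining keeps 8064 of the 8192 bytes, and `toy_malloc` simply routes around the fenced-off blocks. Under the independent-error assumption, `clean_fraction` at a BER of 5.12e-5 gives roughly 0.187 for a 4 KiB page and roughly 0.974 for a 64-byte block, a ratio near 5.2. That happens to be close to the reported 5.22x, but the paper's figure comes from its own evaluation, so treat this as a back-of-the-envelope sanity check rather than a derivation.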
Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Kim, Seon Wook
College of Engineering (School of Electrical Engineering)
