Exploiting thread-level parallelism in lockstep execution by partially duplicating a single pipeline
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Oh, Jaegeun | - |
dc.contributor.author | Hwang, Seok Joong | - |
dc.contributor.author | Nguyen, Huong Giang | - |
dc.contributor.author | Kim, Areum | - |
dc.contributor.author | Kim, Seon Wook | - |
dc.contributor.author | Kim, Chulwoo | - |
dc.contributor.author | Kim, Jong-Kook | - |
dc.date.accessioned | 2021-09-09T05:34:06Z | - |
dc.date.available | 2021-09-09T05:34:06Z | - |
dc.date.created | 2021-06-10 | - |
dc.date.issued | 2008-08 | - |
dc.identifier.issn | 1225-6463 | - |
dc.identifier.uri | https://scholar.korea.ac.kr/handle/2021.sw.korea/122917 | - |
dc.description.abstract | In most parallel loops of embedded applications, every iteration executes the exact same sequence of instructions while manipulating different data. This fact motivates a new compiler-hardware orchestrated execution framework in which all parallel threads share one fetch unit and one decode unit but have their own execution, memory, and write-back units. This resource sharing enables parallel threads to execute in lockstep with minimal hardware extension and compiler support. Our proposed architecture, called multithreaded lockstep execution processor (MLEP), is a compromise between the single-instruction multiple-data (SIMD) and symmetric multithreading/chip multiprocessor (SMT/CMP) solutions. The proposed approach is more favorable than a typical SIMD execution in terms of degree of parallelism, range of applicability, and code generation, and can save more power and chip area than the SMT/CMP approach without significant performance degradation. For the architecture verification, we extend a commercial 32-bit embedded core AE32000C and synthesize it on Xilinx FPGA. Compared to the original architecture, our approach is 13.5% faster with a 2-way MLEP and 33.7% faster with a 4-way MLEP in EEMBC benchmarks which are automatically parallelized by the Intel compiler. | - |
dc.language | English | - |
dc.language.iso | en | - |
dc.publisher | WILEY | - |
dc.title | Exploiting thread-level parallelism in lockstep execution by partially duplicating a single pipeline | - |
dc.type | Article | - |
dc.contributor.affiliatedAuthor | Kim, Seon Wook | - |
dc.contributor.affiliatedAuthor | Kim, Chulwoo | - |
dc.contributor.affiliatedAuthor | Kim, Jong-Kook | - |
dc.identifier.doi | 10.4218/etrij.08.0107.0343 | - |
dc.identifier.scopusid | 2-s2.0-49449105358 | - |
dc.identifier.wosid | 000258418400009 | - |
dc.identifier.bibliographicCitation | ETRI JOURNAL, v.30, no.4, pp.576 - 586 | - |
dc.relation.isPartOf | ETRI JOURNAL | - |
dc.citation.title | ETRI JOURNAL | - |
dc.citation.volume | 30 | - |
dc.citation.number | 4 | - |
dc.citation.startPage | 576 | - |
dc.citation.endPage | 586 | - |
dc.type.rims | ART | - |
dc.type.docType | Article | - |
dc.identifier.kciid | ART001269287 | - |
dc.description.journalClass | 1 | - |
dc.description.journalRegisteredClass | scie | - |
dc.description.journalRegisteredClass | scopus | - |
dc.description.journalRegisteredClass | kci | - |
dc.relation.journalResearchArea | Engineering | - |
dc.relation.journalResearchArea | Telecommunications | - |
dc.relation.journalWebOfScienceCategory | Engineering, Electrical & Electronic | - |
dc.relation.journalWebOfScienceCategory | Telecommunications | - |
dc.subject.keywordAuthor | ILP | - |
dc.subject.keywordAuthor | TLP | - |
dc.subject.keywordAuthor | SMT | - |
dc.subject.keywordAuthor | CMP | - |
dc.subject.keywordAuthor | MLEP | - |
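
The lockstep idea described in the abstract (one shared fetch and decode, with separate execution, memory, and write-back per thread) can be illustrated with a small software model. This is only a sketch under stated assumptions, not the authors' hardware design: the four-opcode toy instruction set, the `Lane` class, and the `run_lockstep` and `demo` functions are all hypothetical names introduced here for illustration. Each lane runs the loop body `a[i] = a[i] * 2 + 1` for its own index `i`, while every instruction is fetched and decoded only once regardless of the number of lanes.

```python
from dataclasses import dataclass, field


@dataclass
class Lane:
    """Private per-thread state; stands in for one thread's own
    execution, memory, and write-back units (hypothetical model)."""
    regs: dict[str, int] = field(default_factory=dict)


def execute(lane: Lane, op: str, args: tuple, memory: list[int]) -> None:
    """Run one already-decoded instruction of the toy ISA on a single lane."""
    if op == "load":        # load rd, [ra]
        rd, ra = args
        lane.regs[rd] = memory[lane.regs[ra]]
    elif op == "muli":      # muli rd, ra, imm
        rd, ra, imm = args
        lane.regs[rd] = lane.regs[ra] * imm
    elif op == "addi":      # addi rd, ra, imm
        rd, ra, imm = args
        lane.regs[rd] = lane.regs[ra] + imm
    elif op == "store":     # store rs, [ra]
        rs, ra = args
        memory[lane.regs[ra]] = lane.regs[rs]
    else:
        raise ValueError(f"unknown opcode: {op}")


def run_lockstep(program: list[tuple], lanes: list[Lane],
                 memory: list[int]) -> int:
    """Fetch and decode each instruction once, then execute it on every lane.

    Returns the number of fetches, which does not depend on len(lanes).
    """
    fetches = 0
    for instr in program:
        fetches += 1                # shared fetch
        op, *args = instr           # shared decode
        for lane in lanes:          # per-lane execute / memory / write-back
            execute(lane, op, tuple(args), memory)
    return fetches


# Loop body for a[i] = a[i] * 2 + 1; r0 holds the lane's own index i.
BODY = [
    ("load", "r1", "r0"),
    ("muli", "r1", "r1", 2),
    ("addi", "r1", "r1", 1),
    ("store", "r1", "r0"),
]


def demo(ways: int = 4) -> tuple[list[int], int]:
    """Run one lockstep pass of BODY over `ways` lanes on a 4-element array."""
    memory = [10, 20, 30, 40]
    lanes = [Lane(regs={"r0": i}) for i in range(ways)]
    fetches = run_lockstep(BODY, lanes, memory)
    return memory, fetches
```

Calling `demo(4)` updates all four array elements with four instruction fetches, and `demo(2)` updates only the first two elements with the same four fetches. The model ignores timing, divergent control flow, and memory conflicts, so it says nothing about the speedups reported in the abstract.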