Silent-PIM: Realizing the Processing-in-Memory Computing With Standard Memory Requests
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Kim, Chang Hyun | - |
dc.contributor.author | Lee, Won Jun | - |
dc.contributor.author | Paik, Yoonah | - |
dc.contributor.author | Kwon, Kiyong | - |
dc.contributor.author | Kim, Seok Young | - |
dc.contributor.author | Park, Il | - |
dc.contributor.author | Kim, Seon Wook | - |
dc.date.accessioned | 2022-02-10T13:41:10Z | - |
dc.date.available | 2022-02-10T13:41:10Z | - |
dc.date.created | 2022-02-09 | - |
dc.date.issued | 2022-02-01 | - |
dc.identifier.issn | 1045-9219 | - |
dc.identifier.uri | https://scholar.korea.ac.kr/handle/2021.sw.korea/135224 | - |
dc.description.abstract | Deep Neural Network (DNN) and Recurrent Neural Network (RNN) applications, which are rapidly becoming attractive to the market, process a large amount of low-locality data; thus, memory bandwidth limits their peak performance. Therefore, many data centers are actively adopting high-bandwidth memory such as HBM2/HBM2E to resolve the problem. However, this approach does not provide a complete solution since it still transfers the data from the memory to the computing unit. Thus, processing-in-memory (PIM), which performs the computation inside memory, has attracted attention. However, most previous methods require modifying or extending core pipelines and memory system components such as memory controllers, making the practical implementation of PIM very challenging and expensive to develop. In this article, we propose Silent-PIM, which performs the PIM computation with standard DRAM memory requests, thus requiring no hardware modifications and allowing the PIM memory device to perform the computation while servicing non-PIM applications' memory requests. We achieve this design goal by preserving the standard memory request behaviors and satisfying the DRAM standard timing requirements. In addition, using standard memory requests makes it possible to use DMA as a PIM offloading engine, which processes the PIM memory requests quickly and frees the core to perform other tasks. We compared the performance of three Long Short-Term Memory (LSTM) model kernels on real platforms: the Silent-PIM modeled on an FPGA, a GPU, and a CPU. For (p x 512) x (512 x 2048) matrix multiplication with a batch size p varying from 1 to 128, the Silent-PIM performed up to 16.9x and 24.6x faster than the GPU and CPU, respectively, at p = 1, which was the case without any data reuse. At p = 128, the highest data reuse case, the GPU performance was the highest, but the PIM performance was still higher than that of the CPU. Similarly, for (p x 2048) element-wise multiplication and addition, where there was no data reuse, the Silent-PIM always achieved higher performance than both the CPU and GPU. The results also showed that the PIM's EDP performance was superior to the others in all the cases having no data reuse. | - |
dc.language | English | - |
dc.language.iso | en | - |
dc.publisher | IEEE COMPUTER SOC | - |
dc.title | Silent-PIM: Realizing the Processing-in-Memory Computing With Standard Memory Requests | - |
dc.type | Article | - |
dc.contributor.affiliatedAuthor | Kim, Seon Wook | - |
dc.identifier.doi | 10.1109/TPDS.2021.3065365 | - |
dc.identifier.scopusid | 2-s2.0-85102706801 | - |
dc.identifier.wosid | 000690438400002 | - |
dc.identifier.bibliographicCitation | IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, v.33, no.2, pp.251 - 262 | - |
dc.relation.isPartOf | IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS | - |
dc.citation.title | IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS | - |
dc.citation.volume | 33 | - |
dc.citation.number | 2 | - |
dc.citation.startPage | 251 | - |
dc.citation.endPage | 262 | - |
dc.type.rims | ART | - |
dc.type.docType | Article | - |
dc.description.journalClass | 1 | - |
dc.description.journalRegisteredClass | scie | - |
dc.description.journalRegisteredClass | scopus | - |
dc.relation.journalResearchArea | Computer Science | - |
dc.relation.journalResearchArea | Engineering | - |
dc.relation.journalWebOfScienceCategory | Computer Science, Theory & Methods | - |
dc.relation.journalWebOfScienceCategory | Engineering, Electrical & Electronic | - |
dc.subject.keywordAuthor | Silent-PIM | - |
dc.subject.keywordAuthor | in-memory processing | - |
dc.subject.keywordAuthor | standard memory requests | - |
dc.subject.keywordAuthor | DMA | - |
dc.subject.keywordAuthor | LSTM | - |
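
The link the abstract draws between batch size p and data reuse can be illustrated with a rough operation count. The sketch below is not taken from the article: it assumes each weight element of the (512 x 2048) matrix is read from memory exactly once per product and ignores caches and data types. The function name `weight_reuse` is hypothetical.

```python
def weight_reuse(p, k=512, n=2048):
    """Multiply-accumulates per weight element for a (p x k) x (k x n) product.

    Assumption (not from the article): every one of the k*n weight
    elements is read from memory exactly once per product.
    """
    macs = p * k * n        # one multiply-accumulate per (row, inner index, column) triple
    weights_read = k * n    # number of elements in the (k x n) weight matrix
    return macs / weights_read


for p in (1, 2, 4, 8, 16, 32, 64, 128):
    print(f"p = {p:3d}: {weight_reuse(p):6.1f} MACs per weight element read")
```

Under this assumption the reuse factor equals p. At p = 1 each weight is used once, so the product is bound by how fast weights can be moved, which is the case where the abstract reports the largest Silent-PIM speedup. At p = 128 each weight is reused 128 times, the case where the abstract reports the GPU as fastest.
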