Achieving the Performance of All-Bank In-DRAM PIM With Standard Memory Interface: Memory-Computation Decoupling

Full metadata record
DC Field | Value | Language
dc.contributor.author | Paik, Yoonah | -
dc.contributor.author | Kim, Chang Hyun | -
dc.contributor.author | Lee, Won Jun | -
dc.contributor.author | Kim, Seon Wook | -
dc.date.accessioned | 2022-12-12T00:40:57Z | -
dc.date.available | 2022-12-12T00:40:57Z | -
dc.date.created | 2022-12-08 | -
dc.date.issued | 2022 | -
dc.identifier.issn | 2169-3536 | -
dc.identifier.uri | https://scholar.korea.ac.kr/handle/2021.sw.korea/147124 | -
dc.description.abstract | Processing-in-Memory (PIM) has been actively studied as a way to overcome the memory bottleneck by placing computing units near or in memory, especially for efficiently processing data-intensive applications with low locality. In-DRAM PIMs can be categorized by how many banks perform the PIM computation per DRAM command: per-bank and all-bank. The per-bank PIM operates only one bank at a time, delivering low performance but preserving the standard DRAM interface and servicing non-PIM requests during PIM execution. The all-bank PIM operates all banks at once, achieving high performance but introducing design issues such as heat and power consumption. We introduce memory-computation decoupling execution to achieve the ideal all-bank PIM performance while preserving the standard JEDEC DRAM interface, i.e., performing per-bank execution, so that it is easily adapted to commercial platforms. We divide PIM execution into two phases: a memory phase and a computation phase. In the memory phase, we read the bank-private operands from each bank and store them in the PIM engines' registers, bank by bank. In the computation phase, we decouple the PIM engine from the memory array and broadcast a bank-shared operand using a standard read/write command so that all banks perform the computation simultaneously, thus reaching the computing throughput of the all-bank PIM. To extend the computation phase, i.e., to maximize the opportunity for all-bank execution, we introduce a compiler analysis and code generation technique that identifies the bank-private and bank-shared operands. We compared the performance of Level-2/3 BLAS, a multi-batch LSTM-based Seq2Seq model, and BERT on our decoupled PIM with commercial computing platforms. In Level-3 BLAS, we achieved speedups of 75.8x, 1.2x, and 4.7x over the CPU, the GPU, and the per-bank PIM, respectively, and up to 91.4% of the ideal all-bank PIM performance. Furthermore, our decoupled PIM consumed 72.0% and 78.4% less energy than the GPU and the per-bank PIM, respectively, but 7.4% more than the ideal all-bank PIM. | -
dc.language | English | -
dc.language.iso | en | -
dc.publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC | -
dc.subject | DEEP NEURAL-NETWORKS | -
dc.subject | ACCELERATOR | -
dc.subject | LATENCY | -
dc.subject | V2 | -
dc.title | Achieving the Performance of All-Bank In-DRAM PIM With Standard Memory Interface: Memory-Computation Decoupling | -
dc.type | Article | -
dc.contributor.affiliatedAuthor | Kim, Seon Wook | -
dc.identifier.doi | 10.1109/ACCESS.2022.3203051 | -
dc.identifier.scopusid | 2-s2.0-85137583844 | -
dc.identifier.wosid | 000873917300001 | -
dc.identifier.bibliographicCitation | IEEE ACCESS, v.10, pp.93256 - 93272 | -
dc.relation.isPartOf | IEEE ACCESS | -
dc.citation.title | IEEE ACCESS | -
dc.citation.volume | 10 | -
dc.citation.startPage | 93256 | -
dc.citation.endPage | 93272 | -
dc.type.rims | ART | -
dc.type.docType | Article | -
dc.description.journalClass | 1 | -
dc.description.isOpenAccess | Y | -
dc.description.journalRegisteredClass | scie | -
dc.description.journalRegisteredClass | scopus | -
dc.relation.journalResearchArea | Computer Science | -
dc.relation.journalResearchArea | Engineering | -
dc.relation.journalResearchArea | Telecommunications | -
dc.relation.journalWebOfScienceCategory | Computer Science, Information Systems | -
dc.relation.journalWebOfScienceCategory | Engineering, Electrical & Electronic | -
dc.relation.journalWebOfScienceCategory | Telecommunications | -
dc.subject.keywordPlus | DEEP NEURAL-NETWORKS | -
dc.subject.keywordPlus | ACCELERATOR | -
dc.subject.keywordPlus | LATENCY | -
dc.subject.keywordPlus | V2 | -
dc.subject.keywordAuthor | Memory-computation decoupling | -
dc.subject.keywordAuthor | in-memory processing | -
dc.subject.keywordAuthor | standard memory interface | -
dc.subject.keywordAuthor | all-bank execution | -
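
The two-phase execution described in the abstract can be illustrated with a small toy model. The sketch below is not the authors' implementation. It assumes a matrix-vector product in which each matrix row is the bank-private operand of one bank and each vector element is a bank-shared operand. The names PimEngine, decoupled_gemv, and per_bank_gemv are hypothetical, and the command counts rest on a simplified assumption of one command per register load, per broadcast, and per single-bank multiply-accumulate.

```python
from dataclasses import dataclass, field


@dataclass
class PimEngine:
    """Toy per-bank PIM engine: a register file plus one accumulator."""
    regs: list = field(default_factory=list)
    acc: float = 0.0


def decoupled_gemv(matrix, vector, num_banks):
    """Toy model of memory-computation decoupling (an assumption-laden sketch).

    Returns (result, commands), where commands is the modeled command count.
    """
    assert len(matrix) == num_banks, "toy model: one matrix row per bank"
    engines = [PimEngine() for _ in range(num_banks)]
    commands = 0

    # Memory phase: read each bank's private operands into its engine's
    # registers, one bank at a time (modeled as one command per bank).
    for bank, engine in enumerate(engines):
        engine.regs = list(matrix[bank])
        commands += 1

    # Computation phase: each bank-shared operand is broadcast once, and
    # every engine consumes it in the same step (one command per broadcast).
    for j, shared in enumerate(vector):
        commands += 1
        for engine in engines:
            engine.acc += engine.regs[j] * shared

    return [engine.acc for engine in engines], commands


def per_bank_gemv(matrix, vector, num_banks):
    """Baseline toy model: every multiply-accumulate targets a single bank."""
    assert len(matrix) == num_banks, "toy model: one matrix row per bank"
    result, commands = [], 0
    for bank in range(num_banks):
        acc = 0.0
        for j, shared in enumerate(vector):
            acc += matrix[bank][j] * shared
            commands += 1  # one command drives one bank
        result.append(acc)
    return result, commands


if __name__ == "__main__":
    m = [[1, 2], [3, 4], [5, 6], [7, 8]]
    v = [10, 1]
    print(decoupled_gemv(m, v, 4))
    print(per_bank_gemv(m, v, 4))
```

In this model the decoupled variant issues one command per bank plus one per shared operand, while the per-bank baseline issues one per bank per shared operand, so the gap widens as the number of banks and the length of the shared vector grow. Real DRAM timing, register capacity, and burst sizes are deliberately left out.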
Files in This Item
There are no files associated with this item.
Appears in Collections
College of Engineering > School of Electrical Engineering > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Kim, Seon Wook
College of Engineering (School of Electrical Engineering)