Exploiting Retraining-Based Mixed-Precision Quantization for Low-Cost DNN Accelerator Design
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Kim, Nahsung | - |
dc.contributor.author | Shin, Dongyeob | - |
dc.contributor.author | Choi, Wonseok | - |
dc.contributor.author | Kim, Geonho | - |
dc.contributor.author | Park, Jongsun | - |
dc.date.accessioned | 2021-11-17T18:40:26Z | - |
dc.date.available | 2021-11-17T18:40:26Z | - |
dc.date.created | 2021-08-30 | - |
dc.date.issued | 2021-07 | - |
dc.identifier.issn | 2162-237X | - |
dc.identifier.uri | https://scholar.korea.ac.kr/handle/2021.sw.korea/127783 | - |
dc.description.abstract | For successful deployment of deep neural networks (DNNs) on resource-constrained devices, retraining-based quantization has been widely adopted to reduce the number of DRAM accesses. By properly setting training parameters, such as batch size and learning rate, the bit widths of both weights and activations can be uniformly quantized down to 4 bits while maintaining full-precision accuracy. In this article, we present a retraining-based mixed-precision quantization approach and its customized DNN accelerator to achieve high energy efficiency. In the proposed quantization, an additional bit (an extra quantization level) is assigned in the middle of retraining to weights that have shown frequent switching between two contiguous quantization levels, since such switching indicates that neither level can reduce the quantization loss. We also mitigate the gradient noise that occurs during retraining by using a lower learning rate near the quantization threshold. For the proposed mixed-precision quantized network (MPQ-network), we have implemented a customized accelerator using a 65-nm CMOS process. In the accelerator, the proposed processing elements (PEs) can be dynamically reconfigured to process variable bit widths from 2 to 4 bits for both weights and activations. The numerical results show that the proposed quantization achieves a 1.37x better compression ratio for VGG-9 on the CIFAR-10 data set compared with a uniform 4-bit (weights and activations) model, without loss of classification accuracy. The proposed accelerator also achieves 1.29x energy savings for VGG-9 on the CIFAR-10 data set over the state-of-the-art accelerator. | - |
dc.language | English | - |
dc.language.iso | en | - |
dc.publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC | - |
dc.title | Exploiting Retraining-Based Mixed-Precision Quantization for Low-Cost DNN Accelerator Design | - |
dc.type | Article | - |
dc.contributor.affiliatedAuthor | Park, Jongsun | - |
dc.identifier.doi | 10.1109/TNNLS.2020.3008996 | - |
dc.identifier.scopusid | 2-s2.0-85111951645 | - |
dc.identifier.wosid | 000670541500011 | - |
dc.identifier.bibliographicCitation | IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, v.32, no.7, pp.2925 - 2938 | - |
dc.relation.isPartOf | IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS | - |
dc.citation.title | IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS | - |
dc.citation.volume | 32 | - |
dc.citation.number | 7 | - |
dc.citation.startPage | 2925 | - |
dc.citation.endPage | 2938 | - |
dc.type.rims | ART | - |
dc.type.docType | Article | - |
dc.description.journalClass | 1 | - |
dc.description.journalRegisteredClass | scie | - |
dc.description.journalRegisteredClass | scopus | - |
dc.relation.journalResearchArea | Computer Science | - |
dc.relation.journalResearchArea | Engineering | - |
dc.relation.journalWebOfScienceCategory | Computer Science, Artificial Intelligence | - |
dc.relation.journalWebOfScienceCategory | Computer Science, Hardware & Architecture | - |
dc.relation.journalWebOfScienceCategory | Computer Science, Theory & Methods | - |
dc.relation.journalWebOfScienceCategory | Engineering, Electrical & Electronic | - |
dc.subject.keywordAuthor | Deep neural network (DNN) accelerator | - |
dc.subject.keywordAuthor | energy-efficient accelerator | - |
dc.subject.keywordAuthor | model compression | - |
dc.subject.keywordAuthor | quantization | - |
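The abstract's core idea, assigning an extra bit to weights that frequently switch between two contiguous quantization levels during retraining, can be illustrated with a minimal sketch. This is not the paper's implementation: the 2-bit base grid, the random-noise stand-in for gradient updates, and the switch-count threshold of 30 are all illustrative assumptions.

```python
import numpy as np


def quantize(w, levels):
    """Map each weight to the index of its nearest quantization level."""
    idx = np.abs(levels[None, :] - w[:, None]).argmin(axis=1)
    return idx


rng = np.random.default_rng(0)
levels = np.linspace(-1.0, 1.0, 2 ** 2)  # uniform 2-bit grid (assumed base precision)
w = rng.uniform(-1.0, 1.0, size=8)       # toy weight vector

prev_idx = quantize(w, levels)
switch_count = np.zeros_like(prev_idx)

for step in range(100):
    # Stand-in for gradient updates during retraining.
    w += rng.normal(0.0, 0.02, size=w.shape)
    idx = quantize(w, levels)
    # A move between two contiguous levels suggests neither level fits the weight well.
    switch_count += (np.abs(idx - prev_idx) == 1)
    prev_idx = idx

# Weights that oscillate frequently would receive an extra bit, i.e. a finer
# local grid between the two contested levels (threshold chosen for illustration).
needs_extra_bit = switch_count > 30
```

In the paper's scheme the selected weights keep mixed precision through the rest of retraining, which is what the reconfigurable 2- to 4-bit PEs in the accelerator are designed to exploit.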