Towards robust explanations for deep neural networks
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Dombrowski, Ann-Kathrin | - |
dc.contributor.author | Anders, Christopher J. | - |
dc.contributor.author | Mueller, Klaus-Robert | - |
dc.contributor.author | Kessel, Pan | - |
dc.date.accessioned | 2022-02-23T03:41:25Z | - |
dc.date.available | 2022-02-23T03:41:25Z | - |
dc.date.created | 2022-02-11 | - |
dc.date.issued | 2022-01 | - |
dc.identifier.issn | 0031-3203 | - |
dc.identifier.uri | https://scholar.korea.ac.kr/handle/2021.sw.korea/136578 | - |
dc.description.abstract | Explanation methods shed light on the decision process of black-box classifiers such as deep neural networks. But their usefulness can be compromised because they are susceptible to manipulations. With this work, we aim to enhance the resilience of explanations. We develop a unified theoretical framework for deriving bounds on the maximal manipulability of a model. Based on these theoretical insights, we present three different techniques to boost robustness against manipulation: training with weight decay, smoothing activation functions, and minimizing the Hessian of the network. Our experimental results confirm the effectiveness of these approaches. (c) 2021 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). | - |
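The abstract names three robustness techniques. Two of them can be illustrated in a few lines: replacing ReLU with a softplus activation (which, unlike ReLU, is twice differentiable, so the curvature terms that make gradient-based explanations manipulable stay bounded), and adding weight decay to the gradient update (which shrinks weight norms). This is a minimal NumPy sketch of the general ideas only, not the authors' implementation; the function names, the `beta` parameter, and the update rule are illustrative assumptions.

```python
import numpy as np

def relu(x):
    """Standard ReLU; its derivative is discontinuous at 0."""
    return np.maximum(0.0, x)

def softplus(x, beta=5.0):
    """Smooth stand-in for ReLU (illustrative; beta is a sharpness knob).

    softplus(x) = log(1 + exp(beta * x)) / beta converges to ReLU as
    beta grows, but is infinitely differentiable everywhere, which
    keeps the network's second derivatives (Hessian) finite.
    """
    return np.log1p(np.exp(beta * x)) / beta

def sgd_step_with_weight_decay(w, grad, lr=0.1, weight_decay=1e-2):
    """One SGD step with L2 weight decay (illustrative update rule).

    Weight decay adds weight_decay * w to the gradient, pulling the
    weights toward zero and thereby reducing the network's effective
    Lipschitz constant.
    """
    return w - lr * (grad + weight_decay * w)
```

For example, `softplus(10.0)` is nearly `relu(10.0) == 10.0`, while near zero the two differ but softplus remains smooth; and with a zero loss gradient, `sgd_step_with_weight_decay` still shrinks the weights slightly each step.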
dc.language | English | - |
dc.language.iso | en | - |
dc.publisher | ELSEVIER SCI LTD | - |
dc.title | Towards robust explanations for deep neural networks | - |
dc.type | Article | - |
dc.contributor.affiliatedAuthor | Mueller, Klaus-Robert | - |
dc.identifier.doi | 10.1016/j.patcog.2021.108194 | - |
dc.identifier.scopusid | 2-s2.0-85112531912 | - |
dc.identifier.wosid | 000701175900010 | - |
dc.identifier.bibliographicCitation | PATTERN RECOGNITION, v.121 | - |
dc.relation.isPartOf | PATTERN RECOGNITION | - |
dc.citation.title | PATTERN RECOGNITION | - |
dc.citation.volume | 121 | - |
dc.type.rims | ART | - |
dc.type.docType | Article | - |
dc.description.journalClass | 1 | - |
dc.description.journalRegisteredClass | scie | - |
dc.description.journalRegisteredClass | scopus | - |
dc.relation.journalResearchArea | Computer Science | - |
dc.relation.journalResearchArea | Engineering | - |
dc.relation.journalWebOfScienceCategory | Computer Science, Artificial Intelligence | - |
dc.relation.journalWebOfScienceCategory | Engineering, Electrical & Electronic | - |
dc.subject.keywordAuthor | Explanation method | - |
dc.subject.keywordAuthor | Saliency map | - |
dc.subject.keywordAuthor | Adversarial attacks | - |
dc.subject.keywordAuthor | Manipulation | - |
dc.subject.keywordAuthor | Neural networks | - |
Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.
(02841) 145 Anam-ro, Seongbuk-gu, Seoul | Tel: 02-3290-1114
COPYRIGHT © 2021 Korea University. All Rights Reserved.
Certain data included herein are derived from the © Web of Science of Clarivate Analytics. All rights reserved.
You may not copy or re-distribute this material in whole or in part without the prior written consent of Clarivate Analytics.