AnoViT: Unsupervised Anomaly Detection and Localization With Vision Transformer-Based Encoder-Decoder

Lee, Yunseung; Kang, Pilsung

doi:10.1109/ACCESS.2022.3171559

Full metadata record

DC Field	Value	Language
dc.contributor.author	Lee, Yunseung	-
dc.contributor.author	Kang, Pilsung	-
dc.date.accessioned	2022-06-12T16:40:19Z	-
dc.date.available	2022-06-12T16:40:19Z	-
dc.date.created	2022-06-09	-
dc.date.issued	2022	-
dc.identifier.issn	2169-3536	-
dc.identifier.uri	https://scholar.korea.ac.kr/handle/2021.sw.korea/142155	-
dc.description.abstract	Image anomaly detection problems aim to determine whether an image is abnormal, and to detect anomalous areas. These methods are actively used in various fields such as manufacturing, medical care, and intelligent information. Encoder-decoder structures have been widely used in the field of anomaly detection because they can easily learn normal patterns in an unsupervised learning environment and calculate a score to identify abnormalities through a reconstruction error indicating the difference between input and reconstructed images. Therefore, current image anomaly detection methods have commonly used convolutional encoder-decoders to extract normal information through the local features of images. However, they are limited in that only local features of the image can be utilized when constructing a normal representation owing to the characteristics of convolution operations using a filter of fixed size. Therefore, we propose a vision transformer-based encoder-decoder model, named AnoViT, designed to reflect normal information by additionally learning the global relationship between image patches, which is capable of both image anomaly detection and localization. While existing vision transformers perform image classification using only a class token, the proposed approach constructs a feature map that maintains the existing location information of individual patches by using the embeddings of all patches passed through multiple self-attention layers. Subsequently, the feature map, which has been transformed into three dimensions, is used to perform decoding. This design preserves the spatial information sufficiently by excluding the fully-connected layer, which extracts latent vectors in existing convolution-based encoder-decoders. The proposed AnoViT model performed better than the convolution-based model on three benchmark datasets. In MVTecAD, which is a representative benchmark dataset for anomaly localization, it showed improved results on 10 out of 15 classes compared with the baseline. Furthermore, the proposed method showed good performance regardless of the class and type of the anomalous area when localization results were evaluated qualitatively.	-
dc.language	English	-
dc.language.iso	en	-
dc.publisher	IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC	-
dc.title	AnoViT: Unsupervised Anomaly Detection and Localization With Vision Transformer-Based Encoder-Decoder	-
dc.type	Article	-
dc.contributor.affiliatedAuthor	Kang, Pilsung	-
dc.identifier.doi	10.1109/ACCESS.2022.3171559	-
dc.identifier.scopusid	2-s2.0-85129657863	-
dc.identifier.wosid	000791717500001	-
dc.identifier.bibliographicCitation	IEEE ACCESS, v.10, pp.46717 - 46724	-
dc.relation.isPartOf	IEEE ACCESS	-
dc.citation.title	IEEE ACCESS	-
dc.citation.volume	10	-
dc.citation.startPage	46717	-
dc.citation.endPage	46724	-
dc.type.rims	ART	-
dc.type.docType	Article	-
dc.description.journalClass	1	-
dc.description.isOpenAccess	Y	-
dc.description.journalRegisteredClass	scie	-
dc.description.journalRegisteredClass	scopus	-
dc.relation.journalResearchArea	Computer Science	-
dc.relation.journalResearchArea	Engineering	-
dc.relation.journalResearchArea	Telecommunications	-
dc.relation.journalWebOfScienceCategory	Computer Science, Information Systems	-
dc.relation.journalWebOfScienceCategory	Engineering, Electrical & Electronic	-
dc.relation.journalWebOfScienceCategory	Telecommunications	-
dc.subject.keywordAuthor	Anomaly detection	-
dc.subject.keywordAuthor	Image reconstruction	-
dc.subject.keywordAuthor	Transformers	-
dc.subject.keywordAuthor	Location awareness	-
dc.subject.keywordAuthor	Task analysis	-
dc.subject.keywordAuthor	Feature extraction	-
dc.subject.keywordAuthor	Decoding	-
dc.subject.keywordAuthor	anomaly localization	-
dc.subject.keywordAuthor	vision transformer	-
dc.subject.keywordAuthor	MVTecAD	-
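
The abstract describes two steps that a small example can make concrete: rearranging the embeddings of all patches into a three-dimensional feature map that keeps each patch's location, and scoring anomalies from the reconstruction error between the input and the reconstructed image. The following is a minimal sketch in plain Python, not the authors' implementation. The function names, the assumption that the first token is the class token, the row-major patch order, the squared-error metric, and the use of the maximum error as the image-level score are all illustrative assumptions that the record does not specify.

```python
from typing import List

Matrix = List[List[float]]


def tokens_to_feature_map(tokens: List[List[float]], grid_h: int, grid_w: int) -> List[Matrix]:
    """Rearrange encoder output tokens into a (channels, grid_h, grid_w) feature map.

    Assumption of this sketch: tokens[0] is the class token and the remaining
    grid_h * grid_w tokens are patch embeddings in row-major order.
    """
    patches = tokens[1:]  # drop the class token, keep every patch embedding
    if len(patches) != grid_h * grid_w:
        raise ValueError("number of patch tokens must equal grid_h * grid_w")
    dim = len(patches[0])
    # Channel d at grid cell (r, c) is component d of the patch at that cell,
    # so each patch keeps its original spatial position.
    return [
        [[patches[r * grid_w + c][d] for c in range(grid_w)] for r in range(grid_h)]
        for d in range(dim)
    ]


def anomaly_map(image: Matrix, reconstruction: Matrix) -> Matrix:
    """Per-pixel squared reconstruction error, usable as a localization map."""
    return [
        [(x - y) ** 2 for x, y in zip(row_in, row_out)]
        for row_in, row_out in zip(image, reconstruction)
    ]


def anomaly_score(error_map: Matrix) -> float:
    """Image-level score: the largest per-pixel error (one possible choice)."""
    return max(max(row) for row in error_map)
```

In this sketch, a decoder would consume the output of `tokens_to_feature_map` directly, with no flattening into a latent vector in between, which is the property the abstract credits with preserving spatial information. Pixels with a large value in `anomaly_map` are the candidates for anomalous regions.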

Files in This Item: There are no files associated with this item.

Appears in Collections: College of Engineering > School of Industrial and Management Engineering > 1. Journal Articles
