Automatic extraction of named entities of cyber threats using a deep Bi-LSTM-CRF network

Kim, Gyeongmin; Lee, Chanhee; Jo, Jaechoon; Lim, Heuiseok

doi:10.1007/s13042-020-01122-6

Cited 1 time in Web of Science

Cited 3 times in Scopus

Full metadata record

DC Field	Value	Language
dc.contributor.author	Kim, Gyeongmin	-
dc.contributor.author	Lee, Chanhee	-
dc.contributor.author	Jo, Jaechoon	-
dc.contributor.author	Lim, Heuiseok	-
dc.date.accessioned	2021-08-30T13:02:07Z	-
dc.date.available	2021-08-30T13:02:07Z	-
dc.date.created	2021-06-18	-
dc.date.issued	2020-10	-
dc.identifier.issn	1868-8071	-
dc.identifier.uri	https://scholar.korea.ac.kr/handle/2021.sw.korea/52665	-
dc.description.abstract	Countless cyber threat intelligence (CTI) reports are used by companies around the world on a daily basis for security reasons. To secure critical cybersecurity information, analysts and individuals should accordingly analyze information on threats and vulnerabilities. However, analyzing such overwhelming volumes of reports requires considerable time and effort. In this study, we propose a novel approach that automatically extracts core information from CTI reports using a named entity recognition (NER) system. During the process of constructing our proposed NER system, we defined meaningful keywords in the security domain as entities, including malware, domain/URL, IP address, Hash, and Common Vulnerabilities and Exposures. Furthermore, we linked these keywords with the words extracted from the text data of the report. To achieve a higher performance, we utilized the character-level feature vector as an input to bidirectional long-short-term memory using a conditional random field network. We finally achieved an average F1-score of 75.05%. We release 498,000 tag datasets created during our research.	-
dc.language	English	-
dc.language.iso	en	-
dc.publisher	SPRINGER HEIDELBERG	-
dc.title	Automatic extraction of named entities of cyber threats using a deep Bi-LSTM-CRF network	-
dc.type	Article	-
dc.contributor.affiliatedAuthor	Lim, Heuiseok	-
dc.identifier.doi	10.1007/s13042-020-01122-6	-
dc.identifier.scopusid	2-s2.0-85085090748	-
dc.identifier.wosid	000530236500001	-
dc.identifier.bibliographicCitation	INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, v.11, no.10, pp.2341 - 2355	-
dc.relation.isPartOf	INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS	-
dc.citation.title	INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS	-
dc.citation.volume	11	-
dc.citation.number	10	-
dc.citation.startPage	2341	-
dc.citation.endPage	2355	-
dc.type.rims	ART	-
dc.type.docType	Article	-
dc.description.journalClass	1	-
dc.description.journalRegisteredClass	scie	-
dc.description.journalRegisteredClass	scopus	-
dc.relation.journalResearchArea	Computer Science	-
dc.relation.journalWebOfScienceCategory	Computer Science, Artificial Intelligence	-
dc.subject.keywordAuthor	Cybersecurity	-
dc.subject.keywordAuthor	Vulnerability	-
dc.subject.keywordAuthor	Cyber threat intelligence	-
dc.subject.keywordAuthor	Named entity recognition	-
dc.subject.keywordAuthor	Bidirectional long-short-term memory conditional random field	-
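The abstract describes an NER system that labels words in CTI reports with entity types such as malware, domain/URL, IP address, hash, and CVE. As a minimal sketch of how per-token tagger output could be turned into extracted entities, the snippet below collapses a tag sequence into (type, text) pairs. It assumes a BIO tagging scheme and the label names MALWARE, IP, and CVE, neither of which is stated in the record; the function name `bio_to_spans` and the sample sentence are invented for illustration, and no Bi-LSTM-CRF model is involved here.

```python
from typing import List, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collapse per-token BIO tags into (entity_type, entity_text) pairs.

    Assumption: tags look like "B-TYPE", "I-TYPE", or "O". An "I-" tag that
    does not continue an open entity of the same type is treated as "O".
    """
    spans: List[Tuple[str, str]] = []
    current_type = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Emit the entity collected so far, if any.
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_type = tag[2:]
            current_tokens = [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
        else:
            flush()
            current_type = None
            current_tokens = []
    flush()
    return spans


# Invented example sentence and hypothetical label names.
tokens = ["Foo", "Loader", "contacts", "192.0.2.10", "and", "exploits", "CVE-2017-11882"]
tags = ["B-MALWARE", "I-MALWARE", "O", "B-IP", "O", "O", "B-CVE"]
print(bio_to_spans(tokens, tags))
```

In a full pipeline, the `tags` list would come from the trained tagger rather than being written by hand; the span-collapsing step stays the same.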

Files in This Item: There are no files associated with this item.

Appears in Collections: Graduate School > Department of Computer Science and Engineering > 1. Journal Articles

Certain data included herein are derived from the © Web of Science of Clarivate Analytics. All rights reserved.
You may not copy or re-distribute this material in whole or in part without the prior written consent of Clarivate Analytics.
