Web robot detection based on pattern-matching technique
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Kwon, Shinil | - |
dc.contributor.author | Kim, Young-Gab | - |
dc.contributor.author | Cha, Sungdeok | - |
dc.date.accessioned | 2021-09-06T21:46:53Z | - |
dc.date.available | 2021-09-06T21:46:53Z | - |
dc.date.created | 2021-06-18 | - |
dc.date.issued | 2012-04 | - |
dc.identifier.issn | 0165-5515 | - |
dc.identifier.uri | https://scholar.korea.ac.kr/handle/2021.sw.korea/108839 | - |
dc.description.abstract | In web robot detection it is important to find features that are common characteristics of diverse robots, in order to differentiate between them and humans. Existing approaches employ fairly simple features (e.g. empty referrer field, interval between successive requests), which often fail to reflect web robots' behaviour accurately. False alarms may therefore occur unacceptably often. In this paper we propose a fresh approach that expresses the behaviour of interactive users and various web robots in terms of a sequence of request types, called request patterns. Previous proposals have primarily targeted the detection of text crawlers, but our approach works well on many other web robots, such as image crawlers, email collectors and link checkers. In an empirical evaluation of more than 1 billion requests collected at www.microsoft.com, our approach achieved 94% accuracy in web robot detection, estimated by F-measure. A decision tree algorithm proposed by Tan and Kumar was also applied to the same data. A comparison shows that the proposed approach is more accurate, and that real-time detection of web robots is feasible. | - |
dc.language | English | - |
dc.language.iso | en | - |
dc.publisher | SAGE PUBLICATIONS LTD | - |
dc.subject | DISCOVERY | - |
dc.title | Web robot detection based on pattern-matching technique | - |
dc.type | Article | - |
dc.contributor.affiliatedAuthor | Kim, Young-Gab | - |
dc.contributor.affiliatedAuthor | Cha, Sungdeok | - |
dc.identifier.doi | 10.1177/0165551511435969 | - |
dc.identifier.scopusid | 2-s2.0-84861796095 | - |
dc.identifier.wosid | 000302629300002 | - |
dc.identifier.bibliographicCitation | JOURNAL OF INFORMATION SCIENCE, v.38, no.2, pp.118 - 126 | - |
dc.relation.isPartOf | JOURNAL OF INFORMATION SCIENCE | - |
dc.citation.title | JOURNAL OF INFORMATION SCIENCE | - |
dc.citation.volume | 38 | - |
dc.citation.number | 2 | - |
dc.citation.startPage | 118 | - |
dc.citation.endPage | 126 | - |
dc.type.rims | ART | - |
dc.type.docType | Article | - |
dc.description.journalClass | 1 | - |
dc.description.journalRegisteredClass | scie | - |
dc.description.journalRegisteredClass | ssci | - |
dc.description.journalRegisteredClass | scopus | - |
dc.relation.journalResearchArea | Computer Science | - |
dc.relation.journalResearchArea | Information Science & Library Science | - |
dc.relation.journalWebOfScienceCategory | Computer Science, Information Systems | - |
dc.relation.journalWebOfScienceCategory | Information Science & Library Science | - |
dc.subject.keywordPlus | DISCOVERY | - |
dc.subject.keywordAuthor | web robot detection | - |
dc.subject.keywordAuthor | web robot pattern | - |
dc.subject.keywordAuthor | human pattern | - |
dc.subject.keywordAuthor | pattern analysis | - |
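
The abstract describes sessions as sequences of request types and reports accuracy as an F-measure. The sketch below is only an illustration of those two ideas, not the authors' implementation: the type alphabet (`H`, `I`, `S`, `O`), the extension table, and the function names are all assumptions, since the record does not list the request types the paper actually uses. The F-measure shown is the standard harmonic mean of precision and recall.

```python
import os
from urllib.parse import urlparse

# Hypothetical request-type alphabet (an assumption, not taken from the paper):
# H = page, I = image, S = style/script, O = anything else.
TYPE_BY_EXTENSION = {
    ".html": "H", ".htm": "H",
    ".png": "I", ".jpg": "I", ".gif": "I",
    ".css": "S", ".js": "S",
}


def request_type(url: str) -> str:
    """Map one requested URL to a single-letter request type."""
    path = urlparse(url).path
    extension = os.path.splitext(path)[1].lower()
    if extension == "":
        # Assumption: extensionless paths such as "/" are treated as pages.
        return "H"
    return TYPE_BY_EXTENSION.get(extension, "O")


def request_pattern(urls) -> str:
    """Encode a session (an ordered list of URLs) as a string of request types."""
    return "".join(request_type(url) for url in urls)


def f_measure(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)
```

Under these assumptions, a session that fetches a page followed by its image and stylesheet is encoded as `HIS`, while a session that requests only pages yields a run of `H`; a detector could then compare such strings against known human and robot patterns.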