Detailed Information

Cited 0 time in webofscience Cited 0 time in scopus
Metadata Downloads

Examining the impact of data-access cost on XML twig pattern matching

Authors
Lee, SangKeunRyu, Byung-GulWu, Kun-Lung
Issue Date
25-Oct-2012
Publisher
ELSEVIER SCIENCE INC
Keywords
XML twig pattern matching; Labeling scheme; Branching node stream; Leaf node stream; Pointer structure
Citation
INFORMATION SCIENCES, v.203, pp.24 - 43
Indexed
SCIE
SCOPUS
Journal Title
INFORMATION SCIENCES
Volume
203
Start Page
24
End Page
43
URI
https://scholar.korea.ac.kr/handle/2021.sw.korea/107187
DOI
10.1016/j.ins.2012.03.011
ISSN
0020-0255
Abstract
To process a large size of XML document, data-access time dominates the whole system performance in most cases. However, few techniques exist today that optimize the data-access cost of performing twig pattern matching. TJFast [18] is one of the few that do. TJFast could reduce the number of elements scanned by deriving all the element names along the path from the root to the element with the extended Dewey label of an element alone. However, there is still much room for improvement. We empirically observe that (1) many irrelevant elements can still be accessed and processed by TJFast, unnecessarily incurring both data-access and computation overhead, and (2) there still exists substantial redundant label-to-element name decoding, needlessly increasing processing cost. In this paper, we present TJFast-BNS, an optimization of TJFast, to further reduce the data-access cost of twig pattern matching. TJFast-BNS efficiently identifies and filters out many irrelevant elements by introducing a new labeling scheme, termed E2Dewey. and a novel pointer structure. E2Dewey includes the total number of children of an element in the element's label. This is used to quickly identify unnecessary paths. The pointer structure to the descendants of a branching element supports random access to leaf and non-top branching elements. Extensive performance studies on various datasets clearly show that our approach accesses much fewer elements to process a twig query than others, leading to a superior performance gain in execution time. (C) 2012 Elsevier Inc. All rights reserved.
Files in This Item
There are no files associated with this item.
Appears in
Collections
Graduate School > Department of Artificial Intelligence > 1. Journal Articles
College of Informatics > Department of Computer Science and Engineering > 1. Journal Articles

qrcode

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Researcher LEE, Sang Keun photo

LEE, Sang Keun
Department of Artificial Intelligence
Read more

Altmetrics

Total Views & Downloads

BROWSE