SSFile: A novel column-store for efficient data analysis in Hadoop-based distributed systems
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Son, Jihoon | - |
dc.contributor.author | Ryu, Hyoseok | - |
dc.contributor.author | Yi, Sungmin | - |
dc.contributor.author | Chung, Yon Dohn | - |
dc.date.accessioned | 2021-09-04T12:35:05Z | - |
dc.date.available | 2021-09-04T12:35:05Z | - |
dc.date.created | 2021-06-18 | - |
dc.date.issued | 2015-09-20 | - |
dc.identifier.issn | 0020-0255 | - |
dc.identifier.uri | https://scholar.korea.ac.kr/handle/2021.sw.korea/92458 | - |
dc.description.abstract | Recently, large-scale relational data analysis has gained much attention. Several Hadoop-based distributed systems have been proposed for scalable relational data analysis. Because the column-store approach is very suitable for analytic queries, many studies on column-oriented storage and query processing for Hadoop-based distributed systems have been conducted. However, two problems have arisen in existing studies, the first of which is that only a small amount of data is processed per task during distributed processing. Each task reads only the necessary data using the columnar structure. Because the task initialization in Hadoop requires a large overhead, it is inefficient that each task processes a small amount of data. The second problem is the lack of support for techniques that optimize columnar execution. Although many such techniques have been proposed for efficient columnar query execution, existing column-store methods for Hadoop-based distributed systems cannot support them efficiently. In this paper, we propose a novel column-store method called SSFile for Hadoop-based distributed systems. SSFile increases the actual amount of data processed per task and supports representative columnar execution techniques for efficient query processing. Through extensive experiments, we show that SSFile significantly improves the performance of distributed processing. © 2015 Elsevier Inc. All rights reserved. | - |
dc.language | English | - |
dc.language.iso | en | - |
dc.publisher | ELSEVIER SCIENCE INC | - |
dc.subject | DATA PLACEMENT | - |
dc.subject | MAPREDUCE | - |
dc.title | SSFile: A novel column-store for efficient data analysis in Hadoop-based distributed systems | - |
dc.type | Article | - |
dc.contributor.affiliatedAuthor | Chung, Yon Dohn | - |
dc.identifier.doi | 10.1016/j.ins.2015.04.014 | - |
dc.identifier.scopusid | 2-s2.0-84930060594 | - |
dc.identifier.wosid | 000356732600005 | - |
dc.identifier.bibliographicCitation | INFORMATION SCIENCES, v.316, pp.68 - 86 | - |
dc.relation.isPartOf | INFORMATION SCIENCES | - |
dc.citation.title | INFORMATION SCIENCES | - |
dc.citation.volume | 316 | - |
dc.citation.startPage | 68 | - |
dc.citation.endPage | 86 | - |
dc.type.rims | ART | - |
dc.type.docType | Article | - |
dc.description.journalClass | 1 | - |
dc.description.journalRegisteredClass | scie | - |
dc.description.journalRegisteredClass | scopus | - |
dc.relation.journalResearchArea | Computer Science | - |
dc.relation.journalWebOfScienceCategory | Computer Science, Information Systems | - |
dc.subject.keywordPlus | DATA PLACEMENT | - |
dc.subject.keywordPlus | MAPREDUCE | - |
dc.subject.keywordAuthor | Column-store | - |
dc.subject.keywordAuthor | Hadoop | - |
dc.subject.keywordAuthor | HDFS | - |
dc.subject.keywordAuthor | Relational data analysis | - |
dc.subject.keywordAuthor | Distributed systems | - |
dc.subject.keywordAuthor | Server clusters | - |