Characterizing water quality and quantity profiles with poor quality data in a machine learning algorithm
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Kim, Zhonghyun | - |
dc.contributor.author | Jeong, Heewon | - |
dc.contributor.author | Shin, Sora | - |
dc.contributor.author | Jung, Jinho | - |
dc.contributor.author | Kim, Joon Ha | - |
dc.contributor.author | Ki, Seo Jin | - |
dc.date.accessioned | 2021-08-31T04:58:30Z | - |
dc.date.available | 2021-08-31T04:58:30Z | - |
dc.date.created | 2021-06-18 | - |
dc.date.issued | 2020-04 | - |
dc.identifier.issn | 1944-3994 | - |
dc.identifier.uri | https://scholar.korea.ac.kr/handle/2021.sw.korea/56855 | - |
dc.description.abstract | Statistical analyses are often subject to misinterpretation due to poor data quality which is inaccurate, incomplete, or unavailable. This study describes how incomplete data diminishes the screening accuracy of water pollution hotspots using a self-organizing map (SOM), a popular algorithm in reducing the dimension of complex data in a nonlinear fashion. A full data set consisting of 12 water quality and quantity parameters monitored monthly over 3 years at the Yeongsan River in Korea was provided to SOM as a reference input. For purposes of comparison, SOM was further allowed to accept three incomplete data sets in terms of variable availability as well as data loss for single and multiple parameters and different pollution levels. We found that data loss of either single or multiple parameters exceeding 15% of the entire data set led to significant changes in spatial and temporal patterns of the original data. However, the variables intentionally unavailable in the given data set affected the screening performance of water pollution hotspots in SOM, to a less obvious extent, as long as the percentage of missing data fell below 10%. The same applied to data loss with three pollution levels, from high through moderate to low concentrations of one important variable. Therefore, we recommend the use of multiple approaches that couple dimensionality reduction algorithms with reasonable imputation methods for the data set with a high percentage (e.g. above 15%) of missing values. | - |
dc.language | English | - |
dc.language.iso | en | - |
dc.publisher | DESALINATION PUBL | - |
dc.subject | SELF-ORGANIZING MAPS | - |
dc.subject | POLLUTION | - |
dc.subject | NETWORK | - |
dc.title | Characterizing water quality and quantity profiles with poor quality data in a machine learning algorithm | - |
dc.type | Article | - |
dc.contributor.affiliatedAuthor | Jung, Jinho | - |
dc.identifier.doi | 10.5004/dwt.2020.25481 | - |
dc.identifier.scopusid | 2-s2.0-85098757684 | - |
dc.identifier.wosid | 000545015800013 | - |
dc.identifier.bibliographicCitation | DESALINATION AND WATER TREATMENT, v.182, pp.127 - 134 | - |
dc.relation.isPartOf | DESALINATION AND WATER TREATMENT | - |
dc.citation.title | DESALINATION AND WATER TREATMENT | - |
dc.citation.volume | 182 | - |
dc.citation.startPage | 127 | - |
dc.citation.endPage | 134 | - |
dc.type.rims | ART | - |
dc.type.docType | Article | - |
dc.description.journalClass | 1 | - |
dc.description.journalRegisteredClass | scie | - |
dc.description.journalRegisteredClass | scopus | - |
dc.relation.journalResearchArea | Engineering | - |
dc.relation.journalResearchArea | Water Resources | - |
dc.relation.journalWebOfScienceCategory | Engineering, Chemical | - |
dc.relation.journalWebOfScienceCategory | Water Resources | - |
dc.subject.keywordPlus | SELF-ORGANIZING MAPS | - |
dc.subject.keywordPlus | POLLUTION | - |
dc.subject.keywordPlus | NETWORK | - |
dc.subject.keywordAuthor | Non-linear data analysis | - |
dc.subject.keywordAuthor | Dimensionality reduction | - |
dc.subject.keywordAuthor | Water quality data | - |
dc.subject.keywordAuthor | Pollution hotspots | - |
dc.subject.keywordAuthor | Incomplete data | - |
dc.subject.keywordAuthor | Self-organizing map | - |
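
The data-loss comparison summarized in the abstract can be illustrated with a small sketch. This is not the authors' code or data. It uses synthetic numbers, an untrained random codebook standing in for a fitted SOM, and one common way of handling gaps (measuring distance over observed components only), which the record does not confirm the study used. The names `mask_random` and `best_matching_unit` are hypothetical helpers introduced here.

```python
import numpy as np

def mask_random(data, fraction, rng):
    """Return a copy of `data` with `fraction` of its entries set to NaN."""
    masked = data.astype(float).copy()
    n_missing = int(round(fraction * masked.size))
    idx = rng.choice(masked.size, size=n_missing, replace=False)
    masked.flat[idx] = np.nan
    return masked

def best_matching_unit(weights, sample):
    """Index of the map node closest to `sample`, ignoring NaN components."""
    observed = ~np.isnan(sample)
    diff = weights[:, observed] - sample[observed]
    return int(np.argmin((diff ** 2).sum(axis=1)))

rng = np.random.default_rng(0)
full = rng.normal(size=(36, 12))     # assumption: synthetic 36 samples x 12 parameters
weights = rng.normal(size=(20, 12))  # assumption: untrained 20-node codebook

# Reference assignment from the complete data set.
ref = [best_matching_unit(weights, row) for row in full]

for fraction in (0.05, 0.10, 0.15, 0.30):
    masked = mask_random(full, fraction, rng)
    got = [best_matching_unit(weights, row) for row in masked]
    changed = np.mean([a != b for a, b in zip(ref, got)])
    print(f"{fraction:.0%} missing: {changed:.0%} of samples change node")
```

Comparing node assignments against the complete-data reference is only a rough proxy for the changes in spatial and temporal patterns that the abstract reports. A fuller reproduction would train the map on each data set and compare the resulting clusters.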