SVM2Motif-Reconstructing Overlapping DNA Sequence Motifs by Mimicking an SVM Predictor
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Vidovic, Marina M. -C. | - |
dc.contributor.author | Goernitz, Nico | - |
dc.contributor.author | Mueller, Klaus-Robert | - |
dc.contributor.author | Raetsch, Gunnar | - |
dc.contributor.author | Kloft, Marius | - |
dc.date.accessioned | 2021-09-04T09:13:38Z | - |
dc.date.available | 2021-09-04T09:13:38Z | - |
dc.date.created | 2021-06-18 | - |
dc.date.issued | 2015-12-21 | - |
dc.identifier.issn | 1932-6203 | - |
dc.identifier.uri | https://scholar.korea.ac.kr/handle/2021.sw.korea/91548 | - |
dc.description.abstract | Identifying discriminative motifs underlying the functionality and evolution of organisms is a major challenge in computational biology. Machine learning approaches such as support vector machines (SVMs) achieve state-of-the-art performance in genomic discrimination tasks, but, due to their black-box character, the motifs underlying their decision functions are largely unknown. As a remedy, positional oligomer importance matrices (POIMs) allow us to visualize the significance of position-specific subsequences. Although POIMs are a major step towards explaining trained SVM models, their size grows exponentially with the length of the motif, which makes manual inspection feasible only for comparatively small motifs, typically k ≤ 5. In this work, we extend the work on POIMs by presenting a new machine-learning methodology, entitled motifPOIM, that extracts the truly relevant motifs (regardless of their length and complexity) underlying the predictions of a trained SVM model. Our framework treats the motifs as free parameters of a probabilistic model, and finding them can be phrased as a non-convex optimization problem. The exponential dependence of the POIM size on the oligomer length poses a major numerical challenge, which we address with an efficient optimization framework that allows us to find possibly overlapping motifs consisting of up to hundreds of nucleotides. We demonstrate the efficacy of our approach on a synthetic data set as well as on a real-world human splice site data set. | - |
dc.language | English | - |
dc.language.iso | en | - |
dc.publisher | PUBLIC LIBRARY SCIENCE | - |
dc.title | SVM2Motif-Reconstructing Overlapping DNA Sequence Motifs by Mimicking an SVM Predictor | - |
dc.type | Article | - |
dc.contributor.affiliatedAuthor | Mueller, Klaus-Robert | - |
dc.identifier.doi | 10.1371/journal.pone.0144782 | - |
dc.identifier.scopusid | 2-s2.0-84956919555 | - |
dc.identifier.wosid | 000367092300017 | - |
dc.identifier.bibliographicCitation | PLOS ONE, v.10, no.12 | - |
dc.relation.isPartOf | PLOS ONE | - |
dc.citation.title | PLOS ONE | - |
dc.citation.volume | 10 | - |
dc.citation.number | 12 | - |
dc.type.rims | ART | - |
dc.type.docType | Article | - |
dc.description.journalClass | 1 | - |
dc.description.journalRegisteredClass | scie | - |
dc.description.journalRegisteredClass | scopus | - |
dc.relation.journalResearchArea | Science & Technology - Other Topics | - |
dc.relation.journalWebOfScienceCategory | Multidisciplinary Sciences | - |
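The abstract says that POIM size grows exponentially with motif length, and this can be made concrete with a small calculation. The Python sketch below does two things. It counts POIM entries as |Σ|^k (L − k + 1) over the DNA alphabet, using one entry per k-mer and start position. It also computes an exact POIM by brute force, using the definition Q(z, j) = E[s(X) | X[j..j+k−1] = z] − E[s(X)] from the original POIM work as we understand it, under a uniform distribution over sequences.

A few assumptions apply. The position-counting convention is ours. The scoring function `toy_score` is a hypothetical stand-in for a trained SVM. All function names are illustrative and are not part of motifPOIM.

```python
from itertools import product

ALPHABET = "ACGT"  # DNA alphabet, |Sigma| = 4


def poim_size(k: int, seq_len: int, alphabet_size: int = 4) -> int:
    """Number of POIM entries: one per (k-mer, start position) pair.

    Assumes a k-mer may start at any of the seq_len - k + 1 positions
    where it fits completely (a convention chosen for this sketch).
    """
    return alphabet_size ** k * (seq_len - k + 1)


def toy_score(x: str) -> float:
    """Hypothetical stand-in for a trained SVM: returns 1.0 if the
    dinucleotide 'GT' occupies positions 2-3 (0-based), else 0.0."""
    return 1.0 if x[2:4] == "GT" else 0.0


def exact_poim(score, seq_len: int, k: int) -> dict:
    """Brute-force POIM Q(z, j) = E[s(X) | X[j:j+k] = z] - E[s(X)],
    with X uniformly distributed over all sequences of length seq_len.
    Only feasible for tiny seq_len, since it enumerates 4**seq_len sequences."""
    seqs = ["".join(p) for p in product(ALPHABET, repeat=seq_len)]
    scores = {x: score(x) for x in seqs}
    mean = sum(scores.values()) / len(seqs)
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    poim = {}
    for j in range(seq_len - k + 1):
        for z in kmers:
            # Average score over sequences carrying k-mer z at position j.
            matching = [scores[x] for x in seqs if x[j:j + k] == z]
            poim[(z, j)] = sum(matching) / len(matching) - mean
    return poim


if __name__ == "__main__":
    # Growth of the POIM with oligomer length k for a hypothetical length of 100.
    for k in (2, 5, 8):
        print(k, poim_size(k, 100))  # 1584, 98304, 6094848

    q = exact_poim(toy_score, seq_len=6, k=2)
    print(q[("GT", 2)])  # 0.9375: the planted motif
    print(q[("TG", 1)])  # 0.1875: an overlapping, shifted 2-mer also scores
```

The planted 2-mer `GT` at position 2 receives the largest importance. The shifted 2-mers `AG`, `CG`, `GG` and `TG` at position 1 also receive a smaller positive value, because they share the `G`. In a POIM, a single motif is therefore smeared across many overlapping oligomers. Together with the 4^k growth, this smearing is why manual inspection becomes impractical for long or overlapping motifs.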
Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.
COPYRIGHT © 2021 Korea University. All Rights Reserved.
Certain data included herein are derived from the © Web of Science of Clarivate Analytics. All rights reserved.
You may not copy or re-distribute this material in whole or in part without the prior written consent of Clarivate Analytics.