Evaluating the effect of database inflation in proteogenomic search on sensitive and reliable peptide identification
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Li, Honglan | - |
dc.contributor.author | Joh, Yoon Sung | - |
dc.contributor.author | Kim, Hyunwoo | - |
dc.contributor.author | Paek, Eunok | - |
dc.contributor.author | Lee, Sang-Won | - |
dc.contributor.author | Hwang, Kyu-Baek | - |
dc.date.accessioned | 2021-09-03T15:36:56Z | - |
dc.date.available | 2021-09-03T15:36:56Z | - |
dc.date.created | 2021-06-16 | - |
dc.date.issued | 2016-12-22 | - |
dc.identifier.issn | 1471-2164 | - |
dc.identifier.uri | https://scholar.korea.ac.kr/handle/2021.sw.korea/86504 | - |
dc.description.abstract | Background: Proteogenomics is a promising approach for various tasks ranging from gene annotation to cancer research. Databases for proteogenomic searches are often constructed by adding peptide sequences inferred from genomic or transcriptomic evidence to reference protein sequences. Such inflation of databases has the potential to identify novel peptides. However, it also raises concerns about sensitive and reliable peptide identification. Spurious peptides included in target databases may result in an underestimated false discovery rate (FDR). On the other hand, inflation of decoy databases could decrease the sensitivity of peptide identification due to the increased number of high-scoring random hits. Although several studies have addressed these issues, widely applicable guidelines for sensitive and reliable proteogenomic search have rarely been available. Results: To systematically evaluate the effect of database inflation in proteogenomic searches, we constructed a variety of real and simulated proteogenomic databases for yeast and human tandem mass spectrometry (MS/MS) data, respectively. Against these databases, we tested two popular database search tools with various approaches to search result validation: the target-decoy search strategy (with and without a refined scoring metric) and a mixture model-based method. The effect of separate filtering of known and novel peptides was also examined. The results from real and simulated proteogenomic searches confirmed that separate filtering increases the sensitivity and reliability of proteogenomic search. However, no single method consistently identified the largest (or the smallest) number of novel peptides from real proteogenomic searches. Conclusions: We propose using a set of search result validation methods with separate filtering for sensitive and reliable identification of peptides in proteogenomic search. | - |
dc.language | English | - |
dc.language.iso | en | - |
dc.publisher | BIOMED CENTRAL LTD | - |
dc.subject | MASS-SPECTROMETRY | - |
dc.subject | GENOME ANNOTATION | - |
dc.subject | RNA-SEQ | - |
dc.subject | STATISTICAL-MODEL | - |
dc.subject | TARGET-DECOY | - |
dc.subject | STRATEGIES | - |
dc.subject | PROTEOMICS | - |
dc.subject | DISCOVERY | - |
dc.subject | CANCER | - |
dc.subject | MS/MS | - |
dc.title | Evaluating the effect of database inflation in proteogenomic search on sensitive and reliable peptide identification | - |
dc.type | Article | - |
dc.contributor.affiliatedAuthor | Lee, Sang-Won | - |
dc.identifier.doi | 10.1186/s12864-016-3327-5 | - |
dc.identifier.scopusid | 2-s2.0-85006700540 | - |
dc.identifier.wosid | 000393278000009 | - |
dc.identifier.bibliographicCitation | BMC GENOMICS, v.17 | - |
dc.relation.isPartOf | BMC GENOMICS | - |
dc.citation.title | BMC GENOMICS | - |
dc.citation.volume | 17 | - |
dc.type.rims | ART | - |
dc.type.docType | Article; Proceedings Paper | - |
dc.description.journalClass | 1 | - |
dc.description.journalRegisteredClass | scie | - |
dc.description.journalRegisteredClass | scopus | - |
dc.relation.journalResearchArea | Biotechnology & Applied Microbiology | - |
dc.relation.journalResearchArea | Genetics & Heredity | - |
dc.relation.journalWebOfScienceCategory | Biotechnology & Applied Microbiology | - |
dc.relation.journalWebOfScienceCategory | Genetics & Heredity | - |
dc.subject.keywordPlus | MASS-SPECTROMETRY | - |
dc.subject.keywordPlus | GENOME ANNOTATION | - |
dc.subject.keywordPlus | RNA-SEQ | - |
dc.subject.keywordPlus | STATISTICAL-MODEL | - |
dc.subject.keywordPlus | TARGET-DECOY | - |
dc.subject.keywordPlus | STRATEGIES | - |
dc.subject.keywordPlus | PROTEOMICS | - |
dc.subject.keywordPlus | DISCOVERY | - |
dc.subject.keywordPlus | CANCER | - |
dc.subject.keywordPlus | MS/MS | - |
dc.subject.keywordAuthor | False discovery rate | - |
dc.subject.keywordAuthor | Proteogenomic search | - |
dc.subject.keywordAuthor | Separate false discovery rate analysis | - |
dc.subject.keywordAuthor | Simulation | - |
dc.subject.keywordAuthor | Target-decoy approach | - |
dc.subject.keywordAuthor | Model-based approach | - |
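
The abstract contrasts combined and separate filtering of known and novel peptides under the target-decoy strategy. The following is a minimal sketch of that idea, not the authors' implementation. It assumes a simple FDR estimate (decoy hits divided by target hits above a score cutoff), which is one common convention and is not stated in the abstract. The `PSM` record, its fields, and the function names are hypothetical, and decoys are assumed to carry the same known/novel label as the database part they were generated from.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PSM:
    """Hypothetical peptide-spectrum match record (names are illustrative)."""
    peptide: str
    score: float    # higher is better
    is_decoy: bool  # True if the match came from the decoy database
    is_novel: bool  # True if the peptide is absent from the reference proteome


def filter_at_fdr(psms: List[PSM], max_fdr: float) -> List[PSM]:
    """Keep target PSMs above the most permissive score cutoff whose
    estimated FDR (#decoys / #targets, an assumed estimator) is <= max_fdr.
    Score ties are not treated specially in this sketch."""
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    targets = decoys = 0
    best_cut = 0  # number of top-ranked PSMs to keep
    for i, psm in enumerate(ranked, start=1):
        if psm.is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets > 0 and decoys / targets <= max_fdr:
            best_cut = i
    return [p for p in ranked[:best_cut] if not p.is_decoy]


def filter_separately(psms: List[PSM], max_fdr: float) -> Tuple[List[PSM], List[PSM]]:
    """Estimate the FDR within the known and novel classes independently."""
    known = [p for p in psms if not p.is_novel]
    novel = [p for p in psms if p.is_novel]
    return filter_at_fdr(known, max_fdr), filter_at_fdr(novel, max_fdr)
```

In combined filtering, the many high-scoring known peptides can dilute the decoy ratio and let weaker novel matches pass; applying `filter_at_fdr` to each class on its own makes the novel class meet the FDR cutoff by itself.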