Evaluating the effect of database inflation in proteogenomic search on sensitive and reliable peptide identification

Li, Honglan; Joh, Yoon Sung; Kim, Hyunwoo; Paek, Eunok; Lee, Sang-Won; Hwang, Kyu-Baek

doi:10.1186/s12864-016-3327-5

Full metadata record

DC Field	Value	Language
dc.contributor.author	Li, Honglan	-
dc.contributor.author	Joh, Yoon Sung	-
dc.contributor.author	Kim, Hyunwoo	-
dc.contributor.author	Paek, Eunok	-
dc.contributor.author	Lee, Sang-Won	-
dc.contributor.author	Hwang, Kyu-Baek	-
dc.date.accessioned	2021-09-03T15:36:56Z	-
dc.date.available	2021-09-03T15:36:56Z	-
dc.date.created	2021-06-16	-
dc.date.issued	2016-12-22	-
dc.identifier.issn	1471-2164	-
dc.identifier.uri	https://scholar.korea.ac.kr/handle/2021.sw.korea/86504	-
dc.description.abstract	Background: Proteogenomics is a promising approach for tasks ranging from gene annotation to cancer research. Databases for proteogenomic searches are often constructed by adding peptide sequences inferred from genomic or transcriptomic evidence to reference protein sequences. Such database inflation has the potential to identify novel peptides. However, it also raises concerns about sensitive and reliable peptide identification. Spurious peptides included in target databases may result in an underestimated false discovery rate (FDR). On the other hand, inflation of decoy databases could decrease the sensitivity of peptide identification because of the increased number of high-scoring random hits. Although several studies have addressed these issues, widely applicable guidelines for sensitive and reliable proteogenomic search are still largely unavailable. Results: To systematically evaluate the effect of database inflation in proteogenomic searches, we constructed a variety of real and simulated proteogenomic databases for yeast and human tandem mass spectrometry (MS/MS) data, respectively. Against these databases, we tested two popular database search tools with various approaches to search result validation: the target-decoy search strategy (with and without a refined scoring metric) and a mixture model-based method. The effect of separate filtering of known and novel peptides was also examined. The results from real and simulated proteogenomic searches confirmed that separate filtering increases both sensitivity and reliability in proteogenomic search. However, no single method consistently identified the largest (or the smallest) number of novel peptides in real proteogenomic searches. Conclusions: We propose using a set of search result validation methods with separate filtering for sensitive and reliable identification of peptides in proteogenomic search.	-
dc.language	English	-
dc.language.iso	en	-
dc.publisher	BIOMED CENTRAL LTD	-
dc.subject	MASS-SPECTROMETRY	-
dc.subject	GENOME ANNOTATION	-
dc.subject	RNA-SEQ	-
dc.subject	STATISTICAL-MODEL	-
dc.subject	TARGET-DECOY	-
dc.subject	STRATEGIES	-
dc.subject	PROTEOMICS	-
dc.subject	DISCOVERY	-
dc.subject	CANCER	-
dc.subject	MS/MS	-
dc.title	Evaluating the effect of database inflation in proteogenomic search on sensitive and reliable peptide identification	-
dc.type	Article	-
dc.contributor.affiliatedAuthor	Lee, Sang-Won	-
dc.identifier.doi	10.1186/s12864-016-3327-5	-
dc.identifier.scopusid	2-s2.0-85006700540	-
dc.identifier.wosid	000393278000009	-
dc.identifier.bibliographicCitation	BMC GENOMICS, v.17	-
dc.relation.isPartOf	BMC GENOMICS	-
dc.citation.title	BMC GENOMICS	-
dc.citation.volume	17	-
dc.type.rims	ART	-
dc.type.docType	Article; Proceedings Paper	-
dc.description.journalClass	1	-
dc.description.journalRegisteredClass	scie	-
dc.description.journalRegisteredClass	scopus	-
dc.relation.journalResearchArea	Biotechnology & Applied Microbiology	-
dc.relation.journalResearchArea	Genetics & Heredity	-
dc.relation.journalWebOfScienceCategory	Biotechnology & Applied Microbiology	-
dc.relation.journalWebOfScienceCategory	Genetics & Heredity	-
dc.subject.keywordPlus	MASS-SPECTROMETRY	-
dc.subject.keywordPlus	GENOME ANNOTATION	-
dc.subject.keywordPlus	RNA-SEQ	-
dc.subject.keywordPlus	STATISTICAL-MODEL	-
dc.subject.keywordPlus	TARGET-DECOY	-
dc.subject.keywordPlus	STRATEGIES	-
dc.subject.keywordPlus	PROTEOMICS	-
dc.subject.keywordPlus	DISCOVERY	-
dc.subject.keywordPlus	CANCER	-
dc.subject.keywordPlus	MS/MS	-
dc.subject.keywordAuthor	False discovery rate	-
dc.subject.keywordAuthor	Proteogenomic search	-
dc.subject.keywordAuthor	Separate false discovery rate analysis	-
dc.subject.keywordAuthor	Simulation	-
dc.subject.keywordAuthor	Target-decoy approach	-
dc.subject.keywordAuthor	Model-based approach	-
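
The abstract contrasts filtering all peptide-spectrum matches (PSMs) together with filtering known and novel peptides separately under the target-decoy strategy. The following is a minimal Python sketch of that idea, not the authors' implementation. It assumes that higher scores are better and that FDR is estimated as the ratio of decoy to target hits above a score cutoff, which is one common convention. The names `PSM`, `filter_at_fdr`, `separate_filter`, and the toy data are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    score: float    # search-engine score; higher is assumed to be better
    is_decoy: bool  # True if the match comes from the decoy database
    is_novel: bool  # True if the peptide comes from the added (non-reference) sequences


def filter_at_fdr(psms, fdr_threshold):
    """Return the target PSMs in the longest score-ranked prefix whose
    estimated FDR (decoys / targets) does not exceed fdr_threshold."""
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    targets = decoys = 0
    best_cut = 0
    for i, psm in enumerate(ranked, start=1):
        if psm.is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets > 0 and decoys / targets <= fdr_threshold:
            best_cut = i
    return [p for p in ranked[:best_cut] if not p.is_decoy]


def separate_filter(psms, fdr_threshold):
    """Apply the FDR filter to known and novel PSMs independently."""
    known = [p for p in psms if not p.is_novel]
    novel = [p for p in psms if p.is_novel]
    return filter_at_fdr(known, fdr_threshold), filter_at_fdr(novel, fdr_threshold)


# Toy data (made up): many confident known hits, few and weaker novel hits.
toy_psms = [
    PSM(10.0, False, False), PSM(9.0, False, False),
    PSM(8.0, False, False), PSM(7.0, False, False),
    PSM(6.0, False, True), PSM(5.5, True, True),
    PSM(5.0, False, True), PSM(1.0, True, False),
]

combined = filter_at_fdr(toy_psms, 0.3)
known_hits, novel_hits = separate_filter(toy_psms, 0.3)
print("novel PSMs accepted, combined filtering:", sum(p.is_novel for p in combined))
print("novel PSMs accepted, separate filtering:", len(novel_hits))
```

In this toy example, the confident known hits dilute the decoy ratio under combined filtering, so a low-scoring novel hit passes that would be rejected when the novel class is filtered on its own. This mirrors the concern in the abstract that FDR can be underestimated for novel peptides.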

Appears in Collections: College of Science > Department of Chemistry > 1. Journal Articles

Related Researcher

LEE, Sang Won: College of Science (Department of Chemistry)
