Clustering orthologous proteins across phylogenetically distant species
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Kim, Sunshin | - |
dc.contributor.author | Kang, Jaewoo | - |
dc.contributor.author | Chung, Yong Je | - |
dc.contributor.author | Li, Jinyan | - |
dc.contributor.author | Ryu, Keun Ho | - |
dc.date.accessioned | 2021-09-09T08:25:24Z | - |
dc.date.available | 2021-09-09T08:25:24Z | - |
dc.date.created | 2021-06-10 | - |
dc.date.issued | 2008-05-15 | - |
dc.identifier.issn | 0887-3585 | - |
dc.identifier.uri | https://scholar.korea.ac.kr/handle/2021.sw.korea/123536 | - |
dc.description.abstract | The quality of orthologous protein clusters (OPCs) depends largely on the results of the reciprocal BLAST (basic local alignment search tool) hits among genomes. The BLAST algorithm is fast and efficient, but it is very difficult to obtain an optimal solution among phylogenetically distant species, because genomes separated by a large evolutionary distance typically have low similarity in their protein sequences. To reduce the false positives in the OPCs, thresholding is often applied to the BLAST scores. However, thresholding also eliminates large numbers of true positives, as orthologs from distant species are likely to have low BLAST scores. To rectify this problem, we introduce a new hybrid method combining the Recursive and the Markov CLuster (MCL) algorithms without using BLAST thresholding. In the first step, we use InParanoid to produce n(n-1)/2 ortholog tables from n genomes. After combining all the tables into one, our clustering algorithm clusters the ortholog pairs in the table recursively. Then, our method employs the MCL algorithm to compute the clusters and refines them by adjusting the inflation factor. We tested our method using six different genomes and evaluated the results by comparing them against KEGG Orthology (KO) OPCs, which are generated from manually curated pathways. To quantify the accuracy of the results, we introduced a new intuitive similarity measure based on our Least-move algorithm, which computes the consistency between two OPCs. We compared the resulting OPCs with the KO OPCs using this measure. We also evaluated the performance of our method using InParanoid as the baseline approach. The experimental results show that, at an inflation factor of 1.3, our method produced 54% more orthologs than InParanoid at the cost of slightly lower accuracy (1.7% less), and at an inflation factor of 1.4, it produced not only 15% more orthologs than InParanoid but also higher accuracy (1.4% more). | - |
dc.language | English | - |
dc.language.iso | en | - |
dc.publisher | WILEY-LISS | - |
dc.subject | COG DATABASE | - |
dc.subject | GENOMES | - |
dc.subject | EUKARYOTES | - |
dc.subject | ORTHOMCL | - |
dc.subject | FAMILIES | - |
dc.subject | NEMATODE | - |
dc.subject | GENES | - |
dc.subject | YEAST | - |
dc.subject | KEGG | - |
dc.subject | TOOL | - |
dc.title | Clustering orthologous proteins across phylogenetically distant species | - |
dc.type | Article | - |
dc.contributor.affiliatedAuthor | Kang, Jaewoo | - |
dc.identifier.doi | 10.1002/prot.21792 | - |
dc.identifier.scopusid | 2-s2.0-42449130541 | - |
dc.identifier.wosid | 000255269200006 | - |
dc.identifier.bibliographicCitation | PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS, v.71, no.3, pp.1113 - 1122 | - |
dc.relation.isPartOf | PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS | - |
dc.citation.title | PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS | - |
dc.citation.volume | 71 | - |
dc.citation.number | 3 | - |
dc.citation.startPage | 1113 | - |
dc.citation.endPage | 1122 | - |
dc.type.rims | ART | - |
dc.type.docType | Article | - |
dc.description.journalClass | 1 | - |
dc.description.journalRegisteredClass | scie | - |
dc.description.journalRegisteredClass | scopus | - |
dc.relation.journalResearchArea | Biochemistry & Molecular Biology | - |
dc.relation.journalResearchArea | Biophysics | - |
dc.relation.journalWebOfScienceCategory | Biochemistry & Molecular Biology | - |
dc.relation.journalWebOfScienceCategory | Biophysics | - |
dc.subject.keywordPlus | COG DATABASE | - |
dc.subject.keywordPlus | GENOMES | - |
dc.subject.keywordPlus | EUKARYOTES | - |
dc.subject.keywordPlus | ORTHOMCL | - |
dc.subject.keywordPlus | FAMILIES | - |
dc.subject.keywordPlus | NEMATODE | - |
dc.subject.keywordPlus | GENES | - |
dc.subject.keywordPlus | YEAST | - |
dc.subject.keywordPlus | KEGG | - |
dc.subject.keywordPlus | TOOL | - |
dc.subject.keywordAuthor | orthologs | - |
dc.subject.keywordAuthor | species | - |
dc.subject.keywordAuthor | genomes | - |
dc.subject.keywordAuthor | clustering | - |
dc.subject.keywordAuthor | BLAST | - |
dc.subject.keywordAuthor | proteins | - |
dc.subject.keywordAuthor | paralogs | - |
dc.subject.keywordAuthor | homologs | - |
dc.subject.keywordAuthor | database | - |
dc.subject.keywordAuthor | threshold | - |
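
The pipeline summarized in the abstract (pairwise ortholog tables, merging of ortholog pairs, then MCL refinement controlled by an inflation factor) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not specify the recursive clustering step, so a plain union-find merge of linked pairs stands in for it here. The MCL routine is a textbook-style expansion and inflation loop. The names `merge_pairs`, `mcl`, the genome labels, the protein labels, and the unit self-loop weight are all assumptions made for illustration.

```python
from itertools import combinations


def merge_pairs(pairs):
    """Group proteins linked by ortholog pairs into connected components.

    Stand-in (assumption) for the paper's recursive clustering step.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    groups = {}
    for p in list(parent):
        groups.setdefault(find(p), set()).add(p)
    return sorted(sorted(g) for g in groups.values())


def _normalize_columns(m):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def mcl(nodes, weights, inflation=1.4, iterations=50):
    """Textbook-style Markov clustering on a small weighted, undirected graph.

    nodes: list of protein labels; weights: {(a, b): similarity}.
    """
    n = len(nodes)
    idx = {p: i for i, p in enumerate(nodes)}
    # Self-loops of weight 1.0 are an assumption of this sketch.
    m = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for (a, b), w in weights.items():
        m[idx[a]][idx[b]] = m[idx[b]][idx[a]] = w
    m = _normalize_columns(m)

    for _ in range(iterations):
        # Expansion: square the matrix (two-step random walk).
        m = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        # Inflation: element-wise power, then renormalize the columns.
        m = _normalize_columns([[v ** inflation for v in row] for row in m])

    # Each row with non-zero entries lists the members attracted to it.
    clusters = set()
    for row in m:
        members = frozenset(nodes[j] for j, v in enumerate(row) if v > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Six genomes give n(n-1)/2 = 15 pairwise ortholog tables.
genomes = ["g1", "g2", "g3", "g4", "g5", "g6"]  # hypothetical labels
pairwise_tables = list(combinations(genomes, 2))

# Hypothetical ortholog pairs pooled from the pairwise tables.
pairs = [("a1", "a2"), ("a2", "a3"), ("a1", "a3"),
         ("b1", "b2"), ("b2", "b3"), ("b1", "b3")]
components = merge_pairs(pairs)
clusters = mcl([p for c in components for p in c],
               {pair: 1.0 for pair in pairs}, inflation=1.4)
```

In MCL, a larger inflation value generally yields finer-grained clusters, which is the parameter the abstract reports tuning (1.3 versus 1.4). In a real setting the edge weights would come from the pooled ortholog table rather than the constant 1.0 used above.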