Adaptive GDDA-BLAST: Fast and Efficient Algorithm for Protein Sequence Embedding

Hong, Yoojin; Kang, Jaewoo; Lee, Dongwon; van Rossum, Damian B.

doi:10.1371/journal.pone.0013596

Authors: Hong, Yoojin; Kang, Jaewoo; Lee, Dongwon; van Rossum, Damian B.

Issue Date: 22-Oct-2010

Publisher: Public Library of Science

Citation: PLOS ONE, v.5, no.10

Indexed: SCIE, SCOPUS

Journal Title: PLOS ONE

Volume: 5

Number: 10

URI: https://scholar.korea.ac.kr/handle/2021.sw.korea/115496

DOI: 10.1371/journal.pone.0013596

ISSN: 1932-6203

Abstract: A major computational challenge in the genomic era is assigning structural and functional annotations to the vast quantity of sequence data now available. The scale of the problem is illustrated by the fact that most proteins lack comprehensive annotations, even when experimental evidence exists. We previously theorized that embedded-alignment profiles (hereafter simply "alignment profiles") provide a quantitative method capable of relating the structural and functional properties of proteins, as well as their evolutionary relationships. A key feature of alignment profiles is the interoperability of their data format (e.g., alignment information, physicochemical information, genomic information). Indeed, we have demonstrated that Position-Specific Scoring Matrices (PSSMs) provide an informative M-dimensional representation that is scored by quantitatively measuring embedded or unmodified sequence alignments. Moreover, these alignment measurements remain informative even in the "twilight zone" of sequence similarity (<25% identity) [1-5]. Although our previous embedding strategy was powerful, it suffered from contaminating alignments (both embedded and unmodified) and high computational cost. Herein, we describe the logic and algorithmic process of a heuristic embedding strategy named "Adaptive GDDA-BLAST." On average, Adaptive GDDA-BLAST is up to 19 times faster than our previous method while retaining similar sensitivity. We further provide data demonstrating the benefits of embedded-alignment measurements for detecting structural homology in highly divergent protein sequences and for isolating the secondary structural elements of transmembrane and ankyrin-repeat domains. Together, these advances allow the embedded-alignment data space to be explored further within data sets large enough to eventually support relevant statistical inferences. We show that sequence embedding can serve as one vehicle for measuring low-identity alignments and for incorporating them into high-performance PSSM-based alignment profiles.
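The abstract relies on two quantities without defining them: percent identity (used for the <25% "twilight zone" threshold) and PSSM scores. The sketch below is a generic illustration of those two ideas only; it is NOT the Adaptive GDDA-BLAST embedding procedure, which the abstract does not describe. Assumptions, all hypothetical and chosen for simplicity: identity is counted over columns where neither sequence has a gap, the PSSM uses a uniform background over the 20 standard amino acids, a pseudocount of 1.0, log-odds in bits, and a gap-free input alignment. The function names percent_identity, in_twilight_zone, build_pssm and score_sequence are illustrative, not from the paper or any library.

```python
import math

# 20 standard amino acids; background frequency assumed uniform (hypothetical choice).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    Assumed convention: identical pairs divided by columns where neither
    sequence has a gap (other conventions divide by alignment length).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    aligned = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not aligned:
        return 0.0
    same = sum(1 for x, y in aligned if x == y)
    return 100.0 * same / len(aligned)


def in_twilight_zone(a: str, b: str, threshold: float = 25.0) -> bool:
    """True if the pair falls below the identity threshold (<25% by default)."""
    return percent_identity(a, b) < threshold


def build_pssm(msa: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Log-odds PSSM (bits) from a gap-free multiple alignment.

    freq(i, a) = (count(i, a) + pseudocount * bg) / (N + pseudocount)
    score(i, a) = log2(freq(i, a) / bg), with bg = 1/20 (assumed uniform).
    """
    length = len(msa[0])
    if any(len(s) != length for s in msa):
        raise ValueError("all aligned sequences must have equal length")
    bg = 1.0 / len(AMINO_ACIDS)
    n = len(msa)
    pssm = []
    for i in range(length):
        column = [s[i] for s in msa]
        row = {}
        for aa in AMINO_ACIDS:
            freq = (column.count(aa) + pseudocount * bg) / (n + pseudocount)
            row[aa] = math.log2(freq / bg)
        pssm.append(row)
    return pssm


def score_sequence(pssm: list[dict[str, float]], seq: str) -> float:
    """Sum the per-position log-odds scores of an ungapped query of matching length."""
    if len(seq) != len(pssm):
        raise ValueError("query length must match PSSM length")
    return sum(row[aa] for row, aa in zip(pssm, seq))


if __name__ == "__main__":
    print(percent_identity("ACDE", "ACDF"))                # 75.0
    print(in_twilight_zone("ACDEFGHIKL", "ACWWWWWWWW"))    # 20% identity -> True
    pssm = build_pssm(["ACD", "ACD", "ACE"])
    print(score_sequence(pssm, "ACD") > score_sequence(pssm, "WWW"))  # True
```

A query resembling the aligned family scores well above an unrelated one, which is the basic property that PSSM-based alignment profiles exploit; real tools such as PSI-BLAST additionally use non-uniform background frequencies, sequence weighting and substitution-matrix-derived pseudocounts.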


Related Researcher: Kang, Jae woo (Department of Computer Science)

