An MLP-based feature subset selection for HIV-1 protease cleavage site analysis

Authors
Kim, Gilhan; Kim, Yeonjoo; Lim, Heuiseok; Kim, Hyeoncheol
Issue Date
Feb-2010
Publisher
ELSEVIER SCIENCE BV
Keywords
Feature selection; Multi-layered perceptron; HIV-1 protease cleavage site prediction; Dimension reduction
Citation
ARTIFICIAL INTELLIGENCE IN MEDICINE, v.48, no.2-3, pp.83 - 89
Indexed
SCIE
SCOPUS
Journal Title
ARTIFICIAL INTELLIGENCE IN MEDICINE
Volume
48
Number
2-3
Start Page
83
End Page
89
URI
https://scholar.korea.ac.kr/handle/2021.sw.korea/117014
DOI
10.1016/j.artmed.2009.07.010
ISSN
0933-3657
Abstract
Objective: In recent years, several machine learning approaches have been applied to modeling the specificity of the human immunodeficiency virus type 1 (HIV-1) protease cleavage domain. However, the high-dimensional domain dataset contains only a small number of samples, which can mislead classification modeling and its interpretation. Appropriate feature selection can alleviate the problem by eliminating irrelevant and redundant features, and thus improve prediction performance.
Methods: We introduce a new feature subset selection method, FS-MLP, that selects relevant features using multi-layered perceptron (MLP) learning. The method first trains an MLP on a training dataset and then selects a feature subset by analyzing the trained MLP with a decompositional approach. Our method is able to select a subset of relevant features in high-dimensional, multivariate and non-linear domains.
Results: Using five artificial datasets that represent four data types, we compared the performance of FS-MLP with that of seven other feature selection methods. Experimental results showed that FS-MLP is superior in high-dimensional, multivariate and non-linear domains. In experiments with the HIV-1 protease cleavage dataset, FS-MLP selected a set of 14 highly relevant features from the 160 original features. On a validation set of 131 test instances, classifiers that used the 14 features showed about 95% accuracy, outperforming the other seven methods in terms of both accuracy and the number of features.
Conclusions: Our experimental results indicate that FS-MLP is effective in analyzing multivariate, non-linear and high-dimensional datasets such as the HIV-1 protease cleavage dataset. The 14 relevant features selected by FS-MLP also provide useful insights into the HIV-1 cleavage site domain. FS-MLP is a useful method for computational sequence analysis in general. (C) 2009 Elsevier B.V. All rights reserved.
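The abstract does not say how the 160 original features are built or how the decompositional analysis scores them, so the following is only a minimal sketch under stated assumptions. It assumes the 160 features are a one-hot encoding of an 8-residue peptide over the 20 standard amino acids (8 x 20 = 160), and it ranks inputs with a generic weight-magnitude heuristic that stands in for, and is not, the FS-MLP procedure. The names `encode_octamer`, `describe_feature` and `rank_features`, and the toy weights in the usage note, are hypothetical.

```python
# Assumption (not stated in the abstract): each sample is an 8-residue
# peptide, one-hot encoded over the 20 standard amino acids -> 160 features.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PEPTIDE_LENGTH = 8


def encode_octamer(peptide):
    """Return a 160-element 0/1 list for an 8-residue peptide."""
    if len(peptide) != PEPTIDE_LENGTH:
        raise ValueError("expected an 8-residue peptide")
    features = [0] * (PEPTIDE_LENGTH * len(AMINO_ACIDS))
    for position, residue in enumerate(peptide):
        # str.index raises ValueError for a non-standard residue letter.
        features[position * len(AMINO_ACIDS) + AMINO_ACIDS.index(residue)] = 1
    return features


def describe_feature(index):
    """Map a feature index back to (1-based peptide position, residue)."""
    position, offset = divmod(index, len(AMINO_ACIDS))
    return position + 1, AMINO_ACIDS[offset]


def rank_features(input_hidden, hidden_output, k):
    """Return the indices of the k inputs with the largest relevance score.

    The score of input i is sum_j |w_ij| * |v_j|, where w_ij connects
    input i to hidden unit j and v_j connects hidden unit j to the output.
    This is a generic heuristic used for illustration, NOT FS-MLP itself.
    """
    scores = [
        sum(abs(w) * abs(v) for w, v in zip(row, hidden_output))
        for row in input_hidden
    ]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]
```

For example, `encode_octamer("SQNYPIVQ")` yields a vector with exactly eight ones, and `describe_feature` turns a selected index back into a position and residue, which is the form in which a selected feature subset would be interpreted. With toy weights `[[0.1, -0.2], [2.0, 0.0], [-1.0, 1.0]]` and `[1.0, -0.5]`, `rank_features(..., k=2)` keeps the second and third inputs.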
Appears in
Collections
Graduate School > Department of Computer Science and Engineering > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Kim, Hyeon cheol
Department of Computer Science