hh.se Publications
Why neural networks should not be used for HIV-1 protease cleavage site prediction
Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Intelligent systems (IS-lab). ORCID iD: 0000-0001-5163-2997
Lund University, Department of Theoretical Physics, Lund, Sweden.
2004 (English). In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 20, no 11, p. 1702-1709. Article in journal (Refereed), Published
Abstract [en]

Several papers have been published in which nonlinear machine learning algorithms, e.g. artificial neural networks, support vector machines and decision trees, have been used to model the specificity of the HIV-1 protease and to extract specificity rules. We show that the dataset used in these studies is linearly separable and that applying nonlinear classifiers to this problem is a misuse of them. The best results on this dataset are achieved with a linear classifier such as the simple perceptron or the linear support vector machine, and it is straightforward to extract rules from these linear models. We identify key residues in peptides that are efficiently cleaved by the HIV-1 protease and list the most prominent rules, relating them to experimental results for the HIV-1 protease.

Motivation: Understanding HIV-1 protease specificity is important when designing HIV inhibitors, and several different machine learning algorithms have been applied to the problem. However, little progress has been made in understanding the specificity because nonlinear and overly complex models have been used.

Results: We show that the problem is much easier than previously reported and that linear classifiers such as the simple perceptron or linear support vector machines are at least as good predictors as nonlinear algorithms. We also show how sets of specificity rules can be generated from the resulting linear classifiers.
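The central argument of the abstract, that the cleavage data can be fitted by a linear model and that rules follow directly from its weights, can be illustrated with a short sketch. The Python below is a minimal illustration under stated assumptions, not the authors' code: it assumes substrates are octapeptides (positions P4 to P4') encoded as 8 × 20 one-hot vectors, trains a classic perceptron, and reads the largest positive weights as candidate (position, residue) rules. The peptides and labels are toy values chosen for illustration, not the dataset used in the paper, and the helper names `one_hot`, `train_perceptron` and `top_rules` are hypothetical. Because the perceptron is only guaranteed to stop making mistakes on linearly separable data, convergence on the training set demonstrates separability; failure to converge within `max_epochs` proves nothing.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumption: substrates are octapeptides spanning P4..P4' around the scissile bond.
POSITIONS = ["P4", "P3", "P2", "P1", "P1'", "P2'", "P3'", "P4'"]


def one_hot(peptide: str) -> np.ndarray:
    """Encode an octapeptide as a flat 8 x 20 binary vector."""
    if len(peptide) != len(POSITIONS):
        raise ValueError("expected an octapeptide (P4..P4')")
    x = np.zeros((len(POSITIONS), len(AMINO_ACIDS)))
    for i, aa in enumerate(peptide):
        x[i, AMINO_ACIDS.index(aa)] = 1.0
    return x.ravel()


def train_perceptron(X: np.ndarray, y: np.ndarray, max_epochs: int = 1000):
    """Classic perceptron with labels in {-1, +1}.

    Returns (w, b, converged). Convergence (an epoch with no mistakes)
    shows the training data are linearly separable; non-convergence
    within max_epochs proves nothing either way.
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:  # misclassified or on the boundary
                w += yi * xi
                b += yi
                mistakes += 1
        if mistakes == 0:
            return w, b, True
    return w, b, False


def top_rules(w: np.ndarray, k: int = 5):
    """Return the k (position, residue, weight) triples with the largest weights."""
    best = np.argsort(w)[::-1][:k]
    n = len(AMINO_ACIDS)
    return [(POSITIONS[i // n], AMINO_ACIDS[i % n], float(w[i])) for i in best]


# Toy data for illustration only (not the paper's dataset).
cleaved = ["SQNYPIVQ", "ARVLAEAM", "ATIMMQRG"]
uncleaved = ["GGGGGGGG", "DDKKDDKK", "PPSSPPSS"]
X = np.array([one_hot(p) for p in cleaved + uncleaved])
y = np.array([1] * len(cleaved) + [-1] * len(uncleaved))

w, b, converged = train_perceptron(X, y)
print("perceptron converged (data linearly separable):", converged)
for pos, aa, weight in top_rules(w, k=5):
    print(f"{aa} at {pos}: weight {weight:+.1f}")
```

A linear support vector machine could replace the perceptron here; the rule-reading step would be unchanged, since either model yields one weight per (position, residue) feature.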

Place, publisher, year, edition, pages
Oxford: Oxford University Press, 2004. Vol. 20, no 11, p. 1702-1709
Keywords [en]
Neural networks
National Category
Biochemistry Molecular Biology
Identifiers
URN: urn:nbn:se:hh:diva-226; DOI: 10.1093/bioinformatics/bth144; ISI: 000223142000006; PubMedID: 14988129; Scopus ID: 2-s2.0-4444262024; Local ID: 2082/521; OAI: oai:DiVA.org:hh-226; DiVA, id: diva2:237404
Available from: 2006-11-24. Created: 2006-11-24. Last updated: 2025-02-20. Bibliographically approved
In thesis
1. Computational prediction models for proteolytic cleavage and epitope identification
2007 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

The biological functions of proteins depend on their physical interactions with other molecules, such as proteins and peptides. Modeling protein-ligand interactions is therefore important for understanding protein function in different biological processes. We have focused on the cleavage specificities of HIV-1 protease, HCV NS3 protease and caspases on short oligopeptides or in native proteins; on the binding affinity of MHC molecules for short oligopeptides; and on the identification of T cell epitopes. We expect that our findings on HIV-1 protease, HCV NS3 protease and caspases generalize to other proteases. In this thesis, we have analyzed these interactions from different perspectives: we have extended existing substrate data sets and collected new ones; used and compared different prediction methods (e.g. linear support vector machines, neural networks, the OSRE method, rough set theory and Gaussian processes) to understand the underlying interaction problems; proposed new methods (a hierarchical method and Gaussian processes with a test reject method) to improve predictions; and extracted rules describing protease cleavage specificities. From our studies, we have extended the oligopeptide substrate data sets and collected native protein substrates for HIV-1 protease, and produced a new oligopeptide substrate data set for HCV protease. We have shown that all current HIV-1 protease oligopeptide substrate data sets, as well as our HCV data set, are linearly separable; that for HIV-1 protease, size and hydrophobicity are two important physicochemical properties in the recognition of short oligopeptide substrates by the protease; and that the linear support vector machine is the state of the art for this protease cleavage prediction problem. Our hierarchical method, which combines protein secondary structure information with experimental short oligopeptide cleavage information, can improve the prediction of HIV-1 protease cleavage sites in native proteins.
Our rule extraction method provides simple and accurate cleavage rules with high fidelity for HIV-1 and HCV proteases. For MHC molecules, we showed that high binding affinity is not necessarily correlated with immunogenicity for HLA-restricted peptides. Our test reject method combined with Gaussian processes can simplify experimental design by reducing false positives when detecting potential epitopes in large pathogen genomes.
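The thesis finding that size and hydrophobicity matter for substrate recognition suggests a compact alternative to one-hot encoding: describe each peptide position by a physicochemical value. The sketch below is illustrative only. It assumes the Kyte-Doolittle hydropathy scale (the abstract does not say which scale was used), covers hydrophobicity only (a residue-size feature could be appended the same way), and uses hypothetical function names `hydrophobicity_profile` and `encode`. Each octapeptide becomes an 8-dimensional vector instead of 160 binary features, which a linear classifier such as a linear support vector machine could take as input.

```python
import numpy as np

# Kyte-Doolittle hydropathy values (assumed scale; other scales could be substituted).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobicity_profile(peptide: str) -> np.ndarray:
    """Map each residue to its hydropathy value: one feature per position."""
    return np.array([KYTE_DOOLITTLE[aa] for aa in peptide])


def encode(peptides: list[str]) -> np.ndarray:
    """Stack per-peptide profiles into an (n_peptides, peptide_length) matrix."""
    return np.vstack([hydrophobicity_profile(p) for p in peptides])


X = encode(["SQNYPIVQ", "GGGGGGGG"])
print(X.shape)  # (2, 8)
print(X[0])
```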

Place, publisher, year, edition, pages
Lund: Department of Theoretical Physics, Lund University, 2007. p. 84
Keywords
Binding affinity, Caspase, Cleavage prediction, Cleavage specificity, Epitope, False positive, Gaussian process, HCV, Hierarchical method, HIV, Immunology, MHC, OSRE, Protease-peptide interaction, Rule extraction, Sequence analysis, SVM
National Category
Bioinformatics and Computational Biology
Identifiers
urn:nbn:se:hh:diva-1981 (URN); 2082/2376 (Local ID); 978-91-628-7218-2 (ISBN); 2082/2376 (Archive number); 2082/2376 (OAI)
Available from: 2008-09-29. Created: 2008-09-29. Last updated: 2025-02-07. Bibliographically approved

Authority records

Rögnvaldsson, Thorsteinn; You, Liwen
