Why neural networks should not be used for HIV-1 protease cleavage site prediction
Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Intelligent Systems (IS-lab). ORCID iD: 0000-0001-5163-2997
Lund University, Department of Theoretical Physics, Lund, Sweden.
2004 (English). In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 20, no. 11, pp. 1702-1709. Journal article (Refereed). Published
Abstract [en]

Several papers have been published where nonlinear machine learning algorithms, e.g. artificial neural networks, support vector machines and decision trees, have been used to model the specificity of the HIV-1 protease and extract specificity rules. We show that the dataset used in these studies is linearly separable and that it is a misuse of nonlinear classifiers to apply them to this problem. The best solution on this dataset is achieved using a linear classifier like the simple perceptron or the linear support vector machine, and it is straightforward to extract rules from these linear models. We identify key residues in peptides that are efficiently cleaved by the HIV-1 protease and list the most prominent rules, relating them to experimental results for the HIV-1 protease.

Motivation: Understanding HIV-1 protease specificity is important when designing HIV inhibitors and several different machine learning algorithms have been applied to the problem. However, little progress has been made in understanding the specificity because nonlinear and overly complex models have been used.

Results: We show that the problem is much easier than what has previously been reported and that linear classifiers like the simple perceptron or linear support vector machines are at least as good predictors as nonlinear algorithms. We also show how sets of specificity rules can be generated from the resulting linear classifiers.
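To make the abstract's central claim concrete, the following is a minimal sketch, not the authors' code, of how a simple perceptron can test linear separability and how candidate rules can be read off its weights. Everything in it is an assumption for illustration: the window length of 8, the orthogonal (one-hot) encoding, the helper names, and above all the data, which are randomly generated peptides labeled by an arbitrary made-up rule and have nothing to do with real HIV-1 protease substrates.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PEPTIDE_LEN = 8  # assumed window length, not taken from this record


def encode(peptide):
    """Orthogonal (one-hot) encoding: one 20-dimensional block per position."""
    x = [0.0] * (PEPTIDE_LEN * len(AMINO_ACIDS))
    for pos, aa in enumerate(peptide):
        x[pos * len(AMINO_ACIDS) + AMINO_ACIDS.index(aa)] = 1.0
    return x


def score(w, b, x):
    """Linear decision value w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_perceptron(samples, labels, max_epochs=1000):
    """Classic perceptron rule. Returns (w, b, converged).

    An epoch without mistakes means the weights separate the training set,
    which demonstrates that the set is linearly separable.
    """
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            if y * score(w, b, x) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:
            return w, b, True
    return w, b, False


def top_weights(w, k=5):
    """List the k largest-magnitude weights as (position index, residue, weight)."""
    n = len(AMINO_ACIDS)
    ranked = sorted(range(len(w)), key=lambda i: abs(w[i]), reverse=True)
    return [(i // n, AMINO_ACIDS[i % n], w[i]) for i in ranked[:k]]


def toy_label(peptide):
    """Hypothetical labeling rule, chosen only so the toy data are separable."""
    return 1 if peptide[3] in "FLMY" and peptide[5] != "K" else -1


# Synthetic data: random peptides, NOT real cleavage measurements.
rng = random.Random(0)
peptides = ["".join(rng.choice(AMINO_ACIDS) for _ in range(PEPTIDE_LEN))
            for _ in range(200)]
X = [encode(p) for p in peptides]
labels = [toy_label(p) for p in peptides]

w, b, converged = train_perceptron(X, labels)
print("separates the training set:", converged)
for pos, aa, weight in top_weights(w):
    print(f"position index {pos}, residue {aa}: weight {weight:+.1f}")
```

With an orthogonal encoding, each weight belongs to exactly one (position, residue) pair, so large positive or negative weights can be read directly as "residue X at position i favors or disfavors the positive class". That is the sense in which rule extraction from a linear model is straightforward. A perceptron that fails to converge within `max_epochs` does not by itself prove non-separability.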

Place, publisher, year, edition, pages
Oxford: Oxford University Press, 2004. Vol. 20, no. 11, pp. 1702-1709
Keywords [en]
Neural networks
National subject category
Biochemistry; Molecular Biology
Identifiers
URN: urn:nbn:se:hh:diva-226; DOI: 10.1093/bioinformatics/bth144; ISI: 000223142000006; PubMed ID: 14988129; Scopus ID: 2-s2.0-4444262024; Local ID: 2082/521; OAI: oai:DiVA.org:hh-226; DiVA, id: diva2:237404
Available from: 2006-11-24. Created: 2006-11-24. Last updated: 2025-02-20. Bibliographically approved
Part of thesis
1. Computational prediction models for proteolytic cleavage and epitope identification
2007 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

The biological functions of proteins depend on their physical interactions with other molecules, such as proteins and peptides. Therefore, modeling protein-ligand interactions is important for understanding protein functions in different biological processes. We have focused on the cleavage specificities of HIV-1 protease, HCV NS3 protease and caspases on short oligopeptides or in native proteins, on the binding affinity of MHC molecules to short oligopeptides, and on the identification of T cell epitopes. We expect that our findings on HIV-1 protease, HCV NS3 protease and caspases generalize to other proteases. In this thesis, we have analyzed these interactions from different perspectives: we have extended substrate data sets and collected new ones; used and compared different prediction methods (e.g. linear support vector machines, neural networks, the OSRE method, rough set theory and Gaussian processes) to understand the underlying interaction problems; suggested new methods (i.e. a hierarchical method and Gaussian processes with a test reject method) to improve predictions; and extracted cleavage rules for protease cleavage specificities. In our studies, we have extended the oligopeptide substrate data sets and collected native protein substrates for HIV-1 protease, as well as a new oligopeptide substrate data set for HCV protease. We have shown that all current HIV-1 protease oligopeptide substrate data sets and our HCV data set are linearly separable; that, for HIV-1 protease, size and hydrophobicity are two important physicochemical properties in the recognition of short oligopeptide substrates by the protease; and that the linear support vector machine is the state of the art for this protease cleavage prediction problem. Our hierarchical method, which combines protein secondary structure information with experimental short oligopeptide cleavage information, can improve the prediction of HIV-1 protease cleavage sites in native proteins.
Our rule extraction method provides simple and accurate cleavage rules with high fidelity for HIV-1 and HCV proteases. For MHC molecules, we showed that high binding affinities are not necessarily correlated with immunogenicity for HLA-restricted peptides. Our test reject method combined with Gaussian processes can simplify experimental design by reducing false positives when detecting potential epitopes in large pathogen genomes.

Place, publisher, year, edition, pages
Lund: Department of Theoretical Physics, Lund University, 2007. p. 84
Keywords
Binding affinity, Caspase, Cleavage prediction, Cleavage specificity, Epitope, False positive, Gaussian process, HCV, Hierarchical method, HIV, Immunology, MHC, OSRE, Protease-peptide interaction, Rule extraction, Sequence analysis, SVM
National subject category
Bioinformatics and Computational Biology
Identifiers
urn:nbn:se:hh:diva-1981 (URN); 2082/2376 (Local ID); 978-91-628-7218-2 (ISBN); 2082/2376 (Archive number); 2082/2376 (OAI)
Available from: 2008-09-29. Created: 2008-09-29. Last updated: 2025-02-07. Bibliographically approved

Open Access in DiVA

No full text in DiVA

Authors

Rögnvaldsson, Thorsteinn; You, Liwen
