Why neural networks should not be used for HIV-1 protease cleavage site prediction
Högskolan i Halmstad, Akademin för informationsteknologi, Halmstad Embedded and Intelligent Systems Research (EIS), Intelligenta system (IS-lab). ORCID iD: 0000-0001-5163-2997
Lund University, Department of Theoretical Physics, Lund, Sweden.
2004 (English). In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 20, no. 11, pp. 1702-1709. Journal article (peer-reviewed). Published
Abstract [en]

Several papers have been published where non-linear machine learning algorithms, e.g. artificial neural networks, support vector machines and decision trees, have been used to model the specificity of the HIV-1 protease and extract specificity rules. We show that the dataset used in these studies is linearly separable and that it is a misuse of nonlinear classifiers to apply them to this problem. The best solution on this dataset is achieved using a linear classifier like the simple perceptron or the linear support vector machine, and it is straightforward to extract rules from these linear models. We identify key residues in peptides that are efficiently cleaved by the HIV-1 protease and list the most prominent rules, relating them to experimental results for the HIV-1 protease.

Motivation: Understanding HIV-1 protease specificity is important when designing HIV inhibitors and several different machine learning algorithms have been applied to the problem. However, little progress has been made in understanding the specificity because nonlinear and overly complex models have been used.

Results: We show that the problem is much easier than what has previously been reported and that linear classifiers like the simple perceptron or linear support vector machines are at least as good predictors as nonlinear algorithms. We also show how sets of specificity rules can be generated from the resulting linear classifiers.
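The linear-separability argument in the abstract can be illustrated with a small sketch. This is not the authors' code or data. It assumes peptides of a fixed length (8 residues here, chosen for illustration), a one-hot ("orthogonal") encoding with 20 indicator inputs per position, and six invented toy peptides whose labels are made up purely so the example runs. The function names `encode`, `train_perceptron` and `top_rules` are hypothetical. If the simple perceptron completes a full pass with zero errors, the training set is linearly separable, and the largest-magnitude weights can be read as candidate (position, residue) rules.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def encode(peptide):
    """One-hot encode a peptide: 20 indicator inputs per position."""
    vec = [0.0] * (len(peptide) * len(AMINO_ACIDS))
    for pos, aa in enumerate(peptide):
        vec[pos * len(AMINO_ACIDS) + AMINO_ACIDS.index(aa)] = 1.0
    return vec


def train_perceptron(X, y, max_epochs=1000):
    """Simple perceptron with labels in {-1, +1}.

    Returns (weights, bias, converged). converged is True only if a full
    pass over the data produced no misclassification, which demonstrates
    that the training set is linearly separable.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, t in zip(X, y):
            s = b + sum(wi * xi for wi, xi in zip(w, x))
            if t * s <= 0:  # misclassified or on the boundary: update
                w = [wi + t * xi for wi, xi in zip(w, x)]
                b += t
                errors += 1
        if errors == 0:
            return w, b, True
    return w, b, False


def top_rules(w, peptide_len, k=3):
    """Rank (weight, position, residue) triples by absolute weight."""
    scored = [
        (w[pos * len(AMINO_ACIDS) + j], pos, aa)
        for pos in range(peptide_len)
        for j, aa in enumerate(AMINO_ACIDS)
    ]
    scored.sort(key=lambda item: abs(item[0]), reverse=True)
    return scored[:k]


# Invented toy data (NOT real cleavage measurements): +1 = "cleaved".
peptides = ["AAAFAAAA", "GGGFGGGG", "AGAFGAGA",
            "AAAGAAAA", "GGGAGGGG", "AGAAGAGA"]
y = [1, 1, 1, -1, -1, -1]
X = [encode(p) for p in peptides]

w, b, converged = train_perceptron(X, y)
print("linearly separable on training set:", converged)
for weight, pos, aa in top_rules(w, peptide_len=8):
    print(f"position {pos}: residue {aa} has weight {weight:+.1f}")
```

On real data the same check would be run on the full labelled peptide set. A perceptron that fails to converge within `max_epochs` does not by itself prove non-separability, so a linear support vector machine or a linear-programming feasibility test would be a more reliable check in that case.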

Place, publisher, year, edition, pages
Oxford: Oxford University Press, 2004. Vol. 20, no. 11, pp. 1702-1709
Keywords [en]
Neural networks
Identifiers
URN: urn:nbn:se:hh:diva-226; DOI: 10.1093/bioinformatics/bth144; ISI: 000223142000006; PubMedID: 14988129; Scopus ID: 2-s2.0-4444262024; Local ID: 2082/521; OAI: oai:DiVA.org:hh-226; DiVA, id: diva2:237404
Available from: 2006-11-24. Created: 2006-11-24. Last updated: 2025-02-20. Bibliographically approved.
Part of thesis
1. Computational prediction models for proteolytic cleavage and epitope identification
2007 (English). Doctoral thesis, with articles (Other academic)
Abstract [en]

The biological functions of proteins depend on their physical interactions with other molecules, such as proteins and peptides. Therefore, modeling protein-ligand interactions is important for understanding protein functions in different biological processes. We have focused on the cleavage specificities of HIV-1 protease, HCV NS3 protease and caspases on short oligopeptides or in native proteins, on the binding affinity of MHC molecules with short oligopeptides, and on the identification of T cell epitopes. We expect that our findings on HIV-1 protease, HCV NS3 protease and caspases generalize to other proteases. In this thesis, we have analysed these interactions from different perspectives: we have extended and collected new substrate data sets; used and compared different prediction methods (e.g. linear support vector machines, neural networks, the OSRE method, rough set theory and Gaussian processes) to understand the underlying interaction problems; suggested new methods (i.e. a hierarchical method and Gaussian processes with a test reject method) to improve predictions; and extracted cleavage rules for protease cleavage specificities. From our studies, we have extended oligopeptide substrate data sets and collected native protein substrates for HIV-1 protease, and a new oligopeptide substrate data set for HCV protease. We have shown that all current HIV-1 protease oligopeptide substrate data sets and our HCV data set are linearly separable; that, for HIV-1 protease, size and hydrophobicity are two important physicochemical properties in the recognition of short oligopeptide substrates by the protease; and that the linear support vector machine is the state of the art for this protease cleavage prediction problem. Our hierarchical method, combining protein secondary structure information and experimental short oligopeptide cleavage information, can improve the prediction of HIV-1 protease cleavage sites in native proteins. Our rule extraction method provides simple and accurate cleavage rules with high fidelity for HIV-1 and HCV proteases. For MHC molecules, we showed that high binding affinities are not necessarily correlated with immunogenicity for HLA-restricted peptides. Our test reject method combined with Gaussian processes can simplify experimental design by reducing false positives when detecting potential epitopes in large pathogen genomes.

Place, publisher, year, edition, pages
Lund: Department of Theoretical Physics, Lund University, 2007. p. 84
Keywords
Binding affinity, Caspase, Cleavage prediction, Cleavage specificity, Epitope, False positive, Gaussian process, HCV, Hierarchical method, HIV, Immunology, MHC, OSRE, Protease-peptide interaction, Rule extraction, Sequence analysis, SVM
Identifiers
urn:nbn:se:hh:diva-1981 (URN); 2082/2376 (Local ID); 978-91-628-7218-2 (ISBN); 2082/2376 (Archive number); 2082/2376 (OAI)
Available from: 2008-09-29. Created: 2008-09-29. Last updated: 2025-02-07. Bibliographically approved.

Authors

Rögnvaldsson, Thorsteinn; You, Liwen
