Computational prediction models for proteolytic cleavage and epitope identification
Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS).
2007 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

The biological functions of proteins depend on their physical interactions with other molecules, such as proteins and peptides. Modeling protein-ligand interactions is therefore important for understanding protein function in different biological processes. We have focused on the cleavage specificities of HIV-1 protease, HCV NS3 protease and caspases on short oligopeptides or in native proteins, on the binding affinity of MHC molecules for short oligopeptides, and on the identification of T-cell epitopes. We expect that our findings on HIV-1 protease, HCV NS3 protease and caspases generalize to other proteases. In this thesis, we have analyzed these interactions from different perspectives: we have extended existing substrate data sets and collected new ones; used and compared different prediction methods (e.g. linear support vector machines, neural networks, the OSRE method, rough set theory and Gaussian processes) to understand the underlying interaction problems; proposed new methods (a hierarchical method, and Gaussian processes with a test reject method) to improve predictions; and extracted cleavage rules describing protease cleavage specificities. From our studies, we have extended the oligopeptide substrate data sets and collected native protein substrates for HIV-1 protease, and compiled a new oligopeptide substrate data set for HCV protease. We have shown that all current HIV-1 protease oligopeptide substrate data sets and our HCV data set are linearly separable; that, for HIV-1 protease, size and hydrophobicity are two important physicochemical properties in the protease's recognition of short oligopeptide substrates; and that linear support vector machines are the state of the art for this protease cleavage prediction problem. Our hierarchical method, which combines protein secondary structure information with experimental short oligopeptide cleavage information, can improve the prediction of HIV-1 protease cleavage sites in native proteins.
Our rule extraction method provides simple and accurate cleavage rules with high fidelity for HIV-1 and HCV proteases. For MHC molecules, we showed that high binding affinities are not necessarily correlated with the immunogenicity of HLA-restricted peptides. Our test reject method combined with Gaussian processes can simplify experimental design by reducing false positives when searching large pathogen genomes for potential epitopes.

Place, publisher, year, edition, pages
Lund: Department of Theoretical Physics, Lund University, 2007. p. 84
Keywords [en]
Binding affinity, Caspase, Cleavage prediction, Cleavage specificity, Epitope, False positive, Gaussian process, HCV, Hierarchical method, HIV, Immunology, MHC, OSRE, Protease-peptide interaction, Rule extraction, Sequence analysis, SVM
National Category
Bioinformatics and Computational Biology
Identifiers
URN: urn:nbn:se:hh:diva-1981
Local ID: 2082/2376
ISBN: 978-91-628-7218-2
OAI: oai:DiVA.org:hh-1981
DiVA, id: diva2:239199
Available from: 2008-09-29. Created: 2008-09-29. Last updated: 2025-02-07. Bibliographically approved
List of papers
1. Why neural networks should not be used for HIV-1 protease cleavage site prediction
2004 (English). In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 20, no 11, p. 1702-1709. Article in journal (Refereed), Published
Abstract [en]

Several papers have been published where non-linear machine learning algorithms, e.g. artificial neural networks, support vector machines and decision trees, have been used to model the specificity of the HIV-1 protease and extract specificity rules. We show that the dataset used in these studies is linearly separable and that it is a misuse of nonlinear classifiers to apply them to this problem. The best solution on this dataset is achieved using a linear classifier like the simple perceptron or the linear support vector machine, and it is straightforward to extract rules from these linear models. We identify key residues in peptides that are efficiently cleaved by the HIV-1 protease and list the most prominent rules, relating them to experimental results for the HIV-1 protease.

Motivation: Understanding HIV-1 protease specificity is important when designing HIV inhibitors, and several different machine learning algorithms have been applied to the problem. However, little progress has been made in understanding the specificity because nonlinear and overly complex models have been used.

Results: We show that the problem is much easier than previously reported and that linear classifiers like the simple perceptron or linear support vector machines are at least as good predictors as nonlinear algorithms. We also show how sets of specificity rules can be generated from the resulting linear classifiers.
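The abstract rests on two technical points: a simple perceptron is enough because the data are linearly separable, and rules can be read directly off a linear model's weights. A minimal sketch of both, assuming the common orthogonal (one-hot) encoding of octamer substrates spanning P4 to P4' (8 positions x 20 amino acids = 160 features); the function names are hypothetical and this is not the authors' code. If the perceptron reaches zero training errors, the training set is linearly separable; failing to converge within `max_epochs` is inconclusive.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
POSITIONS = ["P4", "P3", "P2", "P1", "P1'", "P2'", "P3'", "P4'"]


def one_hot(peptide):
    """Orthogonal encoding: one binary feature per (position, amino acid)."""
    if len(peptide) != len(POSITIONS):
        raise ValueError("expected an octamer spanning P4..P4'")
    x = [0.0] * (len(POSITIONS) * len(AMINO_ACIDS))
    for i, aa in enumerate(peptide):
        x[i * len(AMINO_ACIDS) + AMINO_ACIDS.index(aa)] = 1.0
    return x


def train_perceptron(X, y, max_epochs=1000):
    """Classic perceptron with labels in {+1, -1}.

    Returns (weights, bias, converged). converged=True means an epoch
    finished with zero training errors, i.e. the data are linearly separable.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(max_epochs):
        errors = 0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score <= 0:  # misclassified or on the boundary
                w = [wj + yi * xj for wj, xj in zip(w, xi)]
                b += yi
                errors += 1
        if errors == 0:
            return w, b, True
    return w, b, False


def top_rules(w, k=5):
    """Largest positive weights = (position, residue) pairs favouring cleavage."""
    pairs = []
    for idx, wi in enumerate(w):
        pos = POSITIONS[idx // len(AMINO_ACIDS)]
        aa = AMINO_ACIDS[idx % len(AMINO_ACIDS)]
        pairs.append((wi, pos, aa))
    pairs.sort(key=lambda t: t[0], reverse=True)
    return [(pos, aa, wi) for wi, pos, aa in pairs[:k]]
```

On a toy set where only the cleaved peptides carry F at P1 (for instance `AAAFAAAA` and `GGGFGGGG` versus `AAAAAAAA` and `GGGGGGGG`), the perceptron converges and `top_rules(w, k=1)` returns the P1 = F pair. A linear SVM would give a maximum-margin separator instead, but the weights are read the same way.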

Place, publisher, year, edition, pages
Oxford: Oxford University Press, 2004
Keywords
Neural networks
National Category
Biochemistry Molecular Biology
Identifiers
urn:nbn:se:hh:diva-226 (URN), 10.1093/bioinformatics/bth144 (DOI), 000223142000006, 14988129 (PubMedID), 2-s2.0-4444262024 (Scopus ID), 2082/521 (Local ID), 2082/521 (Archive number), 2082/521 (OAI)
Available from: 2006-11-24. Created: 2006-11-24. Last updated: 2025-02-20. Bibliographically approved
2. Comprehensive Bioinformatic Analysis of the Specificity of Human Immunodeficiency Virus Type 1 Protease
2005 (English). In: Journal of Virology, ISSN 0022-538X, E-ISSN 1098-5514, Vol. 79, no 19, p. 12477-12486. Article in journal (Refereed), Published
Abstract [en]

Rapidly developing viral resistance to licensed human immunodeficiency virus type 1 (HIV-1) protease inhibitors is an increasing problem in the treatment of HIV-infected individuals and AIDS patients. A rational design of more effective protease inhibitors and discovery of potential biological substrates for the HIV-1 protease require accurate models for protease cleavage specificity. In this study, several popular bioinformatic machine learning methods, including support vector machines and artificial neural networks, were used to analyze the specificity of the HIV-1 protease. A new, extensive data set (746 peptides that have been experimentally tested for cleavage by the HIV-1 protease) was compiled, and the data were used to construct different classifiers that predicted whether the protease would cleave a given peptide substrate or not. The best predictor was a linear predictor using two physicochemical parameters (hydrophobicity, or alternatively polarity, and size) for the amino acids, indicating that these properties are the key features recognized by the HIV-1 protease. The present in silico study provides new and important insights into the workings of the HIV-1 protease at the molecular level, supporting the recent hypothesis that the protease primarily recognizes a conformation rather than a specific amino acid sequence. Furthermore, we demonstrate that the presence of 1 to 2 lysine residues near the cleavage site of octameric peptide substrates seems to efficiently prevent cleavage, suggesting that this positively charged amino acid plays an important role in hindering the activity of the HIV-1 protease.
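The abstract's best predictor describes each residue by just two physicochemical parameters, hydrophobicity and size, and notes that lysines near the cleavage site tend to block cleavage. A minimal sketch of such a 16-feature encoding for an octamer, plus a lysine counter, assuming Kyte-Doolittle hydropathy as the hydrophobicity scale and approximate average residue mass (Da) as a stand-in for size. Both scales are our assumptions and may differ from the ones used in the paper, as may the hypothetical P2-P2' window in `lysines_near_site`.

```python
# Assumed scales (not necessarily the paper's): Kyte-Doolittle hydropathy
# for hydrophobicity, approximate average residue mass (Da) as a size proxy.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.10, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}


def physchem_encode(octamer):
    """Encode P4..P4' as 8 x (hydrophobicity, size) = 16 real-valued features."""
    if len(octamer) != 8:
        raise ValueError("expected an octamer spanning P4..P4'")
    features = []
    for aa in octamer:
        features.extend([HYDROPATHY[aa], RESIDUE_MASS[aa]])
    return features


def lysines_near_site(octamer, start=2, stop=6):
    """Count lysines in a window around the scissile bond (between P1 and P1').

    The default slice [2:6] covers P2, P1, P1', P2'; this window is a
    hypothetical choice, not taken from the paper.
    """
    return octamer[start:stop].count("K")
```

Because the two scales live on very different ranges, the features would normally be standardized before training a linear classifier such as a linear SVM on them.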

Place, publisher, year, edition, pages
Washington, DC: The American Society for Microbiology, 2005
Keywords
Bioinformatic Analysis, Human Immunodeficiency, Virus Type 1 Protease
National Category
Medicinal Chemistry
Identifiers
urn:nbn:se:hh:diva-268 (URN), 10.1128/JVI.79.19.12477-12486.2005 (DOI), 000231992500036, 16160175 (PubMedID), 2-s2.0-25144487698 (Scopus ID), 2082/563 (Local ID), 2082/563 (Archive number), 2082/563 (OAI)
Note

Correction: You L, Garwicz D, Rögnvaldsson T. 2006. Comprehensive Bioinformatic Analysis of the Specificity of Human Immunodeficiency Virus Type 1 Protease. J Virol 80: https://doi.org/10.1128/jvi.80.8.4205.2006

Available from: 2006-11-27. Created: 2006-11-27. Last updated: 2024-08-06. Bibliographically approved
3. Understanding Prediction Systems for HLA-Binding Peptides and T-Cell Epitope Identification
2007 (English). In: Pattern Recognition in Bioinformatics: Proceedings / [ed] Rajapakse, J C, Schmidt, B, Volkert, G. Berlin: Springer Berlin/Heidelberg, 2007, p. 337-348. Conference paper, Published paper (Refereed)
Abstract [en]

Peptide binding to HLA molecules is a critical step in the induction and regulation of T-cell-mediated immune responses. Because of the combinatorial complexity of immune responses, systematic studies require a combination of computational methods and experimentation. Most available computational predictions discriminate binders from non-binders by applying suitable prediction thresholds. We compared four state-of-the-art binding affinity prediction models and found that nonlinear models perform better than linear models. A comprehensive analysis of HLA binders (A*0101, A*0201, A*0301, A*1101, A*2402, B*0702, B*0801 and B*1501) showed that nonlinear predictors predict peptide binding affinity with high accuracy. The analysis of known T-cell epitopes of survivin and known HIV T-cell epitopes showed a lack of correlation between binding affinity and immunogenicity of HLA-presented peptides. T-cell epitopes therefore cannot be determined directly from binding affinities by simply selecting the highest-affinity binders.

Place, publisher, year, edition, pages
Berlin: Springer Berlin/Heidelberg, 2007
Series
Lecture Notes in Computer Science, ISSN 0302-9743 ; Volume 4774/2007
Keywords
HLA-Binding Peptides, T-Cell Epitope Identification
National Category
Bioinformatics (Computational Biology)
Identifiers
urn:nbn:se:hh:diva-2038 (URN), 10.1007/978-3-540-75286-8_32 (DOI), 000251314800032, 2-s2.0-38349072025 (Scopus ID), 2082/2433 (Local ID), 978-3-540-75285-1 (ISBN), 2082/2433 (Archive number), 2082/2433 (OAI)
Conference
2nd International Workshop on Pattern Recognition in Bioinformatics, Singapore, Oct 01-02, 2007
Available from: 2008-10-14. Created: 2008-10-14. Last updated: 2020-05-11. Bibliographically approved
4. Bioinformatic approaches for modeling the substrate specificity of HIV-1 protease: an overview
2007 (English). In: Expert Review of Molecular Diagnostics, ISSN 1473-7159, E-ISSN 1744-8352, Vol. 7, no 4, p. 435-451. Article in journal (Refereed), Published
Abstract [en]

HIV-1 protease has a broad and complex substrate specificity, which hitherto has escaped a simple comprehensive definition. This, and the relatively high mutation rate of the retroviral protease, makes it challenging to design effective protease inhibitors. Several attempts have been made during the last two decades to elucidate the enigmatic cleavage specificity of HIV-1 protease and to predict cleavage of novel substrates using bioinformatic analysis methods. This review describes the methods that have been utilized to date to address this important problem and the results achieved. The data sets used are also reviewed and important aspects of these are highlighted.

Place, publisher, year, edition, pages
Expert Reviews Ltd, 2007
Keywords
bioinformatics, cleavage rule, HIV, human immunodeficiency virus, physicochemical property, prediction, protease
National Category
Hematology
Identifiers
urn:nbn:se:hh:diva-2002 (URN), 10.1586/14737159.7.4.435 (DOI), 000248620700010, 17620050 (PubMedID), 2-s2.0-34447517453 (Scopus ID), 2082/2397 (Local ID), 2082/2397 (Archive number), 2082/2397 (OAI)
Available from: 2008-10-06. Created: 2008-10-06. Last updated: 2020-05-19. Bibliographically approved

Open Access in DiVA

No full text in DiVA

Authority records

You, Liwen
