You, Liwen
Publications (9 of 9)
Rögnvaldsson, T., You, L. & Garwicz, D. (2015). State of the art prediction of HIV-1 protease cleavage sites. Bioinformatics, 31(8), 1204-1210
2015 (English). In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 31, no 8, p. 1204-1210. Article in journal (Refereed) Published
Abstract [en]

Motivation: Understanding the substrate specificity of HIV-1 protease is important when designing effective HIV-1 protease inhibitors. Furthermore, characterizing and predicting the cleavage profile of HIV-1 protease is essential to generate and test hypotheses of how HIV-1 affects proteins of the human host. Currently available tools for predicting cleavage by HIV-1 protease can be improved.

Results: The linear support vector machine with orthogonal encoding is shown to be the best predictor for HIV-1 protease cleavage. It is considerably better than current publicly available predictor services. It is also found that schemes using physicochemical properties do not improve over the standard orthogonal encoding scheme. Some issues with the currently available data are discussed.

Availability: The data sets used, which are the most important part, are available at the UCI Machine Learning Repository. The tools used are all standard and easily available. © 2014 The Author.
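
As a rough illustration of the orthogonal encoding mentioned in the Results, the sketch below maps each residue of a peptide to a 20-dimensional indicator vector and concatenates the vectors, so an octamer becomes a 160-dimensional binary vector. The function name `orthogonal_encode`, the residue ordering and the example peptide are assumptions made for illustration; they are not taken from the paper.

```python
# Minimal sketch of orthogonal (one-hot) encoding of a peptide.
# AMINO_ACIDS ordering and the function name are illustrative assumptions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def orthogonal_encode(peptide):
    """Return a flat 0/1 list with one 20-slot block per residue."""
    vector = []
    for residue in peptide:
        if residue not in AMINO_ACIDS:
            raise ValueError(f"unknown residue: {residue!r}")
        # Exactly one slot per block is set to 1.
        vector.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return vector


# Illustrative octamer; any 8-residue string over AMINO_ACIDS works.
encoded = orthogonal_encode("SQNYPIVQ")
print(len(encoded), sum(encoded))
```

A vector of this form can be passed to any linear classifier; which solver and parameters the authors used is not stated in the abstract.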

Place, publisher, year, edition, pages
Oxford: Oxford University Press, 2015
Keywords
Bioinformatics, HIV-1
National Category
Bioinformatics (Computational Biology)
Identifiers
urn:nbn:se:hh:diva-27165 (URN); 10.1093/bioinformatics/btu810 (DOI); 000354453700007 (); 25504647 (PubMedID); 2-s2.0-84927720595 (Scopus ID)
Available from: 2014-12-04 Created: 2014-12-04 Last updated: 2018-03-22. Bibliographically approved
Rögnvaldsson, T., Etchells, T. A., You, L., Garwicz, D., Jarman, I. & Lisboa, P. J. (2009). How to find simple and accurate rules for viral protease cleavage specificities. BMC Bioinformatics, 10, 149-156
2009 (English). In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 10, p. 149-156. Article in journal (Refereed) Published
Abstract [en]

BACKGROUND:

Proteases of human pathogens are becoming increasingly important drug targets, hence it is necessary to understand their substrate specificity and to interpret this knowledge in practically useful ways. New methods are being developed that produce large amounts of cleavage information for individual proteases and some have been applied to extract cleavage rules from data. However, the hitherto proposed methods for extracting rules have been neither easy to understand nor very accurate. To be practically useful, cleavage rules should be accurate, compact, and expressed in an easily understandable way.

RESULTS:

A new method is presented for producing cleavage rules for viral proteases with seemingly complex cleavage profiles. The method is based on orthogonal search-based rule extraction (OSRE) combined with spectral clustering. It is demonstrated on substrate data sets for human immunodeficiency virus type 1 (HIV-1) protease and hepatitis C virus (HCV) NS3/4A protease, showing excellent prediction performance for both HIV-1 cleavage and HCV NS3/4A cleavage, agreeing with observed HCV genotype differences. New cleavage rules (consensus sequences) are suggested for HIV-1 and HCV NS3/4A cleavages. The practical usability of the method is also demonstrated by using it to predict the location of an internal cleavage site in the HCV NS3 protease and to correct the location of a previously reported internal cleavage site in the HCV NS3 protease. The method is fast to converge and yields accurate rules, on par with previous results for HIV-1 protease and better than previous state-of-the-art for HCV NS3/4A protease. Moreover, the rules are fewer and simpler than previously obtained with rule extraction methods.

CONCLUSION:

A rule extraction methodology based on searching for multivariate low-order predicates yields results that significantly outperform existing rule bases on out-of-sample data and are also more transparent to expert users. The approach yields rules that are easy to use and useful for interpreting experimental data.

Place, publisher, year, edition, pages
BioMed Central Ltd., 2009
Keywords
Amino Acid Sequence, Catalytic Domain, Cluster Analysis, Computer Simulation
National Category
Engineering and Technology
Identifiers
urn:nbn:se:hh:diva-61 (URN); 10.1186/1471-2105-10-149 (DOI); 000267595400003 (); 19445713 (PubMedID); 2-s2.0-67650914275 (Scopus ID)
Available from: 2009-09-01 Created: 2009-09-01 Last updated: 2018-03-23. Bibliographically approved
You, L. & Rögnvaldsson, T. (2007). Almost Linear Biobasis Function Neural Networks. In: The 2007 International Joint Conference on Neural Networks: IJCNN 2007 conference proceedings : August 12-17, 2007, Renaissance Orlando Resort, Orlando, Florida, USA. Paper presented at The 2007 International Joint Conference on Neural Networks (pp. 1774-1778). Piscataway, N.J.: IEEE Press
2007 (English). In: The 2007 International Joint Conference on Neural Networks: IJCNN 2007 conference proceedings : August 12-17, 2007, Renaissance Orlando Resort, Orlando, Florida, USA, Piscataway, N.J.: IEEE Press, 2007, p. 1774-1778. Conference paper, Published paper (Other academic)
Abstract [en]

An analysis of biobasis function neural networks is presented, which shows that the similarity metric used is a linear function and that biobasis function neural networks therefore often end up being just linear classifiers in high-dimensional spaces. This is a consequence of four things: the linearity of the distance measure, the normalization of the distance measure, the recommended default values of the parameters, and the sparsity of biological data sets.

Place, publisher, year, edition, pages
Piscataway, N.J.: IEEE Press, 2007
Series
International Conference on Neural Networks, ISSN 1098-7576 ; 2007
Keywords
biobasis function neural networks, biological data set, biology computing, data analysis, distance measure linearity, distance measure normalization, linear classifiers, linear function, neural nets, pattern classification, similarity metric
National Category
Engineering and Technology
Identifiers
urn:nbn:se:hh:diva-5021 (URN); 10.1109/IJCNN.2007.4371226 (DOI); 000254291101124 (); 2-s2.0-51749115099 (Scopus ID); 978-1-4244-1380-5 (ISBN)
Conference
The 2007 International Joint Conference on Neural Networks
Note

©2007 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

Available from: 2010-06-28 Created: 2010-06-28 Last updated: 2018-03-23. Bibliographically approved
Rögnvaldsson, T., You, L. & Garwicz, D. (2007). Bioinformatic approaches for modeling the substrate specificity of HIV-1 protease: an overview. Expert Review of Molecular Diagnostics, 7(4), 435-451
2007 (English). In: Expert Review of Molecular Diagnostics, ISSN 1473-7159, E-ISSN 1744-8352, Vol. 7, no 4, p. 435-451. Article in journal (Refereed) Published
Abstract [en]

HIV-1 protease has a broad and complex substrate specificity, which hitherto has escaped a simple comprehensive definition. This, and the relatively high mutation rate of the retroviral protease, makes it challenging to design effective protease inhibitors. Several attempts have been made during the last two decades to elucidate the enigmatic cleavage specificity of HIV-1 protease and to predict cleavage of novel substrates using bioinformatic analysis methods. This review describes the methods that have been utilized to date to address this important problem and the results achieved. The data sets used are also reviewed and important aspects of these are highlighted.

Place, publisher, year, edition, pages
Expert Reviews Ltd, 2007
Keywords
bioinformatics, cleavage rule, HIV, human immunodeficiency virus, physicochemical property, prediction, protease
National Category
Medical and Health Sciences
Identifiers
urn:nbn:se:hh:diva-2002 (URN); 10.1586/14737159.7.4.435 (DOI); 000248620700010 (); 17620050 (PubMedID); 2-s2.0-34447517453 (Scopus ID); 2082/2397 (Local ID); 2082/2397 (Archive number); 2082/2397 (OAI)
Available from: 2008-10-06 Created: 2008-10-06 Last updated: 2018-03-23. Bibliographically approved
You, L. (2007). Computational prediction models for proteolytic cleavage and epitope identification. (Doctoral dissertation). Lund: Department of Theoretical Physics, Lund University
2007 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

The biological functions of proteins depend on their physical interactions with other molecules, such as proteins and peptides. Therefore, modeling the protein-ligand interactions is important for understanding protein functions in different biological processes. We have focused on the cleavage specificities of HIV-1 protease, HCV NS3 protease and caspases on short oligopeptides or in native proteins, on the binding affinity of MHC molecules with short oligopeptides, and on the identification of T cell epitopes. We expect that our findings on HIV-1 protease, HCV NS3 protease and caspases generalize to other proteases. In this thesis, we have analyzed these interactions from different perspectives: we have extended and collected new substrate data sets; used and compared different prediction methods (e.g. linear support vector machines, neural networks, the OSRE method, rough set theory and Gaussian processes) to understand the underlying interaction problems; suggested new methods (i.e. a hierarchical method and Gaussian processes with a test reject method) to improve predictions; and extracted cleavage rules for protease cleavage specificities. From our studies, we have extended oligopeptide substrate data sets and collected native protein substrates for HIV-1 protease, and a new oligopeptide substrate data set for HCV protease. We have shown that all current HIV-1 protease oligopeptide substrate data sets and our HCV data set are linearly separable; that, for HIV-1 protease, size and hydrophobicity are two important physicochemical properties in the recognition of short oligopeptide substrates by the protease; and that the linear support vector machine is the state-of-the-art for this protease cleavage prediction problem. Our hierarchical method, combining protein secondary structure information and experimental short oligopeptide cleavage information, can improve the prediction of HIV-1 protease cleavage sites in native proteins. Our rule extraction method provides simple and accurate cleavage rules with high fidelity for HIV-1 and HCV proteases. For MHC molecules, we showed that high binding affinities are not necessarily correlated with immunogenicity for HLA-restricted peptides. Our test reject method combined with Gaussian processes can simplify experimental design by reducing false positives when detecting potential epitopes in large pathogen genomes.

Place, publisher, year, edition, pages
Lund: Department of Theoretical Physics, Lund University, 2007. p. 84
Keywords
Binding affinity, Caspase, Cleavage prediction, Cleavage specificity, Epitope, False positive, Gaussian process, HCV, Hierarchical method, HIV, Immunology, MHC, OSRE, Protease-peptide interaction, Rule extraction, Sequence analysis, SVM
National Category
Bioinformatics and Systems Biology
Identifiers
urn:nbn:se:hh:diva-1981 (URN); 2082/2376 (Local ID); 978-91-628-7218-2 (ISBN); 2082/2376 (Archive number); 2082/2376 (OAI)
Available from: 2008-09-29 Created: 2008-09-29 Last updated: 2018-03-23. Bibliographically approved
You, L., Zhang, P., Bodén, M. & Brusic, V. (2007). Understanding Prediction Systems for HLA-Binding Peptides and T-Cell Epitope Identification. In: Rajapakse, J C, Schmidt, B, Volkert, G (Ed.), Pattern Recognition in Bioinformatics: Proceedings. Paper presented at 2nd International Workshop on Pattern Recognition in Bioinformatics, Singapore, Oct 01-02, 2007 (pp. 337-348). Berlin: Springer Berlin/Heidelberg
2007 (English). In: Pattern Recognition in Bioinformatics: Proceedings / [ed] Rajapakse, J C, Schmidt, B, Volkert, G, Berlin: Springer Berlin/Heidelberg, 2007, p. 337-348. Conference paper, Published paper (Refereed)
Abstract [en]

Peptide binding to HLA molecules is a critical step in the induction and regulation of T-cell mediated immune responses. Because of the combinatorial complexity of immune responses, systematic studies require a combination of computational methods and experimentation. Most available computational predictions discriminate binders from non-binders using suitable prediction thresholds. We compared four state-of-the-art binding affinity prediction models and found that nonlinear models show better performance than linear models. A comprehensive analysis of HLA binders (A*0101, A*0201, A*0301, A*1101, A*2402, B*0702, B*0801 and B*1501) showed that nonlinear predictors predict peptide binding affinity with high accuracy. The analysis of known T-cell epitopes of survivin and known HIV T-cell epitopes showed a lack of correlation between binding affinity and immunogenicity of HLA-presented peptides. T-cell epitopes, therefore, cannot be directly determined from binding affinities by simple selection of the highest affinity binders.

Place, publisher, year, edition, pages
Berlin: Springer Berlin/Heidelberg, 2007
Series
Lecture Notes in Computer Science, ISSN 0302-9743 ; Volume 4774/2007
Keywords
HLA-Binding Peptides, T-Cell Epitope Identification
National Category
Engineering and Technology
Identifiers
urn:nbn:se:hh:diva-2038 (URN); 10.1007/978-3-540-75286-8_32 (DOI); 000251314800032 (); 2-s2.0-38349072025 (Scopus ID); 2082/2433 (Local ID); 978-3-540-75285-1 (ISBN); 2082/2433 (Archive number); 2082/2433 (OAI)
Conference
2nd International Workshop on Pattern Recognition in Bioinformatics, Singapore, Oct 01-02, 2007
Available from: 2008-10-14 Created: 2008-10-14 Last updated: 2018-03-23. Bibliographically approved
You, L. (2006). Detection of cleavage sites for HIV-1 protease in native proteins. In: Proceedings of LSS Computational Systems Bioinformatics Conference: Computational Systems Bioinformatics Conference (vol. 5). Paper presented at LSS Computational Systems Bioinformatics Conference (pp. 249-256). Imperial College Press
2006 (English). In: Proceedings of LSS Computational Systems Bioinformatics Conference: Computational Systems Bioinformatics Conference (vol. 5), Imperial College Press, 2006, p. 249-256. Conference paper, Published paper (Refereed)
Abstract [en]

Predicting novel cleavage sites for HIV-1 protease in non-viral proteins is a difficult task because of the scarcity of previous cleavage data on proteins in a native state. We introduce a three-level hierarchical classifier which combines information from experimentally verified short oligopeptides with secondary structure and solvent accessibility information from prediction servers to predict potential cleavage sites in non-viral proteins. The best classifier using secondary structure information on the second level of the hierarchical classifier is the one using logistic regression. By using this level of classification, the false positive ratio was reduced by more than half compared to the first level classifier, which uses only the oligopeptide cleavage information. The method can also be applied to other protease specificity problems, to combine information from oligopeptides with structure information from native proteins.

Place, publisher, year, edition, pages
Imperial College Press, 2006
Keywords
Bayes Theorem, Binding Sites, Cluster Analysis, Computational Biology, HIV Protease, HIV-1, Oligopeptides, Peptides
National Category
Biological Sciences
Identifiers
urn:nbn:se:hh:diva-2121 (URN); 2-s2.0-34250823439 (Scopus ID); 2082/2516 (Local ID); 2082/2516 (Archive number); 2082/2516 (OAI)
Conference
LSS Computational Systems Bioinformatics Conference
Available from: 2008-11-11 Created: 2008-11-11 Last updated: 2018-03-23. Bibliographically approved
You, L., Garwicz, D. & Rögnvaldsson, T. (2005). Comprehensive Bioinformatic Analysis of the Specificity of Human Immunodeficiency Virus Type 1 Protease. Journal of Virology, 79(19), 12477-12486
2005 (English). In: Journal of Virology, ISSN 0022-538X, E-ISSN 1098-5514, Vol. 79, no 19, p. 12477-12486. Article in journal (Refereed) Published
Abstract [en]

Rapidly developing viral resistance to licensed human immunodeficiency virus type 1 (HIV-1) protease inhibitors is an increasing problem in the treatment of HIV-infected individuals and AIDS patients. A rational design of more effective protease inhibitors and discovery of potential biological substrates for the HIV-1 protease require accurate models for protease cleavage specificity. In this study, several popular bioinformatic machine learning methods, including support vector machines and artificial neural networks, were used to analyze the specificity of the HIV-1 protease. A new, extensive data set (746 peptides that have been experimentally tested for cleavage by the HIV-1 protease) was compiled, and the data were used to construct different classifiers that predicted whether the protease would cleave a given peptide substrate or not. The best predictor was a nonlinear predictor using two physicochemical parameters (hydrophobicity, or alternatively polarity, and size) for the amino acids, indicating that these properties are the key features recognized by the HIV-1 protease. The present in silico study provides new and important insights into the workings of the HIV-1 protease at the molecular level, supporting the recent hypothesis that the protease primarily recognizes a conformation rather than a specific amino acid sequence. Furthermore, we demonstrate that the presence of 1 to 2 lysine residues near the cleavage site of octameric peptide substrates seems to prevent cleavage efficiently, suggesting that this positively charged amino acid plays an important role in hindering the activity of the HIV-1 protease.

Place, publisher, year, edition, pages
Washington, DC: The American Society for Microbiology, 2005
Keywords
Bioinformatic Analysis, Human Immunodeficiency Virus Type 1 Protease
National Category
Medical and Health Sciences
Identifiers
urn:nbn:se:hh:diva-268 (URN); 10.1128/JVI.79.19.12477-12486.2005 (DOI); 000231992500036 (); 16160175 (PubMedID); 2-s2.0-25144487698 (Scopus ID); 2082/563 (Local ID); 2082/563 (Archive number); 2082/563 (OAI)
Available from: 2006-11-27 Created: 2006-11-27 Last updated: 2018-03-23. Bibliographically approved
Rögnvaldsson, T. & You, L. (2004). Why neural networks should not be used for HIV-1 protease cleavage site prediction. Bioinformatics, 20(11), 1702-1709
2004 (English). In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 20, no 11, p. 1702-1709. Article in journal (Refereed) Published
Abstract [en]

Several papers have been published where nonlinear machine learning algorithms, e.g. artificial neural networks, support vector machines and decision trees, have been used to model the specificity of the HIV-1 protease and extract specificity rules. We show that the dataset used in these studies is linearly separable and that it is a misuse of nonlinear classifiers to apply them to this problem. The best solution on this dataset is achieved using a linear classifier like the simple perceptron or the linear support vector machine, and it is straightforward to extract rules from these linear models. We identify key residues in peptides that are efficiently cleaved by the HIV-1 protease and list the most prominent rules, relating them to experimental results for the HIV-1 protease.

Motivation: Understanding HIV-1 protease specificity is important when designing HIV inhibitors and several different machine learning algorithms have been applied to the problem. However, little progress has been made in understanding the specificity because nonlinear and overly complex models have been used.

Results: We show that the problem is much easier than what has previously been reported and that linear classifiers like the simple perceptron or linear support vector machines are at least as good predictors as nonlinear algorithms. We also show how sets of specificity rules can be generated from the resulting linear classifiers.
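
The link between linear separability and the simple perceptron can be sketched as follows: on a linearly separable training set, the perceptron update rule reaches a pass with zero training errors after finitely many updates, so a converged run is itself evidence of separability. The toy two-dimensional data, the function name `train_perceptron` and the epoch limit below are illustrative assumptions and are unrelated to the actual HIV-1 protease data.

```python
# Minimal perceptron sketch on toy data (not the HIV-1 protease data set).
# SAMPLES, LABELS and max_epochs are illustrative assumptions.
SAMPLES = [(2.0, 2.0), (3.0, 1.0), (2.5, 3.0),
           (-1.0, -1.0), (-2.0, 0.5), (0.0, -2.0)]
LABELS = [1, 1, 1, -1, -1, -1]


def train_perceptron(samples, labels, max_epochs=100):
    """Return (weights, bias, converged) after perceptron training."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            if y * activation <= 0:  # misclassified or on the boundary
                weights = [w + y * xi for w, xi in zip(weights, x)]
                bias += y
                mistakes += 1
        if mistakes == 0:  # a full pass without errors: data separated
            return weights, bias, True
    return weights, bias, False


weights, bias, converged = train_perceptron(SAMPLES, LABELS)
print(converged)
```

If the loop ends without convergence, that only shows the epoch limit was too small or the data are not separable; it does not distinguish between the two.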

Place, publisher, year, edition, pages
Oxford University Press, 2004
Keywords
Neural networks
National Category
Biochemistry and Molecular Biology
Identifiers
urn:nbn:se:hh:diva-226 (URN); 10.1093/bioinformatics/bth144 (DOI); 000223142000006 (); 14988129 (PubMedID); 2-s2.0-4444262024 (Scopus ID); 2082/521 (Local ID); 2082/521 (Archive number); 2082/521 (OAI)
Available from: 2006-11-24 Created: 2006-11-24 Last updated: 2018-03-23. Bibliographically approved