Motivation: Understanding the substrate specificity of HIV-1 protease is important when designing effective HIV-1 protease inhibitors. Furthermore, characterizing and predicting the cleavage profile of HIV-1 protease is essential to generate and test hypotheses of how HIV-1 affects proteins of the human host. Currently available tools for predicting cleavage by HIV-1 protease can be improved.
Results: The linear support vector machine with orthogonal encoding is shown to be the best predictor for HIV-1 protease cleavage. It is considerably better than current publicly available predictor services. It is also found that schemes using physicochemical properties do not improve over the standard orthogonal encoding scheme. Some issues with the currently available data are discussed.
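As an illustration of what orthogonal encoding means in practice, the following minimal sketch maps each residue of a peptide to a 20-dimensional indicator vector and concatenates them. The 20-letter alphabet ordering, the function name `orthogonal_encode`, and the octamer window length are assumptions made for this example, not details stated above; the input string is used purely as an illustrative octamer.

```python
# Minimal sketch of orthogonal (one-hot) encoding for peptides.
# Assumptions: the 20 standard amino acids in alphabetical one-letter order,
# and an 8-residue input window; both are illustrative choices.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def orthogonal_encode(peptide):
    """Return a flat 0/1 list with one 20-wide block per residue."""
    width = len(AMINO_ACIDS)
    vec = [0] * (len(peptide) * width)
    for pos, aa in enumerate(peptide):
        # Exactly one entry per position is set, at the residue's index.
        vec[pos * width + INDEX[aa]] = 1
    return vec


x = orthogonal_encode("SQNYPIVQ")  # illustrative octamer
print(len(x), sum(x))  # 160 8
```

A vector of this form could then be passed to any linear classifier, such as a linear support vector machine, whose decision function is a weighted sum over these binary inputs.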
Availability: The data sets used, which are the most important part, are available at the UCI Machine Learning Repository. The tools used are all standard and easily available. © 2014 The Author.