Deep learning tends to outperform hand-crafted features in many domains. This study uses convolutional neural networks to explore the effectiveness of various segments of a speech signal (a text-dependent pronunciation of a short sentence) in the Parkinson’s disease detection task. Besides the common Mel-frequency spectrogram and its first and second derivatives, the inclusion of various other input feature maps is also considered. Image interpolation is investigated as a way to obtain a spectrogram of fixed length. The equal error rate (EER) for individual sentence segments varied from 20.3% to 29.5%. Fusion of decisions from sentence segments achieved an EER of 14.1%, whereas the best result using the full sentence was an EER of 16.8%. Therefore, splitting speech into segments could be recommended for Parkinson’s disease detection. © ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2018.
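Two of the steps mentioned in the abstract, resizing a spectrogram to a fixed length and computing the EER, can be illustrated with a minimal Python sketch. This is not the study's implementation. It assumes NumPy, a spectrogram stored as a (bands, frames) array, linear interpolation along the time axis only (the abstract does not name the interpolation method), and an EER approximated by sweeping thresholds over the observed scores. The function names `resize_time_axis` and `equal_error_rate` are hypothetical.

```python
import numpy as np


def resize_time_axis(spec, target_frames):
    """Linearly interpolate a (n_bands, n_frames) array to target_frames columns.

    Assumption: only the time axis is resampled; the number of bands is kept.
    """
    spec = np.asarray(spec, dtype=float)
    n_frames = spec.shape[1]
    # Place old and new frames on a common normalized time axis [0, 1].
    old_t = np.linspace(0.0, 1.0, n_frames)
    new_t = np.linspace(0.0, 1.0, target_frames)
    # Interpolate each frequency band independently.
    return np.stack([np.interp(new_t, old_t, row) for row in spec])


def equal_error_rate(scores, labels):
    """Approximate the EER: the error rate where false acceptance equals false rejection.

    scores: higher means "more likely Parkinson's disease" (assumed convention).
    labels: 1 for patients, 0 for healthy controls.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    best_gap, eer = np.inf, 1.0
    for threshold in np.unique(scores):
        predicted = scores >= threshold
        far = np.mean(predicted[~labels])   # healthy controls flagged as patients
        frr = np.mean(~predicted[labels])   # patients missed
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2.0
    return eer
```

For example, `resize_time_axis(spec, 128)` would map spectrograms of any duration to 128 frames before they are fed to a network, and `equal_error_rate(scores, labels)` could then be applied to per-segment or fused scores alike.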
Funding: Research Council of Lithuania (No. MIP-075/2015)