Random forests based monitoring of human larynx using questionnaire data
2012 (English)
In: Expert systems with applications, ISSN 0957-4174, E-ISSN 1873-6793, Vol. 39, no 5, p. 5506-5512
Article in journal (Refereed) Published
Abstract [en]
This paper is concerned with noninvasive monitoring of the human larynx using soft computing techniques applied to subjects' questionnaire data. By applying random forests (RF), questionnaire data are categorized into a healthy class and several classes of disorders, including cancerous, noncancerous, diffuse, nodular, paralysis, and an overall pathological class. The most important questionnaire statements are determined using RF variable importance evaluations. To explore data represented by the variables used by RF, t-distributed stochastic neighbor embedding (t-SNE) and multidimensional scaling (MDS) are applied to the RF data proximity matrix. When the developed tools were tested on data collected from 109 subjects, 100% classification accuracy was obtained on unseen data in binary classification into the healthy and pathological classes. An accuracy of 80.7% was achieved when classifying the data into the healthy, cancerous, and noncancerous classes. The t-SNE and MDS mapping techniques yield two-dimensional maps of the data and facilitate data exploration aimed at identifying subjects belonging to a “risk group”. It is expected that the developed tools will be of great help in preventive health care in laryngology.
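The RF data proximity matrix mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common definition of proximity (the fraction of trees in which two samples reach the same leaf); the abstract does not state which definition the authors used. The function names `rf_proximity` and `to_dissimilarity` and the toy leaf indices are hypothetical and are not taken from the paper. In practice the leaf indices could come from a fitted forest, for example via scikit-learn's `RandomForestClassifier.apply`.

```python
from typing import List, Sequence


def rf_proximity(leaves: Sequence[Sequence[int]]) -> List[List[float]]:
    """Build a proximity matrix from per-tree leaf indices.

    leaves[i][t] is the index of the leaf that sample i reaches in tree t.
    The proximity of samples i and j is the fraction of trees in which
    both samples fall into the same leaf (assumed definition).
    """
    n_samples = len(leaves)
    if n_samples == 0:
        return []
    n_trees = len(leaves[0])
    if n_trees == 0:
        raise ValueError("each sample needs a leaf index for at least one tree")

    prox = [[0.0] * n_samples for _ in range(n_samples)]
    for i in range(n_samples):
        for j in range(i, n_samples):
            # Count the trees where both samples share a leaf.
            shared = sum(1 for a, b in zip(leaves[i], leaves[j]) if a == b)
            prox[i][j] = prox[j][i] = shared / n_trees
    return prox


def to_dissimilarity(prox: Sequence[Sequence[float]]) -> List[List[float]]:
    """Convert proximities to dissimilarities (1 - proximity)."""
    return [[1.0 - p for p in row] for row in prox]


# Hypothetical toy input: 3 samples, 4 trees (not data from the paper).
toy_leaves = [
    [1, 2, 3, 1],
    [1, 2, 4, 1],
    [5, 6, 7, 8],
]
toy_prox = rf_proximity(toy_leaves)
toy_dist = to_dissimilarity(toy_prox)
```

A dissimilarity matrix of this kind is the sort of input that an MDS or t-SNE implementation accepting precomputed distances could embed into two dimensions, which is the mapping step the abstract describes.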
Place, publisher, year, edition, pages
Amsterdam: Elsevier, 2012. Vol. 39, no 5, p. 5506-5512
Keywords [en]
Random forests, Variable importance, Variable selection, Classifier, Data proximity, Human larynx
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:hh:diva-16645
DOI: 10.1016/j.eswa.2011.11.070
ISI: 000301155300087
Scopus ID: 2-s2.0-84855868516
OAI: oai:DiVA.org:hh-16645
DiVA, id: diva2:461859
2011-12-05 2011-12-05 2018-01-12 Bibliographically approved