A novel approach to estimate proximity in a random forest: An exploratory study
Cristofer Englund, Viktoria Institute, Göteborg, Sweden. ORCID iD: 0000-0002-1043-8773
Antanas Verikas, Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Intelligent systems (IS-lab); Department of Electrical & Control Equipment, Kaunas University of Technology, Kaunas, Lithuania. ORCID iD: 0000-0003-2185-8973
2012 (English). In: Expert systems with applications, ISSN 0957-4174, E-ISSN 1873-6793, Vol. 39, no 17, p. 13046-13050. Article in journal (Refereed). Published.
Abstract [en]

A data proximity matrix is an important information source in random forest (RF) based data mining, including data clustering, visualization, outlier detection, substitution of missing values, and finding mislabeled data samples. A novel approach to estimating proximity is proposed in this work. The approach is based on measuring the distance between two terminal nodes in a decision tree. To assess the consistency (quality) of a data proximity estimate, we suggest using the proximity matrix as a kernel matrix in a support vector machine (SVM), under the assumption that a matrix of higher quality leads to higher classification accuracy. It is experimentally shown that the proposed approach improves the proximity estimate, especially when the RF is made of a small number of trees. It is also demonstrated that, for some tasks, an SVM exploiting the suggested proximity-matrix-based kernel outperforms an SVM based on a standard radial basis function kernel and the standard proximity-matrix-based kernel. © 2012 Elsevier Ltd. All rights reserved.
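To make the two notions in the abstract concrete, here is a minimal sketch under stated assumptions. The abstract does not give the proposed formula, so this is not the authors' method. The first function computes the standard RF proximity: the fraction of trees in which two samples fall into the same terminal node. The second shows one possible way to measure a distance between two terminal nodes, namely the number of edges on the path between them. It assumes nodes are numbered heap-style (root 1, children of node k are 2k and 2k+1). The function names, the node numbering, and the toy leaf assignments are all hypothetical.

```python
def standard_proximity(leaves):
    """Standard RF proximity matrix.

    leaves[i][t] is the terminal-node id of sample i in tree t.
    Entry (i, j) is the fraction of trees where i and j share a leaf.
    """
    n = len(leaves)
    n_trees = len(leaves[0])
    prox = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            shared = sum(1 for a, b in zip(leaves[i], leaves[j]) if a == b)
            prox[i][j] = shared / n_trees
    return prox


def leaf_path_distance(a, b):
    """Edges on the path between nodes a and b of one binary tree.

    Assumes heap-style ids (hypothetical convention): root is 1 and the
    parent of node k is k // 2. The larger id is never shallower than
    the smaller one, so stepping the larger id to its parent is safe.
    """
    edges = 0
    while a != b:
        if a > b:
            a //= 2
        else:
            b //= 2
        edges += 1
    return edges


# Toy data: 3 samples, 2 trees (made-up leaf ids, not from the paper).
toy_leaves = [[4, 6], [4, 7], [5, 7]]
toy_prox = standard_proximity(toy_leaves)
```

With the standard definition, samples 0 and 2 get proximity 0 because they never share a leaf, even though leaves 4 and 5 are siblings in the first tree. A distance between terminal nodes can still tell such near misses apart, which may matter most when the forest has few trees. A matrix like `toy_prox` could then be supplied to an SVM implementation that accepts a precomputed kernel.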

Place, publisher, year, edition, pages
Amsterdam: Elsevier, 2012. Vol. 39, no 17, p. 13046-13050
Keywords [en]
Random forest, Proximity matrix, Support vector machine, Kernel matrix, Data mining
National Category
Probability Theory and Statistics
Identifiers
URN: urn:nbn:se:hh:diva-19380
DOI: 10.1016/j.eswa.2012.05.094
ISI: 000308449300031
Scopus ID: 2-s2.0-84865043451
OAI: oai:DiVA.org:hh-19380
DiVA, id: diva2:548335
Available from: 2012-08-30. Created: 2012-08-30. Last updated: 2018-03-22. Bibliographically approved.


Authority records

Englund, Cristofer; Verikas, Antanas
