A novel approach to estimate proximity in a random forest: An exploratory study
Viktoria Institute, Göteborg, Sweden. ORCID iD: 0000-0002-1043-8773
Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Intelligent Systems (IS-lab). Department of Electrical & Control Equipment, Kaunas University of Technology, Kaunas, Lithuania. ORCID iD: 0000-0003-2185-8973
2012 (English) In: Expert systems with applications, ISSN 0957-4174, E-ISSN 1873-6793, Vol. 39, no. 17, pp. 13046-13050. Article in journal (Refereed) Published
Abstract [en]

A data proximity matrix is an important information source in random forest (RF) based data mining, including data clustering, visualization, outlier detection, substitution of missing values, and finding mislabeled data samples. A novel approach to estimating proximity is proposed in this work. The approach is based on measuring the distance between two terminal nodes in a decision tree. To assess the consistency (quality) of the data proximity estimate, we suggest using the proximity matrix as a kernel matrix in a support vector machine (SVM), under the assumption that a matrix of higher quality leads to higher classification accuracy. It is experimentally shown that the proposed approach improves the proximity estimate, especially when the RF consists of a small number of trees. It is also demonstrated that, for some tasks, an SVM exploiting the suggested proximity matrix based kernel outperforms an SVM based on a standard radial basis function kernel and the standard proximity matrix based kernel. © 2012 Elsevier Ltd. All rights reserved.
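
The abstract contrasts the standard RF proximity with a proximity based on the distance between terminal nodes. The minimal sketch below illustrates the difference in plain Python. `standard_proximity` uses the usual definition (the fraction of trees in which two samples land in the same terminal node). `leaf_distance` and `distance_proximity` are hypothetical illustrations only: they assume complete binary trees with heap-style node numbering (root 1, children of node k are 2k and 2k+1) and an arbitrary weighting 1/(1+d). The abstract does not give the paper's formula, so this should not be read as the authors' method.

```python
def standard_proximity(leaf_ids):
    """Standard RF proximity: share of trees where two samples share a leaf.

    leaf_ids[t][i] is the terminal-node id reached by sample i in tree t.
    """
    n_trees = len(leaf_ids)
    n_samples = len(leaf_ids[0])
    prox = [[0.0] * n_samples for _ in range(n_samples)]
    for leaves in leaf_ids:
        for i in range(n_samples):
            for j in range(n_samples):
                if leaves[i] == leaves[j]:
                    prox[i][j] += 1.0
    return [[v / n_trees for v in row] for row in prox]


def leaf_distance(a, b):
    """Number of edges between nodes a and b.

    ASSUMPTION: heap-style numbering (root = 1, parent of k is k // 2).
    """
    d = 0
    while a != b:
        # The larger index is never shallower, so move it up one level.
        if a > b:
            a //= 2
        else:
            b //= 2
        d += 1
    return d


def distance_proximity(leaf_ids):
    """Distance-aware proximity, averaged over trees.

    ASSUMPTION: the weighting 1 / (1 + d) is chosen for illustration only;
    it is not taken from the paper.
    """
    n_trees = len(leaf_ids)
    n_samples = len(leaf_ids[0])
    prox = [[0.0] * n_samples for _ in range(n_samples)]
    for leaves in leaf_ids:
        for i in range(n_samples):
            for j in range(n_samples):
                prox[i][j] += 1.0 / (1 + leaf_distance(leaves[i], leaves[j]))
    return [[v / n_trees for v in row] for row in prox]


# Hypothetical toy forest: 2 trees of depth 2 (leaves 4..7), 4 samples.
leaf_ids = [[4, 4, 5, 6],
            [4, 5, 5, 7]]
P_standard = standard_proximity(leaf_ids)
P_distance = distance_proximity(leaf_ids)
```

With only two trees, the standard matrix takes just the values 0, 0.5 and 1, while the distance-aware variant also assigns a nonzero proximity to samples that end up in nearby but different leaves. This is one way to picture why a distance-based estimate might help most when the forest is small. To use such a matrix as an SVM kernel, as the abstract suggests, one option in scikit-learn is `SVC(kernel="precomputed")`, which accepts a square kernel matrix at fit time.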

Place, publisher, year, edition, pages
Amsterdam: Elsevier, 2012. Vol. 39, no. 17, pp. 13046-13050
Keywords [en]
Random forest, Proximity matrix, Support vector machine, Kernel matrix, Data mining
National subject category
Probability Theory and Statistics
Identifiers
URN: urn:nbn:se:hh:diva-19380
DOI: 10.1016/j.eswa.2012.05.094
ISI: 000308449300031
Scopus ID: 2-s2.0-84865043451
OAI: oai:DiVA.org:hh-19380
DiVA, id: diva2:548335
Available from: 2012-08-30. Created: 2012-08-30. Last updated: 2018-03-22. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Person records

Englund, Cristofer; Verikas, Antanas
