hh.se Publications
Mind the Data, Measuring the Performance Gap Between Tree Ensembles and Deep Learning on Tabular Data
King, Stockholm, Sweden.
King, Stockholm, Sweden; Royal Institute of Technology, Stockholm, Sweden.
Nowaczyk, Sławomir: Halmstad University, School of Information Technology. ORCID iD: 0000-0002-7796-5201
Pashami, Sepideh: Halmstad University, School of Information Technology. ORCID iD: 0000-0003-3272-4145
Show others and affiliations
2024 (English). In: Advances in Intelligent Data Analysis XXII: Proceedings, Part I / [ed] Ioanna Miliou, Nico Piatkowski, Panagiotis Papapetrou. Heidelberg: Springer Berlin/Heidelberg, 2024, Vol. 14641, p. 65-76. Conference paper, Published paper (Refereed).
Abstract [en]

Recent machine learning studies on tabular data show that ensembles of decision tree models are more efficient and performant than deep learning models such as Tabular Transformers. However, as we demonstrate, these studies are limited in scope and do not paint the full picture. In this work, we focus on how two dataset properties, namely dataset size and feature complexity, affect the empirical performance comparison between tree ensembles and Tabular Transformer models. Specifically, we employ a hypothesis-driven approach and identify situations in which Tabular Transformer models are expected to outperform tree ensembles. Through empirical evaluation, we demonstrate that, given large enough datasets, deep learning models perform better than tree models. This effect becomes more pronounced when complex feature interactions exist in the given task and dataset, suggesting that one must pay careful attention to dataset properties when selecting a model for tabular data in machine learning, especially in an industrial setting, where ever larger datasets with less carefully engineered features are becoming routinely available. © The Author(s)
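
The abstract describes an empirical comparison rather than a specific algorithm. As a rough illustration only, and not the paper's actual experimental setup, the minimal Python sketch below contrasts a gradient-boosted tree ensemble with a small neural network (a crude stand-in for a Tabular Transformer) on a synthetic tabular task whose target depends on multiplicative feature interactions, sweeping the training-set size. Every dataset parameter and model choice here is an illustrative assumption.

# Illustrative sketch, not the paper's setup: tree ensemble vs. a neural
# stand-in on synthetic tabular data with multiplicative feature interactions.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

def make_interaction_data(n, d=10):
    # Target mixes pairwise feature products plus noise: a simple proxy
    # for "complex feature interactions" (an assumption, not the paper's data).
    X = rng.normal(size=(n, d))
    y = X[:, 0] * X[:, 1] + X[:, 2] * X[:, 3] + 0.1 * rng.normal(size=n)
    return X, y

X_test, y_test = make_interaction_data(5_000)
for n_train in (500, 5_000, 20_000):  # sweep dataset size
    X_tr, y_tr = make_interaction_data(n_train)
    gbt = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)
    # The MLP is a crude proxy; the paper evaluates Tabular Transformers.
    mlp = MLPRegressor(hidden_layer_sizes=(128, 128), max_iter=500,
                       random_state=0).fit(X_tr, y_tr)
    print(f"n={n_train:>6}  tree R2={r2_score(y_test, gbt.predict(X_test)):.3f}"
          f"  net R2={r2_score(y_test, mlp.predict(X_test)):.3f}")

Under the abstract's hypothesis, the neural model's accuracy relative to the tree ensemble would be expected to improve as the training set grows; the paper itself evaluates actual Tabular Transformer architectures on real datasets.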

Place, publisher, year, edition, pages
Heidelberg: Springer Berlin/Heidelberg, 2024. Vol. 14641, p. 65-76
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 14641
Keywords [en]
Gradient boosting, Tabular data, Tabular Transformers
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:hh:diva-53352
DOI: 10.1007/978-3-031-58547-0_6
Scopus ID: 2-s2.0-85192227414
ISBN: 9783031585463 (print)
OAI: oai:DiVA.org:hh-53352
DiVA id: diva2:1865781
Conference
22nd International Symposium on Intelligent Data Analysis, IDA 2024, Stockholm, Sweden, April 24–26, 2024
Available from: 2024-06-05. Created: 2024-06-05. Last updated: 2024-06-05. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text
Scopus

Authority records

Nowaczyk, Sławomir
Pashami, Sepideh

Search in DiVA

By author/editor
Nowaczyk, Sławomir; Pashami, Sepideh; Asadi, Sahar
By organisation
School of Information Technology
Computer and Information Sciences
