hh.sePublications
Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
AI applications on healthcare data
Halmstad University, School of Information Technology.
Halmstad University, School of Information Technology.
2021 (English)Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE creditsStudent thesisAlternative title
AI tillämpningar på Hälsovårdsdata (Swedish)
Abstract [en]

The purpose of this research is to get a better understanding of how different machine learning algorithms work with different amounts of data corruption. This is important since data corruption is an overbearing issue within data collection and thus, in extension, any work that relies on the collected data. The questions we were looking at were: What feature is the most important? How significant is the correlation of features? What algorithms should be used given the data available? And, How much noise (inaccurate or unhelpful captured data) is acceptable? 

The study is structured to introduce AI in healthcare, data missingness, and the machine learning algorithms we used in the study. In the method section, we give a recommended workflow for handling data with machine learning in mind.

The results show us that when a dataset is filled with random values, the run-time of algorithms increases since many patterns are lost. Randomly removing values also caused less of a problem than first anticipated since we ran multiple trials, evening out any problems caused by the lost values. Lastly, imputation is a preferred way of handling missing data since it retained many dataset structures. One has to keep in mind if the imputation is done on categories or numerical values.

However, there is no easy "best-fit" for any dataset. It is hard to give a concrete answer when choosing a machine learning algorithm that fits any dataset. Nevertheless, since it is easy to simply plug-and-play with many algorithms, we would recommend any user try different ones before deciding which one fits a project the best.

Place, publisher, year, edition, pages
2021. , p. 42
Keywords [en]
AI, Machine Learning, Data missingness, healthcare
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:hh:diva-44752OAI: oai:DiVA.org:hh-44752DiVA, id: diva2:1567413
External cooperation
Carmona
Subject / course
Computer science and engineering
Educational program
Computer Science and Engineering, 300 credits
Supervisors
Examiners
Available from: 2021-06-06 Created: 2021-06-16 Last updated: 2021-06-23Bibliographically approved

Open Access in DiVA

fulltext(4436 kB)984 downloads
File information
File name FULLTEXT02.pdfFile size 4436 kBChecksum SHA-512
6a7cf5639e90b013b87392404ad70f0f2d04e90ac3c8caa2022311d0816803ca2054534b4ed860ab9d13430927ebb0d66dc7e5ef3b45ed5f8f7be5658d6fbd18
Type fulltextMimetype application/pdf

By organisation
School of Information Technology
Computer Sciences

Search outside of DiVA

GoogleGoogle Scholar
Total: 985 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

urn-nbn

Altmetric score

urn-nbn
Total: 397 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf