AI applications on healthcare data
2021 (English). Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits
Student thesis
Alternative title
AI tillämpningar på Hälsovårdsdata (Swedish)
Abstract [en]
The purpose of this research is to better understand how different machine learning algorithms perform under different amounts of data corruption. This matters because data corruption is a pervasive problem in data collection and, by extension, in any work that relies on the collected data. The research questions were: Which feature is the most important? How significant is the correlation between features? Which algorithms should be used given the available data? And how much noise (inaccurate or unhelpful captured data) is acceptable?
The study first introduces AI in healthcare, data missingness, and the machine learning algorithms used in the study. In the method section, we give a recommended workflow for handling data with machine learning in mind.
The results show that when a dataset is filled with random values, the run-time of the algorithms increases, since many patterns are lost. Randomly removing values caused fewer problems than first anticipated, because running multiple trials evened out the effects of the lost values. Lastly, imputation is a preferred way of handling missing data, since it retained much of the dataset's structure. However, one has to keep in mind whether the imputation is applied to categorical or numerical values.
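As an illustration of the two kinds of corruption and of type-aware imputation described above, the following is a minimal Python sketch. It is not the code used in the study: the column-dictionary representation, the helper names (`remove_random_values`, `corrupt_with_random_values`, `impute`), and the choice of mean imputation for numerical columns and mode imputation for categorical columns are all assumptions made for illustration.

```python
import random
import statistics
from typing import Any, Dict, List, Optional

# Hypothetical representation (not the study's code): a dataset is a dict of
# column name -> list of values, with None marking a missing value.
Dataset = Dict[str, List[Optional[Any]]]


def _is_number(value: Any) -> bool:
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def remove_random_values(data: Dataset, fraction: float, seed: int = 0) -> Dataset:
    """Set roughly `fraction` of all cells to None (random missingness)."""
    rng = random.Random(seed)
    return {name: [None if rng.random() < fraction else v for v in col]
            for name, col in data.items()}


def corrupt_with_random_values(data: Dataset, fraction: float, seed: int = 0) -> Dataset:
    """Replace roughly `fraction` of observed cells with random but plausible values."""
    rng = random.Random(seed)
    out: Dataset = {}
    for name, col in data.items():
        observed = [v for v in col if v is not None]
        numeric = bool(observed) and all(_is_number(v) for v in observed)
        new_col: List[Optional[Any]] = []
        for v in col:
            if v is not None and rng.random() < fraction:
                # Numerical: uniform draw within the observed range.
                # Categorical: a random category already seen in the column.
                new_col.append(rng.uniform(min(observed), max(observed)) if numeric
                               else rng.choice(observed))
            else:
                new_col.append(v)
        out[name] = new_col
    return out


def impute(col: List[Optional[Any]]) -> List[Optional[Any]]:
    """Fill None with the mean (numerical column) or the mode (categorical column)."""
    observed = [v for v in col if v is not None]
    if not observed:
        return list(col)  # nothing to impute from
    if all(_is_number(v) for v in observed):
        fill = statistics.mean(observed)
    else:
        fill = statistics.mode(observed)
    return [fill if v is None else v for v in col]


# Usage on a tiny made-up table.
data = {"age": [34, 51, None, 47], "sex": ["F", "M", "F", None]}
filled = {name: impute(col) for name, col in data.items()}
print(filled)
```

Treating the two column types separately reflects the caveat in the abstract: a mean is meaningless for a categorical column, so the sketch falls back to the most frequent category there.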
However, there is no simple "best fit" for every dataset, so it is hard to give a concrete answer on which machine learning algorithm to choose. Nevertheless, since many algorithms are easy to plug and play, we recommend that users try several before deciding which one best fits a project.
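The advice to try several algorithms can be sketched as a small comparison loop in which every candidate exposes the same `fit`/`predict` interface and is scored with k-fold cross-validation. This is a hypothetical, standard-library-only sketch: the two models (a majority-class baseline and a 1-nearest-neighbour classifier), the `cross_val_accuracy` helper, and the toy data are assumptions and not the algorithms or data from the study.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Sequence

Row = Sequence[float]


class MajorityClass:
    """Hypothetical baseline: always predicts the most frequent training label."""

    def fit(self, X: List[Row], y: List[int]) -> "MajorityClass":
        self.label = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X: List[Row]) -> List[int]:
        return [self.label for _ in X]


class NearestNeighbour:
    """Hypothetical 1-nearest-neighbour classifier (squared Euclidean distance)."""

    def fit(self, X: List[Row], y: List[int]) -> "NearestNeighbour":
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, X: List[Row]) -> List[int]:
        preds = []
        for row in X:
            dists = [sum((a - b) ** 2 for a, b in zip(row, t)) for t in self.X]
            preds.append(self.y[dists.index(min(dists))])
        return preds


def cross_val_accuracy(make_model: Callable[[], object], X: List[Row],
                       y: List[int], k: int = 5, seed: int = 0) -> float:
    """Mean accuracy over k shuffled folds; each fold is held out exactly once."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [f for f in (idx[i::k] for i in range(k)) if f]  # skip empty folds
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = make_model().fit([X[i] for i in train], [y[i] for i in train])
        preds = model.predict([X[i] for i in fold])
        scores.append(sum(p == y[i] for p, i in zip(preds, fold)) / len(fold))
    return sum(scores) / len(scores)


# Swapping algorithms means adding an entry here; every model exposes fit/predict.
candidates: Dict[str, Callable[[], object]] = {
    "majority": MajorityClass,
    "1-NN": NearestNeighbour,
}

# Tiny made-up data set: two well-separated clusters.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
     (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
results = {name: cross_val_accuracy(make, X, y) for name, make in candidates.items()}
print(results)
```

Adding another candidate only requires another entry in `candidates`. In scikit-learn, estimators follow the same `fit`/`predict` convention, and `sklearn.model_selection.cross_val_score` performs a comparable cross-validated evaluation.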
Place, publisher, year, edition, pages
2021, p. 42
Keywords [en]
AI, Machine Learning, Data missingness, healthcare
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:hh:diva-44752
OAI: oai:DiVA.org:hh-44752
DiVA, id: diva2:1567413
External cooperation
Carmona
Subject / course
Computer science and engineering
Educational program
Computer Science and Engineering, 300 credits
2021-06-06, 2021-06-16, 2021-06-23. Bibliographically approved