Sentence based risk classifier using NLP and machine learning
Halmstad University.
2023 (English)
Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE credits
Student thesis
Abstract [en]

This project was inspired by the company Dizparc and focuses on classification systems together with certain applications of natural language processing. Classification systems are an extensively researched area dating back to the latter half of the 1900s, and the problem has been framed in many different ways, up to the more modern takes of today. There are many approaches to classification systems with applications of natural language processing; some existing ones combine word vectorization methods with various algorithms, such as Word2Vec merged with Transformers or Convolutional Neural Networks. Most classification systems with applications of natural language processing reside within medical research, and access to data is therefore strictly limited. This project was designed to classify inputs using the machine learning algorithms Multinomial Logistic Regression, Decision Tree, and Random Forest, and to compare the models to see which of them would yield the best results. The results were evaluated on overall accuracy and on the difference between the lowest and highest accuracy. Confusion matrices were also used to check which classes were the easiest to predict. The evaluation showed a better result for Random Forest when using certain numbers of classes, while Decision Tree was able to reach similar results when using fewer classes. The quantity and quality of the data accumulated may not be sufficient to correctly classify inputs through certain methods.

Place, publisher, year, edition, pages
2023.
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:hh:diva-51098
OAI: oai:DiVA.org:hh-51098
DiVA, id: diva2:1776619
External cooperation
Dizparc
Subject / course
Computer science and engineering
Educational program
Computer Science and Engineering, 300 credits
Supervisors
Examiners
Available from: 2023-06-28
Created: 2023-06-28
Last updated: 2023-06-28
Bibliographically approved

Open Access in DiVA

fulltext (1296 kB), 69 downloads
File information
File name: FULLTEXT02.pdf
File size: 1296 kB
Checksum (SHA-512): ea07e280aa46ff16bfd85af06292ac949734cf89b6f6287a67146780e4e446895f6c8c6d8e607f356e220b9abdb3e0e5e3c837a2254ea6c6b4f54e876c9ba517
Type: fulltext
Mimetype: application/pdf
