Unsupervised anomaly detection for structured data - Finding similarities between retail products
2021 (English) Independent thesis Advanced level (professional degree), 20 credits / 30 HE credits
Student thesis
Abstract [en]
Data is one of the most important factors in modern business operations. Bad data can therefore lead to substantial losses, both financially and in customer experience. This thesis seeks to find anomalies in real-world, complex, structured data that cause an international enterprise to miss out on income and risk losing customers. Using graph theory and similarity analysis, the findings suggest that certain countries contribute to the discrepancies more than others. This is believed to be an effect of countries customizing their products to match the market’s needs. This thesis only scratches the surface of the analysis of the data, so there are many opportunities for future work.
Place, publisher, year, edition, pages
2021. p. 82
Keywords [en]
relational data, similarity analysis, data analysis, SQL, NetworkX, graph theory, anomaly detection, unsupervised, retail products, real-world data, AWS, Amazon Web Services, similarity learning, data statistics, data preprocessing, similarity analysis algorithm, data validation
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:hh:diva-44756
OAI: oai:DiVA.org:hh-44756
DiVA, id: diva2:1567459
External cooperation
Jayway
Subject / course
Computer science and engineering
Educational program
Computer Science and Engineering, 300 credits
Supervisors
Examiners
2021-06-02 2021-06-16 2021-06-17 Bibliographically approved