Big Data Analytics Using Apache Flink for Cybercrime Forensics on X (formerly known as Twitter)
Halmstad University, School of Information Technology.
2023 (English)
Independent thesis, Advanced level (degree of Master (One Year)), 10 credits / 15 HE credits
Student thesis
Alternative title
Big Data Analytics Using Apache Flink for Cybercrime Forensics on X (formerly known as Twitter) (English)
Abstract [en]

The exponential growth of social media usage has led to massive data sharing, and traditional systems struggle to manage and analyze data at this scale. The surge in data exchange has also been accompanied by an increase in cyber threats from individuals and criminal groups. Traditional forensic methods, such as evidence collection and data backup, become impractical when dealing with terabytes or petabytes of data. Big Data Analytics has emerged as a powerful approach to handling and analyzing both structured and unstructured data at this scale.

This thesis explores the use of Apache Flink, an open-source stream-processing framework from the Apache Software Foundation, to enhance cybercrime forensic research. Unlike batch-oriented engines such as Apache Spark, Apache Flink offers real-time processing capabilities, which makes it well suited to analyzing dynamic and time-sensitive data streams. The study compares the performance of Apache Flink with that of Apache Spark across various workloads on a single node.

The literature review reveals growing interest in using Big Data Analytics, including platforms like Apache Flink, for cybercrime detection and investigation, especially on social media platforms such as X (formerly known as Twitter). Sentiment analysis is a vital technique, but the unique nature of social data makes it challenging to apply. As a valuable data source for cybercrime forensics, X enables the study of fraudulent, extremist, and other criminal activities. This research explores various data mining techniques and emphasizes the need for real-time analytics to combat cybercrime effectively.

The methodology involves collecting data from X, preprocessing it to remove noise, and applying sentiment analysis to identify cybercrime-related tweets. The comparative analysis between Apache Flink and Apache Spark demonstrates Flink's efficiency in handling larger datasets and real-time processing. Parallelism and scalability are evaluated to optimize performance. The results indicate that Apache Flink outperforms Apache Spark in response time, making it a valuable tool for cybercrime forensics.
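
The preprocessing and sentiment-analysis steps named above can be illustrated with a minimal sketch. It is not the thesis's implementation: the word lists, the function names (`preprocess`, `sentiment_score`, `flag_suspicious`), and the threshold are all hypothetical, and a simple lexicon count stands in for whatever sentiment model the study actually used. It runs in plain Python rather than on Flink or Spark.

```python
import re
from typing import Iterable, List

# Hypothetical lexicons for illustration only; the thesis's word lists are not given.
NEGATIVE_TERMS = {"scam", "fraud", "phishing", "hack", "hacked", "threat", "stolen"}
POSITIVE_TERMS = {"safe", "secure", "thanks", "great", "resolved"}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")


def preprocess(text: str) -> List[str]:
    """Remove common tweet noise and return lowercase word tokens."""
    text = text.lower()
    text = URL_RE.sub(" ", text)         # drop links
    text = MENTION_RE.sub(" ", text)     # drop @user mentions
    text = NON_LETTER_RE.sub(" ", text)  # drop punctuation, digits, '#' (hashtag word is kept)
    return [tok for tok in text.split() if tok != "rt"]  # drop the retweet marker


def sentiment_score(tokens: Iterable[str]) -> int:
    """Lexicon score: positive-term count minus negative-term count."""
    tokens = list(tokens)
    positive = sum(tok in POSITIVE_TERMS for tok in tokens)
    negative = sum(tok in NEGATIVE_TERMS for tok in tokens)
    return positive - negative


def flag_suspicious(tweets: Iterable[str], threshold: int = -1) -> List[str]:
    """Return tweets whose score is at or below the (assumed) threshold."""
    return [t for t in tweets if sentiment_score(preprocess(t)) <= threshold]


if __name__ == "__main__":
    sample = [
        "RT @bank: Your account was HACKED! Verify at https://t.co/x #phishing",
        "Thanks @support, issue resolved. Great service!",
    ]
    for tweet in flag_suspicious(sample):
        print(tweet)
```

In a streaming deployment, the same per-tweet functions would typically run as map and filter steps over the incoming stream; how the thesis wires them into Flink or Spark is not stated in this record.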

Despite this progress, challenges remain in data privacy, accuracy, and cross-platform analysis. Future research should focus on refining algorithms, enhancing scalability, and addressing these challenges to further advance cybercrime forensics using Big Data Analytics and platforms like Apache Flink.

Place, publisher, year, edition, pages
2023. p. 35
Keywords [en]
Apache Flink, Apache Spark, Big Data, Twitter, X
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:hh:diva-51848
OAI: oai:DiVA.org:hh-51848
DiVA, id: diva2:1806421
Subject / course
Digital Forensics
Educational program
Master's Programme in Network Forensics, 60 credits
Supervisors
Examiners
Available from: 2023-10-23. Created: 2023-10-20. Last updated: 2023-10-23. Bibliographically approved.

Open Access in DiVA

fulltext (1324 kB), 254 downloads
File information
File name: FULLTEXT02.pdf
File size: 1324 kB
Checksum (SHA-512): 7b83924ea5a2232b46d783cd0787321ecf9b76b504c753d742c70468503a9f5468c2cfeae41f226aa11d4798b5229aa4900a4967bf2a6ef3a6bdd33d7e21372e
Type: fulltext
Mimetype: application/pdf
