Forensic Authorship Analysis of Microblogging Texts Using N-Grams and Stylometric Features
2020 (English). In: 2020 8th International Workshop on Biometrics and Forensics (IWBF), Piscataway: IEEE, 2020, article id 9107953. Conference paper, Published paper (Refereed)
Abstract [en]
In recent years, messages and texts posted on the Internet have been used in criminal investigations. Unfortunately, the authorship of many of them remains unknown. On some channels, establishing authorship may be even harder, because the length of digital texts is limited to a certain number of characters. In this work, we aim to identify the authors of tweets, which are limited to 280 characters. We evaluate popular features traditionally employed in authorship attribution, which capture properties of writing style at different levels. For our experiments, we use a self-captured database of 40 users, with 120 to 200 tweets per user. Results on this small set are promising, with the different features yielding classification accuracies between 92% and 98.5%. These results are competitive with existing studies that use short texts such as tweets or SMS messages. ©2020 IEEE
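One of the feature families named in the title, n-grams, can be illustrated with a minimal sketch. It builds a character trigram profile for each author from training tweets. It then attributes a new tweet to the author whose profile has the highest cosine similarity to it. Several details are assumptions for illustration only: the n-gram length n=3, the lower-casing, the function names (`char_ngrams`, `cosine`, `build_profiles`, `attribute`), the nearest-profile classifier, and the two example "authors" and their tweets. None of these is necessarily what the paper used.

```python
from collections import Counter
import math


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams of a lower-cased text (assumed preprocessing)."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors; 0.0 if either is empty."""
    # Indexing a Counter with a missing key returns 0 without inserting it.
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_profiles(tweets_by_author: dict[str, list[str]], n: int = 3) -> dict[str, Counter]:
    """Sum the n-gram counts of all training tweets into one profile per author."""
    profiles: dict[str, Counter] = {}
    for author, tweets in tweets_by_author.items():
        profile: Counter = Counter()
        for tweet in tweets:
            profile.update(char_ngrams(tweet, n))
        profiles[author] = profile
    return profiles


def attribute(tweet: str, profiles: dict[str, Counter], n: int = 3) -> str:
    """Return the author whose profile is most similar to the query tweet (hypothetical rule)."""
    query = char_ngrams(tweet, n)
    return max(profiles, key=lambda author: cosine(query, profiles[author]))


if __name__ == "__main__":
    # Invented toy data; not from the paper's 40-user database.
    profiles = build_profiles({
        "alice": ["Loving the sunshine today!!", "Sunshine and coffee, perfect morning!!"],
        "bob": ["Meeting at 10. Agenda attached.", "Agenda for the meeting is attached."],
    })
    print(attribute("Coffee in the sunshine!!", profiles))  # prints: alice
```

Character n-grams are a common choice for very short texts such as tweets. They capture sub-word habits like punctuation runs ("!!") and recurring stems without needing a tokenizer. A real evaluation would also add the stylometric features mentioned in the title and use a trained classifier with held-out test tweets.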
Place, publisher, year, edition, pages
Piscataway: IEEE, 2020. article id 9107953
Series
International Workshop on Biometrics and Forensics (IWBF), ISSN 2381-6120
Keywords [en]
Authorship Identification, Authorship Attribution, Stylometry, N-Grams, Microblogging, Forensics
National Category
Signal Processing
Identifiers
URN: urn:nbn:se:hh:diva-41798
DOI: 10.1109/IWBF49977.2020.9107953
ISI: 000589628500009
Scopus ID: 2-s2.0-85087086446
Libris ID: z9c9jg03w9ptvvwp
ISBN: 9781728162331 (print)
OAI: oai:DiVA.org:hh-41798
DiVA, id: diva2:1416649
Conference
8th International Workshop on Biometrics and Forensics (IWBF 2020), Porto, Portugal, April 29-30, 2020
Funder
Swedish Research Council
Knowledge Foundation
Note
Other funding: European Social Fund via IT Academy programme.
Available from: 2020-03-24 Created: 2020-03-24 Last updated: 2020-12-18 Bibliographically approved