This report is the result of a master's thesis in network forensics at Halmstad University during the spring term of 2018.
The focus of the thesis was to analyse websites and categorize them into a few categories (drugs, explicit content, and others) using text mining.
In today’s world, a huge amount of data is available on the World Wide Web. As the number of internet users grows, the number of websites has also increased dramatically. This data covers a wide range of content, from education to illegal activities. Website categorization brings order to this unorganized data: its main purpose is to place a large number of websites into appropriate categories and/or to help security personnel manage user activity. Why use text? Categorizing websites by their images is problematic, because many images could fall into several categories. An image of white powder, for example, might show baby powder, cocaine, or something else. Categorizing websites through text mining is intended to avoid this ambiguity. In this report, we develop a method to better categorize these websites.