The present study aims to determine whether a privacy-preserving data mining method can be applied effectively to data mining on a social networking service (SNS). Privacy-preserving data mining is a technique for discovering relevant knowledge in large datasets while protecting users’ personal and sensitive information. The growing popularity of SNSs in recent years has raised concerns about user privacy, because SNSs collect personal data such as users’ addresses and birth dates. Applying privacy-preserving data mining to the personal information collected by an SNS makes it possible to provide secure, personalized services to its users. In a previous study, we considered anonymized data mining as a way to protect users’ privacy. In that approach, all input information is anonymized when data mining is performed. We examined whether the anonymization approach can be applied to data that can be only partially anonymized, such as SNS data, and how many users could still be identified under it. However, that study did not analyze actual SNS data. In the current study, we examine tweets about COVID-19 and extract personal information from their content. We investigated whether the posting location of a tweet could be estimated from the frequency of words in its content, using the locations attached to geotagged tweets as the ground truth. The results show that the most frequent keywords in the posted content are place names. This confirms the need for the privacy-preserving data mining for SNSs that we propose.
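To make the location-estimation step concrete, here is a minimal sketch of one way to estimate a posting location from word frequencies, with geotagged tweets as labelled training data. The abstract does not name the estimation model, so this sketch assumes a multinomial naive Bayes classifier with add-one smoothing. The function names (`tokenize`, `train`, `estimate_location`, `top_keywords`) and the toy tweets are hypothetical illustrations, not the author's implementation or data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: whitespace-delimited text; Japanese tweets would need a
    # morphological analyzer instead of this simple regex split.
    return re.findall(r"\w+", text.lower())


def train(geotagged):
    """Build per-location word frequencies from (tweet_text, location) pairs.

    The location labels play the role of ground truth (geotagged tweets).
    """
    word_counts = defaultdict(Counter)  # location -> word frequencies
    doc_counts = Counter()              # location -> number of tweets
    vocab = set()
    for text, loc in geotagged:
        tokens = tokenize(text)
        word_counts[loc].update(tokens)
        doc_counts[loc] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def estimate_location(text, word_counts, doc_counts, vocab):
    """Return the location with the highest naive Bayes log-score."""
    total_docs = sum(doc_counts.values())
    best_loc, best_score = None, -math.inf
    for loc, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(doc_counts[loc] / total_docs)  # log prior
        for w in tokenize(text):
            if w not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing keeps unseen (word, location) pairs nonzero
            score += math.log((counts[w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_loc, best_score = loc, score
    return best_loc


def top_keywords(word_counts, loc, n=3):
    """Most frequent words posted from a given location."""
    return [w for w, _ in word_counts[loc].most_common(n)]


# Hypothetical toy data, for illustration only.
data = [
    ("Vaccination centre opens in Osaka today", "Osaka"),
    ("Osaka reports new covid cases", "Osaka"),
    ("Tokyo covid cases rise again", "Tokyo"),
    ("Long queue for PCR test in Shinjuku Tokyo", "Tokyo"),
]
model = train(data)
guess = estimate_location("new covid cases in osaka", *model)
tokyo_top = top_keywords(model[0], "Tokyo", 1)
print(guess, tokyo_top)
```

In this toy example, a tweet that mentions a place name is pulled strongly toward that place, which illustrates why place names appearing among the top keywords make location estimation, and hence re-identification, easier.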

Author: Ayahiko Niimi

Published in: International Conference for Internet Technology and Secured Transactions (ICITST-2021)

  • Date of Conference: 7-9 December 2021
  • DOI: 10.20533/ICITST.2021.0015
  • ISBN: 978-1-913572-39-6
  • Conference Location: Virtual (London, UK)