This paper proposes a framework for preparing and using corpora from online social networks and review sites for sentiment analysis tasks. The framework consists of three phases. The first phase is preprocessing and cleaning the collected data, followed by data annotation. The second phase applies several text processing techniques to the prepared corpora: removing stopwords, replacing each negation word and the word it negates with the antonym of the negated word, and retaining only words with selected part-of-speech tags (adjectives and verbs). The third phase is text classification using Naïve Bayes and Decision Tree classifiers with two feature selection approaches, unigrams and bigrams. The experiments show that the data is extremely imbalanced. The results show that applying the text processing techniques improves the classification accuracy of the Naïve Bayes classifier and reduces the training time of both classifiers. The results also show that the Decision Tree classifier is more suitable for imbalanced data.
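
The second-phase text processing steps and the unigram and bigram features can be illustrated with a minimal sketch. This is not the authors' implementation: the word lists `STOPWORDS`, `NEGATIONS`, and `ANTONYMS` are tiny hypothetical stand-ins for real lexical resources, the function names are invented for illustration, and the order of the steps (negation replacement before stopword removal) is an assumption. Part-of-speech filtering and the classifiers themselves are omitted because they would need a tagger and a learning library.

```python
# Toy resources, assumed for illustration only; not taken from the paper.
STOPWORDS = {"the", "a", "an", "is", "was", "this", "it", "and"}
NEGATIONS = {"not", "never", "no"}
ANTONYMS = {"good": "bad", "bad": "good", "like": "dislike", "happy": "unhappy"}


def replace_negations(tokens):
    """Replace a negation word plus the following word with that word's antonym."""
    out = []
    i = 0
    while i < len(tokens):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tokens[i] in NEGATIONS and nxt in ANTONYMS:
            out.append(ANTONYMS[nxt])
            i += 2  # skip both the negation and the negated word
        else:
            out.append(tokens[i])
            i += 1
    return out


def remove_stopwords(tokens):
    """Drop tokens that appear in the toy stopword list."""
    return [t for t in tokens if t not in STOPWORDS]


def ngrams(tokens, n):
    """Return contiguous n-grams as space-joined strings (n=1 unigrams, n=2 bigrams)."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def preprocess(text):
    """Lowercase, split on whitespace, resolve negations, then remove stopwords."""
    tokens = text.lower().split()
    return remove_stopwords(replace_negations(tokens))


tokens = preprocess("This movie is not good")
unigram_features = ngrams(tokens, 1)
bigram_features = ngrams(tokens, 2)
```

In this sketch, negations are resolved before stopwords are removed so that a word such as "not" is still present when its antonym lookup happens. A negation whose following word has no entry in the antonym table is left unchanged.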

Published in: International Conference on Information Society (i-Society 2014)

  • Date of Conference: 10-12 November 2014
  • DOI: 10.2053/iSociety.2014.0004
  • ISBN: 978-1-908320-36-0
  • Conference Location: London, United Kingdom