Sentiment Analysis Methods

Abstract

As more and more information becomes freely available on the Internet, there is an increasing demand for tools that analyze this information and provide valuable insights. One such tool is sentiment analysis. The goal of sentiment analysis is to extract an opinion from text data and classify it as positive, neutral, or negative. Millions of opinions about a product expressed on the Internet can be analyzed in a matter of seconds, providing insight into the product's appeal to the general public based on the ratio of positive to negative feedback. Completing such classification manually is not feasible, so computer aid is needed. The goal of this paper is to review the methods and algorithms that are commonly used for solving the problem of sentiment analysis. A basic machine learning algorithm has been built based on a Naïve Bayesian Classifier and the Bag of Words model. The results were satisfying, with a relatively high classification accuracy.

I. Introduction

Web 2.0, known for its large amount of user-generated data, brought Internet users the ability to work with web content in a new way. Internet users don't just read web pages; they interact with them by leaving comments, sharing with social network friends, and so on. Multiple websites collect product reviews from Internet users. Social networks allow everyone to freely express opinions on current events. One might start questioning whether this data could be put to better use. Could new meaning be extracted from the abundance of already available data? Sentiment analysis takes on this very challenge. Sentiment analysis is a type of subjective analysis that focuses on identifying positive and negative opinions, emotions, and evaluations expressed in natural language [1]. Sentiment analysis is a finer-grained analysis than subjectivity analysis, since the purpose of subjectivity analysis is only to find out whether a statement is subjective or objective. In other words, subjectivity analysis classifies text data into facts and opinions. Facts are objective statements, whereas opinions are subjective statements that reflect a person's perception of an event or entity. Sentiment analysis classifies all subjective statements into positive and negative. Note that all objective statements are neutral.

Figure 1: Sentiment Analysis vs Subjectivity Analysis

Sentiment analysis is concerned with personal feelings, interpretations, prejudices, opinions, sentiments, emotions, evaluations, beliefs, speculations, etc. These emotions don't have to be based on facts and can be biased. Given a set of text data D, the task of sentiment analysis is to find whether each document d ∈ D expresses a positive or negative opinion on a specific object. For example, given a set of blog posts with movie reviews, the system classifies them into positive reviews and negative reviews. This is similar to a supervised classification method but different from regular topic-based text classification, which classifies documents into predefined topic classes, e.g., sports, art, etc. In topic-based classification, topic-related words are important. In opinion classification, however, topic-related words matter less than opinion words that indicate positive or negative opinions, e.g., great, excellent, amazing, horrible, bad, worst, etc. Most methodologies for opinion mining apply some form of machine learning for classification. Customized algorithms developed specifically for opinion classification also exist, which exploit opinion words and phrases together with scoring functions [5]. Sentiment analysis is often referred to as opinion mining. It is considered a recent discipline that uses information retrieval processes and computational linguistics. A sub-task of sentiment analysis is finding individual words that tell us about the opinion being expressed. The result of this research was turned into a web application.

II. Applications

The scope of new applications is one of the reasons for the growing attention to the challenges that sentiment analysis poses [8]. Applications include determining critics' opinions about a given product by classifying online product reviews, tracking the shifting attitudes of the general public towards a movie star by mining online forums or blogs [5], predicting election results, polling the public without explicitly asking for its opinion, and more. Amazon uses a form of sentiment analysis to identify positive qualities of a product under review (Figure 2).

Figure 2: Amazon identifies the most positive reviews of a product and uses them to encourage customers to buy the product

USA Today presented a project that analyzes millions of Tweets to extract data about the success of the candidates during each day of the election campaign (Figure 3) [3]. It would be impossible for a dedicated team of people to classify an average of 2 million messages; clearly, the task needs to be automated.


Figure 3: USA Today analysis of Tweets about presidential candidates during elections.

Companies are known to use sentiment analysis to mine data about which particular feature customers like most and which needs to be improved. This knowledge gives them a strong selling point and a sense of direction for what to work on next. Measuring a trend is another possible application. For example, after a service interruption that made customers angry, a company might want to know whether the anger is rising or cooling down. Did the anger cool down after customers were given a month of free service? Sentiment analysis can easily answer these questions if the numbers of positive and negative feedback messages are plotted over time. Most applications follow the same process: text data is collected from one or multiple sources with relevance to a given brand or product; the text data is sent to a sentiment analyzer and classified as positive, negative, or neutral; the results of the sentiment analysis are presented to a user as text or graphical data summarizing trends; by looking at the presented data, the user or company can act on the insights gained [9]. All of this can happen in real time, almost immediately after new text data arrives in the data source.

III. Data

Data Availability

Multiple data sources suitable for sentiment analysis are publicly available. Some of them include Facebook with its OpenGraph API, Twitter with the Twitter API, the Internet Movie Database with movie reviews, and more. Labeled datasets have also been composed by other researchers, such as Bo Pang and Lillian Lee [8]. The performance of sentiment analysis algorithms depends on the chosen dataset. Some classifications are harder to make than others. According to Turney [10], "It appears that movie reviews are difficult to classify, because the whole is not necessarily the sum of the parts; thus the accuracy on movie reviews is about 66%". The presence of jargon, variation in vocabulary, and the amount of misspelled words differ between datasets. It is important to use the same data source for both the training and classification phases of sentiment analysis.

Data Quality

Data quality can be a big problem. We cannot rely on sources like Twitter or Facebook as a scientific sample of voters, as we know almost nothing about the people who publish their opinions online. On the other hand, ignoring all this data would be foolish, since it can provide valuable insights. Democratic pollster Mark Mellman isn't sure whether Twitter sentiment analysis is capable of reflecting common public opinion or predicting the result of elections. Yet, he says, "anybody who's really interested in understanding political dynamics is going to be interested in the ebb and flow of these numbers. They do reflect something about the tone and intensity of the political conversation that is going on in this country." The index is not a substitute for polling, he adds, likening it to a "barometer" of political opinion rather than the "thermometer" that polls provide, because, as Rishab Ghosh puts it, "we aren't asking anybody anything. People are saying things on their own." [4] We can conclude that by using publicly available data from social networks, we cannot guarantee absolutely accurate results, yet not using this freely available data is also not acceptable. Results obtained with sentiment analysis tools can provide important insights, but they have to be viewed with skepticism.

IV. Algorithms

Most methodologies for sentiment analysis apply some form of machine learning for classification [5]. Among the most commonly used techniques are Singular Value Decomposition (SVD), decision trees, the Naïve Bayesian Classifier, neural networks, and random forests.

Figure 4: Decision tree structure

Decision trees are often chosen because they provide an easy way to see why the algorithm arrived at its conclusion. With a decision tree, it is easier to debug or spot a mistake in the model. Implementation of a decision tree is relatively easy too: we start from an empty tree, split the data set on the next best feature, and recurse on each leaf. Decision trees are well researched, and techniques that allow training on extremely large data sets have been developed. Each node of the decision tree contains the word that must be present in the text document in order for the algorithm to take that branch of the tree. A leaf of the tree contains an answer to our classification problem: it states whether the text carries positive, negative, or neutral sentiment.
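The node-and-leaf structure described above can be sketched in a few lines of code. This is an illustrative toy, not the paper's implementation; the tree shape and the words "great" and "horrible" are invented for the example.

```python
# Illustrative sketch: a tiny sentiment decision tree where each internal
# node tests the presence of one word and each leaf is a sentiment label.

def build_node(word, if_present, if_absent):
    """An internal node: branch on whether `word` occurs in the document."""
    return {"word": word, "present": if_present, "absent": if_absent}

# Leaves are plain sentiment labels; the words chosen here are invented.
tree = build_node(
    "great",
    if_present="positive",
    if_absent=build_node(
        "horrible",
        if_present="negative",
        if_absent="neutral",
    ),
)

def classify(node, words):
    """Walk from the root to a leaf, following the presence tests."""
    while isinstance(node, dict):
        node = node["present"] if node["word"] in words else node["absent"]
    return node

print(classify(tree, {"a", "great", "movie"}))     # positive
print(classify(tree, {"a", "horrible", "movie"}))  # negative
```

Because every decision is an explicit word test, tracing why a document ended up at a given leaf is trivial, which is exactly the debuggability advantage mentioned above.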

Figure 4: Naïve Bayesian Network

A Naive Bayes Classifier (Figure 4) is a simple probabilistic classifier based on applying Bayes' theorem (Figure 5) with strong (naive) independence assumptions. A more descriptive term for the underlying probability model would be "independent feature model," meaning that the appearance of one word doesn't increase the likelihood of the presence of another word. This is an idealized condition that rarely holds in real life. Nonetheless, the Naive Bayes Classifier has proven to be accurate enough for the given task.

Figure 5: Bayes' theorem
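As a concrete sketch of the per-word estimate behind such a classifier, the snippet below computes how likely a document containing a given word is to be positive, from relative word frequencies in a labeled training set. The tiny corpus is invented for illustration, and equal class priors are assumed.

```python
# Sketch (assumed, not the paper's code): P(positive | word) estimated
# from labeled training documents, each represented as a set of words.

positive_docs = [{"great", "movie"}, {"great", "acting"}, {"loved", "it"}]
negative_docs = [{"horrible", "movie"}, {"bad", "acting"}]

def p_positive_given_word(word):
    """With equal priors, Bayes' theorem reduces to the word's relative
    frequency in positive documents divided by its total relative frequency."""
    f_pos = sum(word in d for d in positive_docs) / len(positive_docs)
    f_neg = sum(word in d for d in negative_docs) / len(negative_docs)
    if f_pos + f_neg == 0:
        return 0.5  # unseen word: no evidence either way
    return f_pos / (f_pos + f_neg)

print(p_positive_given_word("great"))  # 1.0: appears only in positive docs
print(p_positive_given_word("movie"))  # 0.4: slightly more common in negatives
```

Note the guard for unseen words: without it, a word absent from the training data would make the denominator zero, a problem returned to in the Challenges section.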

V. Models

Bag of words is a simplified representation of text used in natural language processing. The bag-of-words model represents text as an unordered collection of words. It is commonly used in classification methods, where the frequency of occurrence of each word is used as a feature for training a classifier.

Bag of words has no sense of context: "cool" is positive if used to describe a car, but negative if used to describe someone's demeanor; compare "Environmental Trust" with "He has won their trust." Even problems that seem easy to solve turn out to be challenging. Consider this local case of negation: "not good." It could be solved by merging every occurrence of the word "not" with the word that follows it. But this still doesn't solve the problem completely, since natural language allows long-distance dependencies such as "does not look very good," where the previous solution does not apply.

Bag of words has proven to work in the majority of cases, but because the collection of words is unordered in this overly simplified representation, problems may arise. For example, the order of words in the sentence "Company A is a much more successful company than Company B" is very important: because one company is compared to another, the order must be preserved. In such cases, a more complicated model is used in which every word in the sentence gets annotated (Figure 6). With proper annotation, the accuracy of the classifier greatly increases. Yet the task of automating data annotation is not trivial: rules must be developed for every natural language under analysis. Manual annotation is also possible, but not practical. Data annotation is a separate problem in computational linguistics.

Figure 6: Example of data annotation. The dependency tree for the sentence: "The human rights report poses a substantial challenge to the U.S. interpretation of good and evil." [7]
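The bag-of-words representation, and the local negation trick of folding "not" into the next word, can be sketched as follows (an illustrative sketch, not the paper's exact preprocessing):

```python
# Bag of words: unordered word counts, discarding all position information,
# plus a simple negation handler that merges "not" with the following word.
from collections import Counter

def bag_of_words(text):
    """Represent text as unordered word counts."""
    return Counter(text.lower().split())

def handle_negation(text):
    """Replace 'not X' with a single token 'not_X' so a classifier can
    treat 'not good' differently from 'good'."""
    words = text.lower().split()
    out, i = [], 0
    while i < len(words):
        if words[i] == "not" and i + 1 < len(words):
            out.append("not_" + words[i + 1])
            i += 2
        else:
            out.append(words[i])
            i += 1
    return out

print(handle_negation("the movie is not good"))
# ['the', 'movie', 'is', 'not_good']
```

As noted above, this local trick fails on long-distance dependencies like "does not look very good", where the negated word is not adjacent to "not".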

VI. Challenges

Commonly Used and Irrelevant Words

Commonly used words do not contribute to the classification and increase processing time, for example: "the", "an", "to", "of", "in", etc. Some words might be irrelevant for the dataset under study. For example, the word "movie" is irrelevant for the classification of movie reviews, since this word is likely to appear in every document that we are classifying. It is also helpful to combine different grammatical forms of the same word, such as "traveling," "traveled," "travel," etc. [5] If a word was encountered only a few times, we shouldn't allow it to determine the outcome of a classification. Moreover, a word that doesn't appear in our training data set would have a probability of zero; if a Naive Bayes Classifier is used, this would turn the denominator to zero. One of the proposed solutions is to build a word frequency list and remove the top and bottom 15% of the words.

Figure 7: Rule to remove the most frequent and rare words [5]
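The frequency-based pruning rule described above can be sketched as a small function. This is one plausible reading of the rule (cutting by frequency rank); the exact formulation is in the figure from [5].

```python
# Sketch of the pruning rule: rank words by frequency, then drop the most
# frequent and the rarest `cut` fraction of the vocabulary.
from collections import Counter

def prune_vocabulary(documents, cut=0.15):
    """Return the vocabulary with the top and bottom `cut` fraction of
    words (by frequency rank) removed."""
    freq = Counter(w for doc in documents for w in doc.split())
    ranked = [w for w, _ in freq.most_common()]  # most frequent first
    k = int(len(ranked) * cut)
    return set(ranked[k:len(ranked) - k]) if k else set(ranked)

docs = ["the the the great movie", "the bad movie movie"]
# With cut=0.25 on a 4-word vocabulary, the most frequent word ("the")
# and the rarest ("bad") are dropped.
print(prune_vocabulary(docs, cut=0.25))
```

This removes both the stop-word-like high-frequency words and the rare words whose few occurrences should not dominate a classification.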

VII. Implementation

As an experiment, a Naïve Bayesian Classifier using the Bag of Words model has been built based on the material described above. First of all, the top common words were excluded to speed up processing: "the", "be", "to", "of", "and", "a", "in", "that", "have", "I", "it", "for", "on", "with", "he", "as", "you", "do", "at", "this", "but", "his", "by", "from", "they", "we", "say", "her", "she", "or", "an", "will", "my", "one", "all", "would", "there", "their", "what", "so", "up", "out", "if", "about", "who", "get", "which", "go", "me", "when", "make", "can", "like", "time", "no", "just", "him", "know", "take", "people", "into", "year", "your", "some", "could", "them", "see", "other", "than", "then", "now", "look", "only", "come", "its", "over", "think", "also", "back", "after", "use", "two", "how", "our", "work", "well", "way", "even", "new", "want", "because", "any", "these", "give", "day", "most", "us". The next step was to analyze a dataset of known positive and negative documents and build a frequency dictionary. A set of 1300 positive and 1390 negative Tweets was used for training. Figure 8 presents the top 10 most frequent words found in positive and negative documents.

Figure 8: Top 10 most frequent words that are found in positive and negative documents.
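The training step just described, stop-word removal followed by per-class frequency counting, can be sketched as below. The handful of stop words and the tiny training documents are invented stand-ins for the full list and the Tweet dataset.

```python
# Sketch of the training step: strip stop words, then count how often each
# remaining word appears in positive and in negative training documents.
from collections import Counter

# Abbreviated stand-in for the full stop-word list given above.
STOP_WORDS = {"the", "be", "to", "of", "and", "a", "in", "that", "it", "is"}

def word_frequencies(documents):
    """Frequency dictionary over all documents, stop words removed."""
    return Counter(
        w for doc in documents for w in doc.lower().split()
        if w not in STOP_WORDS
    )

positive_freq = word_frequencies(["it is a great movie", "great acting"])
negative_freq = word_frequencies(["it is a horrible movie"])

print(positive_freq.most_common(3))
# [('great', 2), ('movie', 1), ('acting', 1)]
```

The two resulting dictionaries are exactly what the per-word Bayes estimate in the next step consumes.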

The next step is to apply Bayes' theorem to find the probability of a given word appearing in a positive document:


Figure 9: Bayesian theorem as it relates to sentiment analysis

Once the probability of appearing in a positive document has been found for each word in the document, we need to combine those probabilities together (Figure 10).


Figure 10: Formula for combining individual probabilities

Define a document to be neutral if it has a probability of exactly 0.5, negative if its probability is less than 0.5, and positive if its probability is more than 0.5.
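The combining step and the 0.5 threshold can be sketched in code. The exact formula from Figure 10 is not reproduced here; the sketch assumes the standard combining rule p = Π pᵢ / (Π pᵢ + Π (1 − pᵢ)), which matches the described behavior (0.5 neutral, below negative, above positive).

```python
# Sketch (assumed combining rule): fold per-word P(positive | word) values
# into one document-level score, then threshold at 0.5.
from math import prod

def combine(probabilities):
    """Combine per-word probabilities into one document score in [0, 1]."""
    pos = prod(probabilities)
    neg = prod(1 - p for p in probabilities)
    return pos / (pos + neg)

def label(score):
    """Apply the 0.5 decision rule described in the text."""
    if score == 0.5:
        return "neutral"
    return "positive" if score > 0.5 else "negative"

print(label(combine([0.9, 0.8])))  # positive: 0.72 / (0.72 + 0.02) ≈ 0.973
print(label(combine([0.5, 0.5])))  # neutral
```

Words that carry no evidence (probability 0.5) leave the combined score unchanged, so a document of only neutral words stays neutral.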

VIII. Results

We let the algorithm classify 100 messages known to be negative and 100 messages known to be positive.

                 Predicted
             Negative   Neutral   Positive
Actual
  Positive        5        43        52
  Negative       69        24         7
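The summary numbers implied by this confusion matrix can be computed directly:

```python
# Metrics derived from the confusion matrix reported above
# (100 known-positive and 100 known-negative test messages).
confusion = {
    "positive": {"negative": 5, "neutral": 43, "positive": 52},
    "negative": {"negative": 69, "neutral": 24, "positive": 7},
}

correct = confusion["positive"]["positive"] + confusion["negative"]["negative"]
total = sum(sum(row.values()) for row in confusion.values())
print(f"accuracy: {correct / total:.1%}")  # 60.5%, counting neutrals as errors

# If neutral verdicts are treated as abstentions rather than mistakes:
decided = total - confusion["positive"]["neutral"] - confusion["negative"]["neutral"]
print(f"accuracy on decided messages: {correct / decided:.1%}")  # 91.0%
```

Whether the neutral column counts as error or abstention makes a large difference here, since nearly a third of the test messages were labeled neutral.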

IX. Conclusion

By using a Naïve Bayesian Classifier based on the Bag of Words model, we achieved a relatively high accuracy in our classification. Implementations based on more complex machine learning algorithms (such as neural networks) and more advanced models of data representation (such as a model with data annotation) are expected to produce even more accurate results. Twitter was used as the source of documents for both the classification and training sets. This data source uses a lot of jargon, and misspellings are common. A different data source is expected to produce different results.

References

[1] Wiebe, Janyce. 1994. Tracking point of view in narrative. Computational Linguistics, 20(2):233–287.

[2] Turney, P. Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews. ACL’02, 2002.

[3] “Twitter Election Meter.” USA TODAY. Web. 12 Apr. 2013. <http://usatoday30.usatoday.com/news/politics/twitter-election-meter>.

[4] Martha T., Moore, and TODAY USA. “Index taps Twitter sentiments.” USA Today n.d.: Academic Search Premier. Web. 1 Apr. 2013.

[5] Valarmathi, B., and V. Palanisamy. “Opinion Mining Classification Using Key Word Summarization Based On Singular Value Decomposition.” International Journal on Computer Science & Engineering 3.1 (2011): 212-215. Academic Search Premier. Web. 1 Apr. 2013.

[6] http://www.laurentluce.com/posts/twitter-sentiment-analysis-using-python-and-nltk/

[7] Wilson, Theresa, Janyce Wiebe, and Paul Hoffmann. “Recognizing Contextual Polarity: An Exploration of Features for Phrase-Level Sentiment Analysis.” Computational Linguistics 35.3 (2009): 399-433. Academic Search Premier. Web. 1 Apr. 2013.

[8] Pang, Bo, and Lillian Lee. “Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales.” Carnegie Mellon University. Web. 12 May 2013. <http://www.cs.cornell.edu/home/llee/papers/pang-lee-stars.pdf>.

[9] Kennedy, Helen. “Perspectives on Sentiment Analysis.” Journal of Broadcasting & Electronic Media 56.4 (2012): 435-450. Academic Search Premier. Web. 25 Apr. 2013.

[10] Turney, P. Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews. ACL’02, 2002.