Tuesday, August 25, 2009

Opinion Mining and Sentiment Analysis

Textual information in the world can be broadly categorized into two main types: facts and opinions. Facts are objective expressions about entities, events and their properties. Opinions are usually subjective expressions that describe people’s sentiments, appraisals or feelings toward entities, events and their properties.

This area has grown in part as a reaction to a surge of interest in opinions as a first-class object of analysis, along with the huge increase in textual content on the web, mainly produced by social network users.

Subjective information analysis systems answer questions about feelings and opinions. A crucial step towards this goal is identifying the words and phrases that express opinions in text. The simplest algorithms work by scanning for keywords to categorize a statement as positive or negative, based on a simple binary analysis (“love” is good, “hate” is bad). But that approach fails to capture the subtleties that bring human language to life: irony, sarcasm, slang and other idiomatic expressions. Reliable sentiment analysis requires parsing many linguistic shades of gray.
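
To make the keyword-scanning idea concrete, here is a minimal sketch in Python. The word lists and the function name `keyword_sentiment` are made up for illustration; they do not come from any particular tool. The last call shows how sarcasm fools this kind of binary analysis.

```python
import re

# Hypothetical, tiny keyword lists; a real system would use a much larger lexicon.
POSITIVE = {"love", "great", "good", "wonderful"}
NEGATIVE = {"hate", "bad", "terrible", "horrible"}

def keyword_sentiment(text):
    """Label text by counting positive and negative keywords."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(keyword_sentiment("I love this camera"))        # positive
print(keyword_sentiment("I hate the battery life"))   # negative
# Sarcasm is misread as praise, because only the keyword "great" is seen.
print(keyword_sentiment("Oh great, it broke again"))  # positive
```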

More sophisticated analyses rely on tools such as the following:


Part-of-speech taggers: They identify whether a word in a sentence is a noun, verb, adverb, etc. Many studies have found that adjectives are important indicators of subjectivity and opinion. Thus, adjectives have been treated as special features.
Opinion words and phrases: Opinion words are words that are commonly used to express positive or negative sentiments. For example, beautiful and wonderful are positive opinion words, while horrible and terrible are negative ones. Although many opinion words are adjectives and adverbs, some nouns (rubbish and junk) and verbs (hate and like) can also indicate opinion. There are also opinion phrases and idioms, like “cost someone an arm and a leg.”
Negation: Negation words are important because their presence often changes the opinion orientation. For example, the sentence “I don’t like this camera” is negative. However, negation words must be handled with care because not all occurrences of such words mean negation. For example, “not” in “not only … but also” doesn't change the orientation.
Syntactic dependency: A tree is built from the analyzed sentence in order to represent its structure. For example, in "John hit the ball," the verb "hit" is the root of the tree, with "John" as its subject and "ball" as its object.
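
As a rough illustration of how opinion words and negation can be combined, here is a small Python sketch. The mini-lexicon, the negation list and the function `negation_aware_score` are all assumptions made up for this example; it only flips the next opinion word after a negation, which is far simpler than what real systems do.

```python
import re

# Hypothetical mini-lexicon: +1 for positive opinion words, -1 for negative ones.
LEXICON = {"beautiful": 1, "wonderful": 1, "like": 1,
           "horrible": -1, "terrible": -1, "hate": -1, "rubbish": -1, "junk": -1}
# Hypothetical list of negation words.
NEGATIONS = {"not", "don't", "doesn't", "never", "no"}

def negation_aware_score(text):
    """Sum opinion-word polarities, flipping the next opinion word after a negation."""
    # Normalize curly apostrophes so that "don’t" matches "don't".
    tokens = re.findall(r"[a-z']+", text.lower().replace("’", "'"))
    score = 0
    negate = False
    for i, tok in enumerate(tokens):
        if tok in NEGATIONS:
            # "not only ... but also" is not a real negation, so it is ignored.
            is_not_only = tok == "not" and i + 1 < len(tokens) and tokens[i + 1] == "only"
            if not is_not_only:
                negate = True
            continue
        if tok in LEXICON:
            polarity = LEXICON[tok]
            score += -polarity if negate else polarity
            negate = False
    return score

print(negation_aware_score("I don't like this camera"))                     # -1
print(negation_aware_score("It is not only beautiful but also wonderful"))  # 2
```

A score below zero reads as a negative opinion and a score above zero as a positive one.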
For casual web surfers, simpler incarnations of sentiment analysis are sprouting up in the form of lightweight tools like TweetSentiments and Twitteratr. These sites allow users to take the pulse of Twitter users about particular topics. But the accuracy of their results does not compare well with the precision Opinion Mining researchers have obtained so far (between 70% and 80% of correctly classified sentences or texts).

My favourite application of this kind, WeFeelFine, was made many years ago, long before the hype, by Jonathan Harris. It uses a very simple opinion mining method, but I think he saw what future mainstream applications would look like before most of us did; he also realized the importance of good, flexible data visualization.

Some of the challenges sentiment mining presents include answering the following questions:

1. What makes an opinion positive or negative?
2. How can we rank opinions according to their strength?
3. Can we define an objective measure for ranking opinions?
4. How does the context change the polarity and strength of an opinion and how can we take it into consideration?

To keep exploring these topics, I recommend reading Peter Turney's papers and blog, as well as Maite Taboada's papers and webpage.