
Tuesday, August 25, 2009

Opinion Mining and Sentiment Analysis

Textual information in the world can be broadly categorized into two main types: facts and opinions. Facts are objective expressions about entities, events and their properties. Opinions are usually subjective expressions that describe people’s sentiments, appraisals or feelings toward entities, events and their properties.

This area has grown in part as a reaction to a surge of interest in opinions as a first-class object of analysis, along with the huge increase in web text content, mainly produced by social network users.

Subjective information analysis systems answer questions about feelings and opinions. A crucial step towards this goal is identifying the words and phrases that express opinions in text. The simplest algorithms work by scanning for keywords to categorize a statement as positive or negative, based on a simple binary analysis (“love” is good, “hate” is bad). But that approach fails to capture the subtleties that bring human language to life: irony, sarcasm, slang and other idiomatic expressions. Reliable sentiment analysis requires parsing many linguistic shades of gray.
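The keyword-scanning approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not any particular system: the two word lists and the function name `keyword_polarity` are hypothetical, and the second call shows how sarcasm slips straight through.

```python
# Hypothetical toy word lists; real systems use much larger lexicons.
POSITIVE = {"love", "good", "great", "beautiful", "wonderful"}
NEGATIVE = {"hate", "bad", "horrible", "terrible"}

def keyword_polarity(sentence: str) -> str:
    """Label a sentence by counting positive minus negative keywords."""
    words = [w.strip(".,!?;:\"'").lower() for w in sentence.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(keyword_polarity("I love this camera"))        # positive
print(keyword_polarity("Oh great, it broke again"))  # positive: the sarcasm is missed
```

Because each word is judged in isolation, the sketch has no way to notice irony, negation or idioms, which is exactly the weakness the paragraph describes.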

More sophisticated analyses use the following tools:


Part-of-speech taggers: they identify whether a word in a sentence is a noun, verb, adverb, etc. Many studies have found that adjectives are important indicators of subjectivity and opinion, so adjectives have been treated as special features.
Opinion words and phrases: Opinion words are words commonly used to express positive or negative sentiments. For example, beautiful and wonderful are positive opinion words, while horrible and terrible are negative ones. Although many opinion words are adjectives and adverbs, some nouns (rubbish and junk) and verbs (hate and like) can also indicate opinion. There are also opinion phrases and idioms, like “cost someone an arm and a leg.”
Negation: Negation words are important because their presence often changes the opinion orientation. For example, the sentence “I don’t like this camera” is negative. However, negation words must be handled with care because not all occurrences of such words mean negation. For example, “not” in “not only … but also” doesn't change the orientation.
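Syntactic dependency: a tree is built from the analyzed sentence to represent the grammatical relations between its words. For example, in the tree for "John hit the ball", the verb "hit" is the root, "John" depends on it as the subject and "ball" as the object, and "the" attaches to "ball".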
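A rough sketch of how opinion words, idioms and negation from the list above can be combined into one score. It assumes a tiny hand-made lexicon (`LEXICON`, `PHRASES` and `NEGATORS` are hypothetical) and a fixed negation window of three tokens; real systems use far larger lexicons and usually take negation scope from a parse. It shows both negation cases: "don't" flips "like", while the "not" in "not only … but also" is skipped.

```python
import re

# Hypothetical mini-lexicon (illustrative only): polarity of single opinion words.
LEXICON = {
    "beautiful": 1, "wonderful": 1, "like": 1,
    "horrible": -1, "terrible": -1, "hate": -1, "rubbish": -1, "junk": -1,
}
# Multi-word idioms as regex patterns; \w+ stands for the "someone" slot.
PHRASES = {r"\bcosts? \w+ an arm and a leg\b": -1}
NEGATORS = {"not", "no", "never", "don't", "doesn't", "isn't"}
NEGATION_WINDOW = 3  # assumption: a negator flips the polarity of the next 3 tokens

def opinion_score(sentence: str) -> int:
    """Sum word and idiom polarities, flipping words inside a negation window."""
    text = sentence.lower().replace("\u2019", "'")  # normalize curly apostrophes
    total = 0
    for pattern, value in PHRASES.items():
        text, count = re.subn(pattern, " ", text)  # score matched idioms, then remove them
        total += value * count
    # "not only ... but also" does not negate, so drop its "not" before scanning.
    text = re.sub(r"\bnot only\b", "only", text)
    negate_left = 0
    for tok in re.findall(r"[a-z']+", text):
        if tok in NEGATORS:
            negate_left = NEGATION_WINDOW
            continue
        value = LEXICON.get(tok, 0)
        if negate_left > 0:
            value = -value
            negate_left -= 1
        total += value
    return total

print(opinion_score("I don't like this camera"))                 # -1
print(opinion_score("It is not only beautiful but also cheap"))  # 1
print(opinion_score("This lens cost me an arm and a leg"))       # -1
```

The fixed window is the crudest possible scope rule; a dependency tree like the one for "John hit the ball" is what lets a real system decide which words a negator actually governs.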
For casual web surfers, simpler incarnations of sentiment analysis are sprouting up in the form of lightweight tools like TweetSentiments and Twitteratr. These sites allow users to take the pulse of Twitter users on particular topics. But their accuracy does not compare well with the results opinion mining researchers have obtained so far (between 70% and 80% of sentences or texts correctly classified).

My favourite application of this kind is WeFeelFine, made by Jonathan Harris years ago, long before the hype. Its opinion mining method is very simple, but I think he saw what future mainstream applications would be like before most of us did; he also realized the importance of good, flexible data visualization.

Some of the challenges sentiment mining presents include answering the following questions:

1. What makes an opinion positive or negative?
2. How can we rank opinions according to their strength?
3. Can we define an objective measure for ranking opinions?
4. How does the context change the polarity and strength of an opinion and how can we take it into consideration?

To keep exploring these topics, I recommend reading Peter Turney's papers and blog, as well as Maite Taboada's papers and web page.

50 comments:

Lord of Erewhon said...

My dear, O Bar do Ossian on the 110th Anniversary of Borges' birth!

How come you forget him?
Cheers!

patientanonymous said...

When my hard drive crashed, I lost the paper you sent me on this topic, which we had talked a bit about before.

I had made some notes but I don't even know if they are still around. I think since you said you were heading in a different direction than we discussed, I threw them out--also because the paper was gone as well!

I'll maybe have a think about this and see what I can come up with. But yes, WeFeelFine is great fun, isn't it? I don't know if calling it "fun" is demeaning at all, but I think it is fun.

However, you know me, an Aspie and hypnotizing objects. And speaking of, an Aspie with Alexithymia. Opinions and Sentiments? A bit of difficulty there for me so that may prove problematic for this post.

Anon, my dear, nonetheless.

Charles Gramlich said...

I see more and more opinion in the media, and fewer and fewer facts.

Mariana Soffer said...

Lord of Erewhon
I always forget birthdays; my sister, for example, used to get furious at me because I could never remember hers.
But I did celebrate in a way: I spent half the day with my friend who wrote a couple of books about him, www.internetaleph.com.
Thanks for reminding me, Doug.

Mariana Soffer said...

patientanonymous
I am so sorry you lost them; I did not know, because I have not been reading my email. I can send you the paper again, though. It was pretty successful in the States, but it will be really boring for you; you do not want to read about fuzzy logic multivariate clustering algorithms and their validation.

WeFeelFine was made around five years ago, so of course it is technologically dated by now. Nevertheless, the project still amazes me, because I always think of it in the time and context in which it was created.

Yes, I think I told you that it is kind of ironic that I am doing research in linguistics, specializing in how sentiments are expressed, while a friend of mine, a human being, cannot handle that verbally.

I do not know what "anon" means, but I will take it as a positive opinionated sentence.

Take care, my friend.

Mariana Soffer said...

Charles Gramlich
Heh, interesting comment. It reminds me of something I always say: there is more and more criticism of new proposals and theories, but nothing new is proposed to solve the problems being criticized.

Hugs my friend

Paul said...

What is the difference between sentiment and opinion mining? Can you clarify other terms related to emotions that are used in this discipline?

Mariana Soffer said...

Paul
This field deals with the computational treatment of opinion, sentiment, and subjectivity in text. Such work has come to be known as opinion mining, sentiment analysis, and/or subjectivity analysis. The phrases review mining and appraisal extraction have been used, too, and there are some connections to affective computing, where the goals include enabling computers to recognize and express emotions. This proliferation of terms reflects differences in the connotations that these terms carry, both in their original general-discourse usages and in the usages that have evolved in the technical literature of several communities.

u said...

What exactly does the word "subjectivity" refer to in this field? Does it have any special function in it?

Mariana Soffer said...

U

According to Liu, in 1994 Wiebe, influenced by the writings of the literary theorist Banfield, centered the idea of subjectivity around that of private states, defined by Quirk as states that are not open to objective observation or verification. Opinions, evaluations, emotions, and speculations all fall into this category; but a canonical example of research typically described as a type of subjectivity analysis is the recognition of opinion-oriented language in order to distinguish it from objective language. While there has been some research self-identified as subjectivity analysis on the particular application area of determining the value judgments (e.g., “four stars” or “C+”) expressed in the evaluative opinions that are found, this application has not tended to be a major focus of such work.

Uncle Tree said...

Thank you for letting us into your department, and showing us just what it is you do to make a living, career-wise.

I believe you are saying that I (we all) have opinions. If I were to write an account of an event, or review a certain topic objectively, without directly answering your question, you could still guess my opinion on the matter. Do I have that right?

And the reasons for doing this have to do with finding out which way the general public is leaning, or who's on whose side, or where they stand? Correct?

One of these days...maybe you can use something I have written as an example, and show me how you break it down, and do the math. Then, perhaps, I could better understand how all this works. Sometimes, even I don't know what I think.

Have a great day, teacher! ~Hugz~

Mariana Soffer said...

Uncle Tree

My pleasure to share this; I am indeed worried that most people do not care about it.

Well, what you describe can be done, but with another natural language processing method called textual entailment, which tells you whether one textual expression implies another. Opinion mining is usually handled with simpler NLP methods, which do not handle metaphors or do logical inference based on facts about the world. Instead, they analyze just the plain textual discourse about something (which, given the complexity of language, is pretty complex by itself) and then tell you the opinion orientation toward that thing.

I would love to do that, and to explain a couple of fantastic methods that exist for text processing. For example, there is one I like a lot that can detect or produce metaphors about a certain topic being discussed in the text. But I do not think that will help you know what you think, because it is not a realistic model of how the brain works.
But I have another project, much more complex, that attempts to do that. For that project we have a cognitive theory of the brain; the design is our own concept, and it is not just for opinion mining but for any kind of language task. The main idea is that there are three different kinds of memory (short, middle and long range), and we have repositories of morphosyntactic structures, as well as repositories of text related to those structures. It is a pretty big and complex diagram, but the idea is that connected parts are generally mediated through a filter that chooses what to communicate. These filters are of several types; currently we are implementing wavelets (a special family of functions used in signal processing, among other things). These wavelet functions adapt according to certain learning rates and use metrics produced by a metric generator module, which distributes those metrics to all the parts of the "brain" where they are needed.

Thank you my strong tree

geek said...

Hey, Mariana. How have you been doing? Long time no talk. :)

I was just wondering: since we are looking at breaking statements with opinions down into structures, could we create a system or a device to do this? Also, what about implied opinions? I mean, opinions that are not clearly stated.

Off topic: the illustration of the syntactic dependency reminds me of a horrible exam on it that I took almost 2 years ago. - tears -

patientanonymous said...

Anon is almost a form of older English. Possibly used in Shakespeare?

It basically means we will speak later, or I will see you later.

Rayuela said...

What a great post, Mariana!
I will give my opinion from the point of view of the communication channel. Language, besides words, uses paralinguistic functions to establish communication correctly. And, despite these functions, many, countless times, communication fails. This is a claim of current linguistics. How much harder, then, will the emission, reception and decoding of a message be on social networks, where the paralinguistic function is not present!

A kiss. (I will answer your email soon.)

Paul said...

At some point every scientist should be asked "why?" This is very sophisticated monitoring equipment that you are developing.

Rick said...

Mariana, may I ask what impact you think this mining has on text patterns and expressions? I'm thinking along the lines of how trying to locate an electron modifies its position, if you see what I'm saying. Or, perhaps, the way children's behavior changes when they know they are being closely watched.

Dave King said...

My first reaction was that this is too academic for me, but further reflection makes me wonder, so I may explore it a bit. Thanks for the intro.

Mariana Soffer said...

geek
Nice to hear from you, and thanks for stopping by. I wrote you a little about myself earlier; basically I am busy with stuff, but all right. I would like you to tell me a little about how you have been.

Of course we can. There are programs that do this for you; well, actually you have to program them and feed them examples of proper structures and what they mean for them to work. There is also the option of writing the structures yourself (with a group of linguists) and using this hand-made information to parse (that is what it is called) the text into structures. Please do try the following online demo; it will give you a clearer example:
http://garraf.epsevg.upc.es/freeling/demo.php Change the language to English, input an English text, select shallow or dependency parsing as output, and you will see the structure it builds. What you say about implicit opinions is also important: that is a field being studied, and implicit opinions are very difficult to detect. One tool that can be used for this is LSA (latent semantic analysis), which automatically uncovers hidden (latent) information in text through matrix operations and statistics.
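To make the LSA idea a little more concrete, here is a small sketch in plain Python. LSA normally takes a truncated singular value decomposition (SVD) of a term-document matrix; to stay dependency-free, this version finds the top singular directions by power iteration on A^T A with deflation, which gives the same result for a tiny, well-separated example like this one. The toy corpus and all function names are hypothetical. The documents "car" and "automobile" share no words, yet they point in the same direction in the reduced space because both co-occur with "engine".

```python
import math

def power_iteration(m, iters=500):
    """Dominant eigenpair of a small symmetric PSD matrix (list of lists)."""
    n = len(m)
    v = [float(i + 1) for i in range(n)]  # arbitrary start vector
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T m v is the eigenvalue for the unit vector v.
    lam = sum(v[i] * m[i][j] * v[j] for i in range(n) for j in range(n))
    return lam, v

def lsa_doc_vectors(a, k):
    """Rank-k LSA coordinates (sigma_i * v_i[d]) of each document column of a."""
    n_docs = len(a[0])
    # Gram matrix A^T A: its eigenvectors are A's right singular vectors
    # and its eigenvalues are the squared singular values.
    gram = [[sum(row[i] * row[j] for row in a) for j in range(n_docs)]
            for i in range(n_docs)]
    coords = [[] for _ in range(n_docs)]
    for _ in range(k):
        lam, v = power_iteration(gram)
        sigma = math.sqrt(max(lam, 0.0))
        for d in range(n_docs):
            coords[d].append(sigma * v[d])
        # Deflate: remove the found direction so the next pass finds the next one.
        gram = [[gram[i][j] - lam * v[i] * v[j] for j in range(n_docs)]
                for i in range(n_docs)]
    return coords

def cosine(x, y):
    dot = sum(p * q for p, q in zip(x, y))
    return dot / (math.sqrt(sum(p * p for p in x)) * math.sqrt(sum(q * q for q in y)))

# Hypothetical toy corpus. Rows are terms, columns are documents:
# d0 "car engine", d1 "automobile engine", d2 "car", d3 "automobile",
# d4 "flower petal", d5 "petal".
A = [
    [1, 0, 1, 0, 0, 0],  # car
    [0, 1, 0, 1, 0, 0],  # automobile
    [1, 1, 0, 0, 0, 0],  # engine
    [0, 0, 0, 0, 1, 0],  # flower
    [0, 0, 0, 0, 1, 1],  # petal
]

def column(d):
    return [row[d] for row in A]

docs = lsa_doc_vectors(A, k=2)
print(round(cosine(column(2), column(3)), 3))  # 0.0: no shared words
print(round(cosine(docs[2], docs[3]), 3))      # 1.0: same direction after LSA
```

For real corpora a library SVD would replace the hand-written power iteration; the point here is only that the hidden association comes out of the matrix algebra, not from any word the two documents share.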

I am sorry I brought back bad memories, but you have to give them a new meaning. I myself hated syntax; I could never learn anything about it until a couple of years ago.

Take care

Mariana Soffer said...

patientanonymous

Thanks for the information. It is also very interesting to see how language evolves, and how a new one is even created by branching off from a main language and gradually differentiating itself.

Mariana Soffer said...

Rayuela says:
Great post, Mariana!
I will give my opinion from the point of view of the communication channel.

Language, besides words, uses paralinguistic functions to establish communication correctly. And even though we use those functions, many, countless times communication fails. This is what current linguistics holds. The emission, reception and decoding of a message become even harder on social networks, where the paralinguistic function is not present.

A kiss

Mariana Soffer said...

Rayuela
Thanks a lot for the compliment my friend.
I tend to think a lot about the topic you mention here. What I always reflect on is that misunderstandings are much more common nowadays among people trying to exchange information, because we are using much less expressive ways of transmitting it. For example, the phone was much better, because you could infer many things from the tone of voice, speed, emphasis and so on. But now, if you use Twitter, for example, you only have 140 characters to say something, without any extra content being transmitted beyond the words and punctuation marks.
Besides, I think language is highly ambiguous: some sentences can have many different meanings depending on the subject being discussed, who is saying it, the emotional state of the speaker, the use of irony and many other things. Therefore we need methods to disambiguate the message, and paralinguistic functions were pretty useful for that.
Also, nowadays there is probably a higher ratio of noise to information in what is transmitted than there was before (this idea comes from Shannon's information theory), which also implies less understanding between transmitter and receiver.

On the other hand, I think we are developing a more universal language for social networks, which is less ambiguous but also carries less rich information.

Hugs

Mariana Soffer said...

Paul:
I absolutely agree with you; it is fundamental, otherwise things have no sense, no meaning. Generally this is used for marketing purposes, to help companies make more money from the products or services they sell, but of course that by itself does not really interest me. I do use resources that were made for this purpose, though, but I use them as a way of training myself for the really interesting goals these things can be used for.
I cannot disclose what I intend to do with these techniques and tools yet, because although I have an idea, I am not exactly sure how to implement it, when, where, and so on. But I would really appreciate any suggestions about what could be done, that is useful and beneficial for society, using these new NLP techniques and the data and information that social networks and blogs allow us to extract.

Take care my friend, and very good point to mention.

Mariana Soffer said...

Rick
Well, I do not know if you remember, but there was a peak of paranoia a few years ago about emails being analyzed with these techniques, because everybody was suspected of being a terrorist, and people were highly indignant about the government applying these new trends. Anyway, the same happens with the cameras and microphones that are everywhere, recording many details of our lives.
I do not see people being very worried or protesting much about this; eventually people start complaining and then they stop. But I do not think they really change their behaviour because they are being watched, unlike the particle whose trajectory is altered by its observer.

Take care rick

Mariana Soffer said...

Dave King
I know; I had serious doubts myself about publishing it, but I thought I would give it a try. Maybe people would be interested in it.

My pleasure to explain it to you. Check it out; maybe there are other texts that will interest you more, since natural language processing has tons of subfields.

Cheers

Ted Bagley said...

The essence of communication is misunderstanding no matter what form it's in.

Mariana Soffer said...

ted bagley
I am not sure what to tell you. What you propose is interesting, but on the other hand artificial communication can work pretty well, with communication protocols and all that. Human communication, on the other hand, can be as you say, but I guess there is a part that is understood and a part that never is.

Snowbrush said...

Certainly, the answers pollsters get depend upon how the questions were framed.

Ted Bagley said...

Artificial as opposed to what? Natural? If so, what would be the difference?

Ted Bagley said...

What your essay seems to be describing is a cooperation between the University and Master Discourses.

Mariana Soffer said...

Snowbrush
Hi Snowbrush, thanks for stopping by. I enjoy your company.

And you are right that survey results can be completely biased by the way questions are framed, but pollsters try to make them as impartial as they can, and also to measure responses with metrics that help them analyze the answers as objectively as possible.

bye bye

Mariana Soffer said...

Ted bagley
By artificial I mean, for example, how machines transmit data among themselves: a web page is stored on one machine, which transmits the information to another through the TCP/IP protocol, and they understand each other perfectly.

And by natural I was referring to communicating information among human beings, which always involves misunderstandings given people's different interpretations, subjectivities and cultural backgrounds.

Mariana Soffer said...

Ted Bagley
I do not know what you mean by master discourses. Is it about the main theories of discourse? If so, which ones? Do you have an example of what a master discourse is?

Jenny said...

Hi there,

Very interesting post! The distinctions between what is generally regarded as positive or negative opinion, and as subjective or objective, seem to be forever changing rapidly, like any trend.

Opinion words and phrases are a very interesting field. For instance, words like "gay" or "fairy" carry so many meanings at the same time.

Ted Bagley said...

Mariana,
Machines, then, actually do not communicate so much as exchange coding which has no ambiguous meaning, so there is no need for understanding. Understanding would imply a reasoning out pertaining to your natural definition.

Second: the reference is to Lacan's Four Discourses.

I made myself clearer in my reply to your comment on my post.

Nice to talk with you!

Mariana Soffer said...

Jenny
Thank you very much for your nice compliments.

It is exactly as you say: things are changing rapidly. That is a very interesting insight into the subject; I had not considered that point yet, so I guess I will now have to take it into account for my programs to work better.

In general, many words have several meanings at the same time, but the way to resolve that ambiguity is to look at the context, and voilà. For example, the word "vet" could refer to a war veteran or to a veterinarian; you know which meaning is intended from the context the word appears in. If you want to read more about it, there is a project called WordNet that groups English words into sets of synonyms called synsets, and many words appear in several synsets, with different syntactic functions (noun, verb).

It is a very difficult task to disambiguate meanings indeed.
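The "look at the context" idea can be illustrated with a simplified version of the Lesk algorithm, which picks the sense whose dictionary definition (gloss) shares the most words with the surrounding sentence. The two glosses for "vet" below are hand-written stand-ins, not real WordNet entries, and `SENSES`, `STOPWORDS` and the function names are hypothetical.

```python
# Hypothetical hand-written glosses standing in for dictionary entries.
SENSES = {
    "veteran": "a person who has served in the army or armed forces during a war",
    "veterinarian": "a doctor who treats sick or injured animals such as dogs and cats",
}
STOPWORDS = {"a", "an", "and", "as", "during", "has", "i", "in", "it", "my",
             "or", "such", "the", "to", "was", "who"}

def content_words(text: str) -> set:
    """Lower-cased words with punctuation and stopwords removed."""
    return {w.strip(".,!?;:").lower() for w in text.split()} - STOPWORDS

def disambiguate(context: str) -> str:
    """Simplified Lesk: choose the sense whose gloss overlaps most with the context."""
    ctx = content_words(context)
    return max(SENSES, key=lambda sense: len(ctx & content_words(SENSES[sense])))

print(disambiguate("I took my dog to the vet because it was sick"))    # veterinarian
print(disambiguate("The old vet served in the army during the war"))  # veteran
```

Exact word overlap is brittle ("dog" does not match "dogs" here; only "sick" does), which is one reason real systems add stemming, richer glosses or statistical models on top of this idea.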

If you are curious about other stuff, let me know and I will be happy to explain.

Take care dear

Mariana Soffer said...

Ted Bagley
Well, you are right that in a way they do not communicate, but the transmission of digital information is also called telecommunications; that is why I used that word. And although they do not reason as we do, they do have a kind of intelligence: transmission protocols can be pretty complex (a protocol is an agreement about how you are going to transmit and receive the information). For example, you can send packets of data of different lengths, which can carry redundancy checks, and if they are not transmitted correctly there are methods for the receiver to correct them automatically.
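As one illustration of receiver-side correction, here is a Hamming(7,4) code, a classic scheme that adds three parity bits to every four data bits so the receiver can locate and flip any single corrupted bit. This is a textbook example only; TCP/IP itself detects damaged segments with checksums and relies on retransmission rather than correcting them. The function names are hypothetical.

```python
def hamming74_encode(d):
    """Encode 4 data bits as the 7-bit codeword [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]

def hamming74_decode(c):
    """Correct up to one flipped bit and return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]  # re-check positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]  # re-check positions 2, 3, 6, 7
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]  # re-check positions 4, 5, 6, 7
    error_pos = s1 + 2 * s2 + 4 * s4  # 1-based position of the bad bit, 0 if none
    if error_pos:
        c[error_pos - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]

data = [1, 0, 1, 1]
sent = hamming74_encode(data)      # [0, 1, 1, 0, 0, 1, 1]
received = sent.copy()
received[4] ^= 1                   # noise flips bit 5 in transit
print(hamming74_decode(received))  # [1, 0, 1, 1]
```

The failed parity checks, read as a binary number, point directly at the damaged position, which is what lets the receiver repair the message without asking for it again.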

Regarding Lacan, I found this good summary of what he refers to: "Lacanian Theory of Discourse provides an account of how language both interacts with and constitutes structures of subjectivity, producing specific attitudes and behaviors as well as significant social effects."
I think it leads to very interesting material, which I have been reading a little.

Anyway, the theory of discourse is amazing by itself. One field I have been working in is the study of discourse coherence in texts, which I find interesting. You make me want to read more about Lacan and discourse theory, thanks!

I will check your post in a sec.

Thanks for your interesting comments

Ted Bagley said...

Your example is based on programming.
I think it's a stretch to link coding with intelligence, since the transmission protocols are supplied to them, and it's called telecommunications because it's humans doing the communicating through such and such..., but we can leave it there, if you wish.

Yes, Lacan is very insightful. :)

Good night Mariana.

Mariana Soffer said...

Ted Bagley

I know, Ted, that it is not the same; I am not so stupid as not to understand that myself. The thing is that humans call those things by the same name, even though they are different concepts and represent different things.

Good night to you too, friend.

Ted Bagley said...

I wasn't implying you were stupid, Mariana. Far from it. I just figured that the key word in your post was "communication", said or unsaid, based on your comments.
From the questions at the end, I couldn't help feeling that the analysis tools are used to support a manipulation of communication. I'm kinda freaky that way, though.
Now I'm really going to bed!

Mariana Soffer said...

Ted Bagley
I knew, my friend, that you did not mean it; it is just an expression. Maybe it sounds too strong in English; my apologies if that is the case.

It is very interesting what you say about the key being communication, and I think it is also really accurate.

Sadly, they are used for manipulation pretty often, but if you know enough about the field, you can fight against those manipulating companies and help society deal with it.

Ted Bagley said...

That's such a cute thought at the end I could just pinch your cheeks.
No really, now I have to go to bed.

Mariana Soffer said...

Ted Bagley
Have a good night's sleep (although I know I am wishing it three days late, I wish it for you all the same).

bookmanie said...

Thank you for your comment on my blog. I hope it (my blog) in lensemble you want too. Bookman.

Mariana Soffer said...

Bookmanie

Thanks, dude, but I do not understand what you mean by the words "in lensemble you want too". I do not know if it's my English or whether you misspelled a word or two.
Anyway nice to say hi to you

Anquirens said...

Hi Mariana. Very good post. Do you know any papers about spanish opinion mining?

Mariana Soffer said...

Sure, several. What are you looking for?

Anquirens said...

I am beginning research on opinion mining for texts written in Spanish, so any hints about where to begin would be great.

Mariana Soffer said...

OK, here is the first tip: check Maite Taboada's work and corpus; she has worked on this, and she is really friendly. But you should understand that there are not many Spanish resources, so if you want to build something for real you will have to collect the corpus and somehow tag it yourself. Why don't you send me your information at marianasoffer at gmail.com, and we can chat there, so I can better understand where you want to go? There are no books about opinion mining in Spanish, nothing at all, so you have to understand the field more thoroughly and probably use the methods English speakers use, although some of them need adapting, like the Porter stemmer.
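For readers new to stemming, a deliberately tiny Spanish suffix stripper shows the idea behind adapting a rule-based stemmer such as Porter's: strip the longest matching inflectional suffix, but only if a stem of some minimum length remains. The suffix list, the three-letter minimum and the name `toy_spanish_stem` are illustrative assumptions, not the actual Porter or Snowball Spanish rules, which are far more careful.

```python
# Hypothetical, deliberately incomplete suffix list.
SUFFIXES = ("amente", "mente", "os", "as", "es", "o", "a", "s")
MIN_STEM = 3  # assumption: never cut a word below three letters

def toy_spanish_stem(word: str) -> str:
    """Strip the longest listed suffix that leaves at least MIN_STEM letters."""
    word = word.lower()
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)]
    return word

for w in ["rápido", "rápida", "rápidos", "rápidas", "rápidamente", "fácilmente"]:
    print(w, "->", toy_spanish_stem(w))
```

All the "rápido" forms collapse to the stem "rápid", so a sentiment lexicon entry for one form also matches the others; that conflation is what makes a stemmer useful in opinion mining.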

Cheers
M
