
Friday, March 27, 2009

Interesting facts about NLP

-The Turing test is a proposal for a test of a machine's ability to demonstrate intelligence. Described by Alan Turing in the 1950 paper "Computing Machinery and Intelligence," it proceeds as follows: a human judge engages in a natural language conversation with one human and one machine, each of which tries to appear human. All participants are placed in isolated locations. If the judge cannot reliably tell the machine from the human, the machine is said to have passed the test. In order to test the machine's intelligence rather than its ability to render words into audio, the conversation is limited to a text-only channel such as a computer keyboard and screen. The test is often regarded as one of the earliest widely known proposals related to NLP.

-Text is the largest repository of human knowledge and is growing quickly: emails, news articles, web pages, chat archives, scientific articles, insurance claims, customer complaint letters, transcripts of phone calls, technical documents, government documents, patent portfolios, court decisions, contracts, and so on.

-Nowadays we have access to huge amounts of information, much more than in past decades. The problem is that we do not read any faster than before, so we cannot take full advantage of this new situation. NLP tries to optimize the human usage of information.

-Dealing with natural language is a difficult task. We need to understand multiple disciplines, including multivariate statistics, learning algorithms, clustering, hidden Markov models and part-of-speech tagging. We also need knowledge of language, grammar, ontologies and folksonomies.

-Huge amounts of data must be processed in a limited amount of time, so special algorithms are needed. We generally apply algorithms that have a low computational cost, or we reduce the amount of computation needed by pre-processing the data. Typical pre-processing techniques shrink the text by removing stop words, words that appear too often, and words that appear only a few times.
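
The pre-processing described above can be sketched in a few lines of Python. This is only a minimal illustration, not a standard implementation: the stop-word list is a tiny made-up sample, and the function name `reduce_vocabulary` and the thresholds `min_docs` and `max_doc_ratio` are assumptions chosen for the example.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "on"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def reduce_vocabulary(documents, min_docs=2, max_doc_ratio=0.7):
    """Drop stop words, then words that are too rare or too common.

    A word is kept only if it appears in at least `min_docs` documents
    and in at most `max_doc_ratio` of all documents.
    """
    tokenized = [
        [t for t in tokenize(doc) if t not in STOP_WORDS] for doc in documents
    ]

    # Document frequency: in how many documents does each word occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    n_docs = len(documents)
    keep = {
        word
        for word, df in doc_freq.items()
        if df >= min_docs and df / n_docs <= max_doc_ratio
    }
    return [[t for t in tokens if t in keep] for tokens in tokenized]


docs = [
    "The cat sat on the mat",
    "The cat chased a mouse",
    "A dog chased the cat",
    "The dog sat in the garden",
]
reduced = reduce_vocabulary(docs)
print(reduced)
```

With these four sample sentences, "cat" is dropped for appearing in too many documents, while "mat", "mouse" and "garden" are dropped for appearing in only one, so later steps have far fewer distinct words to handle.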

-The applications of NLP include answering queries, identifying spam, recognizing the main theme of a document, grouping similar texts, extracting the main keywords of a document, detecting syntactic errors and identifying the secondary themes of a document.
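
As a rough illustration of one of these applications, grouping similar texts, the sketch below compares documents by the cosine similarity of their word counts. It is one simple approach among many, and the helper names `bag_of_words` and `cosine_similarity` as well as the sample messages are made up for this example.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count how often each lowercase alphabetic token occurs in the text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 to 1.0)."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


first = bag_of_words("cheap pills buy now")
second = bag_of_words("buy cheap pills online")
third = bag_of_words("meeting agenda for monday")

print(cosine_similarity(first, second))  # shares three of four words
print(cosine_similarity(first, third))   # shares no words
```

Texts whose similarity exceeds some chosen threshold can then be placed in the same group. The same measure could also be used to compare a new message against known spam.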
