To reach a richer and deeper level of analysis, we need to go beyond the bag-of-words approach.
Sentiment analysis aims to identify the polarity of moods. We treat this as a supervised learning problem: each mood is characterized by multiple words, and the model is trained on how frequently words or phrases of already known polarity occur. These known representations reflect the culture in which the language is used.
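The supervised, frequency-based idea above can be sketched as a small multinomial Naive Bayes classifier. This is only an illustrative sketch: the choice of Naive Bayes, the function names (`tokenize`, `train`, `predict`), and the four-sentence training set are all assumptions made for the example, not something prescribed by the text or by any particular package.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive whitespace tokenizer; real pipelines usually do much more.
    return text.lower().split()

def train(labeled_docs):
    """Count word frequencies per polarity label.

    labeled_docs is a list of (text, label) pairs with known polarity.
    """
    word_counts = {}          # label -> Counter of word frequencies
    doc_counts = Counter()    # label -> number of training documents
    vocab = set()
    for text, label in labeled_docs:
        doc_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in tokenize(text):
            counts[word] += 1
            vocab.add(word)
    return word_counts, doc_counts, vocab

def predict(model, text):
    """Return the label with the highest log-probability for the text."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        counts = word_counts[label]
        total_words = sum(counts.values())
        score = math.log(doc_counts[label] / total_docs)  # class prior
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are skipped
                # Laplace (add-one) smoothing avoids zero probabilities.
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy training data with known polarity.
TRAIN = [
    ("great food and friendly staff", "positive"),
    ("loved the great atmosphere", "positive"),
    ("terrible service and cold food", "negative"),
    ("awful experience terrible staff", "negative"),
]
model = train(TRAIN)
print(predict(model, "great atmosphere"))
print(predict(model, "terrible awful service"))
```

With such a tiny training set the result depends entirely on which polarized words happen to appear; a realistic model would need a much larger labeled corpus.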
Common methods, each with an example application area:
- Polarity analysis: Analysis of reviews is an important application here. Although polarity has a latent-class flavor, well-defined polarized independent variables (words) help us avoid the challenges of deep latent variable models.
- Subjectivity/objectivity identification: Classifying a corpus as either subjective or objective content. This is inherently challenging because the collection of words that separates subjective from objective content is itself influenced by the classification we intend to make; the boundary between subjectivity and objectivity may not be well defined.
- Feature/aspect analysis: This identifies what matters most within sub-areas of polarity identification. For example, a restaurant review may say that the food is great but the service and hygiene could be better. Here, the phrases neighboring each aspect are important contributors to its sub-polarity.
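As a rough illustration of aspect-level analysis, the sketch below scores each aspect by the polarity of the words in a small window around it. The lexicon `POLARITY`, the aspect set `ASPECTS`, the function `aspect_polarity`, and the window size are all hypothetical choices made for this example; the review sentence is a simplified variant of the restaurant example above that uses explicit polarity words.

```python
import re

# Hypothetical hand-made polarity lexicon: +1 positive, -1 negative.
POLARITY = {"great": 1, "good": 1, "friendly": 1,
            "slow": -1, "dirty": -1, "poor": -1}

# Hypothetical aspect terms for a restaurant review.
ASPECTS = {"food", "service", "hygiene"}

def aspect_polarity(review, window=2):
    """Sum the polarity of words within `window` tokens of each aspect term."""
    tokens = re.findall(r"[a-z']+", review.lower())
    scores = {}
    for i, token in enumerate(tokens):
        if token in ASPECTS:
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            neighborhood = tokens[start:end]
            local = sum(POLARITY.get(word, 0) for word in neighborhood)
            scores[token] = scores.get(token, 0) + local
    return scores

review = "The food was great but the service was slow and the hygiene was poor"
print(aspect_polarity(review))
```

A fixed window and a word lexicon cannot capture implicit judgments such as "could be better", nor negation; handling those would require phrase-level or syntactic features.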
These methods combine three ingredients: a knowledge base of the meanings of sentiments/likeability and of objective/subjective interpretations; statistical and machine learning approaches; and the grammar and culture of a language. The last is the hardest in which to reach human-level proficiency, because word usage can involve many layers of latent intelligence and their interactions.
Some key references:
- RTextTools: A Supervised Learning Package for Text Classification
- Introduction to text analysis with an application to cluster analysis
- Twitter analysis for stock market sentiment
- Measuring the Happiness of Large-Scale Written Expression: Songs, Blogs, and Presidents
- Predicting Elections with Twitter: What 140 Characters Reveal about Political Sentiment
- https://www.google.com/patents/US20090282019 - A patent on how to implement a text analytics server