Thursday, October 9, 2014

Michalis founded DigitalMR in 2010, following a corporate career in market research with Synovate and MEMRB that began in 1991. This post was first published on the DigitalMR blog. Explore the blog here: www.digital-mr.com/blog

It took a bit longer than anticipated to write Part 3 of this series of posts about the content proliferation around social media research and social media marketing. In the previous two parts, we talked about Enterprise Feedback Management (December 2013) and short, event-driven intercept surveys (February 2014). This post is about sentiment and semantic analysis: two interrelated terms in the “race” to reach the highest sentiment accuracy that a social media monitoring tool can achieve. From where we sit, this seems to be a race that DigitalMR is running on its own, competing against its own best score.
 
The best academic institution in this field, Stanford University, announced a few months ago that it had reached 80% sentiment accuracy; it has since raised that figure to 85%, but this has only been achieved in the English language, based on comments for one vertical, namely movies: a rather straightforward case of “I liked the movie” or “I did not like it and here is why…”. That is not to say there will be no people sitting on the fence with their opinion about a movie, but even neutral comments in this case will carry less ambiguity than in other product categories or subjects. The DigitalMR team of data scientists has been consistently achieving over 85% sentiment accuracy in multiple languages and multiple product categories since September 2013; this is when a few brilliant scientists (mainly engineers and psychologists) cracked the code of multilingual sentiment accuracy!
Let’s dive into sentiment and semantics in order to take a closer look at why these two types of analysis are important and useful for next-generation market research.
 
Sentiment Analysis
 
The sentiment accuracy of most automated social media monitoring tools (we know of about 300 of them) is lower than 60%. This means that if you take 100 posts that are supposed to be positive about a brand, only 60 of them will actually be positive; the rest will be neutral, negative or irrelevant. This is almost like the flip of a coin, so why do companies subscribe to SaaS tools with such unacceptable data quality? Does anyone know? The caveat around sentiment accuracy is that the maximum achievable accuracy using an automated method is not 100% but rather 90%, or even less. This is because when humans are asked to annotate a set of comments with sentiment, they will disagree at least 1 time in 10. DigitalMR has achieved 91% in the German language, but that accuracy was established by 3 specific DigitalMR curators; if we were to have 3 different people curate the comments, we might arrive at a different accuracy. Sarcasm, and more generally ambiguity, is the main reason for this disagreement. Some studies of large numbers of tweets (such as the one reported in the paper Semi-Supervised Recognition of Sarcastic Sentences in Online Product Reviews) have shown that less than 5% of the total number of tweets reviewed were sarcastic. The question is: does it make sense to solve the problem of sarcasm in machine learning-based sentiment analysis? We think it does, and we find it exciting that no one else has solved it yet.
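To make that annotator-agreement ceiling concrete, here is a minimal sketch in Python, with made-up labels rather than DigitalMR data: it computes the pairwise agreement rate between three hypothetical curators. If any pair of humans only agrees around 90% of the time, that is roughly the best a machine trained on their labels can credibly claim.

from itertools import combinations

# Hypothetical sentiment labels from three curators for the same 10 posts.
curators = {
    "curator_1": ["pos", "neg", "neu", "pos", "pos", "neg", "neu", "pos", "neg", "pos"],
    "curator_2": ["pos", "neg", "neu", "pos", "neu", "neg", "neu", "pos", "neg", "pos"],
    "curator_3": ["pos", "neg", "pos", "pos", "pos", "neg", "neu", "pos", "neg", "neg"],
}

def agreement(a, b):
    # Fraction of posts on which two curators assign the same label.
    return sum(x == y for x, y in zip(a, b)) / len(a)

for (name_a, a), (name_b, b) in combinations(curators.items(), 2):
    print(f"{name_a} vs {name_b}: {agreement(a, b):.0%} agreement")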
Automated sentiment analysis allows us to create structure around large amounts of unstructured data without having to read each document or post one by one. We can analyse sentiment by brand, topic, sub-topic, attribute, topic within brand, and so on; this is when social analytics becomes a very useful source of insights on brand performance. The WWW is the largest focus group in the world, and it is always on. We just need a good way to turn qualitative information into robust, contextualised quantitative information.
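For illustration only, here is what machine-learned sentiment classification looks like in its simplest form: a bag-of-words sketch using scikit-learn, with invented posts and labels, and emphatically not DigitalMR's multilingual system. Train on posts annotated by human curators, then score unseen posts automatically.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training data: posts labelled by human curators.
posts = [
    "I love this phone, the camera is brilliant",
    "Terrible battery life, would not recommend",
    "Best service I have had in years",
    "The app keeps crashing, really disappointing",
]
labels = ["positive", "negative", "positive", "negative"]

# Bag-of-words features feeding a linear classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(posts, labels)

# Score unseen posts; in practice the predictions would then be
# aggregated by brand, topic, sub-topic and attribute.
print(model.predict(["the battery is brilliant", "the camera keeps disappointing"]))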
 
Semantic Analysis
 
Some describe semantic analysis as “keyword analysis”, which could also be referred to as “topic analysis”; as described in the previous section, we can even drill down to report on sub-topics and attributes.
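A minimal sketch of what “keyword analysis” can mean in practice, again assuming scikit-learn and invented posts: surface the most distinctive terms per post with TF-IDF, a crude stand-in for full topic analysis.

from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical posts about a consumer product.
posts = [
    "battery drains fast and charging is slow",
    "great camera but the battery disappoints",
    "customer service resolved my billing issue quickly",
]

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(posts)
terms = vectorizer.get_feature_names_out()

# Print the three highest-weighted terms per post.
for i, row in enumerate(tfidf.toarray()):
    top_terms = [t for _, t in sorted(zip(row, terms), reverse=True)[:3]]
    print(f"post {i}: {top_terms}")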
 
Semantics is the study of meaning in language. As researchers, we need to provide context to go along with the sentiment, because without the right context the intended meaning can easily be misunderstood. Ambiguity makes this type of analytics difficult: for example, when we say “apple”, do we mean the brand or the fruit? When we say “mine”, do we mean the possessive pronoun, the explosive device, or the place from which we extract useful raw materials?
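Word-sense disambiguation is one established way to attack this kind of ambiguity. Here is a minimal sketch using the classic Lesk algorithm as implemented in NLTK (the WordNet data must be downloaded first); it picks the sense of “mine” whose dictionary definition best overlaps with the surrounding words.

# Requires: pip install nltk, then nltk.download("wordnet") once.
from nltk.wsd import lesk

context = "the miners went back down the mine to extract coal".split()
sense = lesk(context, "mine")
print(sense, "-", sense.definition() if sense else "no sense found")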
Semantic analysis can help:
  • extract relevant and useful information from large bodies of unstructured data, i.e. text;
  • find an answer to a question without having to ask anyone (see the sketch after this list);
  • discover the meaning of colloquial speech in online posts; and
  • uncover the specific meanings of words from foreign languages mixed with our own.
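As promised above, a small sketch of “answering a question without asking anyone”, using invented, pre-scored posts: tally machine-assigned sentiment per topic to answer, say, “which product attribute draws the most negative comment?”

from collections import Counter

# Hypothetical posts already scored for topic and sentiment.
scored_posts = [
    ("battery", "negative"), ("camera", "positive"),
    ("battery", "negative"), ("price", "negative"),
    ("camera", "positive"), ("battery", "positive"),
]

negatives = Counter(topic for topic, sentiment in scored_posts
                    if sentiment == "negative")
print(negatives.most_common(1))  # -> [('battery', 2)]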
What does high-accuracy sentiment and semantic analysis of social media listening posts mean for market research? It means that a US$50 billion industry can finally divert some of its spending from asking questions of a sample, using long and boring questionnaires, to listening to the unsolicited opinions of the whole universe (in effect, census data) of their product category’s users.
 
This is big data analytics at its best, and once there is confidence that sentiment and semantics are accurate, the sky is the limit for social analytics. Think about detection and scoring of specific emotions, not just varying degrees of sentiment; think automated relevance ranking of posts in order to allocate them to vertical reports correctly; think rating purchase intent and thus identifying hot leads. After all, accuracy was the main reason why Google beat Yahoo and became the most used search engine in the world.
