Friday, April 22, 2016


Dr Luke Sloan is a Senior Lecturer in Quantitative Methods, Deputy Director of Cardiff Q-Step and a member of the Collaborative Online Social Media Observatory (COSMOS: www.cosmosproject.net). He is based in the School of Social Sciences at Cardiff University and his research focuses on the development of demographic proxies for Twitter data and understanding how social media data can augment traditional modes of social scientific analysis. @drlukesloan

A perennial criticism of Twitter data is that it’s missing many of the variables that we find interesting as social scientists and, because of this, it will never be a viable source of data for social scientific analysis. We are anchored to the practices of survey methodology in which a question is asked and answered, thus we ensure that the researcher collects the relevant demographic information allowing us to compare gender/ethnic/socio-economic groups. This is the bread and butter of social science.
In contrast, social media data is naturally occurring: it is not elicited! Because of this it is unfocused, messy and does not neatly address a pre-conceived research question. But it is a rich source of information on attitudes and provides insights into immediate reactions following key events. It’s been used to predict elections, box office revenue and even to calculate the epicentre of an earthquake. So clearly we shouldn’t be so quick to dismiss this data as useless, particularly if we are creative and innovative in how we conceptualise the manner in which demographic data may manifest and thus open this data up to social scientific analysis.
Imagine that you are walking down the street and have decided that today you are going to guess the demographic characteristics of the people that you see. The only rule is that you cannot ask them outright; you must observe their behaviour without being obtrusive. How might you work out someone’s gender? Well, perhaps you overhear someone shouting his or her name. What about their occupation? Maybe they have an ID badge or are carrying tools. What about their age? Well, we all make guesses about age based on appearance, often at the risk of offending someone. The point is that through the passive uptake of incidental information which is there to be analysed (and which you have not elicited!) you can tell quite a bit about a person.

Now let’s consider this in the context of Twitter. People put their name on Twitter, thus allowing us to derive a proxy for their gender. For those who have geo-tagging switched on we can tell where they were when they tweeted, or we can use profile information to work out their home town. If we have enough time we can even look at the places to which they make reference in their tweets. We know about their hobbies as they report on their leisure activities and we know a bit about their work if they report on it via social media. Are they employed? Well, we can have a look at whether they’re complaining about work, about colleagues or about the printer breaking down (‘again!’). When we look closely enough we are flooded with ‘signatures’ that offer us an indication of characteristics that would typically be found in the demographics section of a survey.

The sticking point is that we can’t derive this information for all tweeters, and not all the proxies are as reliable as others. First names are actually quite an accurate proxy for gender, as identity play is a minority pursuit. As long as you have stringent classification rules and understand that around 52% of UK users can’t be classified, you still have information for 48%* (around 600,000 successfully identified users). You could think of this 48% as a sample of Twitter users which is analogous to a survey sample, although not randomly sampled… but even then, do we have any reason to think that the users we have been able to identify are substantively different to those we can’t?
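
To make the idea of a "stringent classification rule" concrete, here is a minimal Python sketch. It is an illustration only and not the method used in the research: the lookup table `NAME_GENDER`, the helper names and the sample display names are all hypothetical, and a real study would rely on a large curated name database. The rule assigns a gender only when the first word of a display name is a known, unambiguous first name, and leaves everyone else unclassified.

```python
import re

# Hypothetical, tiny lookup for illustration only; a real study would use a
# large curated name database rather than a handful of entries.
NAME_GENDER = {
    "luke": "male",
    "james": "male",
    "sarah": "female",
    "emma": "female",
    "alex": "ambiguous",  # unisex names are deliberately left unclassified
}


def classify_gender(display_name):
    """Return 'male', 'female', or None when the name cannot be classified."""
    # Keep only alphabetic runs, so "Emma_92" yields the token "emma".
    tokens = re.findall(r"[a-z]+", display_name.lower())
    if not tokens:
        return None
    # Stringent rule: only the first token counts, and it must be unambiguous.
    label = NAME_GENDER.get(tokens[0])
    return label if label in ("male", "female") else None


def classified_share(display_names):
    """Proportion of display names that receive a gender label."""
    labels = [classify_gender(name) for name in display_names]
    if not labels:
        return 0.0
    return sum(label is not None for label in labels) / len(labels)


if __name__ == "__main__":
    sample = ["Luke Sloan", "Emma_92", "Alex Smith", "coffee fan"]
    for name in sample:
        print(name, "->", classify_gender(name))
    print("classified share:", classified_share(sample))
```

The rule is deliberately conservative: it accepts a lower classification rate in exchange for fewer wrong labels, which is the same trade-off as leaving a large share of users unclassified.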

The bottom line is that it is possible to derive important demographic information from Twitter data if we’re prepared to think creatively. The methods will get better and programmes of work will emerge which allow the confirmation of proxy demographic reliability. We’re only a few metres off the ground on our climb up this new methodological edifice, but seeking out a viable trail enables others to follow and establish safer, more secure routes.
