There is something about buzzwords that attracts us all. They are catchy, most have
some rhythm of their own and, just like salt and pepper, if you use the right amount,
they can spice up your conversations. Of course, there are some of us who like to use
them as a snake oil salesperson detector of sorts, pointing fingers at whoever uses
them.
I believe the truth lies somewhere in between: buzzwords turned buzzy for a reason
and there is always some power behind those reasons. Today I am going to talk about
one Buzzword (yes, with capital B) that you may have heard a lot lately and that
does not get the love it deserves. Well, more than a buzzword, a buzz-term: "Social
Listening".
Social Listening and its promises
Social Listening promises a sneak peek into real people’s lives. People may tell
a white lie or two when presented with a survey, to save face in front of
the surveyor, or may even bend their will and succumb to peer pressure in focus
groups, but what they’ll never do is be someone else when presenting an opinion
on social media. Being behind a keyboard brings a sense of security impossible
to find in other spheres of life, which is why we show our most visceral (some may
argue dark) side there.
It is important to note, however, that "Social Listening" per se does not bring any more
value than peeping into a random stranger’s house without any objective in mind.
Yes, that specific family may have a nice red carpet with a cool lamp, and they may
have dinner at 7pm every day but Fridays, when they always go out for dinner, but so
what?
Social Listening is just a means to get raw Social Media Data. What we do with that
data, however, is where the magic begins.
Harnessing the true power of Social Media Data: Predictive Analytics
Now what? Now we use chaos to predict outcomes, that is what. Hidden within
the depths of Social Media data there are trends impossible to discern with the
naked eye. But those trends, if we manage to identify them, can help us predict
scenarios in several different areas, leading to more informed business
decisions for whoever wants to listen. For the sake of argument, we will cover
three areas where Social Media data can help us make predictions: Finance, Marketing
and Sociopolitics.
Using Social Media data to predict Financial outcomes
Stock Market and Crypto
A powerhouse of data prediction. That is how some members of academia
regard Social Media data, especially when combined with other online data
sources. (Nardo M, Petracco-Giudici M, Naltsidis M, 2016). How cool is that?
Research has been conducted on predicting future credit risk and, with an accuracy
of 79.13%, the results are impressive. Investors can extract valuable information
from models like these. (Yang Y, Gu J, Zhou Z, 2016).
Predicting the stock market is always a hairy topic. However, analysis of Twitter data
has shown that it is indeed possible to achieve fairly accurate predictions. In this
specific case, the authors reported 69.01% accuracy using regression techniques and
71.82% when training a model with LibSVM. (Pagolu, V. S., Reddy, K. N., Panda, G., &
Majhi, B., 2016)
Social Media data can even be used to predict highly volatile assets such as
cryptocurrencies. Steinert and Herff, and Matta et al., have successfully
used 2 million tweets to predict movements in Bitcoin’s price within a few days.
(Matta, M., Lunesu, I., & Marchesi, M., 2015)
With the help of Twitter data, it is possible to predict fluctuations in food
prices. That is the conclusion Kim et al. reached after creating a
predictive model using such data. Their own models and methodologies
helped them achieve high accuracy (more than 80% on average) in their
predictions. (Kim J, Cha M, Lee JG, 2017)
Predicting things like fluctuations in localized food market prices is one thing, but
how does Social Media data fare when used to predict something as complex as the
crude oil price? With a multi-platform approach, that is, using data from Twitter,
Google Trends, Wikipedia and the Global Data on Events, Location and Tone (GDELT)
database, Elshendy et al. successfully predicted crude oil prices, saying that an
approach like this "can lead to forecasts for crude oil prices with a reasonably high
level of accuracy". (Elshendy, M., Colladon, A. F., Battistoni, E., & Gloor, P. A., 2018)
Using a mix of Twitter data (131 million tweets, mapped to 1,347 counties), a
controlled dataset of socioeconomic and demographic features and a dataset of
housing-related data, Zamani and Schwartz found a substantial improvement in
real-time indicators for financial markets, such as predictions of foreclosures and
of price increases. (Zamani M, Schwartz HA, 2017)
In another study using data from Twitter, it was possible to understand the shaping
and dynamics of the city of Pittsburgh in the United States. Using clustering
techniques, the researchers found that Social Media patterns are useful for
examining the dynamics of cities in areas such as architecture, development,
demographics, geographic characteristics, neighborhood and municipality borders,
etc. (Cranshaw, J., Schwartz, R., Hong, J., & Sadeh, N., 2012).
Social Media Predictive Analytics for Marketing
"When properly executed, SMA is in the position to deliver great value for
marketing strategy and the business in general." (Kalmer, N.P., 2015)
A method for predicting future consumer spending from Twitter data was
proposed by Pekar & Binner in 2017. The evaluation of the proposed methodology
(time series analysis models and machine learning regression models: SARIMAX,
Gradient Boosting Regression, AdaBoost Regression) demonstrated statistically
significant improvements in prediction. By using exogenous variables, the
researchers managed to reduce forecast errors by 11% to 18% for a forecast
horizon of three to seven days. (Pekar, V., & Binner, J., 2017).
Trends & Entertainment
Ni et al. utilized around thirty (30) million hashtags from Twitter to predict
subway passenger flow and detect social events. Their approach, called
“Optimisation and Prediction with hybrid Loss function”, combined Linear
Regression with Seasonal Autoregressive Integrated Moving Average (SARIMA) and
achieved a precision of 98.27% and a recall of 87.69% for events such as baseball
games. (Ni M, He Q, Gao J, 2017)
2.4 million tweets were gathered and analyzed to predict Spotify streams for
newly released music albums. The author used Linear Regression together with
Spearman’s rank correlation coefficient (Spearman’s rho) and concluded that the
volume of tweets for each album and artist is positively related to Spotify
streams. (Ruizendaal, R., 2016)
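To make the rank correlation mentioned above more concrete, here is a minimal sketch of how Spearman’s rho can be computed: rank both series, then take the Pearson correlation of the ranks. The album figures are invented purely for illustration and are not taken from the cited thesis; the function names are assumptions of this sketch as well.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(xs, ys):
    """Spearman's rho is the Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical figures for five albums (illustration only).
tweet_volume = [1200, 340, 5600, 980, 15000]
spotify_streams = [80000, 20000, 310000, 95000, 700000]

rho = spearman_rho(tweet_volume, spotify_streams)
print(f"Spearman's rho: {rho:.2f}")
```

A value close to 1 means that albums with more tweets also tend to have more streams, regardless of whether the relationship is linear.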
Hudson et al. used Tweets from France, the UK and the USA and applied Multiple
Regression Analysis with mean-centered brand anthropomorphism and their
interaction on Brand Relationship Quality (BRQ). Their findings demonstrated
that there is a strong relation between Social Media conversations and brand
relationship quality and that "engaging customers via social media is associated
with higher consumer-brand relationships". According to the study, a thorough,
methodical analysis of Social Media data can provide a mechanism for businesses
to plan their pricing, marketing, and promotion strategies. (Hudson S, Huang L,
Roth MS, Madden TJ, 2016)
Ong and Ito’s work evaluated the effectiveness of a Social Media Influencer
marketing campaign that was run by a Singaporean Tourism Organization. They
investigated whether Multimedia Tools and Applications can affect consumer attitudes
and whether those shifts could be predicted using Social Media data. They concluded that
the analysis of the aforementioned data sources can become invaluable for marketers
looking to produce "creative interactive and engaging content". (Ong YX, Ito N, 2019)
Sociopolitical Predictions using Social Media Data
With the use of Social Media data and other sources, such as Google Trends,
Wikipedia, polls and news outlets, McDonald and Mao correctly predicted the
results of the 2015 Scottish and UK elections. Applying text mining techniques in
conjunction with a Vector Autoregressive (VAR) methodology, they forecast the
ranking of the parties with great precision, down to the decimals (e.g. the mean
predicted percentage for the Conservative party in Scotland was 14.73% while the
actual one was 14.90%). (McDonald, R., & Mao, X., 2015).
In another case, a group of researchers used thirteen (13) different features that
were available online, including Tweets, Celebrity Tweets and Celebrity Sentiments,
Twitter Followers, Facebook Page Likes and Wikipedia Traffic to predict the 2016 US
Presidential Elections. The research found correlations between polls and Facebook
page likes, and between polls and Twitter. They concluded that “Machine learning
models with linear regression can produce predictions with meaningful accuracy”.
(Isotalo, V., Saari, P., Paasivaara, M., Steineker, A., & Gloor, P. A., 2016).
Subramani et al. used text mining and real-time analytics on data retrieved
from Twitter, to which they applied automatic classification with logistic
regression models to predict Hay Fever in Australia. According to their
results, predicting Hay Fever outbreaks is plausible, as there is a positive
correlation between Evaporation, Relative Humidity, Average Wind Speed and Hay
Fever tweeting. (Subramani, S., Michalska, S., Wang, H., Whittaker, F., &
Heyward, B., 2018)
Radzikowski et al. presented a quantitative study of the Twitter narrative after a 2015
measles outbreak in the USA. They collected around 670,000 tweets referring to
vaccinations, posted from across the globe over a 40-day period starting on the 1st of
February 2015.
They identified the dominant terms, the communication patterns for retweeting, the
narrative structure of the tweets, the age distribution of those involved and the
geographical patterns of participation in the vaccination debate in social media.
However, the most important result from this research was that there is a strong
connection between the engagement of Twitter users in vaccination debates and
non-medical exemption from school-entry vaccines. More specifically, they provided
evidence that “Vermont and Oregon with the highest rates of exemption from mandatory
child school entry vaccines had notably higher rates of engagement in the
vaccination discourse on Twitter”. (Radzikowski J, Stefanidis A, Jacobsen KH,
Croitoru A, Crooks A, Delamater PL, 2016)
Challenges when using Social Media for Predictive Analytics
So, when digging in such a rich pool of information, it is
only natural that we will hit the jackpot and find all the answers we are looking for,
right? Not so fast.
While it is true that Social Media is rich in insights, extracting them is complex and,
without a rigorous methodology in place, the data can even point us in the wrong
direction.
Social Media is as vast as it is hectic. According to Kantar, at least 7 out of 10
connected adults worldwide use social media at least once a day. In the world’s 25
largest markets, that number goes up to 8 out of 10. Statista, for its part, says
that the global number of monthly social media users is expected to grow to
3.4 billion. That is a lot.
The sheer size of the data sets makes it all the more important to handle them with
care. By its very nature, Social Media Research needs to address very specific
challenges. The most important are noisy data, possible biases, and rapid shifts in
the Social Media landscape that impede generalizability.
Calming the noise
Lots of data means lots of noise as well, so probably the most critical part of the
process is to start with the right questions. What exactly is it that I am trying to
analyze? What is the outcome I am looking to get? This will define the kind of
Booleans (Boolean search queries) we will end up using or, in other words, how coarse or
fine our strainer will be. Writing Booleans is an art in itself and we will discuss it in
future articles, but suffice it to say that well-written Booleans can save us a lot
of time (and headaches). However, writing Booleans is not always a one-size-fits-all
exercise.
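As a rough illustration of what such a Boolean does, the sketch below filters a handful of posts with a query of the form "required terms AND (any of these) AND NOT (excluded terms)". The posts, the query and the helper names are all hypothetical; real listening platforms have their own query syntax, which this sketch does not try to reproduce.

```python
import re


def tokenize(text):
    """Lowercase a post and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9#@']+", text.lower()))


def matches(text, any_of=(), all_of=(), none_of=()):
    """Evaluate (any_of terms OR-ed) AND (all_of terms) AND NOT (none_of terms)."""
    tokens = tokenize(text)
    if any_of and not any(term in tokens for term in any_of):
        return False
    if not all(term in tokens for term in all_of):
        return False
    return not any(term in tokens for term in none_of)


# Hypothetical posts: the brand name is also an animal, a classic source of noise.
posts = [
    "Loving my new Jaguar, best car I have ever owned",
    "Saw a jaguar at the zoo today!",
    "Thinking about leasing a Jaguar SUV next year",
]

# Query: jaguar AND (car OR suv OR leasing) AND NOT zoo
hits = [
    post
    for post in posts
    if matches(post, any_of=("car", "suv", "leasing"), all_of=("jaguar",), none_of=("zoo",))
]

for post in hits:
    print(post)
```

Loosening or tightening the three term lists is what makes the strainer coarser or finer.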
When things go awry, it is time to bring out the big guns. Statistical techniques for
signal detection are a great example of more complex tools to tame the noise. They
work by disregarding human biases and focusing on the signals, automatically
inferring which ones are important and which are not. (E. Kalampokis, E. Tambouris,
and K. Tarabanis, 2013)
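The passage above does not name a specific technique, so the following is only one simple possibility: flagging days whose mention count sits far above the recent average, measured in standard deviations (a trailing z-score). The window size, the threshold and the daily counts are assumptions made up for this sketch.

```python
from statistics import mean, stdev


def detect_spikes(counts, window=7, threshold=3.0):
    """Return indices whose count exceeds the trailing-window mean
    by more than `threshold` standard deviations."""
    spikes = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu, sigma = mean(history), stdev(history)
        # Skip flat histories, where the z-score is undefined.
        if sigma > 0 and (counts[i] - mu) / sigma > threshold:
            spikes.append(i)
    return spikes


# Hypothetical daily mention counts for a topic (illustration only).
daily_mentions = [100, 104, 98, 101, 97, 103, 99, 102, 100, 410, 105, 99]

spikes = detect_spikes(daily_mentions)
print("Spike on day index:", spikes)
```

The point of a rule like this is that no analyst decides by eye which day "looks" unusual; the threshold does it the same way every time.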
Accounting for bias
Humans will be humans. Bias is an inherent part of human nature and, when dealing
with social research, we are bound to find it to some degree. For example, the
Social Media users we focus our analysis on may not be a fully faithful
representation of the general population.
When talking about bias correction in academia, there are many approaches under the
sun. In the case of election prediction, some researchers have made no effort to
correct biases, suggesting instead that biases in the data are due to “thought leaders”
acting over their networks. (A. Tumasjan, T. O. Sprenger, P. G. Sandner, and I. M.
Welpe, 2010). Others have tried to rectify known skews by using weighting schemes. (E.
T. K. Sang and J. Bos, 2012). Still others do not mention bias at
all. (L. Shi, N. Agarwal, A. Agrawal, R. Garg, and J. Spoelstra, 2012)
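To give a feel for what a weighting scheme can look like, here is a minimal sketch of post-stratification: each demographic group’s average is weighted by that group’s share of the target population rather than by its share of the sample. This is a generic textbook approach, not necessarily the scheme used in the studies cited above, and the age groups, shares and sample are hypothetical.

```python
def poststratify(sample, population_share):
    """Weighted mean of `value`, with each group reweighted to its population share.

    sample: list of (group, value) pairs
    population_share: dict mapping group -> share of the target population;
        it must cover every group in the sample, and the shares of the
        groups present should sum to 1.
    """
    totals, counts = {}, {}
    for group, value in sample:
        totals[group] = totals.get(group, 0.0) + value
        counts[group] = counts.get(group, 0) + 1
    # Per-group mean, weighted by that group's population share.
    return sum(population_share[g] * totals[g] / counts[g] for g in counts)


# Hypothetical sample: 1 = post supports candidate A, 0 = it does not.
# Younger users are over-represented (8 of 10 posts).
sample = (
    [("18-34", 1)] * 6 + [("18-34", 0)] * 2
    + [("35+", 1)] * 1 + [("35+", 0)] * 1
)
population_share = {"18-34": 0.3, "35+": 0.7}

raw = sum(value for _, value in sample) / len(sample)
adjusted = poststratify(sample, population_share)

print(f"Raw support: {raw:.3f}")
print(f"Reweighted support: {adjusted:.3f}")
```

In this toy example the raw figure overstates support because the over-represented group happens to be the more favorable one; reweighting pulls the estimate back toward the population mix.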
At the end of the day, it does not matter much which bias correction you end up using,
since the vast amount of Social Media data is great for statistical modelling, which
usually reduces bias to a minimum. (Phillips, L., Dowling, C., Shaffer, K., Hodas,
N., & Volkova, S., 2017)
Finding your way on an ever-shifting landscape
When creating predictive models using data from Social Media, the most worrisome
problem is its rapid evolution. It is only fair to say that, if we invest a
considerable amount of effort in creating a model for data extracted today, we would
like that model to work for data extracted tomorrow as well.
One way researchers have overcome these issues is by using data from different Social
Media platforms. This approach has been very successful in studies where
relationships between user demographics and Social Media behavior might vary from
platform to platform, such as in demographic nowcasting.
Trying to infer a user’s age and gender based on their writing style, for
instance, will work better when using data from diverse sources rather than from
only one. A model trained only on Twitter data may perform well on that platform but
may not work as well on Instagram data, for example. (M. Sap, G. Park, J. C.
Eichstaedt, M. L. Kern, D. Stillwell, M. Kosinski, L. H. Ungar, and H. A. Schwartz,
2014)
A multi-platform approach may improve robustness of the model overall, as data is
usually complementary. For example, if we use LinkedIn and Facebook data, we can get
professional achievements from one and demographic data from the other. (X. Song,
Z.-Y. Ming, L. Nie, Y.-L. Zhao, and T.-S. Chua, 2016)
Settling the debate once and for all
As we can see, Social Media is an immense repository of information that, correctly
used, can bring huge benefits to businesses and private or public organizations
alike. Using it is not, however, something that can be done lightly. With the right set of
tools and the correct methodology, an experienced team of researchers can use Social
Media data to predict various types of outcomes. However, the most important part of
the discussion is whether we are asking the right questions and our approach is the
right one. Scratch that. There is another, even more important question we must ask
ourselves: given the vast amount of evidence of the amazing power that Social Listening
and Social Media Data predictive analytics bring, what are we even waiting for to
begin capitalizing on it?
References
- Rousidis, D., Koukaras, P. & Tjortjis, C. Social media prediction: a literature
review. Multimed Tools Appl 79, 6279–6311 (2020)
- Nardo M, Petracco-Giudici M, Naltsidis M (2016) Walking down Wall Street with a
tablet: A survey of stock market predictions using the web. J Econ Surv
- Yang Y, Gu J, Zhou Z (2016) Credit risk evaluation based on social media.
Environ Res 148:582–585
- Pagolu, V. S., Reddy, K. N., Panda, G., & Majhi, B. (2016). Sentiment analysis
of Twitter data for predicting stock market movements. In Signal Processing,
Communication, Power and Embedded System (SCOPES), 2016 International Conference
on (pp. 1345-1350). IEEE.
- Matta, M., Lunesu, I., & Marchesi, M. (2015). Bitcoin Spread Prediction Using
Social and Web Search Media. In UMAP Workshops (pp. 1-10).
- Kim J, Cha M, Lee JG (2017) Nowcasting commodity prices using social media.
PeerJ Comput Sci 3:e126
- Elshendy, M., Colladon, A. F., Battistoni, E., & Gloor, P. A. (2018). Using four
different online media sources to forecast the crude oil price. Journal
of Information Science 44(3):408–421.
- Zamani M, Schwartz HA (2017) Using Twitter Language to Predict the Real Estate
Market. EACL 2017: 28
- Cranshaw, J., Schwartz, R., Hong, J., & Sadeh, N. (2012). The livehoods project:
Utilizing social media to understand the dynamics of a city.
- Phillips, L., Dowling, C., Shaffer, K., Hodas, N., & Volkova, S. (2017). Using
social media to predict the future: a systematic literature review. arXiv
- Kalmer, N.P. (2015) The predictive power of Social Media Analytics: To what
extent can SM Analytics techniques be classified as reliable and valid
- Pekar, V., & Binner, J. (2017). Forecasting consumer spending from purchase
intentions expressed on social media. In Proceedings of the 8th Workshop on
Computational Approaches to Subjectivity, Sentiment and Social Media Analysis
- Ni M, He Q, Gao J (2017) Forecasting the subway passenger flow under event
occurrences with social media. IEEE Trans Intell Transp Syst 18(6):1623–1632
- Ruizendaal, R. (2016). The predictive power of social media: using Twitter to
predict Spotify streams for newly released music albums (Master's thesis,
University of Twente).
- Hudson S, Huang L, Roth MS, Madden TJ (2016) The influence of social media
interactions on consumer–brand relationships: A three-country study of brand
perceptions and marketing behaviors. Int J Res Mark 33(1):27–41
- Ong YX, Ito N (2019) “I want to go there too!” Evaluating social media
influencer marketing effectiveness: a case study of Hokkaido’s DMO. In:
Information and communication technologies in tourism 2019. Springer, Cham, pp
- McDonald, R., & Mao, X. (2015). Forecasting the 2015 general election with
internet big data: An application of the TRUST framework (No. 2016_03),
Business School - Economics, University of Glasgow
- Isotalo, V., Saari, P., Paasivaara, M., Steineker, A., & Gloor, P. A. (2016).
Predicting 2016 US Presidential Election Polls with Online and Media Variables.
In: Zylka M., Fuehres H., Fronzetti Colladon A., Gloor P. (eds) Designing
Networks for Innovation and Improvisation. Springer Proceedings in Complexity.
Springer, Cham. https://doi.org/10.1007/978-3-319-42697-6_5
- Subramani, S., Michalska, S., Wang, H., Whittaker, F., & Heyward, B. (2018,
October). Text mining and real-time analytics of Twitter data: a case study of
Australian hay fever prediction. In International Conference on Health
Information Science (pp. 134-145). Springer, Cham.
- Radzikowski J, Stefanidis A, Jacobsen KH, Croitoru A, Crooks A, Delamater PL
(2016) The measles vaccination narrative in Twitter: a quantitative analysis.
JMIR Public Health Surveill 2(1)
- E. Kalampokis, E. Tambouris, and K. Tarabanis. Understanding the predictive
power of social media. Internet Research, 23(5):544–559, 2013.
- M. Sap, G. Park, J. C. Eichstaedt, M. L. Kern, D. Stillwell, M. Kosinski, L. H.
Ungar, and H. A. Schwartz. Developing age and gender predictive lexica over
social media. In Proceedings of the 2014 Conference on Empirical Methods in
Natural Language Processing, pages 1146–1151. Association for Computational
- A. Tumasjan, T. O. Sprenger, P. G. Sandner, and I. M. Welpe. Predicting elections
with twitter: What 140 characters reveal about political sentiment. ICWSM,
- L. Shi, N. Agarwal, A. Agrawal, R. Garg, and J. Spoelstra. Predicting us primary
elections with twitter. URL: http://snap.stanford.edu/social2012/papers/shi.pdf,
- Kantar, Global social media trends report, 2020.