Are the Tweets paved with gold?
Paper Review on Correlating Financial Time Series with Micro-Blogging Activity
People have always been highly interested in predicting the financial market.
Several studies on the correlation between social media and the financial
market have been done. For instance, De Choudhury et al. [2] show that
discussions in blog posts correlate with the direction of the stock market, and
Yi [3] uses Twitter data to approximate the daily closing value of a stock,
improving on a simple moving average by reducing its error. There are more
studies not listed here; however, none of the previous works considered Twitter
graph features before this one.
In the paper Correlating Financial Time Series with Micro-Blogging Activity [1],
Eduardo Ruiz and four co-authors report their findings on correlations between
Twitter features and the stock market. The researchers study how activity on
Twitter correlates with time series of stock price and traded volume. First,
they carefully filter the tweets relevant to each company (including scanning,
searching, and spam filtering). Next, they represent the tweets posted during a
time interval as a graph. The features extracted from these graphs are divided
into two groups: activity-based features, which measure quantities, and
graph-based features, which capture the link structure of the graph. The
financial data, daily price change and daily traded volume, are downloaded from
Yahoo! Finance. They then use the cross-correlation coefficient (CCF) to
estimate how the two groups of Twitter features relate to the financial time
series at different time lags. They find that the correlation between Twitter
features and the traded volume of a stock is relatively strong, while the
correlation with stock price is weak. They also find that graph-based features
such as PageRank and degree are useful signals for the financial time series.
Despite the unsatisfactory results on stock price, they also develop a trading
strategy based on Twitter features, run simulations, and find that it performs
well when compared against several baselines.
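To make the lag idea concrete, here is a minimal sketch of a cross-correlation at a given lag. It is not the authors' code: the function names `pearson` and `ccf` and the toy series are made up for illustration, and the coefficient is computed as a Pearson correlation over the overlapping window, which is one common variant. A positive lag means the Twitter feature leads the financial series.

```python
import math

def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def ccf(x, y, lag):
    """Correlation between x[t] and y[t + lag]; lag > 0 means x leads y."""
    if lag >= 0:
        a, b = x[:len(x) - lag], y[lag:]
    else:
        a, b = x[-lag:], y[:len(y) + lag]
    return pearson(a, b)

# Toy data (hypothetical): daily tweet counts, and a volume series built
# as twice the previous day's tweet count, so tweets lead volume by one day.
tweets = [10, 12, 30, 15, 11, 40, 13, 12]
volume = [20, 20, 24, 60, 30, 22, 80, 26]

for lag in range(-2, 3):
    print(f"lag {lag:+d}: {ccf(tweets, volume, lag):.3f}")
```

The coefficient at lag +1 is 1.000 because of how the toy volume series was constructed. Textbook estimators (for example R's `ccf`) use the full-series means instead of re-centring each window, so their values differ slightly at nonzero lags.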
Regarding these results, Mail Online
exclaimed that "The Tweets ARE paved with gold" [4]. Is that so? Here is my
cautious opinion: the tweets may be paved with gold, but far more work is needed
to dig it out. Although the results are not bad, there are still some major
limitations, and the approach is not yet practical.
First, the correlation between the extracted tweet features and stock price is
not strong. Why is the correlation with price weaker than with traded volume? I
believe the most important reason is that price change is a two-directional
index, while volume is one-directional; in other words, price change has a sign
and volume does not. For instance, when there is important news about a stock,
whether good or bad, trading activity generally rises above its usual level and
volume increases. The price change, however, depends on whether the news is
good or bad: good news tends to push the price up, while bad news makes a drop
more likely. The method in the paper cannot judge whether the content of the
tweets is good or bad for the company, and I believe that is why the
correlation between price and Twitter features is weak.
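The directional argument can be illustrated with a toy simulation. Everything here is an assumed model, not data or code from the paper: each day's news has an importance that drives both tweet activity and traded volume, and a random good/bad sign that affects only the price change. The function `simulate` and its parameters are hypothetical.

```python
import math
import random

def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def simulate(days=10000, noise=0.2, seed=42):
    """Toy model: news importance drives tweets and volume; news sign drives price."""
    rng = random.Random(seed)
    activity, volume, price_change = [], [], []
    for _ in range(days):
        importance = abs(rng.gauss(0, 1))   # how big the day's news is
        sign = rng.choice([1, -1])          # good (+1) or bad (-1) news
        activity.append(importance + rng.gauss(0, noise))
        volume.append(importance + rng.gauss(0, noise))
        price_change.append(sign * importance + rng.gauss(0, noise))
    return activity, volume, price_change

activity, volume, price_change = simulate()
print("activity vs volume:        ", round(pearson(activity, volume), 3))
print("activity vs price change:  ", round(pearson(activity, price_change), 3))
print("activity vs |price change|:", round(pearson(activity, [abs(p) for p in price_change]), 3))
```

In this toy model the first and third correlations come out high while the second is close to zero: tweet activity carries information about how much the price moves, but not in which direction.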
As for the simulation, the baseline algorithms are rather simple. The most
complicated among them, the auto-regressive model, is a basic linear model.
Moreover, the trading process in the simulation is much simpler than in a real
market. For example, it does not consider the possibility of selling stocks. In
this regard, the strategy is unlikely to be applicable at the current stage; it
is not practically useful for now.
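For a sense of how simple such a baseline can be, here is a generic AR(1) sketch that fits x[t] ≈ c + phi * x[t-1] by ordinary least squares and forecasts one step ahead. It is an illustration only: the paper's baseline may use a different order or estimation method, and `fit_ar1`, `forecast_next` and the simulated series are hypothetical.

```python
import random

def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1] + noise; returns (c, phi)."""
    x_prev, x_next = series[:-1], series[1:]
    n = len(x_prev)
    mp, mn = sum(x_prev) / n, sum(x_next) / n
    cov = sum((a - mp) * (b - mn) for a, b in zip(x_prev, x_next))
    var = sum((a - mp) ** 2 for a in x_prev)
    phi = cov / var
    c = mn - phi * mp
    return c, phi

def forecast_next(series, c, phi):
    """One-step-ahead forecast from the last observed value."""
    return c + phi * series[-1]

# Simulated AR(1) data with c = 0.5 and phi = 0.7 (made-up parameters).
rng = random.Random(0)
series = [0.0]
for _ in range(5000):
    series.append(0.5 + 0.7 * series[-1] + rng.gauss(0, 1))

c, phi = fit_ar1(series)
print(f"c = {c:.3f}, phi = {phi:.3f}")
print("next-value forecast:", forecast_next(series, c, phi))
```

With 5000 simulated points the estimated phi lands close to the 0.7 used to generate the data; a Twitter-based strategy has to beat forecasts of this kind to be interesting.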
However, the work is not meaningless; on the contrary, the study provides vital
information. The paper shows that Twitter features have the potential to
improve over baseline strategies, and I believe they could be practically
useful someday. To overcome the limitations mentioned above, we could improve
the method of correlating Twitter with stock price by adding a direction, i.e.
predicting whether the price change will be positive or negative.
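Framing the task as direction prediction also changes how it would be evaluated: instead of a correlation, one could report directional accuracy, the fraction of days on which the predicted sign of the price change matches the actual sign. The helper below is a hypothetical sketch, not something from the paper; skipping days with no price change is an arbitrary choice made here.

```python
def directional_accuracy(predicted_signs, price_changes):
    """Fraction of days where the predicted direction (+1 / -1) matches the actual one.

    Days with zero price change are skipped (an arbitrary choice for this sketch).
    """
    hits = total = 0
    for pred, change in zip(predicted_signs, price_changes):
        if change == 0:
            continue
        actual = 1 if change > 0 else -1
        hits += (pred == actual)
        total += 1
    return hits / total if total else float("nan")

# Toy example: five days of predicted directions and actual price changes (made-up numbers).
preds = [1, -1, 1, 1, -1]
changes = [0.8, -0.3, -0.5, 1.2, 0.0]
print(directional_accuracy(preds, changes))  # 3 hits out of 4 non-flat days -> 0.75
```

A natural point of comparison would be a trivial predictor, such as always predicting "up", evaluated on the same days.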
One possibility is to add an algorithm that evaluates the content of tweets.
This approach looks natural but is not simple, since an algorithm can hardly
tell accurately whether a tweet is good or bad news. Nevertheless, it is worth
trying to check whether it works. Furthermore, more sophisticated time series
models could be added as baselines.
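As a first, very crude attempt at evaluating tweet content, a lexicon-based scorer could label each tweet as positive, negative or neutral by counting words from two lists; summing these labels over a day gives a signed activity count that, unlike a plain tweet count, can be negative on a bad-news day. The word lists and function names below are tiny hypothetical examples. A real attempt would need a proper financial sentiment lexicon or a trained classifier, and this sketch ignores negation and sarcasm, which is exactly the accuracy problem noted above.

```python
import re

# Hypothetical toy lexicons; real financial sentiment lexicons are much larger.
POSITIVE = {"beat", "beats", "growth", "upgrade", "profit", "strong", "buy"}
NEGATIVE = {"miss", "misses", "loss", "downgrade", "lawsuit", "weak", "sell"}

def tweet_polarity(text):
    """Return +1, -1 or 0 depending on which lexicon has more hits in the tweet."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return (score > 0) - (score < 0)

def signed_activity(tweets):
    """Net count for a day: (#positive tweets) - (#negative tweets)."""
    return sum(tweet_polarity(t) for t in tweets)

day = [
    "$AAPL beats estimates, strong iPhone growth",
    "Analyst downgrade for $AAPL, sell now?",
    "Record profit for $AAPL this quarter",
]
print([tweet_polarity(t) for t in day])  # [1, -1, 1]
print(signed_activity(day))              # 1
```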
To conclude, this paper provides an original study of the correlation between
Twitter graph-based features and financial time series, and finds that Twitter
features do have the potential to improve over baseline strategies. Although
there are a few limitations and the algorithm is not yet practical, this work
provides new directions and vital information. At the very least, it offers some
evidence that "the Tweets are paved with gold".
References
[1] E. Ruiz, V. Hristidis, C. Castillo, A. Gionis, and A. Jaimes. Correlating
financial time series with micro-blogging activity. In Proceedings of WSDM,
2012.
[2] M. De Choudhury, H. Sundaram, A. John, and D. D. Seligmann. Can blog
communication dynamics be correlated with stock market activity? In Proceedings
of the 20th ACM Conference on Hypertext and Hypermedia, 2008.
[3] A. Yi. Stock market prediction based on public attentions: a social web
mining approach. Master's thesis, University of Edinburgh, 2009.
[4] R. Waugh. The Tweets ARE paved with gold: Twitter 'predicts' stock prices
more accurately than any investment tactic, say scientists. Mail Online, 26 Mar.
2012. http://www.dailymail.co.uk/sciencetech/article-2120416/Twitter-predicts-stock-prices-accurately-investment-tactic-say-scientists.html