Sunday, May 5, 2013

Are the Tweets paved with gold?

(This review was written on 31 Mar 2013 as homework for the course "Social Media Analysis". I reread it a month later, after carrying out the work described in the paper myself, and found mistakes in the price evaluation. I still think it is a good review, so I have decided to post it here. I hope to find time to fix the mistakes in the near future.)


Paper Review on Correlating Financial Time Series with Micro-Blogging Activity


People have always been keenly interested in predicting financial markets. Several studies have examined the correlation between social media and the financial market. For instance, De Choudhury et al. [2] show that discussions in blog posts are correlated with the direction of the stock market, and Yi [3] uses data from Twitter to approximate the daily closing value of a stock, reducing the error of a simple moving average. There are more studies not listed here; however, none of the previous works considers Twitter graph features until this one.
In the paper Correlating Financial Time Series with Micro-Blogging Activity [1], Eduardo Ruiz and four co-authors report their findings on the correlation between Twitter features and stock markets. They study how activity on Twitter is correlated with time series of stock price and traded volume. First, they carefully filter the tweets relevant to a company (scanning, searching, and removing spam). Next, they represent the tweets posted during a time interval as a graph. The numerous features extracted from these graphs fall into two groups: activity-based features, which measure quantities, and graph-based features, which capture the link structure of the graph. The financial data, downloaded from Yahoo Finance, consists of daily price change and daily traded volume. They then use the cross-correlation coefficient (CCF) to estimate how the two groups of Twitter features are related to the financial time series at different time lags. They find that the correlation between Twitter features and the traded volume of a stock is relatively strong, while the correlation with stock price is weak. They also find that graph-based features such as PageRank and degree are effective for financial indexes. Despite the unsatisfactory results on stock price, they develop a trading strategy based on Twitter features, run simulations, and find that it is successful when compared against several baselines.
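To make the lagged-correlation idea concrete, here is a minimal sketch in Python. It is not the authors' code: the function name cross_correlation, the synthetic series, and the one-day delay are all my own assumptions for illustration. It also computes a plain Pearson correlation on the overlapping part of the two series at each lag, which is a simplification of the textbook CCF estimator.

```python
import numpy as np

def cross_correlation(x, y, max_lag):
    """Pearson correlation between x[t] and y[t + lag] for each lag.

    A positive lag means x leads y (today's x is paired with a later y).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[:n - lag], y[lag:]
        else:
            a, b = x[-lag:], y[:n + lag]
        result[lag] = float(np.corrcoef(a, b)[0, 1])
    return result

# Synthetic data (an assumption, not real market data):
# volume follows the Twitter feature with a one-day delay plus noise.
rng = np.random.default_rng(0)
tweets = rng.normal(size=200)
volume = np.roll(tweets, 1) + 0.1 * rng.normal(size=200)

ccf = cross_correlation(tweets, volume, max_lag=3)
best_lag = max(ccf, key=ccf.get)
print(best_lag, ccf[best_lag])
```

With real data, tweets would be a daily Twitter feature and volume the daily traded volume of the same stock; a peak at a positive lag would suggest that the Twitter feature moves first.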
Regarding these results, Mail Online exclaimed that “The Tweets ARE paved with gold” [4]. Is that so? Here is my cautious opinion: the tweets may be paved with gold, but far more work is needed to dig it out. Although the results are not bad, some major limitations remain and the approach is not yet practical.
First, the correlation between the extracted Twitter features and stock price is not strong. Why is the correlation with price not as high as with traded volume? I believe the most important reason is that price change is directional while volume is not. For instance, when important news about a stock arrives one day, whether good or bad, investment activity is generally higher than usual and volume increases. Price change, however, depends on whether the news is good or bad: good news tends to push the price up, while bad news tends to push it down. The paper's features cannot judge whether the content is good or bad for the company, and I believe that is why the correlation between price and Twitter features is weak.
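The argument above can be illustrated with a small simulation. This is only a toy model under my own assumptions (random news intensity, a random good/bad sign, Gaussian noise); none of the numbers come from the paper. Tweet activity and volume both grow with news intensity, while the price change also depends on the sign of the news.

```python
import numpy as np

rng = np.random.default_rng(42)
days = 5000

# Hypothetical daily news: how big it is, and whether it is good (+1) or bad (-1).
intensity = rng.exponential(scale=1.0, size=days)
direction = rng.choice([-1.0, 1.0], size=days)

# Activity and volume react to the size of the news only;
# the price change reacts to size and sign.
tweet_count = intensity + 0.5 * rng.normal(size=days)
volume = intensity + 0.5 * rng.normal(size=days)
price_change = direction * intensity + 0.5 * rng.normal(size=days)

def corr(a, b):
    """Pearson correlation coefficient of two equal-length series."""
    return float(np.corrcoef(a, b)[0, 1])

corr_volume = corr(tweet_count, volume)
corr_price = corr(tweet_count, price_change)
corr_abs_price = corr(tweet_count, np.abs(price_change))

print("tweets vs volume:        ", corr_volume)
print("tweets vs price change:  ", corr_price)
print("tweets vs |price change|:", corr_abs_price)
```

In this toy model the undirected activity count correlates well with volume and with the size of the price move, but hardly at all with the signed price change, which is the pattern I suspect lies behind the weak price results.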
As for the simulation, the baseline algorithms are rather simple. The most complicated of them, the auto-regressive model, is a basic linear model. In addition, the trading process in the simulation is much simpler than in a real market. For example, it does not consider the possibility of selling the stocks. For these reasons, the strategy is unlikely to be applicable at the current stage; it is not practically useful for now.
However, the study is not meaningless; on the contrary, it provides vital information. The paper shows that Twitter features have the potential to improve over other baseline strategies, and I believe they could be practically useful someday. To overcome the limitations mentioned above, we could improve the method for correlating Twitter with stock price by adding a direction, i.e. predicting whether the price change will be positive or negative.
One possibility is to add an algorithm that evaluates the content of tweets. This looks natural but is not simple, since such an algorithm can hardly tell accurately whether a tweet is good or bad news. Nevertheless, it is worth trying to check whether it works. Furthermore, more sophisticated time series models could be added as baselines.
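As a very rough sketch of what such a content evaluation might look like, the Python snippet below scores each tweet with a tiny hand-made word list and sums the signs into a directional feature. The word lists and the function names tweet_polarity and signed_activity are hypothetical and chosen only for illustration; a realistic sentiment classifier would need to be far more careful, which is exactly the difficulty noted above.

```python
# Hypothetical word lists; a real system would need a proper sentiment model.
POSITIVE = {"beat", "gain", "up", "strong", "profit"}
NEGATIVE = {"miss", "loss", "down", "weak", "recall"}

def tweet_polarity(text):
    """Return +1, -1 or 0 depending on which word list dominates the tweet."""
    words = [w.strip(".,!?:;") for w in text.lower().split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return (score > 0) - (score < 0)

def signed_activity(tweets):
    """Directional feature: positive tweets minus negative tweets for one day."""
    return sum(tweet_polarity(t) for t in tweets)

day = ["Strong earnings, profit up!", "Product recall announced", "Meeting at noon"]
print(signed_activity(day))
```

A daily series of signed_activity values could then be correlated with the daily price change in the same way as the undirected features.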
To conclude, this paper provides an original study of the correlation between Twitter graph-based features and financial time series, and finds that Twitter features do have the potential to improve over other baseline strategies. Although there are a few limitations and the algorithm is not yet practical, this work provides new directions and vital information. At the very least, it offers some evidence that the “Tweets are paved with gold”.


References
[1] E. Ruiz, V. Hristidis, C. Castillo, A. Gionis, and A. Jaimes. Correlating Financial Time Series with Micro-Blogging Activity. WSDM 2012.
[2] M. De Choudhury, H. Sundaram, A. John, and D. D. Seligmann. Can blog communication dynamics be correlated with stock market activity? In Proceedings of the 20th ACM conference on Hypertext and Hypermedia, 2008.
[3] A. Yi. Stock market prediction based on public attentions: a social web mining approach. Master's thesis, University of Edinburgh, 2009.
[4] R. Waugh. The Tweets ARE paved with gold: Twitter 'predicts' stock prices more accurately than any investment tactic, say scientists. Mail Online, 26 Mar. 2012. http://www.dailymail.co.uk/sciencetech/article-2120416/Twitter-predicts-stock-prices-accurately-investment-tactic-say-scientists.html
