Sunday, May 5, 2013

Are the Tweets paved with gold?

(This review was written on 31 Mar 2013 as homework for the course "Social Media Analysis". When I reread it a month after finishing it, I found mistakes in the price evaluation. Since I still think it is a good review, I decided to post it here. I hope to find time to fix them in the near future.)



Paper Review on Correlating Financial Time Series with Micro-Blogging Activity


People have always been highly interested in predicting the financial market. Several studies have examined the correlation between social media and the financial market. For instance, De Choudhury et al. [2] show that discussions in blog posts correlate with the direction of the stock market, and Yi [3] uses Twitter data to approximate the daily closing value of a stock, improving on a simple moving average by reducing its error. There are more studies not listed here; however, none of the previous works considered Twitter graph features before this one.
In the paper Correlating Financial Time Series with Micro-Blogging Activity [1], Eduardo Ruiz and four co-authors present their findings on the correlation between Twitter features and the stock market. The researchers study how activity on Twitter correlates with time series of stock price and traded volume. First, they carefully filter the tweets relevant to a company (including scanning, searching, and anti-spam filtering). Next, they represent the tweets in each time interval as a graph. The features extracted from these graphs are divided into two groups: activity-based features, which measure quantities, and graph-based features, which capture the link structure of the graph. The financial data, downloaded from Yahoo! Finance, include the daily price change and the daily traded volume. They then use the cross-correlation coefficient (CCF) to estimate how the two groups of Twitter features relate to the financial time series at different time lags. They find that the correlation between Twitter features and the traded volume of a stock is relatively strong, while the correlation with stock price is weak. They also find that graph-based features such as PageRank and degree are effective for financial indexes. Despite the unsatisfactory results on stock price, they also develop a trading strategy based on Twitter features, run simulations, and find that it performs well compared with several baselines.
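To make "correlation at different time lags" concrete, here is a minimal Python sketch. It computes a plain Pearson correlation between a Twitter feature x and a financial series y shifted by lag k, i.e. corr(x[t], y[t + k]); a positive lag means Twitter leads the market. The function names and the toy numbers are my own illustrations, and the paper's exact normalization may differ.

```python
from statistics import mean, pstdev


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    sa, sb = pstdev(a), pstdev(b)
    if sa == 0 or sb == 0:
        return 0.0  # correlation is undefined for a constant series; report 0 here
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)
    return cov / (sa * sb)


def ccf(x, y, lag):
    """Correlate x[t] with y[t + lag]; a positive lag means x leads y."""
    n = len(x)
    if lag >= 0:
        return pearson(x[: n - lag], y[lag:])
    return pearson(x[-lag:], y[: n + lag])


def ccf_table(x, y, max_lag):
    """CCF values for every lag in [-max_lag, max_lag]."""
    return {k: ccf(x, y, k) for k in range(-max_lag, max_lag + 1)}


# Made-up toy data: today's volume echoes tweet activity from one day earlier.
tweets = [1, 2, 5, 1, 3, 4, 2, 6]
volume = [0] + [100 * t for t in tweets[:-1]]

table = ccf_table(tweets, volume, 3)
best_lag = max(table, key=table.get)
print(best_lag, round(table[best_lag], 3))  # -> 1 1.0
```

Scanning the whole table of lags, rather than a single lag, is what lets such an analysis say whether Twitter activity leads, lags, or moves together with the market.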
Regarding these results, Mail Online exclaimed that “The Tweets are paved with gold” [4]. Is that so? Here is my cautious opinion: the tweets may be paved with gold, but much more work is needed to dig it out. Although the results are not bad, major limitations remain and the approach is not yet practical.
First, the correlation between the extracted tweet features and stock price is not strong. Why is the correlation with price lower than with traded volume? I believe the most important reason is that price change is directional (it can go up or down), while volume is not. For instance, when there is important news about a stock, whether good or bad, trading activity generally rises above normal and volume increases. The price change, however, depends on whether the news is good or bad: good news tends to push the price up, while bad news tends to push it down. Since the paper's method cannot judge whether the content is good or bad for the company, I believe this is why the correlation between price and Twitter features is weak.
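A toy Python sketch of this argument, under my own simplifying assumption (not the paper's model) that tweet counts track how much news there is, while the good/bad sign of the news is invisible to the count. With equal amounts of good and bad news, the sign cancels out of the correlation with price change but not with volume. All numbers are made up.

```python
from statistics import mean, pstdev


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)
    return cov / (pstdev(a) * pstdev(b))


# Assumed toy model: one value per day.
news_intensity = [1, 1, 3, 3, 5, 5]   # how much news there is
news_sign = [+1, -1, +1, -1, +1, -1]  # good (+1) or bad (-1) news

tweets = news_intensity                          # tweets rise with any news
volume = [100 + 50 * n for n in news_intensity]  # trading rises with any news
price_change = [s * n for s, n in zip(news_sign, news_intensity)]  # direction depends on sign

print(round(pearson(tweets, volume), 3))                          # -> 1.0
print(round(pearson(tweets, price_change), 3))                    # -> 0.0
print(round(pearson(tweets, [abs(p) for p in price_change]), 3))  # -> 1.0
```

The last line shows that the tweets still "know" the size of the price move; only its direction is missing.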
As for the simulation, the baseline algorithms are rather simple. The most complicated among them, the auto-regressive model, is a basic linear model. Moreover, the trading process in the simulation is much simpler than in a real market; for example, it does not consider the possibility of selling the stocks. For these reasons, the strategy is unlikely to be applicable at the current stage; it is not practically useful for now.
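For reference, the textbook autoregressive model of order p is shown below, where y_t is the series being forecast (for example, daily traded volume), c is a constant, the φ_i are fitted coefficients and ε_t is noise. The order p and exact target used in the paper are not given here, so this is only the generic form; every term is linear in past values of y, which is why I call it a basic linear model.

$$y_t = c + \sum_{i=1}^{p} \phi_i \, y_{t-i} + \varepsilon_t$$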
However, the study is not meaningless; on the contrary, it provides vital information. The paper shows that Twitter features have the potential to improve over baseline strategies, and I believe this could become practically useful someday. To overcome the limitations mentioned above, we could improve the correlation with stock price by adding a direction, i.e., predicting whether the price change will be positive or negative.
One possibility is to add an algorithm that evaluates the content of tweets. This looks natural but is not simple, since an algorithm can hardly tell accurately whether a tweet carries good or bad news. Nevertheless, it is worth trying to see whether it works. Furthermore, more sophisticated time series models could be added as baselines.
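A deliberately naive Python sketch of such a content-evaluation step: a lexicon-based polarity score. The word lists and function names are hypothetical examples chosen for illustration, not a validated financial lexicon and not a method from the paper; the sketch only shows how a direction signal could be attached to each day's tweets.

```python
import re

# Hypothetical word lists for illustration only; a real system would need a
# curated financial lexicon or a trained classifier.
POSITIVE = {"beat", "growth", "record", "upgrade", "profit", "strong"}
NEGATIVE = {"miss", "lawsuit", "recall", "downgrade", "loss", "weak"}


def tweet_polarity(text):
    """Return +1, -1 or 0 from the net count of lexicon words in one tweet."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return (score > 0) - (score < 0)


def daily_direction(tweets):
    """Sum of tweet polarities for one day: > 0 suggests 'up', < 0 suggests 'down'."""
    return sum(tweet_polarity(t) for t in tweets)


day = ["Record profit this quarter", "Big lawsuit filed", "Strong growth ahead"]
print(daily_direction(day))  # -> 1
```

Word counting ignores negation ("not strong"), sarcasm and context, which is exactly why telling good news from bad accurately is hard.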
To conclude, this paper provides an original study of the correlation between Twitter graph-based features and financial time series, and finds that Twitter features do have the potential to improve over baseline strategies. Although there are a few limitations and the algorithm is not yet practical, this work provides new directions and valuable information. At the very least, it offers some evidence that “the Tweets are paved with gold”.


Reference
[1] Eduardo Ruiz, Vagelis Hristidis, Carlos Castillo, Aristides Gionis and Alejandro Jaimes. Correlating Financial Time Series with Micro-Blogging Activity. WSDM 2012.
[2] M. De Choudhury, H. Sundaram, A. John, and D. D. Seligmann. Can blog communication dynamics be correlated with stock market activity? In Proceedings of the 20th ACM conference on Hypertext and Hypermedia, 2008.
[3] A. Yi. Stock market prediction based on public attentions: a social web mining approach. Master's thesis, University of Edinburgh, 2009.
[4] Rob Waugh. The Tweets ARE paved with gold: Twitter 'predicts' stock prices more accurately than any investment tactic, say scientists. Mail Online, 26 Mar. 2012.
http://www.dailymail.co.uk/sciencetech/article-2120416/Twitter-predicts-stock-prices-accurately-investment-tactic-say-scientists.html

Tuesday, November 27, 2012

SNA in the Age of Big Data



The Age of Big Data

“It’s a revolution. We’re really just getting under way. But the march of quantification, made possible by enormous new sources of data, will sweep through academia, business and government. There is no area that is going to be untouched.”
-- says Gary King, director of Harvard’s Institute for Quantitative Social Science. [1]

"They are our nuclear codes."
-- Ben LaBolt, the Obama campaign spokesman, responding to questions about the data analysis that helped Obama win. [2]

Welcome to the Age of Big Data. Data flows everywhere today, and it keeps growing rapidly. IDC, a technology research firm, estimates that data is growing at 50 percent a year, or more than doubling every two years. It is not just more streams of data, but entirely new ones. For example, there are now countless digital sensors worldwide in industrial equipment, automobiles, electrical meters and shipping crates. They can measure and communicate location, movement, vibration, temperature, humidity, even chemical changes in the air. [1]
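As a quick check of the IDC arithmetic, 50 percent annual growth compounds to more than a doubling over two years, and the implied doubling time is under two years:

$$(1.5)^2 = 2.25 > 2, \qquad t_{\text{double}} = \frac{\ln 2}{\ln 1.5} \approx 1.71 \text{ years}$$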

Data analysis has been widely applied in industry, business, science, sports, advertising and public health, almost every area we can imagine. Data-driven discovery and decision-making are playing an increasingly important role in our lives. Recently, an article in Time [2] briefly described how the Obama campaign used collected, stored and analyzed data to beat Romney.

Social Network Analysis

Although various approaches to data analysis have been introduced and applied, most of them assume that what people do, think, and feel is independent of whom they know. This obviously biases the results when the analysis concerns human behavior. Here, Social Network Analysis (SNA), which is based on the assumption that people are interdependent, can show its great power.

SNA has gained significant attention in recent years, largely due to the success of social networking and media sites and the resulting availability of huge amounts of social network data. Since the concepts and major algorithms of SNA have been introduced in our lecture notes, I will not repeat them here.

Here I'd like to borrow an example from Inside Social Network Analysis:

SNA applies to a wide range of business problems, including: [3]
  • Knowledge Management and Collaboration.  SNAs can help locate expertise, seed new communities of practice, develop cross-functional knowledge-sharing, and improve strategic decision-making across leadership teams. 
  • Team-building.  SNAs can contribute to the creation of innovative teams and facilitate post-merger integration.  For example, SNAs can reveal which individuals are most likely to be exposed to new ideas. 
  • Human Resources.  SNAs can identify and monitor the effects of workforce diversity, on-boarding and retention, and leadership development.  For instance, an SNA can reveal whether or not mentors are creating relationships between mentees and other employees.   
  • Sales and Marketing.  SNAs can help track the adoption of new products, technologies, and ideas.  They can also suggest communication strategies. 
  • Strategy.  SNAs can support industry ecosystem analysis as well as partnerships and alliances.  They can pinpoint which firms are linked to critical industry players and which are not. 

Another paper, Social Network Analysis and Mining for Business Applications, also provides details on business applications of SNA, including further applications and challenges. [4]

Since Facebook's IPO, investment in social networking services (SNS) seems to have cooled down. An obvious factor is that the revenue models of most SNS are unclear. However, I believe that with more studies and applications of social network mining and analysis, the situation will improve considerably. (One typical case is Sina Weibo.)



Reference
1. Steve Lohr. The Age of Big Data. The New York Times, February 11, 2012.
2. Inside the Secret World of the Data Crunchers Who Helped Obama Win. Time, Nov. 7, 2012.
3. Kate Ehrlich and Inga Carboni. Inside Social Network Analysis.
4. F. Bonchi, C. Castillo, A. Gionis, and A. Jaimes. Social Network Analysis and Mining for Business Applications, 2011.

Picture Source: Greenbookblog

Revised on 28 Nov 2012

Wednesday, November 7, 2012

Social Circles Online and Offline


In lecture 7, Prof. Chan introduced some fundamental concepts of social network analysis, such as dyads and triads. As we know, digital networks are a mapping and extension of the social relationships in our real lives.

We define a group sharing the same experience (say, the same hobby or situation) as a social circle. Social circles in real life can be divided into the four types below: [1]

Dyad
Type 1: One to one, or point to point. This circle contains only two people, the same as the dyad mentioned in the lecture. The point-to-point relationship is an ultra-steady state. It is our core social state and has the highest frequency of interaction.

Type 2: Point to specific points, which is a steady state. Each member communicates with all the other members of the circle. The circle can last long, but it is less important and less frequent than one-to-one.

Type 3: Communication among members of a big circle/community, which can be regarded as point to unspecific points, an infra-steady state. It may turn into ultra-steady or steady circles, but social activity in this circle tends to decline over time. For instance, colleagues in the same company/department, cyber-pals in a game or BBS, and classmates in a certain program/course (e.g., our IE QQ group) can be regarded as big circles, in which any member can communicate with any other. For some periods, especially when we first join the circle, we tend to communicate with all or most members. Once we build friendships with one or several members, we communicate more in the new point-to-point (or point-to-specific-points) circle and less in the big circle. Another possibility is that a member fails to find anyone he/she is interested in after a certain period, and then he/she also communicates less in the circle. Generally speaking, if this kind of circle gets no new members, it eventually becomes inactive.

Type 4: Communication among members of a huge circle/community without boundaries. This can be called point to all points, which is an unsteady state. It is temporary, discrete and random. The relationship ends as soon as the communication is over, until the next communication starts. In daily life this rarely happens, since in most cases we cannot broadcast to everybody. Online, however, it can happen.

Each person has many circles. We share and receive different information in different circles.

As I mentioned, digital networks are a mapping and extension of our real social environment. Now let's look at the roles current SNS products play.

Relationships on Facebook are closer to type 3 above. In most cases, we share pictures and statuses with all our friends, so it is not very stable. But the wall and message functions enable point-to-point communication, satisfying users' need for type 1.

Twitter/Weibo looks more like type 4. But due to the "follow" mechanism, in most cases we know who follows us and who will read our tweets (not the case for public accounts), so we can also consider it type 3. Sina Weibo seems to be trying to map type 2 with the "close friend (密友)" function in its latest V5 version, but it does not look very successful so far.

Other knowledge-based online communities, such as Wikipedia, can be considered type 4. There is little social relationship on Wikipedia; it seems difficult to build relationships in such knowledge-sharing communities.

IM and email are mostly used for point-to-point relationships.

It seems that no SNS can map all types of circles at the same time. In this regard, WeChat may be a good attempt, although it is not yet good enough.

Some say it is futile to map our complicated real life onto the online world. What do you think?

------------Supplement-----------
Added on 19 Nov 2012:
I found that the description of the type 2 circle in the article may be ambiguous, so here is an example: suppose four people A, B, C and D know each other well and often get together; this is a type 2 circle. Generally speaking, a person has one or two such circles, and such a circle usually contains only a few people.

In real life, the more people a circle contains, the less stable it tends to be, since the cost of maintaining it is higher.
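One way to see why, under my own simplifying assumption that maintenance cost grows with the number of pairs of members who must stay in touch: for a circle of n people, the number of pairwise ties (dyads) grows quadratically and the number of triads grows even faster.

$$\text{dyads} = \binom{n}{2} = \frac{n(n-1)}{2}, \qquad \text{triads} = \binom{n}{3} = \frac{n(n-1)(n-2)}{6}$$

For the four-person circle of A, B, C and D, that is 6 ties and 4 triads; a ten-person circle already has 45 ties and 120 triads.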

Reference:
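1. Do ordinary people's relationships lack an intermediate state between "fully public" and "point-to-point"? (普通人的关系缺乏“全部公开”和“点对点”之外的中间状态吗?)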



Tuesday, November 6, 2012

Individual and Group Cognition about Social Cloud

Picture source: Cutcaster

In the latest lecture, two questions were raised about the article Social Cloud Computing: A Vision for Socially Motivated Resource Sharing:
1. What is the definition of Social Cloud?
2. What are the possible applications of a Social Cloud?

The answers are easily found in the article:
1. A Social Cloud is a resource and service sharing framework utilizing relationships established between members of a social network.
2. A Social Computation Cloud; A Social Storage Cloud; A Social Collaborative Cloud; A Social Cloud for Public Science; An Enterprise Social Cloud.

- Are there any differences between individual and group epistemic cognition? If so, how?

All of our group members got the same answer except Guan Hao, who added further comments on the definition after searching the Internet. We all agreed to adopt his comments, which are below:
A Social Cloud can provide various kinds of services, and these services are actually provided and maintained by a social network instead of centralized servers. The type of service does not matter: it can be computation, storage, collaboration, and so on, which is why the article lists many applications. As long as the services are provided by a social network and utilize the relationships established in it, the system is considered a Social Cloud.


- How did you approach the problem individually and as a group? Are there any differences in the processes involved?

As for the epistemic aim, I believe the two activities differ. In activity one, due to the time limit, I just tried to understand the article and the new concept in order to answer the questions; in activity two, we already had answers about the social cloud, so our aim was rather to verify whether the answers were correct and to explore the concept further.

The approaches also differ. In activity one, I just tried to find the answers in the article. In activity two, more approaches were added, including discussion and searching the Internet. However, due to the time limit, we did not find much more.




Tuesday, October 16, 2012

Rumors






In week 3, Prof. Rosanna posted a picture and asked: which of the lines (A, B, or C) is as long as the blue line?

If your answer is B but most of your classmates claim it should be C, would you still stick with your original answer?

This case reminds me of rumors. I am interested in this topic because I often receive rumors on SNS, and I keep wondering why my friends forward untrue statements that could easily be seen through.
A rumor is a story or statement in general circulation without confirmation or certainty as to facts. [1]
Here the statement "line C is as long as the blue line" can be regarded as a rumor before we measure. Of course, in this case it is simple to verify by measuring the four lines, and measuring shows that the correct answer is B. However, during the lecture we could not verify it, and some classmates were influenced by the rumor and felt uncertain about the correct answer. Ambiguity is a fundamental factor of a rumor.

Rumors are a ubiquitous feature of our social and informational landscapes, and the study of rumors has a long history. Nevertheless, psychologists have not reached agreement on why rumors spread. Rosnow (1991) claimed that rumors are transmitted because people need to explain ambiguous or uncertain events, and because talking about them provides "catharsis" and reduces the associated anxiety [2]. However, Bernard Guerin and Yoshihiko Miyazaki (2006) argued otherwise: they suggested that rumor tellers use these very properties of anxiety and uncertainty to make a good story and improve their social relationships [3].


Obviously, the cost of believing and forwarding a rumor is much lower than the cost of verifying it. Laziness is human nature, which is an important reason why rumors spread quickly and can hardly be stopped.

I believe most people have received the following claim:
The average person needs to drink eight glasses of water per day to avoid being "chronically dehydrated."
The statement has been broadcast for many years, and many people believe it. In fact, there is no scientific evidence for it. The best general advice is to rely upon your normal senses. If you feel thirsty, drink; if you don't feel thirsty, don't drink unless you want to. [4]

Drinking 8 glasses of water per day


Just like the "8 glasses of water per day" claim, some statements are difficult to verify. But if we really want to be wise and not be fooled by rumors, it is not that difficult in this information age.

How can we tell whether a rumor is true? The key is to keep thinking critically. Do not believe a statement too easily when it goes against your common sense; once you begin to doubt, you are not far from the truth. Snopes.com is an ideal website for verifying "urban legends". Another tool is the search engine: when we read breaking news from an unofficial channel, we can Google it to check (a more professional way is to search for it in major media using a news engine such as http://www.newsnow.co.uk/h/).


Do you believe that π is not equal to 3.14, but is 4? 

Keep in mind: rumors stop at the wise. Let's be wise and stop the rumors!

Comments are welcome!

Reference:
1. http://dictionary.reference.com/browse/rumor?s=t
2. Rosnow, R. L. (1991). Inside rumor: A personal journal. American Psychologist, 46, 484-496.
3. http://www.thefreelibrary.com/Analyzing+rumors,+gossip,+and+urban+legends+through+their...-a0142338936
4. http://www.snopes.com/medical/myths/8glasses.asp

Monday, September 24, 2012

QQ: an IM, also a Platform for Social Networking

This article was written on Sep 24, 2012 and substantially revised on Sep 25, 2012.


Recap on Lecture Notes: the Definition of Social Networking

"- The process of building online communities, often accomplished both through 'groups' and 'friends lists' that allow greater interaction on websites. 
 - Is a process of building relationship over the web."
When I read the definition above, I naturally categorized email and instant messengers as social networking, since email and IM have groups and lists, and we can build relationships there. It surprised me to realize that many people disagree with this view. But some classmates agree with me as well; Guan Hao also thinks email is a kind of social networking and wrote a blog post about it.


What is Social Networking Exactly?

The definition in our notes is a bit sketchy and ambiguous. To clarify it, I followed the link in the lecture notes and found more details about social networking. According to Daniel Nation's article, social networking is:
"Based on a certain structure that allow people to both express their individuality and meet people with similar interests. This structure includes having profiles, friends, blog posts, widgets..."
If we follow this description of the architecture of social networking, we may still disagree on whether email is social networking. How about IM? As there are too many different IMs, let's narrow it down to Tencent QQ, the most widely used IM in mainland China.


Is QQ a Kind of Social Networking?


Well,  let's check the following facts first:

1. Users have their profiles and friends.

2. In early versions, OICQ had built-in chat rooms grouped into categories, such as a teenagers' room, which let people meet others with similar interests. In fact, many beautiful stories of online romance began there at that time.

3. Group chat is another important social networking function of QQ. Unlike chat rooms (whose categories and names are set by Tencent administrators), groups are created and administered by users. Members of each group may share some attribute or interest, say, the CUHK MSc IE students 2012 group, where I met some new friends.

4. Tightly bound to QQ, Qzone was introduced in 2005. Users can write blogs there, and their latest Qzone updates are displayed on their QQ profiles. Friends' updates can also be pushed to users.

5. Recently, QQ 2012 added another function called "Circle" (does it sound familiar?).


QQ Circle: does it look like another "fake clone" (山寨) of Google+ Circles?


As QQ's features meet all the requirements of social networking, I see no reason to deny that QQ is a social networking platform.