Mining Social Media with Social Theories: A Survey

Why is it interesting?
It is interesting to see how traditional social theories can be combined with modern computational tools and data mining techniques to better understand social media data, particularly because the nature of social media data differs significantly from the data used in traditional data mining.

What is social media mining, and what are its challenges?
Social media mining is the process of representing, analyzing, and extracting actionable patterns from social media data in order to provide better, customized services to social media users. The major challenges in social media mining concern the handling of data, which can be described as:
  1. Big data: approximately 500 million tweets per day, or around 200 billion tweets per year.
  2. Linked data: the data (content and users) are not independent, which contradicts the independence assumption of traditional data mining methods.
  3. Noisy: user-generated content varies in quality, and the data includes spammers and ambiguous connections.
  4. Unstructured: short texts, typos, spacing errors, emoticons, and shorthand such as "h r u?".
  5. Incomplete: because of privacy concerns, social media data can be incomplete and extremely sparse.

What are social theories?
Social theories describe regularities in how people behave and relate to one another in society. They can guide the application of data mining techniques to social media data, leading to a better understanding of the data and to more customized services. In this paper, three main social theories are discussed:
1. Social Correlation Theory: It states that adjacent users are correlated in their behavior, attributes, and activities, and that similar users are more likely to form a connection than two random people. The correlation can be attributed to three processes:
  1. Homophily: the tendency to connect to people who are similar to oneself.
  2. Influence: the tendency to follow the behavior of friends and adjacent users.
  3. Confounding: correlation forged by external influences from the environment. For example, two individuals living in the same city are more likely to become friends than two random individuals.
2. Balance Theory: This theory is based on the intuition that “the friend of my friend is my friend” and “the enemy of my enemy is my friend”, which drives relationships toward psychological balance.
3. Social Status Theory: It considers the position or rank of a user in a social community, and represents the degree of honor or prestige attached to the position of each individual.
To give a sense of how the differences between status and balance arise, consider the situation in which a user A links positively to a user B, and B in turn links positively to a user C. If C then forms a link to A, what sign should we expect this link to have? Balance theory predicts that since C is a friend of A’s friend, we should see a positive link from C to A. Status theory, on the other hand, predicts that A regards B as having higher status, and B regards C as having higher status, so C should regard A as having low status and hence be inclined to link negatively to A. In other words, the two theories suggest opposite conclusions in this case.
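
The A, B, C example can be sketched in a few lines of Python. This is only an illustrative toy, not a method taken from the survey: the function names `balance_prediction` and `status_prediction` are made up here, links are encoded as +1 or -1, and status is assumed to change by one unit step per link.

```python
def balance_prediction(sign_ab: int, sign_bc: int) -> int:
    """Predict the sign of the C-A link under balance theory.

    A triad is balanced when the product of its three signs is positive,
    so the closing link takes the sign of the product of the other two.
    """
    return sign_ab * sign_bc


def status_prediction(sign_ab: int, sign_bc: int) -> int:
    """Predict the sign of the link from C to A under status theory.

    Assumption for this sketch: a positive link u -> v means v has higher
    status than u by one step, and a negative link means one step lower.
    """
    # Status of C relative to A, accumulated along the path A -> B -> C.
    status_c_minus_a = sign_ab + sign_bc
    if status_c_minus_a > 0:
        return -1  # C ranks above A, so C links negatively to A
    if status_c_minus_a < 0:
        return +1  # C ranks below A, so C links positively to A
    return 0  # equal status: this toy model makes no prediction


# A -> B positive and B -> C positive, as in the text.
print(balance_prediction(+1, +1))  # positive link expected
print(status_prediction(+1, +1))   # negative link expected
```

With two positive links the two functions disagree, matching the opposite conclusions described above. With two negative links they agree on a positive link, which shows that the theories do not conflict on every triad.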

Applying Social Theories to Social Media Data
1. Social theories in user-related tasks:
  1. Community detection: the process of finding implicit groups of users who are more densely connected to each other than to the rest of the network. Homophily suggests that similar users are likely to be linked, and influence indicates that linked users will influence each other and become more similar.
  2. User classification: Social correlation theory suggests that the labels of linked users should be correlated. Classification can therefore infer unknown information about a user from the correlated users in the same network who exhibit similar behavior.
  3. Social spammer detection: Based on social correlation theory, normal users behave similarly to their neighbors, whereas spammers behave differently from theirs, since most of a spammer’s neighbors are normal users. Hence, two connected normal users should be close in the latent space, while spammers should be far from their neighbors in the latent space.
2. Social theories in relation-related tasks:
  1. Link prediction: It is commonly used in friend recommendation services on social media. Following homophily, users can be placed in a latent space according to their activities and behavior in order to predict trust relations: the stronger the homophily between two users, the smaller the distance between them in the latent space. Status theory, on the other hand, suggests that new links are more likely to go from users with low status to users with high status.
  2. Social tie prediction: It aims to automatically infer the types of social relations from a user’s activity and interactions with other users. Knowing the type of tie enables better services, since a user’s work style may be influenced mainly by colleagues, while daily life habits may be strongly affected by family.
  3. Tie strength prediction: Among the heterogeneous relationships on social media, social correlation theory can be used to determine how strong the relation between two users is, by assigning a continuous value from 0 to 1 rather than a binary label.
3. Social theories in content-related tasks:
  1. Social recommendation: Social correlation theory suggests that a user’s preferences are similar to, or influenced by, those of directly connected friends. Ensemble methods use this intuition to predict a user’s missing values from the user’s social network.
  2. Feature selection: By analyzing user-generated content together with its social context, a feature selection framework based on social correlation theory can handle high-dimensional social media data effectively.
  3. Sentiment analysis: Social correlation theory indicates that the sentiments of two linked users are likely to be similar. User-user social relations and user-tweet relations can therefore be used to assign sentiment labels to unlabeled tweets.
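
The latent-space intuition used above for link prediction and spammer detection can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the user names, the hand-picked 2-D coordinates in `latent`, the toy `links` set, and the helper functions are hypothetical. A real system would learn the latent positions from data (for example by matrix factorization), which is not shown here.

```python
import math

# Hypothetical 2-D latent positions, hand-picked for this sketch.
latent = {
    "alice": (0.0, 0.0),
    "bob": (0.1, 0.2),
    "carol": (0.2, 0.1),
    "spam_bot": (3.0, 3.0),
}

# Existing undirected links (toy data).
links = {
    ("alice", "bob"),
    ("bob", "carol"),
    ("spam_bot", "alice"),
    ("spam_bot", "carol"),
}


def distance(u, v):
    """Euclidean distance between two users in the latent space."""
    return math.dist(latent[u], latent[v])


def neighbors(user):
    """Users directly linked to `user`, ignoring link direction."""
    return sorted({b if a == user else a for a, b in links if user in (a, b)})


def rank_candidate_links():
    """Rank unlinked pairs by latent distance, closest first (homophily)."""
    users = sorted(latent)
    candidates = [
        (u, v)
        for i, u in enumerate(users)
        for v in users[i + 1:]
        if (u, v) not in links and (v, u) not in links
    ]
    return sorted(candidates, key=lambda pair: distance(*pair))


def mean_neighbor_distance(user):
    """Average latent distance from `user` to its linked neighbors."""
    nbrs = neighbors(user)
    return sum(distance(user, n) for n in nbrs) / len(nbrs)


# The closest unlinked pair is the strongest link recommendation.
print(rank_candidate_links()[0])
# The user farthest from its own neighbors is the most spammer-like.
print(max(latent, key=mean_neighbor_distance))
```

In this toy data the unlinked pair closest in the latent space is alice and carol, and spam_bot has the largest average distance to its neighbors. Note that a spammer also inflates the average distance of the normal users it links to, so a practical detector would need something more robust than this simple mean.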

In this article, we reviewed three key social theories (social correlation theory, balance theory, and status theory) and described the possibilities for integrating them with computational models. As future directions, more existing social theories, such as structural hole theory and weak tie theory, could be employed, or new social theories could be discovered, to advance social media mining.

