Social influence is the phenomenon in which the actions of a user induce his or her friends to behave in a similar way. We distinguish influence from homophily and from external factors in the environment as explanations for the behavior of a user on a social network. Existing work has established that user actions are correlated with social affiliations, but it does not address the source of that correlation. The focus here is on identifying the source of the correlation, which can inform important decisions such as the design of viral marketing campaigns.
Incomplete data, caused by privacy concerns and anonymous activity, is the biggest challenge in evaluating social influence, because it makes influence harder to distinguish from homophily and external factors. We propose a statistical test, called the shuffle test, based on the intuition that if influence is not a likely source of correlation in a system, the timing of actions should not matter, and therefore reshuffling the time stamps of the actions should not significantly change the amount of correlation.
Models of social correlation:
Correlation between the behaviors of affiliated agents in a social network is a well-known phenomenon. Formally, this means that for two nodes u and v that are adjacent in the graph G, the event that u becomes active is correlated with the event that v becomes active. There are three primary explanations for this phenomenon:
Homophily: the tendency to connect to people who share similar characteristics.
Influence: the tendency to follow the behavior of friends and adjacent users.
Confounding: correlation created by external influences from the environment. For example, two individuals living in the same city are more likely to become friends than two random individuals, and they are also exposed to the same surroundings, which can lead them to act alike.
Measuring social correlation: The first step in our analysis is to obtain a measure of the social correlation between the actions of an individual and those of her friends in the network. At each time step, we estimate the probability that a user becomes active as a function of the number of her friends who became active in previous time steps. The Flickr data set records these actions as tag usage, and for most tags a logistic function with the logarithm of the number of active friends as the explanatory variable provides a good fit for this probability.
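The measurement described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names count_exposures and fit_logistic, the input layout (a dict of friend lists and a dict of integer activation steps), the use of ln(a + 1) to avoid the logarithm of zero, and the choice of Newton's method for the fit are all assumptions made for this example.

```python
import math
from collections import defaultdict


def count_exposures(friends, activation_time, horizon):
    """Count, per number of active friends a, who activated and who did not.

    friends: dict user -> list of users whose actions that user can see (assumed layout)
    activation_time: dict user -> integer step of activation (inactive users are absent)
    horizon: number of time steps, numbered 0 .. horizon - 1
    """
    yes, no = defaultdict(int), defaultdict(int)
    for t in range(horizon):
        for u in friends:
            t_u = activation_time.get(u)
            if t_u is not None and t_u < t:
                continue  # u was already active before this step
            # Friends that became active in previous time steps.
            a = sum(1 for v in friends[u] if activation_time.get(v, horizon) < t)
            if t_u == t:
                yes[a] += 1
            else:
                no[a] += 1
    return dict(yes), dict(no)


def fit_logistic(yes, no, iterations=50):
    """Fit p(a) = sigmoid(alpha * ln(a + 1) + beta) by Newton's method."""
    alpha, beta = 0.0, 0.0
    for _ in range(iterations):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for a in set(yes) | set(no):
            x = math.log(a + 1)
            y, n = yes.get(a, 0), no.get(a, 0)
            z = max(-30.0, min(30.0, alpha * x + beta))  # clamp to avoid overflow
            p = 1.0 / (1.0 + math.exp(-z))
            residual = y - (y + n) * p
            weight = (y + n) * p * (1.0 - p)
            g_a += residual * x
            g_b += residual
            h_aa += weight * x * x
            h_ab += weight * x
            h_bb += weight
        det = h_aa * h_bb - h_ab * h_ab
        if abs(det) < 1e-12:
            break  # degenerate data (for example perfectly separated counts)
        alpha += (h_bb * g_a - h_ab * g_b) / det
        beta += (h_aa * g_b - h_ab * g_a) / det
    return alpha, beta
```

In this sketch the coefficient alpha plays the role of the correlation measure: a larger alpha means the activation probability rises faster with the number of already active friends.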
The shuffle test: The shuffle test is based on the idea that if social influence does not play a role, then even though an agent's probability of activation could depend on her friends, the timing of that activation should be independent of the timing of other agents. Let G be a social network and W the set of users activated during the time range [0, T]. For each time step, we count the users who had a given number of active friends at the beginning of that step and became active during it, and likewise the users who had that many active friends but remained inactive; logistic regression on these counts gives the estimate of social correlation. The shuffle test reshuffles the activation time stamps among the users in W, re-estimates the correlation, and compares the two estimates: if they are close, influence is unlikely to be the source of the correlation.
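A minimal Python sketch of the shuffle step follows. The names shuffle_activation_times and shuffle_test are hypothetical, and the correlation estimator is passed in as a caller-supplied function (estimate_alpha, also a hypothetical name) so that the sketch makes no claim about how the estimate itself is computed.

```python
import random


def shuffle_activation_times(activation_time, rng):
    """Return a copy in which the time stamps are randomly permuted among the users."""
    users = list(activation_time)
    times = [activation_time[u] for u in users]
    rng.shuffle(times)
    return dict(zip(users, times))


def shuffle_test(activation_time, estimate_alpha, rng=None):
    """Estimate the correlation on the original and on the shuffled time stamps.

    estimate_alpha: caller-supplied function mapping {user: time} to a number.
    """
    rng = rng or random.Random()
    original = estimate_alpha(activation_time)
    shuffled = estimate_alpha(shuffle_activation_times(activation_time, rng))
    return original, shuffled
```

The set of activated users and the multiset of time stamps are unchanged by the shuffle; only the assignment of time stamps to users differs. A large gap between the two returned values suggests that timing matters, which points towards influence.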
Theoretical analysis assumptions:
The distribution of the activation times is uniform over the time range [0, T].
Each new time stamp is drawn independently from the uniform distribution, instead of using a permutation of the original time stamps.
There is enough data to gather statistics.
The edge-reversal test: The edge-reversal test distinguishes influence from other sources of correlation and is similar to the test used in the obesity study. We reverse the direction of all the edges and run logistic regression on the data using the new graph. If the correlation stems from homophily or confounding, the estimate should not change significantly, since friends share common characteristics or are affected by the same external variables regardless of which of the two individuals has named the other as a friend. Social influence, however, spreads in the direction specified by the edges of the graph, and hence reversing the edges should intuitively change the estimate of the correlation.
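The reversal itself is a one-line graph transformation, sketched below in Python. The edge representation (a list of (u, v) pairs, read here as "u named v as a friend") and the names reverse_edges and edge_reversal_test are assumptions for this illustration; the estimator is again a caller-supplied function, estimate_alpha, a hypothetical name.

```python
def reverse_edges(edges):
    """Flip the direction of every directed edge (u, v) to (v, u)."""
    return [(v, u) for (u, v) in edges]


def edge_reversal_test(edges, activation_time, estimate_alpha):
    """Estimate the correlation on the original graph and on the reversed graph.

    estimate_alpha: caller-supplied function (edges, activation_time) -> number.
    """
    forward = estimate_alpha(edges, activation_time)
    backward = estimate_alpha(reverse_edges(edges), activation_time)
    return forward, backward
```

If the two estimates are close, the direction of the edges carries little information, which is what homophily or confounding would predict; a clear difference is the signature expected from influence.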
Generative simulation models:
No-correlation model: The network grows exactly as in the real data. In each time step, we look at the real data to see how many new agents use the tag, and pick the same number of agents uniformly at random from the set of agents that have already joined the network and have not been picked yet.
Influence model: The network and its growth pattern are kept as in the real data. In every time step, each node that has joined the network but has not yet been activated flips a coin independently to decide whether to become active in this time step.
Correlation model (no influence): The network and the growth pattern of the population are kept as in the real data. The model is controlled by a parameter L and follows the activation pattern of a tag in the real data. A number of centers are chosen at random before the actions are generated.
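The first two models can be sketched in Python as follows. This is an illustration under assumptions, not the authors' simulator: the function names, the input layout (joined_by_step as a list of sets of agents present at each step), and in particular the coin bias in the influence model, taken here to be a logistic function of ln(a + 1) with hypothetical parameters alpha and beta, are choices made for this example. The correlation model is omitted because the text does not specify how the centers are used.

```python
import math
import random


def simulate_no_correlation(joined_by_step, new_active_counts, rng):
    """Activate, at each step, as many random joined agents as the real data shows.

    joined_by_step[t]: set of agents that have joined the network by step t
    new_active_counts[t]: number of new activations at step t in the real data
    """
    activation_time = {}
    for t, count in enumerate(new_active_counts):
        candidates = sorted(u for u in joined_by_step[t] if u not in activation_time)
        for u in rng.sample(candidates, min(count, len(candidates))):
            activation_time[u] = t
    return activation_time


def simulate_influence(friends, joined_by_step, alpha, beta, rng):
    """Each joined, inactive agent flips an independent coin at every step.

    Assumed coin bias: sigmoid(alpha * ln(a + 1) + beta), where a is the
    number of the agent's friends that became active in earlier steps.
    """
    activation_time = {}
    for t, joined in enumerate(joined_by_step):
        newly_active = []
        for u in sorted(joined):
            if u in activation_time:
                continue
            a = sum(1 for v in friends.get(u, ()) if v in activation_time)
            p = 1.0 / (1.0 + math.exp(-(alpha * math.log(a + 1) + beta)))
            if rng.random() < p:
                newly_active.append(u)
        # Apply activations after the loop so that coins within a step stay independent.
        for u in newly_active:
            activation_time[u] = t
    return activation_time
```

Running the tests on data generated this way gives a reference point: the no-correlation output contains no influence by construction, while the influence output does.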
This article analyzes influence and correlation in social networks by applying the shuffle test and the edge-reversal test. In both tests, the cumulative and frequency distributions of the results are nearly identical to those obtained from the original data, which supports correlation rather than influence as the explanation. For the Flickr data set, influence therefore cannot be given much weight: the difference between the values in the two directions of a given edge is minimal, almost zero. The current work has not considered the role of social status in influence. Assigning users a position or rank might help organize the influence and could also help explain the behavior of the edge-reversal test.