Snowball Effect on Reddit

"The more I live, the more I regret how little I know."

- Claude Monet

Discover the Research

How far can a single event influence community behavior?

The snowball effect describes how a small action can grow into something much larger over time, like a snowball rolling downhill, picking up speed and mass as it goes. On Reddit, even a single interaction between two communities can set off a chain reaction. This project explores how positive or negative links between subreddits influence not only the communities involved, but also how others respond.

Now let’s take a look at the dataset.

Dataset

The dataset we are working with is a network of subreddit-to-subreddit hyperlinks, extracted from posts that create hyperlinks from one subreddit to another. A hyperlink originates from a post in the source community and links to a post in the target community. Each hyperlink is annotated with the timestamp of the post, the sentiment of the source community post towards the target community post (−1 for negative and +1 for neutral or positive), and the text property vector of the source post. The hyperlink network covers the period from December 2013 to April 2017.

The network is directed, signed, temporal, and attributed.

As a complement, we will use subreddit embeddings: vector representations of each subreddit, created so that two communities' embeddings are close together when similar users post in them.

Clustering

To make sense of the huge network, we can start by clustering subreddits into larger topical communities. Communities are defined as sets of tightly connected nodes. This can be confusing for our problem because a subreddit can also be called a "community", yet it represents only a single node in the Reddit graph.

The idea is to identify communities by maximizing modularity.
Modularity Modularity is a measure of how well a network is partitioned into groups (or communities). The modularity of a partitioning S of graph G is: $$ Q = \frac{1}{2m} \sum_{ij} \left( A_{ij} - \frac{k_i k_j}{2m} \right) \delta(c_i, c_j) $$ Where $A_{ij}$ is the edge weight between nodes $i$ and $j$, $k_i$ and $k_j$ are the sums of the weights of the edges attached to nodes $i$ and $j$, $2m$ is the sum of all the edge weights in the graph, $c_i$ and $c_j$ are the communities of the nodes, and $\delta$ is an indicator function.
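To make the formula concrete, here is a toy check of modularity written out directly from the definition above, on a hypothetical graph of two triangles joined by a single bridge edge (the node labels and the two-community partition are illustrative, not from the dataset):

```python
# Toy modularity computation: Q = (1/2m) * sum_ij (A_ij - k_i*k_j / 2m) * delta(c_i, c_j)
from itertools import product

edges = [(0, 1), (0, 2), (1, 2),      # triangle A
         (3, 4), (3, 5), (4, 5),      # triangle B
         (2, 3)]                      # bridge between the triangles
community = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}

nodes = sorted(community)
A = {(i, j): 0 for i, j in product(nodes, nodes)}
for u, v in edges:                    # undirected, unweighted adjacency
    A[(u, v)] = A[(v, u)] = 1

k = {i: sum(A[(i, j)] for j in nodes) for i in nodes}   # node degrees
two_m = sum(k.values())                                  # 2m = sum of all degrees

Q = sum(A[(i, j)] - k[i] * k[j] / two_m
        for i, j in product(nodes, nodes)
        if community[i] == community[j]) / two_m

print(round(Q, 3))   # splitting along the bridge gives Q = 5/14 ≈ 0.357
```

Algorithms like Leiden search over partitions to drive this quantity as high as possible.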

Leiden Algorithm To identify communities by maximizing modularity, we use the Leiden algorithm, an improvement on the Louvain algorithm. It guarantees well-connected communities, converges towards a partition in which all subsets of all communities are locally optimally assigned, and is much faster than Louvain.



Here are the clusters we found.

Now we can visualize these clusters in the embedding space using the subreddit embeddings dataset. This plot lets us see how different topical groups of subreddits are arranged relative to each other. The idea behind the embedding space is simple: subreddits with similar users end up close together, while communities with very different audiences are farther apart.
How to plot embeddings? These vector embeddings are actually 300-dimensional! To make a nice plot, we first need to reduce the dimensionality. This step keeps the most important structure in the data while projecting everything down to two dimensions.

PCA We start with PCA (Principal Component Analysis). PCA is a linear method that finds the directions in the data with the most variance and projects the embeddings onto those directions. Using PCA helps compress the embeddings and remove some noise, and it also makes later visualization steps faster and more stable.

t-SNE After PCA, we use t-SNE to create the final 2D visualization. t-SNE works by turning similarities between subreddits into probabilities and then trying to preserve those similarities in a lower-dimensional space. It does this by minimizing the Kullback–Leibler divergence between the original high-dimensional data and the 2D embedding. We apply PCA first because the original embeddings have a lot of features, and t-SNE doesn’t work well when the dimensionality is too high.
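The PCA step above can be sketched with a plain NumPy SVD, assuming the embeddings arrive as an (n_subreddits, 300) array; the random matrix below is a stand-in for the real embeddings, and the final 2D map would then be produced by a t-SNE implementation such as scikit-learn's, which we do not reproduce here:

```python
# Minimal PCA via SVD: project 300-d embeddings onto the top 50 variance directions.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 300))          # stand-in for the 300-d subreddit embeddings

Xc = X - X.mean(axis=0)                  # center each feature
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
X50 = Xc @ Vt[:50].T                     # keep the 50 highest-variance directions

# explained variance per component is S**2 / (n - 1); SVD returns them sorted
explained = S[:50] ** 2 / (X.shape[0] - 1)
print(X50.shape)                         # (500, 50): ready to hand off to t-SNE
```

Reducing to ~50 dimensions first removes noise and keeps the pairwise-similarity computations inside t-SNE fast and stable.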


t-SNE projection of subreddit embeddings. Each dot represents a community; clusters indicate shared topical interests and interaction patterns.

We observe that highly connected groups of subreddits are not necessarily close in embedding space. Some topical groups form clear clusters in embedding space, meaning their users are similar: Gaming, Pornography & Music are good examples. Other groups are much more spread out: Popular/memes, News, Politics & Conspiracies, Religion & Philosophy are good examples.
This makes sense: although subreddits in these groups may link to each other often (e.g., r/capitalism and r/communism), that does not mean their users are similar, leading to a spread-out group in embedding space.

We can also analyse which clusters communicate the most with each other.

Sentiment analysis

What is the share of positive to negative hyperlinks and how can we define them? The data is labeled with a link sentiment value which is either +1 if the post is neutral to positive or -1 if the post is negative.

Fun fact

The authors of the paper originally had three categories: positive, negative, and neutral but they had so few positives that they combined them with the neutral class.



Let's look at the distribution of link sentiment in the dataset.

It looks like a large share of links are neutral to positive. We can also see how the share of positive/neutral hyperlinks evolves over time for each cluster, to get a better idea of the distribution.

The issue with this classification is that it lacks precision. We want to be able to distinguish strongly positive and negative posts from neutral ones. Luckily, we still have some tools we can use. Among the text properties of each post, we have a couple of useful metrics:
  • VADER: Positive, Negative, Compound
  • LIWC: Posemo, Negemo, Anx, Anger, Sad

LIWC and VADER are lexicon-based tools for measuring sentiment and affect in text. LIWC computes normalized frequencies of words associated with psychological and emotional categories, such as negative emotion or anger, while VADER produces a continuous sentiment polarity score by combining word-level valence with rules for negation, intensifiers, and punctuation, making it well suited for social media text.

We can use them to define a continuous sentiment score between -1 and 1, which allows us to quantify sentiment type (negative or positive) as well as strength.

We combine the LIWC and VADER outputs into a single signed sentiment score using principal component analysis (PCA). PCA is applied directly to the LIWC and VADER features, and the first principal component, which captures the dominant shared variation across the lexicon-based measures, is used as a continuous sentiment axis. This signed score provides a compact measure of sentiment polarity and strength, enabling rapid assessment and comparison of sentiment intensity across posts.
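A minimal sketch of this combination step, assuming the LIWC/VADER columns are stacked into a feature matrix (the feature values below are synthetic, built only to illustrate the sign-alignment and scaling; the real pipeline runs on the dataset's text-property columns):

```python
# Collapse correlated sentiment features into one signed score: the first
# principal component of the standardized feature matrix.
import numpy as np

rng = np.random.default_rng(1)
n = 1000
compound = rng.uniform(-1, 1, n)                          # stand-in VADER compound
posemo = np.clip(compound, 0, None) + 0.1 * rng.normal(size=n)   # stand-in LIWC Posemo
negemo = np.clip(-compound, 0, None) + 0.1 * rng.normal(size=n)  # stand-in LIWC Negemo
F = np.column_stack([compound, posemo, negemo])

Fz = (F - F.mean(axis=0)) / F.std(axis=0)        # standardize each feature
_, _, Vt = np.linalg.svd(Fz, full_matrices=False)
score = Fz @ Vt[0]                               # projection on the first PC

# PCA fixes direction only up to sign: flip so higher score = more positive
if np.corrcoef(score, compound)[0, 1] < 0:
    score = -score

score = score / np.abs(score).max()              # rescale into [-1, 1]
```

The sign flip matters: SVD can return the component pointing either way, so we anchor it to a feature whose polarity we know.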

PCA Sentiment Analysis Cluster

The large spike near zero represents hyperlinks with neutral metrics, effectively identifying objective or non-emotive content.



What parameters were most important?
Loadings To answer this, we can look at the PCA loadings for the first principal component. A loading tells us how much each original feature contributes to that component. Features with larger absolute values matter more, because they have a bigger influence on the direction of the component. They show how strongly each feature lines up with the main axis of variation in the data.

PCA weights

Most of the sentiment signal comes from overall positive and negative tone, with finer-grained emotions playing a much smaller role.


What are Shock Events?


    In our framework, we decided to define two particular cases of shock events:
  • Sentiment Shock Event: A sentiment shock event happens when a subreddit receives an incoming link with sentiment that is unusually extreme, either negative or positive, compared to what it normally receives. These moments stand out from everyday activity as spikes in emotional intensity.
  • Repetitive Shock Event: A repetitive shock event happens when a subreddit receives unusually large bursts of incoming links several times in a short period, compared to its normal past activity.

So what happens after a sentiment shock event?


Detected negative events in askreddit

Detected negative events for r/clashofclans. Orange points mark days where the negative deviation is strong enough to be classified as an event.

Here, we watch r/clashofclans over time and, each day, grab the single most negative link it gets. Instead of staring at raw scores, we turn that into a standardized “how different was this for them?” number so you can see when an interaction really stands out from their usual tone.
How is the standardized score computed?
To quantify how unusual an incoming interaction is for a given subreddit, we standardize sentiment values relative to that subreddit’s typical behavior.

For a subreddit s, let $x_{s,t}$ denote the sentiment score of an incoming interaction observed on day $t$. We compute the standardized score as: $$ z_{s,t} = \frac{x_{s,t} - \mu_s}{\sigma_s} $$ where:
  • $\mu_s$ is the mean incoming sentiment for subreddit s,
  • $\sigma_s$ is the corresponding standard deviation.
This transformation expresses sentiment in units of standard deviation, allowing us to compare how extreme an interaction is relative to what the subreddit usually receives. Strongly negative values of $z_{s,t}$ therefore indicate unusually hostile incoming interactions.
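The standardization above is a one-liner in practice; here it is on a toy daily series (the sentiment values are invented, with one deliberately hostile day at the end):

```python
# Standardize incoming sentiment per subreddit: z = (x - mu) / sigma.
from statistics import mean, stdev

incoming = [0.2, 0.1, 0.3, 0.15, 0.25, -0.9]   # last day is unusually hostile
mu, sigma = mean(incoming), stdev(incoming)

z = [(x - mu) / sigma for x in incoming]
print(round(z[-1], 2))   # strongly negative z (about -2) flags the hostile day
```

A day with z below some negative cutoff then becomes a candidate sentiment shock event.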


Now that we have set up the detection of our events, we can look for a potential snowball effect!
To verify that such an effect exists, we need to be rigorous and do some statistics...


So, how do we test for a snowball effect?
Once events have been detected, we test whether they are followed by a change in the subreddit’s outgoing behavior. For each event, defined by a subreddit s and an event day t, we proceed as follows:
  • We collect all outgoing links written by subreddit s in a short time window before and after the event day t.
  • We compare the distributions of the continuous sentiment score (score_sentiment_pca) in the pre- and post-event windows.
  • To account for unequal variances and sample sizes, we use Welch’s t-test, which yields a p-value quantifying whether the change is statistically significant.
In addition to the p-value, we compute a signed effect size: $$ \Delta = \overline{\text{post}} - \overline{\text{pre}} $$ A negative value of $\Delta$ indicates that the subreddit’s outgoing sentiment became more negative after the event, while a positive value corresponds to a shift toward more positive sentiment.
Welch’s t-test Welch’s t-test is used to determine whether the difference between the means of two groups is due to random variation or reflects a real difference between the populations, particularly when the variances of the groups are unequal. The test works by computing a t-value that relates the difference in sample means to the variability in the data. Welch’s t-test defines the statistic \( t \) by the following formula: \[ t = \frac{\Delta \bar{X}}{s_{\Delta \bar{X}}} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_{\bar{X}_1}^2 + s_{\bar{X}_2}^2}}, \] with: \[ s_{\bar{X}_i} = \frac{s_i}{\sqrt{N_i}}. \] Here, \(\bar{X}_i\) and \(s_{\bar{X}_i}\) denote the \(i\)-th sample mean and its standard error, respectively. The quantity \(s_i\) represents the corrected sample standard deviation, and \(N_i\) is the sample size. Unlike Student’s t-test, the denominator is not based on a pooled variance estimate. The degrees of freedom \(\nu\) associated with this variance estimate are approximated using the Welch–Satterthwaite equation: \[ \nu \approx \frac{\left(\frac{s_1^2}{N_1} + \frac{s_2^2}{N_2}\right)^2} {\frac{s_1^4}{N_1^2 \nu_1} + \frac{s_2^4}{N_2^2 \nu_2}}, \] where: \[ \nu_i = N_i - 1. \]
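The t statistic, the Welch–Satterthwaite degrees of freedom, and the signed effect size Δ can be written out directly from these formulas; the pre/post sentiment samples below are toy numbers, not project data:

```python
# Welch's t statistic, Welch–Satterthwaite degrees of freedom, and effect size.
from statistics import mean, variance

pre  = [0.30, 0.25, 0.40, 0.35, 0.20, 0.45]   # outgoing sentiment before the event
post = [0.10, 0.05, 0.20, 0.00, 0.15]         # outgoing sentiment after the event

def welch(a, b):
    va, vb = variance(a) / len(a), variance(b) / len(b)   # s_i^2 / N_i
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    nu = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, nu

t, nu = welch(pre, post)
delta = mean(post) - mean(pre)    # Δ < 0: outgoing sentiment became more negative
print(round(t, 2), round(nu, 1), round(delta, 3))   # t ≈ 4.32, ν ≈ 9, Δ = -0.225
```

In practice one would get the p-value from the t distribution with ν degrees of freedom (e.g. via `scipy.stats.ttest_ind(pre, post, equal_var=False)`).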
P-Value A p-value is the probability of seeing a result at least this extreme if the null hypothesis were actually true. Smaller p-values mean the result is less likely to be just random variation.

For our tests, we compare it to a threshold with value 0.05: below that, we call the result statistically significant and reject the null hypothesis.

Let's see what results we get...

Positive shock events

Negative shock events

Monet clueless

Even Monet looks unsure here.

In these waffle plots we are hunting for what could be the beginning of a snowball effect. We want to see whether, after a subreddit takes a hit (or receives praise), the sentiment of its outgoing links changes.

Something stands out from these plots! The proportion of events with a significant shift in the sentiment of the outgoing links is really small compared to the overall number of events. Therefore, we cannot generalize about a shift in sentiment after such events.

We also inspected whether a subreddit hyperlinks more or less than usual after one of those shock events.
Details

Outgoing count shifts after negative events

Outgoing count shifts after positive events

Same reasoning as before: there are still not enough significant shifts to draw conclusions here!

What can we tell about these events?

First, let's see how our detected events are distributed within the different topical clusters. Is the reaction conditioned on the type of subreddit? Do some subreddits react more positively or more negatively? Let's take a look.

Across topical clusters, reactions are broadly similar, with only modest differences in average direction and spread. News and politics-related subreddits tend to cluster slightly more on the negative side, while gaming and entertainment communities show somewhat more positive shifts.

What about the strength of the incoming link: does a larger sentiment score impact the reaction even within the events we already classified as strong?

Even among events we classify as strong, higher sentiment z-scores do not translate into systematically stronger reactions. The distributions largely overlap, with no clear monotonic relationship between sentiment intensity and subsequent change. This suggests that emotional extremeness, by itself, is not a reliable driver of how communities respond.

Baseline matters! Visual comparisons are useful, but they can also be misleading: subreddits do not start from the same baseline. Some tend to be more positive or more negative to begin with, which strongly constrains how much they can change. To separate these baseline effects from topic- or sentiment-specific reactions, we turn to a simple regression analysis.
Regression 101
Ordinary Least Squares Regression (OLS)
OLS regression is a statistical method for estimating the parameters of a linear relationship between a dependent variable and one or more independent variables. The method selects coefficient estimates that minimize the sum of squared residuals, where each residual represents the difference between an observed value and the value predicted by the linear model.
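A small OLS sketch of the "baseline matters" analysis: we regress the post-event sentiment shift on the pre-event baseline. The data below is synthetic, built to exhibit regression to the mean, and the variable names are ours, not the project's:

```python
# OLS via least squares: does the pre-event baseline predict the post-event shift?
import numpy as np

rng = np.random.default_rng(2)
n = 400
baseline = rng.uniform(-1, 1, n)                      # pre-event mean sentiment
shift = -0.5 * baseline + 0.1 * rng.normal(size=n)    # high baselines cool off

X = np.column_stack([np.ones(n), baseline])           # intercept + predictor
beta, *_ = np.linalg.lstsq(X, shift, rcond=None)      # minimizes sum of squared residuals

print(round(beta[1], 2))   # slope near -0.5: regression toward the mean
```

A clearly negative slope is exactly the "already-positive subreddits cool off, already-negative ones rebound" pattern described above; extra columns (topic dummies, incoming z-score) can be appended to X in the same way.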
Details
OLS


Once we account for a subreddit’s prior sentiment level, most of the variation in how it reacts is explained by regression to the mean: subreddits that were already highly positive tend to cool off, while more negative ones tend to rebound. Topic still matters, but only at the margins. News, politics, and conspiracy-focused communities show systematically weaker reactions, while gaming-related subreddits tend to react slightly more positively on average. By contrast, the strength of the incoming sentiment signal (how extreme it is) adds little explanatory power once the baseline is known. In other words, where a community starts matters far more than how emotionally charged the triggering content is.

What about the repetitive shock events?


How do we detect those bursts in the incoming links?
To detect repetitive shock events, we look at one-day windows and flag days when a subreddit receives far more incoming links than usual for that specific subreddit. Hyperlink activity is heavy-tailed (a few extreme days dominate), so we avoid mean/variance thresholds and instead mix a high percentile cutoff with an absolute floor to keep events rare, substantial, and meaningful.

In simple terms, a day is labeled as a shock event when a subreddit receives a surge of incoming links that is rare compared to its past, much larger than its usual activity, and large enough in absolute size to be meaningful.
Formal Definition For each subreddit s, we define a threshold that determines when incoming links are unusually high: \[ \text{threshold}_s = \max \left( \text{percentile}_{s,q},\ k_0 \right) \] A day is labeled as a repetitive shock event if the number of incoming links exceeds this threshold.
  • percentileₛ,q: captures rare events by focusing on the extreme tail of historical activity (we use q = 0.99).
  • k₀: avoids triggering events for very small subreddits due to noise (we use k₀ = 5 links).
By taking the maximum of these two values, we ensure that detected bursts are unusual, clearly elevated, and substantial.
How strict is this detection? The detection is intentionally strict: most days have very little activity (often just one link for a subreddit), so we only flag truly unusual spikes and end up marking just a small fraction of time bins as shock events. After performing parameter sweeping and qualitative inspection, we selected a baseline configuration that best represents meaningful repetitive linking activity.
  • Time window: 1 day
  • Percentile threshold (q): 0.99
  • Minimum absolute threshold (k₀): 5 links
  • Decision rule: thresholdₛ = max(percentileₛ,q, k₀)
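The decision rule above fits in a few lines; the daily counts below are invented, with one obvious burst day at the end, and q and k₀ match the baseline configuration:

```python
# Repetitive-shock detection: threshold_s = max(percentile_{s,q}, k0).
import numpy as np

daily_links = np.array([1, 0, 2, 1, 1, 3, 1, 0, 1, 2] * 30 + [40])  # one burst day

q, k0 = 0.99, 5
threshold = max(np.percentile(daily_links[:-1], 100 * q), k0)  # history only

is_shock = daily_links[-1] > threshold
print(threshold, bool(is_shock))
```

For this quiet toy subreddit the 99th percentile of past activity is only 3 links, so the absolute floor k₀ = 5 is what actually sets the bar, which is precisely its purpose for small subreddits.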

r/SubredditDrama is one of those subreddits where people collect and retell the funniest and most chaotic fights happening across Reddit, so you can follow the drama without being part of it. We decided to show that subreddit's activity (number of times it has been hyperlinked per day) to better understand how we detect these events.
Frequency event detection
Seems like a big drama was going on in September 2015!

Here’s how the clusters are involved in those shock events.
Details

Once these repetitive shock events are detected for every subreddit, we apply the same logic as done for our sentiment events. We compare a subreddit’s behavior before and after to answer the following questions:
Questions
  • Does the sentiment of outgoing and incoming links change?
  • Does the volume of outgoing and incoming links increase or decrease?

So, we measured shifts on each metric and ran the same Welch-style t-tests as before to see whether these bursts produced any statistically significant movement for each detected event. You can see below our results ...

Only a sliver of shocks show up as significant, and when they do it's mostly in incoming metrics rather than outgoing ones. Note that "untested" wedges are just days with no usable links around the shock, so there was no sentiment to measure.

Apparently Monet was seen losing his mind somewhere in France in the search for a snowball effect. Let’s not give up now; we still have one card left.

Wanna know more about those small portions of the significant effects before moving to the next part?
If you wonder what happens for those significant effects, you are in the right section! Note that, as they represent only a small portion of the overall events, their characteristics should not be generalized, but it is still interesting to see whether they look like what we would expect. Click here to see the details!

But does it spread?

The aim is now to investigate whether a highly negative emotional interaction between two subreddits affects not only those two communities, but also other subreddits that interact with them. In other words, we ask whether emotional signals propagate through the subreddit network beyond their point of origin.

We use our highly emotional detected events as seeds, potential starting points of emotional diffusion.
Monet spread
Great paintings guys!
Network distance Network distance is a measure that quantifies the separation between two nodes in a network as the length of the shortest path connecting them, where length is defined as the minimum number of edges required to traverse from one node to the other.


From the detected cascades, we compute two key indicators of emotional diffusion:
Metrics
  • Reach:
    Number of subreddits that show an emotional shift after the seed event
  • Radius:
    Maximum network distance between the seed subreddit and affected subreddits
plot spread
Reach: 5   Radius: 3
Reach: 3   Radius: 1
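Both indicators fall out of a breadth-first search from the seed. Here is a sketch on a hypothetical cascade (the subreddit names, link structure, and set of affected communities are invented for illustration):

```python
# Reach and radius of a cascade: BFS shortest-path distances from the seed.
from collections import deque

links = {"seed": ["a", "b"], "a": ["c"], "b": [], "c": ["d"], "d": []}
affected = {"a", "b", "c"}            # subreddits with a detected sentiment shift

dist = {"seed": 0}
queue = deque(["seed"])
while queue:                          # standard BFS over the link graph
    u = queue.popleft()
    for v in links.get(u, []):
        if v not in dist:
            dist[v] = dist[u] + 1
            queue.append(v)

reach = len(affected)                          # number of shifted subreddits
radius = max(dist[v] for v in affected)        # farthest affected from the seed
print(reach, radius)   # → 3 2
```

A radius above 1 means the shift travelled past the seed's direct neighbors, which is the signature of propagation we are testing for.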


Considering the seed as the subreddit that started the shock event, we focus on all the subreddits it linked to in a short timespan after the highly emotional event. As before, we check whether their average outgoing link sentiment changes significantly.
If cascades exhibit a radius greater than one and a non-trivial reach, this provides evidence that emotions do not remain localized to a single interaction, but instead spread to related subreddits through the network.
  • Maximum reach: 4
  • Average reach: 2.21
  • Maximum radius of sentiment: 2
  • Average radius of sentiment: 1.03
For this last analysis of our research, the results confirm what we have seen from the start. Most events do not result in a significant reach. The absence of a significant effect on the propagation of sentiment through the network is consistent with our previous findings on the detection of the snowball effect.

So, what have we learned?

Monet snowball
After many distribution plots, t-tests, and statistical detours, we find little evidence of a snowball effect in hyperlinks between subreddits.

Which might actually be good news!🥳
However, this does not mean that the snowball effect does not exist on Reddit. One important limitation is that hyperlinks capture only explicit references between subreddits, while many interactions happen without links and therefore remain invisible in this dataset. It could still appear in other forms, such as through comments, user activity, or coordinated behavior within and across subreddits. To push this analysis further and find out for sure, we would need richer data on post content and on users active in both the source and target communities.
FIND OUT MORE 🤓

There are many ways in which users interact! If you are interested in how negative hyperlinks can mobilize users from a source subreddit against a target, and how users of that target react after an attack, check out the paper below.

Read the paper