Introduction

Figure 1: emotion photo from psychology-spot.com

We are drawn to stories: those moments of joy, regret, and awkwardness that make us human. Reddit, often referred to as “the front page of the internet,” is a sprawling digital ecosystem where users gather to share stories, insights, and opinions on virtually any topic imaginable. With over 100,000 active communities, or subreddits, Reddit serves as a unique platform for storytelling and interaction, blending humor, emotion, and controversy. Among its countless corners, two subreddits stand out for their raw, personal narratives: r/TIFU (Today I F**ked Up) and r/confession.

r/TIFU and r/confession

On r/TIFU (“Today I F**ked Up”) and r/confession, users lay their emotions bare, creating digital spaces rich with storytelling potential.

  • r/TIFU (Today I F**ked Up): A haven for tales of mishaps and blunders, r/TIFU is where users recount their moments of failure, often with a humorous or self-deprecating twist. From embarrassing workplace incidents to misunderstandings with friends and family, these stories bring levity to everyday struggles, inviting readers to laugh, empathize, and occasionally cringe.

  • r/confession: In contrast, r/confession offers a more somber and introspective space where users share personal secrets and inner conflicts. Posts here range from light-hearted admissions to deeply emotional revelations, fostering a sense of vulnerability and connection among the community.

Both subreddits thrive on storytelling, but they differ in tone and purpose: one embraces humor in failure, while the other seeks catharsis through disclosure. But what makes one story resonate while another fades into obscurity?

The Project: What Makes a Story Stick?

At the heart of this project lies a central question: What makes a story resonate with its audience? We set out to explore this through the lens of emotional engagement, blending exploratory data analysis (EDA), natural language processing (NLP), and machine learning (ML). By diving into the posts and comments of r/TIFU and r/confession, we aim to uncover the elements that make certain stories memorable, engaging, or emotionally impactful.

We began our exploration by addressing key business questions to guide our analysis:

Idea 1: Understand Who and What Drives Engagement

  • Business goal: Identify the most active contributors, dominant themes, and notable characters in these subreddits to uncover the primary drivers of engagement.
  • Questions:
    • Who are the Top Contributors? Understand how individual users shape community dynamics.
    • What Do Users Talk About the Most? Categorize posts into recurring themes, highlighting the key storytelling trends within the communities.
    • Which Characters Capture the Audience’s Attention? Investigate the role of specific characters (e.g., “mom,” “boss”) in enhancing engagement and storytelling.
  • Technical approach:
    • Analyze user activity metrics such as the number of posts, average engagement per post, and overall influence. Contributors will be ranked to uncover patterns in participation and their effect on the subreddits.
    • Perform word frequency analysis and topic modeling on post content.
    • Use named entity recognition (NER) to identify character mentions in posts and analyze their relationship with engagement metrics.
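
The contributor ranking described above can be sketched in a few lines. This is a minimal illustration, not the project's actual pipeline: the field names (author, score) and the sample records are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical sample records; the field names are assumptions for illustration.
posts = [
    {"author": "alice", "score": 120},
    {"author": "bob", "score": 15},
    {"author": "alice", "score": 80},
    {"author": "carol", "score": 300},
]


def rank_contributors(posts):
    """Return (author, post_count, avg_score) tuples, most active first."""
    totals = defaultdict(lambda: {"posts": 0, "score": 0})
    for post in posts:
        stats = totals[post["author"]]
        stats["posts"] += 1
        stats["score"] += post["score"]
    ranked = [
        (author, s["posts"], s["score"] / s["posts"])
        for author, s in totals.items()
    ]
    # Sort by post count, then by average score, both descending.
    ranked.sort(key=lambda row: (row[1], row[2]), reverse=True)
    return ranked


ranking = rank_contributors(posts)
for author, post_count, avg_score in ranking:
    print(author, post_count, avg_score)
```

Sorting on post count first and average score second is one possible definition of "top contributor"; weighting the two differently would change the ranking.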

Idea 2: Explore the Role of Titles and Words in Storytelling

  • Business goal: Understand how titles and people-related words (e.g., “friend,” “partner”) influence audience interest and emotional connection.
  • Questions:
    • Which Words Appear Most Often in Titles? Understanding the most common words in titles can reveal what initially draws readers in.
    • Which People-Related Words Capture the Most Attention? Explore how personal and relational terms (e.g., “friend,” “partner”) impact emotional engagement and audience resonance.
  • Technical approach:
    • Conduct text analysis to identify frequently used words in titles and their relationship with post engagement metrics like upvotes and comments.
    • Extract and analyze the frequency of people-related terms in posts and examine their correlation with upvotes, comments, and sentiment.
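
As a rough sketch of the title analysis, the snippet below counts how many titles contain each word and averages the score of those posts. The titles, scores, and the short stopword list are all made up for illustration; a real analysis would use a fuller stopword list and the actual dataset fields.

```python
import re
from collections import Counter, defaultdict

# Hypothetical titles and scores, for illustration only.
posts = [
    {"title": "TIFU by texting my boss", "score": 200},
    {"title": "TIFU by lying to my friend", "score": 50},
    {"title": "I lied to my boss", "score": 100},
]

# Small illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"by", "my", "to", "i", "the", "a"}


def title_word_stats(posts):
    """Map each title word to (posts containing it, mean score of those posts)."""
    counts = Counter()
    score_sums = defaultdict(int)
    for post in posts:
        # Use a set so a word repeated within one title is counted once.
        words = set(re.findall(r"[a-z']+", post["title"].lower())) - STOPWORDS
        for word in words:
            counts[word] += 1
            score_sums[word] += post["score"]
    return {word: (counts[word], score_sums[word] / counts[word]) for word in counts}


stats = title_word_stats(posts)
print(stats["boss"])
```

The same function can be restricted to a list of people-related terms (for example "friend" or "partner") by filtering the resulting dictionary.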

Idea 3: Examine Controversiality and its Impact on Engagement

  • Business goal: Investigate how controversial posts and comments impact engagement, including scores and discussion volumes.
  • Questions:
    • Does Being Controversial Affect a Post’s Score? Examine how polarizing content influences user interaction, particularly in terms of scores and comment volumes.
    • How Common Are Controversial Comments? Understand the prevalence of controversial comments and their relationship to post engagement.
  • Technical approach:
    • Use Reddit’s metadata to classify posts as controversial or non-controversial. Compare engagement metrics using statistical tests to measure the impact of controversy.
    • Analyze Reddit’s metadata to identify controversial comments. Measure their frequency and assess their correlation with post scores and comment counts.
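
The text does not say which statistical test is used, so the sketch below picks one option as an assumption: a two-sided permutation test on the difference in mean score between controversial and non-controversial posts. The boolean controversial flag and the scores are hypothetical stand-ins for whatever metadata field marks controversy.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=5000, seed=0):
    """Estimate a two-sided p-value for the difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one smoothing keeps the estimated p-value strictly positive.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical posts; "controversial" is an assumed flag derived from metadata.
posts = [
    {"score": 5, "controversial": True},
    {"score": 8, "controversial": True},
    {"score": 12, "controversial": True},
    {"score": 7, "controversial": True},
    {"score": 9, "controversial": True},
    {"score": 6, "controversial": True},
    {"score": 150, "controversial": False},
    {"score": 210, "controversial": False},
    {"score": 180, "controversial": False},
    {"score": 95, "controversial": False},
    {"score": 130, "controversial": False},
    {"score": 170, "controversial": False},
]

controversial_scores = [p["score"] for p in posts if p["controversial"]]
other_scores = [p["score"] for p in posts if not p["controversial"]]
p_value = permutation_test(controversial_scores, other_scores)
print(p_value)
```

A permutation test makes no normality assumption, which suits heavily skewed quantities such as Reddit scores, but any standard two-sample test could be substituted.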

Idea 4: Unpack the Role of Sentiment in Engagement

  • Business goal: Analyze the emotional dynamics of posts, including the type of sentiment that resonates most, alignment between post and audience sentiment, and alignment between titles and content.
  • Questions:
    • What Type of Sentiment Attracts the Most Upvotes? Explore how different emotional tones, such as positivity or negativity, impact audience engagement.
    • Do Audiences Mirror the Author’s Sentiment? Assess the emotional alignment between the sentiment of posts and their corresponding comments.
    • Are Titles and Content Sentiment Aligned? Investigate whether emotional alignment between titles and content affects how stories are received.
  • Technical approach:
    • Perform sentiment analysis on post content using NLP tools. Aggregate sentiment scores and compare them with engagement metrics like upvotes to identify trends.
    • Analyze sentiment scores of posts and comments to determine alignment. Compare aligned and misaligned sentiment cases to measure their effect on engagement.
    • Calculate sentiment scores for both titles and content. Measure the degree of alignment and its relationship to upvotes and comments.
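
To make the idea of sentiment alignment concrete, here is a toy sketch. The word lists are tiny, hand-picked lexicons used only as a stand-in for a real NLP sentiment tool, and the rule "aligned means same polarity label" is an assumed definition.

```python
import re

# Tiny illustrative lexicons; a real pipeline would use a proper sentiment model.
POSITIVE = {"happy", "glad", "love", "funny", "great", "relieved"}
NEGATIVE = {"sad", "ashamed", "hate", "awful", "regret", "terrible"}


def sentiment_score(text):
    """Return (positive - negative) / matched words, in [-1, 1]; 0.0 if none match."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return (pos - neg) / total if total else 0.0


def label(score):
    """Collapse a numeric score into a polarity label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def is_aligned(post_text, comment_texts):
    """True when the post's label matches the label of the mean comment score."""
    if not comment_texts:
        return False
    avg = sum(sentiment_score(c) for c in comment_texts) / len(comment_texts)
    return label(sentiment_score(post_text)) == label(avg)


post = "I regret it and feel awful"
print(is_aligned(post, ["That sounds terrible", "I would hate that"]))
print(is_aligned(post, ["That is so funny", "I love this story"]))
```

The same is_aligned function can compare a title against its post body by passing the body as a one-element list.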

Idea 5: Highlight the Impact of Recurring Emotional Themes

  • Business goal: Investigate why emotions like embarrassment resonate strongly and how they influence audience engagement.
  • Question:
    • The Spread of an Emotion: Embarrassment. Examine why embarrassment is a recurring theme and how it influences resonance with the audience.
  • Technical approach:
    • Isolate posts with high occurrences of embarrassment-related words. Analyze engagement metrics to determine how this emotion drives resonance.
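
One way to isolate embarrassment-heavy posts is to measure the share of words that come from an embarrassment word list and split on a threshold. The word list, the 0.1 threshold, and the sample posts below are all assumptions chosen for the example, not a validated emotion lexicon.

```python
import re
from statistics import mean

# Illustrative word list (an assumption), not a validated emotion lexicon.
EMBARRASSMENT_WORDS = {
    "embarrassed", "embarrassing", "awkward", "mortified", "humiliated", "cringe",
}


def embarrassment_rate(text):
    """Fraction of words in the text that appear in the embarrassment list."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return sum(w in EMBARRASSMENT_WORDS for w in words) / len(words)


def split_by_embarrassment(posts, threshold=0.1):
    """Split posts into (high, low) groups by embarrassment rate."""
    high = [p for p in posts if embarrassment_rate(p["text"]) >= threshold]
    low = [p for p in posts if embarrassment_rate(p["text"]) < threshold]
    return high, low


# Hypothetical posts, for illustration only.
posts = [
    {"text": "I was so embarrassed and mortified", "score": 500},
    {"text": "It was an awkward cringe moment", "score": 300},
    {"text": "I forgot to buy milk on the way home", "score": 40},
]

high, low = split_by_embarrassment(posts)
print(mean(p["score"] for p in high), mean(p["score"] for p in low))
```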

Idea 6: Can We Predict Post Popularity?

  • Business goal: Develop a model to predict post popularity using features such as emotional scores, sentiment, and metadata (e.g., NSFW, edited status).
  • Question:
    • What factors influence whether a post becomes popular?
  • Technical approach:
    • Train a machine learning model on a balanced dataset, including text features and metadata.
    • Address data imbalance (e.g., more non-popular posts) using sampling techniques.
    • Evaluate the model’s performance and refine features to improve accuracy.
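
The text mentions sampling techniques without naming one. As an assumption, the sketch below uses random undersampling: each class is cut down to the size of the smallest class before training. The popular label and the feature names (nsfw, edited) are hypothetical column names.

```python
import random


def undersample(rows, label_key="popular", seed=0):
    """Randomly drop rows so every class has as many rows as the smallest class."""
    rng = random.Random(seed)
    by_label = {}
    for row in rows:
        by_label.setdefault(row[label_key], []).append(row)
    n = min(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, n))
    # Shuffle so the classes are not grouped together in the output.
    rng.shuffle(balanced)
    return balanced


# Hypothetical dataset: 2 popular posts and 8 non-popular posts.
rows = [{"popular": True, "nsfw": False, "edited": True} for _ in range(2)]
rows += [{"popular": False, "nsfw": False, "edited": False} for _ in range(8)]

balanced = undersample(rows)
print(len(balanced), sum(r["popular"] for r in balanced))
```

Undersampling discards data from the majority class; oversampling the minority class or using class weights are common alternatives when the dataset is small.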

Idea 7: Investigate the Relationship Between Comments and Scores

  • Business goal: Explore how the volume of comments and sentiment agreement influence post scores and resonance.
  • Questions:
    • Do More Comments Mean Higher Scores? Investigate whether posts with more comments tend to achieve higher scores, and assess the relationship between these metrics.
    • What Factors Influence Resonance Beyond Upvotes? This question is taken up in the machine learning part of the project.
  • Technical approach:
    • Calculate the correlation between the number of comments and scores using statistical methods. Develop an interaction_score to create a unified measure of user engagement.
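
The correlation step and a possible interaction_score can be sketched as follows. The text does not define interaction_score, so the weighted sum of min-max normalized scores and comment counts used here is purely an assumed definition, and the numbers are made up.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


def min_max(values):
    """Rescale values to [0, 1]; a constant sequence maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def interaction_scores(scores, comments, weight=0.5):
    """Assumed definition: weighted sum of normalized score and comment count."""
    norm_scores, norm_comments = min_max(scores), min_max(comments)
    return [
        weight * s + (1 - weight) * c
        for s, c in zip(norm_scores, norm_comments)
    ]


# Hypothetical per-post values, for illustration only.
scores = [10, 50, 200, 400]
comments = [2, 10, 35, 90]

r = pearson(scores, comments)
interaction = interaction_scores(scores, comments)
print(round(r, 3), interaction)
```

Because scores and comment counts on Reddit tend to be heavily skewed, a rank-based measure such as Spearman correlation may be worth reporting alongside Pearson.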

By combining data-driven insights with storytelling dynamics, this project sheds light on the subtle art of crafting stories that connect, whether through laughter, empathy, or introspection. In the end, we hope to provide a deeper understanding of what makes a story not just popular, but unforgettable.