Introduction
Reddit functions as a massive virtual gathering place where people from diverse backgrounds share their stories, photos, and ideas within specialized communities called subreddits. Graph 1 below shows Reddit’s popularity in major countries such as the USA, Canada, the UK, and India, where people engage in diverse conversations and shape the platform by voting on content, creating a dynamic, human-driven experience that mirrors the diversity of our online world. Plus, looking at graph 2 below, it’s safe to say that a lot of Reddit users are from Gen Z, the young crowd. According to the blog, Entertainment is among the top-indexing interest groups for Gen Z on Reddit. And since movies, TV series, and anime are often made with them in mind, we thought it’d be cool to take a closer look at what people are posting and talking about in Reddit’s big entertainment spots: the movies, television, and anime subreddits.
Additionally, we were curious about how the opinions of regular folks on Reddit about movies, TV shows, and anime compare with what professional critics say. For this purpose, we brought in external datasets, which we discuss in more detail in the next section.
Now, let’s explore the key subreddits central to this project: r/anime, r/television, r/televisionsuggestions, r/movies, r/Animesuggest, and r/MovieSuggestions.
- r/anime: This subreddit is dedicated to anime-related discussions, encompassing a wide range of topics, including reviews, recommendations, and general discourse about the anime medium.
- r/television: This subreddit caters to discussions about television shows, spanning various genres and formats. Users engage in conversations about current and past TV series, share recommendations, and discuss industry news.
- r/movies: This subreddit covers a wide range of topics related to films, including discussions about specific movies, industry news, and analyses of cinematic techniques.
- r/televisionsuggestions: As a subreddit dedicated explicitly to TV show recommendations, this community serves as a hub for users seeking and providing suggestions for what to watch next.
- r/Animesuggest: Similar to r/televisionsuggestions but for anime, this subreddit is dedicated to helping users discover new anime titles based on their preferences.
- r/MovieSuggestions: Serving as a space specifically for movie recommendations, this subreddit is an ideal place for users seeking personalized suggestions for their next film to watch.
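Here is a minimal sketch of how the data could be restricted to these six communities. It assumes each post or comment is loaded as a Python dict with a `subreddit` field. The field name, the helper `filter_target_subreddits`, and the toy records are illustrative assumptions, not the project's actual code. Reddit treats subreddit names case-insensitively, so the sketch compares lowercase names. That way `r/Animesuggest` and `r/animesuggest` match the same entry.

```python
# Hypothetical set of target communities, stored lowercase because
# Reddit treats subreddit names case-insensitively.
TARGET_SUBREDDITS = {
    "anime",
    "television",
    "televisionsuggestions",
    "movies",
    "animesuggest",
    "moviesuggestions",
}


def filter_target_subreddits(records):
    """Keep only records posted in one of the target subreddits.

    Assumes each record is a dict with a 'subreddit' field; records where
    the field is missing or None are dropped.
    """
    return [
        r for r in records
        if (r.get("subreddit") or "").lower() in TARGET_SUBREDDITS
    ]


# Toy records for illustration only.
sample = [
    {"subreddit": "Animesuggest", "title": "Looking for a slice-of-life show"},
    {"subreddit": "gaming", "title": "Best RPG of the year?"},
    {"subreddit": "MovieSuggestions", "title": "Movies like Arrival?"},
]
print([r["subreddit"] for r in filter_target_subreddits(sample)])
# prints ['Animesuggest', 'MovieSuggestions']
```

At the scale of this dataset, the same predicate would be applied inside a distributed job rather than to an in-memory list. The matching logic stays the same.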
Our analysis examines these subreddits, aiming not just to observe trends but to uncover the rhythms of posting and discussion behind them. It blends data science techniques, visualizations, and machine learning to understand user sentiments and interactions.
Data Overview
Reddit Data
The dataset is a whopping 412 GB of posts and 918 GB of comments, like a big storybook with 109 million entries. We organized it carefully by simplifying things: dropping fields we don’t need, converting the raw timestamps into readable dates and times, and storing everything in a format that’s easy to explore, like preparing a canvas before painting. Now it’s all set up and ready for us to start looking through and discovering what’s inside.
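The simplification step can be sketched for a single record as follows. The sketch assumes the raw dump stores the creation time in a `created_utc` field as Unix seconds, which is common in Reddit dumps but is an assumption here. The `KEEP_FIELDS` list and the function name `simplify_record` are hypothetical. The derived `hour_of_day` and `day_of_week_str` columns reuse the variable names from the Objective section.

```python
from datetime import datetime, timezone

# Hypothetical subset of fields to keep; the raw dump has many more.
KEEP_FIELDS = ("id", "subreddit", "author", "title", "score",
               "num_comments", "created_utc")
DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday")


def simplify_record(raw):
    """Drop unused fields and add readable UTC time columns.

    Assumes 'created_utc' is a Unix timestamp in seconds (int or numeric string).
    """
    out = {k: raw[k] for k in KEEP_FIELDS if k in raw}
    ts = datetime.fromtimestamp(int(raw["created_utc"]), tz=timezone.utc)
    out["created_iso"] = ts.isoformat()
    out["hour_of_day"] = ts.hour                      # 0-23, in UTC
    out["day_of_week_str"] = DAY_NAMES[ts.weekday()]  # weekday(): Monday == 0
    return out


# Toy record for illustration only; 'edited' is dropped by the sketch.
raw = {"id": "abc123", "subreddit": "movies", "created_utc": 1700000000,
       "score": 42, "num_comments": 7, "edited": False}
print(simplify_record(raw))
```

The day name comes from a fixed tuple rather than `strftime("%A")`, so the output does not depend on the system locale. The derived hours are in UTC. Any "peak time" found later should therefore be read in UTC or converted to the audience's time zone.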
External Data
The project utilizes datasets from Kaggle’s “Clapper: Massive Rotten Tomatoes Movies and Reviews” and “MyAnimeList” collections. These datasets offer a rich source of information for analyzing trends, preferences, and sentiments in the domain of movies and anime. This approach aims to delve into the intricacies of viewer engagement and sentiment patterns across diverse content genres, providing a comprehensive understanding of audience behavior in the entertainment industry.
The Rotten Tomatoes dataset offers a detailed compilation of movie reviews and ratings, while the MyAnimeList dataset provides insights into anime popularity and viewer preferences. Together, these datasets form a robust foundation for cross-platform analysis.
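The text does not specify how the external ratings are joined to the Reddit data. Below is a hypothetical sketch of one simple approach. It matches titles after normalizing case and punctuation, then compares a critic score with the average Reddit sentiment on a shared 0-1 scale. Several things are assumed here: critic scores on a 0-100 scale, Reddit sentiment already scored per mention in [-1, 1], and the helper names. Naive title matching will also miss remakes and alternate (for example, English vs. Japanese) anime titles.

```python
import re
from statistics import mean


def normalize_title(title):
    """Lowercase, strip punctuation, and collapse whitespace for cross-source matching."""
    cleaned = re.sub(r"[^\w\s]", "", title.lower())
    return " ".join(cleaned.split())


def critic_audience_gap(critic_scores, reddit_sentiments):
    """Hypothetical comparison of critic scores with Reddit sentiment.

    critic_scores: {title: critic score on an assumed 0-100 scale}
        (a 1-10 MyAnimeList score would be divided by 10 instead)
    reddit_sentiments: iterable of (title, sentiment), sentiment in [-1, 1]
    Returns {normalized title: audience - critic} on a shared 0-1 scale;
    positive values mean Reddit is warmer than the critics.
    """
    by_title = {}
    for title, sentiment in reddit_sentiments:
        by_title.setdefault(normalize_title(title), []).append(sentiment)

    gaps = {}
    for title, score in critic_scores.items():
        key = normalize_title(title)
        if key not in by_title:
            continue  # title never matched on Reddit
        audience = (mean(by_title[key]) + 1) / 2  # rescale [-1, 1] -> [0, 1]
        critic = score / 100                      # rescale [0, 100] -> [0, 1]
        gaps[key] = audience - critic
    return gaps


# Toy inputs for illustration only.
print(critic_audience_gap(
    {"Example Film": 80, "Unmentioned Film": 65},
    [("example film", 0.5), ("Example Film!", 0.3)],
))
```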
Data Source 1 - Rotten Tomatoes Movie Review
Data Source 2 - MyAnimeList Anime Review
Objective
Our mission is clear: to decipher the language of entertainment discussions on Reddit, extracting findings that resonate with both seasoned enthusiasts and newcomers to the subject. We focus on the first two objectives, revealing what the data says and interpreting its significance, by addressing the business goals below, each paired with a technical proposal.
Business Goal:
Optimize content strategy by understanding peak engagement times. Knowing when users are most active can guide when to post for maximum visibility and interaction.
Technical Proposal:
Using EDA, we will conduct a time series analysis on the hour_of_day and day_of_week_str data for each subreddit. By aggregating post frequencies based on these time variables, we can visualize trends and identify peak activity periods. This analysis will provide insights into the most favorable times to post content on each subreddit, helping optimize the content strategy for enhanced user interaction and visibility.
Business Goal:
Identify key contributors in order to recognize influential users and understand their content preferences. This insight is valuable for community management and targeted marketing strategies.
Technical Proposal:
Using EDA, analyze author data to quantify each user’s post frequency and engagement (score, number of comments). Apply statistical measures to highlight top contributors, and use content analysis to understand the themes and types of their posts.
Business Goal:
Understand how content characteristics influence user engagement, guiding content creators toward the most engaging content.
Technical Proposal:
Using EDA, we will compare engagement metrics such as score and number of comments to see how they vary with post length, presence of media, and other features. This analysis will show which types of content drive the most engagement, helping content creators tailor their posts to maximize user interaction.
Business Goal:
Discover emerging trends in movie and anime preferences among online communities to drive innovative content creation.
Technical Proposal:
Use NLP techniques to analyze discussion threads and posts on Reddit, applying topic modeling and sentiment analysis to identify key themes and emotional responses in discussions about movies and anime. This can highlight which elements (genres, themes, storytelling techniques) resonate with audiences.
Business Goal:
Craft targeted marketing strategies for films and anime series by understanding the nuances of community preferences and discussions.
Technical Proposal:
Implement a text analysis pipeline to examine the language and topics prevalent in movie and anime subreddits, using machine learning models to process large volumes of text and extract prevalent keywords, phrases, and sentiment trends. This data can inform marketing language, promotional themes, and engagement strategies.
Business Goal:
Identify the gap between critic reviews and audience opinions to better align future productions with audience expectations.
Technical Proposal:
Compare sentiment analysis results from established review platforms (Rotten Tomatoes for movies, MyAnimeList for anime) with community-driven platforms like Reddit. This involves parsing review text and user comments to assess where audience and critic sentiments diverge or align, providing insights into audience preferences and expectations.
Business Goal:
Enhance audience engagement by understanding what drives community discussions and recommendations on online platforms.
Technical Proposal:
Implement a correlation analysis that links Reddit engagement metrics (upvotes, comment counts) to the themes or sentiments expressed in posts. This involves statistical modeling to identify which topics or sentiments are associated with the most engagement, helping tailor content and discussions that resonate more deeply with the audience.
Business Goal:
Enhance content targeting and user engagement strategies on Reddit by accurately categorizing posts into relevant interest groups such as Anime and Movies.
Technical Proposal:
Develop a classifier using features such as post length, presence of media, time of posting, and NLP-derived text features. Classification algorithms such as logistic regression can then assign submissions to categories.
Business Goal:
Maximize content visibility and user interaction on Reddit by predicting how likely a post is to become popular within the community.
Technical Proposal:
Build a model that predicts whether a post will become popular, using features such as posting time, TF-IDF text features, and post length. This involves training classification algorithms that relate content features to popularity.
Business Goal:
Develop advanced analytics tools that help content creators and marketers understand and leverage the engagement dynamics of Reddit posts.
Technical Proposal:
To address the challenges posed by Reddit’s scoring system, we recommend a two-pronged approach. First, a more expressive regression model, such as a decision tree or an ensemble regressor (for example, a random forest or gradient boosting), could better capture the nonlinear relationships between post features and Reddit scores. Second, integrating sentiment analysis and topic modeling into the prediction framework can give a fuller view of engagement, capturing not just the quantitative score but also the qualitative aspects of user interaction. Together, these steps would yield a more robust tool for analyzing and predicting user engagement on Reddit posts.
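The first business goal listed above (peak engagement times) comes down to a grouped count over `hour_of_day` and `day_of_week_str`. Here is a minimal plain-Python sketch. It assumes posts already carry those two fields plus `subreddit`. The function name `peak_posting_slots` and the toy posts are hypothetical, and at this dataset's scale the same grouping would more likely run in a distributed engine.

```python
from collections import Counter, defaultdict


def peak_posting_slots(posts, top_n=3):
    """Rank (day_of_week_str, hour_of_day) slots by post count per subreddit.

    Assumes each post is a dict with 'subreddit', 'day_of_week_str',
    and 'hour_of_day' keys. Returns {subreddit: [((day, hour), count), ...]}
    sorted from busiest to quietest, truncated to top_n slots.
    """
    counts = defaultdict(Counter)
    for post in posts:
        slot = (post["day_of_week_str"], post["hour_of_day"])
        counts[post["subreddit"]][slot] += 1
    return {sub: slot_counts.most_common(top_n)
            for sub, slot_counts in counts.items()}


# Toy posts for illustration only.
posts = (
    [{"subreddit": "movies", "day_of_week_str": "Friday", "hour_of_day": 20}] * 3
    + [{"subreddit": "movies", "day_of_week_str": "Sunday", "hour_of_day": 15}]
    + [{"subreddit": "anime", "day_of_week_str": "Saturday", "hour_of_day": 9}] * 2
)
print(peak_posting_slots(posts))
# prints {'movies': [(('Friday', 20), 3), (('Sunday', 15), 1)], 'anime': [(('Saturday', 9), 2)]}
```

Counting posts measures when people post, which is only a proxy for engagement. Incrementing by each post's `score` or `num_comments` instead of 1 would rank slots by the engagement they attract.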