The popularity of anime is reflected in several statistics. It is estimated that over a third of the world's population watches anime, and about 33% of American adults have a favorable impression of it. Moreover, anime is the third most in-demand subgenre worldwide, with a 5.5% demand share. The popularity of anime as a form of film and television piqued our curiosity and led us to seek a deeper understanding of it through the online community Reddit.
The anime subreddit on Reddit is a popular online community for anime lovers to communicate and share, often referred to as ‘Paradise.’ Members share ideas, recommend or look for anime, make jokes, express love, create memes about certain anime scenes or characters, and more. The dynamics of such a strong and popular community are what we aim to study and understand.
Our project explores anime fandom using a large dataset collected from the anime subreddit between January 1, 2021, and March 1, 2023.
Preliminary Cleaned Submission Data Example
Preliminary Cleaned Submission Data
| 0 | anime | PsychologicalGift299 | NaN | 2021-04-19 20:42:46 | anime movies for 4/20 | so as my fellow ouid smokers know, tomorrow is... | 12 | 0 | 0 | 0 | ... | 2021-04-19 | 20 | 2 | 4 | 2021 | anime movies for 420 | 4 | so as my fellow ouid smokers know tomorrow is ... | 64 | 0 |
| 1 | anime | Tuttles4ever | NaN | 2021-04-19 20:48:42 | i need a very specific type of anime but i'm a... | are there any animes about a person who recent... | 7 | 0 | 0 | 0 | ... | 2021-04-19 | 20 | 2 | 4 | 2021 | i need a very specific type of anime but im al... | 15 | are there any animes about a person who recent... | 42 | 0 |
| 2 | anime | nemifloras | NaN | 2021-04-19 20:52:42 | any atmospheric animes like haibane renmei and... | i finished reassisting haibane r i would like ... | 9 | 0 | 0 | 0 | ... | 2021-04-19 | 20 | 2 | 4 | 2021 | any atmospheric animes like haibane renmei and... | 10 | i finished reassisting haibane r i would like ... | 18 | 0 |
3 rows × 22 columns
Preliminary Cleaned Comment Data Example
Preliminary Cleaned Comment Data
| 0 | anime | DonaldJenkins | NaN | 2021-11-14 04:39:47 | i sent it to ya ;) | 0 | 1 | t1_hk0whi9 | 0 | t3_ov07rq | hkjr7uj | 2021-11-14 | 4 | 1 | 11 | 2021 | i sent it to ya | 6 | 0 |
| 1 | anime | DonMo999 | :MAL:https://myanimelist.net/profile/Weltweiser | 2021-11-14 04:40:25 | displate has some neat anime style good qualit... | 0 | 1 | t3_qtgc12 | 0 | t3_qtgc12 | hkjralc | 2021-11-14 | 4 | 1 | 11 | 2021 | displate has some neat anime style good qualit... | 16 | 0 |
| 2 | anime | OrangeBanana38 | :AMQ::STAR::AL:https://anilist.co/user/OrangeB... | 2021-11-14 04:41:01 | that sounds like work tho\r\n\r\n[](#hardthink) | 0 | 3 | t1_hkjq6wn | 0 | t3_qryjfm | hkjrd4w | 2021-11-14 | 4 | 1 | 11 | 2021 | that sounds like work tho hardthink | 6 | 0 |
We focus on the ‘anime’ subreddit and top-tier anime franchise subreddits such as Pokémon, One Piece, Naruto, Gundam, and Attack on Titan. The ‘anime’ subreddit has 8.6M members. It is a place where anime fans share and post anime-related information, and it gives us a good source of basic information about anime, such as the latest trends and hot topics. In addition, some famous anime have dedicated subreddits, which provide more targeted information about a particular franchise.
The Reddit data we have consists of basic fields for each post (submission or comment) within the community, such as date published, author, text content, and posting status. To add a financial element to the analysis, we also take into account the stock prices of some anime production companies, which offers a deeper view of how anime fandom engagement and sentiment interact with market dynamics in the anime industry.
External Dataset - Stock Price
| 0 | 15.740000 | 100.070000 | 7.735437 | 2021-01-04 |
| 1 | 16.176001 | 103.110001 | 7.793598 | 2021-01-05 |
| 2 | 15.738000 | 101.080002 | 7.968082 | 2021-01-06 |
| 3 | 15.802000 | 102.000000 | 8.026241 | 2021-01-07 |
| 4 | 16.010000 | 103.989998 | 8.113484 | 2021-01-08 |
Our aim is to conduct an in-depth analytical study of anime-related discussions through exploratory data analysis (EDA), natural language processing (NLP), and machine learning (ML) models. We seek insights that help us better understand the attributes of the community. Learning why the community attracts so much discussion will provide valuable information for understanding and sustaining its popularity. Additionally, the insights gained can benefit anime enthusiasts and stakeholders in the anime industry.
We start with an EDA phase, visualizing and interpreting the dataset to build a preliminary understanding of the data. Following the EDA, we employ NLP and ML to explore the dataset further.
Leveraging EDA, NLP, and ML techniques, we will uncover insights across 10 key areas:
10 Topics
EDA
1 Popularity Trend
Business goal: Understand trends within the anime subreddit and top franchise subreddits, with a focus on assessing the popularity of anime. Identify spikes in interest, active discussions, and periods of decline over time.
Technical proposal: Use time-series analysis to track the total submissions and comments in the anime subreddit, as well as the mention frequency of each top franchise subreddit over time. Extract date and/or year-month information from the UTC date. Calculate daily and/or monthly totals of submissions and comments for all selected subreddits. Visualize the daily and monthly counts of submissions and comments over time. Analyze the trends to understand the patterns within these subreddits, and correlate them with external events, such as new season releases or movie adaptations, to uncover the factors driving spikes and declines.
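The daily and monthly aggregation described above can be sketched in plain Python. This is only an illustration: the function name `count_by_period` and the timestamp format are assumptions, and the timestamps are the ones visible in the sample rows shown earlier.

```python
from collections import Counter
from datetime import datetime

# Timestamps copied from the sample rows; the format string is an assumption.
created_utc = [
    "2021-04-19 20:42:46",
    "2021-04-19 20:48:42",
    "2021-04-19 20:52:42",
    "2021-11-14 04:39:47",
]

def count_by_period(timestamps, period_fmt, ts_fmt="%Y-%m-%d %H:%M:%S"):
    """Count posts per period; period_fmt is a strftime pattern such as '%Y-%m-%d'."""
    parsed = (datetime.strptime(ts, ts_fmt) for ts in timestamps)
    return Counter(dt.strftime(period_fmt) for dt in parsed)

daily_counts = count_by_period(created_utc, "%Y-%m-%d")   # one bucket per day
monthly_counts = count_by_period(created_utc, "%Y-%m")    # one bucket per year-month

for period, n in sorted(monthly_counts.items()):
    print(period, n)
```

Sorting the keys gives a chronological series that can be passed to any plotting tool.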
2 Discussion Patterns
Business goal: Compare discussion patterns across hours of the day and days of the week within and across the ‘anime’ subreddit and top franchise subreddits to assess the similarities and differences in discussion intensity regarding various franchises.
Technical proposal: In both the comments and submissions datasets, extract hour and day-of-week data from the UTC timestamps. Group the data by hour of the day and day of the week, and aggregate the counts of user posts within each time range. Visualize the discussion volumes to compare how the discussion patterns vary throughout the day and across days of the week, both within and across the ‘anime’ subreddit and top franchise subreddits.
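As a rough sketch of the grouping step, the snippet below counts posts per (weekday, hour) pair using only the standard library. The helper name `hour_weekday_counts` and the timestamp format are assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime

# datetime.weekday() returns 0 for Monday through 6 for Sunday.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def hour_weekday_counts(timestamps, ts_fmt="%Y-%m-%d %H:%M:%S"):
    """Count posts for each (weekday name, hour of day) pair."""
    counts = Counter()
    for ts in timestamps:
        dt = datetime.strptime(ts, ts_fmt)
        counts[(WEEKDAYS[dt.weekday()], dt.hour)] += 1
    return counts

# Timestamps copied from the sample rows shown earlier.
created_utc = [
    "2021-04-19 20:42:46",
    "2021-04-19 20:48:42",
    "2021-04-19 20:52:42",
    "2021-11-14 04:39:47",
]

counts = hour_weekday_counts(created_utc)
for (day, hour), n in sorted(counts.items()):
    print(day, hour, n)
```

The resulting counts map directly onto a weekday-by-hour heatmap, one cell per key.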
3 User Engagement with Content metrics
Business goal: Understand the user engagement indicators and patterns based on content characteristics like length and quality in subreddits, and identify variations in user interactions.
Technical proposal: Clean text and remove stopwords. Transform the data to extract additional content characteristics such as word counts and the presence of specific keywords. Compare content metrics for both submissions and comments, including scores, submission and comment counts, stickied post status, and word counts. Use visualizations to highlight the differences in user engagement patterns across these content characteristics.
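A minimal sketch of the cleaning and word-count step follows. The cleaning rules (lowercase, drop everything except letters, digits, and whitespace) and the tiny stopword set are assumptions chosen for illustration, not the project's actual pipeline, although they do reproduce the cleaned strings visible in the sample rows.

```python
import re

# Hypothetical stopword list; a real run would use a fuller one.
STOPWORDS = {"a", "an", "the", "for", "to", "it", "i", "and", "of"}

def clean_text(text):
    """Lowercase, strip punctuation, and collapse runs of whitespace."""
    stripped = re.sub(r"[^a-z0-9\s]", "", text.lower())
    return " ".join(stripped.split())

def remove_stopwords(cleaned):
    """Return the tokens of an already-cleaned string that are not stopwords."""
    return [tok for tok in cleaned.split() if tok not in STOPWORDS]

def word_count(cleaned):
    """Number of whitespace-separated tokens."""
    return len(cleaned.split())

title = "anime movies for 4/20"
cleaned = clean_text(title)
print(cleaned, word_count(cleaned), remove_stopwords(cleaned))
```

Counting words before stopword removal keeps the length feature comparable across posts; whether to count before or after is a design choice.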
NLP
4 Sentiment Analysis
Business goal: Analyze the sentiment expressed in user posts within the anime subreddit to discern the overall tone of submissions and comments, categorizing them as positive or negative.
Technical proposal: Tokenize and normalize user posts from both the submissions and comments datasets in the anime subreddit. Apply pre-trained sentiment analysis models to classify submission titles, submission contents, and comments into positive and negative categories.
5 Analysis of Anime Themes in Subreddit Discussions
Business goal: Investigate various anime themes as they appear in subreddit discussions. Categorize these themes and analyze the conversations happening in different anime-related subreddits.
Technical proposal: Employ regular expressions (regex) to identify and categorize five distinct anime themes within subreddit texts. The text processing will focus on simplifying the language (lemmatization) and removing irrelevant words (stopwords). NLP tools will be used to assess the sentiments expressed in these texts. The method involves converting subreddit posts into simple word lists (bag-of-words) and using topic modeling to uncover the main themes or topics. The findings will be organized in a table, allowing for easy comparison. The analysis will use percentages to compare themes and sentiments across various anime subreddits, considering the different popularity levels of each franchise.
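The regex tagging and percentage comparison can be sketched as follows. The theme names, keyword patterns, and example posts are placeholders invented for illustration, since the five themes are not listed here; only the first example title comes from the sample rows.

```python
import re
from collections import Counter

# Placeholder themes and patterns (assumptions, not the project's actual five themes).
THEME_PATTERNS = {
    "recommendation": re.compile(r"\b(recommend\w*|suggest\w*|looking for)\b"),
    "atmospheric": re.compile(r"\batmospheric\b"),
    "movie": re.compile(r"\bmovies?\b"),
}

def tag_themes(text):
    """Return every theme whose pattern matches the lowercased text."""
    lowered = text.lower()
    return [name for name, pattern in THEME_PATTERNS.items() if pattern.search(lowered)]

def theme_shares(texts):
    """Percentage of posts mentioning each theme; a post may count toward several."""
    if not texts:
        return {name: 0.0 for name in THEME_PATTERNS}
    counts = Counter(theme for text in texts for theme in tag_themes(text))
    return {name: 100.0 * counts[name] / len(texts) for name in THEME_PATTERNS}

posts = [
    "anime movies for 420",
    "can anyone recommend a slow atmospheric show",  # made-up example
    "best fight scenes this season",                 # made-up example
]
print(theme_shares(posts))
```

Reporting shares rather than raw counts keeps subreddits of very different sizes comparable.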
6 In-Depth Study of a Specific Anime Theme or Franchise - Relationships Between Reddit User Responses and Key Events
Business goal: Following the broader analysis in the previous topic, this part focuses on a single anime theme or franchise identified as particularly notable. The aim is to gain a deeper understanding of the trends, conversations, and sentiments within its subreddit community.
Technical proposal: Select one theme or franchise from the earlier analysis for a detailed study. The same NLP tool will be used for sentiment analysis to ensure consistency. A time series analysis of subreddit interactions, including both comments and posts, will be conducted. This analysis will look at patterns based on the time of day and day of the week when these interactions occur. The goal is to go beyond the general sentiment and uncover how sentiments vary over time and across discussion contexts within the subreddit. We will also research which key events coincide with the sentiment peaks and valleys, to find potential relationships between Reddit activity and the events that may have triggered it.
7 Comment’s Controversiality Classification
Business goal: Identify the level of controversiality of comments in the anime subreddit.
Technical proposal: Perform text tokenization. Apply NLP text classification techniques to analyze the content of the anime subreddit. Classify comments into “Controversial” and “Non-Controversial” groups.
ML
8 Predicting Comments’ Controversiality
Business goal: Develop predictive models to estimate the controversiality of comments in the r/anime subreddit, helping to identify comments likely to generate debate and discussion based on various content features.
Technical proposal: Train classification models on features selected for the target variable, controversiality. The predictors include variables related to the sentiment of the published text, such as sentiment category, and basic information about the text, such as the number of words and stickied status. Evaluate the models' performance using the ROC curve and confusion matrix.
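To make the evaluation step concrete, here is a small standard-library sketch that builds a confusion matrix and ROC points from predicted probabilities. The labels and scores are toy values, the function names are assumptions, and a real run would more likely rely on an ML library's built-in metrics.

```python
def confusion_matrix(y_true, y_pred):
    """Counts of true/false positives/negatives for binary labels (1 = controversial)."""
    pairs = list(zip(y_true, y_pred))
    return {
        "tp": sum(1 for t, p in pairs if t == 1 and p == 1),
        "fp": sum(1 for t, p in pairs if t == 0 and p == 1),
        "fn": sum(1 for t, p in pairs if t == 1 and p == 0),
        "tn": sum(1 for t, p in pairs if t == 0 and p == 0),
    }

def roc_points(y_true, scores):
    """(false positive rate, true positive rate) at each distinct score threshold."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("ROC needs at least one example of each class")
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        cm = confusion_matrix(y_true, [1 if s >= thr else 0 for s in scores])
        points.append((cm["fp"] / neg, cm["tp"] / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

y_true = [0, 0, 1, 1]            # toy labels
scores = [0.1, 0.4, 0.35, 0.8]   # toy predicted probabilities
cm = confusion_matrix(y_true, [1 if s >= 0.5 else 0 for s in scores])
print(cm, auc(roc_points(y_true, scores)))
```

When one class is much rarer than the other, accuracy alone can look good for a useless model, which is one reason to inspect the confusion matrix alongside the ROC curve.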
9 Predicting Reddit Post Scores
Business goal: Build a forecasting model to estimate the score of submissions in the anime subreddit, which helps identify submissions likely to generate high engagement.
Technical proposal: Apply a vectorizer and other transformers to the selected features. Log-transform the target variable, score, to avoid negative predictions. Create a regression model using features such as word counts and number of comments. Evaluate the model's performance using metrics such as mean absolute error and R-squared to determine how effective the score forecast is given the selected features.
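The log transform and the two metrics can be sketched without any ML library. This assumes a `log1p`/`expm1` pair so that a score of 0 is handled, and clamps the back-transformed value at 0 because `expm1` alone only guarantees a result above -1. The function names and the example scores are illustrative, not taken from the project.

```python
import math

def to_log(scores):
    """Transform non-negative scores with log(1 + score)."""
    return [math.log1p(s) for s in scores]

def from_log(predictions):
    """Invert the transform and clamp at 0 so predicted scores are never negative."""
    return [max(0.0, math.expm1(p)) for p in predictions]

def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot; undefined when all true values are identical."""
    mean_true = sum(y_true) / len(y_true)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined for a constant target")
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return 1 - ss_res / ss_tot

scores = [0, 3, 25, 140]                            # made-up submission scores
log_predictions = [0.2, 1.1, 3.0, 5.2]              # made-up model output in log space
predicted_scores = from_log(log_predictions)
print(mean_absolute_error(scores, predicted_scores), r_squared(scores, predicted_scores))
```

Metrics computed on the back-transformed scores are easier to interpret than metrics in log space, though reporting both is a reasonable choice.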
10 Predicting Stock Prices Based on Franchise Popularity
Business goal: Forecast the stock prices of anime studios by analyzing the popularity trends of related franchise subreddits.
Technical proposal: Use metrics such as the number of comments and submissions on these subreddits to predict stock market fluctuations over time. Implement time series analysis techniques that capture both trends and seasonal variations in subreddit engagement metrics. Our analysis will include relevant features such as the time of day, day of the week, and historical popularity data. We will then integrate these factors with the external stock price data of corresponding anime studios. Our primary modeling approach involves utilizing linear regression and ARIMA models to establish a relationship between subreddit popularity metrics and stock prices.
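As a simplified illustration of the linear regression part, the sketch below joins daily subreddit counts to closing prices by date and fits a one-variable least-squares line. All numbers are made up so that the fit is exact, the helper names are assumptions, and the ARIMA component is not shown.

```python
def align_by_date(counts_by_date, price_by_date):
    """Keep only dates present in both series (markets are closed on some days)."""
    dates = sorted(set(counts_by_date) & set(price_by_date))
    return [counts_by_date[d] for d in dates], [price_by_date[d] for d in dates]

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must vary to fit a slope")
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical daily post counts; 2021-01-09 is a Saturday with no price.
counts_by_date = {"2021-01-04": 100, "2021-01-05": 120, "2021-01-06": 90,
                  "2021-01-07": 110, "2021-01-09": 300}
# Hypothetical prices constructed as 10 + 0.05 * count.
price_by_date = {"2021-01-04": 15.0, "2021-01-05": 16.0,
                 "2021-01-06": 14.5, "2021-01-07": 15.5}

x, y = align_by_date(counts_by_date, price_by_date)
slope, intercept = fit_line(x, y)
print(slope, intercept)
```

Real price series are autocorrelated, so a plain regression like this is best treated as a baseline next to a time series model.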