Introduction

Anime's popularity is reflected in several statistics. By some estimates, over a third of the world's population watches anime, and about 33% of American adults have a favorable impression of it. Anime is also the third most in-demand subgenre worldwide, with a 5.5% demand share. This popularity piqued our curiosity and motivated us to gain a deeper understanding of anime fandom through the online community Reddit.

The anime subreddit on Reddit is a popular online community for anime lovers to communicate and share, often referred to as ‘Paradise.’ Members share ideas, recommend or look for anime, make jokes, express love, create memes about certain anime scenes or characters, and more. The dynamics of such a strong and popular community are what we aim to study and understand.

Our project explores anime fandom using large-scale data collected from the anime subreddit between January 1, 2021, and March 1, 2023.

Preliminary Cleaned Submission Data Example

Preliminary Cleaned Submission Data
| | subreddit | author | author_flair_text | created_utc | title | selftext | num_comments | num_crossposts | over_18 | score | ... | created_date | created_hour | created_week | created_month | created_year | cleaned_title | title_wordCount | cleaned_selftext | selftext_wordCount | contain_pokemon |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | anime | PsychologicalGift299 | NaN | 2021-04-19 20:42:46 | anime movies for 4/20 | so as my fellow ouid smokers know, tomorrow is... | 12 | 0 | 0 | 0 | ... | 2021-04-19 | 20 | 2 | 4 | 2021 | anime movies for 420 | 4 | so as my fellow ouid smokers know tomorrow is ... | 64 | 0 |
| 1 | anime | Tuttles4ever | NaN | 2021-04-19 20:48:42 | i need a very specific type of anime but i'm a... | are there any animes about a person who recent... | 7 | 0 | 0 | 0 | ... | 2021-04-19 | 20 | 2 | 4 | 2021 | i need a very specific type of anime but im al... | 15 | are there any animes about a person who recent... | 42 | 0 |
| 2 | anime | nemifloras | NaN | 2021-04-19 20:52:42 | any atmospheric animes like haibane renmei and... | i finished reassisting haibane r i would like ... | 9 | 0 | 0 | 0 | ... | 2021-04-19 | 20 | 2 | 4 | 2021 | any atmospheric animes like haibane renmei and... | 10 | i finished reassisting haibane r i would like ... | 18 | 0 |

3 rows × 22 columns

Preliminary Cleaned Comment Data Example

Preliminary Cleaned Comment Data
| | subreddit | author | author_flair_text | created_utc | body | controversiality | score | parent_id | stickied | link_id | id | created_date | created_hour | created_week | created_month | created_year | cleaned | body_wordCount | contain_pokemon |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | anime | DonaldJenkins | NaN | 2021-11-14 04:39:47 | i sent it to ya ;) | 0 | 1 | t1_hk0whi9 | 0 | t3_ov07rq | hkjr7uj | 2021-11-14 | 4 | 1 | 11 | 2021 | i sent it to ya | 6 | 0 |
| 1 | anime | DonMo999 | :MAL:https://myanimelist.net/profile/Weltweiser | 2021-11-14 04:40:25 | displate has some neat anime style good qualit... | 0 | 1 | t3_qtgc12 | 0 | t3_qtgc12 | hkjralc | 2021-11-14 | 4 | 1 | 11 | 2021 | displate has some neat anime style good qualit... | 16 | 0 |
| 2 | anime | OrangeBanana38 | :AMQ::STAR::AL:https://anilist.co/user/OrangeB... | 2021-11-14 04:41:01 | that sounds like work tho\r\n\r\n[](#hardthink) | 0 | 3 | t1_hkjq6wn | 0 | t3_qryjfm | hkjrd4w | 2021-11-14 | 4 | 1 | 11 | 2021 | that sounds like work tho hardthink | 6 | 0 |

We focus on the ‘anime’ subreddit and on subreddits dedicated to top franchises such as Pokémon, One Piece, Naruto, Gundam, and Attack on Titan. The ‘anime’ subreddit has 8.6M members. It is a place where fans share and post anime-related information, which also makes it a good source of general information about anime, such as the latest trends and hot topics. In addition, some well-known anime have their own dedicated subreddits, which provide more targeted information about a particular title.

The Reddit data consists of basic fields for each post (submission or comment) in the community, such as publication date, author, text content, and posting status. To add a financial dimension to the analysis, we also include the stock prices of several anime production companies, which lets us examine how fandom engagement and sentiment relate to market dynamics in the anime industry.

External Dataset - Stock Price
| | NTDOY | SONY | TYO | Date |
|---|---|---|---|---|
| 0 | 15.740000 | 100.070000 | 7.735437 | 2021-01-04 |
| 1 | 16.176001 | 103.110001 | 7.793598 | 2021-01-05 |
| 2 | 15.738000 | 101.080002 | 7.968082 | 2021-01-06 |
| 3 | 15.802000 | 102.000000 | 8.026241 | 2021-01-07 |
| 4 | 16.010000 | 103.989998 | 8.113484 | 2021-01-08 |

Our aim is to conduct an in-depth analysis of anime-related discussions using exploratory data analysis (EDA), natural language processing (NLP), and machine learning (ML) models. We seek insights into the attributes of the community. Understanding why it attracts so much discussion can help explain and sustain its popularity. These insights may also benefit anime enthusiasts and stakeholders in the anime industry.

We start with an EDA phase, visualizing and interpreting the dataset to build a preliminary understanding of the data. We then apply NLP and ML to explore the dataset further.

Leveraging EDA, NLP, and ML techniques, we will uncover insights across 10 key areas:

10 Topics

EDA

1 Popularity Trend

Business goal: Understand trends within the anime subreddit and top franchise subreddits, with a focus on assessing the popularity of anime. Identify spikes in interest, active discussions, and periods of decline over time.

Technical proposal: Use time-series analysis to track total submissions and comments in the anime subreddit, as well as the mention frequency of each top franchise subreddit over time. Extract the date and/or year-month from the UTC timestamp. Calculate daily and/or monthly totals of submissions and comments for all selected subreddits, and visualize these counts over time. Analyze the trends to understand the patterns within these subreddits, and correlate them with external events, such as new season releases or movie adaptations, to uncover the factors behind spikes and declines.
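
The counting step described above can be sketched as follows. This is a minimal illustration that assumes pandas and a few made-up rows; the column names `subreddit` and `created_utc` follow the cleaned tables shown earlier, while `year_month` and `n_posts` are names introduced here for illustration. The actual pipeline may use a different engine.

```python
import pandas as pd

# Hypothetical sample rows; real input would be the cleaned submissions or comments table.
posts = pd.DataFrame({
    "subreddit": ["anime", "anime", "anime", "pokemon"],
    "created_utc": pd.to_datetime([
        "2021-04-19 20:42:46",
        "2021-04-19 20:48:42",
        "2021-05-02 09:15:00",
        "2021-04-19 11:00:00",
    ]),
})

# Derive the calendar date and a year-month label from the UTC timestamp.
posts["created_date"] = posts["created_utc"].dt.date
posts["year_month"] = posts["created_utc"].dt.to_period("M").astype(str)

# Count posts per subreddit per day and per month.
daily_counts = (
    posts.groupby(["subreddit", "created_date"]).size().rename("n_posts").reset_index()
)
monthly_counts = (
    posts.groupby(["subreddit", "year_month"]).size().rename("n_posts").reset_index()
)

print(monthly_counts)
```

The resulting long-format tables can be passed directly to a plotting library, with one line per subreddit.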

2 Discussion Patterns

Business goal: Compare discussion patterns across hours of the day and days of the week within and across the ‘anime’ subreddit and top franchise subreddits to assess the similarities and differences in discussion intensity regarding various franchises.

Technical proposal: In both the comments and submissions datasets, extract hour and day-of-week data from the UTC timestamps. Group the data by hour of the day and day of the week, and aggregate the counts of user posts within each time range. Visualize the discussion volumes to examine how discussion patterns vary throughout the day and across days of the week, both within and across the ‘anime’ subreddit and top franchise subreddits.
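
As a rough sketch of the grouping step, the snippet below builds a day-of-week by hour count table with pandas. The timestamps are taken from the sample rows shown earlier; the names `comments`, `hour`, `day_of_week`, and `volume` are illustrative assumptions, not the project's actual identifiers.

```python
import pandas as pd

# Timestamps borrowed from the sample rows above; a real run would use the full dataset.
comments = pd.DataFrame({
    "created_utc": pd.to_datetime([
        "2021-11-14 04:39:47",
        "2021-11-14 04:40:25",
        "2021-11-14 04:41:01",
        "2021-04-19 20:42:46",
    ]),
})

# Extract the hour (0-23) and the weekday name from the UTC timestamp.
comments["hour"] = comments["created_utc"].dt.hour
comments["day_of_week"] = comments["created_utc"].dt.day_name()

# Cross-tabulate: rows are weekdays, columns are hours, cells are post counts.
volume = pd.crosstab(comments["day_of_week"], comments["hour"])

# Put weekdays in calendar order and fill days with no posts with zeros.
day_order = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
volume = volume.reindex(day_order, fill_value=0)

print(volume)
```

A table in this shape maps naturally onto a heatmap, and computing one table per subreddit allows side-by-side comparison.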

3 User Engagement with Content Metrics

Business goal: Understand the user engagement indicators and patterns based on content characteristics like length and quality in subreddits, and identify variations in user interactions.

Technical proposal: Clean text and remove stopwords. Transform the data to extract additional content characteristics such as word counts and the presence of specific keywords. Compare content metrics for both submissions and comments, including scores, submission and comment counts, stickied post status, and word counts. Use visualizations to highlight the differences in user engagement patterns across these content characteristics.
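
The cleaning and feature-extraction steps might look like the sketch below, which uses only the Python standard library. The cleaning rule (lowercase, drop punctuation, collapse whitespace) is an assumption chosen because it reproduces the `cleaned_title` and `cleaned` samples shown earlier; the stopword list is a tiny illustrative stand-in, and all function names are hypothetical.

```python
import re

# Tiny illustrative stopword list; a real analysis would use a fuller list.
STOPWORDS = {"a", "an", "the", "for", "is", "it", "to", "i", "so", "as", "my"}


def clean_text(text: str) -> str:
    """Lowercase, drop punctuation and symbols, and collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", "", text)   # remove anything that is not a word char or whitespace
    return re.sub(r"\s+", " ", text).strip()


def remove_stopwords(text: str) -> str:
    """Drop stopwords from an already cleaned string."""
    return " ".join(word for word in text.split() if word not in STOPWORDS)


def word_count(text: str) -> int:
    """Count whitespace-separated tokens."""
    return len(text.split())


def contains_keyword(text: str, keyword: str) -> int:
    """Return 1 if the keyword appears as a whole word, else 0."""
    return int(re.search(rf"\b{re.escape(keyword)}\b", text) is not None)


title = clean_text("anime movies for 4/20")
print(title, word_count(title), contains_keyword(title, "pokemon"))
print(remove_stopwords(title))
```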

NLP

4 Sentiment Analysis

Business goal: Analyze the sentiment expressed in user posts within the anime subreddit to discern the overall tone of these submissions and comments, categorizing them as positive or negative.

Technical proposal: Tokenize and normalize user posts from both the submissions and comments datasets in the anime subreddit. Apply pre-trained sentiment analysis models to classify submission titles, submission contents, and comments into positive and negative categories.

5 Analysis of Anime Themes in Subreddit Discussions

Business goal: Investigate various anime themes as they appear in subreddit discussions. Categorize these themes and analyze the conversations happening in different anime-related subreddits.

Technical proposal: Employ regular expressions (regex) to identify and categorize five distinct anime themes within subreddit texts. The text processing will focus on simplifying the language (lemmatization) and removing irrelevant words (stopwords). NLP tools will be used to assess the sentiments expressed in these texts. The method involves converting subreddit posts into simple word lists (bag-of-words) and using topic modeling to uncover the main themes or topics. The findings will be organized in a table, allowing for easy comparison. The analysis will use percentages to compare themes and sentiments across various anime subreddits, considering the different popularity levels of each franchise.
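
A minimal sketch of the regex tagging and percentage comparison is given below. The five patterns are hypothetical and simply reuse the franchise names mentioned earlier in this introduction; the actual themes and patterns used in the analysis may differ. Sentiment scoring and topic modeling are not shown.

```python
import re
from collections import Counter

# Hypothetical theme patterns, keyed by an illustrative theme name.
THEME_PATTERNS = {
    "pokemon": re.compile(r"\bpok[eé]mon\b", re.IGNORECASE),
    "one_piece": re.compile(r"\bone piece\b", re.IGNORECASE),
    "naruto": re.compile(r"\bnaruto\b", re.IGNORECASE),
    "gundam": re.compile(r"\bgundam\b", re.IGNORECASE),
    "attack_on_titan": re.compile(r"\battack on titan\b", re.IGNORECASE),
}


def tag_themes(text: str) -> list[str]:
    """Return every theme whose pattern matches the text."""
    return [name for name, pattern in THEME_PATTERNS.items() if pattern.search(text)]


def theme_shares(texts: list[str]) -> dict[str, float]:
    """Percentage of texts mentioning each theme (a text can count toward several)."""
    counts = Counter(theme for text in texts for theme in tag_themes(text))
    total = len(texts)
    if total == 0:
        return {name: 0.0 for name in THEME_PATTERNS}
    return {name: 100.0 * counts[name] / total for name in THEME_PATTERNS}


sample = ["I love Naruto", "Gundam and naruto", "nothing here", "Pokémon go"]
print(theme_shares(sample))
```

Reporting shares rather than raw counts keeps subreddits of very different sizes comparable.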

6 In-Depth Study of a Specific Anime Theme or Franchise - Relationships Between Reddit User Responses and Key Events

Business goal: Following the broader analysis in the previous topic, this part focuses on a single anime theme or franchise identified as particularly notable. The aim is to gain a deeper understanding of the trends, conversations, and sentiments within its subreddit community.

Technical proposal: Select one theme or franchise from the earlier analysis for a detailed study. The same NLP tool will be used for sentiment analysis to ensure consistency. A time-series analysis of subreddit interactions, including both comments and posts, will be conducted, looking at patterns by time of day and day of the week. The goal is to go beyond the general sentiment and uncover how sentiment varies over time and across discussion contexts within the subreddit. We will also research which key events occurred during the sentiment peaks and valleys, to identify potential relationships between Reddit activity and the events that triggered it.

7 Comment Controversiality Classification

Business goal: Identify the level of controversiality in comments in the anime subreddit.

Technical proposal: Perform text tokenization. Apply NLP text-classification techniques to the content of the anime subreddit. Classify comments into “Controversial” and “Non-Controversial” groups.

ML

8 Predicting Comments’ Controversiality

Business goal: Develop predictive models to estimate the controversiality of comments in the r/anime subreddit, helping to identify comments likely to generate debate and discussion based on various content features.

Technical proposal: Train classification models with controversiality as the target variable. Predictors include variables related to the sentiment of the text, such as sentiment category, and basic information about the text, such as word count and stickied status. Evaluate the model’s performance using the ROC curve and confusion matrix.
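
The training and evaluation loop could be sketched as follows with scikit-learn. Everything here is an assumption for illustration: the data is synthetic, the feature names only mirror the predictors listed above, and logistic regression with balanced class weights is one possible model choice rather than the model used in the project.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in features: word count, a negative-sentiment flag, stickied status.
rng = np.random.default_rng(0)
n = 400
word_count = rng.integers(1, 200, size=n)
sentiment_negative = rng.integers(0, 2, size=n)
stickied = rng.integers(0, 2, size=n)

# Synthetic target: longer, more negative comments are more likely to be controversial.
logit = 0.02 * word_count + 1.0 * sentiment_negative - 3.0
controversial = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

X = np.column_stack([word_count, sentiment_negative, stickied])
X_train, X_test, y_train, y_test = train_test_split(
    X, controversial, test_size=0.25, random_state=0, stratify=controversial
)

# Scale features, then fit a classifier that reweights the classes.
model = make_pipeline(StandardScaler(), LogisticRegression(class_weight="balanced"))
model.fit(X_train, y_train)

# Evaluate with the ROC curve (summarized by AUC) and the confusion matrix.
proba = model.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, proba)
auc = roc_auc_score(y_test, proba)
cm = confusion_matrix(y_test, model.predict(X_test))

print(f"AUC: {auc:.3f}")
print(cm)
```

Class reweighting is included because controversial comments are typically a small minority, in which case plain accuracy would be a misleading metric.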

9 Predicting Reddit Post Scores

Business goal: Build a forecasting model to estimate the score of submissions in the anime subreddit, helping to identify submissions likely to generate high engagement.

Technical proposal: Apply a vectorizer and other transformers to the selected features. Log-transform the target variable, score, to avoid negative predictions. Create a regression model using features such as word counts and number of comments. Evaluate the model’s performance using metrics such as mean absolute error and R-squared to determine how well the selected features forecast scores.
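
One way to apply the log transform is scikit-learn's `TransformedTargetRegressor`, sketched below on synthetic data. The choice of `log1p`/`expm1`, the linear model, and the two numeric features are assumptions made for illustration; text vectorization is omitted, and the sketch assumes scores are non-negative.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in features and a non-negative, right-skewed score.
rng = np.random.default_rng(42)
n = 500
title_word_count = rng.integers(1, 30, size=n)
num_comments = rng.integers(0, 300, size=n)
noise = rng.normal(0, 0.3, size=n)
score = np.round(np.expm1(0.01 * num_comments + 0.02 * title_word_count + noise)).clip(min=0)

X = np.column_stack([title_word_count, num_comments])
X_train, X_test, y_train, y_test = train_test_split(X, score, test_size=0.2, random_state=42)

# Fit on log1p(score); predictions are mapped back to the score scale with expm1.
model = TransformedTargetRegressor(
    regressor=LinearRegression(), func=np.log1p, inverse_func=np.expm1
)
model.fit(X_train, y_train)
pred = model.predict(X_test)

mae = mean_absolute_error(y_test, pred)
r2 = r2_score(y_test, pred)
print(f"MAE: {mae:.2f}, R-squared: {r2:.3f}")
```

Note that with `expm1` as the inverse, predictions are bounded below by -1 rather than 0, so a final clip at zero may be needed if strictly non-negative forecasts are required.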

10 Predicting Stock Prices Based on Franchise Popularity

Business goal: Forecast the stock prices of anime studios by analyzing the popularity trends of related franchise subreddits.

Technical proposal: Use metrics such as the number of comments and submissions on these subreddits to predict stock market fluctuations over time. Implement time series analysis techniques that capture both trends and seasonal variations in subreddit engagement metrics. Our analysis will include relevant features such as the time of day, day of the week, and historical popularity data. We will then integrate these factors with the external stock price data of corresponding anime studios. Our primary modeling approach involves utilizing linear regression and ARIMA models to establish a relationship between subreddit popularity metrics and stock prices.
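
The data-alignment and linear-regression part of this proposal can be sketched as below. The NTDOY prices are the five rows from the external stock table above; the daily submission counts are made up, and the one-day lag feature `n_posts_lag1` is an illustrative assumption. The ARIMA model is not shown here.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Stock prices from the external dataset sample (trading days only).
stock = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2021-01-04", "2021-01-05", "2021-01-06", "2021-01-07", "2021-01-08"]
    ),
    "NTDOY": [15.740000, 16.176001, 15.738000, 15.802000, 16.010000],
})

# Hypothetical daily submission counts for a franchise subreddit (weekends included).
activity = pd.DataFrame({
    "Date": pd.date_range("2021-01-02", "2021-01-08", freq="D"),
    "n_posts": [120, 135, 150, 170, 140, 155, 165],
})

# Keep only dates present in both tables, which drops non-trading days.
merged = stock.merge(activity, on="Date", how="inner").sort_values("Date")

# Use the previous trading day's activity as the predictor.
merged["n_posts_lag1"] = merged["n_posts"].shift(1)
train = merged.dropna(subset=["n_posts_lag1"])

model = LinearRegression().fit(train[["n_posts_lag1"]], train["NTDOY"])
fitted = model.predict(train[["n_posts_lag1"]])
print(model.coef_, model.intercept_)
```

With so few rows the fitted coefficients are meaningless; the point is the join on trading dates and the use of lagged activity so that the predictor precedes the price it is meant to explain.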