Executive summary

In our analysis of the r/anime community, we found a balanced mix of positive and negative sentiment. Just over half of the submission titles carry a positive tone, which suggests that discussions tend to start on an upbeat note. We also found that non-controversial comments far outnumber controversial ones, highlighting the community’s preference for calm discussion despite its large size.

Our study also ventured into sentiment analysis over time for popular anime series like One Piece, Pokémon, and Naruto. We observed a declining trend in positive sentiment for some series, indicating changing interests or perceptions among fans. For Pokémon, despite high engagement, there was a notable amount of negative sentiment, suggesting complex fan relationships with the series.

Furthermore, our analysis of Pokémon subreddit sentiments over time revealed interesting patterns. We noticed more positive sentiments in comments than in submissions, suggesting a difference in how users express satisfaction or criticism. Our findings also highlighted certain events that caused significant shifts in sentiment, like the Pokémon Go event featuring Latias and Latios and the Eevee Day events, which influenced fan discussions and reactions.

Lastly, our examination of sentiment by day of the week and hour of the day for Pokémon-related discussions uncovered patterns linked to the community’s engagement rhythms. Weekdays generally saw more positive interactions, while weekends had more critical discussions. Similarly, positive sentiments were more frequent in the afternoon and early evening, while late evenings saw an increase in negative sentiments.

These insights into the anime community’s sentiment dynamics offer a clearer understanding of the nature of fan engagement and the factors influencing fans’ perceptions and discussions.

Analysis report

Basic Text Analysis

Table1. Submission Title Length
text_length count
0 <=05 28704
1 <=10 52709
2 <=15 18678
3 <=20 6262
4 <=25 2171
5 <=30 838
6 <=40 596
7 <=50 183
8 >50 106

Figure1. Submission Title Length Distribution
Table2. Submission Selftext Length
text_length count
0 <=10 7524
1 <=20 11460
2 <=30 12833
3 <=40 11887
4 <=50 9311
5 <=60 7986
6 >60 49246

Figure2. Submission Selftext Length Distribution
Table3. Comment Body Length
text_length count
0 <=010 2384465
1 <=030 2282496
2 <=050 881447
3 <=070 414939
4 <=100 365355
5 <=140 232981
6 <=200 182853
7 >200 134583

Figure3. Comment Body Length Distribution

Figure 1 shows the distribution of submission title length across nine groups. The distribution is right-skewed: about 74% of titles are 10 words or fewer. Titles of 6 to 10 words are the most common, and titles of 5 words or fewer form the second most common group. This pattern contrasts with submission selftext and comments, which is not surprising: users typically keep titles concise and to the point and put the details in the selftext.

Figure 2 shows the distribution of submission selftext length across seven groups. Unlike titles, selftext tends to be long: the group of more than 60 words is by far the largest, at about 45% of submissions. Because the title only states the main idea, users write more in the selftext to fully explain their submissions to other users.

Figure 3 shows the distribution of comment body length across eight groups. The first two groups, which cover comments of 30 words or fewer, are the most common and together account for about two thirds of all comments. Comments of 31 to 50 words are also common. Given that comments serve as a platform for discussing submission topics, varied lengths are not surprising. Still, comments tend to be short, which fosters a back-and-forth dynamic.
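The bucketing behind Tables 1 to 3 can be sketched as follows. This is a minimal illustration, not the report's actual code: it assumes length is measured in whitespace-separated words, the helper names `bucket_label` and `length_distribution` are hypothetical, and the sample titles are made up.

```python
from bisect import bisect_left
from collections import Counter

# Inclusive upper bounds of the title-length buckets in Table 1.
TITLE_EDGES = [5, 10, 15, 20, 25, 30, 40, 50]


def bucket_label(n_words, edges):
    """Return the label of the first bucket whose upper bound covers n_words."""
    i = bisect_left(edges, n_words)  # index of the first edge >= n_words
    return f"<={edges[i]}" if i < len(edges) else f">{edges[-1]}"


def length_distribution(texts, edges):
    """Count texts per bucket (assumption: words are split on whitespace)."""
    return Counter(bucket_label(len(t.split()), edges) for t in texts)


# Hypothetical titles, for illustration only.
titles = [
    "Looking for recommendations",
    "What is the best anime of this season so far",
    "Rewatch thread",
]
print(length_distribution(titles, TITLE_EDGES))
```

The same function would cover selftext and comments by passing the bucket edges of Tables 2 and 3 instead. The labels here are not zero-padded, unlike the `<=05` style used in the tables.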

The analysis of submission title, selftext, and comment body length distributions reveals distinct patterns. Submission titles are predominantly concise, with the majority at 10 words or fewer, while selftext tends to be longer, most often exceeding 60 words. Comments, which serve as the discussion space, are mostly 30 words or fewer, fostering interactive and concise exchanges.
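The shares behind this summary can be recomputed from the counts in Tables 1 to 3. The sketch below only copies those counts. The helper name `share` is hypothetical.

```python
# Counts copied from Tables 1-3 (bucket label -> number of texts).
title = {"<=05": 28704, "<=10": 52709, "<=15": 18678, "<=20": 6262,
         "<=25": 2171, "<=30": 838, "<=40": 596, "<=50": 183, ">50": 106}
selftext = {"<=10": 7524, "<=20": 11460, "<=30": 12833, "<=40": 11887,
            "<=50": 9311, "<=60": 7986, ">60": 49246}
comment = {"<=010": 2384465, "<=030": 2282496, "<=050": 881447,
           "<=070": 414939, "<=100": 365355, "<=140": 232981,
           "<=200": 182853, ">200": 134583}


def share(table, labels):
    """Fraction of all texts that fall into the given buckets."""
    return sum(table[k] for k in labels) / sum(table.values())


print(f"titles of 10 words or fewer:   {share(title, ['<=05', '<=10']):.1%}")
print(f"selftext of more than 60 words: {share(selftext, ['>60']):.1%}")
print(f"comments of 30 words or fewer: {share(comment, ['<=010', '<=030']):.1%}")
```

Under these counts, roughly 74% of titles have 10 words or fewer, about 45% of selftexts exceed 60 words, and about 68% of comments have 30 words or fewer.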

Submission Title

Figure4. Top 20 Words

Figure5. Top 20 Words via TFIDF

Figure 4 shows the 20 most common words in submission titles. The most frequent words are closely associated with recommendations: words such as ‘like,’ ‘good,’ ‘rewatch,’ ‘recommendations,’ and ‘best’ suggest a general theme of users seeking or providing anime suggestions. The presence of ‘anime,’ ‘episode,’ ‘watch,’ and ‘season’ among the top 20 words reflects a strong connection to the anime content itself.

By contrast, the top words by TF-IDF for submission titles in Figure 5 differ from the most common words and are closely tied to Japanese language and culture.

Some of these words are:

  • Dango: a Japanese sweet dumpling.
  • Muri (Japanese): “muri” (無理) means “impossible” or “unreasonable”.
  • Toaru (Japanese): Toaru Majutsu no Indekkusu (A Certain Magical Index), a popular light novel series.
  • Kougeki (Japanese): “Kougeki” (攻撃) means “attack”.
  • Zutto (Japanese): “zutto” (ずっと) means “always” or “forever”.
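To see why a TF-IDF ranking surfaces rare words such as these while a raw count ranking surfaces words like ‘anime’, consider the toy sketch below. It is not the report's Spark code: the tokenized titles are made up, the smoothed IDF formula log((N + 1) / (df + 1)) is an assumption about the weighting used, and summing TF-IDF scores over documents is an assumed way of producing a single ranking.

```python
import math
from collections import Counter

# Hypothetical tokenized titles, for illustration only.
docs = [
    ["anime", "recommendations", "like", "toaru"],
    ["best", "anime", "season"],
    ["anime", "rewatch", "episode", "dango", "dango"],
    ["good", "anime", "like", "naruto"],
]

# Raw frequency: how often each word occurs across all documents.
term_counts = Counter(word for doc in docs for word in doc)

# Document frequency: in how many documents each word occurs.
doc_freq = Counter(word for doc in docs for word in set(doc))

n_docs = len(docs)
# Smoothed inverse document frequency (assumed formula);
# it is zero for a word that appears in every document.
idf = {w: math.log((n_docs + 1) / (df + 1)) for w, df in doc_freq.items()}

# Summing per-document TF-IDF over the corpus equals total count times IDF.
tfidf = {w: term_counts[w] * idf[w] for w in term_counts}

top_by_count = term_counts.most_common(3)
top_by_tfidf = sorted(tfidf.items(), key=lambda kv: kv[1], reverse=True)[:3]
print("by count: ", top_by_count)
print("by TF-IDF:", top_by_tfidf)
```

In this toy corpus ‘anime’ leads the count ranking but scores zero under TF-IDF because it appears in every title, while ‘dango’, which is concentrated in a single title, moves to the top.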

Submission Selftext

Figure6. Top 20 Words

Figure7. Top 20 Words via TFIDF

Figure 6 shows the 20 most common words in submission selftext. They are very similar to those in submission titles and are likewise closely related to recommendations, with additional words like ‘please’.

Figure 7 shows the top words by TF-IDF for submission selftext. It also contains many Japanese words:

  • Kita (Japanese): “kita” (来た) means “came” or “arrived”.
  • Ryo (Japanese):
    • “ryo” (良) as a given name.
    • “ryō” (両) was a gold currency unit.
  • Gagumber (Japanese): a character in “Gagumber the Gale” (疾風のガガンバー)
  • Saori (Japanese): “Saori” as a Japanese given name.

In addition, the presence of words like ‘irony’ and ‘situational’ indicates that users explore and discuss content and preferences.

Comment Body

Figure8. Top 20 Words

Figure9. Top 20 Words via TFIDF

In contrast, the 20 most common comment body words in Figure 8 form a more general vocabulary, apart from words like ‘anime,’ ‘watch,’ and ‘episode’ that also appear in submission titles and selftext. This suggests that comment sections host broader, more divergent, and more inclusive conversations about anime-related topics than submissions do. The top 20 words by TF-IDF for comment bodies in Figure 9 contain some Japanese words and anime-related terms, but notably also words associated with adult (18+) content.

The analysis of text data from submission titles, selftext, and comment bodies reveals distinct patterns. Submission titles predominantly focus on recommendations, with users frequently using words like ‘like,’ ‘good,’ and ‘rewatch.’ The selftext of submissions follows a similar trend but tends to be more detailed, providing additional context. Comment bodies, on the other hand, exhibit a broader and more inclusive vocabulary, suggesting that the comment sections foster diverse discussions about various anime-related topics.

External Data Analysis

Data source: Yahoo Finance

Table4. External Data
week total_submissions total_comments NTDOY SONY TYO
108 2023-01-22 707 48227.0 10.734 89.502000 11.827641
109 2023-01-29 640 37987.0 10.772 91.054001 11.796592
110 2023-02-05 634 52538.0 10.146 90.430002 12.403984
111 2023-02-12 626 50620.0 10.036 87.999998 12.794034
112 2023-02-19 598 51840.0 9.840 82.777500 13.246667
113 2023-02-26 524 47569.0 9.394 83.927998 13.407248
114 2023-03-05 248 39966.0 9.402 86.634000 13.191847
115 2023-03-12 249 39501.0 9.486 85.529999 12.037221
116 2023-03-19 267 44612.0 9.570 88.206000 11.796666
117 2023-03-26 159 24835.0 9.670 88.058000 11.973750

Figure10. External Data

As mentioned during the EDA phase for the external data, we selected stock prices of anime production companies such as the Nintendo ADR (NTDOY) for Pokémon and the Tokyo Stock Exchange (TYO) to represent the broader anime production industry. Notably, we decided to exclude TOEAF and NCBDF from our external stock prices dataset due to their status in the US stock market, and SONY as well due to its broader association with electronics rather than anime production.

The graph shows weekly total submissions and comments compared with the average adjusted stock prices of NTDOY and TYO from January 2021 to March 2023. The x-axis denotes time, and the secondary y-axis on the right represents stock prices. Both total submissions and comments show a decreasing trend, mirroring the trend in NTDOY stock prices, whereas TYO (in green) started to increase after March 2022. Since TYO is a more general stock index, it serves as a reference rather than a direct representation of anime production companies’ status in our case. Given the similar trends in Nintendo’s stock price and anime subreddit activity, we aim to dig into the Pokémon subreddit to gain further insight.

Note that we chose a weekly basis because stock exchanges trade only on business days. In 2023 there are exactly 250 trading days, and 105 of the 365 days are weekend days when the exchanges are closed. A weekly basis is therefore a better aggregation unit for comparing stock price fluctuations with anime subreddit activity over time.
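A weekly aggregation of daily prices can be sketched with the standard library alone. The sketch assumes that each week in Table 4 is labelled by the Sunday that closes it (2023-01-22 is a Sunday, which is consistent with this but does not prove it). The function names are hypothetical and the prices are made up, not real NTDOY quotes.

```python
from collections import defaultdict
from datetime import date, timedelta


def week_ending_sunday(day):
    """Map a date to the Sunday that closes its week (Monday=0 ... Sunday=6)."""
    return day + timedelta(days=6 - day.weekday())


def weekly_mean(daily_prices):
    """Average daily prices per week; weeks without trading days are absent."""
    buckets = defaultdict(list)
    for day, price in daily_prices.items():
        buckets[week_ending_sunday(day)].append(price)
    return {week: sum(p) / len(p) for week, p in sorted(buckets.items())}


def weekend_days(year):
    """Count the Saturdays and Sundays in a calendar year."""
    day, count = date(year, 1, 1), 0
    while day.year == year:
        count += day.weekday() >= 5  # 5 = Saturday, 6 = Sunday
        day += timedelta(days=1)
    return count


# Made-up prices for illustration only.
daily = {
    date(2023, 1, 17): 10.0,
    date(2023, 1, 18): 11.0,
    date(2023, 1, 19): 12.0,
    date(2023, 1, 23): 20.0,
}
print(weekly_mean(daily))
print(weekend_days(2023))
```

Because weekends and holidays simply contribute no prices to a week, the weekly mean stays comparable with subreddit activity, which is recorded on every calendar day.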

NLP

Text Cleaning Pipeline

We built a Spark NLP pipeline to clean our text. The pipeline has six stages:

  1. DocumentAssembler(): Transforms raw texts to document annotation
  2. DocumentNormalizer(): Removes all dirty characters from text following a regex pattern
  3. Tokenizer(): Identifies tokens with tokenization open standards
  4. LemmatizerModel.pretrained(): Finds lemmas out of words with the objective of returning a base dictionary word
  5. Stemmer(): Finds stems out of words
  6. StopWordsCleaner(): Drops all the stop words from the input sequences
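The six stages above could be wired together roughly as follows. This is a sketch under stated assumptions rather than the report's actual configuration: the column names, the regex pattern, and the lowercase setting are illustrative guesses, and `STAGES` and `build_pipeline` are hypothetical names. The Spark imports are placed inside the function so that the column wiring can be inspected without a Spark installation.

```python
# Column wiring for the six stages: (stage name, input column, output column).
# Column names are illustrative assumptions, not taken from the report.
STAGES = [
    ("DocumentAssembler", "text", "document"),
    ("DocumentNormalizer", "document", "normalized"),
    ("Tokenizer", "normalized", "token"),
    ("LemmatizerModel", "token", "lemma"),
    ("Stemmer", "lemma", "stem"),
    ("StopWordsCleaner", "stem", "clean_tokens"),
]


def build_pipeline():
    """Assemble the Spark ML pipeline; requires pyspark and spark-nlp."""
    # Imported lazily so the wiring above can be checked without Spark.
    from pyspark.ml import Pipeline
    from sparknlp.base import DocumentAssembler
    from sparknlp.annotator import (
        DocumentNormalizer, LemmatizerModel, Stemmer,
        StopWordsCleaner, Tokenizer,
    )

    io = {name: (src, dst) for name, src, dst in STAGES}

    def wire(annotator, name):
        """Set the input and output columns of an annotator from STAGES."""
        src, dst = io[name]
        return annotator.setInputCols([src]).setOutputCol(dst)

    assembler = (DocumentAssembler()
                 .setInputCol(io["DocumentAssembler"][0])
                 .setOutputCol(io["DocumentAssembler"][1]))
    normalizer = (wire(DocumentNormalizer(), "DocumentNormalizer")
                  .setAction("clean")
                  .setPatterns([r"[^A-Za-z\s]"])  # assumed pattern: keep letters only
                  .setReplacement(" ")
                  .setLowercase(True))            # assumed setting
    tokenizer = wire(Tokenizer(), "Tokenizer")
    # Downloads the default pretrained English lemmatizer on first use.
    lemmatizer = wire(LemmatizerModel.pretrained(), "LemmatizerModel")
    stemmer = wire(Stemmer(), "Stemmer")
    stop_words = wire(StopWordsCleaner(), "StopWordsCleaner").setCaseSensitive(False)

    return Pipeline(stages=[assembler, normalizer, tokenizer,
                            lemmatizer, stemmer, stop_words])


if __name__ == "__main__":
    for name, src, dst in STAGES:
        print(f"{name}: {src} -> {dst}")
```

With a running Spark session and a DataFrame `df` that has a `text` column, `build_pipeline().fit(df).transform(df)` would add the intermediate columns listed in `STAGES`. Note that removing stop words after stemming, as in the stage order above, means the stop word list is matched against stems rather than the original tokens.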

Sentiment Analysis and Controversiality for r/anime

Text Category Positive Negative
submission title 52.05% 47.95%
submission selftext 49.42% 50.58%
comment body 45.53% 54.47%

The table shows no substantial gap between the shares of positive and negative posts within the r/anime community. Over half of the submission titles conveyed positive sentiment, and submission selftext was split almost evenly between positive and negative. Comments, however, leaned negative, at 54.47%.

This overall scenario suggests a generally harmonious atmosphere within this community. The prevalence of submissions expressing positive emotions indicates a tendency among individuals to initiate discussions with a positive tone rather than with negative sentiments.

Controversiality Count
Non-controversial 6768222
Controversial 110897

Non-controversial comments outnumber controversial ones by more than sixty to one. This is consistent with the sentiment analysis for r/anime: the overall vibe of the community is calm and harmonious rather than extreme, despite the large number of people involved. This may be because anime is a widely shared interest that people engage with mostly as a hobby and for leisure, so they rarely hold offensive views.
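The ratio quoted above follows directly from the two counts in the controversiality table, as this short check shows. Only the variable names are introduced here.

```python
# Counts copied from the controversiality table above.
non_controversial = 6_768_222
controversial = 110_897

ratio = non_controversial / controversial
share = controversial / (non_controversial + controversial)

print(f"non-controversial comments per controversial comment: {ratio:.1f}")
print(f"controversial share of all comments: {share:.2%}")
```

The ratio comes out at about 61 to 1, so controversial comments make up roughly 1.6% of all comments.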

Sentiment Analysis for Top-tier anime subreddits

Sentiment over time by anime series

sentiment OnePiece Pokemon Naruto OnePunchMan YuGiOh OnePiece_Perc (%) Pokemon_Perc (%) Naruto_Perc (%) OnePunchMan_Perc (%) YuGiOh_Perc (%)
0 negative 12461 39728 12535 2726 1616 25.213978 37.590599 24.892269 18.128616 31.118814
1 neutral 3367 7497 3373 652 396 6.812893 7.093655 6.698175 4.335971 7.625650
2 positive 33593 58461 34449 11659 3181 67.973129 55.315747 68.409556 77.535413 61.255536

The time series graphs present an intriguing comparative sentiment analysis for five beloved anime franchises: One Piece, Pokémon, Naruto, One Punch Man, and YuGiOh. A discernible declining trend in positive sentiment over time is evident for One Piece, Naruto, and One Punch Man, hinting at a possible waning interest or reception among their fan bases. In contrast, Pokémon and YuGiOh exhibit pronounced fluctuations, with noticeable peaks and troughs suggesting episodic events triggering varied fan reactions. Notably, YuGiOh experienced a significant sentiment bifurcation in July 2022, likely indicating a polarizing event that split the fanbase into starkly positive and negative camps. Despite the general downtrend for three franchises, an upsurge in positive sentiment across all series in October 2022 could point to seasonal releases or events that traditionally energize the anime community, perhaps correlating with new content releases or other significant developments that tend to cluster around the autumn season. This uptick might also imply a rejuvenation of the audience demographic, perhaps through strategic initiatives aimed at broadening viewership or revitalizing interest in these storied series.

Sentiment counts and percentages across anime series

The grouped bar chart of sentiment counts across anime series reveals that Pokémon garners the highest volume of commentary within the anime subreddit, but when evaluating sentiment percentages, a different narrative emerges. Pokémon stands out for having a disproportionately high percentage of negative feedback relative to its comment volume, coupled with the lowest positive sentiment percentage. This juxtaposition of high engagement and lower relative satisfaction is quite striking, especially when visualized in the line and area plots, indicating a complexity in fan engagement that warrants further exploration. When considering the reasons behind the sentiment distribution for Pokémon, the high negative sentiment could be indicative of a vocal minority within a large fan base or it might reflect recent controversies or disappointments in the franchise. The line plot and area chart amplify this story, revealing that despite the breadth of its audience, there’s a nuanced undercurrent of dissatisfaction that could be tied to specific developments or decisions within the Pokémon series. To unpack these dynamics, a focused analysis of the Pokémon subreddit could offer insights into specific issues or events that have influenced fan sentiment.
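The percentages in the sentiment table, and the observation that Pokémon combines the highest volume with the lowest positive share, can be reproduced from the raw counts. The sketch below copies the counts from that table. The function and variable names are hypothetical.

```python
# Sentiment counts per series, copied from the table above.
counts = {
    "OnePiece":    {"negative": 12461, "neutral": 3367, "positive": 33593},
    "Pokemon":     {"negative": 39728, "neutral": 7497, "positive": 58461},
    "Naruto":      {"negative": 12535, "neutral": 3373, "positive": 34449},
    "OnePunchMan": {"negative": 2726,  "neutral": 652,  "positive": 11659},
    "YuGiOh":      {"negative": 1616,  "neutral": 396,  "positive": 3181},
}


def percentages(series_counts):
    """Convert one series' sentiment counts into percentages of its total."""
    total = sum(series_counts.values())
    return {label: 100 * n / total for label, n in series_counts.items()}


perc = {series: percentages(c) for series, c in counts.items()}
volume = {series: sum(c.values()) for series, c in counts.items()}

most_discussed = max(volume, key=volume.get)
least_positive = min(perc, key=lambda s: perc[s]["positive"])
most_negative = max(perc, key=lambda s: perc[s]["negative"])

print(most_discussed, least_positive, most_negative)
for series, p in perc.items():
    print(f"{series}: {p['positive']:.1f}% positive, {p['negative']:.1f}% negative")
```

All three selections return Pokémon, which matches the pattern described above: the largest comment volume together with the lowest positive and highest negative percentage.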

Sentiment distribution across anime series

The sunburst chart illustrates the sentiment distribution across different anime series, with each series segmented into positive, neutral, and negative sentiments. This visual suggests that while Pokémon has a substantial share of sentiment, it is not overwhelmingly positive. In contrast, series like One Piece and Naruto appear to have a more balanced sentiment distribution, with significant portions of both positive and negative feedback. The chart helps to quickly identify which series are more polarizing versus those with a more balanced or positive sentiment profile. It also indicates the relative volume of discussion for each anime, as larger segments suggest more comments and a higher level of engagement within the community.