Discussion

Modifications based on feedback

🌟 Goal 1 was relatively simple.

✅ As the first goal of our EDA, Goal 1 is meant to establish a broad overview that later steps can investigate in more depth. We therefore kept the goal for topic 1 simple but added more steps to its technical proposal.


🌟 Goal 2 depth could have been a little bit better.

✅ We added more analysis angles to Goal 2.


🌟 Scatter plots for topic 3 are washed out due to too many dots.

✅ We use scatter plots for topic 3 to get a general idea of the correlation between two variables. Because there is no obvious correlation between the variables and the dataset is very large, the points initially formed an L-shape. We still consider the scatter plot the most intuitive way to show the correlation between two variables, so we kept it and only log-scaled the x and y axes so that the graphs no longer look overly disproportionate.
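
As a minimal sketch of what log-scaling both axes amounts to, the snippet below takes log10 of each variable and computes the Pearson correlation on the transformed values. The variable names and toy values are hypothetical and are not taken from our dataset; pairs with non-positive values are dropped because they cannot be placed on a log axis.

```python
import math

def log10_pairs(xs, ys):
    """Return (log10 x, log10 y) lists, keeping only pairs where both values are positive."""
    pairs = [(math.log10(x), math.log10(y)) for x, y in zip(xs, ys) if x > 0 and y > 0]
    return [p[0] for p in pairs], [p[1] for p in pairs]

def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical toy values spanning several orders of magnitude.
xs = [1, 10, 100, 1000, 10000, 0]
ys = [2, 20, 200, 2000, 20000, 5]

log_x, log_y = log10_pairs(xs, ys)  # the (0, 5) pair is dropped
r = pearson(log_x, log_y)           # correlation on the log scale
```

On a linear scale, a few very large values push most points toward the axes, which is one way an L-shape can arise. On the log scale, each order of magnitude takes up the same width.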


🌟 Goals 5 and 6 are repeats in the prose.

✅ We have adjusted goals 5 and 6 to diversify their content. Goal 5 primarily focuses on identifying regular patterns in sentiment percentages across different top anime franchises. Goal 6 concentrates on diving into the subreddit of an unusual franchise to discover each peak in sentiment, the boom in negative or positive reactions, and the related events that might have triggered these changes.


🌟 Mentioned IQR but did not use a box plot to represent the distribution, which would have been better.

✅ Box plots are well suited to visualizing the IQR, which is why we used ‘sns.boxplot’ to draw our plots, including the previous version. In that version, the individual data points made the IQR hard to see. To solve this, we switched from plots with dots to standard boxes, which show the IQR more clearly.
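
For reference, the quantities that a standard box encodes can be sketched with the Python standard library alone. This is an illustration on made-up values, not our plotting code. It assumes the usual convention of whiskers limited to 1.5 times the IQR, and the helper name `box_stats` is hypothetical.

```python
from statistics import quantiles

def box_stats(values):
    """Quartiles, IQR, and the 1.5 * IQR fences that bound the whiskers."""
    # method="inclusive" interpolates linearly between the sample values.
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "lower_fence": q1 - 1.5 * iqr,
        "upper_fence": q3 + 1.5 * iqr,
    }

# Made-up sample values for illustration only.
stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9])
```

The box spans `q1` to `q3`, so its height is the IQR. Points beyond the fences are the ones typically drawn as individual outliers.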


🌟 Goal 8 should include extracted text content features.

✅ Goal 8 now uses a new feature derived from the original text data: the result of the sentiment analysis is used as an input to the predictive models.


🌟 Unavailable code on website.

✅ All code is stored as HTML files, which can be found under the Source Code tab.


🌟 Lacks a description of the data check and cleaning process.

✅ Descriptions of the data check and data cleaning process have been added to the ‘EDA’ page.


🌟 Website font size is a bit small, and graphs lack interactivity.

✅ The website font size has been increased slightly through the style.css file. Most of the graphs are now rendered with plotly, which improves interactivity: hovering the mouse pointer over a graph shows detailed information.


🌟 The metadata for the page title in the browser isn’t configured properly.

✅ Thank you for pointing this out. This has been fixed.


🌟 On the NLP page, some graphs have the <figure> printout.

✅ The redundant printouts have been removed.


🌟 On the NLP page, there is no need for many different types of the same chart.

✅ We kept only the chart that conveys the information most clearly and intuitively.


🌟 On the ML page, the tables don’t need all the decimals; removing them will help with the table width.

✅ All decimals in the tables that were too wide to view are now rounded to 1 decimal place.
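
A minimal sketch of this kind of rounding is shown below. The model names, metric names, and values are hypothetical placeholders rather than our actual results, and `round_table` is an illustrative helper, not a function from the project code.

```python
def round_table(rows, places=1):
    """Round every float cell to `places` decimals and leave other cells unchanged."""
    return [
        {key: round(value, places) if isinstance(value, float) else value
         for key, value in row.items()}
        for row in rows
    ]

# Hypothetical results table (placeholder names and numbers).
results = [
    {"model": "model_a", "rmse": 12.3456, "r2": 0.8732},
    {"model": "model_b", "rmse": 10.9871, "r2": 0.9012},
]

rounded = round_table(results)
```

Rounding only at display time keeps the full-precision values available for any further computation.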

Other Changes

✅ Goal 9 changed: to better serve the project goal with various models and approaches, we now predict the score as a continuous number with a regression model instead of converting scores to categorical labels. The approach has been changed accordingly.
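
To illustrate the difference in target type, the sketch below fits a one-feature least-squares line to a continuous score. It is a toy example under stated assumptions: the feature, the values, and the helper `fit_line` are hypothetical, and it does not represent the models actually used for Goal 9.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical feature values and continuous scores (not project data).
features = [0.0, 1.0, 2.0, 3.0]
scores = [1.0, 3.0, 5.0, 7.0]

slope, intercept = fit_line(features, scores)
predicted = slope * 4.0 + intercept  # a continuous prediction, not a class label
```

Keeping the score continuous preserves the distance between values, which a categorical label would discard.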