Feedback Discussion

Project plans

Peer Feedback

The business goals do not clearly articulate the purpose and rationale for undertaking the project.

We added a section in the introduction regarding the project goals, which provides a clearer explanation of how the goals fit together. We also update the text of the individual goals to reflect their rationale better.

Additionally, the business goals include some technical terms, focusing more on how to address the question rather than fully connecting with what the audience can gain from it. The technical approach, on the other hand, is clearly directed towards a technical audience, employing technical language to explain the implementation.

We revised the business goals with this issue and removed the technical terms. We ensured that the technical terms were only included in the technical section.

EDA work

Peer Feedback

The data quality checks need to be explicitly mentioned on the website, ensuring basic checks like missing values, outliers, and data distributions are performed. Suggest including a section on data quality checks and potential issues identified.

This is included in the project, please see the following link.

For the plot in data cleaning, consider replacing “Validity” with the name of the subreddit for clarity.

The subreddits are included in the top of the plot. The “Validity” text was removed for better visualization.

Instructional Team Feedback

None of the business goals are really business goals. They are descriptions of the analyses you plan to do. What is desired is why you want to do these analysis and what you expect to get out of them.

We addressed this concern by revising our business goals to be less technical, as well as more explicit about what we expect to get out of them.

A bit more narrative needed, but the EDA plots are well done.

In response to this feedback, we added more narrative structure throughout our website’s EDA page.

NLP work

Peer Feedback

A brief discussion about the external dataset and its relevance to project goals would enhance the project plan’s completeness.

We added language to the section discussing how books from external sources will improve the RNN model to improve on this point.

Instructional Team Feedback

Figure 4 needs to be labeled as “men”, “women” or “male”, “female”. “m” and “f” is not acceptable.

This point was addressed by reformatting the plot with the appropriate naming convention.

The caption of Figure 5 does not appear to be aligned with the figure.

To address this, we modified this caption to ensure that it aligned with the COVID-19-related visualization and not the gender-related visualization above it.

What is the story of the last section? It’s quite unclear.

We provided more narrative flow to our NLP section in response to this feedback, allowing this section to more clearly present our goals.

ML work

Instructional Team Feedback

Your RF models basically predicts that all the data lies in two categories, on the whole. Comment, and if needed, justify why these models are still reasonable.

This is a very important note and we addressed it in two ways. First, we describe this particular downfall of the Random Forest model, explaining why it raises concern. Additionally, we build both a baseline model and a Random Forest built on a balanced version of our dataset in order to provide points of comparison for the original Random Forest model.

The last few sections need to be filled in with narrative to understand what you’re trying to do.

Similar to the previous sections, we added more narrative flow to the ML page, better tying in the analyses with our overall goals for the project.

Website/results

The inability to view code on the website is a significant issue (We saw that you sent the code and data file separately by email, but it is better to be available on the website). Ensure that all notebooks are accessible to the audience in the later submission, as transparency in the coding process is crucial.

We requested the instructional team to make our GitHub repo public. To keep the rendered pages clean, we prefer not to fill up the body of the website with code. In each section, we provided links to the code in GitHub, which is now publicly accessible.

Be cautious about the professor’s advice on using only 10,000 rows in plotting; it seems that most graphs and tables contain over 10,000 data points.

We verified that this is not the case in any of the plots.